Redshift Spectrum Cost of Data Scanned

With Redshift Spectrum, you are billed at $5 per terabyte of data scanned, rounded up to the next megabyte, with a 10 megabyte minimum per query. For example, if you scan 10 gigabytes of data, you will be charged $0.05. If you scan 1 terabyte of data, you will be charged $5.
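As a quick sanity check, the per-query billing rule can be written out in SQL. The sketch below is my own illustration (not an official AWS formula): it converts the 10 gigabyte example to bytes, rounds the scan up to the next megabyte, applies the 10 megabyte minimum, and prices the result at $5 per terabyte.

-- Billed megabytes and cost for a single 10 GB (10737418240-byte) scan.
-- ceiling() rounds up to the next MB; greatest() enforces the 10 MB per-query minimum.
select greatest(ceiling(10737418240 / 1024.0 / 1024.0), 10)                       as billed_mb,
       greatest(ceiling(10737418240 / 1024.0 / 1024.0), 10) / 1024.0 / 1024.0 * 5 as cost_in_usd;

This returns 10240 billed megabytes and a cost of about $0.049, which matches the roughly $0.05 quoted above.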

To find how much data is being scanned by Redshift Spectrum queries, look at the system view SVL_S3QUERY_SUMMARY and, specifically, the s3_scanned_bytes column (the number of bytes scanned from Amazon S3). The cost of a Redshift Spectrum query is a direct function of the amount of data scanned from Amazon S3.

s3_scanned_bytes – The number of bytes scanned from Amazon S3 and sent to the Redshift Spectrum layer, based on compressed data.

For example, you can run the query below to determine the total number of bytes scanned by one particular Spectrum query (substitute the query ID for the <query_id> placeholder):

select sum(s3_scanned_bytes) from svl_s3query_summary where query = <query_id>;

To determine the total bytes scanned by all Spectrum queries:

select sum(s3_scanned_bytes) from svl_s3query_summary;

To determine the total bytes scanned by all Spectrum queries in the last day:

select sum(s3_scanned_bytes) from svl_s3query_summary where starttime >= getdate() - interval '24 hours';

     sum
--------------
 621900000000
(1 row)

Let's say we have a figure like the one above for one day of Spectrum use. Using the Redshift Spectrum pricing of $5 per terabyte of data scanned (rounded up to the next megabyte, with a 10 megabyte minimum per query), we can calculate the cost for one day as follows:

621900000000 bytes = 621900000000/1024 = 607324218.75 kilobytes
607324218.75 kilobytes = 607324218.75/1024 = 593090.057373046875 megabytes
593090.057373046875 megabytes = 593090.057373046875 /1024 = 579.189509153366089 gigabytes
579.189509153366089 gigabytes = 579.189509153366089/1024 = 0.565614755032584 terabytes

In this case you will be charged for approximately 0.5656 terabytes: $5 * 0.5656 ≈ $2.83. (The round-up to the next megabyte and the 10 megabyte minimum are applied per query, so the sum of s3_scanned_bytes slightly understates the billed amount.)

So $2.83 is roughly what you will pay for scanning 0.5656 terabytes of data per day from S3 via Spectrum.

Though there is no built-in SQL query to calculate the cost, I have created one that summarizes the approximate cost of the data scanned, by loading the byte count into a cost table and running the SQL below:

create table test_cost (s3_scanned_bytes float8);
insert into test_cost values (621900000000);
select sum(s3_scanned_bytes/1024/1024/1024/1024) as s3_scanned_tb,
       5*sum(s3_scanned_bytes/1024/1024/1024/1024) as cost_in_usd
from test_cost;

   s3_scanned_tb   |   cost_in_usd
-------------------+------------------
 0.565614755032584 | 2.82807377516292
(1 row)

Final approximate scan in terabytes and cost in USD:

select round(sum(s3_scanned_bytes/1024/1024/1024/1024), 4) as s3_scanned_tb,
       round(5*sum(s3_scanned_bytes/1024/1024/1024/1024), 2) as cost_in_usd
from test_cost;

 s3_scanned_tb | cost_in_usd
---------------+-------------
        0.5656 |        2.83
(1 row)
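
If you prefer not to stage the value in a separate table, a similar approximation can be run directly against the system view. The sketch below is my own variant (not from the AWS docs): it sums the bytes per query, applies the round-up to the next megabyte and the 10 megabyte minimum, and then prices the result at $5 per terabyte, which brings the estimate a little closer to what is actually billed. It reuses the 24-hour filter from the earlier example.

-- Approximate billed scan and cost for the last 24 hours of Spectrum queries.
-- The inner query applies the per-query round-up and 10 MB minimum.
select sum(billed_mb) / 1024.0 / 1024.0     as billed_tb,
       sum(billed_mb) / 1024.0 / 1024.0 * 5 as cost_in_usd
from (
    select query,
           greatest(ceiling(sum(s3_scanned_bytes) / 1024.0 / 1024.0), 10) as billed_mb
    from svl_s3query_summary
    where starttime >= getdate() - interval '24 hours'
    group by query
) per_query;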

This is the easiest way to find the amount of data scanned through Spectrum and the cost associated with it.

P.S.: The system view retains only a few days of log history, so if you need to maintain a record of the bytes scanned via Spectrum, it is better to calculate the total per day and store it in another table.
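
A minimal sketch of that approach, assuming a snapshot table name of my own choosing (spectrum_daily_usage) and the same 24-hour window as above, run once per day from a scheduled job:

-- One-time setup: a table to accumulate daily Spectrum scan totals.
create table if not exists spectrum_daily_usage (
    usage_date       date,
    s3_scanned_bytes bigint
);

-- Daily snapshot of the last 24 hours, taken before the system view rotates.
insert into spectrum_daily_usage
select trunc(getdate())              as usage_date,
       sum(s3_scanned_bytes)::bigint as s3_scanned_bytes
from svl_s3query_summary
where starttime >= getdate() - interval '24 hours';

The per-day totals in this table can then be fed into the cost calculation above even after the underlying log entries have aged out.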

References:

[1] Amazon Redshift pricing – https://aws.amazon.com/redshift/pricing/#Redshift_Spectrum_Pricing
[2] Monitoring Metrics in Amazon Redshift Spectrum – https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-metrics.html
[3] SVL_S3QUERY_SUMMARY – https://docs.aws.amazon.com/redshift/latest/dg/r_SVL_S3QUERY_SUMMARY.html
