Redshift IAM Role for COPY and UNLOAD to/from S3

Creating IAM Policies and Roles & Associating the Role with the Redshift Cluster

———————————————————————————————————

To run operations such as COPY and UNLOAD against a Redshift cluster, you must provide security credentials that authorize the Amazon Redshift cluster to read data from, or write data to, your target destination, in this case an Amazon S3 bucket.

Step 1: Creating the policy to allow access on S3

  • On the Services menu, choose IAM (under Security, Identity, & Compliance)
  • On the left side of the IAM Console, go to “Policies” 
  • Select “Create Policy” at the top of the page
  • Select the JSON tab and paste in the policy below. Replace ‘redshift-testing-bucket-shadmha’ with the name of the bucket you are using for UNLOAD and COPY. The policy grants the object-level permissions UNLOAD and COPY need (s3:PutObject, s3:DeleteObject, s3:GetObject) plus s3:ListBucket on the bucket

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject"
         ],
         "Resource": [
            "arn:aws:s3:::redshift-testing-bucket-shadmha*"
         ]
      },
      {
         "Effect": "Allow",
         "Action": [
            "s3:ListBucket"
         ],
         "Resource": [
            "arn:aws:s3:::redshift-testing-bucket-shadmha*"
         ]
      }
   ]
}

  • Click “Review Policy” and provide a “Name” and “Description” for the policy
  • Click “Create Policy” and keep the policy name handy; we will need it when attaching permissions to the IAM role in the next step (a scripted alternative is sketched below)
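
If you prefer to script this step instead of using the console, the same policy can be created with boto3. This is a minimal sketch, assuming your local AWS credentials are allowed to call iam:CreatePolicy; the policy name used here is a hypothetical one:

import json
import boto3

iam = boto3.client("iam")

# Same policy document as above, with the placeholder bucket name
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": ["arn:aws:s3:::redshift-testing-bucket-shadmha*"]
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::redshift-testing-bucket-shadmha*"]
        }
    ]
}

# Create the managed policy and print its ARN for use in Step 2
response = iam.create_policy(
    PolicyName="RedshiftS3CopyUnloadPolicy",        # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
    Description="Allows Redshift to COPY/UNLOAD against the test bucket"
)
print(response["Policy"]["Arn"])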

Step 2: Creating the IAM Role so that the Redshift service can assume it

  • On the left menu of your IAM Console, select “Roles”
  • Select “Create Role” on the top of the page
  • Select “AWS service” as the type of trusted entity, then select “Redshift” as the service that will use this role
  • Select the use case “Redshift – Customizable: Allows Redshift clusters to call AWS services on your behalf.” and click “Permissions”
  • Search for the policy created in Step 1, select it, and click “Next”
  • Specify a “Role name”
  • Select “Create Role”
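
The role creation can also be scripted; below is a minimal boto3 sketch, assuming the policy ARN printed in the Step 1 sketch (the account ID shown is a placeholder):

import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets the Amazon Redshift service assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "redshift.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

# Create the role and attach the S3 policy from Step 1
role = iam.create_role(
    RoleName="REDSHIFTNEWROLE",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Allows Redshift clusters to call AWS services on your behalf"
)
iam.attach_role_policy(
    RoleName="REDSHIFTNEWROLE",
    PolicyArn="arn:aws:iam::123456789012:policy/RedshiftS3CopyUnloadPolicy"  # placeholder ARN from Step 1
)
print(role["Role"]["Arn"])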

Step 3: Associating the created Role with a Redshift Cluster

  • On your AWS Console, on the Services menu, choose “Redshift”
  • On the AWS Redshift console, select the cluster in question and click on “Manage IAM roles”
  • On the pop-up screen, open the “Available roles” drop-down and select the role created in the previous step
  • Select “Apply changes”
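
The same association can be made from code; a short boto3 sketch, assuming a hypothetical cluster identifier and a placeholder account ID in the role ARN:

import boto3

redshift = boto3.client("redshift", region_name="ap-southeast-2")

# Attach the new role to the cluster; roles already attached are left in place
redshift.modify_cluster_iam_roles(
    ClusterIdentifier="my-redshift-cluster",  # hypothetical cluster identifier
    AddIamRoles=["arn:aws:iam::123456789012:role/REDSHIFTNEWROLE"]
)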

Considerations

——————–

As soon as the “Status” for the IAM role on the “Manage IAM roles” screen shows “in-sync”, you can run COPY or UNLOAD using the ARN of the created role as the CREDENTIALS parameter.
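
If you are scripting the whole setup, you can poll the cluster until the role reports “in-sync” before issuing the first COPY or UNLOAD. A small sketch, reusing the hypothetical cluster identifier and placeholder account ID from above:

import time
import boto3

redshift = boto3.client("redshift", region_name="ap-southeast-2")
role_arn = "arn:aws:iam::123456789012:role/REDSHIFTNEWROLE"  # placeholder account ID

# Poll the cluster until the role's ApplyStatus becomes "in-sync"
while True:
    cluster = redshift.describe_clusters(ClusterIdentifier="my-redshift-cluster")["Clusters"][0]
    status = {r["IamRoleArn"]: r["ApplyStatus"] for r in cluster["IamRoles"]}
    if status.get(role_arn) == "in-sync":
        print("Role is in-sync; COPY and UNLOAD can now use it")
        break
    time.sleep(10)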

Example:

Note: Modify the details such as the schema and table name, the bucket name, and the role ARN (for example, “arn:aws:iam::586945000000:role/role_name”) to suit your scenario.

Below is an example from my test cluster. The role ‘REDSHIFTNEWROLE’ is the one created in Step 2, and the S3 bucket ‘redshift-testing-bucket-shadmha’ is the one we granted access to in Step 1.

unload ('select * from test_char')
to 's3://redshift-testing-bucket-shadmha/test_char.csv'
credentials 'aws_iam_role=arn:aws:iam::775867435088:role/REDSHIFTNEWROLE'
delimiter '|' region 'ap-southeast-2'
parallel off;

The most common error seen when trying to COPY or UNLOAD data from Redshift without the correct credentials in place:

ERROR: S3ServiceException:Access Denied,Status 403,Error AccessDenied

Python Script to COPY and UNLOAD Data Between Redshift and S3

import psycopg2
import time
import sys
import datetime

#Record and print the start time of the run
datetime_object = datetime.datetime.now()
print("Start TimeStamp")
print("---------------")
print(datetime_object)
print("")

#Simple text progress bar used for console feedback only
def progressbar(it, prefix="", size=60, file=sys.stdout):
    count = len(it)
    def show(j):
        x = int(size*j/count)
        file.write("%s[%s%s] %i/%i\r" % (prefix, "#"*x, "."*(size-x), j, count))
        file.flush()
    show(0)
    for i, item in enumerate(it):
        yield item
        show(i+1)
    file.write("\n")
    file.flush()

#Obtain the connection to Redshift (placeholder endpoint, user, and password)
con = psycopg2.connect(dbname='dev', host='redshift.amazonaws.com',
                       port='5439', user='awsuser', password='*****')

#Copy command as a variable; replace the bucket and role ARN with your own
copy_command = "copy users from 's3://redshift-test-bucket/allusers_pipe.txt' credentials 'aws_iam_role=arn:aws:iam::775088:role/REDSHIFTROLE' delimiter '|' region 'ap-southeast-2';"

#Unload command as a variable; a timestamp suffix keeps each unload file name unique
unload_command = "unload ('select * from users') to 's3://redshift-test-bucket/users_" + datetime.datetime.now().strftime("%Y%m%d%H%M%S") + ".csv' credentials 'aws_iam_role=arn:aws:iam::7755088:role/REDSHIFTROLE' delimiter '|' region 'ap-southeast-2';"

#Open a cursor, truncate the target table, and run the copy query
cur = con.cursor()
cur.execute("truncate table users;")
cur.execute(copy_command)
con.commit()

#Cosmetic progress bar; the COPY above has already completed synchronously
for i in progressbar(range(100), "Copying Data into Redshift: ", 10):
    time.sleep(0.1)  # fixed delay for the animation

print("")

#Cosmetic progress bar; the UNLOAD is issued once it finishes
for i in progressbar(range(600), "Unloading Data from Redshift to S3: ", 60):
    time.sleep(0.1)  # fixed delay for the animation

print("")

#Run the unload query on the existing cursor
cur.execute(unload_command)

#Close the cursor and the connection
cur.close()
con.close()

datetime_object_2 = datetime.datetime.now()
print("End TimeStamp")
print("-------------")
print(datetime_object_2)
print("")