Redshift Health-Check SQL Queries

-- Query Performance Review --

$ psql -h redshift-private-2a.c2nh0wlf4z7g.ap-southeast-2.redshift.amazonaws.com -p 5439 -U awsuser -f review_query_pf.sql testdb

$ vim review_query_pf.sql


\o redshiftxxx.txt
\set vpattern 1678
\qecho -- Query Text - stl_querytext
select * from stl_querytext where query = :vpattern;
\qecho -- Explain plan - stl_explain
select userid,query,nodeid,parentid,trim(plannode) plannode,trim(info) info from stl_explain where query = :vpattern;
\qecho --Review WLM Queuing for above queries - stl_wlm_query
SELECT TRIM(DATABASE) AS DB,
       w.query,
       SUBSTRING(q.querytxt,1,100) AS querytxt,
       w.queue_start_time,
       w.service_class AS class,
       w.slot_count AS slots,
       w.total_queue_time / 1000000 AS queue_seconds,
       w.total_exec_time / 1000000 exec_seconds,
       (w.total_queue_time + w.total_exec_time) / 1000000 AS total_seconds
FROM stl_wlm_query w
  LEFT JOIN stl_query q
         ON q.query = w.query
        AND q.userid = w.userid
WHERE w.query = :vpattern
--AND w.total_queue_time > 0
ORDER BY w.total_queue_time DESC,
         w.queue_start_time DESC;
\qecho --Get information about commit stats - stl_commit_stats
select startqueue,node, datediff(ms,startqueue,startwork) as queue_time, datediff(ms, startwork, endtime) as commit_time, queuelen
from stl_commit_stats
where xid in (select xid from stl_querytext where query = :vpattern)
order by queuelen desc , queue_time desc;
\qecho --Compile Time
select userid, xid,  pid, query, segment, locus,
datediff(ms, starttime, endtime) as duration, compile
from svl_compile
where query = :vpattern;
--\qecho --Understand other operations within the same PID - svl_statementtext
--select userid,xid,pid,label,starttime,endtime,sequence,type,trim(text) from svl_statementtext where pid in (select pid from stl_querytext where query = :vpattern);
\qecho --Review query work - STL_PLAN_INFO
select * from STL_PLAN_INFO where query = :vpattern;
\qecho --Review query work - svl_query_report
select * from svl_query_report where query = :vpattern order by segment,step,slice;
\qecho --Review query work - svl_query_summary
select * from svl_query_summary where query = :vpattern order by seg,step;
\qecho -- Review alert
select * from stl_alert_event_log where query = :vpattern;
\qecho -- Review STL_ERROR
select userid,process,recordtime,pid,errcode,trim(file),linenum,trim(context),trim(error) from stl_error where recordtime between (select starttime from stl_query where query = :vpattern) and (select endtime from stl_query where query = :vpattern);
\q


Create DMS Replication From MongoDB 4.2 on EC2 Linux to Redshift

Summary

We will create a hub-to-spoke replication from a MongoDB 4.2 database to a Redshift schema. MongoDB is installed in the same VPC as the Redshift cluster and the DMS replication instance.

Main Text

MongoDB is a NoSQL datastore where data is inserted and read in JSON format. Internally, a MongoDB document is stored as binary JSON (BSON) in a compressed format. Terminology for databases, schemas and tables in MongoDB is a bit different when compared to relational databases. Here is how each MongoDB term maps to its Redshift/PostgreSQL equivalent:

MongoDB        Redshift/PostgreSQL
-----------    -------------------
Database       Schema
Collection     Table
Documents      Records/Rows

MongoDB does not have a proper structure for a schema; it is essentially a document store, so data can be inserted without defining a schema or structure first. This gives MongoDB a lot of flexibility, and hence it is a preferred choice for modern applications, because you can start developing your application without defining a data model first. It is a schema-less approach to software architecture, which has its own pros and cons.

In our example we will work with the default MongoDB database called "admin", create collections (tables) in it, and replicate those tables to an equivalent schema in Redshift called "admin" which will hold the different tables. We will go about this in 3 stages:

Stage 1: Create MongoDB on EC2 Amazon Linux, create the collections, connect to MongoDB from your client machine and check the configuration.

Stage 2: Create the Redshift cluster and Aurora PostgreSQL in private and public subnets, connect to all the database instances and check.

Stage 3: Create the DMS replication instance, DMS replication endpoints and DMS replication tasks. Finally, we will check that all data is being replicated by DMS to the targets.

Architecture: It is a hub-to-spoke architecture with a MongoDB source being replicated to multiple heterogeneous targets. However, in this article we will only configure replication from MongoDB to Redshift.

Stage 1: Create an EC2 Amazon Linux instance in a Public Subnet and Install MongoDB on it

You can use the AWS 'MongoDB Quick Start' guide to deploy MongoDB into a new VPC or into an existing VPC. The guide has two CloudFormation templates which can create a new VPC under your account, configure the public and private subnets, and launch the EC2 instances with the latest version of MongoDB installed. Check this link for the quick deployment options for MongoDB on AWS: https://docs.aws.amazon.com/quickstart/latest/mongodb/step2.html

OR

If you want to install MongoDB manually, follow the procedure below:

1. Create Amazon Linux EC2 Instance

2. Add MongoDB Repo and Install MongoDB on AMZN Linux

$ sudo vi /etc/yum.repos.d/mongodb-org-4.2.repo


[mongodb-org-4.2]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/4.2/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-4.2.asc

$ sudo yum -y install mongodb-org
$ sudo service mongod start
$ sudo cat /var/log/mongodb/mongod.log


3. Create Root login

 $ mongo localhost/admin
 > use admin
 > db.createUser( { user:"root", pwd:"root123", roles: [ { role: "root", db: "admin" } ] } )
 > exit

 
4. Modify mongod configuration file /etc/mongod.conf using vi editor
 
     Change below lines from

    # network interfaces
    net:
      port: 27017
      bindIp: 127.0.0.1  # Listen to local interface only, comment to listen on all interfaces.
    #security:

    To

# network interfaces
net:
  port: 27017
  bindIp: 0.0.0.0  # Listen on all interfaces
security:
  authorization: enabled
 
5. Restart mongod service
 

 $ sudo service mongod restart

Use the admin DB as the authentication database; the 'root' user can be used for the full load and CDC task.
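
To confirm that the login works from a client machine before pointing DMS at it, a quick check (the host name is a placeholder for your EC2 public DNS):

$ mongo --host <ec2-public-dns> --port 27017 -u root -p root123 --authenticationDatabase admin --eval 'db.runCommand({ connectionStatus: 1 })'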

6. For CDC replication, a MongoDB replica set needs to be set up and the configuration needs to be modified as below:

Modify mongod.conf using vi editor

$ sudo vi /etc/mongod.conf


replication:
  replSetName: rs0

$ sudo service mongod restart


7. Initiate Replica Set for CDC

$ mongo localhost/admin -u root -p
> rs.status()


{
    "ok" : 0,
    "errmsg" : "no replset config has been received",
    "code" : 94,
    "codeName" : "NotYetInitialized"
}

> rs.initiate()


{
    "info2" : "no configuration specified. Using a default configuration for the set",
    "me" : "ip-10-0-137-99.ap-southeast-2.compute.internal:27017",
    "ok" : 1
}

rs0:SECONDARY> rs.status()

{
    “set” : “rs0”,
    “date” : ISODate(“2019-12-31T05:02:22.431Z”),
    “myState” : 1,
    “term” : NumberLong(1),
    “syncingTo” : “”,
    “syncSourceHost” : “”,
    “syncSourceId” : -1,
    “heartbeatIntervalMillis” : NumberLong(2000),
    “majorityVoteCount” : 1,
    “writeMajorityCount” : 1,
    “optimes” : {
        “lastCommittedOpTime” : {
            “ts” : Timestamp(1577768528, 5),
            “t” : NumberLong(1)
        },
        “lastCommittedWallTime” : ISODate(“2019-12-31T05:02:08.362Z”),
        “readConcernMajorityOpTime” : {
            “ts” : Timestamp(1577768528, 5),
            “t” : NumberLong(1)
        },
        “readConcernMajorityWallTime” : ISODate(“2019-12-31T05:02:08.362Z”),
        “appliedOpTime” : {
            “ts” : Timestamp(1577768528, 5),
            “t” : NumberLong(1)
        },
        “durableOpTime” : {
            “ts” : Timestamp(1577768528, 5),
            “t” : NumberLong(1)
        },
        “lastAppliedWallTime” : ISODate(“2019-12-31T05:02:08.362Z”),
        “lastDurableWallTime” : ISODate(“2019-12-31T05:02:08.362Z”)
    },
    “lastStableRecoveryTimestamp” : Timestamp(1577768528, 4),
    “lastStableCheckpointTimestamp” : Timestamp(1577768528, 4),
    “electionCandidateMetrics” : {
        “lastElectionReason” : “electionTimeout”,
        “lastElectionDate” : ISODate(“2019-12-31T05:02:07.346Z”),
        “electionTerm” : NumberLong(1),
        “lastCommittedOpTimeAtElection” : {
            “ts” : Timestamp(0, 0),
            “t” : NumberLong(-1)
        },
        “lastSeenOpTimeAtElection” : {
            “ts” : Timestamp(1577768527, 1),
            “t” : NumberLong(-1)
        },
        “numVotesNeeded” : 1,
        “priorityAtElection” : 1,
        “electionTimeoutMillis” : NumberLong(10000),
        “newTermStartDate” : ISODate(“2019-12-31T05:02:08.354Z”),
        “wMajorityWriteAvailabilityDate” : ISODate(“2019-12-31T05:02:08.362Z”)
    },
    “members” : [
        {
            “_id” : 0,
            “name” : “ip-10-0-137-99.ap-southeast-2.compute.internal:27017”,
            “ip” : “10.0.137.99”,
            “health” : 1,
            “state” : 1,
            “stateStr” : “PRIMARY”,
            “uptime” : 80,
            “optime” : {
                “ts” : Timestamp(1577768528, 5),
                “t” : NumberLong(1)
            },
            “optimeDate” : ISODate(“2019-12-31T05:02:08Z”),
            “syncingTo” : “”,
            “syncSourceHost” : “”,
            “syncSourceId” : -1,
            “infoMessage” : “could not find member to sync from”,
            “electionTime” : Timestamp(1577768527, 2),
            “electionDate” : ISODate(“2019-12-31T05:02:07Z”),
            “configVersion” : 1,
            “self” : true,
            “lastHeartbeatMessage” : “”
        }
    ],
    “ok” : 1,
    “$clusterTime” : {
        “clusterTime” : Timestamp(1577768528, 5),
        “signature” : {
            “hash” : BinData(0,”nDISrR4afyRUVEQVntFkkVpTJKY=”),
            “keyId” : NumberLong(“6776464228418060290”)
        }
    },
    “operationTime” : Timestamp(1577768528, 5)
}
rs0:PRIMARY>



8. Make sure the security group on the EC2 instance where MongoDB is running allows ingress from the DMS replication instance's security group on the MongoDB port (default 27017).
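
If you manage security groups from the CLI, a hedged sketch of adding that ingress rule (both group IDs are placeholders):

$ aws ec2 authorize-security-group-ingress \
    --group-id <mongodb-ec2-sg-id> \
    --protocol tcp --port 27017 \
    --source-group <dms-replication-instance-sg-id>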



9. Add a collection (table) with some data to database ‘admin’ in the mongodb installation

> show collections
db.createCollection("accounts", { capped : true, autoIndexId : true, size : 
   6142800, max : 10000 } )
db.accounts.insert({"company": "Booth-Wade",
    "location": "5681 Mitchell Heights\nFort Adamstad, UT 8019B",
    "ip_address": "192.168.110.4B",
    "name": "Mark Becker",
    "eid": 27561})
   
db.accounts.insert({"company": "Myers,  Smith and Turner",
      "location": "USS BenjaminNlinFP0 AA 40236",
      "ip_address": "172.26.254.156",
      "name": "Tyler clark",
      "eid": 87662})

db.accounts.insert({"company": "Bowen-Harris",
      "location": "Tracey Plaza East Katietown,Sc74695",
      "ip_address": "172.28.45.209",
      "name": "Veronica Gomez",
      "eid": 772122})
     
> db.accounts.find( {} )

-- Insert Many --

      db.accounts.insertMany(
      [
    {"company": "Booth-Wade",
      "location": "5681 Mitchell Heights\nFort Adamstad, UT 8019B",
      "ip_address": "192.168.110.4B",
      "name": "Mark Becker",
      "eid": 27561},
    {"company": "Myers,  Smith and Turner",
      "location": "USS BenjaminNlinFP0 AA 40236",
      "ip_address": "172.26.254.156",
      "name": "Tyler clark",
      "eid": 87662},
    {"company": "Bowen-Harris",
      "location": "Tracey Plaza East Katietown,Sc74695",
      "ip_address": "172.28.45.209",
      "name": "Veronica Gomez",
      "eid": 772122}
      ]);
     

-- Array Insert --

  db.accounts.insertMany( [
{
  "_id": "5e037719f45Btodlcdb492464",
  "accounts": [
    {
      "company": "Booth-Wade",
      "location": "5681 Mitchell Heights\nFort Adamstad, UT 8019B",
      "ip_address": "192.168.110.4B",
      "name": "Mark Becker",
      "eid": 27561
    },
    {
      "company": "Myers,  Smith and Turner",
      "location": "USS BenjaminNlinFP0 AA 40236",
      "ip_address": "172.26.254.156",
      "name": "Tyler clark",
      "eid": 87662
    },
    {
      "company": "Bowen-Harris",
      "location": "Tracey Plaza East Katietown,Sc74695",
      "ip_address": "172.28.45.209",
      "name": "Veronica Gomez",
      "eid": 772122
    }
  ]
}
]);

-- Inventory Nested Array Collection --
db.createCollection("inventory", { capped : true, autoIndexId : true, size :
   6142800, max : 10000 } )

   db.createCollection("inventory_new", { capped : true, autoIndexId : true, size :
   6142800, max : 10000 } )

db.inventory.insertMany( [
   { item: "journal", instock: [ { warehouse: "A", qty: 5 }, { warehouse: "C", qty: 15 } ] },
   { item: "notebook", instock: [ { warehouse: "C", qty: 5 } ] },
   { item: "paper", instock: [ { warehouse: "A", qty: 60 }, { warehouse: "B", qty: 15 } ] },
   { item: "planner", instock: [ { warehouse: "A", qty: 40 }, { warehouse: "B", qty: 5 } ] },
   { item: "postcard", instock: [ { warehouse: "B", qty: 15 }, { warehouse: "C", qty: 35 } ] }
]);


db.inventory_new.insertMany([
   { item: "journal", qty: 25, tags: ["blank", "red"], size: { h: 14, w: 21, uom: "cm" } },
   { item: "mat", qty: 85, tags: ["gray"], size: { h: 27.9, w: 35.5, uom: "cm" } },
   { item: "mousepad", qty: 25, tags: ["gel", "blue"], size: { h: 19, w: 22.85, uom: "cm" } }
])

> db.inventory.find( {} )

> show collections

accounts
inventory
inventory_new
system.keys
system.users
system.version

We have created 3 user collections called 'accounts', 'inventory' and 'inventory_new'. These 3 collections (tables) will be replicated to our targets. Connect and check from MongoDB Compass on your client machine.

Stage 2: Install Redshift Cluster

Create a VPC with public and private subnets. In a real-world production scenario, it is always recommended to put your databases in a private subnet.

1. Create Public and Private Subnet

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-public-private-vpc.html#run-VPC-wizard

2. Install Redshift in Public Subnet

3. Install Redshift in Private Subnet
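
The cluster can also be created from the CLI; a hedged sketch for the private-subnet cluster (subnet group, security group and password are placeholders, node type and count are just examples):

$ aws redshift create-cluster \
    --cluster-identifier redshift-private-2a \
    --node-type dc2.large --number-of-nodes 2 \
    --master-username awsuser --master-user-password '<password>' \
    --db-name testdb \
    --cluster-subnet-group-name <private-subnet-group> \
    --vpc-security-group-ids <redshift-sg-id> \
    --no-publicly-accessible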

If you have a different scenario, such as DMS replication in one VPC and databases in another VPC, or replication from on-premises to an AWS VPC, refer to this link: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReplicationInstance.VPC.html

Stage 3: Create DMS Replication Instance, DMS Replication Endpoints & DMS Replication Tasks for MongoDB

Steps : Create Replication Instance > Create Endpoints > Create DMS Tasks

1. Create Replication instance

Go to AWS Console > Database Migration Service  > Replication Instance > Create Replication Instance
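
The same can be done from the awscli; a hedged sketch (instance class, storage and subnet group are placeholders/examples; the security group is the one referenced later in this article):

$ aws dms create-replication-instance \
    --replication-instance-identifier mongo-to-redshift-ri \
    --replication-instance-class dms.t3.medium \
    --allocated-storage 50 \
    --replication-subnet-group-identifier <dms-subnet-group> \
    --vpc-security-group-ids sg-0a695ef98b6e39963 \
    --no-publicly-accessible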

2. Create Replication Endpoints

a) MongoDB Replication Endpoint

Go to DMS Console > Endpoints > Create Endpoint. Use this link for the configuration of your endpoint: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MongoDB.html

With MongoDB as a source you have 2 modes available: document mode and table mode. Some important points to note in this regard are:

  • A record in MongoDB is a document, which is a data structure composed of field and value pairs. The value of a field can include other documents, arrays, and arrays of documents. A document is roughly equivalent to a row in a relational database table.
  • A collection in MongoDB is a group of documents, and is roughly equivalent to a relational database table.
  • Internally, a MongoDB document is stored as a binary JSON (BSON) file in a compressed format that includes a type for each field in the document. Each document has a unique ID.

MongoDB is officially supported as a database source only on versions 2.6.x and 3.x. I have tested it with MongoDB 4.2, which is the latest community version, and it works without any issues; however, I would advise sticking with the officially certified versions. AWS DMS supports two migration modes when using MongoDB as a source. You specify the migration mode using the Metadata mode parameter in the AWS Management Console, or the extra connection attribute nestingLevel when you create the MongoDB endpoint.

Document mode

In document mode, the MongoDB document is migrated as is, meaning that the document data is consolidated into a single column named _doc in a target table.

Table mode

In table mode, AWS DMS transforms each top-level field in a MongoDB document into a column in the target table. If a field is nested, AWS DMS flattens the nested values into a single column. AWS DMS then adds a key field and data types to the target table’s column set.

Connection Attributes

nestingLevel
  Values: NONE | ONE
  Description: Specify NONE to use document mode. Specify ONE to use table mode.

extractDocID
  Values: true | false (default false)
  Description: Use this attribute when nestingLevel is set to NONE.
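
The source endpoint can also be created from the CLI; a hedged sketch using the connection attributes above for document mode (the server name is a placeholder, credentials match the earlier MongoDB setup):

$ aws dms create-endpoint \
    --endpoint-identifier mongodb-source \
    --endpoint-type source --engine-name mongodb \
    --server-name <mongodb-ec2-private-dns> --port 27017 \
    --username root --password root123 --database-name admin \
    --extra-connection-attributes "nestingLevel=NONE;extractDocID=false"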

Test the Endpoint

b) Create Redshift Replication Endpoint

Test Redshift Endpoint

Once you create the endpoint for Redshift, it automatically adds a DMS endpoint role (dms-access-for-endpoint) and assigns it to the Redshift endpoint. Further down, when we create S3 as a target endpoint, we will need to add the S3 permissions via a managed policy to this same role.

dms-access-for-endpoint : arn:aws:iam::775867435088:role/dms-access-for-endpoint

c) Create MongoDB-Redshift Database Migration Task

Go to DMS Console > Conversion & Migration > Database Migrations Tasks > Create Task

Before moving ahead, check that the security group of Redshift allows ingress on port 5439 from 0.0.0.0/0, or preferably that the security group ID of your replication instance is added to the ingress rules of the Redshift security group on port 5439. Check this link for more information: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Redshift.html

In our case the DMS replication instance security group ID is 'sg-0a695ef98b6e39963', so the Redshift security group looks like below:

Refer to this documentation for more complex VPC setups; they are beyond the scope of this article: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReplicationInstance.VPC.html
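
The task itself can also be created from the CLI; a hedged sketch (the three ARNs are placeholders, and the table mapping simply includes every collection in the 'admin' database):

$ aws dms create-replication-task \
    --replication-task-identifier mongodb-to-redshift \
    --source-endpoint-arn <mongodb-endpoint-arn> \
    --target-endpoint-arn <redshift-endpoint-arn> \
    --replication-instance-arn <replication-instance-arn> \
    --migration-type full-load-and-cdc \
    --table-mappings '{"rules":[{"rule-type":"selection","rule-id":"1","rule-name":"1","object-locator":{"schema-name":"admin","table-name":"%"},"rule-action":"include"}]}'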

Checking from Redshift, we can see all 3 tables from MongoDB ('accounts', 'inventory' and 'inventory_new') are created, and the schema 'admin' is automatically created by DMS.

Query all the tables to confirm data is replicated

testdb=# select * from admin.accounts;

            _id             |   array_accounts
——————————————————————————————————-
 5e037719f45Btodlcdb492464  | [ { “company” : “Booth-Wade”, “location” : “5681 Mitchell Heights\nFort Adamstad, UT 8019B”, “ip
_address” : “192.168.110.4B”, “name” : “Mark Becker”, “eid” : 27561.0 }, { “company” : “Myers,  Smith and Turner”, “location”
: “USS BenjaminNlinFP0 AA 40236”, “ip_address” : “172.26.254.156”, “name” : “Tyler clark”, “eid” : 87662.0 }, { “company” : “B
owen-Harris”, “location” : “Tracey Plaza East Katietown,Sc74695”, “ip_address” : “172.28.45.209”, “name” : “Veronica Gomez”, “
eid” : 772122.0 } ]
 9772sjs19f45Btodlcdbk49fk4 | [ { “company” : “Trust Co”, “location” : “Zetland Inc.”, “ip_address” : “12.168.210.2B”, “name”
: “Mert Cliff”, “eid” : 4343.0 }, { “company” : “Mist Ltd.”, “location” : “Cliffstone yard”, “ip_address” : “72.32.254.156”, “
name” : “Kris Loff”, “eid” : 76343.0 }, { “company” : “Coles Supermarket”, “location” : “Randwich St”, “ip_address” : “22.28.4
5.110″, “name” : “Will Markbaeur”, “eid” : 13455.0 } ]
(2 rows)

testdb=# select * from admin.inventory;


         oid__id          |   item   |                                array_instock
————————–+———-+——————————————————————————
 5e0bd854fd4602c4b6926d68 | journal  | [ { “warehouse” : “A”, “qty” : 5.0 }, { “warehouse” : “C”, “qty” : 15.0 } ]
 5e0bd854fd4602c4b6926d69 | notebook | [ { “warehouse” : “C”, “qty” : 5.0 } ]
 5e0bd854fd4602c4b6926d6a | paper    | [ { “warehouse” : “A”, “qty” : 60.0 }, { “warehouse” : “B”, “qty” : 15.0 } ]
 5e0bd854fd4602c4b6926d6b | planner  | [ { “warehouse” : “A”, “qty” : 40.0 }, { “warehouse” : “B”, “qty” : 5.0 } ]
 5e0bd854fd4602c4b6926d6c | postcard | [ { “warehouse” : “B”, “qty” : 15.0 }, { “warehouse” : “C”, “qty” : 35.0 } ]
(5 rows)


testdb=# select * from admin.inventory_new;


         oid__id          |   item   | qty |     array_tags     | size.h | size.w | size.uom
————————–+———-+—–+——————–+——–+——–+———-
 5e0bef1775d0b39f2ef66923 | journal  |  25 | [ “blank”, “red” ] |     14 |     21 | cm
 5e0bef1775d0b39f2ef66924 | mat      |  85 | [ “gray” ]         |   27.9 |   35.5 | cm
 5e0bef1775d0b39f2ef66925 | mousepad |  25 | [ “gel”, “blue” ]  |     19 |  22.85 | cm
(3 rows)

How to Create an Immutable Account Ledger on Amazon QLDB

Amazon QLDB is a new specialist database technology which applies the concept of a blockchain to a database: it maintains a history of every transaction using an immutable and cryptographically verifiable hash for each record. QLDB uses an immutable transactional log, known as a journal, for data storage. The journal tracks every change to your data and maintains a complete and verifiable history of changes over time.

Sourcing data from the journal, QLDB uses a cryptographic hash function (SHA-256) with a Merkle tree–based model to generate a secure output file of your ledger’s full hash chain. This output file is known as a digest and acts as a fingerprint of your data’s entire change history as of a point in time. It enables you to look back and validate the integrity of your data revisions relative to that fingerprint.

Each ledger in QLDB has exactly one journal. A journal can have multiple strands, which are partitions of the journal.

QLDB is not a decentralized blockchain network like Bitcoin or Ethereum; if you are looking for a managed blockchain network, then what you are after is Amazon Managed Blockchain.

/* QLDB is now Generally Available in many regions. Ensure you update your awscli to the latest version to have access to QLDB from awscli. */

$ aws --version
aws-cli/1.16.240 Python/3.7.4 Darwin/17.7.0 botocore/1.12.230

$ aws qldb help
NAME
       qldb –

DESCRIPTION
       The control plane for Amazon QLDB

AVAILABLE COMMANDS
       o create-ledger

       o delete-ledger

       o describe-journal-s3-export

       o describe-ledger

       o export-journal-to-s3

       o get-block

       o get-digest

       o get-revision

       o help

Head over to your AWS Console > Amazon QLDB

  1. Create New Ledger called ‘ctransactions’
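
The ledger can also be created from the awscli shown above; a minimal sketch (ALLOW_ALL is the simplest permissions mode and not what you would use in production):

$ aws qldb create-ledger --name ctransactions --permissions-mode ALLOW_ALL
$ aws qldb describe-ledger --name ctransactions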


2. Go to Query Editor > Create Tables for the Account Ledger

CREATE TABLE Cardholder

CREATE TABLE Log_Record

CREATE TABLE Account



3. Create Indexes to help speed up queries against each table

CREATE INDEX ON Cardholder (CARD_NO)

CREATE INDEX ON Log_Record (TRXN_ID)

CREATE INDEX ON Account (ACCT_ID)



4. Insert 1 record into the ledger table 'Account'. QLDB documents are stored in the Amazon Ion format, which is a superset of JSON.


INSERT INTO Account
 {
'FirstName' : 'Shadab',
'LastName' : 'Mohammad',
'ACCT_ID' : 'AMZN10000001',
'ACCTYPE' : 'Current',
'Address' : '2000, 2 Park Street, CBD'
}

documentId
   “7oOLmXPyHjXBSBCio76rJo”

This is the document ID for this transaction
   

5. Create more records and change the first record from ‘Current’ to ‘Saving’ account

Important :
Because QLDB does not enforce schema, you can insert the same document into a table
multiple times. Each insert statement commits a separate document entry to the journal, and
QLDB assigns each document a unique ID.

INSERT INTO Account
 << {
'FirstName' : 'Shadab',
'LastName' : 'Mohammad',
'ACCT_ID' : 'AMZN10000001',
'ACCTYPE' : 'Saving',
'Address' : '2000, 2 Park Street, CBD'
},
 {
'FirstName' : 'William',
'LastName' : 'Tell',
'ACCT_ID' : 'AMZN10000002',
'ACCTYPE' : 'Saving',
'Address' : '2013, 29 Campbell PDE, Bondi'
},
 {
'FirstName' : 'Chris',
'LastName' : 'Wall',
'ACCT_ID' : 'AMZN10000003',
'ACCTYPE' : 'Current',
'Address' : '2142, 15 Lane Cove, Marickville'
} >>

documentId
    “L7S9f2cONhz1JQhj8Gj8bk”
    “0007IXW5xSNDvq9gj3CZN8”
    “JELcBIyjv8uKRHBfhbCvHp”


6. Now let us check all the changes made to this account using the query below. We can check every modification made to the Account ID; each modification or insert generated a unique document ID which can be traced back to see what that modification was.

SELECT
    ACCT_ID FROM Account AS a BY ACCT_ID WHERE a.ACCT_ID = 'AMZN10000001'


ACCT_ID
    “7oOLmXPyHjXBSBCio76rJo”
    “L7S9f2cONhz1JQhj8Gj8bk”
    
As you can see there are 2 modifications to this record, and we can verify what each of those modifications was.
    

7. Check the history for each documentId of the Account
    

SELECT * FROM history(Account) AS h WHERE h.metadata.id
    = '7oOLmXPyHjXBSBCio76rJo'


{
  blockAddress: {
    strandId: “KQrxUDo7jDX4scpbKkhEVZ”,
    sequenceNo: 26
  },
  hash: {{6HVuwag6GMEki+ZRFaMdqidROpKZmsTsYcmWiAMZdmw=}},
  data: {
    FirstName: “Shadab”,
    LastName: “Mohammad”,
    ACCT_ID: “AMZN10000001”,
    ACCTYPE: “Current”,
    Address: “2000, 2 Park Street, CBD”
  },
  metadata: {
    id: “7oOLmXPyHjXBSBCio76rJo”,
    version: 0,
    txTime: 2019-09-19T05:56:20.874Z,
    txId: “2EOkrgLG2Xa7qBwEwnKzDJ”
  }
}

SELECT * FROM
    history(Account) AS h WHERE h.metadata.id = 'L7S9f2cONhz1JQhj8Gj8bk'


{
  blockAddress: {
    strandId: “KQrxUDo7jDX4scpbKkhEVZ”,
    sequenceNo: 41
  },
  hash: {{+qFvFfRx1pi9QOkotc9MAuAWs6Tw7jn3sZqlPpEjSDA=}},
  data: {
    FirstName: “Shadab”,
    LastName: “Mohammad”,
    ACCT_ID: “AMZN10000001”,
    ACCTYPE: “Saving”,
    Address: “2000, 2 Park Street, CBD”
  },
  metadata: {
    id: “L7S9f2cONhz1JQhj8Gj8bk”,
    version: 0,
    txTime: 2019-09-19T06:14:04.214Z,
    txId: “2EOkrgMRCo4CBTUdJWSjmT”
  }
}

Using the first revision we can see the account type was "Current"; after we modified the document and checked the second revision, it shows "Saving". We have a history of all modifications applied to that account, which we can traverse back through to check every change made to that Account ID.

8. To Verify a Document Revision

SELECT r.metadata.id, r.blockAddress FROM _ql_committed_Account AS r WHERE r.data.ACCT_ID = 'AMZN10000001'

id                                                   blockAddress
“7oOLmXPyHjXBSBCio76rJo”    {strandId:”KQrxUDo7jDX4scpbKkhEVZ”,sequenceNo:95}

“L7S9f2cONhz1JQhj8Gj8bk”        {strandId:”KQrxUDo7jDX4scpbKkhEVZ”,sequenceNo:95}
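
To tie these revisions back to the ledger's cryptographic proof, you can pull the current digest from the awscli (get-digest appears in the help output earlier); verifying an individual revision additionally needs the block address and document ID returned above:

$ aws qldb get-digest --name ctransactions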


This is just the beginning of what is possible in the FinTech world: no more complex audit reports and expensive third-party tools which are an additional overhead to manage; the database itself keeps a history of any change made to a transaction, account or debit/credit card.

References:

[1] Accessing Amazon QLDB Using the AWS CLI – https://docs.aws.amazon.com/qldb/latest/developerguide/Tools.CLI.html
[2] botocore – https://github.com/boto/botocore/commit/f646b7ea334c9dcf94af6ad201a45886c48c2648
[3] aws-cli – https://github.com/aws/aws-cli/commit/b3c7bd5a1b93d42ab1b75d99c33fc2b62b0dd6a4
[4] Installing the AWS CLI – Installing the AWS CLI Using pip – https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html#install-tool-pip

Build and store a Hive metastore outside an EMR cluster in an RDS MySQL database and connect a Redshift cluster to an EMR cluster

This document addresses the specific configuration points that need to be in place in order to build and store a Hive metastore outside an EMR cluster, in an RDS MySQL database. It also covers the steps to connect a Redshift cluster to an EMR cluster so Redshift can create and access the tables stored within the external metastore.

Resources Used:

• Redshift Cluster

• RDS MySQL Instance

• EMR Cluster

Note: All resources must be in same VPC and same region for this practice.

Creating the RDS MySQL:

1 – First, start creating an RDS MySQL instance if you don't have one already. Open the AWS RDS Console and create a MySQL instance that will be used during this practice.

Note: Please make note of RDS security group, endpoint, Master User and Master Password. We will need that information later on.

2 – Once the RDS MySQL instance is created, modify its security groups to add a rule for All traffic on all Port Range to be allowed from the VPC’s default security group.

Note: This VPC’s default Security Group will be used while creating the EMR cluster later on as well but it needs to be whitelisted beforehand otherwise the EMR launching will fail while trying to reach out to the RDS MySQL.

Before creating the EMR Cluster:

3 – After creating the RDS MySQL (and opening its security group to EMR), but right before creating the EMR cluster, a JSON configuration file needs to be created. This file will be ingested by EMR during the bootstrapping phase of the EMR creation; it basically tells EMR how to access the remote RDS MySQL database.

4 – Copy the JSON property structure from the following link (use Copy icon): https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-external.html

5 – Paste it in a text editor and modify it carefully with the RDS details you noted earlier.

Note: Be careful, the value property can not contain any spaces or carriage returns. It should appear all on one line. Save it as “hiveConfiguration.json”.

6 – The final JSON configuration file should look like the following:

[
    {
      "Classification": "hive-site",
      "Properties": {
        "javax.jdo.option.ConnectionURL": "jdbc:mysql://database-1.cefjr3enh3dk.us-east-2.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist=true",
        "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": "admin",
        "javax.jdo.option.ConnectionPassword": "*********"
      }
    }
]

Note 1: replace <hostname>, <username>, <password> with your own details:

Note 2: The part “hive?createDatabaseIfNotExist=true” determines the name of the database to be created in the MySQL RDS, in this case the database will be called “hive”.

7 – After creating the above file, upload it to an S3 bucket/folder of your choice (in the same region as your resources).
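
For example, from the CLI (bucket and folder are placeholders):

aws s3 cp hiveConfiguration.json s3://<your_bucket>/<your_folder>/hiveConfiguration.json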

Creating the EMR:

8 – Now, it is time to create the EMR cluster. To do this, open the AWS EMR console and click the Create Cluster button. This opens the Quick Options page, but we won't be using that. Click Go to advanced options at the top of the page.

9 – This will send you to the Advanced Options page. There, under Software Configuration, select the following Applications:

Hadoop, Ganglia, Hive, Hue, Tez, Pig, Mahout

10 – On the same page, under the Edit Software Settings section, click Load JSON from S3 and select the S3 bucket/path where you uploaded the previously created file "hiveConfiguration.json". Select the file there and hit Select.

11 – In the Hardware Configuration page, make sure that the EMR cluster is in the same VPC as your MySQL RDS instance. Hit Next if you don’t want to change any Network configuration or Node types.

12 – Hit Next in the General Options page if you don’t want to change anything, although you might want to change the name of your EMR cluster here.

13 – In the next page, Security Options, make sure you have an EC2 Key Pair in that region and select it. Otherwise, create one!

Note: Create one now (if you don’t have one) before creating the EMR as you CAN’T add it later!!!

14 – Still in the Security Options page, expand the EC2 security groups panel and change both, Master and Core & Task instances to use the VPC’s default security group (the same whitelisted in the RDS MySQL security group earlier).

15 – Hit Create cluster and wait for the EMR cluster to be created. It will take some time…
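
If you prefer to script the cluster creation, a hedged CLI sketch of the same setup (release label, instance type/count, key pair and subnet are placeholders or examples):

aws emr create-cluster \
    --name "emr-external-hive-metastore" \
    --release-label emr-5.29.0 \
    --applications Name=Hadoop Name=Ganglia Name=Hive Name=Hue Name=Tez Name=Pig Name=Mahout \
    --configurations file://hiveConfiguration.json \
    --instance-type m5.xlarge --instance-count 3 \
    --ec2-attributes KeyName=<your-key-pair>,SubnetId=<your-subnet-id> \
    --use-default-roles \
    --region us-east-2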

Confirming that the metastore was created in the RDS MySQL

16 – Once the EMR is created, another rule needs to be added to the VPC’s default security group, one that allows SSHing into the EMR cluster on port 22 from your local IP. It should look like the following:

17 – With the right rules in place, try to connect to your EMR cluster from your local machine:

$ chmod 600 article_key.pem
$ ssh -i article_key.pem hadoop@ec2-18-XX-XX-XX.us-east-2.compute.amazonaws.com

18 – EMR has a MySQL client installed; use this client to connect to your MySQL database and perform a few tests, such as whether the security groups are working properly and whether the "hive" database was created properly.

Note: You can do a telnet test from within EMR box as well to test Security Group access.

19 – To connect to the RDS MySQL, run the following command from your EMR box:

 mysql -h <rds-endpoint> -P 3306 -u <rds master user> -p<rds master password>

Example: mysql -h database-1.cefjr3enh3dk.us-east-2.rds.amazonaws.com -P 3306 -u admin123 -pPwD12345

20 – Once connected, use the following commands to verify if the Hive metastore was indeed created in the RDS. You should be able to see a database named “hive” there:

show databases;       → Lists all databases – "hive" should be there
use hive;             → Connects you to "hive" database
show tables;          → Lists all the meta tables within hive database
select * from TBLS;   → Lists all tables created in hive. At this point there's none

Setting up necessary Spectrum Roles and Network requirements for Redshift and EMR

Note 1: Following steps assume that you already have a Redshift cluster and that you can connect to it. It will not guide you on how to create and access the Redshift cluster. 

Note 2: Since EMR and RDS MySQL share the same VPC default security group, they should already be able to communicate with each other. If that's the case, you can skip Step 22 and go straight to Step 23; otherwise, if EMR and Redshift use different security groups, do Step 22 first.

21 – Create a Role for Spectrum and attach it to your Redshift cluster. Follow the instructions here:

•       To Create the Role: https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum-create-role.html

•       To Associate the Role: https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum-add-role.html

22 – (Optional) Now that Redshift can access S3, Redshift also needs to access EMR cluster and vice-versa. Follow the steps listed under section “Enabling Your Amazon Redshift Cluster to Access Your Amazon EMR Cluster” in the following link: https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-schemas.html#c-spectrum-enabling-emr-access

Note: In summary, this creates an EC2 security group with Redshift’s Security Group and the EMR’s master node’s security groups inside it. Redshift’s Security Group must allow TCP in every port (0 – 65535) while EMR’s Security Group must allow TCP in port 9083 (Hive’s default). Next, you attach this newly created security group to both of your Redshift and EMR clusters.
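
If you script these rules, a hedged sketch of one possible reading of the note above (all group IDs are placeholders; the shared group is the newly created one attached to both clusters):

aws ec2 authorize-security-group-ingress \
    --group-id <new-shared-sg-id> --protocol tcp --port 0-65535 \
    --source-group <redshift-sg-id>

aws ec2 authorize-security-group-ingress \
    --group-id <new-shared-sg-id> --protocol tcp --port 9083 \
    --source-group <emr-master-sg-id>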

23 – Once this is done, you should now be able to create the external schema in Redshift, query the external tables from Redshift, and also create/see the schemas/tables from EMR Hive. However, at this point there are no tables created yet.

Creating Tables on Hive First

24 – Log to Hive console and run the following:

> show databases;
default  (that’s the only database so far)
 
> create external table hive_table (col1 int, col2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
location 's3://<your_bucket>/<your_folder>/';

> show tables;
hive_table  (that’s the table we just created) 

25 – Log back to your MySQL database and run the following commands:

show databases;       → Lists all databases – "hive" should be there
use hive;             → Connects you to "hive" database
show tables;          → Lists all the meta tables within hive database
select * from TBLS;   → Lists all tables created in hive, "hive_table" is there!

Note: Now you will be able to see the newly created table “hive_table” showing on your External MySQL catalog.

Creating Schemas and Tables on Redshift Now

26 – On Redshift side, an External Schema must be created first before creating or querying the Hive tables, like following:

CREATE EXTERNAL SCHEMA emr_play                     → It can be any name; this schema is valid only within Redshift.
FROM HIVE METASTORE DATABASE 'default'              → Use the default database to match the database we have in Hive.
URI '172.XXX.XXX.XXX' PORT 9083                     → EMR's private IP of the master instance. Hive's default port is 9083.
IAM_ROLE 'arn:aws:iam::000000000000:role/spectrum'; → A valid Spectrum role attached to Redshift.

27 – Create the table(s):

create external table emr_play.redshift_table (col1 int, col2 varchar)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
location 's3://<your_bucket>/<your_folder>/';

28 – Simply query the table now:

select * from emr_play.redshift_table;

29 – One more time, log back to your MySQL database and run the following commands again:

show databases;       → Lists all databases – "hive" should be there
use hive;             → Connects you to "hive" database
show tables;          → Lists all the meta tables within hive database
select * from TBLS;   → Lists all tables created in hive, both tables are there!

 Note: You should now be able to see both the Hive and Redshift tables showing in your external MySQL catalog. You can also query the tables and create new tables on both the Hive and Redshift side.

Deploy Oracle 18c Enterprise Edition or XE on Docker

  1. Download Oracle 18c XE from Oracle's website:
    https://www.oracle.com/database/technologies/xe-downloads.html
  2. Create Oracle Docker Project Directory

$ mkdir -p ~/oracle_docker

$ cd oracle_docker

$ git clone https://github.com/oracle/docker-images.git

$ cd docker-images

$ pwd
/Users/shadmha/oracle_docker/docker-images

$ cd ./OracleDatabase/SingleInstance/dockerfiles

  3. Copy the RPM downloaded for Oracle 18c from the Oracle website to the 18.4.0 folder

$ cp ~/Downloads/oracle-database-xe-18c-1.0-1.x86_64.rpm ./18.4.0

$ pwd
/Users/shadmha/oracle_docker/docker-images/OracleDatabase/SingleInstance/dockerfiles

  4. Build the Oracle 18c XE Docker Image

$ sh buildDockerImage.sh -x -v 18.4.0

Checking Docker version.

DOCKER info:
Client:
Debug Mode: false
….
Successfully tagged oracle/database:18.4.0-xe

Oracle Database Docker Image for ‘xe’ version 18.4.0 is ready to be extended:

--> oracle/database:18.4.0-xe

Build completed in 752 seconds.

You will now have a Docker image named oracle/database:18.4.0-xe which you can deploy to Docker as a container. The Oracle 18c XE image can be up to 10 GB, so ensure you have the requisite space.

  5. Create a docker-compose.yml file and deploy the container
version: '3'
services:
  oracle:
    image: oracle/database:18.4.0-xe
    environment:
      - ORACLE_PWD=testing12345
    ports:
      - "11521:1521"

$ docker-compose up

Creating network “dockerfiles_default” with the default driver
Creating dockerfiles_oracle_1 … done
Attaching to dockerfiles_oracle_1

oracle_1 | Use https://localhost:5500/em to access Oracle Enterprise Manager for Oracle Database XE
oracle_1 | The Oracle base remains unchanged with value /opt/oracle
oracle_1 | #########################
oracle_1 | DATABASE IS READY TO USE!
oracle_1 | #########################
oracle_1 | The following output is now a tail of the alert.log:
oracle_1 | Pluggable database XEPDB1 opened read write
oracle_1 | Completed: alter pluggable database XEPDB1 open

The process can take up to 15-20 minutes for the container to start.

  6. Connect to your Oracle XE Docker DB using SQL Developer

Username : sys
Password : testing12345
Role : SYSDBA

Hostname : localhost
Port : 11521
Service Name : XE

== Connect to the Pluggable DB part of Oracle XE ==

Username : sys
Password : testing12345
Role : SYSDBA

Hostname : localhost
Port : 11521
Service Name : XEPDB1

=================================================================================================

Docker Procedure for Enterprise Edition Oracle. The only changes are in Steps 3, 4, 5 and 6

=================================================================================================

  1. Download and Build the Oracle 18c EE Docker Image
    == Enterprise Edition Oracle Database 18c 18.3.0 ==
    Download the Enterprise Edition Oracle Database binary from the Oracle website, e.g. 18.3 for Linux: https://www.oracle.com/database/technologies/oracle18c-linux-180000-downloads.html

$ cp -p ~/Downloads/LINUX.X64_180000_db_home.zip ~/oracle_docker/docker-images/OracleDatabase/SingleInstance/dockerfiles/18.3.0/

$ cd ..

$ sh buildDockerImage.sh -e -v 18.3.0

-e Flag to install Enterprise Edition

Checking Docker version.

DOCKER info:
….
Successfully built fda25b164cd4
Successfully tagged oracle/database:18.3.0-ee

Oracle Database Docker Image for ‘ee’ version 18.3.0 is ready to be extended:
–> oracle/database:18.3.0-ee
Build completed in 2265 seconds.

  2. Now that the image build has completed, it is time to deploy the Enterprise Edition Oracle DB container

$ pwd
/Users/shadmha/oracle_docker/docker-images/OracleDatabase/SingleInstance/dockerfiles

$ mkdir oracle_ee_18c
$ cd oracle_ee_18c
$ vim docker-compose.yml

version: '3'
services:
  oracle:
    image: oracle/database:18.3.0-ee
    environment:
      - ORACLE_PWD=testing12345
    ports:
      - "11522:1521"

$ docker-compose up

Creating network “oracle_ee_18c_default” with the default driver
Creating oracle_ee_18c_oracle_1 … done
Attaching to oracle_ee_18c_oracle_1
oracle_1 | ORACLE PASSWORD FOR SYS, SYSTEM AND PDBADMIN: testing12345

Database creation complete. For details check the logfiles at:
oracle_1 | /opt/oracle/cfgtoollogs/dbca/ORCLCDB.
oracle_1 | Database Information:
oracle_1 | Global Database Name:ORCLCDB
oracle_1 | System Identifier(SID):ORCLCDB
oracle_1 | #########################
oracle_1 | DATABASE IS READY TO USE!
oracle_1 | #########################
oracle_1 | The following output is now a tail of the alert.log:
oracle_1 | ORCLPDB1(3):CREATE SMALLFILE TABLESPACE “USERS” LOGGING DATAFILE ‘/opt/oracle/oradata/ORCLCDB/ORCLPDB1/users01.dbf’ SIZE 5M REUSE AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO
oracle_1 | ORCLPDB1(3):Completed: CREATE SMALLFILE TABLESPACE “USERS” LOGGING DATAFILE ‘/opt/oracle/oradata/ORCLCDB/ORCLPDB1/users01.dbf’ SIZE 5M REUSE AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO
oracle_1 | ORCLPDB1(3):ALTER DATABASE DEFAULT TABLESPACE “USERS”
oracle_1 | ORCLPDB1(3):Completed: ALTER DATABASE DEFAULT TABLESPACE “USERS”
oracle_1 | 2020-02-16T04:47:06.581223+00:00
oracle_1 | ALTER SYSTEM SET control_files=’/opt/oracle/oradata/ORCLCDB/control01.ctl’ SCOPE=SPFILE;
oracle_1 | 2020-02-16T04:47:06.789543+00:00
oracle_1 | ALTER SYSTEM SET local_listener=” SCOPE=BOTH;
oracle_1 | ALTER PLUGGABLE DATABASE ORCLPDB1 SAVE STATE
oracle_1 | Completed: ALTER PLUGGABLE DATABASE ORCLPDB1 SAVE STATE

  3. Connect with SQL Developer

Username : sys
Password : testing12345
Role : SYSDBA

Hostname : localhost
Port : 11522
Service Name : ORCLCDB

== Connect to the Pluggable DB part of Oracle 18c EE ==

Username : sys
Password : testing12345
Role : SYSDBA

Hostname : localhost
Port : 11522
Service Name : ORCLPDB1

=================================================================================================

=================================================================================================

Deploy WordPress with MySQL on Docker in less than 3 minutes

First Make a WordPress Docker Project Directory

$ mkdir -p /u01/my_wordpress

$ cd /u01/my_wordpress

$ vim docker-compose.yml

version: '3.3'

services:
  db:
    image: mysql:5.7
    volumes:
      - db_data:/var/lib/mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: somewordpress
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: wordpress

  wordpress:
    depends_on:
      - db
    image: wordpress:latest
    ports:
      - "8000:80"
    restart: always
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: wordpress
      WORDPRESS_DB_NAME: wordpress

volumes:
  db_data: {}

  • The Docker volume db_data persists any updates made by WordPress to the database.
  • WordPress Multisite works only on ports 80 and 443.

Now, run docker-compose up -d from your project directory. If docker-compose is not available, install it using yum:

$ sudo yum install docker-compose

$ docker-compose up -d

Creating network “mywordpress_default” with the default driver
Creating volume “mywordpress_db_data” with default driver
Pulling db (mysql:5.7)…

Go to your browser at http://ip-address-local-machine:8000/ and start the WordPress setup.

Deploy MySQL Server on Docker in less than 3 minutes

mkdir -p $HOME/docker/persistent_volumes/mysql

docker pull mysql/mysql-server:latest

docker run --name mysql-server-docker -d -p 3306:3306 -v $HOME/docker/persistent_volumes/mysql:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=mypassword mysql/mysql-server:latest --default_authentication_plugin=mysql_native_password

docker logs mysql-server-docker

docker exec -it mysql-server-docker mysql -uroot -p

docker container list

docker stop mysql-server-docker

docker container ls -a

docker container rm mysql-server-docker

docker image list

docker image remove mysql/mysql-server

Federated Query from Redshift to Aurora PostgreSQL

Create a publicly accessible Redshift cluster and an Aurora PostgreSQL / RDS PostgreSQL cluster. The RDS PostgreSQL or Aurora PostgreSQL instance must be in the same VPC as your Amazon Redshift cluster. If the instance is publicly accessible, configure its security group's inbound rule to: Type: PostgreSQL, Protocol: TCP, Port Range: 5432, Source: 0.0.0.0/0. Otherwise, if the instance is not publicly accessible, you don't need to configure an inbound rule.

  1. Go to AWS Console > Secrets Manager > create a secret for the RDS database and select your PostgreSQL database
  2. Create an IAM policy with the ARN of the above secret:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AccessSecret",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "arn:aws:secretsmanager:us-east-1:111111111111:secret:federated-query-L9EBau"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetRandomPassword",
        "secretsmanager:ListSecrets"
      ],
      "Resource": "*"
    }
  ]
}
  3. Create an IAM Redshift customizable role and attach the above policy to it.
  4. Attach the role to your Redshift cluster.
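
Steps 1-4 can also be scripted; a hedged CLI sketch (the secret contents, cluster identifier and role ARN are placeholders matching this walkthrough):

aws secretsmanager create-secret \
    --name federated-query \
    --secret-string '{"username":"<aurora user>","password":"<aurora password>"}'

aws redshift modify-cluster-iam-roles \
    --cluster-identifier <your-redshift-cluster> \
    --add-iam-roles arn:aws:iam::1111111111111:role/federated-query-role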
  5. Create the external schema in Redshift:
CREATE EXTERNAL SCHEMA IF NOT EXISTS myRedshiftSchema
FROM POSTGRES
DATABASE 'testdb' SCHEMA 'aurora_schema'
URI 'federated-cluster-instance-1.c2txxxxupg1.us-east-1.rds.amazonaws.com' PORT 5432
OPTIONS 'application_name=psql'
IAM_ROLE 'arn:aws:iam::1111111111111:role/federated-query-role'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:11111111111111:secret:federated-query-L9EBau';
  6. In RDS/Aurora PostgreSQL, create the schema which will hold the objects for federated query access:
testdb=> create schema aurora_schema;
CREATE SCHEMA
testdb=> create table aurora_schema.federatedtable (id int8, name varchar(50), log_txn_date timestamp);
CREATE TABLE
testdb=> insert into aurora_schema.federatedtable values(1,'shadab','2019-12-26 00:00:00');
INSERT 0 1
testdb=> select * from aurora_schema.federatedtable;
id | name | log_txn_date
----+--------+---------------------
1 | shadab | 2019-12-26 00:00:00
(1 row)
  7. Log in to your Redshift cluster and run the federated query from Redshift to PostgreSQL:
testdb=# select * from myredshiftschema.federatedtable;
id | name | log_txn_date
----+--------+---------------------
1 | shadab | 2019-12-26 00:00:00
(1 row)
-- Trying out CTAS from Redshift to PostgreSQL --
testdb=# create table test as select * from myredshiftschema.federatedtable;
SELECT
testdb=# select * from test;
id | name | log_txn_date
----+--------+---------------------
1 | shadab | 2019-12-26 00:00:00
testdb=# select pg_last_query_id();
         1251

(1 row)

select * from svl_s3query_summary where query='1251';
-[ RECORD 1 ]-----------+---------------------------
userid | 100
query | 1251
xid | 7338
pid | 11655
segment | 0
step | 0
starttime | 2019-12-27 06:17:30.800652
endtime | 2019-12-27 06:17:30.947853
elapsed | 147201
aborted | 0
external_table_name | PG Subquery
file_format | Text
.
.
.

References:

[1] Federated Query in Amazon Redshift (Preview) – https://docs.aws.amazon.com/redshift/latest/dg/federated-overview.html
[2] CREATE EXTERNAL SCHEMA – https://docs.aws.amazon.com/redshift/latest/dg/federated-external-schema.html
[3] Create a Secret and an IAM Role for Federated Query – https://docs.aws.amazon.com/redshift/latest/dg/federated-create-secret-iam-role.html

Run Multiple Commands on Remote Host using SSH & Sync Directories to Remote Host

#!/bin/bash

sourcefile='/var/www/html/index.html'
remotedir='/var/www/html'
rsyncremotedir='/var/www/'
rsynclocaldir='/var/www/'

### Run Multiple Commands on Remote Host ###
while IFS= read -r dest; do
   ssh -i "mynew_key.pem" $dest '
        date
	sudo yum install -y rsync
	sudo chown -R ec2-user:root /var/www
	sudo rm -rf /home/ec2-user/ourfile.txt
	ls -ltrh /var/www/html
        hostname' </dev/null
done < hosts.txt

### Transfer File to All Remote Hosts ###
#while IFS= read -r dest; do
#  scp -i "mynew_key.pem" $sourcefile "$dest:$remotedir"
#done <hosts.txt

# Sync Push Local Directory to Remote Hosts Using RSYNC ###
while IFS= read -r dest; do
  rsync -avz /var/www/ --delete --exclude 'stats' -e "ssh -i /home/ec2-user/mynew_key.pem" $dest:/var/www/
done <hosts.txt

# Sync Pull Remote Directory to Local Directory Using RSYNC ###
#while IFS= read -r dest; do
#  rsync -avz --dry-run --delete --exclude 'stats' -e "ssh -i /home/ec2-user/mynew_key.pem" $dest:/var/www/ /var/www/
#done <hosts.txt

This is very useful for syncing Apache web server directories to multiple hosts running behind an AWS ELB.
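
The script assumes a hosts.txt in the same directory with one SSH destination per line, for example (hostnames are hypothetical):

$ cat > hosts.txt <<'EOF'
ec2-user@ec2-13-211-1-10.ap-southeast-2.compute.amazonaws.com
ec2-user@ec2-13-211-1-11.ap-southeast-2.compute.amazonaws.com
EOF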