Create DMS Replication From MongoDB 4.2 on EC2 Linux to Redshift
Summary
We will create a hub to spoke replication from Mongo DB 4.2 Database to Redshift Schema. MongoDb is installed in the same VPC as Redshift and DMS replication Instance
Main Text
MongoDB is a NoSQL Datastore where data in inserted and read in JSON format. Internally, a MongoDB document is stored as a binary JSON (BSON) file in a compressed format. Terminology for database, schema and tables in MongoDB is a bit different when compared to relational databases. Here’s how each jargon in MongoDb compares to one in Redshift/PostgreSQL
MongoDB | Redshift/PostgreSQL |
Database | Schema |
Collection | Table |
Documents | Records/Rows |
MongoDB does not have a a proper structure for a schema and it is is essentially a document store so data can be inserted without defining a proper schema or a structure. This gives MongoDB a lot of flexibility and hence it is the preferred choice for modern applications as you can start development of your application without defining a data model first. It is a schema-less approach to software architecture, which has its own pros and cons.
In our example we will work through the default database in MongoDB called “admin” and create collections(tables) in it and those tables will be replicated to an equivalent schema in Redshift called “admin” which will hold the different tables. We will go about this in 3 stages:
Stage 1 : Create MongoDB on EC2 AMZN Linux, create the collections and connect to MongoDB from your client machine and check the configuration.
Stage 2: Create Redshift Cluster, Aurora PostgreSQL in Private and Public Subnet and Connect to the All Database instances and check.
Stage 3: Create DMS Replication Instance, DMS Replication Endpoints & DMS Replication Tasks.And finally we will check if all data is being replicated by DMS to all the targets.
Architecture : It is a Hub-to-Spoke Architecture with MongoDB Source Being Replicated to Multiple Heterogeneous Targets. However in this Article we will only configure Replication from MongoDB to Redshift.

Stage 1: Create a EC2 Amazon Linux in a Public Subnet and Install MongoDB in it
You can use AWS ‘MongoDB Quick Start’ guide on AWS to deploy MongoDB into a new VPC or deploy into an existing VPC. The guide has two cloud formation templates which can create a new VPC under your account, configure the public & private subnets and launch the EC2 instances with latest version of MongoDB installed. Check this link for the quick deployment options for MongoDB on AWS : https://docs.aws.amazon.com/quickstart/latest/mongodb/step2.html
OR
If you want to install MongoDB manually then you can follow the below procedure :
1. Create Amazon Linux EC2 Instance
2. Add MongoDB Repo and Install MongoDB on AMZN Linux
$ sudo vi /etc/yum.repos.d/mongodb-org-4.0.repo
[mongodb-org-4.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/testing/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-4.2.asc
$ sudo yum -y install mongodb-org
$ sudo service mongod start
$ sudo cat /var/log/mongodb/mongod.log
3 . Create Root login
$ mongo localhost/admin
> use admin
> db.createUser( { user:"root", pwd:"root123", roles: [ { role: "root", db: "admin" } ] } )
> exit
4. Modify mongod configuration file /etc/mongod.conf using vi editor
Change below lines from
# network interfaces
net:
port: 27017
bindIp: 127.0.0.1 # Listen to local interface only, comment to listen on all interfaces.
#security:
To
# network interfaces
net:
port: 27017
bindIp: 0.0.0.0 # Listen to local interface only, comment to listen on all interfaces.
security:
authorization: enabled
5. Restart mongod service
$ sudo service mongod restart
Use admin DB as authentication database and ‘root’ user can be used for CDC full load task.
6. For CDC replication, a rmongodb replica needs to be setup and permissions need to be modified/added as below :
Modify mongod.conf using vi editor
$ sudo vi /etc/mongod.conf
replication:
replSetName: rs0
$ sudo service mongod restart
7. Initiate Replica Set for CDC
$ mongo localhost/admin -u root -p
> rs.status()
{
“ok” : 0,
“errmsg” : “no replset config has been received”,
“code” : 94,
“codeName” : “NotYetInitialized”
}
> rs.initiate()
{
“info2” : “no configuration specified. Using a default configuration for the set”,
“me” : “ip-10-0-137-99.ap-southeast-2.compute.internal:27017”,
“ok” : 1
}
rs0:SECONDARY> rs.status()
{
“set” : “rs0”,
“date” : ISODate(“2019-12-31T05:02:22.431Z”),
“myState” : 1,
“term” : NumberLong(1),
“syncingTo” : “”,
“syncSourceHost” : “”,
“syncSourceId” : -1,
“heartbeatIntervalMillis” : NumberLong(2000),
“majorityVoteCount” : 1,
“writeMajorityCount” : 1,
“optimes” : {
“lastCommittedOpTime” : {
“ts” : Timestamp(1577768528, 5),
“t” : NumberLong(1)
},
“lastCommittedWallTime” : ISODate(“2019-12-31T05:02:08.362Z”),
“readConcernMajorityOpTime” : {
“ts” : Timestamp(1577768528, 5),
“t” : NumberLong(1)
},
“readConcernMajorityWallTime” : ISODate(“2019-12-31T05:02:08.362Z”),
“appliedOpTime” : {
“ts” : Timestamp(1577768528, 5),
“t” : NumberLong(1)
},
“durableOpTime” : {
“ts” : Timestamp(1577768528, 5),
“t” : NumberLong(1)
},
“lastAppliedWallTime” : ISODate(“2019-12-31T05:02:08.362Z”),
“lastDurableWallTime” : ISODate(“2019-12-31T05:02:08.362Z”)
},
“lastStableRecoveryTimestamp” : Timestamp(1577768528, 4),
“lastStableCheckpointTimestamp” : Timestamp(1577768528, 4),
“electionCandidateMetrics” : {
“lastElectionReason” : “electionTimeout”,
“lastElectionDate” : ISODate(“2019-12-31T05:02:07.346Z”),
“electionTerm” : NumberLong(1),
“lastCommittedOpTimeAtElection” : {
“ts” : Timestamp(0, 0),
“t” : NumberLong(-1)
},
“lastSeenOpTimeAtElection” : {
“ts” : Timestamp(1577768527, 1),
“t” : NumberLong(-1)
},
“numVotesNeeded” : 1,
“priorityAtElection” : 1,
“electionTimeoutMillis” : NumberLong(10000),
“newTermStartDate” : ISODate(“2019-12-31T05:02:08.354Z”),
“wMajorityWriteAvailabilityDate” : ISODate(“2019-12-31T05:02:08.362Z”)
},
“members” : [
{
“_id” : 0,
“name” : “ip-10-0-137-99.ap-southeast-2.compute.internal:27017”,
“ip” : “10.0.137.99”,
“health” : 1,
“state” : 1,
“stateStr” : “PRIMARY”,
“uptime” : 80,
“optime” : {
“ts” : Timestamp(1577768528, 5),
“t” : NumberLong(1)
},
“optimeDate” : ISODate(“2019-12-31T05:02:08Z”),
“syncingTo” : “”,
“syncSourceHost” : “”,
“syncSourceId” : -1,
“infoMessage” : “could not find member to sync from”,
“electionTime” : Timestamp(1577768527, 2),
“electionDate” : ISODate(“2019-12-31T05:02:07Z”),
“configVersion” : 1,
“self” : true,
“lastHeartbeatMessage” : “”
}
],
“ok” : 1,
“$clusterTime” : {
“clusterTime” : Timestamp(1577768528, 5),
“signature” : {
“hash” : BinData(0,”nDISrR4afyRUVEQVntFkkVpTJKY=”),
“keyId” : NumberLong(“6776464228418060290”)
}
},
“operationTime” : Timestamp(1577768528, 5)
}
rs0:PRIMARY>
8. Make sure security group is open for dms replication group for the port on your EC2 instance where MongoDB is running (Default MongoDB port is 27017)
9. Add a collection (table) with some data to database ‘admin’ in the mongodb installation
> show collections
db.createCollection("accounts", { capped : true, autoIndexId : true, size :
6142800, max : 10000 } )
db.accounts.insert({"company": "Booth-Wade",
"location": "5681 Mitchell Heights\nFort Adamstad, UT 8019B",
"ip_address": "192.168.110.4B",
"name": "Mark Becker",
"eid": 27561})
db.accounts.insert({"company": "Myers, Smith and Turner",
"location": "USS BenjaminNlinFP0 AA 40236",
"ip_address": "172.26.254.156",
"name": "Tyler clark",
"eid": 87662})
db.accounts.insert({"company": "Bowen-Harris",
"location": "Tracey Plaza East Katietown,Sc74695",
"ip_address": "172.28.45.209",
"name": "Veronica Gomez",
"eid": 772122})
> db.accounts.find( {} )
— Insert Many —
db.accounts.insertMany(
[
{“company”: “Booth-Wade”,
“location”: “5681 Mitchell Heights\nFort Adamstad, UT 8019B”,
“ip_address”: “192.168.110.4B”,
“name”: “Mark Becker”,
“eid”: 27561},
{“company”: “Myers, Smith and Turner”,
“location”: “USS BenjaminNlinFP0 AA 40236”,
“ip_address”: “172.26.254.156”,
“name”: “Tyler clark”,
“eid”: 87662},
{“company”: “Bowen-Harris”,
“location”: “Tracey Plaza East Katietown,Sc74695”,
“ip_address”: “172.28.45.209”,
“name”: “Veronica Gomez”,
“eid”: 772122}
]);
— Array Insert —
db.accounts.insertMany( [
{
“_id”: “5e037719f45Btodlcdb492464”,
“accounts”: [
{
“company”: “Booth-Wade”,
“location”: “5681 Mitchell Heights\nFort Adamstad, UT 8019B”,
“ip_address”: “192.168.110.4B”,
“name”: “Mark Becker”,
“eid”: 27561
},
{
“company”: “Myers, Smith and Turner”,
“location”: “USS BenjaminNlinFP0 AA 40236”,
“ip_address”: “172.26.254.156”,
“name”: “Tyler clark”,
“eid”: 87662
},
{
“company”: “Bowen-Harris”,
“location”: “Tracey Plaza East Katietown,Sc74695”,
“ip_address”: “172.28.45.209”,
“name”: “Veronica Gomez”,
“eid”: 772122
}
]
}
]);
— Inventory Nested Array Collection —
db.createCollection(“inventory”, { capped : true, autoIndexId : true, size :
6142800, max : 10000 } )
db.createCollection(“inventory_new”, { capped : true, autoIndexId : true, size :
6142800, max : 10000 } )
db.inventory.insertMany( [
{ item: “journal”, instock: [ { warehouse: “A”, qty: 5 }, { warehouse: “C”, qty: 15 } ] },
{ item: “notebook”, instock: [ { warehouse: “C”, qty: 5 } ] },
{ item: “paper”, instock: [ { warehouse: “A”, qty: 60 }, { warehouse: “B”, qty: 15 } ] },
{ item: “planner”, instock: [ { warehouse: “A”, qty: 40 }, { warehouse: “B”, qty: 5 } ] },
{ item: “postcard”, instock: [ { warehouse: “B”, qty: 15 }, { warehouse: “C”, qty: 35 } ] }
]);
db.inventory_new.insertMany([
{ item: “journal”, qty: 25, tags: [“blank”, “red”], size: { h: 14, w: 21, uom: “cm” } },
{ item: “mat”, qty: 85, tags: [“gray”], size: { h: 27.9, w: 35.5, uom: “cm” } },
{ item: “mousepad”, qty: 25, tags: [“gel”, “blue”], size: { h: 19, w: 22.85, uom: “cm” } }
])
> db.inventory.find( {} )
> show collections
accounts
inventory
inventory_new
system.keys
system.users
system.version
We have created 3 user collections called ‘accounts’,’inventory’ & ‘inventory_new’. These 3 collections(tables) shall be replicated to our targets. Connect and Check from MongoDB Compass on your Client Machine
Stage 2: Install Redshift Cluster
Create a VPC with Public and Private Subnets. In a real world production scenario, it is always recommended to put your databases in a Private subnet
1. Create Public and Private Subnet
2. Install Redshift in Public Subnet
3. Install Redshift in Private Subnet
If you have different scenario’s of your DMS replication in one VPC and Databases in other VPC or Replicating from on-premise to AWS VPC then you can refer this link : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReplicationInstance.VPC.html
Stage 3: Create DMS Replication Instance, DMS Replication Endpoints & DMS Replication Tasks for MongoDB
Steps : Create Replication Instance > Create Endpoints > Create DMS Tasks
1. Create Replication instance
Go to AWS Console > Database Migration Service > Replication Instance > Create Replication Instance
2. Create Replication Endpoints
a) MongoDB Replication Endpoint
Go to DMS Console > Endpoints > Create Endpoint. Use this link for configuration for your endpoint > https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MongoDB.html
In MongoDB as source you have 2 modes available : Document Mode and Table Mode. Some important points to note in this regard are :
- A record in MongoDB is a document, which is a data structure composed of field and value pairs. The value of a field can include other documents, arrays, and arrays of documents. A document is roughly equivalent to a row in a relational database table.
- A collection in MongoDB is a group of documents, and is roughly equivalent to a relational database table.
- Internally, a MongoDB document is stored as a binary JSON (BSON) file in a compressed format that includes a type for each field in the document. Each document has a unique ID.
MongoDB is officially supported on versions 2.6.x and 3.x as a database source only. But I have tested it with MongoDB 4.2, which is the latest community version and it works without any issues, However I would advise to stick with the officially certified versions. AWS DMS supports two migration modes when using MongoDB as a source. You specify the migration mode using the Metadata mode parameter using the AWS Management Console or the extra connection attribute nestingLevel
when you create the MongoDB endpoint.
Document mode
In document mode, the MongoDB document is migrated as is, meaning that the document data is consolidated into a single column named _doc
in a target table.
Table mode
In table mode, AWS DMS transforms each top-level field in a MongoDB document into a column in the target table. If a field is nested, AWS DMS flattens the nested values into a single column. AWS DMS then adds a key field and data types to the target table’s column set.
Connection Attributes
nestingLevel
Value : NONE
ONE
Description : NONE – Specify NONE to use document mode. Specify ONE to use table mode.
extractDocID
Value :true
false
Description : false – Use this attribute when nestingLevel
is set to NONE.
Test the Endpoint
b) Create Redshift Replication Endpoint
Test Redshift Endpoint
Once you create the endpoint for Redshift it will automatically adds a DMS endpoint roles and assigns it to the Redshift role. Further down when we create S3 as target endpoint we need to add the S3 permissions via a managed policy to this same role
dms-access-for-endpoint : arn:aws:iam::775867435088:role/dms-access-for-endpoint
c) Create MongoDB-Redshift Database Migration Task
Go to DMS Console > Conversion & Migration > Database Migrations Tasks > Create Task
Before moving ahead step that the security group of Redshift allows ingress rules for port 5439 for 0.0.0.0/0 or preferably the Security Group ID of your Replication Instance is added to the ingress rules for Redshift SG over port 5439. Check this link for more information : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Redshift.html
In our case DMS Replication Instance SGID is ‘sg-0a695ef98b6e39963’. So SG of Redshift looks like below:
Refer this documentation for more complex VPC setup, It is beyond the scope of this article : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReplicationInstance.VPC.html
Checking from Redshift..we can see all the 3 tables from mongodb ‘accounts’,’inventory’ & ‘inventory_new’ are created and also the schema ‘admin’ is automatically created by DMS.
Query all the tables to confirm data is replicated
testdb=# select * from admin.accounts;
_id | array_accounts
——————————————————————————————————-
5e037719f45Btodlcdb492464 | [ { “company” : “Booth-Wade”, “location” : “5681 Mitchell Heights\nFort Adamstad, UT 8019B”, “ip
_address” : “192.168.110.4B”, “name” : “Mark Becker”, “eid” : 27561.0 }, { “company” : “Myers, Smith and Turner”, “location”
: “USS BenjaminNlinFP0 AA 40236”, “ip_address” : “172.26.254.156”, “name” : “Tyler clark”, “eid” : 87662.0 }, { “company” : “B
owen-Harris”, “location” : “Tracey Plaza East Katietown,Sc74695”, “ip_address” : “172.28.45.209”, “name” : “Veronica Gomez”, “
eid” : 772122.0 } ]
9772sjs19f45Btodlcdbk49fk4 | [ { “company” : “Trust Co”, “location” : “Zetland Inc.”, “ip_address” : “12.168.210.2B”, “name”
: “Mert Cliff”, “eid” : 4343.0 }, { “company” : “Mist Ltd.”, “location” : “Cliffstone yard”, “ip_address” : “72.32.254.156”, “
name” : “Kris Loff”, “eid” : 76343.0 }, { “company” : “Coles Supermarket”, “location” : “Randwich St”, “ip_address” : “22.28.4
5.110″, “name” : “Will Markbaeur”, “eid” : 13455.0 } ]
(2 rows)
testdb=# select * from admin.inventory;
oid__id | item | array_instock
————————–+———-+——————————————————————————
5e0bd854fd4602c4b6926d68 | journal | [ { “warehouse” : “A”, “qty” : 5.0 }, { “warehouse” : “C”, “qty” : 15.0 } ]
5e0bd854fd4602c4b6926d69 | notebook | [ { “warehouse” : “C”, “qty” : 5.0 } ]
5e0bd854fd4602c4b6926d6a | paper | [ { “warehouse” : “A”, “qty” : 60.0 }, { “warehouse” : “B”, “qty” : 15.0 } ]
5e0bd854fd4602c4b6926d6b | planner | [ { “warehouse” : “A”, “qty” : 40.0 }, { “warehouse” : “B”, “qty” : 5.0 } ]
5e0bd854fd4602c4b6926d6c | postcard | [ { “warehouse” : “B”, “qty” : 15.0 }, { “warehouse” : “C”, “qty” : 35.0 } ]
(5 rows)
testdb=# select * from admin.inventory_new;
oid__id | item | qty | array_tags | size.h | size.w | size.uom
————————–+———-+—–+——————–+——–+——–+———-
5e0bef1775d0b39f2ef66923 | journal | 25 | [ “blank”, “red” ] | 14 | 21 | cm
5e0bef1775d0b39f2ef66924 | mat | 85 | [ “gray” ] | 27.9 | 35.5 | cm
5e0bef1775d0b39f2ef66925 | mousepad | 25 | [ “gel”, “blue” ] | 19 | 22.85 | cm
(3 rows)