Tracking Changes in Your PostgreSQL Tables: Implementing a Custom Change Data Capture (CDC)

Introduction:
Change Data Capture (CDC) is a technique used to track changes in a database, such as inserts, updates, and deletes. In this blog post, we will show you how to implement a custom CDC in PostgreSQL to track changes in your database. By using a custom CDC, you can keep a record of changes in your database and use that information in your applications, such as to provide a history of changes, track auditing information, or trigger updates in other systems

Implementing a Custom CDC in PostgreSQL:
To implement a custom CDC in PostgreSQL, you will need to create a new table to store the change information, create a trigger function that will be executed whenever a change is made in the target table, and create a trigger that will call the trigger function. The trigger function will insert a new row into the change table with the relevant information, such as the old and new values of the record, the time of the change, and any other relevant information.

To demonstrate this, we will show you an example of a custom CDC for a table called “employee”. The change table will be called “employee_cdc” and will contain columns for the timestamp, employee ID, old values, and new values of the employee record. The trigger function will be executed after an update on the “employee” table and will insert a new row into the “employee_cdc” table with the relevant information. Finally, we will show you how to query the “employee_cdc” table to retrieve a list of all changes that have occurred in the “employee” table since a certain timestamp.

  1. Create the Employee and CDC table

    To store the CDC information, you need to create a new table in your PostgreSQL database. In this example, we will create a table called “employee”, “employee_cdc”, “employee_audit” with the following columns:

CREATE TABLE employee (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
department VARCHAR(50) NOT NULL,
salary NUMERIC(10,2) NOT NULL,
hire_date DATE NOT NULL
);

CREATE TABLE employee_cdc (
timestamp TIMESTAMP DEFAULT now(),
employee_id INTEGER,
old_values JSONB,
new_values JSONB
);

In this table, “id” is an auto-incremented primary key, “timestamp” is a timestamp with time zone to store the time of the change, “employee_id” is the primary key of the employee record that was changed, and “old_values” and “new_values” are text columns to store the old and new values of the employee record, respectively.

2. Create the Audit table

CREATE TABLE employee_audit (
audit_timestamp TIMESTAMP DEFAULT now(),
employee_id INTEGER,
old_values JSONB,
new_values JSONB
);

3. Create the trigger function

To capture the changes in the employee table, you will need to create a trigger function that will be executed whenever a record is inserted, updated, or deleted in the table. The trigger function will insert a new row into the “employee_cdc” table with the relevant information. Here is an example trigger function:

CREATE OR REPLACE FUNCTION employee_cdc() RETURNS TRIGGER AS $$
BEGIN
IF (TG_OP = 'UPDATE') THEN
INSERT INTO employee_cdc (timestamp, employee_id, old_values, new_values)
VALUES (now(), NEW.id, row_to_json(OLD), row_to_json(NEW));
INSERT INTO employee_audit (employee_id, old_values, new_values)
VALUES (NEW.id, row_to_json(OLD), row_to_json(NEW));
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;

This trigger function uses the “row_to_json” function to convert the old and new values of the employee record into JSON strings, which are then stored in the “old_values” and “new_values” columns of the “employee_cdc” table. The “NOW()” function is used to get the current timestamp.

4. Create the trigger

Now that the trigger function has been created, you need to create the trigger on the “employee” table that will call the function whenever a record is updated. You can create the trigger with the following command:

CREATE TRIGGER employee_cdc_trigger
AFTER UPDATE ON employee
FOR EACH ROW
EXECUTE FUNCTION employee_cdc();

4. Query the CDC table

In your application code, you can query the “employee_cdc” table to get a list of all changes that have occurred since a certain timestamp. For example, to get all changes since January 1st, 2023, you can use the following SQL query:

SELECT * FROM employee_cdc
WHERE timestamp >= '2023-01-01 00:00:00';

You can then process these changes as needed in your application code.

Conclusion:
In this blog post, we have shown you how to implement a custom Change Data Capture (CDC) in PostgreSQL to track changes in your database. By using a custom CDC, you can keep a record of changes in your database and use that information in your applications. Whether you are tracking changes for auditing purposes, providing a history of changes, or triggering updates in other systems, a custom CDC is a useful tool to have in your PostgreSQL toolkit.

Advertisement

Backup and Restore PostgreSQL with Few Easy Shell Scripts

PostgreSQL is the most popular Open source database and there is a lot of information available when it comes to backing up and restoring PgSQL I have used these scripts to backup production databases and restored them to new Postgres Servers. So here it goes

Backup PostgreSQL Database – Backup_Pgsql.sh

#!/bin/bash
hostname=`hostname`
# Dump DBs
  date=`date +"%Y%m%d_%H%M%N"`
  backupdir='/home/opc'
  dbname='demo'
  filename="$backupdir/${hostname}_${dbname}_${date}"
 pg_dump -U postgres --encoding utf8 -F c -f $filename.dump $dbname

Restore PostgreSQL Database – Restore_Pgsql.sh

#!/bin/bash
# Restore DB
filename='/home/opc/pgimportmaster-demo-20211129_1013.dump'
  pg_restore -U postgres -d demo -c < ./$1
exit 0

Usage for Restore

$ ./Restore_Pgsql.sh pgimportmaster-demo-20211129_1013.dump

How to Create an Immutable Account Ledger on Amazon QLDB

Amazon QLDB is a new specialist database technology which uses the concept of blockchain for database which maintains a history of every transaction using an immutable & cryptographically verifiable block hash for each record. QLDB uses an immutable transactional log, known as a journal, for data storage. The journal tracks every change to your data and maintains a complete and verifiable history of changes over time.

Sourcing data from the journal, QLDB uses a cryptographic hash function (SHA-256) with a Merkle tree–based model to generate a secure output file of your ledger’s full hash chain. This output file is known as a digest and acts as a fingerprint of your data’s entire change history as of a point in time. It enables you to look back and validate the integrity of your data revisions relative to that fingerprint.

Each ledge in QLDB has exactly one journal. A journal can have multiple strands, which are partitions of the journal.

QLDB is not a de-centralized blockchain network like Bitcoin or Ethereum, if you are looking for a managed block chain network then this is what you must be after –> Amazon Managed Blockchain

/* QLDB is now Generally Available in many regions. Ensure you update your awscli to the latest version to have access to QLDB from awscli. */

$ aws –version
aws-cli/1.16.240 Python/3.7.4 Darwin/17.7.0 botocore/1.12.230

$aws qldb help
NAME
       qldb –

DESCRIPTION
       The control plane for Amazon QLDB

AVAILABLE COMMANDS
       o create-ledger

       o delete-ledger

       o describe-journal-s3-export

       o describe-ledger

       o export-journal-to-s3

       o get-block

       o get-digest

       o get-revision

       o help

Head over to your AWS Console > Amazon QLDB

  1. Create New Ledger called ‘ctransactions’


2. Go to Query Editor > Create Tables for the Account Ledger

CREATE TABLE Cardholder

CREATE TABLE Log_Record

CREATE TABLE Account



3. Create Indexes to help speed up queries against each table

CREATE INDEX ON Cardholder (CARD_NO)

CREATE INDEX ON Log_Record (TRXN_ID)

CREATE INDEX ON Account (ACCT_ID)



4. Insert 1 Record into the Ledger table ‘Account’. QLDB documents are stored in Amazon Ion format which is a Superset of JSON


INSERT INTO Account
 {
‘FirstName’ : ‘Shadab’,
‘LastName’ : ‘Mohammad’,
‘ACCT_ID’ : ‘AMZN10000001’,
‘ACCTYPE’ : ‘Current’,
‘Address’ : ‘2000, 2 Park Street, CBD’
}

documentId
   “7oOLmXPyHjXBSBCio76rJo”

This is the document ID for this transaction
   

5. Create more records and change the first record from ‘Current’ to ‘Saving’ account

Important :
Because QLDB does not enforce schema, you can insert the same document into a table
multiple times. Each insert statement commits a separate document entry to the journal, and
QLDB assigns each document a unique ID.

INSERT INTO Account
 << {
‘FirstName’ : ‘Shadab’,
‘LastName’ : ‘Mohammad’,
‘ACCT_ID’ : ‘AMZN10000001’,
‘ACCTYPE’ : ‘Saving’,
‘Address’ : ‘2000, 2 Park Street, CBD’
},
 {
‘FirstName’ : ‘William’,
‘LastName’ : ‘Tell’,
‘ACCT_ID’ : ‘AMZN10000002’,
‘ACCTYPE’ : ‘Saving’,
‘Address’ : ‘2013, 29 Campbell PDE, Bondi’
},
 {
‘FirstName’ : ‘Chris’,
‘LastName’ : ‘Wall’,
‘ACCT_ID’ : ‘AMZN10000003’,
‘ACCTYPE’ : ‘Current’,
‘Address’ : ‘2142, 15 Lane Cove, Marickville’
} >>

documentId
    “L7S9f2cONhz1JQhj8Gj8bk”
    “0007IXW5xSNDvq9gj3CZN8”
    “JELcBIyjv8uKRHBfhbCvHp”


6. Now let us check for all the changes done to this account using below query. We can check for all modifications made to the Account ID. Each modification or inser generated a unique hash we can be traced back to what that modification was

SELECT
    ACCT_ID FROM Account AS a BY ACCT_ID WHERE a.ACCT_ID = 'AMZN10000001'


ACCT_ID
    “7oOLmXPyHjXBSBCio76rJo”
    “L7S9f2cONhz1JQhj8Gj8bk”
    
As you can see there are 2 modifications to this record and we can verify what each of those modifications are
    

7. Check for the history for each new documentID for the Account
    

SELECT * FROM history(Account) AS h WHERE h.metadata.id
    = '7oOLmXPyHjXBSBCio76rJo'


{
  blockAddress: {
    strandId: “KQrxUDo7jDX4scpbKkhEVZ”,
    sequenceNo: 26
  },
  hash: {{6HVuwag6GMEki+ZRFaMdqidROpKZmsTsYcmWiAMZdmw=}},
  data: {
    FirstName: “Shadab”,
    LastName: “Mohammad”,
    ACCT_ID: “AMZN10000001”,
    ACCTYPE: “Current”,
    Address: “2000, 2 Park Street, CBD”
  },
  metadata: {
    id: “7oOLmXPyHjXBSBCio76rJo”,
    version: 0,
    txTime: 2019-09-19T05:56:20.874Z,
    txId: “2EOkrgLG2Xa7qBwEwnKzDJ”
  }
}

SELECT * FROM
    history(Account) AS h WHERE h.metadata.id = 'L7S9f2cONhz1JQhj8Gj8bk'


{
  blockAddress: {
    strandId: “KQrxUDo7jDX4scpbKkhEVZ”,
    sequenceNo: 41
  },
  hash: {{+qFvFfRx1pi9QOkotc9MAuAWs6Tw7jn3sZqlPpEjSDA=}},
  data: {
    FirstName: “Shadab”,
    LastName: “Mohammad”,
    ACCT_ID: “AMZN10000001”,
    ACCTYPE: “Saving”,
    Address: “2000, 2 Park Street, CBD”
  },
  metadata: {
    id: “L7S9f2cONhz1JQhj8Gj8bk”,
    version: 0,
    txTime: 2019-09-19T06:14:04.214Z,
    txId: “2EOkrgMRCo4CBTUdJWSjmT”
  }
}

Using the first hash we can see the account type was “Current”, and after we modified the document and checked it with the 2nd hash it now shows “Saving”. And we have a history of all modifications applied to that account to traverse back and check all modifications made to that Account ID.

8. To Verify a Document Revision

SELECT r.metadata.id, r.blockAddress FROM _ql_committed_Account AS r WHERE r.data.ACCT_ID = 'AMZN10000001'

id                                                   blockAddress
“7oOLmXPyHjXBSBCio76rJo”    {strandId:”KQrxUDo7jDX4scpbKkhEVZ”,sequenceNo:95}

“L7S9f2cONhz1JQhj8Gj8bk”        {strandId:”KQrxUDo7jDX4scpbKkhEVZ”,sequenceNo:95}


This is just the beginning of what can be possible to create in the FinTech world. No more complex audit reports and expensive 3rd part tools which are an additional overhead to manage, but your database itself keeps a history of any changes made to a transaction or account or debit/credit card.

References:

[1] Accessing Amazon QLDB Using the AWS CLI – https://docs.aws.amazon.com/qldb/latest/developerguide/Tools.CLI.html
[2] botocore – https://github.com/boto/botocore/commit/f646b7ea334c9dcf94af6ad201a45886c48c2648
[3] aws-cli – https://github.com/aws/aws-cli/commit/b3c7bd5a1b93d42ab1b75d99c33fc2b62b0dd6a4
[4] Installing the AWS CLI – Installing the AWS CLI Using pip – https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html#install-tool-pip