A Comprehensive Guide to Running PostgreSQL on Docker: One Database, Many Personalities

July 9, 2026 ~ Shadab Mohammad

PostgreSQL is much more than a conventional relational database. With the right extensions, the same core engine can become a vector store, a time-series platform, a geospatial database, an analytical engine, an extension laboratory, or the data layer behind an AI agent.

In this hands-on lab, I run several PostgreSQL personalities side by side with Docker, then connect one of them to a secured PostgreSQL Model Context Protocol (MCP) server. The aim is not to declare one image “best.” It is to demonstrate just how broad the PostgreSQL ecosystem has become, and to show the Docker details that make a multi-image lab reliable.

Lab, not production blueprint: these examples prioritize learning and isolation. Before production use, add backups, monitoring, resource limits, TLS, a secrets manager, tested upgrades, and a deliberately designed high-availability architecture.

What we are building

Container	Image	Capability	Host port	Persistent volume
`postgres18_server`	`postgres:18`	Vanilla PostgreSQL baseline	5432	`pg18_vanilla_data`
`postgres_pgvector18`	`pgvector/pgvector:0.8.4-pg18-trixie`	Vector similarity search	5433	`pgvector18_data`
`postgres_timescale18_node1`	`timescale/timescaledb-ha:pg18`	Time-series plus vector extensions	5434	`timescale18_node1_data`
`postgres_timescale18_node2`	`timescale/timescaledb-ha:pg18`	Second independent TimescaleDB instance	5435	`timescale18_node2_data`
`postgres_pglayers18`	`ghcr.io/pglayers/pglayers-full:18`	Large extension catalogue	5436	`pglayers18_data`
`postgres_ai_exts17`	`postgresai/extended-postgres:17-0.7.0`	PostgresAI/DBLab extension set	5437	`postgresai17_data`
`postgres-mcp`	`postgres-mcp-server:latest`	Agent-safe database discovery and diagnostics	8899	None

Every database receives a unique host port and a unique volume. Inside the Docker network, however, every PostgreSQL container still listens on its normal container port, 5432.
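The difference between the host-side address and the in-network address can be made concrete with a small lookup helper. This is only a sketch: the function name `lab_endpoint` is hypothetical, the container names and ports are copied from the table above, and the loopback address reflects how this lab publishes its ports.

```shell
# Hypothetical helper: print where a lab database is reachable.
# Usage: lab_endpoint CONTAINER host|network
lab_endpoint() {
  case "$1" in
    postgres18_server)          host_port=5432 ;;
    postgres_pgvector18)        host_port=5433 ;;
    postgres_timescale18_node1) host_port=5434 ;;
    postgres_timescale18_node2) host_port=5435 ;;
    postgres_pglayers18)        host_port=5436 ;;
    postgres_ai_exts17)         host_port=5437 ;;
    *) echo "unknown container: $1" >&2; return 1 ;;
  esac
  case "$2" in
    # From the Docker host: loopback plus the unique published port.
    host)    printf '127.0.0.1:%s\n' "$host_port" ;;
    # From another container on postgres-lab: container name plus 5432.
    network) printf '%s:5432\n' "$1" ;;
    *) echo "scope must be host or network" >&2; return 1 ;;
  esac
}

lab_endpoint postgres_pgvector18 host      # prints 127.0.0.1:5433
lab_endpoint postgres_pgvector18 network   # prints postgres_pgvector18:5432
```

A host-side client such as psql would use the first form; another container on the same Docker network would use the second.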

Prerequisites and safety

Docker Desktop, or Docker Engine plus the Compose plugin on Linux.
Enough disk space for six independent database clusters.
At least 8 GB of RAM if several extension-heavy images will run together; stop containers you are not actively testing.
psql, pgAdmin, or DBeaver if you want to connect from the host.
Node.js 18 or newer, including npm/npx, on the computer where Claude Desktop runs the mcp-remote bridge.
Node.js 20 or newer if you rebuild or test the PostgreSQL MCP server source outside Docker.

All passwords and tokens below are placeholders. Generate new values for your own lab. Never paste a real MCP bearer token into a blog post, source repository, screenshot, or shared configuration file.

Set reusable lab variables (add them to .bash_profile)

			
# Choose the bootstrap administrator used by the standard PostgreSQL images.
export POSTGRES_ADMIN_USER='my_admin_user'
# Replace this value before running the lab. Use a long, unique password.
export POSTGRES_ADMIN_PASSWORD='replace-with-a-long-random-password'
# This is the application database created during first initialization.
export POSTGRES_DATABASE='my_database'
# Wait up to two minutes for a container to accept PostgreSQL connections.
# Arguments: container name, database role, and database name.
wait_for_postgres() {
  container_name="$1"
  role_name="$2"
  database_name="$3"
  for attempt in $(seq 1 60); do
    if docker exec "$container_name" \
      pg_isready -U "$role_name" -d "$database_name" >/dev/null 2>&1; then
      return 0
    fi
    sleep 2
  done
  docker logs --tail 100 "$container_name"
  return 1
}
# Confirm that Docker is available before creating anything.
docker version
docker ps

		

Environment variables are convenient for a lab, but they remain visible to processes in the shell and can appear in container metadata. Use Docker secrets or your platform’s secret manager for production.
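To see what "container metadata" means in practice, the sketch below lists the names of secret-looking variables that Docker stores for a container. The helper name `list_secret_env_names` and the name patterns it searches for are my own assumptions; it prints variable names only, never values, and it will not flag a password embedded inside a differently named variable such as a connection URI.

```shell
# Hypothetical helper: list the names (not the values) of secret-looking
# environment variables recorded in a container's metadata.
# Usage: list_secret_env_names CONTAINER
list_secret_env_names() {
  # .Config.Env holds one NAME=value string per variable.
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1" |
    # Keep only the variable name in front of the first "=".
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' |
    # The pattern list is an assumption; extend it for your own naming.
    grep -E 'PASSWORD|TOKEN|SECRET'
}

# Example, once a lab container exists:
#   list_secret_env_names postgres18_server
```

Anyone who can run `docker inspect` on the host can read the full values, which is why the lab treats these variables as a convenience rather than a security control.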

The Docker foundation that prevents most PostgreSQL problems

Create one private network and one volume per server

			
# Create a user-defined bridge network if it does not already exist.
docker network inspect postgres-lab >/dev/null 2>&1 || \
docker network create postgres-lab
# Create independent persistent storage for each PostgreSQL personality.
docker volume create pg18_vanilla_data
docker volume create pgvector18_data
docker volume create timescale18_node1_data
docker volume create timescale18_node2_data
docker volume create pglayers18_data
docker volume create postgresai17_data

		

Never share PGDATA. Mounting one data directory into two PostgreSQL containers can corrupt the cluster, and data files are not interchangeable between images or major versions.
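A quick guard before each `docker run` can catch an accidental reuse of a volume. The sketch below relies on the `volume` filter of `docker ps`; the helper name `volume_is_free` is hypothetical, and the check only sees containers known to the local Docker daemon.

```shell
# Hypothetical guard: succeed only if no existing container (running or
# stopped) already mounts the named volume.
# Usage: volume_is_free VOLUME
volume_is_free() {
  users="$(docker ps --all --filter "volume=$1" --format '{{.Names}}')" || return 2
  if [ -n "$users" ]; then
    echo "volume $1 is already mounted by: $users" >&2
    return 1
  fi
}

# Example:
#   volume_is_free pgvector18_data && echo "safe to start a new container"
```

The return code is 0 when the volume is unused, 1 when a container already mounts it, and 2 when the Docker query itself fails.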

Understand the three image-specific storage paths

Image family	Correct container mount target	Why
PostgreSQL 18, pgvector PG18, pglayers PG18	`/var/lib/postgresql`	PostgreSQL 18 stores the cluster below a version-specific subdirectory such as `/var/lib/postgresql/18/docker`.
TimescaleDB HA PG18	`/home/postgres/pgdata`	The HA image defines `PGDATA=/home/postgres/pgdata/data`. Mounting the parent directory persists the image’s whole data area rather than only the `data` subdirectory.
PostgresAI Extended PostgreSQL 17	`/var/lib/postgresql/data`	This is the image’s declared PGDATA/VOLUME. The lab uses a fresh named volume with `volume-nocopy` so initialization begins in an empty target.
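The table can also be encoded as a lookup so that a script never pairs an image with the wrong mount target. This is a sketch under the assumption that only the image tags used in this lab matter; the function name `pgdata_mount_target` is hypothetical.

```shell
# Hypothetical helper: map a lab image to the container path where its
# named volume should be mounted, following the table above.
# Usage: pgdata_mount_target IMAGE
pgdata_mount_target() {
  case "$1" in
    timescale/timescaledb-ha:pg18)
      echo /home/postgres/pgdata ;;
    postgresai/extended-postgres:17-*)
      echo /var/lib/postgresql/data ;;
    postgres:18|pgvector/pgvector:*-pg18-*|ghcr.io/pglayers/pglayers-full:18)
      echo /var/lib/postgresql ;;
    *)
      echo "no known mount target for image: $1" >&2
      return 1 ;;
  esac
}

pgdata_mount_target timescale/timescaledb-ha:pg18   # prints /home/postgres/pgdata
```

Unknown images fail loudly instead of falling back to a default path, since a wrong guess would silently leave the data in the container's writable layer.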

Common `docker run` parameters

Parameter	Purpose
`--detach`	Runs the container in the background and prints its container ID.
`--name NAME`	Assigns a stable name used by `docker exec`, logs, health checks, and Docker DNS.
`--network postgres-lab`	Places the container on the private user-defined network. Other containers can reach it by name.
`--publish 127.0.0.1:H:C`	Maps host port `H` to container port `C`, but only on host loopback. Use an SSH tunnel for remote access.
`--volume VOLUME:PATH`	Persists database files outside the writable container layer.
`--mount type=volume,source=V,target=PATH,volume-nocopy`	Uses the explicit mount syntax and prevents Docker from pre-populating a new volume with files already present at the image path.
`--env NAME=value`	Supplies initialization or runtime settings. `POSTGRES_*` initialization settings only apply when PGDATA is empty.
`--shm-size=1g`	Raises shared memory (`/dev/shm`) above Docker’s 64 MB default, useful for parallel queries and index builds.
`--health-cmd`	Defines the command Docker uses to test database readiness.
`--health-interval`	Controls how often Docker runs the health check.
`--health-timeout`	Limits how long one health check may run.
`--health-retries`	Sets how many consecutive failures make the container unhealthy.
`IMAGE`	Selects the exact PostgreSQL distribution and tag.
`postgres -c name=value`	Overrides the image command and passes a startup-only PostgreSQL setting directly to the server.

Related command-line flags and shell syntax

Flag or syntax	Purpose
`docker network inspect NAME`	Checks whether a named network already exists and displays its metadata.
`docker network create NAME`	Creates a user-defined network with built-in container-name DNS.
`docker volume create NAME`	Creates a Docker-managed persistent volume.
`docker exec --interactive`	Keeps standard input open so `psql` can read a heredoc or accept input.
`docker exec --tty`	Allocates a terminal for a human-driven interactive `psql` session.
`docker logs --follow`	Streams new log records until you press Ctrl+C.
`docker logs --tail N`	Shows only the newest `N` log lines.
`docker logs --since 2m`	Shows logs produced during the last two minutes.
`docker inspect --format TEMPLATE`	Extracts a selected value, such as health status, from Docker metadata.
`docker update --restart=unless-stopped`	Adds a restart policy to a verified container without recreating it.
`docker restart NAME`	Stops and starts an existing container, preserving its command, environment, mounts, and published ports.
`docker manifest inspect --verbose`	Displays the platforms and detailed manifest data published for an image tag.
`docker build --tag NAME .`	Builds the Dockerfile in the current directory and assigns the resulting image a name and tag.
`docker compose config`	Resolves variables and validates the Compose model before deployment.
`docker compose up --detach`	Creates or reconciles the Compose services and leaves them running in the background.
`psql -h HOST`	Selects the database host; omitting it normally uses a local Unix socket.
`psql -p PORT`	Selects the PostgreSQL TCP port.
`psql -U ROLE`	Selects the PostgreSQL login role.
`psql -d DATABASE`	Selects the database to connect to.
`psql -W`	Forces a password prompt before connecting.
`psql -c SQL`	Runs one SQL command and exits.
`psql -v ON_ERROR_STOP=1`	Makes scripted `psql` stop immediately when any statement fails.
`ssh -N`	Creates forwarding only and does not run a remote shell command.
`ssh -L LPORT:HOST:RPORT`	Forwards a local port through SSH to a host and port visible from the remote machine.
`ssh -i KEY`	Selects the private key used to authenticate to the remote host.
`openssl rand -hex N`	Generates `N` random bytes and encodes them as twice as many hexadecimal characters.
`[ -z "${VAR:-}" ]`	Tests safely whether a shell variable is unset or empty.
`${VAR:?MESSAGE}`	Stops the current command with `MESSAGE` when a required variable is unset or empty.
`chmod 600 FILE`	Allows only the file owner to read or write a secret-bearing configuration file on POSIX systems.
`>/dev/null 2>&1`	Suppresses both normal and error output from the idempotent network-existence check.
`COMMAND_A || COMMAND_B`	Runs the second command only if the first command fails.
`wait_for_postgres CONTAINER ROLE DATABASE`	Calls the helper defined above, retrying `pg_isready` for up to two minutes and printing recent logs if startup fails.
`<<'SQL'`	Feeds a literal heredoc into `psql`; the quoted marker prevents shell expansion inside the SQL body.

I deliberately omit a restart policy during the first boot. Once a database is healthy, enable unless-stopped. This makes startup errors visible instead of hiding them inside a rapid restart loop.
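That rule can be wrapped in a helper so the policy is applied only when Docker itself reports the container as healthy. A minimal sketch, assuming the container was started with a health check; the function name `enable_restart_when_healthy` is hypothetical.

```shell
# Hypothetical helper: add the unless-stopped restart policy only when
# Docker reports the container as healthy.
# Usage: enable_restart_when_healthy CONTAINER
enable_restart_when_healthy() {
  # Fails if the container is missing or has no health check defined.
  status="$(docker inspect --format '{{.State.Health.Status}}' "$1")" || return 1
  if [ "$status" = "healthy" ]; then
    docker update --restart=unless-stopped "$1"
  else
    echo "$1 is $status; leaving the restart policy unset" >&2
    return 1
  fi
}

# Example:
#   enable_restart_when_healthy postgres18_server
```

A container that is still `starting` or already `unhealthy` keeps its default policy, so a failed first boot stays visible in `docker ps` and the logs.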

Lab 1: Vanilla PostgreSQL 18—the baseline

The official PostgreSQL image is the control group for the lab: no third-party extensions, no custom process supervisor, and the standard PostgreSQL 18 data layout.

			
# Start vanilla PostgreSQL 18 on host port 5432.
# The database is reachable by other lab containers as postgres18_server:5432.
docker run --detach \
  --name postgres18_server \
  --network postgres-lab \
  --publish 127.0.0.1:5432:5432 \
  --volume pg18_vanilla_data:/var/lib/postgresql \
  --env POSTGRES_USER="$POSTGRES_ADMIN_USER" \
  --env POSTGRES_PASSWORD="$POSTGRES_ADMIN_PASSWORD" \
  --env POSTGRES_DB="$POSTGRES_DATABASE" \
  --shm-size=1g \
  --health-cmd='pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"' \
  --health-interval=10s \
  --health-timeout=5s \
  --health-retries=12 \
  postgres:18
# Do not continue until the server accepts connections.
wait_for_postgres postgres18_server \
  "$POSTGRES_ADMIN_USER" "$POSTGRES_DATABASE"

		

			
# Review startup logs; press Ctrl+C to leave follow mode.
docker logs --follow postgres18_server
# In another terminal, inspect Docker's health result.
docker inspect --format '{{.State.Health.Status}}' postgres18_server
# Connect from inside the container; no host port is required here.
docker exec --interactive --tty postgres18_server \
  psql -U "$POSTGRES_ADMIN_USER" -d "$POSTGRES_DATABASE"
# After the first healthy boot, enable automatic restart after host reboots.
docker update --restart=unless-stopped postgres18_server

		

Docker Compose alternative

Use this instead of the preceding docker run, not at the same time. Save it as compose.yaml.

			
# Compose specification for the vanilla PostgreSQL service.
services:
  db:
    image: postgres:18
    container_name: postgres18_server
    networks:
      - postgres-lab
    ports:
      - "127.0.0.1:5432:5432"
    environment:
      POSTGRES_USER: ${POSTGRES_ADMIN_USER}
      POSTGRES_PASSWORD: ${POSTGRES_ADMIN_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DATABASE}
    shm_size: 1gb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 10s
      timeout: 5s
      retries: 12
    volumes:
      - pg18_vanilla_data:/var/lib/postgresql
# Reuse the network created earlier.
networks:
  postgres-lab:
    external: true
# Reuse the volume created earlier instead of making a project-prefixed volume.
volumes:
  pg18_vanilla_data:
    external: true

		

Compose key	Purpose
`services` / `db`	Defines the application services and gives this PostgreSQL service its Compose-local name.
`image`	Selects the container image and tag.
`container_name`	Assigns the same stable Docker name used by the `docker run` example.
`networks`	Attaches the service to `postgres-lab`.
`ports`	Publishes host-loopback port 5432 to container port 5432.
`environment`	Passes the three shell variables into the container. Compose resolves the single-dollar expressions.
`shm_size`	Allocates 1 GB for the container’s `/dev/shm`.
`healthcheck.test`	Runs `pg_isready` through a container shell. Double dollar signs defer variable expansion to that shell.
`interval`, `timeout`, `retries`	Set the health-check cadence, per-check limit, and failure threshold.
`volumes`	Mounts the persistent volume at the PostgreSQL 18 parent data directory.
`external: true`	Tells Compose to reuse the pre-created network and volume rather than make project-prefixed replacements.
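Compose substitutes an empty string (with a warning) for any variable that is unset, which would start the database with a blank user or password. A small preflight check, sketched here with the hypothetical name `require_lab_vars`, catches that before Compose is invoked.

```shell
# Hypothetical preflight: confirm the three variables referenced by
# compose.yaml are set and non-empty in the current shell.
require_lab_vars() {
  missing=0
  for name in POSTGRES_ADMIN_USER POSTGRES_ADMIN_PASSWORD POSTGRES_DATABASE; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_lab_vars && docker compose config
```

The variables must also be exported, as in the lab setup block, because Compose reads them from the environment of the `docker compose` process.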

			
# Validate the Compose file, then start it in detached mode.
docker compose config
docker compose up --detach

After the service is healthy, add restart: unless-stopped beneath container_name, then apply the edited Compose model:

			
# Reconcile the service after adding the verified restart policy.
docker compose up --detach

Lab 2: pgvector—PostgreSQL as a vector database

The pgvector image extends the official PostgreSQL image with the vector data type, exact distance operations, and approximate indexes such as HNSW and IVFFlat. The pinned tag below provides pgvector 0.8.4 on PostgreSQL 18 and Debian Trixie.
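As an illustration of the approximate indexes mentioned above, the helper below emits SQL that adds an HNSW index to the `vector_demo` table this lab creates. It is a sketch: the function name `hnsw_index_sql` and the index name are hypothetical, and on a three-row table the planner may well ignore the index in favour of a sequential scan.

```shell
# Hypothetical helper: emit SQL that adds an HNSW index for L2 distance.
hnsw_index_sql() {
  cat <<'SQL'
-- vector_l2_ops pairs with the <-> (Euclidean/L2) distance operator.
CREATE INDEX IF NOT EXISTS vector_demo_embedding_hnsw
  ON vector_demo USING hnsw (embedding vector_l2_ops);
-- Show the plan; tiny tables may still use a sequential scan.
EXPLAIN SELECT label FROM vector_demo
ORDER BY embedding <-> '[1,0,0]' LIMIT 3;
SQL
}

# Example, after vector_demo exists:
#   hnsw_index_sql | docker exec --interactive postgres_pgvector18 \
#     psql -v ON_ERROR_STOP=1 -U "$POSTGRES_ADMIN_USER" -d "$POSTGRES_DATABASE"
```

The operator class has to match the distance operator used in the query; an index built with `vector_l2_ops` does not serve cosine or inner-product ordering.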

			
# Start an independent pgvector cluster on host port 5433.
docker run --detach \
  --name postgres_pgvector18 \
  --network postgres-lab \
  --publish 127.0.0.1:5433:5432 \
  --volume pgvector18_data:/var/lib/postgresql \
  --env POSTGRES_USER="$POSTGRES_ADMIN_USER" \
  --env POSTGRES_PASSWORD="$POSTGRES_ADMIN_PASSWORD" \
  --env POSTGRES_DB="$POSTGRES_DATABASE" \
  --shm-size=1g \
  --health-cmd='pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"' \
  --health-interval=10s \
  --health-timeout=5s \
  --health-retries=12 \
  pgvector/pgvector:0.8.4-pg18-trixie
# Wait for initialization before running extension SQL.
wait_for_postgres postgres_pgvector18 \
  "$POSTGRES_ADMIN_USER" "$POSTGRES_DATABASE"

		

			
# Create and exercise the extension in my_database. (Run the whole block as one command.)
docker exec --interactive postgres_pgvector18 \
  psql -v ON_ERROR_STOP=1 \
  -U "$POSTGRES_ADMIN_USER" -d "$POSTGRES_DATABASE" <<'SQL'
-- Extensions are installed per database, not once per server.
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a tiny three-dimensional vector table.
CREATE TABLE IF NOT EXISTS vector_demo (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  label text NOT NULL UNIQUE,
  embedding vector(3) NOT NULL
);
-- Insert the sample embeddings once; reruns update the existing labels.
INSERT INTO vector_demo (label, embedding)
VALUES
  ('alpha', '[1,0,0]'),
  ('beta',  '[0,1,0]'),
  ('gamma', '[0.8,0.2,0]')
ON CONFLICT (label) DO UPDATE
SET embedding = EXCLUDED.embedding;
-- The <-> operator returns Euclidean/L2 distance; smaller is closer.
SELECT label, embedding <-> '[1,0,0]' AS distance
FROM vector_demo
ORDER BY distance
LIMIT 3;
SQL
# Enable restart only after the health check succeeds.
docker update --restart=unless-stopped postgres_pgvector18

		

Lab 3: TimescaleDB—time-series and high-performance vectors

The TimescaleDB HA image combines PostgreSQL with TimescaleDB and other packaged extensions. It uses PGDATA=/home/postgres/pgdata/data, so the named volume is mounted at its parent, /home/postgres/pgdata, rather than the official image’s /var/lib/postgresql path.

Two containers do not automatically form a highly available cluster. The commands below create two independent instances for comparison and failover experiments. Patroni, a distributed configuration store, replication, and a routing layer require separate configuration.
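One way to confirm that two instances really are independent is to compare their system identifiers: each `initdb` run generates its own, whereas a physical replica carries the identifier of its primary. The sketch below uses PostgreSQL's `pg_control_system()` function; the helper names are hypothetical and the `postgres` role matches the TimescaleDB examples in this lab.

```shell
# Hypothetical helper: print the system identifier of one cluster.
# Usage: cluster_system_id CONTAINER DATABASE
cluster_system_id() {
  docker exec "$1" psql -At -U postgres -d "$2" \
    -c 'SELECT system_identifier FROM pg_control_system();'
}

# Hypothetical helper: succeed when two containers hold separately
# initialized clusters (different system identifiers).
# Usage: clusters_are_independent CONTAINER_A CONTAINER_B DATABASE
clusters_are_independent() {
  id_a="$(cluster_system_id "$1" "$3")" || return 2
  id_b="$(cluster_system_id "$2" "$3")" || return 2
  [ -n "$id_a" ] && [ "$id_a" != "$id_b" ]
}

# Example, once both instances are running:
#   clusters_are_independent postgres_timescale18_node1 \
#     postgres_timescale18_node2 "$POSTGRES_DATABASE" && echo "independent"
```

Different identifiers mean that writes to one node will never appear on the other until replication is configured deliberately.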

			
# Start independent TimescaleDB instance 1 on host port 5434.
docker run --detach \
  --name postgres_timescale18_node1 \
  --network postgres-lab \
  --publish 127.0.0.1:5434:5432 \
  --volume timescale18_node1_data:/home/postgres/pgdata \
  --env POSTGRES_USER=postgres \
  --env POSTGRES_PASSWORD="$POSTGRES_ADMIN_PASSWORD" \
  --env POSTGRES_DB="$POSTGRES_DATABASE" \
  --shm-size=1g \
  --health-cmd='pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"' \
  --health-interval=10s \
  --health-timeout=5s \
  --health-retries=12 \
  timescale/timescaledb-ha:pg18
# Start independent TimescaleDB instance 2 with a different port and volume.
docker run --detach \
  --name postgres_timescale18_node2 \
  --network postgres-lab \
  --publish 127.0.0.1:5435:5432 \
  --volume timescale18_node2_data:/home/postgres/pgdata \
  --env POSTGRES_USER=postgres \
  --env POSTGRES_PASSWORD="$POSTGRES_ADMIN_PASSWORD" \
  --env POSTGRES_DB="$POSTGRES_DATABASE" \
  --shm-size=1g \
  --health-cmd='pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"' \
  --health-interval=10s \
  --health-timeout=5s \
  --health-retries=12 \
  timescale/timescaledb-ha:pg18
# Wait for both independent instances before executing SQL or enabling restarts.
wait_for_postgres postgres_timescale18_node1 postgres "$POSTGRES_DATABASE"
wait_for_postgres postgres_timescale18_node2 postgres "$POSTGRES_DATABASE"

		

			
# Verify and enable the TimescaleDB and pgvectorscale extensions on node 1.
docker exec --interactive postgres_timescale18_node1 \
  psql -v ON_ERROR_STOP=1 \
  -U postgres -d "$POSTGRES_DATABASE" <<'SQL'
-- Confirm that the required extension packages are available.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('timescaledb', 'vector', 'vectorscale')
ORDER BY name;
-- Enable TimescaleDB in this database.
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- CASCADE also enables pgvector when vectorscale requires it.
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
-- Create a simple time-series table.
CREATE TABLE IF NOT EXISTS sensor_readings (
  observed_at timestamptz NOT NULL,
  sensor_id text NOT NULL,
  temperature_c double precision NOT NULL
);
-- Convert the table to a TimescaleDB hypertable.
SELECT create_hypertable(
  'sensor_readings',
  by_range('observed_at'),
  if_not_exists => TRUE
);
SQL
# Enable restart policies after both instances report healthy.
docker update --restart=unless-stopped postgres_timescale18_node1
docker update --restart=unless-stopped postgres_timescale18_node2

		

Lab 4: pglayers Full—an extension-rich PostgreSQL 18

pglayers publishes PostgreSQL extensions as composable image layers and also provides a full profile containing more than fifty extensions. Although the project documents multi-architecture layer support, the live pglayers-full:18 tag resolved to Linux AMD64 only when this article was reviewed. Check the manifest again before using it on ARM64.

This image preloads many libraries that register background workers. PostgreSQL’s default worker limit is eight, while the pglayers test suite uses 64. The full profile also configures components around the canonical postgres role, so this lab intentionally keeps POSTGRES_USER=postgres. Because this walkthrough does not initialize DocumentDB, its internal PostgreSQL background worker is disabled to suppress the missing-role warning. This setting does not disable the separately preloaded MongoDB wire-gateway library.

			
# Start pglayers Full on host port 5436.
# max_worker_processes=64 prevents the extension workers from exhausting the default pool.
# The DocumentDB worker is disabled until that extension is deliberately installed.
docker run --detach \
  --name postgres_pglayers18 \
  --network postgres-lab \
  --publish 127.0.0.1:5436:5432 \
  --volume pglayers18_data:/var/lib/postgresql \
  --env POSTGRES_USER=postgres \
  --env POSTGRES_PASSWORD="$POSTGRES_ADMIN_PASSWORD" \
  --env POSTGRES_DB="$POSTGRES_DATABASE" \
  --shm-size=1g \
  --health-cmd='pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"' \
  --health-interval=10s \
  --health-timeout=5s \
  --health-retries=12 \
  ghcr.io/pglayers/pglayers-full:18 \
  postgres \
    -c max_worker_processes=64 \
    -c documentdb.enableBackgroundWorker=off
# Extension-rich images can take longer to initialize.
wait_for_postgres postgres_pglayers18 postgres "$POSTGRES_DATABASE"

		

			
# Check the live image architecture and database health.
docker manifest inspect --verbose \
  ghcr.io/pglayers/pglayers-full:18
docker inspect --format '{{.State.Health.Status}}' postgres_pglayers18
# Run the inspection SQL as one fail-fast script.
docker exec --interactive postgres_pglayers18 \
  psql -v ON_ERROR_STOP=1 \
  -U postgres -d "$POSTGRES_DATABASE" <<'SQL'
-- Confirm the expanded worker pool.
SHOW max_worker_processes;
-- Count and inspect the extension packages available in this image.
SELECT count(*) AS available_extensions
FROM pg_available_extensions;
SELECT name, default_version, installed_version
FROM pg_available_extensions
ORDER BY name;
SQL

		

			
# After verification, enable the restart policy from the shell.
docker update --restart=unless-stopped postgres_pglayers18

The full image makes extensions available; it does not mean every extension should be created in every database. Some extensions have background workers, database-role requirements, or mutual conflicts. Enable only what your experiment needs. To test DocumentDB, stop and recreate this container against the same named volume without the disabling -c option; command arguments cannot be changed by a simple restart. Then follow the project’s documented DocumentDB installation sequence in the configured postgres database.

Lab 5: PostgresAI Extended PostgreSQL 17

The postgresai/extended-postgres image is primarily designed for PostgresAI Database Lab workflows. Its default startup script expects an existing cluster and deliberately keeps the container alive if PostgreSQL stops. For a fresh standalone lab, appending postgres activates the inherited official initialization path. Mounting a brand-new named volume at the image’s declared PGDATA path with volume-nocopy guarantees an empty initialization target.

			
# Start the AMD64 PostgresAI PostgreSQL 17 image on host port 5437.
# Mount the image's declared PGDATA and prevent Docker from copying image-layer files.
# The final "postgres" argument is essential for first-run initialization.
docker run --detach \
  --name postgres_ai_exts17 \
  --network postgres-lab \
  --publish 127.0.0.1:5437:5432 \
  --mount type=volume,source=postgresai17_data,target=/var/lib/postgresql/data,volume-nocopy \
  --env POSTGRES_USER="$POSTGRES_ADMIN_USER" \
  --env POSTGRES_PASSWORD="$POSTGRES_ADMIN_PASSWORD" \
  --env POSTGRES_DB="$POSTGRES_DATABASE" \
  --shm-size=1g \
  --health-cmd='pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"' \
  --health-interval=10s \
  --health-timeout=5s \
  --health-retries=12 \
  postgresai/extended-postgres:17-0.7.0 \
  postgres
# Wait for the inherited PostgreSQL entrypoint to finish initialization.
wait_for_postgres postgres_ai_exts17 \
  "$POSTGRES_ADMIN_USER" "$POSTGRES_DATABASE"

		

			
# Inspect initialization before trying to use psql.
docker logs --tail 100 postgres_ai_exts17
docker inspect --format '{{.State.Health.Status}}' postgres_ai_exts17
# List a few useful extensions supplied by the image.
docker exec --interactive postgres_ai_exts17 \
  psql -v ON_ERROR_STOP=1 \
  -U "$POSTGRES_ADMIN_USER" -d "$POSTGRES_DATABASE" <<'SQL'
-- See whether selected extension packages are available.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('vector', 'hypopg', 'pg_stat_statements', 'timescaledb')
ORDER BY name;
-- Enable only the extensions needed by this database.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hypopg;
SQL
# Enable automatic restart after a successful first boot.
docker update --restart=unless-stopped postgres_ai_exts17

		

If initdb reports that PGDATA “exists but is not empty,” do not delete files until you know what they are. Stop the container, inspect the volume, and use a new empty volume for a disposable lab. PostgreSQL will not initialize over unrelated or partial files.
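Inspecting a volume does not require starting PostgreSQL against it. The sketch below lists a volume's contents from a throwaway container with the volume mounted read-only. The `alpine:3` image, the `/inspect` path, and the function name `inspect_volume` are my assumptions; any small image with `ls` would do.

```shell
# Hypothetical helper: list the top level of a named volume without
# modifying it. Note that Docker creates the volume if it does not exist.
# Usage: inspect_volume VOLUME
inspect_volume() {
  docker run --rm \
    --mount "type=volume,source=$1,target=/inspect,readonly" \
    alpine:3 ls -la /inspect
}

# Example, after stopping the container that owns the volume:
#   docker stop postgres_ai_exts17
#   inspect_volume postgresai17_data
```

A healthy cluster shows entries such as `PG_VERSION`, `base`, and `global`; anything else deserves a closer look before you decide to discard the volume.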

Comparing the PostgreSQL personalities

Stack	Best suited to	Initialization	Architecture note	Main caution
Official PostgreSQL 18	Baseline relational and JSON workloads	Automatic on empty volume	Multi-architecture	Add extensions yourself
pgvector PG18	Embeddings and similarity search	`CREATE EXTENSION vector`	AMD64 and ARM64 tags available	One extension-focused image, not an AI platform by itself
TimescaleDB HA PG18	Time-series, telemetry, PostGIS, and vectorscale	Create/verify extensions per database	AMD64 and ARM64	Two containers are not automatically HA
pglayers Full PG18	Discovering and testing a wide extension catalogue	Create selected extensions per database	Current full tag: AMD64; verify the live manifest	Many preloaded workers; keep the `postgres` role and raise worker slots
PostgresAI Extended PG17	Database Lab and advanced extension experiments	Override the command with `postgres` for a fresh standalone cluster	Published tag is AMD64	Default startup assumes existing PGDATA

Connect with psql, pgAdmin, or DBeaver

From inside a database container, use the container’s normal port 5432—or omit the port entirely. From the host, use the mapped port from the lab table.

			
# From inside the vanilla container: local socket, no host port mapping involved.
docker exec --interactive --tty postgres18_server \
  psql -U "$POSTGRES_ADMIN_USER" -d "$POSTGRES_DATABASE"
# From the Docker host: connect to the vanilla instance on host port 5432.
psql -h 127.0.0.1 -p 5432 \
  -U "$POSTGRES_ADMIN_USER" -d "$POSTGRES_DATABASE" -W
# From the Docker host: connect to pgvector on its unique host port 5433.
psql -h 127.0.0.1 -p 5433 \
  -U "$POSTGRES_ADMIN_USER" -d "$POSTGRES_DATABASE" -W
# From the Docker host: connect to pglayers on host port 5436.
psql -h 127.0.0.1 -p 5436 \
  -U postgres -d "$POSTGRES_DATABASE" -W

		

For pgAdmin or DBeaver, use host 127.0.0.1, the mapped host port, the configured database, and the matching user. On an EC2 host, keep Docker bound to loopback and use an SSH tunnel instead of opening every database port to the internet.

			
# Forward local laptop port 5432 securely to the EC2 host's loopback port 5432.
ssh -N \
  -L 5432:127.0.0.1:5432 \
  -i /absolute/path/to/key.pem \
  ec2-user@YOUR_EC2_HOST

		

Add the PostgreSQL MCP server

The Postgres MCP Server exposes schema discovery, object inspection, bounded SQL execution, query-plan diagnostics, index recommendations, workload analysis, database monitoring, and optional Prometheus metrics. One MCP process connects to one PostgreSQL database URI. For simultaneous targets, each additional MCP instance needs its own Docker container name, host port, database role, Claude configuration key and URL, plus an allowed-origin entry matching that URL.
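The list of per-instance settings is easy to get wrong by hand, so the sketch below derives them from a short label and a host port. Everything it prints is illustrative: the naming scheme, the example port 8900, and the function name `mcp_instance_settings` are assumptions, while the `/mcp` path, container port 8899, and origin format follow the single-instance setup used in this lab.

```shell
# Hypothetical helper: derive distinct settings for one additional MCP instance.
# Usage: mcp_instance_settings LABEL HOST_PORT DATABASE_CONTAINER
mcp_instance_settings() {
  label="$1"; port="$2"; db_container="$3"
  printf 'container_name=postgres-mcp-%s\n' "$label"
  # Unique host port; the MCP process keeps listening on 8899 inside.
  printf 'publish=127.0.0.1:%s:8899\n' "$port"
  # Docker DNS name and the unchanged internal PostgreSQL port.
  printf 'database_host=%s:5432\n' "$db_container"
  # The role must be created separately inside that target cluster.
  printf 'database_role=mcp_reader\n'
  printf 'claude_config_key=postgres-mcp-%s\n' "$label"
  printf 'url=http://127.0.0.1:%s/mcp\n' "$port"
  printf 'allowed_origins=http://localhost:%s,http://127.0.0.1:%s\n' "$port" "$port"
}

mcp_instance_settings pgvector 8900 postgres_pgvector18
```

Generating the values together keeps the published port, the client URL, and the allowed-origin entry in agreement, which is the usual source of connection failures when a second instance is added.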

Build the MCP image

			
# Clone the MCP server and enter its repository before building.
git clone https://github.com/shadabshaukat/postgres-mcp-server.git
cd postgres-mcp-server
# Option A: build an unchanged checkout from its tracked build output.
docker build --tag postgres-mcp-server:latest .

		

If you modify the TypeScript source, use the following validation-and-build path instead of the final build command above.

			
# Option B: install the locked dependencies, validate the source, and rebuild.
# After editing the TypeScript source, check, test, and compile it before rebuilding the image.
# These commands require Node.js 20 or newer on the host.
npm ci
npm run check
npm run test:unit
npm run build
docker build --tag postgres-mcp-server:latest .

		

Create a least-privilege database role

MCP_DB_MODE=restricted adds application-level safeguards, but PostgreSQL privileges remain the real security boundary. Do not connect the MCP service as a superuser.

			
# Generate a fresh URL-safe database password and keep it in this private shell.
# Hexadecimal output contains no URI delimiter characters.
export MCP_DB_PASSWORD="$(openssl rand -hex 24)"
# Stop before creating the role if OpenSSL failed.
: "${MCP_DB_PASSWORD:?OpenSSL did not generate an MCP database password}"
# Create the MCP role and grants as one fail-fast script.
# psql safely quotes the password and database-name variables in the SQL below.
docker exec --interactive postgres18_server \
  psql -v ON_ERROR_STOP=1 \
  -v mcp_password="$MCP_DB_PASSWORD" \
  -v target_db="$POSTGRES_DATABASE" \
  -U "$POSTGRES_ADMIN_USER" -d "$POSTGRES_DATABASE" <<'SQL'
-- Create a login dedicated to MCP read access.
CREATE ROLE mcp_reader
  WITH LOGIN
  PASSWORD :'mcp_password';
-- Allow the role to connect to this database and inspect the public schema.
GRANT CONNECT ON DATABASE :"target_db" TO mcp_reader;
GRANT USAGE ON SCHEMA public TO mcp_reader;
-- Grant read access to current tables.
GRANT SELECT ON ALL TABLES IN SCHEMA public TO mcp_reader;
-- Grant read access to future tables created by this administrator.
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO mcp_reader;
SQL

		

Roles are local to a PostgreSQL cluster. Before pointing MCP at pgvector, TimescaleDB, pglayers, or PostgresAI, repeat the dedicated-role and grant step in that target cluster with its administrator and database name. Do not merely change the hostname in the URI.
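Repeating the step is less error-prone when the role script is wrapped in a function that takes the target as arguments. This is a sketch of the same SQL shown earlier, parameterized by container, administrator, and database; the function name `create_mcp_reader` is hypothetical, and it reuses `MCP_DB_PASSWORD` from the current shell, so generate a different value first if each cluster should have its own password.

```shell
# Hypothetical helper: create the mcp_reader role and grants in one cluster.
# Usage: create_mcp_reader CONTAINER ADMIN_ROLE DATABASE
create_mcp_reader() {
  if [ -z "${MCP_DB_PASSWORD:-}" ]; then
    echo "MCP_DB_PASSWORD is not set" >&2
    return 1
  fi
  # psql reads the script from stdin, so :'var' and :"var" are expanded
  # and quoted by psql rather than by the shell.
  docker exec --interactive "$1" \
    psql -v ON_ERROR_STOP=1 \
    -v mcp_password="$MCP_DB_PASSWORD" \
    -v target_db="$3" \
    -U "$2" -d "$3" <<'SQL'
CREATE ROLE mcp_reader WITH LOGIN PASSWORD :'mcp_password';
GRANT CONNECT ON DATABASE :"target_db" TO mcp_reader;
GRANT USAGE ON SCHEMA public TO mcp_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO mcp_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO mcp_reader;
SQL
}

# Examples:
#   create_mcp_reader postgres_pgvector18 "$POSTGRES_ADMIN_USER" "$POSTGRES_DATABASE"
#   create_mcp_reader postgres_pglayers18 postgres "$POSTGRES_DATABASE"
```

The second example uses the `postgres` administrator because the pglayers and TimescaleDB containers in this lab were initialized with that role.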

Optional: grant deeper MCP observability

The least-privilege role above can inspect schemas and selected data, but some workload and monitoring tools will return partial results. The vanilla image does not preload pg_stat_statements, and ordinary roles cannot see every session’s query text. If that wider visibility is acceptable in your lab, enable it explicitly:

			
# Configure the bundled statistics library; this setting needs a restart.
docker exec --interactive postgres18_server \
  psql -v ON_ERROR_STOP=1 \
  -U "$POSTGRES_ADMIN_USER" -d "$POSTGRES_DATABASE" <<'SQL'
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
SQL
# Restart so PostgreSQL can preload the library.
docker restart postgres18_server
# Wait until PostgreSQL accepts connections before running the next SQL script.
wait_for_postgres postgres18_server \
  "$POSTGRES_ADMIN_USER" "$POSTGRES_DATABASE"
# Create the extension and grant the broad built-in monitoring role.
docker exec --interactive postgres18_server \
  psql -v ON_ERROR_STOP=1 \
  -U "$POSTGRES_ADMIN_USER" -d "$POSTGRES_DATABASE" <<'SQL'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
GRANT pg_monitor TO mcp_reader;
SQL

		

pg_monitor exposes cluster-wide monitoring information, so grant it only after reviewing that visibility. HypoPG is not packaged in the vanilla image; MCP can still recommend indexes there, but hypothetical-index validation remains unavailable unless you choose an image that supplies HypoPG.

Run the MCP server on the private Docker network

			
# Generate a fresh 64-character bearer token for this MCP instance.
# Keep it out of shell history, recordings, screenshots, and source control.
export POSTGRES_MCP_TOKEN="$(openssl rand -hex 32)"
# Stop immediately if OpenSSL failed and left the token empty.
: "${POSTGRES_MCP_TOKEN:?OpenSSL did not generate an MCP token}"
# Start a restricted, bearer-authenticated Streamable HTTP MCP endpoint.
# Docker DNS resolves postgres18_server directly on the private network.
docker run --detach \
  --name postgres-mcp \
  --network postgres-lab \
  --publish 127.0.0.1:8899:8899 \
  --read-only \
  --tmpfs /tmp \
  --security-opt no-new-privileges:true \
  --env "DATABASE_URI=postgresql://mcp_reader:${MCP_DB_PASSWORD:?MCP_DB_PASSWORD is not set}@postgres18_server:5432/${POSTGRES_DATABASE:?POSTGRES_DATABASE is not set}?sslmode=disable" \
  --env PGSSLMODE=disable \
  --env MCP_TRANSPORT=http \
  --env MCP_HTTP_HOST=0.0.0.0 \
  --env MCP_HTTP_PORT=8899 \
  --env MCP_HTTP_PATH=/mcp \
  --env MCP_DB_MODE=restricted \
  --env "MCP_AUTH_TOKEN=${POSTGRES_MCP_TOKEN:?POSTGRES_MCP_TOKEN is not set}" \
  --env 'MCP_ALLOWED_HOSTS=localhost,127.0.0.1' \
  --env 'MCP_ALLOWED_ORIGINS=http://localhost:8899,http://127.0.0.1:8899' \
  postgres-mcp-server:latest

		

Claude needs the same bearer token. While this private shell or SSH session is still open, transfer it directly into your password manager or secure clipboard. If you must display it, do so once in a private, non-recorded terminal and clear the terminal scrollback afterward:

			
# Display the token only in a private terminal so it can be copied to Claude.
printf '%s\n' "$POSTGRES_MCP_TOKEN"
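If you would rather script this step than depend on OpenSSL, an equivalent token can be produced with Python's standard library. This is a minimal sketch, not part of the MCP server: the helper name `make_bearer_header` is hypothetical, and the 16-character minimum is taken from the parameter reference in this article.

```python
import secrets

MIN_TOKEN_LENGTH = 16  # minimum length this article states the server requires


def make_bearer_header(token: str) -> str:
    """Return the Authorization header value for a static bearer token."""
    if len(token) < MIN_TOKEN_LENGTH:
        raise ValueError("token is shorter than the 16-character minimum")
    return f"Bearer {token}"


# 32 random bytes rendered as hex give 64 characters, like `openssl rand -hex 32`.
token = secrets.token_hex(32)
header = make_bearer_header(token)
```

The sketch deliberately never prints the token; hand `token` to the container environment and `header` to the client configuration from within the same process.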

MCP parameter reference

Parameter	Purpose
`--network postgres-lab`	Lets the MCP container reach the selected database by container name and internal port 5432.
`--publish 127.0.0.1:8899:8899`	Exposes MCP only on host loopback. It is not directly reachable from the network.
`--read-only`	Makes the MCP container filesystem read-only.
`--tmpfs /tmp`	Provides a temporary writable in-memory directory required by some runtime operations.
`--security-opt no-new-privileges:true`	Prevents processes from gaining additional Linux privileges.
`DATABASE_URI`	Selects exactly one PostgreSQL target. Use container DNS and port 5432 on the shared network.
`PGSSLMODE=disable`	Disables TLS only for this trusted, private container network. Use certificate verification for remote databases.
`MCP_TRANSPORT=http`	Enables Streamable HTTP. The legacy value `sse` is only an alias; legacy SSE endpoints require a separate opt-in.
`MCP_HTTP_HOST=0.0.0.0`	Listens on all interfaces inside the container. The host-side publish remains safely bound to 127.0.0.1.
`MCP_HTTP_PORT=8899`	Sets the HTTP listener port inside the container.
`MCP_HTTP_PATH=/mcp`	Sets the Streamable HTTP MCP endpoint path.
`MCP_DB_MODE=restricted`	Enables read-oriented SQL inspection, read-only transactions, row limits, and timeouts.
`MCP_AUTH_TOKEN`	Sets the static Bearer token. The server requires at least 16 characters.
`MCP_ALLOWED_HOSTS`	Restricts accepted HTTP Host values when the internal listener is non-loopback.
`MCP_ALLOWED_ORIGINS`	Restricts browser-style Origin values when an Origin header is present.
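One detail the `DATABASE_URI` row hides: a generated password that contains characters such as `@`, `/`, or `:` would break the URI unless it is percent-encoded. The following sketch shows one way to assemble the value safely. The function name and the `labdb` database are illustrative assumptions, and it presumes the MCP server's driver accepts standard percent-encoded PostgreSQL URIs.

```python
from urllib.parse import quote


def build_database_uri(user, password, host, database, port=5432, sslmode="disable"):
    """Assemble a PostgreSQL connection URI with percent-encoded credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode={sslmode}"
    )


# Hypothetical values: the password is chosen to show the encoding at work.
uri = build_database_uri("mcp_reader", "p@ss/w:rd", "postgres18_server", "labdb")
```

Using the container name and port 5432 keeps the URI valid for any server on the shared Docker network; only the host and database change between targets.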

			
# Confirm the MCP process, database connection, and readiness endpoint.
docker logs postgres-mcp
curl --fail --silent --show-error http://127.0.0.1:8899/healthz
curl --fail --silent --show-error http://127.0.0.1:8899/readyz
# Enable automatic restart only after readiness succeeds.
docker update --restart=unless-stopped postgres-mcp
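The readiness check above can also be automated so that the restart policy is applied only once `/readyz` answers. A minimal sketch using only the standard library follows; the function names, attempt count, and delay are assumptions rather than anything the MCP server prescribes.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def wait_until_ready(probe, attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe() until it returns True or the attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

A typical call would be `wait_until_ready(lambda: http_ok("http://127.0.0.1:8899/readyz"))`, followed by the `docker update` command only when it returns `True`.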

		

To point MCP at another lab server, first create mcp_reader and its grants in that cluster, then recreate the MCP container with that target’s hostname, database, and generated password in DATABASE_URI. The private-network port remains 5432. A simultaneous second MCP target also needs a unique --name, a different host-side published port, a matching allowed origin, and a distinct client configuration entry.

Configure Claude Desktop

Claude Desktop starts the community mcp-remote bridge as a local process and forwards it to the loopback-only Streamable HTTP endpoint. The example pins version 0.1.38, verified when this article was reviewed, instead of downloading an unspecified future release. Replace the placeholder with the token generated above; the value must include the Bearer prefix.

			
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote@0.1.38",
        "http://127.0.0.1:8899/mcp",
        "--allow-http",
        "--transport",
        "http-only",
        "--header",
        "Authorization:${AUTH_HEADER}"
      ],
      "env": {
        "AUTH_HEADER": "Bearer REPLACE_WITH_THE_GENERATED_TOKEN"
      }
    }
  }
}

		

-y allows npx to install/run the pinned bridge without an interactive confirmation.
--allow-http is acceptable here only because the endpoint is local loopback.
--transport http-only selects Streamable HTTP.
--header adds the required Authorization header.
If Claude cannot locate npx, replace it with the absolute path returned by command -v npx.

Restrict the Claude configuration file to your account where the operating system supports POSIX permissions—for example, chmod 600 "$HOME/Library/Application Support/Claude/claude_desktop_config.json" on macOS. The token is still plaintext in that file, so do not share it. Completely quit and reopen Claude Desktop after changing the configuration. A cloud-hosted connector cannot reach 127.0.0.1 on your computer; this is a local Claude Desktop configuration.
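Hand-editing JSON is the most common source of a rejected configuration, so it can help to generate the file instead. The sketch below rebuilds the structure shown earlier and writes it with owner-only permissions on POSIX systems. The helper names are hypothetical, the token is a placeholder, and the demo writes to a temporary directory rather than the real Claude path.

```python
import json
import os
import tempfile


def build_claude_config(token: str, url: str = "http://127.0.0.1:8899/mcp") -> dict:
    """Build the mcpServers entry for the pinned mcp-remote bridge."""
    return {
        "mcpServers": {
            "postgres": {
                "command": "npx",
                "args": [
                    "-y", "mcp-remote@0.1.38", url, "--allow-http",
                    "--transport", "http-only",
                    "--header", "Authorization:${AUTH_HEADER}",
                ],
                "env": {"AUTH_HEADER": f"Bearer {token}"},
            }
        }
    }


def write_private_json(path: str, data: dict) -> None:
    """Write JSON and restrict the file to its owner (effective on POSIX)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
    os.chmod(path, 0o600)  # also tightens a file that already existed


# Demo against a throwaway path; the token below is a placeholder, not a secret.
demo_path = os.path.join(tempfile.mkdtemp(), "claude_desktop_config.json")
write_private_json(demo_path, build_claude_config("EXAMPLE_TOKEN_NOT_REAL"))
with open(demo_path, encoding="utf-8") as handle:
    loaded = json.load(handle)
```

Note that this overwrites the target file. If your existing configuration already lists other MCP servers, load it first and merge the `postgres` entry into its `mcpServers` object.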

MCP on EC2: tunnel it instead of publishing it

			
# Run this on the laptop that hosts Claude Desktop.
# It maps laptop port 8899 to the EC2 host's loopback-only MCP endpoint.
ssh -N \
  -L 8899:127.0.0.1:8899 \
  -i /absolute/path/to/key.pem \
  ec2-user@YOUR_EC2_HOST

		

Claude still connects to http://127.0.0.1:8899/mcp. Do not open port 8899 in the EC2 security group merely to make the demo reachable.
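Before restarting Claude Desktop it is worth confirming that the tunnel is actually listening on the laptop. A small sketch, assuming nothing beyond the standard library; `port_is_open` is an illustrative helper name:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not open".
        return False
```

With the tunnel running, `port_is_open("127.0.0.1", 8899)` should return `True`. A `False` result usually means the `ssh -N` process exited or the MCP container on the EC2 host is not published on loopback.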

Troubleshooting playbook

Symptom	Likely cause	Fix
`port is already allocated`	Two containers publish the same host port.	Use the unique host-port map in this article. Container port 5432 remains unchanged.
`initdb: directory exists but is not empty`	The mounted PGDATA contains files, perhaps from an earlier or incorrect mount.	Inspect it first. For a disposable lab, create a new empty volume.
Missing `.s.PGSQL.5432` socket	PostgreSQL did not finish starting.	Run `docker logs --tail 200 CONTAINER`; do not treat a running container as proof of a running database.
Host port is listening but PostgreSQL refuses connections	Docker published the port even though the database process failed.	Check container health, logs, and `pg_isready`.
pglayers repeatedly says to increase `max_worker_processes`	The full profile exhausted PostgreSQL’s default worker pool.	Start it with `postgres -c max_worker_processes=64`.
pglayers reports `role "postgres" does not exist`	A custom `POSTGRES_USER` replaced the canonical bootstrap role while bundled workers expect `postgres`.	For a fresh full-profile lab, initialize with `POSTGRES_USER=postgres`.
DocumentDB worker role warning	The DocumentDB library is preloaded but its extension has not created the required role.	Create DocumentDB in its configured database or disable that background worker if DocumentDB is not part of the test.
PostgresAI container is up but PostgreSQL is not	The default DBLab-oriented script expects initialized PGDATA.	Use an empty PG17 volume and append `postgres` to the image command.
Changed `POSTGRES_USER` or `POSTGRES_DB` has no effect	The volume already contains an initialized cluster.	Change roles/databases with SQL, or initialize a new empty volume.
MCP cannot reach PostgreSQL	The URI uses `localhost` inside the MCP container.	Use the shared Docker network and the PostgreSQL container name.
MCP returns HTTP 401	The bearer token is missing, stale, or lacks the `Bearer` prefix.	Use the same generated token in the container and client header.
Claude rejects its configuration	Malformed JSON, missing closing brace, or unavailable `npx`.	Validate the JSON and use an absolute `npx` path if necessary.

Production hardening checklist

Pin immutable image tags or digests and test upgrades before deployment.
Use a secret manager rather than plaintext environment variables.
Bind database and MCP ports to private interfaces; prefer SSH tunnels, private networks, or VPN access.
Use TLS with hostname and certificate verification for remote PostgreSQL endpoints.
Give MCP a dedicated least-privilege PostgreSQL role; keep restricted mode enabled.
Do not enable EXPLAIN ANALYZE or unrestricted MCP mode without understanding that queries or writes will execute.
Add tested backups, restore drills, monitoring, WAL management, disk alerts, and capacity limits.
Do not call two standalone TimescaleDB containers “HA” until replication, leader election, routing, and failover have been configured and tested.
Prefer a deliberately composed extension image over an everything-enabled bundle for production.

Conclusion

This lab demonstrates why PostgreSQL earns the “Swiss Army knife” description. The core database remains familiar, while extensions change the workload it can address: pgvector adds similarity search, TimescaleDB adds time-series behavior, pglayers turns extension discovery into a composable workflow, PostgresAI packages a broad Database Lab toolset, and MCP makes PostgreSQL safely inspectable by AI clients.

The real lesson is not only PostgreSQL’s flexibility. It is that Docker isolation matters: unique ports, unique volumes, image-correct PGDATA paths, explicit health checks, canonical roles where an image expects them, and least-privilege connections. Get those foundations right and the PostgreSQL ecosystem becomes an unusually capable platform for experimentation.


I build an Enterprise grade MCP Server for Postgres

May 28, 2026 ~ Shadab Mohammad

GitHub link for the MCP server: https://github.com/shadabshaukat/postgres-mcp-server/

Real-time Data Replication from Amazon RDS to Oracle Autonomous Database using OCI GoldenGate

March 23, 2022 ~ Shadab Mohammad

Article first appeared here

Introduction

GoldenGate Microservices 21c is the latest version of the microservices architecture, which makes creating a data mesh or data fabric across different public clouds as easy as a few clicks. GoldenGate is available on OCI as a fully managed service with auto-scaling. It does not require installing GoldenGate software on either the source or the target database instance. GoldenGate replicates through a capture-and-apply mechanism based on trail files. Both the Extract (capture) and Replicat (apply) processes run on the GoldenGate replication instance, which acts as a hub.

Let us create a data pipeline that replicates data in real time, using Oracle Cloud Infrastructure (OCI) GoldenGate 21c, from an Amazon RDS for Oracle instance to an Oracle Autonomous Database in OCI. Some common use cases for this solution are listed below.

Use Cases

Cross-cloud replication of Oracle Database from AWS RDS to OCI
Migration of Oracle Database with Zero Downtime from AWS RDS to OCI
Creating Multi-Cloud Microservices Application with Oracle database as the persistent data store
Creating a Multi-cloud Data Mesh for Oracle Database

Architecture

Source : Amazon RDS Oracle 19c EE

Target : OCI Autonomous Transaction Processing 19c

Replication Hub : OCI Goldengate 21c Microservices

Network : Site-to-Site IPsec VPN or FastConnect (Direct Connect on AWS)

The solution is broadly divided into four phases :

Set up the RDS instance and prepare the source for GoldenGate replication
Set up the OCI Autonomous Database and prepare the target for GoldenGate replication
Deploy OCI GoldenGate, create the deployment, and register the source and target databases
Create the Extract (capture) and Replicat (apply) processes on OCI GoldenGate

Phase 1 — AWS Setup : RDS Source and Enable Goldengate Capture

The first part of the setup requires us to provision a VPC, a subnet group, and an Oracle 19c RDS instance on AWS. Please ensure all the requisite network constructs, such as security groups, are in place for connectivity from OCI GoldenGate to RDS. In a production scenario it would be better to have the RDS instance without a public endpoint and a FastConnect setup from AWS to OCI.

1. Create a VPC and RDS Subnet Group

2. Create RDS Oracle Instance 19.1 EE with super user as ‘admin’

3. Create a new DB Parameter Group for 19.1 EE with parameter ENABLE_GOLDENGATE_REPLICATION set to TRUE

4. Change the parameter group of the RDS instance and reboot the RDS Oracle instance once the parameter group has been applied. Double-check that the parameter ENABLE_GOLDENGATE_REPLICATION is set to TRUE and that the correct parameter group is applied to the RDS instance

5. Set the log retention period on the source DB with ‘admin’ user

exec rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours',24);commit;

6. Create a new user account to be used for GoldenGate on the RDS instance with the ‘admin’ user

CREATE TABLESPACE administrator;

CREATE USER oggadm1 IDENTIFIED BY "*********" DEFAULT TABLESPACE ADMINISTRATOR TEMPORARY TABLESPACE TEMP;

commit;

7. Grant account privileges on the source RDS instance with ‘admin’ user

GRANT CREATE SESSION, ALTER SESSION TO oggadm1;

GRANT RESOURCE TO oggadm1;

GRANT SELECT ANY DICTIONARY TO oggadm1;

GRANT FLASHBACK ANY TABLE TO oggadm1;

GRANT SELECT ANY TABLE TO oggadm1;

GRANT SELECT_CATALOG_ROLE TO admin WITH ADMIN OPTION;

exec rdsadmin.rdsadmin_util.grant_sys_object ('DBA_CLUSTERS', 'OGGADM1');

exec rdsadmin.rdsadmin_util.grant_sys_object ('DBA_CLUSTERS', 'ADMIN');

GRANT EXECUTE ON DBMS_FLASHBACK TO oggadm1;

GRANT SELECT ON SYS.V_$DATABASE TO oggadm1;

GRANT ALTER ANY TABLE TO oggadm1;

grant unlimited tablespace TO oggadm1;

grant EXECUTE_CATALOG_ROLE to admin WITH ADMIN OPTION;

commit;

8. Finally, grant the privileges needed by a user account to be a GoldenGate administrator. The package that you use to perform the grant, dbms_goldengate_auth or rdsadmin_dbms_goldengate_auth, depends on the Oracle DB engine version.

-- With admin user on RDS Oracle instance for Oracle Database versions lower than 12.2 --

exec dbms_goldengate_auth.grant_admin_privilege (grantee=>'OGGADM1',privilege_type=>'capture',grant_select_privileges=>true, do_grants=>TRUE);

exec dbms_goldengate_auth.grant_admin_privilege('OGGADM1',container=>'all');

exec dbms_goldengate_auth.grant_admin_privilege('OGGADM1');

commit;

-- For Oracle DB versions later than or equal to Oracle Database 12c Release 2 (12.2), which requires patch level 12.2.0.1.ru-2019-04.rur-2019-04.r1 or later, run the following PL/SQL program. --

exec rdsadmin.rdsadmin_dbms_goldengate_auth.grant_admin_privilege (grantee=>'OGGADM1', privilege_type=>'capture',grant_select_privileges=>true, do_grants=>TRUE);

commit;

To revoke privileges, use the procedure revoke_admin_privilege in the same package.

9. TNS entry for AWS RDS Instance

OGGTARGET=(DESCRIPTION=(ENABLE=BROKEN)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orcl.*****.ap-southeast-2.rds.amazonaws.com)(PORT=1521)))(CONNECT_DATA=(SID=ORCL)))

-- To be added to Registered Database in OCI --

(DESCRIPTION=(ENABLE=BROKEN)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orcl.****.ap-southeast-2.rds.amazonaws.com)(PORT=1521)))(CONNECT_DATA=(SID=ORCL)))

Alias (to be used later in OCI GG configuration) : ORCLAWS
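Because the connect descriptor is a single long line with deeply nested parentheses, it is easy to mistype. The following sketch assembles it from its parts; the function name is hypothetical, the host in the example is a placeholder, and the defaults (port 1521, SID ORCL) simply mirror the entry above.

```python
def build_connect_descriptor(host: str, port: int = 1521, sid: str = "ORCL") -> str:
    """Assemble an Oracle connect descriptor in the shape used above."""
    return (
        "(DESCRIPTION=(ENABLE=BROKEN)"
        f"(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port})))"
        f"(CONNECT_DATA=(SID={sid})))"
    )


# Placeholder endpoint; substitute the endpoint of your own RDS instance.
descriptor = build_connect_descriptor("db.example.com")
```

The resulting string is what you would paste into the connection string field when registering the RDS database in OCI GoldenGate.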

10. Create Test Table in RDS Oracle Instance

CREATE TABLE oggadm1.test (id number,name varchar2(100));

insert into oggadm1.test values (1,'Shadab');

insert into oggadm1.test values (2,'Mohammad');

commit;

11. Enable supplemental logging with the ‘admin’ user

Ref : https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.CommonDBATasks.Log.html#Appendix.Oracle.CommonDBATasks.SettingForceLogging

-- Enable force logging --

EXEC rdsadmin.rdsadmin_util.force_logging(p_enable => true);

-- Enable supplemental logging --

begin rdsadmin.rdsadmin_util.alter_supplemental_logging(p_action => 'ADD');

end;
/


Phase 2 — OCI Setup : Autonomous Database

We will provision the VCN and the Autonomous Database on OCI and enable the GoldenGate replication user

1. Create VCN

2. Create Autonomous Transaction Processing Database with Network Options and mTLS not required

3. Unlock the ggadmin user in the ATP

alter user ggadmin identified by ****** account unlock;

4. Create the table ‘test’ in the admin schema and do the initial load (normally this would be done with Data Pump, but that is beyond the scope of this article)

CREATE TABLE test (id number,name varchar2(100));

insert into test values (1,'Shadab');

insert into test values (2,'Mohammad');

commit;

select * from test;

Phase 3 — OCI Setup : Goldengate

1. In the OCI Console, go to Oracle Database > GoldenGate > Deployments > Create Deployment

2. Go to Oracle Database > Goldengate > Registered Databases

a. Add the ATP database created above with the ggadmin user

b. Add the RDS instance database using oggadm1 user

3. Test the connectivity to both databases; each should show in the console as Active

4. Open the launch URL for the GoldenGate deployment and log in with the username and password set in step 1.

                         eg : https://e*******q.deployment.goldengate.ap-sydney-1.oci.oraclecloud.com/

Phase 4 — Create Extract (Capture) and Replicat (Apply) and Start the Replication

1. Create an Integrated Extract from Administration Service, click on the plus symbol next to the extract section

Go to Main Page > Configuration > Login to AWS RDS instance

a. Create Checkpoint table oggadm1.ckpt

b. Add Tran Data for Schema oggadm1

EXTRACT AWSEXT

USERIDALIAS ORCLAWS DOMAIN OracleGoldenGate

EXTTRAIL AW

TABLE OGGADM1.*;

2. Create a non-integrated Replicat for ADB on trail file ‘aw’. Click on the plus symbol next to the Replicat section

Go to Main Page > Configuration > Login to ATP instance

a. Create Checkpoint table admin.ckpt

b. Add Tran Data for Schema admin

c. Add heartbeat table

REPLICAT adbrep

USERIDALIAS FundsInsight DOMAIN OracleGoldenGate

MAP OGGADM1.TEST, TARGET ADMIN.TEST;

The status should be green on the OCI Goldengate Administration Dashboard

3. Insert transaction at RDS source

                            insert into oggadm1.test values(3,'Utuhengal');commit;

4. Check at ADB Target

                            select * from test;

Conclusion:

We have created cross-cloud replication from an Oracle Database running in AWS to an Oracle Autonomous Database running on OCI. The idea was to demonstrate how easily GoldenGate Microservices can run as a replication hub on OCI and provide real-time change data capture across two different public clouds. Every component in this architecture is a fully managed cloud service, so there are no servers to manage and no agents to install on either the source or the target, neither of which exposes its underlying host.

References:

Setup of GoldenGate for RDS : https://jinyuwang.weebly.com/cloud-service/how-to-capture-data-from-oracle-database-on-aws-rds-with-oracle-goldengate
GoldenGate Setup for RDS Source : https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.OracleGoldenGate.html#Appendix.OracleGoldenGate.rds-source-ec2-hub
RDS Common Tasks : https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.CommonDBATasks.Log.html
OCI GoldenGate Database Registration : https://docs.oracle.com/en/cloud/paas/goldengate-service/using/database-registrations.html#GUID-899B90FF-DF9A-481D-A531-BB9D25005EB9
Apex LiveLab for OCI GoldenGate Microservices 21c : https://apexapps.oracle.com/pls/apex/dbpm/r/livelabs/workshop-attendee-2?p210_workshop_id=797&p210_type=3&session=113817274271778
OCI GoldenGate Blog : https://blogs.oracle.com/dataintegration/post/new-oci-goldengate-service-is-first-of-any-major-cloud-provider-to-deliver-operational-and-analytic-integration-into-a-single-data-fabric
Getting Started with GoldenGate : https://docs.oracle.com/goldengate/c1230/gg-winux/GGCON/getting-started-oracle-goldengate.htm#GGCON-GUID-61088509-F951-4737-AE06-29DAEAD01C0C

Backup and Restore PostgreSQL with Few Easy Shell Scripts

November 30, 2021 ~ Shadab Mohammad

PostgreSQL is one of the most popular open-source databases, and there is a lot of information available about backing it up and restoring it. I have used these scripts to back up production databases and restore them to new Postgres servers. So here it goes.

Backup PostgreSQL Database – Backup_Pgsql.sh

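#!/bin/bash
# Dump one database in pg_dump's custom format (-F c) to a timestamped file.
hostname=$(hostname)
# Timestamp down to the second; %N would append nanoseconds instead.
date=$(date +"%Y%m%d_%H%M%S")
backupdir='/home/opc'
dbname='demo'
filename="${backupdir}/${hostname}_${dbname}_${date}"
pg_dump -U postgres --encoding utf8 -F c -f "${filename}.dump" "$dbname"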
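The same backup logic can be expressed in Python, which makes the file naming easy to test before a real dump is taken. This is a sketch under stated assumptions: the function name is hypothetical, the timestamp uses second precision, and the clock and hostname are injectable only so the naming can be checked deterministically.

```python
import socket
from datetime import datetime


def build_backup_command(dbname, backupdir="/home/opc", user="postgres",
                         now=None, hostname=None):
    """Return (dump_path, argv) for a custom-format pg_dump of dbname."""
    now = now or datetime.now()
    hostname = hostname or socket.gethostname()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    dump_path = f"{backupdir}/{hostname}_{dbname}_{stamp}.dump"
    command = ["pg_dump", "-U", user, "--encoding", "utf8",
               "-F", "c", "-f", dump_path, dbname]
    return dump_path, command


# Fixed inputs so the resulting path is predictable in this example.
dump_path, command = build_backup_command(
    "demo", now=datetime(2021, 11, 29, 10, 13, 0), hostname="pgimportmaster")
```

To run the dump, pass the list to `subprocess.run(command, check=True)`; passing the arguments as a list avoids shell quoting problems with unusual database names.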

Restore PostgreSQL Database – Restore_Pgsql.sh

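#!/bin/bash
# Restore a custom-format dump passed as the first argument,
# for example /home/opc/pgimportmaster-demo-20211129_1013.dump
dumpfile="${1:?Usage: $0 DUMP_FILE}"
# -c drops existing objects before recreating them.
# The script exits with pg_restore's own status, so failures are not masked.
pg_restore -U postgres -d demo -c "$dumpfile"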

Usage for Restore

$ ./Restore_Pgsql.sh pgimportmaster-demo-20211129_1013.dump

Build and store a Hive metastore outside an EMR cluster in an RDS MySQL database and Connect a Redshift cluster to an EMR cluster

March 7, 2020 ~ Shadab Mohammad

This document addresses the specific configuration points that need to be in place in order to build and store a Hive metastore outside an EMR cluster in an RDS MySQL database. It also covers the steps to connect a Redshift cluster to an EMR cluster so Redshift can create and access the tables stored within the external metastore.

Resources Used:

• Redshift Cluster

• RDS MySQL Instance

• EMR Cluster

Note: All resources must be in same VPC and same region for this practice.

Creating the RDS MySQL:

1 – First, create an RDS MySQL instance if you don’t have one already. Open the AWS RDS Console and create a MySQL instance that will be used during this practice.

Note: Please make note of RDS security group, endpoint, Master User and Master Password. We will need that information later on.

2 – Once the RDS MySQL instance is created, modify its security groups to add a rule for All traffic on all Port Range to be allowed from the VPC’s default security group.

Note: This VPC’s default Security Group will be used while creating the EMR cluster later on as well but it needs to be whitelisted beforehand otherwise the EMR launching will fail while trying to reach out to the RDS MySQL.

Before creating the EMR Cluster:

3 – After creating the RDS MySQL instance (and opening its security group to EMR), but right before creating the EMR cluster, create a JSON configuration file. EMR ingests this file during the bootstrapping phase of cluster creation; it tells EMR how to access the remote RDS MySQL database.

4 – Copy the JSON property structure from the following link (use the Copy icon): https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-external.html

5 – Paste it in a text editor and modify it carefully with the RDS details you noted earlier.

Note: Be careful, the value property cannot contain any spaces or carriage returns. It should appear all on one line. Save it as “hiveConfiguration.json”.

6 – The final JSON configuration file should look like the following:

[
    {
      "Classification": "hive-site",
      "Properties": {
        "javax.jdo.option.ConnectionURL": "jdbc:mysql:\/\/database-1.cefjr3enh3dk.us-east-2.rds.amazonaws.com:3306\/hive?createDatabaseIfNotExist=true",
        "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": "admin",
        "javax.jdo.option.ConnectionPassword": "*********"
      }
    }
]

Note 1: Replace the hostname, user name, and password shown above with your own details.

Note 2: The part “hive?createDatabaseIfNotExist=true” determines the name of the database to be created in the MySQL RDS, in this case the database will be called “hive”.
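Since a stray space or line break in a property value is enough to break the bootstrap, generating the file programmatically is a reasonable safeguard. The sketch below builds the same hive-site structure and rejects whitespace in any value. The function name, host, and credentials are placeholders, not values from a real environment.

```python
import json


def build_hive_site(host, user, password, database="hive", port=3306):
    """Build the hive-site classification for an external MySQL metastore."""
    url = f"jdbc:mysql://{host}:{port}/{database}?createDatabaseIfNotExist=true"
    properties = {
        "javax.jdo.option.ConnectionURL": url,
        "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }
    for key, value in properties.items():
        # Values must not contain spaces or line breaks.
        if any(ch.isspace() for ch in value):
            raise ValueError(f"{key} must not contain whitespace")
    return [{"Classification": "hive-site", "Properties": properties}]


# Placeholder details; substitute your own endpoint and credentials.
config_text = json.dumps(
    build_hive_site("mydb.example.com", "admin", "example-password"), indent=2)
```

`json.dumps` writes the slashes in the JDBC URL unescaped, which is equally valid JSON. Save `config_text` as hiveConfiguration.json before uploading it to S3.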

7 – After creating above file, upload it to an S3 bucket/folder of your choice (in the same region of your resources).

Creating the EMR:

8 – Now, it is time to create the EMR cluster. To do this, open AWS EMR console and click Create Cluster button. This will prompt the Quick Options page but we won’t be using that. Click on Go to advanced options on the top of the page.

9 – This will send you to the Advanced Options page. There, under Software Configuration, select the following Applications:

Hadoop, Ganglia, Hive, Hue, Tez, Pig, Mahout

10 – In the same page, under Edit Software Settings section, click Load JSON from S3 and select the S3 bucket/path where you uploaded the previous created file “hiveConfiguration.json“. Select the file there and hit Select.

11 – In the Hardware Configuration page, make sure that the EMR cluster is in the same VPC as your MySQL RDS instance. Hit Next if you don’t want to change any Network configuration or Node types.

12 – Hit Next in the General Options page if you don’t want to change anything, although you might want to change the name of your EMR cluster here.

13 – In the next page, Security Options, make sure you have an EC2 Key Pair in that region and select it. Otherwise, create one!

Note: Create one now (if you don’t have one) before creating the EMR as you CAN’T add it later!!!

14 – Still in the Security Options page, expand the EC2 security groups panel and change both, Master and Core & Task instances to use the VPC’s default security group (the same whitelisted in the RDS MySQL security group earlier).

15 – Hit Create cluster and wait for the EMR cluster to be created. It will take some time…

Confirming that the metastore was created in the RDS MySQL

16 – Once the EMR cluster is created, add another rule to the VPC’s default security group: one that allows SSH access to the EMR cluster on port 22 from your local IP.

17 – With the right rules in place, try to connect to your EMR cluster from your local machine:

chmod 600 article_key.pem
ssh -i article_key.pem hadoop@ec2-18-XX-XX-XX.us-east-2.compute.amazonaws.com

18 – EMR has a MySQL client installed. Use this client to connect to your MySQL database and run a few tests, such as checking that the Security Groups are working properly and that the “hive” database was created properly.

Note: You can do a telnet test from within EMR box as well to test Security Group access.

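19 – To connect to the RDS MySQL, run the following command from your EMR box (there must be no space between -p and the password, otherwise the client treats the next word as a database name):

mysql -h <rds-endpoint> -P 3306 -u <rds master user> -p<rds master password>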

Example: mysql -h database-1.cefjr3enh3dk.us-east-2.rds.amazonaws.com -P 3306 -u admin123 -pPwD12345

20 – Once connected, use the following commands to verify if the Hive metastore was indeed created in the RDS. You should be able to see a database named “hive” there:

show databases;       -- Lists all databases; "hive" should be there
use hive;             -- Connects you to the "hive" database
show tables;          -- Lists all the meta tables within the hive database
select * from TBLS;   -- Lists all tables created in Hive. At this point there are none

Setting up necessary Spectrum Roles and Network requirements for Redshift and EMR

Note 1: Following steps assume that you already have a Redshift cluster and that you can connect to it. It will not guide you on how to create and access the Redshift cluster.

Note 2: Since EMR and RDS MySQL share the VPC’s default security group, they can already communicate with each other. If your Redshift cluster uses that security group too, you can skip Step 22 and go straight to Step 23; if EMR and Redshift use different security groups, please do Step 22 first.

21 – Create a Role for Spectrum and attach it to your Redshift cluster. Follow the instructions here:

• To Create the Role: https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum-create-role.html

• To Associate the Role: https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum-add-role.html

22 – (Optional) Now that Redshift can access S3, Redshift also needs to access EMR cluster and vice-versa. Follow the steps listed under section “Enabling Your Amazon Redshift Cluster to Access Your Amazon EMR Cluster” in the following link: https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-schemas.html#c-spectrum-enabling-emr-access

Note: In summary, this creates an EC2 security group with Redshift’s Security Group and the EMR’s master node’s security groups inside it. Redshift’s Security Group must allow TCP in every port (0 – 65535) while EMR’s Security Group must allow TCP in port 9083 (Hive’s default). Next, you attach this newly created security group to both of your Redshift and EMR clusters.

23 – Once this is done, you should be able to create the External Schema in Redshift, query the external tables from Redshift, and create or see the schemas and tables from EMR Hive as well. However, at this point there are no tables created yet.

Creating Tables on Hive First

24 – Log in to the Hive console and run the following:

> show databases;
default (that’s the only database so far)

> create external table hive_table (col1 int, col2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
location 's3://<your_bucket>/<your_folder>/';

> show tables;
hive_table (that’s the table we just created)

25 – Log back in to your MySQL database and rerun the metastore queries from step 20 (use hive; then select * from TBLS;).

Note: Now you will be able to see the newly created table “hive_table” showing on your External MySQL catalog.

Creating Schemas and Tables on Redshift Now

26 – On Redshift side, an External Schema must be created first before creating or querying the Hive tables, like following:

CREATE EXTERNAL SCHEMA emr_play                     -- Can be any name; the schema is valid only within Redshift.
FROM HIVE METASTORE DATABASE 'default'              -- Use the default database to match the database we have in Hive.
URI '172.XXX.XXX.XXX' PORT 9083                     -- Private IP of the EMR master instance. Hive's default port is 9083.
IAM_ROLE 'arn:aws:iam::000000000000:role/spectrum'; -- A valid Spectrum role attached to Redshift.
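If you create external schemas for several clusters, templating the statement reduces copy-and-paste errors in the IP address and role ARN. A minimal sketch follows. The function name and the example IP are illustrative, and the inputs are interpolated directly into SQL, so it is only suitable for values you control.

```python
import ipaddress


def build_external_schema_sql(schema, hive_database, emr_master_ip,
                              iam_role_arn, port=9083):
    """Render CREATE EXTERNAL SCHEMA for a Hive metastore on EMR."""
    # Raises ValueError if the address is not a well-formed IP.
    ipaddress.ip_address(emr_master_ip)
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM HIVE METASTORE DATABASE '{hive_database}'\n"
        f"URI '{emr_master_ip}' PORT {port}\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


# Example values only; the IP is a made-up private address.
sql = build_external_schema_sql(
    "emr_play", "default", "172.31.0.10",
    "arn:aws:iam::000000000000:role/spectrum")
```

The rendered statement can then be executed from any Redshift SQL client.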

27 – Create the table(s):

create external table emr_play.redshift_table (col1 int, col2 varchar)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
location 's3://<your_bucket>/<your_folder>/';

28 – Simply query the table now:

select * from emr_play.redshift_table;

29 – One more time, log back in to your MySQL database and rerun the metastore queries from step 20 (use hive; then select * from TBLS;).

Note: You should now see both the Hive and Redshift tables in your external MySQL catalog. You can also query the tables and create new tables on both the Hive and Redshift sides.

Python Script to Create a Data Pipeline Loading Data From RDS Aurora MySQL To Redshift

April 27, 2019 ~ Shadab Mohammad

In this tutorial we will create a Python script which will build a data pipeline to load data from Aurora MySQL RDS to an S3 bucket and copy that data to a Redshift cluster.

One assumption is that you have a basic understanding of AWS, RDS, MySQL, S3, Python and Redshift. If you don’t, that is all right: I will briefly explain each of them for non-cloud DBAs.

AWS – Amazon Web Services. It is Amazon’s cloud infrastructure platform, which can be used to build and host anything from a static website to a globally scalable service like Netflix.

RDS – Relational Database Service, or RDS for short, is Amazon’s managed relational database service for engines such as its own Aurora, MySQL, Postgres, Oracle and SQL Server.

S3 – Simple Storage Service is AWS’s distributed object storage, which can scale almost infinitely. Data in S3 is stored in buckets. Think of buckets as directories, but DNS-name compliant and cloud hosted.

Python – A programming language that is now the de facto standard for data science and engineering.

Redshift – AWS’s petabyte-scale data warehouse, which is based on PostgreSQL and speaks its wire protocol but uses a columnar storage engine.

The source in this tutorial is an RDS Aurora MySQL database and the target is a Redshift cluster. The data is staged in an S3 bucket. With Aurora MySQL you can unload data directly to an S3 bucket, but in my script I will offload the table to a local filesystem and then copy it to the S3 bucket. This gives you flexibility in case you are not using Aurora but a standard MySQL or MariaDB.

Environment:

Python 3.7.2 with pip
EC2 instance with Python 3.7 installed along with all the Python packages
Source DB – RDS Aurora MySQL 5.6 compatible
Destination DB – Redshift Cluster
Database: dev, Table: employee in both databases, which will be used for the data transfer
S3 bucket for staging the data
AWS Python SDK Boto3

Make sure both the RDS Aurora MySQL and Redshift clusters have security groups which allow inbound connections from the IP of the EC2 instance (host and port).

1. Create the table 'employee' in both the Aurora and Redshift clusters

Aurora MySQL 5.6

CREATE TABLE `employee` (
  `id` int(11) NOT NULL,
  `first_name` varchar(45) DEFAULT NULL,
  `last_name` varchar(45) DEFAULT NULL,
  `phone_number` varchar(45) DEFAULT NULL,
  `address` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Redshift

DROP TABLE IF EXISTS employee CASCADE;

CREATE TABLE employee
(
   id            bigint         NOT NULL,
   first_name    varchar(45),
   last_name     varchar(45),
   phone_number  bigint,
   address       varchar(200)
);

ALTER TABLE employee
   ADD CONSTRAINT employee_pkey
   PRIMARY KEY (id);

COMMIT;

2. Install Python 3.7.2 and install all the packages needed by the script

sudo /usr/local/bin/python3.7 -m pip install boto3
sudo /usr/local/bin/python3.7 -m pip install psycopg2-binary
sudo /usr/local/bin/python3.7 -m pip install pymysql

3. Insert sample data into the source RDS Aurora DB

$ mysql -u awsuser -h shadmha-cls-aurora.ap-southeast-2.rds.amazonaws.com -p dev

INSERT INTO `employee` VALUES (1,'shadab','mohammad','04447910733','Randwick'),(2,'kris','joy','07761288888','Liverpool'),(3,'trish','harris','07766166166','Freshwater'),(4,'john','doe','08282828282','Newtown'),(5,'mary','jane','02535533737','St. Leonards'),(6,'sam','rockwell','06625255252','Manchester');

SELECT * FROM employee;
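
One detail worth checking before loading: phone_number is varchar(45) in the Aurora DDL above but bigint in the Redshift DDL, and every sample number starts with a zero. The sketch below is only an illustration in plain Python, assuming the warehouse parses the text as an ordinary integer; the helper name survives_integer_round_trip is hypothetical and not part of the pipeline script.

```python
def survives_integer_round_trip(value):
    """Return True if a numeric string looks the same after being stored as an integer."""
    return str(int(value)) == value


# Phone numbers taken from the sample INSERT statement above.
sample_phones = ["04447910733", "07761288888", "07766166166"]

for phone in sample_phones:
    # int() drops the leading zero, which is what an integer column would keep.
    print(phone, "->", int(phone), "unchanged:", survives_integer_round_trip(phone))
```

If the leading zero matters for your data, one option is to declare phone_number as varchar(45) on the Redshift side as well.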

4. Download and Configure AWS command line interface

The AWS Python SDK boto3 needs AWS credentials to connect to your AWS account, and it reads the ones that the AWS CLI writes when you run aws configure. We also need boto3 functions for uploading the file to S3. Install the AWS CLI on Linux and configure it.

$ aws configure
AWS Access Key ID [****************YGDA]:
AWS Secret Access Key [****************hgma]:
Default region name [ap-southeast-2]:
Default output format [json]:
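
To confirm that aws configure actually wrote something usable, a small check like the one below can help. It is a minimal sketch using only the standard library, and it assumes the default credentials file location (~/.aws/credentials) and the profile name "default"; the function name profile_has_keys is hypothetical.

```python
import configparser
import os


def profile_has_keys(credentials_path, profile="default"):
    """Return True if the profile defines both an access key and a secret key."""
    parser = configparser.ConfigParser()
    parser.read(credentials_path)  # a missing file is ignored, leaving no sections
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    has_access_key = bool(section.get("aws_access_key_id"))
    has_secret_key = bool(section.get("aws_secret_access_key"))
    return has_access_key and has_secret_key


if __name__ == "__main__":
    # Assumed default location of the file written by `aws configure`.
    path = os.path.expanduser("~/.aws/credentials")
    status = "ok" if profile_has_keys(path) else "missing keys"
    print(path, "->", status)
```

This only checks that the keys are present; it does not verify that they are valid for your account.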

5. Python Script to execute the Data Pipeline (datapipeline.py)

import boto3
import psycopg2
import pymysql
import csv
import sys
import os
import datetime
datetime_object = datetime.datetime.now()
print ("###### Data Pipeline from Aurora MySQL to S3 to Redshift ######")
print ("")
print ("Start TimeStamp")
print ("---------------")
print(datetime_object)
print ("")


# Connect to MySQL Aurora and Download Table as CSV File
db_opts = {
    'user': 'awsuser',
    'password': '******',
    'host': 'shadmha-cls-aurora.ap-southeast-2.rds.amazonaws.com',
    'database': 'dev'
}

db = pymysql.connect(**db_opts)
cur = db.cursor()

sql = 'SELECT * from employee'
csv_file_path = '/home/centos/my_csv_file.csv'

try:
    cur.execute(sql)
    rows = cur.fetchall()
finally:
    db.close()

# Continue only if there are rows returned.
if rows:
    # New empty list called 'result'. This will be written to a file.
    result = list()

    # The row name is the first entry for each entity in the description tuple.
    column_names = list()
    for i in cur.description:
        column_names.append(i[0])

    result.append(column_names)
    for row in rows:
        result.append(row)

    # Write result to file.
    with open(csv_file_path, 'w', newline='') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for row in result:
            csvwriter.writerow(row)
else:
    sys.exit("No rows found for query: {}".format(sql))


# Upload Generated CSV File to S3 Bucket
s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket-shadmha')
# upload_file opens and closes the local file for us
bucket.upload_file(csv_file_path, 'my_csv_file.csv')


#Obtaining the connection to RedShift
con=psycopg2.connect(dbname= 'dev', host='redshift-cluster-1.ap-southeast-2.redshift.amazonaws.com',
port= '5439', user= 'awsuser', password= '*********')

#Copy Command as Variable
copy_command="copy employee from 's3://mybucket-shadmha/my_csv_file.csv' credentials 'aws_iam_role=arn:aws:iam::775888:role/REDSHIFT' delimiter '|' region 'ap-southeast-2' ignoreheader 1 removequotes ;"

#Opening a cursor and run copy query
cur = con.cursor()
cur.execute("truncate table employee;")
cur.execute(copy_command)
con.commit()

#Close the cursor and the connection
cur.close()
con.close()

# Remove the S3 bucket file and also the local file
s3.Object('mybucket-shadmha', 'my_csv_file.csv').delete()
os.remove(csv_file_path)


datetime_object_2 = datetime.datetime.now()
print ("End TimeStamp")
print ("-------------")
print (datetime_object_2)
print ("")
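
To see what the staged file looks like, and why the COPY command uses delimiter '|', ignoreheader 1 and removequotes, here is a small standalone sketch. It uses the same csv.writer settings as the script but writes to an in-memory buffer. The function name to_staging_text and the address containing a pipe are made up for illustration, and lineterminator is set to "\n" only to keep the printed output readable (the script itself uses the csv default).

```python
import csv
import io


def to_staging_text(column_names, rows):
    """Render a header plus rows as pipe-delimited text, quoting only when needed."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        delimiter="|",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",  # assumption for display; not what the script sets
    )
    writer.writerow(column_names)  # header line, skipped by "ignoreheader 1"
    writer.writerows(rows)
    return buffer.getvalue()


sample = to_staging_text(
    ["id", "first_name", "address"],
    [(1, "shadab", "Randwick"), (2, "kris", "Unit 4|Liverpool")],
)
print(sample)
```

The second row shows QUOTE_MINIMAL at work: only the field that contains the delimiter is wrapped in double quotes, and those quotes are what removequotes is meant to strip on the Redshift side.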

6. Run the Script or Schedule in Crontab as a Job

$ python3.7 datapipeline.py

Crontab entry to execute the job daily at 10:30 am

30 10 * * * /usr/local/bin/python3.7 /home/centos/datapipeline.py >> /tmp/datapipeline.log 2>&1

7. Check the table in the destination Redshift cluster; all the records should be visible there

SELECT * FROM employee;

This tutorial used a small table with very little data. But with S3's distributed nature and massive scale, and Redshift as a data warehouse, you can build data pipelines for very large datasets. Redshift is an OLAP database and Aurora is OLTP, so many real-life scenarios require offloading data from your OLTP apps to data warehouses or data marts to perform analytics on it.

AWS also has a managed service called AWS Data Pipeline, which can automate the movement and transformation of data. But many times, for developing customized solutions, Python is the best tool for the job.

Enjoy this script, and please let me know in the comments or on Twitter (@easyoradba) if you have any issues or what else you would like me to post about data engineering.

Ebook : Advanced Architecture of Oracle Database on AWS

April 21, 2019 ~ Shadab Mohammad ~ Leave a comment

https://d1.awsstatic.com/whitepapers/aws-advanced-architectures-for-oracle-db-on-ec2.pdf

Move Oracle Database 12c from On-Premise to AWS RDS Oracle Instance using SQL Developer

March 17, 2019 ~ Shadab Mohammad ~ Leave a comment

Amazon Web Services has been gaining popularity over the last few years as cloud computing has been in the spotlight. Slowly, traditional enterprises are making the journey to the cloud. Oracle is considered one of the most mission-critical applications in the enterprise. Moving an Oracle database to the cloud can bring benefits from both an operational and a financial perspective.

In this exercise we will move an on-premise Oracle DB schema to an AWS RDS instance running Oracle 12cR1.

Pre-requisites :

1. You already have a source Oracle database installed

2. You know how to provision an AWS RDS Oracle Instance

3. You have access to both instances

4. You have basic understanding of AWS S3 and AWS console

5. You have the latest version of SQL Developer installed on your machine

Source DB:

Oracle 12cR1 (12.1.0.2) running on CentOS 7.1

Destination DB:

Oracle 12cR1 running on AWS RDS Instance

High Level Steps to Migrate:

1. Create the destination Oracle 12cR1 instance on AWS. Provisioning an Oracle DB on AWS RDS is one of the easiest things to do

2. Connect to both the source (on-prem) and destination (AWS) databases from SQL Developer

3. Go to Tools > Database Copy and select the source and destination databases

I prefer to do a Tablespace Copy since most of the apps I work with reside in a single tablespace. But this depends on your choice. You can choose to copy Objects, Schemas or even entire Tablespaces across.

IMPORTANT : Make sure you have created the source schema in the destination database before proceeding to the next step, else you will get the error “User does not exist”

In the destination AWS RDS instance, run the commands below

SQL> create user <source-schema-name> identified by <password123>;

SQL> grant dba to <source-schema-name>;

4. Start the Database Copy

5. Use the Performance Insights console to see what is happening in the background

6. Query the destination database to see whether the objects have arrived and are valid

SQL> select * from user_tables;

SQL> select * from dba_objects where status='INVALID';


Common `docker run` parameters