Designing a Multi-Region Postgres Topology: Read Replicas, Logical Replication, and Safe Failover

Learn how to build a resilient multi-region PostgreSQL topology. Deep dive into logical replication, replication lag, geo-routing, and disaster recovery strategies.

Sachin Sharma

Jun 1, 2026

As SaaS applications expand globally, hosting your database in a single cloud region becomes a liability. If your application servers in Frankfurt or Singapore have to reach back to Virginia for every SQL query, each query pays a cross-continent network round trip, and a page that issues several queries in sequence pays it several times over. Furthermore, a single-region outage can take the whole service down.

To deliver sub-50ms reads worldwide and survive the loss of a region, you need a Multi-Region Database Topology.

While global caching works for static assets, transactional data requires complex replication topologies. In this guide, we'll design a production-grade Multi-Region PostgreSQL Topology using physical streaming replication, logical replication, geo-aware routing, and automated failover orchestrators.

⚡ 1. Replication Paradigms: Physical vs Logical

PostgreSQL offers two main mechanisms to synchronize data between database nodes across geographic regions:

A. Physical (Streaming) Replication

Physical replication streams the primary's Write-Ahead Log (WAL) to read replicas, which replay it to maintain a byte-for-byte copy of the entire database cluster.

Pros: Reliable, low overhead, simple to operate once configured, and every replica is an exact copy of the primary (delayed only by replication lag, since streaming is asynchronous by default).
Cons: Replicas must be read-only. It is an all-or-nothing approach: you cannot replicate a single table, and the primary and its replicas must run the same PostgreSQL major version.

B. Logical Replication

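Logical replication decodes the Write-Ahead Log into row-level change events (e.g., "Insert this row into Table X") and streams these events over the network to subscribers.

Pros: Subscribers stay writable and can run a different PostgreSQL major version, which makes active-active write patterns and schema differences between nodes possible. It also supports regional data partitioning by publishing only selected tables or, with row filters in PostgreSQL 15+, selected rows (replicating only user accounts in Europe to the Europe node).
Cons: Higher configuration complexity, no built-in conflict resolution when the same rows are written on both sides, no replication of schema changes (DDL) or sequence values, and slight CPU overhead for decoding.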

For most global enterprise systems, the optimal layout is a hybrid model: a central primary region with physical replicas in satellite regions for low-latency reads, combined with logical replication channels for isolated microservices.

[Primary - US East] ──(Physical Streaming Replication)──> [Read Replica - EU West]
        │
(Logical Replication)
        ▼
[Billing Node - AP South]
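
One way to keep this layout explicit in application code is a small topology map. The sketch below is illustrative only: the region keys, role names, environment variable names, and the `readTargetFor` helper are all assumptions made for this example, not part of any Postgres or driver API.

```javascript
// Hypothetical topology description mirroring the diagram above.
// Region keys, role names, and env var names are illustrative assumptions.
const topology = {
  'us-east': { role: 'primary', urlEnv: 'PRIMARY_DATABASE_URL' },
  'eu-west': { role: 'physical-replica', urlEnv: 'EU_WEST_REPLICA_URL' },
  'ap-south': { role: 'logical-subscriber', urlEnv: 'AP_SOUTH_BILLING_URL' },
};

// Return the region that should serve general-purpose reads for an app server.
// Only the primary and physical replicas hold a full copy of the database, so a
// region with a logical subscriber (or no node at all) falls back to the primary.
function readTargetFor(appRegion, topo = topology) {
  const node = topo[appRegion];
  if (node && (node.role === 'physical-replica' || node.role === 'primary')) {
    return appRegion;
  }
  const primaryRegion = Object.keys(topo).find((r) => topo[r].role === 'primary');
  if (!primaryRegion) throw new Error('topology has no primary');
  return primaryRegion;
}
```

With this map, an app server in `eu-west` reads locally, while one in `ap-south` is sent to `us-east`, because the billing node there only holds the tables it subscribes to.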

🏗️ 2. Architectural Design: Geo-Aware Routing

To minimize database round-trip times, we split database queries inside our application middleware into two categories:

1. Reads (GET API Requests): Directed straight to the nearest local physical read replica.
2. Writes (POST/PUT/DELETE API Requests): Proxy-routed over a dedicated global network back to the central Primary region.

Here is a Node.js query helper that routes each query to either the primary pool or the nearest read-replica pool:


```javascript
import pg from 'pg';

// Configure pools for both the primary (writable) and the closest read replica
const primaryPool = new pg.Pool({
  connectionString: process.env.PRIMARY_DATABASE_URL // Virginia
});

const localReadPool = new pg.Pool({
  connectionString: process.env.LOCAL_READ_REPLICA_URL // Frankfurt/Ireland replica
});

// Run a query against the primary when it writes, otherwise against the local replica
export async function databaseQuery(sql, params, isWriteOperation = false) {
  const pool = isWriteOperation ? primaryPool : localReadPool;

  const startTime = performance.now();
  try {
    const result = await pool.query(sql, params);
    console.log(`📊 Query executed in ${(performance.now() - startTime).toFixed(2)}ms`);
    return result.rows;
  } catch (err) {
    console.error("❌ Database query error:", err);
    throw err;
  }
}
```
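
The helper above leaves it to the caller to pass `isWriteOperation`. A minimal sketch of deriving that flag from the HTTP method, following the GET versus POST/PUT/DELETE split described earlier, could look like this. The names `isWriteMethod` and `classifyRequest` are hypothetical, and the sketch assumes your GET handlers never write.

```javascript
// HTTP methods defined as safe (read-only) by HTTP semantics.
const SAFE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

// Treat every method that is not explicitly safe as a write.
function isWriteMethod(method) {
  return !SAFE_METHODS.has(String(method).toUpperCase());
}

// Hypothetical Express-style middleware that records the decision on the request
// so route handlers can pass req.isWriteOperation on to the query helper.
function classifyRequest(req, res, next) {
  req.isWriteOperation = isWriteMethod(req.method);
  next();
}
```

Defaulting unknown methods to "write" is the conservative choice: a misclassified read only costs latency, whereas a write sent to a read-only replica fails outright.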

💻 3. Setting Up Logical Replication in Postgres

Let's configure Postgres Logical Replication to share a product inventory table between our primary US node and a billing server in Frankfurt.

On the Primary Node (Publisher)

First, adjust your Postgres configuration (postgresql.conf) to set the replication level:


```ini
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
```

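Restart the server (a change to wal_level only takes effect after a restart), then create a publication for the chosen table. The role the subscriber connects as (replicator in this guide) needs the REPLICATION attribute and SELECT on the published table, and pg_hba.conf must allow its connection: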


```sql
-- Create publication for inventory changes
CREATE PUBLICATION inventory_pub FOR TABLE products;
```

On the Replica Node (Subscriber)

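Logical replication does not copy schema, so first create the products table on the subscriber with matching column definitions. The published table also needs a primary key (or another replica identity) for UPDATE and DELETE to be replicated. Then create the subscription using the publisher's connection details: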


```sql
-- Create subscription on the replica
CREATE SUBSCRIPTION inventory_sub
CONNECTION 'host=us-primary.sachinsharma.dev dbname=production user=replicator password=secure_password'
PUBLICATION inventory_pub;
```

PostgreSQL will immediately start an initial copy of the existing rows, then keep the Frankfurt subscriber updated in near real time as inventory changes occur in the US.
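
To confirm that the initial copy has finished, you can inspect the `pg_subscription_rel` catalog on the subscriber, which records a per-table sync state. The following is a small sketch rather than a complete health check: the `summarizeSync` helper and the shape of its return value are assumptions made for this example, and the rows are expected to come from running `SYNC_STATE_SQL` through your own pool.

```javascript
// Run on the subscriber. pg_subscription_rel tracks the sync state of each table.
const SYNC_STATE_SQL = `
  SELECT srrelid::regclass::text AS table_name, srsubstate
  FROM pg_subscription_rel`;

// State codes as documented for pg_subscription_rel.srsubstate.
const STATE_LABELS = {
  i: 'initializing',
  d: 'copying data',
  f: 'copy finished',
  s: 'synchronized',
  r: 'ready',
};

// Summarise rows shaped like the result of SYNC_STATE_SQL.
// "ready" is true only if at least one table exists and all are in state 'r'.
function summarizeSync(rows) {
  const pending = rows
    .filter((row) => row.srsubstate !== 'r')
    .map((row) => `${row.table_name}: ${STATE_LABELS[row.srsubstate] ?? 'unknown'}`);
  return { ready: rows.length > 0 && pending.length === 0, pending };
}
```

For example, `summarizeSync((await pool.query(SYNC_STATE_SQL)).rows)` reports `products: copying data` while the initial snapshot is still being transferred.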

🛡️ 4. Handling Replication Lag & Read-After-Write Consistency

A classic issue in multi-region setups is read-after-write lag. If a user updates their profile (write goes to US) and is immediately redirected to a dashboard (read goes to the local EU replica), the replica might not have received the update yet due to network transit lag. The user sees their old profile, leading to support complaints.
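
Before choosing a mitigation, it helps to measure how far behind a replica actually is. One common approach on a physical replica is to compare the current time with `pg_last_xact_replay_timestamp()`. The sketch below assumes that query is run through your own pool; the `isReplicaFresh` helper and its 2-second default tolerance are assumptions for illustration.

```javascript
// Run on a physical replica. The result is NULL if nothing has been replayed yet.
const REPLAY_DELAY_SQL = `
  SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))
         AS replay_delay_seconds`;

// Decide whether a replica is fresh enough to serve reads.
// maxDelaySeconds is an assumed tolerance; tune it for your workload.
function isReplicaFresh(row, maxDelaySeconds = 2) {
  const delay = row.replay_delay_seconds;
  if (delay === null || delay === undefined) return false; // unknown: treat as stale
  // The driver may return the value as a string or a number depending on its type.
  return Number(delay) <= maxDelaySeconds;
}
```

Note that this value measures time since the last replayed transaction, so it also grows when the primary is simply idle. Treat it as an upper bound on staleness rather than an exact lag figure.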

The Solution: Version Tracking & Read Promotion

To solve this, we store a lightweight monotonic version token or a last-updated timestamp in the client's session cookies.

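If the cookie indicates the user performed a write within the last 2 seconds, our database middleware bypasses the local read replica and routes reads directly to the primary node. The 2-second window is a starting point; it should comfortably exceed the replication lag you actually observe.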


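```javascript
const READ_AFTER_WRITE_WINDOW_MS = 2000;

// Express middleware. Assumes cookie-parser (or similar) has populated req.cookies.
export function executeMiddleware(req, res, next) {
  const lastWriteTime = Number(req.cookies?.['last_write_timestamp']);

  // Force read queries to go to the primary region while the write is still
  // recent, to prevent stale reads from a lagging replica.
  req.forcePrimaryReads =
    Number.isFinite(lastWriteTime) &&
    Date.now() - lastWriteTime < READ_AFTER_WRITE_WINDOW_MS;
  next();
}
```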
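
The middleware above only reads the cookie and sets a flag. Two pieces are still needed: something has to set `last_write_timestamp` after a write, and the query layer has to honor `forcePrimaryReads`. A minimal sketch of both follows. The helper names `markWrite` and `choosePool` and the cookie options are assumptions, while `res.cookie` is the standard Express response method.

```javascript
const READ_YOUR_WRITES_WINDOW_MS = 2000; // same 2-second window as the middleware

// Call after a successful write so that later requests can detect it.
// The cookie expires on its own once the window has passed.
function markWrite(res, now = Date.now()) {
  res.cookie('last_write_timestamp', String(now), {
    maxAge: READ_YOUR_WRITES_WINDOW_MS,
    httpOnly: true,
    sameSite: 'lax',
  });
}

// Pick the pool for a query. Writes always go to the primary; reads go there
// too while the forcePrimaryReads flag is set on the request.
function choosePool(pools, { isWriteOperation = false, forcePrimaryReads = false } = {}) {
  return isWriteOperation || forcePrimaryReads ? pools.primary : pools.localRead;
}
```

A route handler would call `markWrite(res)` after its write commits, and the query helper would call `choosePool({ primary: primaryPool, localRead: localReadPool }, { isWriteOperation, forcePrimaryReads: req.forcePrimaryReads })` in place of the simple ternary. Because the timestamp lives in a client-controlled cookie, it should only ever influence where that client's own reads are served from.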

🏁 5. Disaster Recovery and Safe Failover

In a multi-region setup, hardware failures are inevitable. If your primary US node goes dark, you must trigger a safe, rapid failover:

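1. Isolate the Primary (Fencing): Completely shut down the failing primary node, or cut it off from all clients, to prevent "Split-Brain" scenarios (where two nodes both accept writes and their data diverges).
2. Select the Best Replica: Find the replica with the least replication lag, i.e., the one that has received the most WAL from the primary.
3. Promote the Replica: Execute the Postgres promotion command:
```bash
pg_ctl promote -D /var/lib/postgresql/data
```
4. Re-route Traffic: Update your routing layer (e.g., PgBouncer or Cloudflare Tunnel) to send write queries to the newly promoted primary node.

Because streaming replication is asynchronous by default, transactions that the old primary committed but had not yet shipped are lost on promotion. The old primary must never rejoin as a writer; resynchronize it (for example with pg_rewind) or rebuild it as a replica of the new primary.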
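
Selecting the best replica can be sketched by comparing WAL positions (LSNs) reported by each candidate. This example assumes you have already collected each replica's `pg_last_wal_receive_lsn()` value; the `parseLsn` and `pickFailoverCandidate` helpers and the candidate object shape are hypothetical. In production, this decision is usually delegated to a failover orchestrator rather than hand-written code.

```javascript
// Convert a Postgres LSN such as '16/B374D848' into a BigInt byte position.
// An LSN is printed as two hexadecimal numbers: the high and low 32 bits.
function parseLsn(lsn) {
  const match = /^([0-9A-Fa-f]{1,8})\/([0-9A-Fa-f]{1,8})$/.exec(lsn);
  if (!match) throw new Error(`invalid LSN: ${lsn}`);
  return (BigInt(`0x${match[1]}`) << 32n) | BigInt(`0x${match[2]}`);
}

// Pick the replica that has received the most WAL, i.e. the least data loss.
// Each candidate is { name, receiveLsn }, where receiveLsn is assumed to come
// from running SELECT pg_last_wal_receive_lsn() on that replica.
function pickFailoverCandidate(candidates) {
  if (candidates.length === 0) throw new Error('no replicas available');
  return candidates.reduce((best, candidate) =>
    parseLsn(candidate.receiveLsn) > parseLsn(best.receiveLsn) ? candidate : best
  );
}
```

LSNs must be compared numerically rather than as strings, since `'9/0'` sorts after `'10/0'` lexically even though it is the earlier position.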

By incorporating geo-aware routing, logical replication channels, and session-based consistency tracking, you build a robust multi-region database topology ready to support global scale with minimal latencies.
