Read Replicas for Scaling
Introduction
You’re the on-call engineer at midnight. Your PostgreSQL database is screaming—CPU pegged at 95%. Your monitoring dashboard shows something interesting: 90% of all queries are SELECT statements. Only 10% are writes (INSERT, UPDATE, DELETE). Adding more CPU to your primary database might buy you time, but it won’t scale. The math is brutal: if your read load doubles next quarter, you’ll have to double the size of your database machine again.
But here’s what you realize: you don’t need a faster database. You need more databases. Specifically, you need read replicas.
Read replicas are dedicated copies of your database that serve only read queries. Your primary database handles all writes and sends those changes downstream to the replicas. Clients send SELECT queries to the replicas instead of the primary. Suddenly, your read capacity multiplies. This is one of the most powerful scaling patterns in distributed systems, especially for read-heavy workloads.
In this chapter, we’ll explore how read replicas work, when to use them, how to set them up, and the real trade-offs you’ll face. We’re building on everything we learned about replication fundamentals—now we’re applying it strategically to solve a specific problem: read scaling.
What Are Read Replicas?
Read replicas are secondary database instances that contain an exact copy of your primary database. They’re continuously updated via binary log replication (or similar mechanisms). The critical difference from a general replica is purpose: read replicas exist specifically to serve SELECT queries, never writes.
Here’s the architecture in a sentence: your application sends all writes to the primary database, and reads to the replicas. The primary asynchronously replicates its changes to the replicas. This is called the read/write split pattern.
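To make the split concrete, here is a toy router in Python—purely illustrative, with hypothetical hostnames—that sends SELECTs to a replica and everything else to the primary. Production routers must also handle transactions, SELECT ... FOR UPDATE, and read-after-write pinning, all covered later in this chapter.
import random

PRIMARY = "primary.db.internal"  # hypothetical hostnames
REPLICAS = ["replica-1.db.internal", "replica-2.db.internal"]

def route(sql: str) -> str:
    """Return the host that should execute this statement."""
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(REPLICAS)  # reads can fan out across replicas
    return PRIMARY                      # all writes go to the single primary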
How Read Replicas Differ from Primary Databases
| Aspect | Primary Database | Read Replica |
|---|---|---|
| Accepts writes | Yes | No |
| Accepts reads | Yes | Yes |
| Updates other replicas | Yes | No |
| Contains full dataset | Yes | Yes |
| Replication lag | None (it is the source) | Milliseconds to seconds |
| When to scale it | Write bottleneck | Read bottleneck |
Types of Read Replicas
Same-Region Replicas: Located in the same availability zone or region as the primary. Used for immediate read scaling and failover. Lowest latency; modest cost (a slight premium for cross-AZ traffic).
Cross-Region Replicas: Deployed in different geographic regions. Serve local reads for users in those regions, reducing latency. Essential for global applications. Replication lag may be higher due to network distance.
Specialized Replicas: Replicas with different configurations than the primary.
- Analytics Replicas: May have different indexes optimized for OLAP queries rather than OLTP
- Reporting Replicas: Lower isolation levels (e.g., read uncommitted) trade consistency for speed
- Backup Replicas: Kept offline, only brought online for backup operations
The Restaurant Analogy
Imagine a popular restaurant with one chef and six waiters. The chef (primary database) is the single writer—they prepare all dishes. The waiters (read replicas) are the readers—they serve customers (application clients) with the prepared dishes.
Before read replicas, you had one chef doing both cooking and serving. Customers wait forever. Adding another chef is expensive and complex—now two chefs need to coordinate on which dish to make.
With read replicas, the single chef focuses on cooking. Waiters keep a copy of the current menu (database) and serve customers in parallel. If a new dish is added (write), the chef tells all waiters immediately. But if there’s a tiny delay in the restaurant’s phone system, one waiter might not know about the new dish for a few seconds—that’s replication lag.
The more waiters you add, the more customers you serve. But eventually, the chef becomes the bottleneck—they can’t keep up with all the dish requests. That’s when read replicas stop helping, and you need different strategies (partitioning, caching, write optimization).
Setting Up Read Replicas in Managed Services
Most cloud databases make read replicas trivially easy. Let’s look at the real-world process.
AWS RDS Read Replicas
With AWS RDS, creating a read replica is one API call:
aws rds create-db-instance-read-replica \
--db-instance-identifier mydb-replica-1 \
--source-db-instance-identifier mydb-primary \
--db-instance-class db.t3.large \
--storage-type gp3 \
--multi-az \
--region us-east-1
Key parameters:
- source-db-instance-identifier: Which database to replicate from
- db-instance-class: Instance type for the replica (can differ from primary)
- multi-az: Give the replica its own standby in another availability zone for high availability
- region: Same-region (cheaper) or cross-region (lower latency for distant users; cross-region replicas reference the source by its ARN)
AWS handles replication automatically using the engine’s native mechanism: for MySQL, binary log replication; for PostgreSQL, streaming replication via the WAL (Write-Ahead Log).
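A new replica isn’t usable until its status flips to available. A minimal sketch of waiting for that with boto3, assuming the identifiers from the CLI example above:
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Block until the new replica finishes provisioning
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="mydb-replica-1")

# Fetch its status and the endpoint your application should read from
resp = rds.describe_db_instances(DBInstanceIdentifier="mydb-replica-1")
instance = resp["DBInstances"][0]
print(instance["DBInstanceStatus"], instance["Endpoint"]["Address"])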
Promoting a Read Replica
When your primary fails or you need to shard your data, you can promote a replica to a standalone primary:
aws rds promote-read-replica \
--db-instance-identifier mydb-replica-1
Promotion takes a few minutes (the instance reboots as part of the process). Once promoted, the replica accepts writes; replication stops, and it becomes an independent instance.
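The same promotion from Python—a sketch using boto3 with the identifier above:
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Detach the replica from its source and make it writable
rds.promote_read_replica(DBInstanceIdentifier="mydb-replica-1")

# The instance reboots during promotion; wait for it to come back
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="mydb-replica-1")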
Application-Level Read/Write Routing
The database doesn’t know which queries should go where. Your application must decide: is this a write? Send to primary. Is this a read? Send to a replica.
Pattern 1: Middleware-Based Routing
Tools like ProxySQL or PgPool-II sit between your application and databases. They intercept SQL and route reads to replicas.
ProxySQL Configuration Example:
# Define the MySQL servers (proxysql.cnf syntax)
mysql_servers:
(
    {
        hostgroup=0,  # Primary group
        address="primary.rds.amazonaws.com",
        port=3306,
        weight=1000,
        status="ONLINE"
    },
    {
        hostgroup=1,  # Read replica group
        address="replica-1.rds.amazonaws.com",
        port=3306,
        weight=1000,
        status="ONLINE"
    },
    {
        hostgroup=1,
        address="replica-2.rds.amazonaws.com",
        port=3306,
        weight=1000,
        status="ONLINE"
    }
)
# Route SELECTs to the replicas. SELECT ... FOR UPDATE and anything
# unmatched falls through to the connecting user's default hostgroup (the primary).
mysql_query_rules:
(
    {
        rule_id=1,
        active=1,
        match_pattern="^SELECT.*FOR UPDATE",
        destination_hostgroup=0,
        apply=1
    },
    {
        rule_id=2,
        active=1,
        match_pattern="^SELECT",
        destination_hostgroup=1,
        apply=1
    }
)
Advantages: Transparent to application code, works with any language, automatic failover. Disadvantages: Additional network hop, connection pooling complexity, doesn’t understand transaction context.
Pattern 2: Application-Level Routing
Your code explicitly routes queries. Modern ORMs make this straightforward.
Django (Python) Example:
# myapp/routers.py
class ReadReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'  # Route reads to the replica database

    def db_for_write(self, model, **hints):
        return 'default'  # Route writes to the primary

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'  # Run migrations only on the primary

# settings.py
import os

DATABASES = {
    'default': {  # Primary
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'HOST': 'primary.rds.amazonaws.com',
        'USER': 'admin',
        'PASSWORD': os.environ['DB_PASSWORD'],
    },
    'replica': {  # Read replica
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'HOST': 'replica-1.rds.amazonaws.com',
        'USER': 'readonly',
        'PASSWORD': os.environ['REPLICA_PASSWORD'],
    }
}

DATABASE_ROUTERS = ['myapp.routers.ReadReplicaRouter']

# Usage in views
from django.contrib.auth.models import User

# Reads go to the replica (the router applies automatically;
# .using('replica') makes the choice explicit)
users = User.objects.using('replica').filter(is_active=True)

# Writes go to the primary
user = User.objects.create(username='alice', email='[email protected]')
Node.js Example with Sequelize:
const { Sequelize } = require('sequelize');
const sequelize = new Sequelize('mydb', 'admin', process.env.DB_PASSWORD, {
replication: {
read: [
{ host: 'replica-1.rds.amazonaws.com', username: 'readonly', password: process.env.REPLICA_PASSWORD },
{ host: 'replica-2.rds.amazonaws.com', username: 'readonly', password: process.env.REPLICA_PASSWORD },
],
write: { host: 'primary.rds.amazonaws.com', username: 'admin', password: process.env.DB_PASSWORD }
},
dialect: 'postgres',
});
// Reads are automatically load-balanced across the read pool
const user = await User.findOne({ where: { id: 1 } });
// Writes go to primary
const newUser = await User.create({ username: 'bob', email: '[email protected]' });
Connection Routing Strategies
Once you have multiple replicas, how do you choose which one?
Round-Robin: Cycle through replicas in order. Simple, fair distribution (this and least-connections are sketched after this list).
Least-Connections: Route to the replica with fewest active connections. Better for variable query costs.
Latency-Based: Route to the replica with lowest response time. Best for geographic distribution.
Consistent Hashing: Route by user_id or session. Ensures the same client talks to the same replica (better for caching, warm data).
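A minimal Python sketch of the first two strategies, with hypothetical replica names—enough to show the bookkeeping each one needs:
import itertools
from collections import defaultdict

REPLICAS = ["replica-1", "replica-2", "replica-3"]  # hypothetical names

# Round-robin: cycle through replicas in a fixed order
_rr = itertools.cycle(REPLICAS)
def pick_round_robin() -> str:
    return next(_rr)

# Least-connections: track open connections per replica, pick the idlest
_active = defaultdict(int)
def pick_least_connections() -> str:
    replica = min(REPLICAS, key=lambda r: _active[r])
    _active[replica] += 1  # caller must decrement on connection release
    return replica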
Specialized Replica Configurations
Not all replicas need to be identical to the primary.
Analytics Replicas
Your primary is optimized for OLTP: fast inserts and point lookups. An analytics replica can have different indexes.
-- On primary: Simple indexes for transaction processing
CREATE INDEX idx_users_id ON users(id);
CREATE INDEX idx_orders_created ON orders(created_at);
-- On analytics replica: composite indexes tuned for scans and aggregations
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
CREATE INDEX idx_orders_status_amount ON orders(status, amount);
-- PostgreSQL 11+ supports covering indexes via INCLUDE
CREATE INDEX idx_orders_covering ON orders(status) INCLUDE (user_id, amount);
Whether a replica can carry different indexes depends on the replication mechanism. MySQL binlog replicas can maintain extra indexes because they re-execute logical changes. PostgreSQL physical streaming replicas are byte-for-byte copies of the primary, so divergent indexes require logical replication instead.
Lower Isolation for Reads
Sometimes you can trade consistency for speed. On engines where READ UNCOMMITTED permits dirty reads (e.g., MySQL), a read-only replica might use a lower isolation level:
-- On the replica (MySQL)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- Now reads don't block on in-flight transactions
-- You might see "dirty" data (uncommitted writes), but for analytics that's often okay
Note that PostgreSQL treats READ UNCOMMITTED as READ COMMITTED, and a streaming standby only ever serves committed data, so this lever mainly applies to MySQL-family replicas.
Calculating Read Replica Capacity
How many replicas do you need? Use this formula:
Number of Replicas = ceil(Total Read QPS / Single Replica Capacity)
* Safety Factor (typically 1.5 to 2.0)
Example:
- Your application generates 10,000 read queries per second
- Benchmarks show each replica can handle 2,500 QPS with acceptable latency
- Required replicas: ceil(10,000 / 2,500) = 4 replicas
- With 1.5x safety factor (traffic spikes, failover): 4 * 1.5 = 6 replicas
Add extra capacity for:
- Failover (one replica goes down, redistribute its load)
- Traffic spikes (30-50% above baseline)
- Planned maintenance (rolling updates)
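These numbers are easy to script; a minimal sketch of the formula above in Python:
import math

def replicas_needed(read_qps: int, per_replica_qps: int, safety: float = 1.5) -> int:
    """Base replica count from the formula above, padded by a safety factor."""
    base = math.ceil(read_qps / per_replica_qps)
    return math.ceil(base * safety)

print(replicas_needed(10_000, 2_500))        # ceil(10,000 / 2,500) * 1.5 -> 6
print(replicas_needed(50_000, 12_500, 2.0))  # ceil(50,000 / 12,500) * 2.0 -> 8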
Replication Lag and Consistency
The biggest challenge with read replicas is eventual consistency. Changes don’t appear instantly on replicas.
Sources of Lag
graph LR
    A["Write to Primary"] --> B["Primary writes to disk"]
    B -->|Network latency| C["Send binary log to replica"]
    C --> D["Replica applies changes"]
    D -->|Disk I/O| E["Replica writes to disk"]
    style C fill:#ff9999
    style D fill:#ff9999
Typical lag ranges: 10-100ms in same region, 100ms-5 seconds cross-region.
Handling Stale Reads
Problem: A user updates their profile and immediately loads the dashboard, but the read replica hasn’t caught up yet.
Solutions:
- Read-After-Write Consistency: Route writes and subsequent reads to the primary for a user session.
// After a write, pin the user to primary for 5 seconds
await db.write('users', { id: user.id, name: 'Alice' });
session.primaryUntil = Date.now() + 5000;
// Subsequent reads
if (Date.now() < session.primaryUntil) {
user = await db.read('users', { id: user.id }, { useReplica: false });
} else {
user = await db.read('users', { id: user.id });
}
- Version Tokens: The client tracks a version from the write (e.g., a log sequence number) and asks for “at least this version”.
// Write returns a version
const result = await db.write('inventory', { product_id: 123, qty: 50 });
const version = result.version;
// Read with consistency requirement
const inventory = await db.read('inventory', { product_id: 123 }, {
minVersion: version
});
- Strong Consistency for Critical Reads: Some queries always hit primary.
// Cart operations are critical—don't show stale data
const cart = await db.read('carts', { user_id }, { useReplica: false });
// Product recommendations can use stale replicas
const products = await db.read('products', { category: 'electronics' });
Monitoring Replication Lag
Always monitor this:
-- MySQL (8.0.22+ prefers SHOW REPLICA STATUS)
SHOW SLAVE STATUS\G
-- Look at: Seconds_Behind_Master
-- PostgreSQL: run on the replica; NULL until a transaction has been replayed
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
-- On the primary, also check replication slot lag
SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;
Set up alerts: if lag exceeds 1 second, page the on-call engineer.
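A minimal lag probe you could run from a cron job or sidecar—a sketch assuming psycopg2, the replica host from earlier examples, and a hypothetical page_oncall() alert hook:
import psycopg2

LAG_THRESHOLD_SECONDS = 1.0

conn = psycopg2.connect(host="replica-1.rds.amazonaws.com",
                        dbname="mydb", user="readonly", password="...")
with conn.cursor() as cur:
    # Replay lag in seconds on a PostgreSQL standby; NULL if nothing replayed yet
    cur.execute(
        "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())")
    lag = cur.fetchone()[0]

if lag is not None and lag > LAG_THRESHOLD_SECONDS:
    page_oncall(f"replication lag {lag:.2f}s")  # hypothetical alert hook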
Global Read Replicas
For global applications, cross-region replicas are essential.
graph TB
subgraph "us-east-1"
direction TB
Primary["🟢 Primary DB<br/>us-east-1"]
App1["App Servers<br/>us-east-1"]
App1 -->|writes| Primary
App1 -->|reads| Primary
end
subgraph "eu-west-1"
direction TB
Replica1["🟡 Read Replica<br/>eu-west-1"]
App2["App Servers<br/>eu-west-1"]
App2 -->|reads| Replica1
App2 -->|writes| Primary
end
subgraph "ap-southeast-1"
direction TB
Replica2["🟡 Read Replica<br/>ap-southeast-1"]
App3["App Servers<br/>ap-southeast-1"]
App3 -->|reads| Replica2
App3 -->|writes| Primary
end
Primary -->|stream replication| Replica1
Primary -->|stream replication| Replica2
style Primary fill:#99ff99
style Replica1 fill:#ffff99
style Replica2 fill:#ffff99
Benefits:
- Users in EU read from EU (60ms savings vs transatlantic latency)
- Reduced egress bandwidth costs (data stays in region)
- Disaster recovery (if primary region fails, promote EU replica)
Challenges:
- Higher replication lag (network distance)
- Writes still go to primary (network round-trip for every write)
- Cost multiplies (three regions = roughly 3x storage/compute)
Connection Pooling with Read Replicas
Each replica connection consumes memory. Connection pooling becomes critical.
PgBouncer per replica is a common pattern:
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb_primary = host=primary.rds.amazonaws.com port=5432 dbname=mydb
mydb_replica_1 = host=replica-1.rds.amazonaws.com port=5432 dbname=mydb
mydb_replica_2 = host=replica-2.rds.amazonaws.com port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
Your application connects to PgBouncer (listening on port 6432 by default), not the database directly. PgBouncer multiplexes thousands of client connections onto a small pool of server connections, reusing them efficiently.
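What the application side looks like—a sketch with psycopg2 and a hypothetical pgbouncer.internal host; the logical database names map to the [databases] entries above:
import psycopg2

# Connect to PgBouncer, not the databases directly; PgBouncer maps
# these logical names to the real hosts behind the scenes
primary = psycopg2.connect(host="pgbouncer.internal", port=6432,
                           dbname="mydb_primary", user="admin")
replica = psycopg2.connect(host="pgbouncer.internal", port=6432,
                           dbname="mydb_replica_1", user="readonly")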
Trade-Offs: When Read Replicas Fall Short
Cost
Each replica adds a full copy of your data plus its own compute. Cross-region replicas cost more still (egress fees, machines in multiple regions). Run ten replicas and you’re paying for roughly eleven instances’ worth of database.
Complexity
You now manage multiple databases. Connection routing is harder. Failover is more complex. Monitoring replication lag adds operational burden.
Write Scaling
Read replicas don’t help with write bottlenecks. If your workload is 50% reads and 50% writes, adding replicas helps only half your problem. You need partitioning or sharding for write scaling.
Eventual Consistency Issues
For strong consistency requirements (financial transactions, inventory), read replicas may not work. You can’t reliably read your own write if the replica hasn’t caught up. Workarounds (pinning to primary) reduce the benefits.
Replication Lag Under Load
High write volume can cause replication lag to spike. If your primary commits 10,000 writes per second, replicas might lag several seconds. Analytics queries start returning stale data.
When NOT to Use Read Replicas
- Write-heavy workloads (more than 30-40% writes): Replicas don’t solve the core problem
- Strong consistency requirements: Medical records, financial ledgers
- Very low latency requirements: Cross-region lag is too high
- Small workloads: A single database is simpler and cheaper
Read Replicas vs. Caching
Read replicas and caching are complementary, not competitive.
| Strategy | Use Case | Latency | Consistency | Cost |
|---|---|---|---|---|
| Database only | Small load, simple schema | 1-10ms | Strong | Low |
| Read replicas | Medium-high load, tolerance for slightly stale reads | 10-100ms | Eventual | Medium |
| Read replicas + caching | High load, hot datasets | 1-5ms (cache hit) | Eventual (with invalidation) | Higher |
A mature system uses both: Redis caches frequently accessed data (products, user profiles), read replicas handle the long tail of queries that aren’t cacheable.
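A sketch of the combined pattern—check Redis first, fall back to a read replica on a miss—assuming redis-py, a hypothetical cache host, and a hypothetical query_replica() helper:
import json
import redis

cache = redis.Redis(host="cache.internal", port=6379)  # hypothetical host

def get_product(product_id: int) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # hot path: served straight from Redis
    product = query_replica(       # hypothetical replica query helper
        "SELECT * FROM products WHERE id = %s", (product_id,))
    cache.setex(key, 300, json.dumps(product))  # cache for 5 minutes
    return product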
Key Takeaways
- Read replicas multiply your read capacity by distributing SELECT queries across multiple database copies while keeping a single primary for writes
- Set them up in cloud-managed services (AWS RDS, Google Cloud SQL) with a single API call; use middleware (ProxySQL) or application-level routing for query direction
- Replication lag is the price of scale—plan for eventual consistency, use read-after-write techniques for user-facing data, and always monitor lag
- Specialized replicas (analytics, lower isolation) let you optimize different access patterns without impacting the primary
- Cross-region replicas reduce latency for global users but increase cost and complexity; write latency still crosses the globe
- Read replicas don’t solve write bottlenecks—when your primary is write-heavy, you need different strategies like partitioning or write optimization
Practice Scenarios
Scenario 1: The 95% Read Workload
Your PostgreSQL database serves 50,000 reads per second and 5,000 writes per second. Each server can handle 12,500 QPS. CPU on the primary is at 80%. Design a replica strategy.
- How many read replicas do you need?
- Where would you deploy them (same region or multi-region)?
- How would you route queries in your application?
- What monitoring would you set up?
Scenario 2: Read-After-Write Consistency
A user edits their profile picture. They immediately load their profile page. The read replica hasn’t caught up yet (200ms lag), so they see the old picture. How would you solve this?
- Suggest three different approaches, with pros and cons for each
- What trade-offs does each involve?
Scenario 3: Cross-Region Replication and Failover
You deploy read replicas in us-east-1 (primary), eu-west-1, and ap-southeast-1. The primary region suffers a complete outage. Walk through the failover process:
- Which replica becomes the new primary?
- What happens to writes from the EU while you’re failing over?
- How do you handle users in the AP region during the failover?
- What data might you lose?
Connection to Next Steps
Read replicas solve read scaling beautifully, but they’re only part of the story. What happens when the primary fails? That’s where failover and recovery come in. We’ll explore how to automatically detect primary failures, promote a replica, and keep your system running. We’ll also discuss more advanced replication topologies—cascading replicas, multi-primary setups, and the consistency guarantees each provides.