Read Replicas for Scaling
Introduction
You’re the on-call engineer at midnight. Your PostgreSQL database is screaming—CPU pegged at 95%. Your monitoring dashboard shows something interesting: 90% of all queries are SELECT statements. Only 10% are writes (INSERT, UPDATE, DELETE). Adding more CPU to your primary database might buy you time, but it won’t scale. The math is brutal: if your read load doubles next quarter, you’ll have to double the size of your database machine again.
But here’s what you realize: you don’t need a faster database. You need more databases. Specifically, you need read replicas.
Read replicas are dedicated copies of your database that serve only read queries. Your primary database handles all writes and sends those changes downstream to the replicas. Clients send SELECT queries to the replicas instead of the primary. Suddenly, your read capacity multiplies. This is one of the most powerful scaling patterns in distributed systems, especially for read-heavy workloads.
In this chapter, we’ll explore how read replicas work, when to use them, how to set them up, and the real trade-offs you’ll face. We’re building on everything we learned about replication fundamentals—now we’re applying it strategically to solve a specific problem: read scaling.
What Are Read Replicas?
Read replicas are secondary database instances that contain an exact copy of your primary database. They’re continuously updated via binary log replication (or similar mechanisms). The critical difference from a general replica is purpose: read replicas exist specifically to serve SELECT queries, never writes.
Here’s the architecture in a sentence: your application sends all writes to the primary database, and reads to the replicas. The primary asynchronously replicates its changes to the replicas. This is called the read/write split pattern.
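To make the split concrete, here is a toy router in Python—purely illustrative, with hypothetical hostnames—that sends SELECTs to a replica and everything else to the primary. Production routers must also handle transactions, SELECT ... FOR UPDATE, and read-after-write pinning, all covered later in this chapter.
import random

PRIMARY = "primary.db.internal"  # hypothetical hostnames
REPLICAS = ["replica-1.db.internal", "replica-2.db.internal"]

def route(sql: str) -> str:
    """Return the host that should execute this statement."""
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(REPLICAS)  # reads can fan out across replicas
    return PRIMARY                      # all writes go to the single primary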
How Read Replicas Differ from Primary Databases
| Aspect | Primary Database | Read Replica |
|---|---|---|
| Accepts writes | Yes | No |
| Accepts reads | Yes | Yes |
| Updates other replicas | Yes | No |
| Contains full dataset | Yes | Yes |
| Replication lag | None (it is the source) | Milliseconds to seconds |
| When to scale it | Write bottleneck | Read bottleneck |
Types of Read Replicas
Same-Region Replicas: Located in the same availability zone or region as the primary. Used for immediate read scaling and failover. Lowest latency; modest cost (a slight premium for cross-AZ traffic).
Cross-Region Replicas: Deployed in different geographic regions. Serve local reads for users in those regions, reducing latency. Essential for global applications. Replication lag may be higher due to network distance.
Specialized Replicas: Replicas with different configurations than the primary.
- Analytics Replicas: May have different indexes optimized for OLAP queries rather than OLTP
- Reporting Replicas: Lower isolation levels (e.g., read uncommitted) trade consistency for speed
- Backup Replicas: Kept offline, only brought online for backup operations
The Restaurant Analogy
Imagine a popular restaurant with one chef and six waiters. The chef (primary database) is the single writer—they prepare all dishes. The waiters (read replicas) are the readers—they serve customers (application clients) with the prepared dishes.
Before read replicas, you had one chef doing both cooking and serving. Customers wait forever. Adding another chef is expensive and complex—now two chefs need to coordinate on which dish to make.
With read replicas, the single chef focuses on cooking. Waiters keep a copy of the current menu (database) and serve customers in parallel. If a new dish is added (write), the chef tells all waiters immediately. But if there’s a tiny delay in the restaurant’s phone system, one waiter might not know about the new dish for a few seconds—that’s replication lag.
The more waiters you add, the more customers you serve. But eventually, the chef becomes the bottleneck—they can’t keep up with all the dish requests. That’s when read replicas stop helping, and you need different strategies (partitioning, caching, write optimization).
Setting Up Read Replicas in Managed Services
Most cloud databases make read replicas trivially easy. Let’s look at the real-world process.
AWS RDS Read Replicas
With AWS RDS, creating a read replica is one API call:
aws rds create-db-instance-read-replica \
--db-instance-identifier mydb-replica-1 \
--source-db-instance-identifier mydb-primary \
--db-instance-class db.t3.large \
--storage-type gp3 \
--multi-az \
--region us-east-1
Key parameters:
- source-db-instance-identifier: Which database to replicate from
- db-instance-class: Instance type for the replica (can differ from primary)
- multi-az: Give the replica its own standby in another availability zone for high availability
- region: Same-region (cheaper) or cross-region (lower latency for distant users; cross-region replicas reference the source by its ARN)
AWS handles replication automatically using the engine’s native mechanism: for MySQL, binary log replication; for PostgreSQL, streaming replication via the WAL (Write-Ahead Log).
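A new replica isn’t usable until its status flips to available. A minimal sketch of waiting for that with boto3, assuming the identifiers from the CLI example above:
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Block until the new replica finishes provisioning
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="mydb-replica-1")

# Fetch its status and the endpoint your application should read from
resp = rds.describe_db_instances(DBInstanceIdentifier="mydb-replica-1")
instance = resp["DBInstances"][0]
print(instance["DBInstanceStatus"], instance["Endpoint"]["Address"])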
Promoting a Read Replica
When your primary fails or you need to shard your data, you can promote a replica to a standalone primary:
aws rds promote-read-replica \
--db-instance-identifier mydb-replica-1
Promotion takes a few minutes (the instance reboots as part of the process). Once promoted, the replica accepts writes; replication stops, and it becomes an independent instance.
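The same promotion from Python—a sketch using boto3 with the identifier above:
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Detach the replica from its source and make it writable
rds.promote_read_replica(DBInstanceIdentifier="mydb-replica-1")

# The instance reboots during promotion; wait for it to come back
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="mydb-replica-1")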
Application-Level Read/Write Routing
The database doesn’t know which queries should go where. Your application must decide: is this a write? Send to primary. Is this a read? Send to a replica.
Pattern 1: Middleware-Based Routing
Tools like ProxySQL or PgPool-II sit between your application and databases. They intercept SQL and route reads to replicas.
ProxySQL Configuration Example:
# Define the MySQL servers (proxysql.cnf syntax)
mysql_servers:
(
    {
        hostgroup=0,  # Primary group
        address="primary.rds.amazonaws.com",
        port=3306,
        weight=1000,
        status="ONLINE"
    },
    {
        hostgroup=1,  # Read replica group
        address="replica-1.rds.amazonaws.com",
        port=3306,
        weight=1000,
        status="ONLINE"
    },
    {
        hostgroup=1,
        address="replica-2.rds.amazonaws.com",
        port=3306,
        weight=1000,
        status="ONLINE"
    }
)
# Route SELECTs to the replicas. SELECT ... FOR UPDATE and anything
# unmatched falls through to the connecting user's default hostgroup (the primary).
mysql_query_rules:
(
    {
        rule_id=1,
        active=1,
        match_pattern="^SELECT.*FOR UPDATE",
        destination_hostgroup=0,
        apply=1
    },
    {
        rule_id=2,
        active=1,
        match_pattern="^SELECT",
        destination_hostgroup=1,
        apply=1
    }
)
Advantages: Transparent to application code, works with any language, automatic failover. Disadvantages: Additional network hop, connection pooling complexity, doesn’t understand transaction context.
Pattern 2: Application-Level Routing
Your code explicitly routes queries. Modern ORMs make this straightforward.
Django (Python) Example:
# myapp/routers.py
class ReadReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'  # Route reads to the replica database

    def db_for_write(self, model, **hints):
        return 'default'  # Route writes to the primary

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'  # Run migrations only on the primary

# settings.py
import os

DATABASES = {
    'default': {  # Primary
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'HOST': 'primary.rds.amazonaws.com',
        'USER': 'admin',
        'PASSWORD': os.environ['DB_PASSWORD'],
    },
    'replica': {  # Read replica
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'HOST': 'replica-1.rds.amazonaws.com',
        'USER': 'readonly',
        'PASSWORD': os.environ['REPLICA_PASSWORD'],
    }
}

DATABASE_ROUTERS = ['myapp.routers.ReadReplicaRouter']

# Usage in views
from django.contrib.auth.models import User

# Reads go to the replica (the router applies automatically;
# .using('replica') makes the choice explicit)
users = User.objects.using('replica').filter(is_active=True)

# Writes go to the primary
user = User.objects.create(username='alice', email='[email protected]')
Node.js Example with Sequelize:
const { Sequelize } = require('sequelize');
const sequelize = new Sequelize('mydb', 'admin', process.env.DB_PASSWORD, {
replication: {
read: [
{ host: 'replica-1.rds.amazonaws.com', username: 'readonly', password: process.env.REPLICA_PASSWORD },
{ host: 'replica-2.rds.amazonaws.com', username: 'readonly', password: process.env.REPLICA_PASSWORD },
],
write: { host: 'primary.rds.amazonaws.com', username: 'admin', password: process.env.DB_PASSWORD }
},
dialect: 'postgres',
});
// Reads are automatically load-balanced across the read pool
const user = await User.findOne({ where: { id: 1 } });
// Writes go to primary
const newUser = await User.create({ username: 'bob', email: '[email protected]' });
Connection Routing Strategies
Once you have multiple replicas, how do you choose which one?
Round-Robin: Cycle through replicas in order. Simple, fair distribution (this and least-connections are sketched after this list).
Least-Connections: Route to the replica with fewest active connections. Better for variable query costs.
Latency-Based: Route to the replica with lowest response time. Best for geographic distribution.
Consistent Hashing: Route by user_id or session. Ensures the same client talks to the same replica (better for caching, warm data).
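A minimal Python sketch of the first two strategies, with hypothetical replica names—enough to show the bookkeeping each one needs:
import itertools
from collections import defaultdict

REPLICAS = ["replica-1", "replica-2", "replica-3"]  # hypothetical names

# Round-robin: cycle through replicas in a fixed order
_rr = itertools.cycle(REPLICAS)
def pick_round_robin() -> str:
    return next(_rr)

# Least-connections: track open connections per replica, pick the idlest
_active = defaultdict(int)
def pick_least_connections() -> str:
    replica = min(REPLICAS, key=lambda r: _active[r])
    _active[replica] += 1  # caller must decrement on connection release
    return replica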
Specialized Replica Configurations
Not all replicas need to be identical to the primary.
Analytics Replicas
Your primary is optimized for OLTP: fast inserts and point lookups. An analytics replica can have different indexes.
-- On primary: Simple indexes for transaction processing
CREATE INDEX idx_users_id ON users(id);
CREATE INDEX idx_orders_created ON orders(created_at);
-- On analytics replica: composite indexes tuned for scans and aggregations
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
CREATE INDEX idx_orders_status_amount ON orders(status, amount);
-- PostgreSQL 11+ supports covering indexes via INCLUDE
CREATE INDEX idx_orders_covering ON orders(status) INCLUDE (user_id, amount);
Whether a replica can carry different indexes depends on the replication mechanism. MySQL binlog replicas can maintain extra indexes because they re-execute logical changes. PostgreSQL physical streaming replicas are byte-for-byte copies of the primary, so divergent indexes require logical replication instead.
Lower Isolation for Reads
Sometimes you can trade consistency for speed. On engines where READ UNCOMMITTED permits dirty reads (e.g., MySQL), a read-only replica might use a lower isolation level:
-- On the replica (MySQL)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- Now reads don't block on in-flight transactions
-- You might see "dirty" data (uncommitted writes), but for analytics that's often okay
Note that PostgreSQL treats READ UNCOMMITTED as READ COMMITTED, and a streaming standby only ever serves committed data, so this lever mainly applies to MySQL-family replicas.
Calculating Read Replica Capacity
How many replicas do you need? Use this formula:
Number of Replicas = ceil(Total Read QPS / Single Replica Capacity)
* Safety Factor (typically 1.5 to 2.0)
Example:
- Your application generates 10,000 read queries per second
- Benchmarks show each replica can handle 2,500 QPS with acceptable latency
- Required replicas: ceil(10,000 / 2,500) = 4 replicas
- With 1.5x safety factor (traffic spikes, failover): 4 * 1.5 = 6 replicas
Add extra capacity for:
- Failover (one replica goes down, redistribute its load)
- Traffic spikes (30-50% above baseline)
- Planned maintenance (rolling updates)
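These numbers are easy to script; a minimal sketch of the formula above in Python:
import math

def replicas_needed(read_qps: int, per_replica_qps: int, safety: float = 1.5) -> int:
    """Base replica count from the formula above, padded by a safety factor."""
    base = math.ceil(read_qps / per_replica_qps)
    return math.ceil(base * safety)

print(replicas_needed(10_000, 2_500))        # ceil(10,000 / 2,500) * 1.5 -> 6
print(replicas_needed(50_000, 12_500, 2.0))  # ceil(50,000 / 12,500) * 2.0 -> 8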
Replication Lag and Consistency
The biggest challenge with read replicas is eventual consistency. Changes don’t appear instantly on replicas.
Sources of Lag
graph LR
    A["Write to Primary"] --> B["Primary writes to disk"]
    B -->|Network latency| C["Send binary log to replica"]
    C --> D["Replica applies changes"]
    D -->|Disk I/O| E["Replica writes to disk"]
    style C fill:#ff9999
    style D fill:#ff9999
Typical lag ranges: 10-100ms in same region, 100ms-5 seconds cross-region.
Handling Stale Reads
Problem: A user updates their profile and immediately loads the dashboard, but the read replica hasn’t caught up yet.
Solutions:
- Read-After-Write Consistency: Route writes and subsequent reads to the primary for a user session.
// After a write, pin the user to primary for 5 seconds
await db.write('users', { id: user.id, name: 'Alice' });
session.primaryUntil = Date.now() + 5000;
// Subsequent reads
if (Date.now() < session.primaryUntil) {
user = await db.read('users', { id: user.id }, { useReplica: false });
} else {
user = await db.read('users', { id: user.id });
}
- Version Tokens: The client tracks a version from the write (e.g., a log sequence number) and asks for “at least this version”.
// Write returns a version
const result = await db.write('inventory', { product_id: 123, qty: 50 });
const version = result.version;
// Read with consistency requirement
const inventory = await db.read('inventory', { product_id: 123 }, {
minVersion: version
});
- Strong Consistency for Critical Reads: Some queries always hit primary.
// Cart operations are critical—don't show stale data
const cart = await db.read('carts', { user_id }, { useReplica: false });
// Product recommendations can use stale replicas
const products = await db.read('products', { category: 'electronics' });
Monitoring Replication Lag
Always monitor this:
-- MySQL (8.0.22+ prefers SHOW REPLICA STATUS)
SHOW SLAVE STATUS\G
-- Look at: Seconds_Behind_Master
-- PostgreSQL: run on the replica; NULL until a transaction has been replayed
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
-- On the primary, also check replication slot lag
SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;
Set up alerts: if lag exceeds 1 second, page the on-call engineer.
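A minimal lag probe you could run from a cron job or sidecar—a sketch assuming psycopg2, the replica host from earlier examples, and a hypothetical page_oncall() alert hook:
import psycopg2

LAG_THRESHOLD_SECONDS = 1.0

conn = psycopg2.connect(host="replica-1.rds.amazonaws.com",
                        dbname="mydb", user="readonly", password="...")
with conn.cursor() as cur:
    # Replay lag in seconds on a PostgreSQL standby; NULL if nothing replayed yet
    cur.execute(
        "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())")
    lag = cur.fetchone()[0]

if lag is not None and lag > LAG_THRESHOLD_SECONDS:
    page_oncall(f"replication lag {lag:.2f}s")  # hypothetical alert hook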
Global Read Replicas
For global applications, cross-region replicas are essential.
graph TB
subgraph "us-east-1"
direction TB
Primary["🟢 Primary DB<br/>us-east-1"]
App1["App Servers<br/>us-east-1"]
App1 -->|writes| Primary
App1 -->|reads| Primary
end
subgraph "eu-west-1"
direction TB
Replica1["🟡 Read Replica<br/>eu-west-1"]
App2["App Servers<br/>eu-west-1"]
App2 -->|reads| Replica1
App2 -->|writes| Primary
end
subgraph "ap-southeast-1"
direction TB
Replica2["🟡 Read Replica<br/>ap-southeast-1"]
App3["App Servers<br/>ap-southeast-1"]
App3 -->|reads| Replica2
App3 -->|writes| Primary
end
Primary -->|stream replication| Replica1
Primary -->|stream replication| Replica2
style Primary fill:#99ff99
style Replica1 fill:#ffff99
style Replica2 fill:#ffff99
Benefits:
- Users in EU read from EU (60ms savings vs transatlantic latency)
- Reduced egress bandwidth costs (data stays in region)
- Disaster recovery (if primary region fails, promote EU replica)
Challenges:
- Higher replication lag (network distance)
- Writes still go to primary (network round-trip for every write)
- Cost multiplies (three regions = roughly 3x storage/compute)
Connection Pooling with Read Replicas
Each replica connection consumes memory. Connection pooling becomes critical.
PgBouncer per replica is a common pattern:
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb_primary = host=primary.rds.amazonaws.com port=5432 dbname=mydb
mydb_replica_1 = host=replica-1.rds.amazonaws.com port=5432 dbname=mydb
mydb_replica_2 = host=replica-2.rds.amazonaws.com port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
Your application connects to PgBouncer (listening on port 6432 by default), not the database directly. PgBouncer multiplexes thousands of client connections onto a small pool of server connections, reusing them efficiently.
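What the application side looks like—a sketch with psycopg2 and a hypothetical pgbouncer.internal host; the logical database names map to the [databases] entries above:
import psycopg2

# Connect to PgBouncer, not the databases directly; PgBouncer maps
# these logical names to the real hosts behind the scenes
primary = psycopg2.connect(host="pgbouncer.internal", port=6432,
                           dbname="mydb_primary", user="admin")
replica = psycopg2.connect(host="pgbouncer.internal", port=6432,
                           dbname="mydb_replica_1", user="readonly")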
Trade-Offs: When Read Replicas Fall Short
Cost
Each replica adds a full copy of your data plus its own compute. Cross-region replicas cost more still (egress fees, machines in multiple regions). Run ten replicas and you’re paying for roughly eleven instances’ worth of database.
Complexity
You now manage multiple databases. Connection routing is harder. Failover is more complex. Monitoring replication lag adds operational burden.
Write Scaling
Read replicas don’t help with write bottlenecks. If your workload is 50% reads and 50% writes, adding replicas helps only half your problem. You need partitioning or sharding for write scaling.
Eventual Consistency Issues
For strong consistency requirements (financial transactions, inventory), read replicas may not work. You can’t reliably read your own write if the replica hasn’t caught up. Workarounds (pinning to primary) reduce the benefits.
Replication Lag Under Load
High write volume can cause replication lag to spike. If your primary commits 10,000 writes per second, replicas might lag several seconds. Analytics queries start returning stale data.
When NOT to Use Read Replicas
- Write-heavy workloads (more than 30-40% writes): Replicas don’t solve the core problem
- Strong consistency requirements: Medical records, financial ledgers
- Very low latency requirements: Cross-region lag is too high
- Small workloads: A single database is simpler and cheaper
Read Replicas vs. Caching
Read replicas and caching are complementary, not competitive.
| Strategy | Use Case | Latency | Consistency | Cost |
|---|---|---|---|---|
| Database only | Small load, simple schema | 1-10ms | Strong | Low |
| Read replicas | Medium-high load, tolerance for slightly stale reads | 10-100ms | Eventual | Medium |
| Read replicas + caching | High load, hot datasets | 1-5ms (cache hit) | Eventual (with invalidation) | Higher |
A mature system uses both: Redis caches frequently accessed data (products, user profiles), read replicas handle the long tail of queries that aren’t cacheable.
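A sketch of the combined pattern—check Redis first, fall back to a read replica on a miss—assuming redis-py, a hypothetical cache host, and a hypothetical query_replica() helper:
import json
import redis

cache = redis.Redis(host="cache.internal", port=6379)  # hypothetical host

def get_product(product_id: int) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # hot path: served straight from Redis
    product = query_replica(       # hypothetical replica query helper
        "SELECT * FROM products WHERE id = %s", (product_id,))
    cache.setex(key, 300, json.dumps(product))  # cache for 5 minutes
    return product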
Key Takeaways
- Read replicas multiply your read capacity by distributing SELECT queries across multiple database copies while keeping a single primary for writes
- Set them up in cloud-managed services (AWS RDS, Google Cloud SQL) with a single API call; use middleware (ProxySQL) or application-level routing for query direction
- Replication lag is the price of scale—plan for eventual consistency, use read-after-write techniques for user-facing data, and always monitor lag
- Specialized replicas (analytics, lower isolation) let you optimize different access patterns without impacting the primary
- Cross-region replicas reduce latency for global users but increase cost and complexity; write latency still crosses the globe
- Read replicas don’t solve write bottlenecks—when your primary is write-heavy, you need different strategies like partitioning or write optimization
Practice Scenarios
Scenario 1: The 95% Read Workload
Your PostgreSQL database serves 50,000 reads per second and 5,000 writes per second. Each server can handle 12,500 QPS. CPU on the primary is at 80%. Design a replica strategy.
- How many read replicas do you need?
- Where would you deploy them (same region or multi-region)?
- How would you route queries in your application?
- What monitoring would you set up?
Scenario 2: Read-After-Write Consistency
A user edits their profile picture. They immediately load their profile page. The read replica hasn’t caught up yet (200ms lag), so they see the old picture. How would you solve this?
- Suggest three different approaches, with pros and cons for each
- What trade-offs does each involve?
Scenario 3: Cross-Region Replication and Failover
You deploy read replicas in us-east-1 (primary), eu-west-1, and ap-southeast-1. The primary region suffers a complete outage. Walk through the failover process:
- Which replica becomes the new primary?
- What happens to writes from the EU while you’re failing over?
- How do you handle users in the AP region during the failover?
- What data might you lose?
Connection to Next Steps
Read replicas solve read scaling beautifully, but they’re only part of the story. What happens when the primary fails? That’s where failover and recovery come in. We’ll explore how to automatically detect primary failures, promote a replica, and keep your system running. We’ll also discuss more advanced replication topologies—cascading replicas, multi-primary setups, and the consistency guarantees each provides.