System Design Fundamentals

Master-Slave Replication

Introduction

Imagine your e-commerce platform is thriving. You’re processing 10,000 read requests per second — customers checking product prices, viewing inventory, reading reviews — but only 500 writes per second as they place orders and add items to carts. Your single database server is buckling under the read load. CPU spikes during peak hours. Query latency creeps toward the unacceptable. You need a way to distribute reads across multiple copies of the data while keeping writes centralized and consistent.

Enter master-slave replication (also called primary-replica or primary-standby replication).

This architecture solves a fundamental problem in distributed systems: how do you scale read-heavy workloads without fragmenting your data or losing write consistency? In the previous chapters, we covered database fundamentals and how data is stored and indexed. Now we’re looking at how to replicate that data across multiple machines in a way that’s practical, reliable, and operationally manageable.

Master-slave replication isn’t perfect — it has clear limitations and edge cases we’ll explore. But it’s the foundational replication pattern that powers billions of transactions worldwide, and understanding it deeply prepares you for more sophisticated patterns like multi-master replication (Chapter 50) and consistency models across distributed systems (Chapter 52).

What is Master-Slave Replication?

At its core, master-slave replication is deceptively simple: one master (primary) node accepts all writes. Multiple slave (replica) nodes receive asynchronous copies of those writes and serve only reads.

Let’s unpack the key components:

The Master (Primary)

  • Accepts all write operations (INSERT, UPDATE, DELETE)
  • Is the single source of truth for your data
  • Writes changes to its replication log before applying them locally
  • Responsible for broadcasting changes to slaves

The Slaves (Replicas)

  • Receive a stream of changes from the master
  • Apply these changes in the same order the master did
  • Serve read queries with potentially stale data
  • Cannot accept writes (in the basic model)
  • Can be promoted to master if the current master fails

The Replication Log

The replication log is the mechanism that makes this all work. When you write data to the master, before the write is applied to the database, it’s written to the replication log. This log is then streamed to replicas, which apply the same changes in order. Think of it as an append-only stream of “change events” that describes every modification to the database.

Different databases use different names for this log:

  • PostgreSQL: Write-Ahead Log (WAL) with streaming replication
  • MySQL: Binary Log (binlog)
  • MongoDB: Oplog (operation log)
  • Oracle: Redo logs with Data Guard
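
To make the idea concrete, here is a toy sketch in Python (illustrative only, not any real database’s implementation) of an append-only log with a replica applying events in order from its own offset:

class Master:
    def __init__(self):
        self.log = []  # append-only list of change events

    def write(self, event):
        self.log.append(event)  # log first, then apply locally

class Replica:
    def __init__(self):
        self.offset = 0   # position reached in the master's log
        self.applied = []

    def catch_up(self, master):
        # Apply events strictly in log order, from where we left off.
        while self.offset < len(master.log):
            self.applied.append(master.log[self.offset])
            self.offset += 1

master = Master()
replica = Replica()
master.write({"op": "INSERT", "table": "orders", "row": {"id": 1}})
replica.catch_up(master)  # replica now has the change
lag_events = len(master.log) - replica.offset  # 0 when fully caught up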

How Changes Propagate: Replication Strategies

Once a change reaches the replication log, how does it get copied to replicas? There are three main approaches, each with different trade-offs:

Statement-Based Replication

The master logs the SQL statements themselves:

MASTER LOG: INSERT INTO orders (user_id, total) VALUES (42, 99.99);

The slave executes the same statement. This is compact — you’re only logging the statement, not the data. But it’s dangerous. Suppose your INSERT uses NOW() or RAND() — the slave will execute NOW() at a different time than the master, potentially creating different data. Non-deterministic functions are replication killers in this mode.
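
To make the hazard concrete:

-- Written on the master; NOW() evaluates at execution time there:
INSERT INTO orders (user_id, created_at) VALUES (42, NOW());
-- Replayed on the slave milliseconds later; NOW() evaluates again,
-- so created_at differs between the two copies. The data has diverged.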

Pros: Compact logs, easy to understand
Cons: Non-deterministic functions break replication; complex stored procedures can cause divergence

Row-Based Replication

The master logs the actual changes to rows:

MASTER LOG: ROW_CHANGE: table=orders, pk=12345, old={user_id:42, total:0}, new={user_id:42, total:99.99}

Now the slave isn’t executing statements — it’s applying the exact same row changes. Determinism isn’t a problem because you’re not executing functions; you’re just modifying data. This is safer.

Pros: Deterministic, works with any function, safe for stored procedures
Cons: Larger log files, more network bandwidth, harder to audit with raw logs

Mixed Replication

Many databases offer a mixed mode: use statement-based replication when it’s safe, and switch to row-based for non-deterministic operations. This balances safety and efficiency.
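
In MySQL, for example, this is a single server setting (note that recent MySQL versions actually default to ROW):

-- my.cnf on the master
[mysqld]
binlog_format = MIXED   # statement-based where safe, row-based otherwise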

Replication Topologies

Master-slave isn’t limited to one master and one slave. You can arrange replicas in different patterns:

graph TD
    A["Master<br/>(accepts writes)"] --> B["Slave 1<br/>(serves reads)"]
    A --> C["Slave 2<br/>(serves reads)"]
    A --> D["Slave 3<br/>(serves reads)"]
    style A fill:#ff6b6b
    style B fill:#4ecdc4
    style C fill:#4ecdc4
    style D fill:#4ecdc4

Single Master with Multiple Slaves

This is the standard pattern. One master accepts all writes. Multiple slaves distribute the read load. If a slave fails, you’ve still got other slaves and the master. If the master fails, you can promote one slave to master (though this involves complexity — see failover challenges below).

graph TD
    A["Master"] --> B["Slave 1"]
    B --> C["Slave 2"]
    C --> D["Slave 3"]
    style A fill:#ff6b6b
    style B fill:#4ecdc4
    style C fill:#4ecdc4
    style D fill:#4ecdc4

Cascading Replication (Replica of a Replica)

Sometimes you replicate across geographic regions. A slave in your primary datacenter acts as a master for slaves in secondary datacenters. This reduces the network load on your primary master and helps with geographic distribution. The trade-off: replication lag compounds as changes flow down the chain.

The Newspaper Printing Analogy

Think of a newspaper operation. The editor (master) reviews and approves all stories. The approved content goes to the printing presses (replicas), which produce physical copies. Readers (your application) pick up copies from any newsstand (replica). All edits flow through the editor, but readers get served from distributed locations.

If a printing press breaks down, other presses keep running. Readers don’t go without newspapers. But if the editor stops approving stories, no new content reaches any press — though existing papers stay in distribution. Eventually, you’d need a new editor (promoting a replica to master).

The speed at which new stories reach the newsstands is like replication lag. If the printing presses are far away, there’s delay between approval and distribution.

How Replication Works Under the Hood

Let’s trace the technical flow in PostgreSQL, a database many system designers use:

PostgreSQL Streaming Replication

  1. Write at Master: Application writes to the master. The master writes the change to its Write-Ahead Log (WAL).

  2. WAL Persists: The master fsyncs the WAL to disk (durability).

  3. Master Applies Change: The change is applied to the actual table data.

  4. WAL Streaming: The master streams WAL records to connected replicas.

  5. Replica Receives: The standby replica reads the WAL stream.

  6. Replica Applies: The replica writes the WAL to its own disk, then applies the changes to its tables. It can now serve reads with this data.

  7. Acknowledgment: The replica acknowledges it has received the WAL (though it may not have fully applied it yet).

Here’s a simplified PostgreSQL configuration:

-- On Master: postgresql.conf
wal_level = replica              # Log enough WAL detail for replication
max_wal_senders = 10             # Max concurrent replication connections
wal_keep_size = 1GB              # Keep 1GB of WAL for late replicas
synchronous_commit = on          # Wait for local WAL flush on commit
synchronous_standby_names = ''   # Empty = fully asynchronous replication

-- On Replica: postgresql.conf
hot_standby = on                 # Allow read queries on replica
primary_conninfo = 'host=master-ip port=5432 user=repl password=secret'
# Standby mode is enabled by an empty standby.signal file in the data
# directory (PostgreSQL 12+), not by a configuration parameter

MySQL Binary Log Replication

MySQL works similarly but uses binary logs (binlog) instead of WAL:

-- On Master: my.cnf
[mysqld]
log_bin = mysql-bin              # Enable binary logging
server_id = 1                    # Unique ID for this server
binlog_format = ROW              # Use row-based replication
binlog_expire_logs_seconds = 864000  # Keep binlogs for 10 days (MySQL 8.0)
binlog_cache_size = 32K          # Buffer for transaction binlog

-- Create replication user
CREATE USER 'repl'@'slave-ip' IDENTIFIED BY 'repl_password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'slave-ip';

-- On Slave: my.cnf
[mysqld]
server_id = 2                    # Different from master
relay_log = mysql-relay-bin      # Local relay log
relay_log_index = mysql-relay-bin.index
read_only = ON                   # Reject writes from ordinary users (super_read_only also blocks admins)

-- Configure slave to connect to master
CHANGE MASTER TO
  MASTER_HOST='master-ip',
  MASTER_USER='repl',
  MASTER_PASSWORD='repl_password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=12345;

START SLAVE;
SHOW SLAVE STATUS;  -- Check replication status

Consistency Considerations: The Replication Lag Problem

Here’s where theory meets harsh reality. Master-slave replication is asynchronous by default. The master doesn’t wait for slaves to acknowledge writes. This is fast — your application gets an immediate “write successful” response — but it creates a consistency window.

Between the moment the master writes something and the moment a slave has applied it, those two systems disagree. If a user writes data to the master and immediately reads from a replica, they might get stale (or missing) data.

graph LR
    A["Client<br/>write"] --> B["Master<br/>applies + acks<br/>immediately"]
    B -.->|"async replication<br/>lag: 5-100ms"| C["Replica<br/>applies change"]
    style B fill:#ff6b6b
    style C fill:#4ecdc4

This is eventual consistency. The system will eventually be consistent, but not immediately.
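
You can observe this window directly. On a PostgreSQL replica, for example, the approximate lag behind the master is one query away:

-- On a PostgreSQL replica: approximate replication lag as an interval
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;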

Semi-Synchronous Replication

One way to reduce this window is semi-synchronous replication. The master waits for at least one replica to acknowledge it has received (and ideally applied) the write before returning success to the client.

-- PostgreSQL: requires synchronous_standby_names to be set on the master
-- (e.g. synchronous_standby_names = 'standby1'); then:
SET synchronous_commit = on;  -- Wait for replica ACK

Now your write is slower (you wait for network round-trip to replica), but you’re guaranteed that at least two copies of the data exist before the master confirms the write. If the master crashes, you won’t lose the data because the replica has it.

Trade-off: Increased write latency (network round-trip time, usually 1-10ms) but better durability.
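
MySQL ships semi-synchronous replication as a plugin. A sketch of enabling it, using the classic plugin names (newer MySQL releases rename “master”/“slave” to “source”/“replica”):

-- On the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- ms to wait before falling back to async

-- On each slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;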

Read-Your-Writes Consistency

Applications often implement a pattern called read-your-writes consistency: after writing, read from the master (or a sync’d replica). This guarantees you see your own changes.

# Application-level read/write routing
# (db_master and db_replica are assumed to be pre-configured connections)
def create_order(user_id, items):
    # Write to master
    result = db_master.execute(
        "INSERT INTO orders (user_id, items) VALUES (%s, %s)",
        (user_id, items)
    )
    order_id = result.lastrowid

    # Read from master to guarantee we see the write
    order = db_master.execute(
        "SELECT * FROM orders WHERE id = %s",
        (order_id,)
    )
    return order

def get_user_orders(user_id):
    # Reads can go to replica (eventually consistent)
    orders = db_replica.execute(
        "SELECT * FROM orders WHERE user_id = %s",
        (user_id,)
    )
    return orders

Setting Up Replication in Practice

PostgreSQL Streaming Replication Walkthrough

  1. Configure the Master with the config above
  2. Take a Base Backup:
pg_basebackup -h master-ip -D /var/lib/postgresql/data \
  -U repl --progress --write-recovery-conf --wal-method=stream
  3. Start the Replica: PostgreSQL detects the standby.signal file (created by --write-recovery-conf) and begins streaming from the master
  4. Monitor:
-- On master
SELECT client_addr, state, write_lsn FROM pg_stat_replication;

MySQL Replication Setup

  1. Dump the master with a consistent snapshot:
mysqldump -u root -p --all-databases --master-data=2 > dump.sql
  2. Load on the replica:
mysql -u root -p < dump.sql
  3. Configure the slave (as shown above) and run START SLAVE;
  4. Monitor:
SHOW SLAVE STATUS\G

Look for Seconds_Behind_Master — this tells you replication lag in seconds.
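
A minimal lag check is easy to script. Here’s a sketch using the mysql.connector driver; the monitoring user and threshold are assumptions to adapt:

import mysql.connector

LAG_THRESHOLD_SECONDS = 5  # assumption: tune to your tolerance for stale reads

def check_replica_lag(host):
    conn = mysql.connector.connect(host=host, user='monitor', password='secret')
    cursor = conn.cursor(dictionary=True)
    cursor.execute("SHOW SLAVE STATUS")
    status = cursor.fetchone()  # None if replication isn't configured
    conn.close()
    lag = status['Seconds_Behind_Master'] if status else None
    if lag is None:
        print(f"{host}: replication stopped or not configured")
    elif lag > LAG_THRESHOLD_SECONDS:
        print(f"{host}: lag of {lag}s exceeds threshold")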

The Hard Problems: Replication Lag and Failover

Replication Lag in Action

Replication lag is usually measured in milliseconds on a well-tuned system, but network hiccups, heavy load, or slow replicas can push it to seconds. What happens then?

Imagine a user places an order on your e-commerce platform (write goes to master). The order confirmation page reads the order status from a replica. If that replica is 2 seconds behind and the read hits that replica before the order data reaches it… the user sees “order not found” and gets confused.

Solutions:

  • Sticky sessions: Route the user’s subsequent reads to the master
  • Replication lag monitoring: Alert operations if lag exceeds a threshold
  • Hybrid approach: Reads within X milliseconds of a write go to the master; older reads can go to replicas (a sketch follows below)
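
A minimal sketch of that hybrid rule, assuming the application can track each user’s last write time (all names here are hypothetical):

import random
import time

READ_MASTER_WINDOW = 1.0  # assumption: a bit longer than typical replication lag

last_write_at = {}  # user_id -> timestamp of that user's most recent write

def record_write(user_id):
    last_write_at[user_id] = time.time()

def pick_connection(user_id, master, replicas):
    # A user who just wrote might race replication: read from the master.
    # Everyone else can read from any replica.
    if time.time() - last_write_at.get(user_id, 0.0) < READ_MASTER_WINDOW:
        return master
    return random.choice(replicas)

In a multi-process deployment, the last-write timestamp would need to live somewhere shared, such as a session store, cache, or cookie.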

Failover: Promoting a Replica to Master

When the master dies (hardware failure, software crash, network partition), you need to promote a replica to master. But this is dangerous:

  1. Data Loss: If you promote the replica that’s furthest behind, you’ve lost the writes that were on the master but hadn’t replicated yet.

  2. Split-Brain: If the master is just partitioned (network failure) and still running, you now have two masters accepting writes. Data diverges. Disaster.

  3. Consistency Checkpoints: Even after promotion, other replicas need to know where to continue replicating from in the new master’s logs.

To handle failover safely, you need orchestration tools like Patroni (PostgreSQL) or MySQL Router with Group Replication. These tools:

  • Monitor the master continuously
  • Automatically promote the most caught-up replica when the master dies (the core logic is sketched below)
  • Prevent split-brain with quorum-based decisions
  • Redirect application traffic to the new master
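
The promotion decision itself, boiled down to an illustrative sketch (real orchestrators do far more: fencing the old master, rewiring the remaining replicas, updating service discovery):

def elect_new_master(replica_lag_seconds, down_votes, total_observers):
    """replica_lag_seconds: replica name -> seconds behind the old master."""
    # Quorum check guards against split-brain: one partitioned observer
    # that merely lost sight of the master cannot trigger a promotion.
    if down_votes <= total_observers // 2:
        return None  # no quorum agreement that the master is dead
    # Promote the most caught-up replica to minimize lost writes.
    return min(replica_lag_seconds, key=replica_lag_seconds.get)

# Example: 2 of 3 observers vote "master down" -> quorum reached
print(elect_new_master({'replica-a': 2.0, 'replica-b': 5.0}, 2, 3))  # replica-a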

When Master-Slave Meets Real Workloads

Here’s a practical scenario: you’re handling 10K read/sec but only 500 write/sec (as we started with). You have:

  • 1 master (handles all 500 writes/sec, plus any reads routed to it)
  • 4 replicas (distribute the 10K reads/sec, roughly 2.5K each with even routing)

Your application uses a simple read/write router:

import mysql.connector
from random import choice

class ReadWriteRouter:
    def __init__(self):
        self.master = mysql.connector.connect(
            host='master-db.prod',
            user='app',
            password='secret',
            database='ecommerce'
        )
        self.replicas = [
            mysql.connector.connect(
                host=f'replica-{i}.prod',
                user='app',
                password='secret',
                database='ecommerce'
            )
            for i in range(1, 5)
        ]

    def execute_write(self, query, args=None):
        cursor = self.master.cursor()
        cursor.execute(query, args or ())
        self.master.commit()
        return cursor.lastrowid or cursor.rowcount

    def execute_read(self, query, args=None):
        replica = choice(self.replicas)
        cursor = replica.cursor(dictionary=True)
        cursor.execute(query, args or ())
        return cursor.fetchall()

# Usage
router = ReadWriteRouter()

# All writes to master
order_id = router.execute_write(
    "INSERT INTO orders (user_id, total) VALUES (%s, %s)",
    (42, 99.99)
)

# Reads from replica (load balanced)
orders = router.execute_read(
    "SELECT * FROM orders WHERE user_id = %s",
    (42,)
)

This simple pattern scales read capacity without touching the application business logic.

Monitoring Replication Health

What do you monitor? Several key metrics:

| Metric | Tool | What It Means |
| --- | --- | --- |
| Replication Lag | SHOW SLAVE STATUS (MySQL) or pg_stat_replication (PostgreSQL) | How far behind is the replica? Alert if it exceeds your threshold. |
| Master Write Rate | Binlog events/sec or WAL bytes/sec | Is the write load increasing? Will replicas keep up? |
| Replica Apply Rate | Events/sec applied on the replica | Is the replica processing changes fast enough? |
| Network Latency | Ping from master to replica | Is network capacity the bottleneck? |
| Replica Disk I/O | IOPS on the relay log | Is the replica’s disk saturated? |
| Master Connection Count | SHOW PROCESSLIST | How many replica connections? Do you have capacity for more? |
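
As a concrete example, PostgreSQL exposes per-replica byte lag on the master through pg_stat_replication (function and column names per PostgreSQL 10+):

-- On the master: how many bytes of WAL each replica still has to replay
SELECT client_addr,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;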

Trade-Offs: When Master-Slave Falls Short

Master-slave replication is powerful, but it has fundamental limitations:

Write Scalability: All writes go to the master. You can’t distribute write load. If your write load grows to exceed the master’s capacity, you’re stuck. You need multi-master or sharding.

Single Point of Failure for Writes: The master is a chokepoint. Even though replicas can handle reads, no writes are possible while the master is down (unless you promote a replica, which is complex and error-prone).

Replication Lag: As we’ve discussed, writes aren’t immediately visible on replicas. Applications must handle eventual consistency.

Failover Complexity: Promoting a replica requires coordination, quorum checks, and application reconfiguration. Automated tools help, but complexity remains.

Statement vs Row-Based Trade-Offs:

  • Statement-based is compact but breaks with non-deterministic functions
  • Row-based is safe but verbose and harder to debug

Cascading Lag: If you use replica-of-replica topologies (for geographic distribution), replication lag compounds at each level.

Compare this to multi-master replication (Chapter 50), which distributes writes but introduces conflict resolution complexity.

Key Takeaways

  • Master-slave replication distributes read load by routing reads to replicas while keeping all writes on a single master
  • The replication log (WAL, binlog, oplog) is the mechanism that ensures slaves stay synchronized with the master
  • Replication is asynchronous by default, creating eventual consistency; semi-synchronous mode trades write latency for better durability guarantees
  • Replication lag (the gap between master writes and replica application) is the fundamental consistency trade-off; applications must handle stale reads or implement read-your-writes patterns
  • Failover (promoting a replica to master) requires careful orchestration to avoid data loss and split-brain scenarios
  • Master-slave works excellently for read-heavy workloads but doesn’t scale write capacity; multi-master or sharding are needed for write scalability

Practice Scenarios

Scenario 1: Replication Lag During Load Spike Your social media platform is handling a trending topic. User engagement spikes from 5K read/sec to 50K read/sec. Two of your four replicas start lagging (10+ seconds behind master). Some users post a comment and don’t see it immediately when they reload. What are three approaches to handle this?

Hints: Consider traffic shaping, replica performance, and application-side consistency patterns.

Scenario 2: Planning Geographic Distribution You’re expanding to three regions (US-East, US-West, EU). Your master is in US-East. You want to serve reads locally from each region while keeping writes centralized. Replication lag must stay under 500ms for decent user experience. Describe your topology. What challenges might you face, and how would you monitor them?

Hints: Consider cascading replication, cross-region latency, and quorum-based consistency checks.

Scenario 3: Failover Decision Your master dies unexpectedly. You have three replicas: Replica A is 2 seconds behind, Replica B is 5 seconds behind, Replica C is 10 minutes behind (it was doing a backup and network replication paused). Which do you promote, and what data loss might occur? How would you ensure the others catch up?

Hints: Consider loss-critical operations (payments, orders) vs. non-critical (view counts, likes). Think about consistency checkpoints in the replication logs.

Looking Ahead

Master-slave replication is the foundation, but it has a ceiling. If your business grows and you need to distribute writes across multiple datacenters, or if write load exceeds a single master’s capacity, you move to multi-master (active-active) replication (Chapter 50). Multi-master distributes both reads and writes but introduces conflict resolution — a problem master-slave avoids by keeping writes centralized.

We’ll explore that complexity next, and you’ll appreciate why master-slave is often the right choice for as long as it can carry your workload.