Sync vs Async Replication
The Trading Platform’s Dilemma
Imagine you’re building a financial trading platform that processes thousands of transactions per second. Every trade must be recorded reliably—if a primary database crashes mid-afternoon, not a single transaction can be lost. The exchange’s reputation depends on it.
But there’s a catch: if you wait for every replica to confirm receipt of each write before responding to the client, you introduce latency. Traders milliseconds behind the market are traders losing money. The question becomes: how tight is the consistency requirement? How much latency can you afford? And critically, what data loss can you tolerate if disaster strikes?
This is the fundamental tension at the heart of replication strategy. In this chapter, we’ll explore how synchronous replication ensures consistency at the cost of latency, how asynchronous replication trades durability for speed, and how semi-synchronous and consensus-based approaches attempt to find the middle ground. Understanding this trade-off is essential for building systems that are both reliable and responsive.
What Synchronous Replication Does
In synchronous replication, the primary database doesn’t confirm a write to the client until at least one (or all) replicas have durably written the same data. The primary essentially pauses, waits for acknowledgment from its standby servers, and only then returns success to the client.
Here’s the flow:
- Client sends write request to primary
- Primary writes to its own WAL (Write-Ahead Log)
- Primary sends WAL segment to replica(s)
- Replica durably writes to its own WAL
- Replica sends acknowledgment back to primary
- Primary confirms write success to client
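The six-step flow above can be sketched as a toy timing model. This is a sketch, not a real database driver: the function name, payload, and latency figures are illustrative assumptions.

```python
def replicate_sync(payload, replica_wal, disk_write_ms=0.5, network_ms=5.0):
    """Toy model of the synchronous flow: the primary does not
    confirm success until the replica has durably written."""
    elapsed = disk_write_ms          # 2. primary writes its own WAL
    elapsed += network_ms            # 3. WAL segment ships to the replica
    replica_wal.append(payload)      # 4. replica's durable write
    elapsed += disk_write_ms
    elapsed += network_ms            # 5. acknowledgment returns to primary
    return elapsed                   # 6. only now does the client see success

wal = []
latency_ms = replicate_sync("trade-42", wal)
assert "trade-42" in wal   # the data is on the replica before "success"
assert latency_ms == 11.0  # 0.5 + 5 + 0.5 + 5
```

Note how the client-visible latency is dominated by the two network legs, not the disk writes.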
The key guarantee: at the moment the client receives “success,” multiple database instances have the data safely written to disk. If the primary crashes immediately, the replica can become the new primary without losing that transaction.
The cost: the client must wait for the round-trip time to the replica plus the disk write on the replica side. On a fast network in the same datacenter, this might add 5–20ms. Across geographic regions, you’re looking at 50–500ms depending on distance. For a transaction processing system expecting microsecond-scale response times, this is significant.
The Asynchronous Alternative
Asynchronous replication flips the priority: the primary confirms the write to the client as soon as it has written to its own durable storage. The replica catches up in the background, on its own schedule.
- Client sends write request to primary
- Primary writes to its own WAL
- Primary confirms write success to client immediately
- Primary ships WAL segment to replica(s) in the background
- Replica catches up asynchronously
The advantage is obvious: no waiting. Clients see immediate acknowledgment, and throughput scales linearly with disk speed on the primary alone. The disadvantage is equally clear: at any given moment, the replica lags behind. If the primary crashes before the replica catches up, those in-flight transactions are lost forever.
The risk window is defined by the replication lag—the distance between the primary’s log position and the replica’s position. In a well-configured system with fast networks, this might be a few milliseconds. In a congested or geographically distributed system, it could be seconds or minutes.
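That lag window translates directly into an upper bound on data loss. A back-of-the-envelope sketch (the numbers are illustrative):

```python
def transactions_at_risk(lag_seconds, tx_per_second):
    """Upper bound on writes that exist only on the primary:
    everything committed during the replication lag window."""
    return int(lag_seconds * tx_per_second)

# 500 ms of lag on a 2,000 tx/sec system: up to 1,000 unreplicated writes
assert transactions_at_risk(0.5, 2000) == 1000
```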
The Middle Ground: Semi-Synchronous Replication
Semi-synchronous replication is a pragmatic compromise. The primary waits for acknowledgment from at least one replica before confirming the write, but not all replicas. This ensures that two copies of the data exist (primary plus one replica) while minimizing latency.
The flow resembles synchronous replication:
- Client sends write request to primary
- Primary writes to its own WAL
- Primary sends to all replicas in parallel
- At least one replica durably writes and acknowledges
- Primary confirms to client
If you have three replicas, you get the durability guarantee of two copies of the data (primary plus one ack’d replica) without waiting for all three. The non-acking replicas catch up asynchronously.
The recovery profile sits between the pure modes: you lose transactions only if the primary AND the acknowledged replica(s) all fail simultaneously—a much rarer scenario than primary-only failure.
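In effect, semi-synchronous replication commits on the *fastest* acknowledgment rather than the slowest. A sketch of that timing difference, with hypothetical per-replica round-trip times:

```python
def commit_latency(replica_rtts_ms, mode):
    """Time the primary waits on replicas before confirming.
    Sync-all waits for the slowest replica; semi-sync returns
    on the first ack; async waits for none."""
    if mode == "sync_all":
        return max(replica_rtts_ms)
    if mode == "semi_sync":
        return min(replica_rtts_ms)
    return 0.0  # async: no waiting at all

rtts = [4.0, 11.0, 80.0]   # one replica sits on a slow link
assert commit_latency(rtts, "sync_all") == 80.0   # hostage to the straggler
assert commit_latency(rtts, "semi_sync") == 4.0   # fastest replica wins
assert commit_latency(rtts, "async") == 0.0
```

This is why a single slow or distant replica is far less damaging in semi-synchronous mode: it simply stops being the one that acknowledges first.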
Visualizing the Write Flow
Let’s see how these modes differ in timing and guarantees:
sequenceDiagram
participant Client
participant Primary
participant Replica
Note over Primary,Replica: Synchronous Replication
Client->>Primary: Write request
Primary->>Primary: Write to WAL
Primary->>Replica: Send WAL
Replica->>Replica: Write to WAL
Replica-->>Primary: Acknowledge
Primary-->>Client: Confirm success
Note over Primary,Replica: Asynchronous Replication
Client->>Primary: Write request
Primary->>Primary: Write to WAL
Primary-->>Client: Confirm success
Primary->>Replica: Send WAL (background)
Replica->>Replica: Write to WAL
Notice the difference: in synchronous mode, the client is blocked until the replica acknowledges. In asynchronous mode, the client returns immediately, and replication happens in the background.
Technical Implementation Across Databases
PostgreSQL: Synchronous Standby Configuration
PostgreSQL offers fine-grained control via the synchronous_standby_names parameter:
-- Require acknowledgment from one standby named 'standby1'
ALTER SYSTEM SET synchronous_standby_names = 'standby1';
-- Require acknowledgment from any one of multiple standbys
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby1, standby2, standby3)';
-- Require acknowledgment from ALL standbys
ALTER SYSTEM SET synchronous_standby_names = '(standby1, standby2, standby3)';
-- Empty list = asynchronous mode
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();
When you require synchronous replication, any write that doesn’t receive timely acknowledgment will hang: if a standby is offline or the network fails, writes block. Unlike MySQL, PostgreSQL itself has no built-in timeout that degrades to asynchronous mode; operators typically rely on external failover tooling (such as Patroni) to remove an unreachable standby from synchronous_standby_names, sacrificing durability to maintain availability.
The replication lag can be monitored with:
SELECT
client_addr,
write_lsn,
flush_lsn,
replay_lsn,
pg_wal_lsn_diff(write_lsn, replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;
MySQL: Semi-Synchronous Replication
MySQL’s semi-synchronous mode is controlled with:
-- Enable on primary
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 10000; -- 10 seconds in milliseconds
-- Enable on replica
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
When a replica acknowledges receipt (note: just receipt, not durability), the primary continues. If the timeout expires without acknowledgment, MySQL automatically falls back to asynchronous replication until a replica reconnects. This prevents indefinite write stalls.
The trade-off is subtle: MySQL’s acknowledgment happens earlier than PostgreSQL’s (replica receives data, not necessarily durable), so latency is lower but durability guarantees are weaker.
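The timeout-and-degrade behavior described above can be modeled in a few lines. This is a sketch of the control flow, not MySQL internals; the state dictionary and function names are illustrative assumptions.

```python
def semi_sync_commit(wait_for_ack_ms, timeout_ms, state):
    """Wait for one replica ack; on timeout, degrade to async mode
    so writes keep flowing, at the cost of the durability guarantee."""
    if state["mode"] == "async":
        return "committed-async"            # already degraded: no waiting
    ack_ms = wait_for_ack_ms()
    if ack_ms is None or ack_ms > timeout_ms:
        state["mode"] = "async"             # mirrors rpl_semi_sync_master_timeout
        return "committed-async"
    return "committed-semi-sync"

state = {"mode": "semi_sync"}
assert semi_sync_commit(lambda: 3.0, 10000, state) == "committed-semi-sync"
assert semi_sync_commit(lambda: None, 10000, state) == "committed-async"
assert state["mode"] == "async"  # stays degraded until a replica reconnects
```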
Advanced Approaches: Chain Replication and Consensus
For systems requiring both high durability and acceptable latency, researchers proposed chain replication: data flows through a chain of replicas. Each replica forwards the write to the next before acknowledging. This provides the durability of waiting for multiple replicas without the parallelism penalty—each replica waits for only its successor, not all at once.
Consensus-based replication (Raft, Paxos) takes a different approach. Systems like etcd, CockroachDB, and Consul use consensus algorithms where the primary (leader) must have agreement from a quorum of nodes (typically a majority) before considering a write committed. This guarantees durability AND that reads from any node in the quorum see the same data, eliminating split-brain scenarios entirely.
CockroachDB implements Raft-based replication:
-- CockroachDB replicates each range across 3 replicas by default
-- Writes require quorum (2 of 3) acknowledgment before commit
-- Inspect the current replication settings with:
SHOW ZONE CONFIGURATIONS;
The drawback of consensus: greater complexity, more network messages per write, and latency is determined by the slowest node in the quorum.
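A useful way to see the latency profile of quorum commits: with n voters, commit latency is the majority-th fastest acknowledgment, which tolerates a straggler far better than waiting for all nodes. A sketch with illustrative timings:

```python
def quorum_commit_latency(ack_times_ms):
    """Commit once a majority of voters (leader included) have
    acknowledged: latency is the majority-th fastest ack."""
    n = len(ack_times_ms)
    majority = n // 2 + 1
    return sorted(ack_times_ms)[majority - 1]

# 5-node cluster: one straggler at 900 ms does not delay the commit
assert quorum_commit_latency([2.0, 3.0, 5.0, 8.0, 900.0]) == 5.0
# ...but in a 3-node cluster with two slow nodes, you pay the slow price
assert quorum_commit_latency([2.0, 200.0, 250.0]) == 200.0
```

This is what "latency is determined by the slowest node in the quorum" means in practice: not the slowest node overall, but the slowest node you still need to reach a majority.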
Performance Impact: The Numbers
Let’s quantify the cost of synchronous replication. Here’s a simplified benchmark:
Scenario: Write 10,000 transactions
Asynchronous (no waiting for replica):
- Primary writes to disk: ~0.5ms per transaction
- Total time: 5 seconds
- Throughput: 2,000 tx/sec
Synchronous (wait for replica in same datacenter):
- Primary write: 0.5ms
- Network send to replica: 5ms (one way)
- Replica disk write: 0.5ms
- Network return: 5ms
- Total per transaction: ~11ms
- Total time: 110 seconds
- Throughput: 91 tx/sec
Synchronous (cross-region replica, e.g., US to Europe):
- Primary write: 0.5ms
- Network round-trip: 150ms (transatlantic)
- Replica disk write: 0.5ms
- Total per transaction: ~151ms
- Total time: 1,510 seconds
- Throughput: 6.6 tx/sec
This explains why synchronous replication across regions is almost never used for write-heavy workloads. The latency penalty is unacceptable.
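The throughput figures above follow directly from the per-transaction latency, assuming one write is confirmed at a time:

```python
def serial_throughput(per_tx_ms):
    """Transactions/sec when each write is confirmed before the next begins."""
    return 1000.0 / per_tx_ms

assert serial_throughput(0.5) == 2000.0           # async: primary disk only
assert round(serial_throughput(11.0)) == 91       # sync, same datacenter
assert round(serial_throughput(151.0), 1) == 6.6  # sync, cross-region
```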
One mitigation: pipelining. Instead of confirming one write before accepting the next, the primary can send multiple unconfirmed writes to the replica and batch acknowledgments. This reduces the latency-per-write while preserving durability guarantees, though it complicates failure recovery (you must replay the batch correctly).
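A sketch of why pipelining helps: with one write in flight you pay a round trip per write, while a window of k unacknowledged writes amortizes that round trip across the batch. This is an illustrative model, not a measurement:

```python
def batch_time_ms(n_tx, disk_ms, rtt_ms, window):
    """Total time to replicate n_tx writes when up to `window`
    unacknowledged writes may be in flight to the replica."""
    batches = -(-n_tx // window)             # ceiling division
    return n_tx * disk_ms + batches * rtt_ms

# 10,000 writes, 0.5 ms disk, 10 ms round trip
assert batch_time_ms(10_000, 0.5, 10.0, 1) == 105_000.0    # serial: ~105 s
assert batch_time_ms(10_000, 0.5, 10.0, 100) == 6_000.0    # pipelined: 6 s
```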
Durability vs. Latency Trade-off Matrix
Here’s how the three modes compare across critical dimensions:
| Dimension | Synchronous | Semi-Synchronous | Asynchronous |
|---|---|---|---|
| Data loss risk (RPO) | Zero (if replica acked) | Zero unless primary and acked replica both fail | Replication lag window |
| Write latency | High (network + disk) | Medium | Low |
| Write throughput | Low | Medium | High |
| Replica consistency | Exact | Exact | Eventually consistent |
| Complexity | High (deadlock handling) | Medium | Low |
| Fallback under load | Manual intervention | Can degrade to async | N/A |
| Network sensitivity | High | Medium | Low |
| Suitable for cross-region | No | Conditional | Yes |
The choice depends entirely on your requirements. A financial trading platform prioritizes durability and might accept 50ms latency. A social media platform prioritizes throughput and accepts eventual consistency.
Failure Scenarios and Edge Cases
Scenario 1: Primary Crashes, Replica Lags
Asynchronous replication: If the primary crashes with 500ms of replication lag, those 500ms of transactions are lost. They exist only on the crashed primary. When you promote the replica to primary, your data is as it was 500ms ago. This is unacceptable for critical transactions but often acceptable for user-generated content or analytics.
Synchronous replication: If the primary crashes, the replica has every acknowledged transaction. No data loss (by design), but you may have unconfirmed writes still in the client buffer—these are retried or lost at the application layer.
Scenario 2: Network Partition
The primary is alive but the replica is unreachable. What happens?
Synchronous: Writes block indefinitely (or until timeout). If you have a timeout and fallback to async, you sacrifice the durability guarantee you relied on.
Asynchronous: Writes continue; the replica falls further behind. When the network heals, it catches up. This “partition tolerance” is why asynchronous replication is preferred when network reliability is uncertain.
Semi-synchronous: Behaves like synchronous during partition (writes block), then often degrades to asynchronous after timeout. MySQL’s rpl_semi_sync_master_timeout is exactly for this scenario.
Scenario 3: Split-Brain (Two Primaries)
This is the nightmare scenario. If the network partitions and both primary and replica think the other is dead, you could have two nodes accepting writes simultaneously. The data in the two branches diverges, and reconciliation is often impossible without discarding one side’s writes.
Synchronous/Semi-sync: Still vulnerable if not paired with quorum-based consensus or an external arbitrator (like ZooKeeper).
Consensus-based (Raft): Prevents split-brain automatically. Only the partition holding a quorum (majority) can write; a minority partition rejects writes. This is why you’ll often see 3-node Raft clusters—a partition leaves one side with 2 nodes (a majority) and the other with 1 (a minority).
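The partition behavior reduces to a simple quorum check. A sketch (not a full Raft implementation):

```python
def can_accept_writes(reachable_nodes, cluster_size):
    """A partition may commit writes only if it can reach a majority."""
    return reachable_nodes >= cluster_size // 2 + 1

# 3-node cluster split 2/1: only the 2-node side keeps writing
assert can_accept_writes(2, 3) is True
assert can_accept_writes(1, 3) is False
# even split of a 4-node cluster: NEITHER side has a majority
assert can_accept_writes(2, 4) is False
```

The last assertion is why consensus clusters use odd node counts: an even cluster can partition into two halves where no one can write.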
Choosing Your Replication Strategy
Banking and Financial Systems
Recommendation: Synchronous or consensus-based replication (Raft, Paxos). Every transaction must be durable. Cross-region replication is often asynchronous, with synchronous replication within a region and automatic failover between regions. Some banks use synchronous writes to two datacenters in the same metro area (high-speed link) and async writes to a backup region.
E-commerce Platforms
Recommendation: Semi-synchronous or asynchronous with careful monitoring. Order data is important, but a few seconds of lag is acceptable if the primary is healthy. Once data hits the primary’s disk, you accept some replication lag. Pair with regular backups and point-in-time recovery capabilities.
Social Media and Content Platforms
Recommendation: Asynchronous. Users tolerate eventual consistency. A new post might take 100ms to appear on a replica, and that’s fine. Throughput is paramount. Use read replicas liberally for queries.
Caching and Session Stores
Recommendation: Asynchronous or even no replication. If the primary (Redis, Memcached) fails, re-populate from a slower source (database, API). Replication lag doesn’t matter because the data is inherently temporary.
Analytics and Data Warehouses
Recommendation: Asynchronous, or batch-based replication. Queries are infrequent and batched. Lag of minutes or hours is acceptable. Optimize for throughput on the primary.
Pro Tips and Monitoring
Pro Tip: Don’t just set synchronous replication and forget it. Monitor your replication lag continuously. Set alerts for lag exceeding a threshold. If lag increases, investigate: is the replica overwhelmed? Is the network slow? Is the primary experiencing high write volume?
Pro Tip: Use a hybrid approach. For critical tables (users, payments), use synchronous replication. For non-critical tables (analytics, logs), use asynchronous. PostgreSQL’s synchronous_commit setting can be changed per transaction (e.g., SET LOCAL synchronous_commit = local), making this split practical.
Pro Tip: Test your failover procedure. If you rely on synchronous replication, can you promote the standby quickly? How long does the primary detect the standby is offline? Does your application retry writes intelligently?
Did you know: PostgreSQL’s synchronous_standby_names parameter is empty by default, meaning all replication is asynchronous. You must explicitly configure synchronous replication. This is a conservative default that prioritizes availability.
Did you know: MySQL’s semi-synchronous mode only requires the replica to receive the write, not confirm it’s durable on disk. PostgreSQL goes further and requires durability before acknowledging. Both have trade-offs.
Replication Lag Monitoring Queries
For PostgreSQL:
-- Check replication lag in bytes and seconds
SELECT
client_addr,
state,
sync_state,
write_lsn,
flush_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) / 1024.0 / 1024.0 AS replay_lag_mb,
EXTRACT(EPOCH FROM write_lag) AS write_lag_seconds
FROM pg_stat_replication;
-- On the replica itself: seconds since the last replayed transaction
SELECT
EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
For MySQL:
-- Check replica lag (requires replication to be running)
SHOW REPLICA STATUS\G
-- Key metrics:
-- Seconds_Behind_Source (Seconds_Behind_Master before MySQL 8.0.22): replication lag in seconds
-- Retrieved_Gtid_Set: what the replica has received
-- Executed_Gtid_Set: what the replica has executed
Key Takeaways
- Synchronous replication ensures every write is replicated before confirming to the client, providing zero data loss at the cost of latency. Use it when durability is non-negotiable.
- Asynchronous replication confirms writes immediately and replicates in the background, offering throughput and low latency at the cost of potential data loss during crashes. The replication lag window defines the risk.
- Semi-synchronous replication is a practical middle ground: confirm when at least one replica acknowledges, reducing the blast radius of failure without the latency penalty of waiting for all replicas.
- Network distance dominates the latency cost of synchronous replication. Same-datacenter replication adds 10–20ms; cross-region adds 100–300ms. This is why synchronous replication is almost never used across geographic regions for write-heavy workloads.
- Consensus-based approaches (Raft, Paxos) eliminate split-brain scenarios by requiring quorum acknowledgment before considering a write committed. They’re more complex but provide both durability and correctness guarantees.
- Choose your mode based on your use case: banking demands synchronous; social media wants asynchronous; most services benefit from semi-synchronous or hybrid approaches per table.
Design Challenge Scenarios
Scenario 1: The Latency Crisis
You’ve just deployed synchronous replication to your e-commerce platform. Write latency jumped from 50ms to 800ms. Customers are complaining. You have three replicas in the same datacenter, all synchronized. What’s the bottleneck, and how do you fix it?
Hints: Consider network round-trip times, replica disk I/O, parallel vs. serial acknowledgment, and whether all three replicas need to acknowledge or just one.
Scenario 2: Designing for Disaster
Your fintech company requires zero data loss for critical transactions. You have data centers in New York, London, and Tokyo. Customers expect sub-500ms write latency. Synchronous replication to all three regions would violate the latency SLA. How do you architect replication to meet both durability and latency requirements?
Hints: Consider a layered approach: synchronous within a region, asynchronous between regions. Think about failover: if the primary region fails, what becomes the primary?
Scenario 3: Handling the Network Partition
Your semi-synchronous MySQL cluster experiences a network partition. The primary is in one partition with one replica; two replicas are in the other partition. Writes queue up on the primary. After 10 seconds, the replica times out and stops acknowledging. MySQL falls back to asynchronous mode. When the partition heals, how do you re-establish consistency? What’s the risk?
Hints: Consider writes that happened after fallback, split-brain risks, and automated vs. manual recovery procedures.
Connection to Replication Lag and Consistency
Understanding synchronous vs. asynchronous replication is foundational because it determines your system’s consistency profile. In the next section, we’ll dive deeper into replication lag—how to measure it, predict it, and design systems that tolerate it. We’ll also explore consistency models and how your replication choice constrains the guarantees you can make to your clients. The relationship is intimate: your replication mode is the primary lever controlling the lag-latency-durability trade-off.