System Design Fundamentals

Discussing Trade-offs

The Hidden Skill Separating Good Engineers from Great Ones

Here’s an uncomfortable truth: in system design interviews, knowing about Cassandra, Redis, and Kafka isn’t what sets you apart. What distinguishes senior engineers from mid-level ones is the ability to articulate why you chose one approach over another.

When you say “I’d use Cassandra here,” the interviewer’s mental response is: “So what? Why Cassandra and not PostgreSQL?” They want to hear you reason through the decision: “We need high write throughput with eventual consistency acceptable for this use case. Cassandra gives us 100k writes/sec with tunable replication. But if strong consistency were a requirement, I’d choose PostgreSQL with read replicas instead, accepting lower write throughput.”

This is the skill that matters. Every architecture decision is a trade-off. Your job in an interview isn’t to find the “correct” answer—it’s to demonstrate that you understand what you’re trading away and why that trade-off is worthwhile given the specific constraints.

The Universal Trade-off Dimensions

Before you can discuss trade-offs effectively, you need to recognize the major dimensions where they exist:

Dimension | Trade-off | Interview Reality
--- | --- | ---
Consistency | Strong vs Eventual | CAP theorem drives many decisions; most modern systems accept eventual consistency for availability
Latency | Milliseconds vs Seconds | Caching adds complexity but critical for user experience; 100ms difference matters
Throughput | Handling more requests | Sharding increases complexity; batch processing trades latency for throughput
Simplicity | Easy to build vs Flexible | Monoliths are simpler; microservices give flexibility but operational complexity
Cost | Dollar amount | Often the deciding factor in real systems; expensive doesn’t always mean better
Durability | Data safety | Replication across regions increases latency and cost

The key insight: there are no universal answers. A decision that’s perfect for a startup is wrong for Meta. A choice that works for a payment system is terrible for a social media feed.

The “It Depends” Framework

The most dangerous words in a system design interview are absolute statements: “Always use X” or “Never do Y.” Real engineers say “it depends.”

Structure your trade-off discussions like this:

  1. State the decision point — “We need to decide how to store user sessions. The main options are:”
  2. Present multiple options — “We could use in-memory cache (Redis), a database (PostgreSQL), or a distributed cache (Memcached)”
  3. Define the criteria that matter — “For this social media platform, we care about: response latency under 50ms, ability to handle 100k concurrent users, and acceptable data loss of up to 5 minutes”
  4. Evaluate each option — “Redis gives us sub-millisecond latency and handles the concurrency, but if the server crashes we lose all sessions. PostgreSQL is durable but adds network latency. Memcached is fastest but doesn’t persist.”
  5. Make a recommendation — “I’d use Redis with replication to another region. We get sub-millisecond latency at the 95th percentile, and cross-region replication protects against losing a single node or a whole region.”
  6. Acknowledge trade-offs — “We’re accepting possible data loss during a simultaneous multi-region failure, which is acceptable given our SLA.”

Then add the crucial part: “But if the requirements changed…”

“If users absolutely couldn’t lose their session, I’d switch to PostgreSQL with primary-replica setup, accepting the 5-10ms latency increase, because durability becomes more important than speed.”

This last sentence is gold. It shows you understand the full decision space.

Pro Tip: Interviewers rarely ask “what would you do?” They ask “what would you change if…?” Anticipating these follow-ups by explicitly stating your trade-offs prevents being caught off-guard.
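
One way to make the framework concrete is a small weighted decision matrix. The sketch below is illustrative only: the `Option` class, the 1 to 5 scores, and the weights are assumptions made up for the session-store example, not measurements. It shows how the same options produce a different recommendation once the criteria weights change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    scores: dict  # criterion -> 1 (poor) .. 5 (excellent); illustrative guesses


# Hypothetical scores for the session-store example; replace with your own estimates.
OPTIONS = [
    Option("Redis", {"latency": 5, "durability": 2, "simplicity": 4}),
    Option("PostgreSQL", {"latency": 3, "durability": 5, "simplicity": 3}),
    Option("Memcached", {"latency": 5, "durability": 1, "simplicity": 4}),
]


def recommend(weights: dict) -> str:
    """Return the name of the option with the highest weighted score."""
    def total(option: Option) -> float:
        return sum(weight * option.scores[criterion]
                   for criterion, weight in weights.items())
    return max(OPTIONS, key=total).name


# Two sets of priorities: the original requirements, and "sessions must never be lost".
latency_first = {"latency": 0.6, "durability": 0.1, "simplicity": 0.3}
durability_first = {"latency": 0.2, "durability": 0.6, "simplicity": 0.2}

print(recommend(latency_first))     # Redis
print(recommend(durability_first))  # PostgreSQL
```

The numbers matter less than the habit: write down the criteria, weigh them, and notice which requirement change flips the answer.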

How to Structure a Complete Trade-off Discussion

Let’s walk through a database choice for a real-world problem: building the persistent storage layer for a social media platform with 100 million users.

The Decision Point

“We need to choose a database for storing user posts. Our requirements are: handle 1M posts per day, serve read queries in under 50ms, strong consistency for the same user’s posts, but eventual consistency across different users’ feeds.”

The Options

We realistically have three candidates:

  • PostgreSQL with read replicas
  • DynamoDB (managed NoSQL)
  • Cassandra (distributed NoSQL)

The Criteria

What actually matters here?

  • Write throughput: 1M/day ≈ 12 writes/sec (peak maybe 100/sec)
  • Read throughput: Much higher; feeds require reading thousands of posts/second
  • Consistency model: Strong within a user’s own posts, eventual for global feeds
  • Operational burden: How many people does it take to maintain?
  • Cost: Managed vs self-hosted trade-off
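
The write-throughput estimate above is plain back-of-envelope arithmetic, and it is worth being able to reproduce it on the spot. A minimal sketch, where the 100 writes/sec peak is the rough guess from the list rather than a measured value:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(events_per_day: int) -> float:
    """Convert a daily volume into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


posts_per_day = 1_000_000
average_writes = average_per_second(posts_per_day)
assumed_peak_writes = 100  # rough peak guess, not a measurement

print(f"average: {average_writes:.1f} writes/sec")                                # average: 11.6 writes/sec
print(f"peak-to-average ratio: {assumed_peak_writes / average_writes:.1f}x")  # peak-to-average ratio: 8.6x
```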

Evaluation

PostgreSQL with read replicas:

  • Handles our write volume easily (100/sec is nothing)
  • Strong consistency out of the box
  • Replica lag for reads is manageable (few hundred milliseconds)
  • Requires our ops team to manage failover, backups, scaling
  • Cost is reasonable but grows with data

DynamoDB:

  • Serverless; no operational overhead
  • Handles our volume easily
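  • Eventual consistency by default (strongly consistent reads are available but consume twice the read capacity)
  • Read scaling is built in through automatic partitioning; global secondary indexes add alternate query patterns but are only eventually consistent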
  • More expensive at scale, but no ops cost

Cassandra:

  • Proven at massive scale
  • Handles write throughput beautifully
  • Tunable consistency (can dial it up for important operations)
  • Requires dedicated ops expertise and infrastructure
  • Lower dollar cost than managed services, but high operational cost

The Recommendation

“I’d start with PostgreSQL with read replicas. Our volume doesn’t justify Cassandra’s operational complexity, and for a social platform, the read latency is acceptable. Read replicas give us the scale we need, and strong consistency for a user’s own posts is important.

If we scale to 10B posts per day and PostgreSQL can’t handle the write throughput, then I’d migrate to Cassandra. The pain of managing it becomes worthwhile when PostgreSQL becomes the bottleneck.”

Acknowledge What You’re Giving Up

“We’re accepting eventual consistency for cross-user feeds, which means occasionally users might not see a post immediately after it’s published. We’re also accepting the operational overhead of managing PostgreSQL replicas: failover, backups, monitoring. But our timeline to scale doesn’t justify more complexity yet.”
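
The split between "strong for a user’s own posts" and "eventual for everyone else’s feed" can be sketched as a read-routing rule: send a user’s reads of their own posts to the primary and everything else to a replica. This is one possible approach under assumed names (`choose_node` and the node labels are hypothetical), not the only way to get read-your-writes behavior.

```python
import random


def choose_node(viewer_id: str, author_id: str, replicas: list) -> str:
    """Pick which database node should serve a read of author_id's posts."""
    # Authors reading their own posts go to the primary, so they always
    # see what they just wrote, regardless of replica lag.
    if viewer_id == author_id or not replicas:
        return "primary"
    # Everyone else tolerates a few hundred milliseconds of staleness.
    return random.choice(replicas)


print(choose_node("alice", "alice", ["replica-1", "replica-2"]))  # primary
```

The cost of this rule is extra read load on the primary, which is small when most reads are of other people’s posts.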

This is a complete trade-off discussion. Notice we didn’t just pick a technology—we picked it given our constraints and explained what we’d change if those constraints evolved.

Common Trade-off Categories in Interviews

Certain trade-offs come up repeatedly. Familiarize yourself with each:

SQL vs NoSQL

  • SQL: Strong consistency, flexible queries, ACID transactions, simpler joins
  • NoSQL: Higher throughput, horizontal scaling, simpler data models, operational complexity
  • Choose SQL if: complex queries, relationships matter, strong consistency required
  • Choose NoSQL if: massive scale, simple access patterns, eventual consistency acceptable

Synchronous vs Asynchronous Communication

  • Sync: Simple to reason about, immediate feedback, easier to debug, blocking operations
  • Async: Decoupled systems, higher throughput, eventual consistency, operational complexity
  • Use sync for: payment processing (must complete), user-facing operations
  • Use async for: email, notifications, batch processing, non-critical updates
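
A minimal sketch of mixing the two styles in one request, using only the standard library. The payment call blocks because the user needs the result; the receipt goes onto an in-process queue and is handled later. The function names are hypothetical, and a real system would likely use a durable message broker rather than `queue.Queue`.

```python
import queue
import threading

notifications = queue.Queue()
sent = []  # stand-in for an email provider's outbox


def notification_worker() -> None:
    """Drain the queue in the background; None is the shutdown signal."""
    while True:
        message = notifications.get()
        try:
            if message is None:
                return
            sent.append(message)
        finally:
            notifications.task_done()


def charge_card(amount_cents: int) -> str:
    """Hypothetical payment call; the caller blocks until it finishes."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    return "charged"


def checkout(user: str, amount_cents: int) -> str:
    status = charge_card(amount_cents)        # sync: must complete before we reply
    notifications.put(f"receipt for {user}")  # async: fire and continue
    return status


worker = threading.Thread(target=notification_worker, daemon=True)
worker.start()
print(checkout("alice", 1999))  # charged
notifications.join()            # wait for queued work (only needed in this demo)
notifications.put(None)
worker.join()
print(sent)                     # ['receipt for alice']
```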

Caching Strategy Trade-offs

  • Cache-aside: Simple, but cache misses hit the database
  • Write-through: Keeps cache consistent, but writes are slower
  • Write-behind: Fast writes, but data loss risk if cache fails before write to database
  • Your choice depends on consistency requirements and acceptable staleness
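
The three strategies differ mainly in when the database write happens. The sketch below uses plain dictionaries as stand-ins for the cache and the database, so the class names and structure are illustrative assumptions rather than any library’s API.

```python
class CacheAside:
    def __init__(self, db: dict):
        self.db, self.cache = db, {}

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]       # cache miss: pay the database round trip
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)  # invalidate so the next read reloads


class WriteThrough:
    def __init__(self, db: dict):
        self.db, self.cache = db, {}

    def write(self, key, value):
        self.db[key] = value       # caller waits for the database write
        self.cache[key] = value    # cache is never behind the database


class WriteBehind:
    def __init__(self, db: dict):
        self.db, self.cache, self.pending = db, {}, {}

    def write(self, key, value):
        self.cache[key] = value    # fast: only memory is touched
        self.pending[key] = value  # lost if the process dies before flush()

    def flush(self):
        self.db.update(self.pending)
        self.pending.clear()


db = {}
store = WriteBehind(db)
store.write("post:1", "hello")
print("post:1" in db)  # False: the database has not seen the write yet
store.flush()
print(db["post:1"])    # hello
```

The gap between `write` and `flush` in `WriteBehind` is exactly the data-loss window mentioned above.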

Consistency Models

  • Strong: Immediate visibility of updates, but higher latency, reduced availability
  • Eventual: Higher availability, lower latency, but temporary inconsistency
  • Causal: Middle ground; causally related updates are seen in the same order by everyone, while unrelated updates may appear in different orders
  • Real systems often mix: strong consistency for critical data, eventual for non-critical
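
For replicated stores with tunable consistency, the usual rule of thumb is quorum overlap: with N replicas, a read of R replicas is guaranteed to include the latest acknowledged write to W replicas when R + W > N. A minimal sketch of that check (the function names are made up for illustration):

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    return replicas // 2 + 1


def reads_overlap_writes(n: int, w: int, r: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return r + w > n


n = 3
print(reads_overlap_writes(n, w=1, r=1))                  # False: fast, but reads may be stale
print(reads_overlap_writes(n, w=quorum(n), r=quorum(n)))  # True: 2 + 2 > 3
print(reads_overlap_writes(n, w=3, r=1))                  # True: slow writes, fast reads
```

This is how a system can mix models: use the overlapping settings for critical data and the cheaper ones everywhere else.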

Monolith vs Microservices

  • Monolith: Simple deployment, easier debugging, lower operational overhead, harder to scale components
  • Microservices: Independent scaling, technology flexibility, complexity in coordination
  • Start with a monolith unless there is a clear reason not to (different teams, different tech needs, massive scale)

The “What Could Go Wrong?” Technique

After making a decision, proactively discuss failure modes. This separates thoughtful engineers from ones who haven’t thought through consequences.

Example: “We’ll use Redis for caching. But what if the Redis instance crashes?”

A junior engineer stops here and hopes the interviewer doesn’t ask. A senior engineer continues:

“If Redis crashes, cache misses spike and database load increases. To mitigate:

  1. Use a cluster (multiple Redis nodes with replication)
  2. Implement circuit breakers—if cache is unavailable, degrade gracefully
  3. Pre-warm critical caches on startup
  4. Monitor cache hit rate; alert if it drops below 80%

The trade-off is added complexity and cost for resilience.”
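
The circuit-breaker mitigation can be sketched as a thin wrapper around the read path. This is a simplified illustration under stated assumptions: `CacheWithBreaker`, its parameters, and the use of `ConnectionError` as the failure signal are all hypothetical, and the wrapper does not repopulate the cache after a miss.

```python
import time


class CacheWithBreaker:
    def __init__(self, cache_get, db_get, failure_threshold=3,
                 cooldown_seconds=30.0, clock=time.monotonic):
        self._cache_get = cache_get
        self._db_get = db_get
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def _cache_allowed(self) -> bool:
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._cooldown:
            # Half-open: allow a trial call; a single failure reopens the breaker.
            self._opened_at = None
            self._failures = self._threshold - 1
            return True
        return False

    def get(self, key):
        if self._cache_allowed():
            try:
                value = self._cache_get(key)
                self._failures = 0
                if value is not None:
                    return value
            except ConnectionError:
                self._failures += 1
                if self._failures >= self._threshold:
                    self._opened_at = self._clock()  # stop calling the cache
        # Degrade gracefully: slower, but the request still succeeds.
        return self._db_get(key)
```

While the breaker is open, requests skip the cache entirely instead of waiting on timeouts, which keeps latency predictable at the cost of higher database load.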

Did You Know? Twitter famously scaled their feed service by carefully choosing which consistency guarantees mattered. Post creation required strong consistency. Feed reads used eventual consistency. This selective consistency model let them serve billions of requests cheaply.

Quantifying Trade-offs

Vague statements are weak. Specific numbers are strong.

Weak: “Caching helps with performance”

Strong: “Adding a 5-minute TTL cache reduces database queries by 75%, cutting our database server count from 10 to 3 at peak load, saving $100k/month in infrastructure costs. The trade-off is that users occasionally see stale data up to 5 minutes old, which is acceptable for our use case.”

Always try to attach numbers:

  • Latency impact (add 50ms for X benefit)
  • Cost impact (spend $500k to save $2M/year)
  • Throughput impact (handle 3x more requests)
  • Consistency trade-off (occasional 5-minute staleness vs no data loss)
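
The numbers in the caching example can be sanity-checked in a few lines. The sketch assumes database load scales linearly with server count, which is a simplification; the per-server cost is only what the quoted $100k/month saving implies, not a real price.

```python
import math


def servers_needed(baseline_servers: int, query_reduction: float) -> int:
    """Servers required after removing a fraction of the query load."""
    return math.ceil(baseline_servers * (1 - query_reduction))


baseline = 10
after_cache = servers_needed(baseline, 0.75)  # 10 * 0.25 = 2.5, rounded up
servers_removed = baseline - after_cache
implied_cost_per_server = 100_000 / servers_removed

print(after_cache)                     # 3
print(round(implied_cost_per_server))  # 14286 dollars per server per month
```

If the implied per-server cost looks unrealistic for your environment, that is a signal to revisit the savings claim before saying it out loud.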

Practice: Walk Through Your Own Trade-offs

Here are three scenarios to practice with:

Scenario 1: URL Shortener Database
Requirements: 100M shortened URLs, 10k reads/sec, availability critical
Decision: MySQL vs DynamoDB vs Redis
Structure your analysis using the framework above.

Scenario 2: Real-time Chat System
Requirements: 500k concurrent users, messages must arrive in order, under 100ms latency
Decision: WebSockets vs polling vs Server-Sent Events
Discuss consistency guarantees, scalability, and failure modes.

Scenario 3: Video Recommendation Engine
Requirements: Process 1M videos, serve recommendations in 200ms, 10M daily active users
Decision: Batch processing vs real-time pipeline vs hybrid
Discuss freshness vs consistency vs operational overhead.

When the Interviewer Challenges You

It will happen. You’ll propose a solution and hear: “What if we needed 10x higher throughput?” or “What about data durability?”

This is good. The interviewer is testing whether you can reason adaptively. Your response:

  1. Don’t defend rigidly — “Actually, I chose this because…” sounds defensive
  2. Acknowledge the new constraint — “Good point. If durability became critical, that changes the decision.”
  3. Restructure the trade-off — Walk through the framework again with the new constraint
  4. Show your reasoning — “With durability as a requirement, I’d move from Redis to PostgreSQL, accepting the latency increase because data safety outweighs speed here.”

The interviewer wants to see flexible thinking, not attachment to initial decisions.

Key Takeaways

  • Trade-offs are the currency of system design. Your job is to make them explicit and defensible.
  • Use the framework: Decision point → Options → Criteria → Evaluation → Recommendation → Trade-offs
  • Quantify whenever possible. “Cuts latency by 50%” is better than “faster.”
  • Always explain what you’re giving up. “We’re accepting eventual consistency” is better than ignoring the problem.
  • Proactively discuss failure modes. “If this component fails, here’s how we handle it” shows maturity.
  • Master the “it depends” answer. Show that you understand the full decision space and would adjust your choice if constraints changed.

Next, we’ll catalog the actual questions you’re likely to encounter and provide solution frameworks for each.