Recommended Books
Books offer something that blog posts, videos, and conference talks simply cannot: depth. When you finish a technical book, you don’t just understand individual concepts — you’ve internalized an entire philosophy, seen how ideas connect across chapters, and developed a mental model that guides your thinking.
This curated list covers the essential reads for system design, distributed systems, and software architecture. We’ve organized them by topic and noted the experience level each one suits, so you can pick the ones that match where you are in your journey.
Foundational System Design
Designing Data-Intensive Applications by Martin Kleppmann
This is the bible of distributed systems, and there’s no way around it. Kleppmann takes you through storage engines, replication strategies, partitioning schemes, transactions, and consensus algorithms: the core building blocks of nearly every large-scale system you’ll encounter. The writing is unusually clear for such technical material; he explains not just the “what” but the “why” behind design decisions.
This book maps directly to Chapters 8–14 of this textbook, where we dive into databases, caching, and distributed consensus. If you only read one book from this list, make it this one. Best for intermediate and advanced engineers. Beginners should start elsewhere, but come back to this book as you progress.
System Design Interview Volumes 1 & 2 by Alex Xu
If you’re preparing for system design interviews, these volumes are invaluable. They walk you through system designs step by step, from a URL shortener and a chat system to YouTube, Google Maps, and a payment system. Each design starts from a blank page and shows you the thinking process: clarifying requirements, sketching the high-level architecture, diving into individual components, and discussing trade-offs.
These volumes map to Chapters 22–23, where we focus on putting everything together. The strength here is practicality and pattern recognition. You’ll see that many large-scale systems follow similar patterns with domain-specific variations. Best for all levels, but especially valuable for intermediate engineers preparing for interviews.
Web Scalability for Startup Engineers by Artur Ejsmont
This book hits the sweet spot between practical and approachable. Ejsmont covers horizontal scaling, caching layers, asynchronous processing with message queues, search, and monitoring: the patterns that startups actually rely on when they’re growing fast. It’s more accessible than academic texts but more thorough than blog posts.
Maps to Chapters 4–7 on scaling and infrastructure. If you’re building your first scaled system or moving beyond monolithic architecture, this is a great companion. Best for beginners to intermediate engineers who want practical guidance without deep theory.
Software Architecture
Clean Architecture by Robert C. Martin
Architecture isn’t just about servers and databases; it’s also about how you organize code. Martin’s principles of component design, the Dependency Rule, and explicit architectural boundaries apply whether you’re building a monolith or microservices. He argues (persuasively) that a good architecture makes a system easy to test, maintain, and evolve.
Maps to Chapter 16 on system architecture patterns. This book is invaluable if you’re designing the internal structure of services, not just their infrastructure. Best for intermediate engineers who want their systems to remain maintainable as they scale.
Building Microservices by Sam Newman
The definitive guide to microservices: not just “should we use microservices?” but “how do we decompose systems, communicate between services, handle failures, and deploy them?” Newman focuses on organizational structure alongside technical patterns. He’s honest about the trade-offs — microservices aren’t always the answer.
Maps to Chapter 16 on distributed architectures. If you’re designing or migrating to microservices, this book is essential. Best for intermediate to advanced engineers responsible for architecture decisions.
Fundamentals of Software Architecture by Mark Richards & Neal Ford
This book surveys architecture styles (layered, microkernel, event-driven, space-based, microservices, and others), discusses patterns, and, most importantly, teaches you how to analyze trade-offs. Not every system needs the same architecture. Richards and Ford show you how to weigh architecture decisions against characteristics such as scalability, performance, fault tolerance, and cost.
Best for intermediate engineers and above who want a comprehensive overview of architectural options. It trades depth for breadth compared with the other books here, which makes it great for skimming and reference.
Distributed Systems Theory
Distributed Systems by Maarten van Steen & Andrew Tanenbaum
This is more academic than the books above, but it’s worth including because it offers one of the most comprehensive treatments of consistency models, fault tolerance, leader election, and coordination. If you want to understand the theory behind the architectures you’re building, this is the deep dive.
Maps to Chapters 11–13 on distributed consensus and consistency. Best for advanced engineers or those preparing for technical leadership roles. Start with more practical books and return to this one as your experience grows.
Understanding Distributed Systems by Roberto Vitillo
A more recent book that bridges practice and theory. Vitillo explains consistency, availability, replication, and consensus with a focus on what practitioners actually care about. It’s less academic than van Steen and Tanenbaum, but more rigorous than blog posts.
A great middle ground if Designing Data-Intensive Applications feels overwhelming in places and you want another perspective on distributed systems concepts.
Operations and Reliability
Site Reliability Engineering: How Google Runs Production Systems, edited by Betsy Beyer, Chris Jones, Jennifer Petoff & Niall Richard Murphy
Available free online from Google, this book lays out Google’s approach to SLOs, error budgets, monitoring, alerting, and incident response. It’s opinionated (shaped by Google’s scale), but the principles translate to other companies. You’ll learn why reliability isn’t just an engineering problem: it’s also a business decision.
Maps to Chapters 17–18 on monitoring and operational excellence. Whether you’re an engineer or an engineering leader, understanding SRE principles changes how you approach reliability. Best for intermediate engineers and above.
The Phoenix Project by Gene Kim, Kevin Behr & George Spafford
Unlike the other books here, this one is written as a novel. You follow Bill, an IT manager suddenly promoted to run IT operations at a struggling auto-parts company, as he borrows lessons from manufacturing to adopt DevOps principles and save the company. It’s engaging, and the principles stick because you watch the consequences of poor practices play out.
Not a detailed technical reference, but invaluable for understanding the cultural and organizational side of reliability. Best for engineers moving toward leadership or working in organizations struggling with deployment frequency and incident response.
Release It! by Michael Nygard
Nygard covers stability patterns (circuit breakers, bulkheads, timeouts, retry logic) and the anti-patterns that cause production failures. Many of them are grounded in war stories from real outages. If you’ve ever been paged at 2 AM because a failing service brought down your entire platform, you’ll appreciate his focus on real-world failure modes.
Maps to Chapter 17 on resilience patterns. Essential reading for anyone building systems where failures are costly. Best for intermediate engineers responsible for system reliability.
Databases
Database Internals by Alex Petrov
Want to understand how B-trees work? LSM trees? Write-ahead logs? Petrov goes deep into storage engine design, which is crucial for understanding why different databases behave differently; the second half of the book turns to distributed-systems topics such as replication and consensus. This book will change how you think about database selection and tuning.
Maps to Chapters 8–9 on storage systems. This is specialized: you won’t need it at every career stage, but if you’re designing data systems or debugging performance issues, it’s invaluable. Best for advanced engineers or those specializing in data infrastructure.
Reading Order by Experience Level
Just starting out (0–2 years):
- Web Scalability for Startup Engineers
- System Design Interview (Vol. 1)
- Building Microservices (Chapters 1–3)
- Site Reliability Engineering (Introduction and Chapter 2)
Intermediate (2–5 years):
- Designing Data-Intensive Applications (Chapters 1–6; save the consistency and consensus material for later)
- System Design Interview (both volumes)
- Building Microservices
- Clean Architecture
- Release It!
Advanced (5+ years):
- Designing Data-Intensive Applications (complete)
- Distributed Systems by van Steen & Tanenbaum
- Database Internals
- Fundamentals of Software Architecture
- Site Reliability Engineering (deep read with your team)
Pro tip: Don’t try to read all of these in order. Pick one that matches your current challenge or curiosity. Read for 20 minutes a day, consistently; that adds up to roughly a chapter a week, and the effect compounds over a year. And don’t feel pressure to read cover to cover; it’s fine to jump to the chapters that interest you, especially in reference-style books like Database Internals.