Popular Databases
The database landscape has exploded in recent years. Gone are the days when PostgreSQL or MySQL were your only options. Today, we choose databases based on our specific access patterns, consistency requirements, and scale expectations. This appendix provides a practical reference for the most commonly used databases in system design, organized by category.
Relational Databases
PostgreSQL
PostgreSQL is often called the “Swiss Army knife” of databases—it does nearly everything well. It’s open-source, ACID-compliant, and constantly evolving.
Key Features:
- Mature and battle-tested (decades of database research built in)
- Rich type system (JSON, arrays, custom types, composite types)
- Powerful extensions ecosystem (PostGIS for geo, pg_trgm for fuzzy matching, TimescaleDB)
- Full-text search capabilities
- JSONB for document-like storage with indexing
- Excellent query optimizer
- Replication and high-availability solutions (streaming replication, logical replication)
When to Use: General-purpose applications, complex queries, strong ACID guarantees needed, moderate to large scale. If you’re unsure what database to pick, PostgreSQL is often the right starting point.
Typical Use Cases: SaaS platforms, financial systems, e-commerce backends, content management systems.
Considerations: Single-node writes can become a bottleneck at extreme scale. Horizontal sharding is manual and complex. Slower than specialized databases for specific workloads.
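PostgreSQL's JSONB lets you query inside stored documents with operators like `payload->>'user'`. As a minimal, runnable illustration of the same document-in-a-table idea, here is the pattern using Python's built-in sqlite3 (whose JSON1 functions ship with modern SQLite builds); PostgreSQL's JSONB adds GIN indexing and a far richer operator set:

```python
import json
import sqlite3

# Illustration only: SQLite's json_extract stands in for PostgreSQL's
# payload->>'user'. Requires SQLite built with JSON1 (default in modern builds).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute(
    "INSERT INTO events (payload) VALUES (?)",
    (json.dumps({"user": "alice", "action": "login", "meta": {"ip": "10.0.0.1"}}),),
)

# Query into the stored document without a fixed column for every attribute.
row = conn.execute("SELECT json_extract(payload, '$.user') FROM events").fetchone()
print(row[0])  # alice
```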
MySQL
MySQL remains the most widely deployed database in the world, particularly for web applications. InnoDB is the default storage engine and provides ACID compliance.
Key Features:
- Easy to set up and operate
- Replication (asynchronous by default; semi-synchronous and group replication available)
- InnoDB storage engine with row-level locking
- Reasonable performance for most web workloads
- Mature tooling and broad community knowledge
When to Use: Web applications, content platforms, anything where you need reliable replication but don’t need PostgreSQL’s advanced features.
Typical Use Cases: WordPress, Drupal, custom web applications, payment processing.
Considerations: Historically slower than PostgreSQL for complex queries. InnoDB can struggle with very large tables. Sharding requires application-level coordination.
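That application-level coordination usually boils down to deterministic key-to-shard routing: every server must map the same key to the same database. A minimal sketch (the DSN strings and `shard_for` helper are hypothetical):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically map a key (e.g. a user ID) to a shard number.

    A stable hash (not Python's randomized hash()) ensures every
    application server routes the same key to the same shard.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical DSNs: one MySQL instance per shard.
SHARD_DSNS = ["mysql://db0", "mysql://db1", "mysql://db2", "mysql://db3"]

def dsn_for_user(user_id: str) -> str:
    return SHARD_DSNS[shard_for(user_id, len(SHARD_DSNS))]

print(dsn_for_user("user:42") == dsn_for_user("user:42"))  # True: routing is stable
```

Note the trade-off this simple modulo scheme carries: changing the shard count remaps almost every key, which is why production systems reach for consistent hashing or directory-based routing.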
Amazon Aurora
Aurora is AWS’s cloud-native relational database. It’s compatible with MySQL and PostgreSQL, but its storage layer was reimplemented from the ground up.
Key Features:
- MySQL/PostgreSQL compatibility (you can migrate easily)
- Up to 5x the throughput of MySQL and 3x that of PostgreSQL (per AWS’s benchmarks)
- Shared storage layer (separates compute from storage)
- Automatic scaling—storage grows automatically up to 128 TB
- Read replicas are nearly free (share storage layer)
- Multi-AZ failover in under 30 seconds
- Serverless option (Aurora Serverless v2) auto-scales compute
When to Use: AWS-native architectures, applications needing high availability with low operational overhead, variable or unpredictable workloads (with Serverless).
Typical Use Cases: Cloud-native web applications, SaaS products on AWS, high-performance OLTP systems.
Considerations: AWS lock-in. More expensive than self-managed PostgreSQL/MySQL. Backups are automatic but tied to AWS (not portable).
Document Databases
MongoDB
MongoDB is the most popular document database. Data is stored as JSON-like documents within collections, offering flexible schema.
Key Features:
- Flexible, schema-less design (each document can differ)
- Horizontal scaling via sharding (data distributed by shard key)
- Aggregation pipeline for complex data transformations
- Multi-document ACID transactions (as of version 4.0)
- Replica sets for high availability
- Rich indexing options
- Full-text search
When to Use: When your data structure varies or evolves frequently, when you want quick prototyping without schema design, rapid development cycles.
Typical Use Cases: Mobile app backends, IoT data collection, user profiles with variable attributes, content platforms.
Considerations: Write operations are slower than some alternatives. Multi-document transactions incur overhead. Schema flexibility becomes a liability without discipline—documents drift out of sync over time. And modeling documents like fully normalized relational tables defeats the purpose of a document store.
Pro Tip: Be intentional about your data model even in MongoDB. Flexibility is powerful but can hide data quality issues.
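One way to stay intentional is a validation step at the application boundary (MongoDB also supports server-side schema validation via `$jsonSchema`). A minimal sketch, with a hypothetical `validate_profile` helper:

```python
def validate_profile(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document is acceptable.

    MongoDB will happily store anything, so a check like this keeps
    "flexible" from quietly becoming "inconsistent".
    """
    problems = []
    for field, expected in (("user_id", str), ("created_at", str)):
        if field not in doc:
            problems.append(f"missing required field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems

print(validate_profile({"user_id": "u1", "created_at": "2024-01-01"}))  # []
print(validate_profile({"user_id": 123}))
# ['user_id should be str', 'missing required field: created_at']
```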
Amazon DynamoDB
DynamoDB is AWS’s fully managed NoSQL service. You don’t manage servers or capacity—AWS handles it all.
Key Features:
- Fully managed (serverless)
- Single-digit millisecond latency at any scale
- Auto-scaling based on read/write capacity
- Pay-per-request or provisioned capacity options
- Built-in replication across Availability Zones
- Global Secondary Indexes (GSIs) for alternate query patterns
- TTL for automatic data expiration
- Streams for change data capture
When to Use: AWS-native architectures, applications with unpredictable traffic, when you want zero operational overhead for the database.
Typical Use Cases: Real-time analytics dashboards, user sessions, leaderboards, notifications, IoT sensor data.
Considerations: Limited query flexibility (you must think in terms of partition keys and sort keys). No joins, and transaction support is limited compared to relational databases. Pricing can surprise you if you’re not careful with capacity planning.
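Thinking in partition and sort keys often leads to a single-table design where composite keys encode the access pattern. A sketch of the idea—the `USER#` / `ORDER#` key prefixes are a common convention, simulated here over a plain list rather than a real table:

```python
# Hypothetical single-table design: items keyed by (PK, SK) answer
# "all orders for a user" with one key-prefix query -- no joins needed.
def order_item(user_id: str, order_id: str, total: int) -> dict:
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}", "total": total}

items = [
    order_item("alice", "2024-001", 30),
    order_item("alice", "2024-002", 45),
    order_item("bob", "2024-003", 12),
]

# The equivalent of Query(PK = "USER#alice", SK begins_with "ORDER#"),
# simulated over a list for illustration.
alice_orders = [
    it for it in items
    if it["PK"] == "USER#alice" and it["SK"].startswith("ORDER#")
]
print(len(alice_orders))  # 2
```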
Wide-Column Stores
Apache Cassandra
Cassandra is a distributed, peer-to-peer database designed for massive scale. Every node is equal—no single point of failure.
Key Features:
- Masterless architecture (every node can accept writes)
- High write throughput (log-structured storage makes writes cheaper than reads)
- Tunable consistency (you choose consistency level per query)
- Automatic replication across data centers
- No single point of failure
- CQL (Cassandra Query Language) similar to SQL
- Good for time-series and event data
When to Use: When you need to handle millions of writes per second, when you must have multi-datacenter replication, when you accept eventual consistency.
Typical Use Cases: Time-series metrics (monitoring, IoT), event logging, analytics pipelines, activity tracking.
Considerations: Operational complexity is high. Eventual consistency requires application handling of duplicates and out-of-order events. Reads are slower than writes. Need expertise to tune and operate well.
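Tunable consistency comes down to simple arithmetic: with replication factor N, a read at level R and a write at level W are strongly consistent only when R + W > N, because the read and write replica sets must then overlap. A sketch:

```python
def quorum(n: int) -> int:
    """Cassandra's QUORUM level: a majority of the N replicas."""
    return n // 2 + 1

def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """Strong consistency holds when read and write replica sets must overlap."""
    return r + w > n

n = 3  # replication factor
print(quorum(n))                                        # 2
print(reads_see_latest_write(n, quorum(n), quorum(n)))  # True: QUORUM + QUORUM
print(reads_see_latest_write(n, 1, 1))                  # False: ONE + ONE is eventual
```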
Apache HBase
HBase sits on top of Hadoop’s HDFS and is designed for random read/write access to very large tables.
Key Features:
- Distributed column-oriented store
- Strong consistency
- Excellent for random read/write patterns
- Integrates with Hadoop ecosystem
- Good compression ratios for sparse data
- Replication via HDFS
When to Use: When you’re already in the Hadoop ecosystem, when you need strong consistency with millions of rows, when you want column-oriented compression.
Typical Use Cases: Time-series data in Hadoop, large-scale analytics, messaging platforms.
Considerations: Operational overhead is substantial. Slower than Cassandra for pure write throughput. Usually requires expertise in Hadoop.
Key-Value Stores
Redis
Redis is an in-memory data structure store. It’s fast—sub-millisecond latency—but durable only to the extent that persistence is configured.
Key Features:
- Ultra-fast (in-memory, single-threaded event loop)
- Rich data structures (strings, lists, sets, sorted sets, streams, geospatial)
- Pub/sub messaging
- Lua scripting for atomic operations
- Persistence options (RDB snapshots, AOF journaling)
- Clustering for distribution
- Replication for high availability
- Key expiration/TTL built-in
When to Use: Caching, sessions, real-time leaderboards, rate limiting, message queues, feature flags.
Typical Use Cases: Session storage, database caching, leaderboards in gaming, real-time notifications, shopping carts, pub/sub messaging.
Considerations: In-memory means RAM is the bottleneck. Data loss risk if persistence isn’t configured. Not suitable for datasets that don’t fit in memory. Command execution is single-threaded (Redis 6.0+ adds threaded I/O).
Pro Tip: Use Redis for “hot” data only. It’s perfect as a cache-aside layer in front of a persistent database.
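The cache-aside pattern looks like this in practice. A plain dict stands in for Redis so the sketch runs standalone; in production you’d issue `GET`/`SETEX` against a real Redis instance:

```python
import time

# A dict stands in for Redis here; the TTL mimics Redis key expiration.
cache: dict[str, tuple[float, str]] = {}  # key -> (expires_at, value)
TTL_SECONDS = 60.0

def slow_database_lookup(user_id: str) -> str:
    return f"profile-for-{user_id}"  # pretend this is an expensive SQL query

def get_profile(user_id: str) -> str:
    key = f"profile:{user_id}"
    entry = cache.get(key)
    if entry and entry[0] > time.monotonic():             # cache hit, not expired
        return entry[1]
    value = slow_database_lookup(user_id)                 # cache miss: hit the DB
    cache[key] = (time.monotonic() + TTL_SECONDS, value)  # like SETEX key 60 value
    return value

print(get_profile("42"))  # profile-for-42 (miss -> database)
print(get_profile("42"))  # profile-for-42 (hit -> cache)
```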
Memcached
Memcached is simpler than Redis—a distributed memory cache, nothing more.
Key Features:
- Simple key-value caching
- Multi-threaded
- Distributed across servers
- No persistence
- Lower CPU overhead than Redis
- Broad client library support across languages and platforms
When to Use: Simple caching, when you don’t need data structures, when you want minimal operational overhead.
Typical Use Cases: Database result caching, rendered page caching, expensive computation caching.
Considerations: No persistence means total data loss on restart. No pub/sub or complex operations. Redis has replaced Memcached for most new projects.
Time-Series Databases
InfluxDB
InfluxDB is purpose-built for metrics and events. Time is a first-class citizen.
Key Features:
- Optimized for timestamped data ingestion
- InfluxQL and Flux query languages
- Downsampling to keep old data compact
- Retention policies (auto-delete old data)
- High write throughput
- Built-in clustering (InfluxDB Enterprise)
- Schema-less (tags and fields)
When to Use: Metrics collection, monitoring, application telemetry, sensor data.
Typical Use Cases: Infrastructure monitoring, application performance monitoring (APM), IoT sensor data, stock prices.
Considerations: The open-source version is single-node. Clustering requires Enterprise edition. Query performance on large time ranges can be slow.
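Downsampling means replacing raw points with coarser aggregates—say, per-minute averages—once fine resolution no longer matters. A minimal sketch of the idea (InfluxDB does this server-side via retention policies and continuous queries):

```python
from collections import defaultdict

def downsample(points, bucket_seconds=60):
    """Average raw (timestamp, value) points into fixed time buckets.

    Keeping coarse averages for old data is how time-series databases
    stay compact without discarding the historical signal entirely.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {b: sum(vs) / len(vs) for b, vs in sorted(buckets.items())}

raw = [(0, 10.0), (30, 20.0), (61, 40.0), (119, 60.0)]
print(downsample(raw))  # {0: 15.0, 60: 50.0}
```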
TimescaleDB
TimescaleDB is a PostgreSQL extension for time-series data. You get all of PostgreSQL plus time-series optimizations.
Key Features:
- PostgreSQL extension (hypertables look and behave like ordinary PostgreSQL tables)
- Full SQL support for time-series queries
- Automatic partitioning by time
- Compression for historical data
- Continuous aggregates for rolling calculations
- Full relational capabilities
When to Use: When you’re already using PostgreSQL, when you need SQL for time-series analysis, when you want the flexibility of a relational database.
Typical Use Cases: Infrastructure monitoring in PostgreSQL environments, financial time-series, production metrics alongside relational data.
Considerations: Single-node (horizontal scaling is paid/cloud). Less specialized than InfluxDB. Requires PostgreSQL expertise.
Graph Databases
Neo4j
Neo4j stores data as relationships between nodes. The query language Cypher is intuitive for relationship-heavy data.
Key Features:
- Relationship-first data model
- Cypher query language (intuitive for graph traversals)
- ACID transactions
- Property graph model (nodes and edges have attributes)
- Pattern matching in queries
- High performance for relationship queries
- Community and Enterprise editions
When to Use: Social networks, recommendation engines, knowledge graphs, identity and access management, master data management.
Typical Use Cases: LinkedIn-like networks, “friends of friends” queries, product recommendations, fraud detection (relationship patterns).
Considerations: Operational complexity is moderate. Not ideal for non-relationship data. Licensing can be expensive for Enterprise. Learning Cypher has a curve.
Pro Tip: Graph databases shine when relationships are as important as the data itself. If your queries rarely traverse relationships, a relational database is simpler.
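A “friends of friends” query is just a two-hop traversal. A Cypher pattern roughly like `MATCH (a)-[:FRIEND*2]-(fof)` expresses it declaratively; here is the equivalent over a plain adjacency map, for illustration:

```python
# Tiny adjacency-list graph standing in for a property graph.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob"},
    "erin": {"carol"},
}

def friends_of_friends(start: str) -> set[str]:
    """People exactly two hops away, excluding direct friends and self."""
    two_hops = set()
    for friend in friends[start]:
        two_hops |= friends[friend]
    return two_hops - friends[start] - {start}

print(sorted(friends_of_friends("alice")))  # ['dave', 'erin']
```

In a relational database this query is a self-join per hop; a graph database keeps it a constant-time pointer chase per edge, which is why deep traversals are where Neo4j pulls ahead.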
Search Engines
Elasticsearch
Elasticsearch is a distributed search and analytics engine built on Lucene. You can index, search, and analyze massive volumes of data.
Key Features:
- Full-text search
- Real-time analytics
- Distributed and scalable
- REST API
- Schema flexibility (mappings)
- Aggregations for bucketing and statistics
- Part of the ELK Stack (Elasticsearch, Logstash, Kibana)
- Machine learning features (anomaly detection, forecasting)
When to Use: Log analysis, full-text search, application search functionality, real-time analytics dashboards.
Typical Use Cases: ELK logging stacks, product search in e-commerce, document search, metrics analysis.
Considerations: Not a primary data store (no ACID). Operational complexity is high (cluster management, shard balancing). Expensive at scale. Licensing recently changed (now source-available with restrictions).
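Under the hood, full-text search rests on an inverted index: a map from each term to the set of documents containing it. A toy version of the structure behind Lucene (real analyzers also lowercase, stem, and rank results by relevance):

```python
from collections import defaultdict

docs = {
    1: "distributed search and analytics engine",
    2: "log analysis with the elk stack",
    3: "full text search for products",
}

# Build the inverted index: term -> set of document IDs.
index: defaultdict[str, set[int]] = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.split():  # naive tokenizer; real analyzers do much more
        index[token].add(doc_id)

def search(query: str) -> set[int]:
    """Documents containing every query term (an AND query)."""
    ids = [index[t] for t in query.split()]
    return set.intersection(*ids) if ids else set()

print(sorted(search("search")))     # [1, 3]
print(sorted(search("full text")))  # [3]
```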
NewSQL Databases
CockroachDB
CockroachDB is a distributed SQL database with strong consistency. It speaks PostgreSQL’s wire protocol but scales horizontally.
Key Features:
- Distributed and highly available (no single point of failure)
- Strong consistency with SERIALIZABLE isolation
- PostgreSQL compatibility (same wire protocol)
- Automatic sharding
- Multi-region replication
- SQL interface (standard relational model)
- ACID transactions across any number of shards
When to Use: When you need horizontal scalability of NoSQL but want the simplicity and consistency of SQL. Financial systems that need strong consistency.
Typical Use Cases: Geo-distributed financial systems, global product catalogs, systems requiring strict ACID across shards.
Considerations: Still smaller community than PostgreSQL. Less operational knowledge available. Transactions have higher latency than single-node databases.
Google Spanner
Spanner is Google’s globally distributed database with strong consistency. It uses atomic clocks (TrueTime) to ensure external consistency.
Key Features:
- Globally distributed (strong consistency across datacenters)
- SQL interface
- ACID transactions
- Automatic sharding and replication
- Synchronous replication (high availability)
- Extremely high availability (99.999%)
When to Use: When you need global distribution with strong consistency, when you want a fully managed database, when you’re in the GCP ecosystem.
Typical Use Cases: Global financial platforms, globally distributed applications requiring strong consistency.
Considerations: Google Cloud only. High cost. Overkill for applications that don’t truly need global distribution with strong consistency.
Database Comparison Matrix
| Database | Type | Consistency | Scaling | Best For | Hosted Option |
|---|---|---|---|---|---|
| PostgreSQL | Relational | ACID | Vertical (sharding manual) | General-purpose, complex queries | AWS RDS, Azure, GCP Cloud SQL, Heroku |
| MySQL | Relational | ACID | Vertical (sharding manual) | Web applications, legacy systems | AWS RDS, Azure, GCP Cloud SQL, PlanetScale |
| Aurora | Relational | ACID | Horizontal (storage auto-scales) | AWS-native, high availability | AWS Aurora (managed) |
| MongoDB | Document | Strong on primary (multi-doc ACID since 4.0) | Horizontal (sharding) | Flexible schema, rapid iteration | MongoDB Atlas (managed) |
| DynamoDB | Document/Key-value | Eventual by default (strong reads optional) | Horizontal (auto) | AWS serverless, unpredictable traffic | AWS DynamoDB (fully managed) |
| Cassandra | Wide-column | Tunable | Horizontal (peer-to-peer) | Massive write throughput, time-series | Astra (DataStax managed) |
| HBase | Wide-column | Strong | Horizontal (via HDFS) | Hadoop integration, analytics | Cloudera, Amazon EMR |
| Redis | Key-value | Eventual (with replication) | Horizontal (clustering) | Caching, real-time, sessions | AWS ElastiCache, Redis Cloud |
| Memcached | Key-value | None (cache) | Horizontal | Simple caching | AWS ElastiCache, cloud providers |
| InfluxDB | Time-series | Eventual | Horizontal (Enterprise) | Metrics, monitoring, telemetry | InfluxDB Cloud (managed) |
| TimescaleDB | Time-series | ACID | Vertical (horizontal via cloud offering) | Time-series in PostgreSQL | Timescale Cloud (managed) |
| Neo4j | Graph | ACID | Vertical (Enterprise horizontal) | Social networks, recommendations | Neo4j Aura (managed) |
| Elasticsearch | Search | Eventual | Horizontal | Full-text search, log analysis | Elastic Cloud (managed) |
| CockroachDB | Relational | ACID | Horizontal | SQL at scale, strong consistency | CockroachDB Cloud (managed) |
| Spanner | Relational | ACID | Horizontal (global) | Global distribution, strong consistency | Google Cloud Spanner (managed) |
Choosing the Right Database
Start with the access pattern. Ask yourself:
- How will I query the data most frequently?
- What’s my consistency requirement?
- How much data will I store?
- What’s my throughput requirement?
Consider the team. Choose databases your team knows or can learn. Operational expertise matters enormously.
Prefer boring, proven technology. PostgreSQL and MySQL are the safe default. They solve most problems adequately. Choose a specialized database only when your workload truly demands it.
Plan for scale, but don’t over-engineer. Most systems grow into their database choice over months or years. Start simple, monitor, and evolve.
Evaluate managed services. Cloud-managed databases (RDS, DynamoDB, Atlas) trade control for operational simplicity. The operational savings often justify the cost.
Key Takeaways
- Relational databases (PostgreSQL, MySQL, Aurora) remain the default choice for most applications. They’re mature, well-understood, and handle complex queries well.
- Document databases (MongoDB, DynamoDB) suit applications with flexible or evolving schemas and rapid development cycles.
- Wide-column stores (Cassandra, HBase) are necessary only at extreme scale or with specific patterns (high write throughput, multi-region).
- Key-value stores (Redis) are essential for caching, sessions, and real-time data—not primary stores.
- Time-series databases (InfluxDB, TimescaleDB) are purpose-built for metrics and events. Use them if that’s your primary use case.
- Graph databases (Neo4j) shine when relationships are as important as data itself.
- Search engines (Elasticsearch) are for full-text search and analytics, not ACID transactions.
- NewSQL (CockroachDB, Spanner) solve the distributed SQL problem but add operational complexity.
Choose the simplest database that satisfies your requirements. You can always migrate later, but doing so is expensive.