System Design Fundamentals

Appendix A: Popular Databases

The database landscape has exploded in recent years. Gone are the days when PostgreSQL or MySQL were your only options. Today, we choose databases based on our specific access patterns, consistency requirements, and scale expectations. This appendix provides a practical reference for the most commonly used databases in system design, organized by category.

Relational Databases

PostgreSQL

PostgreSQL is often called the “Swiss Army knife” of databases—it does nearly everything well. It’s open-source, ACID-compliant, and constantly evolving.

Key Features:

  • Mature and battle-tested (over three decades of development, built on decades of database research)
  • Rich type system (JSON, arrays, custom types, composite types)
  • Powerful extensions ecosystem (PostGIS for geo, pg_trgm for fuzzy matching, TimescaleDB)
  • Full-text search capabilities
  • JSONB for document-like storage with indexing
  • Excellent query optimizer
  • Replication and high-availability solutions (streaming replication, logical replication)

When to Use: General-purpose applications, complex queries, strong ACID guarantees needed, moderate to large scale. If you’re unsure what database to pick, PostgreSQL is often the right starting point.

Typical Use Cases: SaaS platforms, financial systems, e-commerce backends, content management systems.

Considerations: Single-node writes can become a bottleneck at extreme scale. Horizontal sharding is manual and complex. Slower than specialized databases for specific workloads.
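The ACID guarantee is easiest to see in a transaction that must succeed or fail as a unit. Below is a minimal, hypothetical sketch; Python's built-in SQLite stands in for PostgreSQL since both offer the same all-or-nothing commit semantics, and the table and account names are invented.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the transactional behavior
# (commit everything or roll back everything) is the same idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically; any failure rolls the whole transfer back."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

transfer(conn, "alice", "bob", 30)   # succeeds: alice 70, bob 80
transfer(conn, "alice", "bob", 500)  # fails; the partial debit is rolled back
```

Even the failed transfer leaves the data consistent, which is the property financial and e-commerce systems lean on.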

MySQL

MySQL remains the most widely deployed database in the world, particularly for web applications. InnoDB is the default storage engine and provides ACID compliance.

Key Features:

  • Easy to set up and operate
  • Replication (asynchronous by default, with semi-synchronous and group replication options; source-replica and multi-source topologies)
  • InnoDB storage engine with row-level locking
  • Reasonable performance for most web workloads
  • Mature tooling and broad community knowledge

When to Use: Web applications, content platforms, anything where you need reliable replication but don’t need PostgreSQL’s advanced features.

Typical Use Cases: WordPress, Drupal, custom web applications, payment processing.

Considerations: Historically slower than PostgreSQL for complex queries. InnoDB can struggle with very large tables. Sharding requires application-level coordination.
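The application-level sharding mentioned above usually starts with a routing function: hash a stable key and map it to one of N shards. A minimal sketch, with placeholder shard names:

```python
import hashlib

# Hypothetical shard list; in practice these would be connection
# strings or pool handles for each MySQL instance.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]

def shard_for(user_id: str) -> str:
    """Route a user to a shard deterministically.

    Uses a stable hash (not Python's process-randomized hash()) so the
    same user lands on the same shard across processes and restarts.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]
```

The catch is resharding: changing the shard count remaps most keys, which is why larger deployments move to consistent hashing or a directory (lookup-table) scheme.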

Amazon Aurora

Aurora is AWS’s cloud-native relational database. It’s compatible with MySQL and PostgreSQL but reimplemented from the ground up.

Key Features:

  • MySQL/PostgreSQL compatibility (you can migrate easily)
  • Up to 5x the throughput of MySQL and up to 3x that of PostgreSQL (per AWS’s own benchmarks)
  • Shared storage layer (separates compute from storage)
  • Automatic scaling—storage grows automatically up to 128 TB
  • Read replicas are nearly free (share storage layer)
  • Multi-AZ failover in under 30 seconds
  • Serverless option (Aurora Serverless v2) auto-scales compute

When to Use: AWS-native architectures, applications needing high availability with low operational overhead, variable or unpredictable workloads (with Serverless).

Typical Use Cases: Cloud-native web applications, SaaS products on AWS, high-performance OLTP systems.

Considerations: AWS lock-in. More expensive than self-managed PostgreSQL/MySQL. Backups are automatic but tied to AWS (not portable).

Document Databases

MongoDB

MongoDB is the most popular document database. Data is stored as JSON-like documents within collections, offering flexible schema.

Key Features:

  • Flexible, schema-less design (each document can differ)
  • Horizontal scaling via sharding (data distributed by shard key)
  • Aggregation pipeline for complex data transformations
  • Multi-document ACID transactions (as of version 4.0)
  • Replica sets for high availability
  • Rich indexing options
  • Full-text search

When to Use: When your data structure varies or evolves frequently, when you want quick prototyping without schema design, rapid development cycles.

Typical Use Cases: Mobile app backends, IoT data collection, user profiles with variable attributes, content platforms.

Considerations: Write operations are slower than some alternatives. Transactions incur overhead. Schema flexibility becomes a liability without discipline: documents drift apart and data grows inconsistent over time. And over-normalizing documents into many small collections defeats the purpose of the document model.

Pro Tip: Be intentional about your data model even in MongoDB. Flexibility is powerful but can hide data quality issues.
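One way to stay intentional is to validate a small set of required fields at the application boundary while letting everything else vary per document. A hypothetical sketch with invented field names; MongoDB itself can enforce similar rules server-side with `$jsonSchema` collection validators:

```python
# Required fields and their types; everything beyond these may vary
# freely per document -- that's the flexibility we keep.
REQUIRED = {"user_id": str, "email": str}

def validate(doc: dict) -> dict:
    """Reject documents missing a required field; allow anything extra."""
    for field, ftype in REQUIRED.items():
        if not isinstance(doc.get(field), ftype):
            raise ValueError(f"{field} must be a {ftype.__name__}")
    return doc

profiles = []  # stands in for a MongoDB collection

# An extra, document-specific field is fine.
profiles.append(
    validate({"user_id": "u1", "email": "a@example.com", "theme": "dark"})
)

# A document missing a required field is rejected before insert.
rejected = False
try:
    validate({"user_id": "u2"})
except ValueError:
    rejected = True
```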

Amazon DynamoDB

DynamoDB is AWS’s fully managed NoSQL service. You don’t manage servers or capacity—AWS handles it all.

Key Features:

  • Fully managed (serverless)
  • Single-digit millisecond latency at any scale
  • Auto-scaling based on read/write capacity
  • Pay-per-request or provisioned capacity options
  • Built-in replication across Availability Zones
  • Global Secondary Indexes (GSIs) for alternate query patterns
  • TTL for automatic data expiration
  • Streams for change data capture

When to Use: AWS-native architectures, applications with unpredictable traffic, when you want zero operational overhead for the database.

Typical Use Cases: Real-time analytics dashboards, user sessions, leaderboards, notifications, IoT sensor data.

Considerations: Limited query flexibility (you must model your access patterns around partition keys and sort keys). No joins, and transactions are capped in size (on the order of 100 items per transaction). Pricing can surprise you if you’re not careful with capacity planning.
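The partition-key/sort-key model is easier to see in code. The sketch below is a toy in-memory stand-in for a DynamoDB table, using a hypothetical single-table key design (the key formats are invented):

```python
from collections import defaultdict

# Toy model of DynamoDB's key scheme: items live under a partition key
# (pk) and are ordered within the partition by a sort key (sk).
table = defaultdict(dict)  # pk -> {sk: item}

def put_item(pk: str, sk: str, item: dict):
    table[pk][sk] = {"pk": pk, "sk": sk, **item}

def query(pk: str, sk_prefix: str = ""):
    """All items in one partition whose sort key starts with a prefix,
    returned in sort-key order -- the shape of query DynamoDB is fast at."""
    return [table[pk][sk] for sk in sorted(table[pk]) if sk.startswith(sk_prefix)]

# Hypothetical single-table design: one user's profile and orders share
# a partition, distinguished by sort-key prefixes.
put_item("USER#42", "PROFILE", {"name": "Ada"})
put_item("USER#42", "ORDER#2024-01-15", {"total": 30})
put_item("USER#42", "ORDER#2024-03-02", {"total": 99})

orders = query("USER#42", sk_prefix="ORDER#")  # both orders, oldest first
```

Designing those key formats up front is the real work of using DynamoDB well; queries that don't fit the key design require scans or a Global Secondary Index.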

Wide-Column Stores

Apache Cassandra

Cassandra is a distributed, peer-to-peer database designed for massive scale. Every node is equal—no single point of failure.

Key Features:

  • Masterless architecture (every node can accept writes)
  • High write throughput (designed for writes, not reads)
  • Tunable consistency (you choose consistency level per query)
  • Automatic replication across data centers
  • No single point of failure
  • CQL (Cassandra Query Language) similar to SQL
  • Good for time-series and event data

When to Use: When you need to handle millions of writes per second, when you must have multi-datacenter replication, when you accept eventual consistency.

Typical Use Cases: Time-series metrics (monitoring, IoT), event logging, analytics pipelines, activity tracking.

Considerations: Operational complexity is high. Eventual consistency requires application handling of duplicates and out-of-order events. Reads are slower than writes. Need expertise to tune and operate well.
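Tunable consistency comes down to quorum arithmetic: with N replicas, a write acknowledged by W nodes and a read answered by R nodes are guaranteed to overlap (and therefore the read sees the latest acknowledged write) only when R + W > N. A sketch of that rule:

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True when read and write quorums must overlap on at least one
    replica, so a read always touches a node with the latest write."""
    return r + w > n

# N=3 replicas, QUORUM (2) for both reads and writes: overlap -> strong.
assert is_strongly_consistent(3, 2, 2)

# N=3, consistency level ONE for both: no guaranteed overlap -> a read
# may hit a replica that hasn't seen the write yet (eventual consistency).
assert not is_strongly_consistent(3, 1, 1)
```

Cassandra exposes this per query via consistency levels (ONE, QUORUM, ALL, and datacenter-aware variants), so you can pay for strong consistency only where it matters.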

Apache HBase

HBase sits on top of Hadoop’s HDFS and is designed for random read/write access to very large tables.

Key Features:

  • Distributed column-oriented store
  • Strong consistency
  • Excellent for random read/write patterns
  • Integrates with Hadoop ecosystem
  • Good compression ratios for sparse data
  • Replication via HDFS

When to Use: When you’re already in the Hadoop ecosystem, when you need strong consistency with millions of rows, when you want column-oriented compression.

Typical Use Cases: Time-series data in Hadoop, large-scale analytics, messaging platforms.

Considerations: Operational overhead is substantial. Slower than Cassandra for pure write throughput. Usually requires expertise in Hadoop.

Key-Value Stores

Redis

Redis is an in-memory data structure store. It’s fast (sub-millisecond latency), but durability is limited unless persistence is explicitly configured.

Key Features:

  • Ultra-fast (in-memory, single-threaded event loop)
  • Rich data structures (strings, lists, sets, sorted sets, streams, geospatial)
  • Pub/sub messaging
  • Lua scripting for atomic operations
  • Persistence options (RDB snapshots, AOF journaling)
  • Clustering for distribution
  • Replication for high availability
  • Key expiration/TTL built-in

When to Use: Caching, sessions, real-time leaderboards, rate limiting, message queues, feature flags.

Typical Use Cases: Session storage, database caching, leaderboards in gaming, real-time notifications, shopping carts, pub/sub messaging.

Considerations: In-memory means RAM is the bottleneck. Data loss risk if persistence isn’t configured. Not suitable for datasets that don’t fit in memory. Command execution is single-threaded (Redis 6.0+ adds I/O threads, but commands still run serially).

Pro Tip: Use Redis for “hot” data only. It’s perfect as a cache-aside layer in front of a persistent database.
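The cache-aside pattern with TTLs can be sketched without Redis at all. Here a dict with lazy expiration stands in for Redis, and `db_fetch` is a hypothetical database loader:

```python
import time

class TTLCache:
    """Minimal stand-in for a Redis cache with key expiration."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock       # injectable clock makes testing easy
        self.store = {}          # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[key]  # lazy expiration, similar to Redis TTLs
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)

def get_user(cache, db_fetch, user_id):
    """Cache-aside: try the cache, fall back to the database, populate."""
    user = cache.get(user_id)
    if user is None:             # cache miss -> hit the database once
        user = db_fetch(user_id)
        cache.set(user_id, user)
    return user
```

With Redis itself, `SET key value EX ttl` plus `GET` gives the same behavior; the structure of the calling code is identical.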

Memcached

Memcached is simpler than Redis—a distributed memory cache, nothing more.

Key Features:

  • Simple key-value caching
  • Multi-threaded
  • Distributed across servers
  • No persistence
  • Lower CPU overhead than Redis
  • Widely adopted, with mature client libraries across languages and platforms

When to Use: Simple caching, when you don’t need data structures, when you want minimal operational overhead.

Typical Use Cases: Database result caching, rendered page caching, expensive computation caching.

Considerations: No persistence means total data loss on restart. No pub/sub or complex operations. Redis has replaced Memcached for most new projects.

Time-Series Databases

InfluxDB

InfluxDB is purpose-built for metrics and events. Time is a first-class citizen.

Key Features:

  • Optimized for timestamped data ingestion
  • InfluxQL and Flux query languages
  • Downsampling to keep old data compact
  • Retention policies (auto-delete old data)
  • High write throughput
  • Built-in clustering (InfluxDB Enterprise)
  • Schema-less (tags and fields)

When to Use: Metrics collection, monitoring, application telemetry, sensor data.

Typical Use Cases: Infrastructure monitoring, application performance monitoring (APM), IoT sensor data, stock prices.

Considerations: The open-source version is single-node. Clustering requires Enterprise edition. Query performance on large time ranges can be slow.
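Downsampling is the core trick: collapse raw points into fixed-width time buckets and keep only an aggregate per bucket. A minimal sketch of the idea (InfluxDB performs this server-side via retention policies and continuous queries; timestamps here are plain seconds):

```python
from collections import defaultdict

def downsample(points, bucket_seconds):
    """Average raw (timestamp, value) points into fixed-width buckets.

    Returns {bucket_start: mean_value}, the shape a dashboard would
    query instead of the raw stream.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

raw = [(0, 10.0), (30, 20.0), (60, 40.0), (90, 20.0)]
per_minute = downsample(raw, bucket_seconds=60)  # {0: 15.0, 60: 30.0}
```

Keeping raw data for days and downsampled rollups for months is what makes time-series storage affordable at high ingest rates.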

TimescaleDB

TimescaleDB is a PostgreSQL extension for time-series data. You get all of PostgreSQL plus time-series optimizations.

Key Features:

  • PostgreSQL extension (hypertables behave like regular PostgreSQL tables)
  • Full SQL support for time-series queries
  • Automatic partitioning by time
  • Compression for historical data
  • Continuous aggregates for rolling calculations
  • Full relational capabilities

When to Use: When you’re already using PostgreSQL, when you need SQL for time-series analysis, when you want the flexibility of a relational database.

Typical Use Cases: Infrastructure monitoring in PostgreSQL environments, financial time-series, production metrics alongside relational data.

Considerations: Single-node (horizontal scaling is paid/cloud). Less specialized than InfluxDB. Requires PostgreSQL expertise.

Graph Databases

Neo4j

Neo4j stores data as relationships between nodes. The query language Cypher is intuitive for relationship-heavy data.

Key Features:

  • Relationship-first data model
  • Cypher query language (intuitive for graph traversals)
  • ACID transactions
  • Property graph model (nodes and edges have attributes)
  • Pattern matching in queries
  • High performance for relationship queries
  • Community and Enterprise editions

When to Use: Social networks, recommendation engines, knowledge graphs, identity and access management, master data management.

Typical Use Cases: LinkedIn-like networks, “friends of friends” queries, product recommendations, fraud detection (relationship patterns).

Considerations: Operational complexity is moderate. Not ideal for non-relationship data. Licensing can be expensive for Enterprise. Learning Cypher has a curve.

Pro Tip: Graph databases shine when relationships are as important as the data itself. If your queries rarely traverse relationships, a relational database is simpler.
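A “friends of friends” query is just a two-hop traversal. The sketch below uses a plain adjacency dict standing in for a graph database (names are invented); in Neo4j the same question is a two-hop MATCH pattern in Cypher.

```python
# Adjacency sets stand in for a property graph's FRIEND edges.
graph = {
    "ann": {"bob", "cal"},
    "bob": {"ann", "dee"},
    "cal": {"ann", "dee"},
    "dee": {"bob", "cal", "eve"},
    "eve": {"dee"},
}

def friends_of_friends(graph, person):
    """People exactly two hops away: reachable through a friend, but
    excluding the person and their direct friends."""
    direct = graph[person]
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    return two_hops - direct - {person}

friends_of_friends(graph, "ann")  # {"dee"}
```

The point of a graph database is that deeper traversals (3+ hops, variable-length paths) stay fast and expressible, where the equivalent relational query becomes a pile of self-joins.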

Search Engines

Elasticsearch

Elasticsearch is a distributed search and analytics engine built on Lucene. You can index, search, and analyze massive volumes of data.

Key Features:

  • Full-text search
  • Real-time analytics
  • Distributed and scalable
  • REST API
  • Schema flexibility (mappings)
  • Aggregations for bucketing and statistics
  • Part of the ELK Stack (Elasticsearch, Logstash, Kibana)
  • Machine learning features (anomaly detection, forecasting)

When to Use: Log analysis, full-text search, application search functionality, real-time analytics dashboards.

Typical Use Cases: ELK logging stacks, product search in e-commerce, document search, metrics analysis.

Considerations: Not a primary data store (no ACID transactions). Operational complexity is high (cluster management, shard balancing). Expensive at scale. Licensing changed in 2021 (now source-available with restrictions), which prompted the OpenSearch fork.
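Full-text search rests on an inverted index: a map from each term to the documents containing it. A toy sketch of the idea (real engines like Elasticsearch add text analysis, relevance scoring such as BM25, and distribution on top):

```python
from collections import defaultdict

# A few hypothetical documents to index.
docs = {
    1: "distributed search and analytics engine",
    2: "log analysis with a search engine",
    3: "relational databases and transactions",
}

# Build the inverted index: term -> set of doc ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    """Doc ids containing every query term (AND semantics)."""
    results = None
    for term in query.lower().split():
        postings = index.get(term, set())
        results = postings if results is None else results & postings
    return results or set()

search("search engine")  # {1, 2}
```

Because lookups touch only the postings for the query terms, search cost scales with result size rather than corpus size, which is why dedicated search engines beat `LIKE '%term%'` scans.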

NewSQL Databases

CockroachDB

CockroachDB is a distributed SQL database with strong consistency. It speaks PostgreSQL’s wire protocol but scales horizontally.

Key Features:

  • Distributed and highly available (no single point of failure)
  • Strong consistency with SERIALIZABLE isolation
  • PostgreSQL compatibility (same wire protocol)
  • Automatic sharding
  • Multi-region replication
  • SQL interface (standard relational model)
  • ACID transactions across any number of shards

When to Use: When you need horizontal scalability of NoSQL but want the simplicity and consistency of SQL. Financial systems that need strong consistency.

Typical Use Cases: Geo-distributed financial systems, global product catalogs, systems requiring strict ACID across shards.

Considerations: Still smaller community than PostgreSQL. Less operational knowledge available. Transactions have higher latency than single-node databases.

Google Spanner

Spanner is Google’s globally distributed database with strong consistency. It uses atomic clocks (TrueTime) to ensure external consistency.

Key Features:

  • Globally distributed (strong consistency across datacenters)
  • SQL interface
  • ACID transactions
  • Automatic sharding and replication
  • Synchronous replication (high availability)
  • Extremely high availability (99.999%)

When to Use: When you need global distribution with strong consistency, when you want a fully managed database, when you’re in the GCP ecosystem.

Typical Use Cases: Global financial platforms, globally distributed applications requiring strong consistency.

Considerations: Google Cloud only. High operational cost. Overkill for applications that don’t truly need global distribution with consistency.

Database Comparison Matrix

| Database | Type | Consistency | Scaling | Best For | Hosted Option |
| --- | --- | --- | --- | --- | --- |
| PostgreSQL | Relational | ACID | Vertical (manual sharding) | General-purpose, complex queries | AWS RDS, Azure, GCP Cloud SQL, Heroku |
| MySQL | Relational | ACID | Vertical (manual sharding) | Web applications, legacy systems | AWS RDS, Azure, GCP Cloud SQL, PlanetScale |
| Aurora | Relational | ACID | Horizontal (storage auto-scales) | AWS-native, high availability | AWS Aurora (managed) |
| MongoDB | Document | Strong on primary; multi-document ACID since 4.0 | Horizontal (sharding) | Flexible schema, rapid iteration | MongoDB Atlas (managed) |
| DynamoDB | Document/Key-value | Eventual (strongly consistent reads optional) | Horizontal (automatic) | AWS serverless, unpredictable traffic | AWS DynamoDB (fully managed) |
| Cassandra | Wide-column | Tunable | Horizontal (peer-to-peer) | Massive write throughput, time-series | DataStax Astra (managed) |
| HBase | Wide-column | Strong | Horizontal (via HDFS) | Hadoop integration, analytics | Cloudera (managed) |
| Redis | Key-value | Eventual (with replication) | Horizontal (clustering) | Caching, real-time, sessions | AWS ElastiCache, Redis Cloud |
| Memcached | Key-value | None (cache) | Horizontal | Simple caching | AWS ElastiCache, cloud providers |
| InfluxDB | Time-series | Eventual | Horizontal (Enterprise) | Metrics, monitoring, telemetry | InfluxDB Cloud (managed) |
| TimescaleDB | Time-series | ACID | Vertical (horizontal is paid) | Time-series in PostgreSQL | Timescale Cloud (managed) |
| Neo4j | Graph | ACID | Vertical (horizontal in Enterprise) | Social networks, recommendations | Neo4j Aura (managed) |
| Elasticsearch | Search | Eventual | Horizontal | Full-text search, log analysis | Elastic Cloud (managed) |
| CockroachDB | Relational | ACID | Horizontal | SQL at scale, strong consistency | CockroachDB Cloud (managed) |
| Spanner | Relational | ACID | Horizontal (global) | Global distribution, strong consistency | Google Cloud Spanner (managed) |

Choosing the Right Database

Start with the access pattern. Ask yourself:

  • How will I query the data most frequently?
  • What’s my consistency requirement?
  • How much data will I store?
  • What’s my throughput requirement?

Consider the team. Choose databases your team knows or can learn. Operational expertise matters enormously.

Prefer boring, proven technology. PostgreSQL and MySQL are safe defaults. They solve most problems adequately. Choose a specialized database only when your workload truly demands it.

Plan for scale, but don’t over-engineer. Most systems grow into their database choice over months or years. Start simple, monitor, and evolve.

Evaluate managed services. Cloud-managed databases (RDS, DynamoDB, Atlas) trade control for operational simplicity. The operational savings often justify the cost.

Key Takeaways

  • Relational databases (PostgreSQL, MySQL, Aurora) remain the default choice for most applications. They’re mature, well-understood, and handle complex queries well.
  • Document databases (MongoDB, DynamoDB) suit applications with flexible or evolving schemas and rapid development cycles.
  • Wide-column stores (Cassandra, HBase) are necessary only at extreme scale or with specific patterns (high write throughput, multi-region).
  • Key-value stores (Redis) are essential for caching, sessions, and real-time data—not primary stores.
  • Time-series databases (InfluxDB, TimescaleDB) are purpose-built for metrics and events. Use them if that’s your primary use case.
  • Graph databases (Neo4j) shine when relationships are as important as data itself.
  • Search engines (Elasticsearch) are for full-text search and analytics, not ACID transactions.
  • NewSQL (CockroachDB, Spanner) solve the distributed SQL problem but add operational complexity.

Choose the simplest database that satisfies your requirements. You can always migrate later, but doing so is expensive.