Time-Series Databases
The Monitoring Challenge at Scale
Imagine you’re building a monitoring platform for a cloud infrastructure provider. Your customers have 10,000 servers spread across multiple regions. Each server emits metrics every 10 seconds: CPU utilization, memory usage, disk I/O, network throughput. That’s 4 metrics × 10,000 servers = 40,000 data points every 10 seconds. Extrapolate that: 240,000 writes per minute, more than 14 million per hour, roughly 350 million per day.
Try storing this in PostgreSQL, MySQL, or any general-purpose relational database. You’ll quickly hit write saturation. Your indexes bloat. Storage balloons. Query latency spirals. Old monitoring data stays around indefinitely, consuming disk space. You need something fundamentally different—a database purpose-built for time-series workloads.
Welcome to time-series databases (TSDBs).
This chapter builds on our coverage of storage engines and columnar formats from earlier sections. We’ll explore what makes TSDBs different, their internal architectures, when to use them, and how to design systems around them. By the end, you’ll understand why companies like Netflix, Uber, and Google rely on specialized time-series storage rather than forcing time-series data into general-purpose systems.
Understanding Time-Series Data
Time-series data is fundamentally different from transactional or analytical data. Let’s define its characteristics:
What is time-series data? A sequence of observations indexed by timestamp. Examples: stock prices, sensor readings, application metrics, user events, financial ticks, system logs, weather measurements. Each data point has a timestamp and one or more values.
Key characteristics:
- Append-heavy writes — Data arrives in time order. You rarely update or delete historical readings; it’s almost always insert-only, unlike transactional systems where updates are common.
- Time-ordered — Data naturally arrives sorted by timestamp. Queries almost always filter by time range. Sequential access patterns dominate.
- High cardinality — Many unique labels or tags. A monitoring metric might be tagged by server, region, instance type, and application version, so a single metric name can have millions of unique tag combinations.
- Natural expiration — Old data becomes less valuable. You might keep raw data for 7 days but aggregated data for a year. Data naturally ages out of the system.
- High write volume, lower query diversity — Writes are massive and uniform. Queries follow predictable patterns: “What was the 95th percentile CPU in region US-EAST over the last hour?” or “Show me all error rate spikes in the last day.”
Why general-purpose databases fail:
A traditional relational database optimizes for random access and flexible queries. It uses B-tree indexes that work well for high-selectivity queries (“find the customer with id=42”). But time-series workloads are different:
- Write amplification — Each insert triggers index updates. With millions of inserts per minute across thousands of tag combinations, index maintenance becomes the bottleneck.
- Index overhead — B-tree indexes assume random access. Time-series access is sequential. You pay overhead you don’t need.
- Storage bloat — No built-in data expiration. Old data accumulates, and compacting historical data is expensive.
- Missed compression opportunities — General-purpose databases don’t exploit temporal redundancy. Time-series values are often highly correlated (CPU usage changes gradually), but generic block compression ignores this.
What makes a time-series database different:
- Columnar storage — Store all values for a metric in a column, not as rows. This enables efficient compression and vectorized queries.
- Time-based partitioning — Organize data by time windows (hourly, daily). Enables efficient retention policies and parallel querying.
- Specialized compression — Delta encoding (store differences, not absolute values), run-length encoding, and Gorilla compression for floats. Time-series data often compresses 10-100x, far beyond what generic compression achieves.
- Downsampling and rollups — Automatically reduce precision for old data. Keep 1-minute granularity for the last 7 days, but aggregate to hourly for the last year.
- LSM trees instead of B-trees — Optimize for write-heavy workloads with sequential disk I/O, not random access.
The Weather Station Analogy
Imagine a weather station that records temperature, humidity, and pressure every hour. Each reading has a timestamp. You never go back and change yesterday’s 3 PM temperature—it’s immutable. The logbook is optimized for questions like “What was the temperature between 2 PM and 4 PM yesterday?” or “Show me the average humidity for every day in March.”
But it’s terrible for questions like “Find all readings where temperature equals 72°F exactly” (random access, not time-ordered). And old readings don’t stay at full granularity forever. After a month, you discard the hourly data and keep only daily averages. After a year, you summarize to monthly values.
That’s time-series thinking. Your database should reflect this pattern.
Storage Architecture and Compression
The magic of modern TSDBs is in how they store and compress data. Let’s dive deeper.
LSM Trees vs. B-Trees
We covered LSM (Log-Structured Merge) trees briefly in earlier storage chapters. They’re the default for time-series systems.
B-trees (traditional relational databases):
- Organized for random access
- Every insert walks the tree, touching multiple blocks
- With millions of inserts per second, disk I/O becomes random and costly
- Great for “find this specific key” queries
LSM trees (TSDBs):
- Optimized for sequential writes
- Inserts go to an in-memory buffer (memtable)
- When the buffer fills, it’s flushed to disk as a sorted run
- Multiple runs are periodically merged (compaction)
- Reads check the memtable first, then runs in order
- Downside: Compaction can create write amplification, but it’s predictable and manageable
For time-series, writes arrive in chronological order. LSM trees exploit this: data is naturally sorted by timestamp, compaction is straightforward, and I/O patterns are sequential.
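To make the write path concrete, here is a minimal sketch of the memtable-plus-sorted-runs idea (a toy illustration in Python with made-up names; real engines add write-ahead logs, bloom filters, and background compaction):

```python
import bisect

class TinyLSM:
    """Toy LSM store: a mutable in-memory buffer plus immutable sorted runs.

    Keys are (timestamp, series) tuples, so flushed runs are naturally
    time-ordered -- exactly the pattern time-series writes produce.
    """

    def __init__(self, memtable_limit=4):
        self.memtable = {}              # absorbs all incoming writes
        self.runs = []                  # flushed runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, timestamp, series, value):
        self.memtable[(timestamp, series)] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Sort once at flush time; on a real system this becomes one
        # large sequential disk write instead of many random ones.
        run = sorted(self.memtable.items())
        self.runs.insert(0, run)
        self.memtable = {}

    def get(self, timestamp, series):
        key = (timestamp, series)
        if key in self.memtable:        # newest data first
            return self.memtable[key]
        for run in self.runs:           # then runs, newest to oldest
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None
```

Because metric writes arrive roughly in timestamp order, each flushed run covers a narrow, mostly non-overlapping time window, which keeps both compaction and time-range reads cheap.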
Time-Based Partitioning
TSDBs partition data by time window. InfluxDB creates a shard per time interval (e.g., 24 hours). Prometheus stores data in 2-hour blocks. TimescaleDB on PostgreSQL uses hypertables that are automatically partitioned by time.
Benefits:
- Efficient deletion — Drop a whole shard to expire old data. No expensive row-by-row deletes.
- Parallel processing — Query different time partitions in parallel.
- Tiered storage — Move recent shards to fast storage, older shards to cheaper storage.
- Easier to reason about — Each shard is independent.
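All four benefits fall out of one simple mechanism. Here is a toy shard map keyed by UTC day (a simplified sketch; the class and its daily granularity are illustrative, not any particular product’s API): retention becomes deleting a dictionary entry, and range queries only touch the shards that overlap the requested window.

```python
from collections import defaultdict
from datetime import datetime, timedelta

class TimePartitionedStore:
    """Toy store that shards points into one bucket per calendar day."""

    def __init__(self, retention_days=7):
        self.shards = defaultdict(list)   # day -> list of (ts, series, value)
        self.retention_days = retention_days

    def write(self, ts: datetime, series: str, value: float):
        self.shards[ts.date()].append((ts, series, value))

    def expire(self, now: datetime):
        # Retention = drop whole shards; no row-by-row deletes.
        cutoff = (now - timedelta(days=self.retention_days)).date()
        for day in [d for d in self.shards if d < cutoff]:
            del self.shards[day]

    def query(self, start: datetime, end: datetime, series: str):
        # Only shards overlapping [start, end] are ever touched, so each
        # day's data could also live on a different storage tier.
        day = start.date()
        while day <= end.date():
            for ts, s, value in self.shards.get(day, []):
                if s == series and start <= ts <= end:
                    yield ts, value
            day += timedelta(days=1)
```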
Compression Techniques
Time-series data compresses remarkably well. Here’s why and how:
Delta encoding — Instead of storing absolute values, store the difference from the previous value. CPU at time T=0 is 65%. At T=1 it’s 66%. Store “1” instead of “66”. Since metrics change gradually, deltas are tiny.
Run-length encoding — If a value repeats (e.g., disk utilization is stuck at 80%), store it once with a repeat count instead of repeating it millions of times.
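A rough sketch of how these two ideas combine on a slowly changing gauge (illustrative only, not any specific TSDB’s on-disk encoder):

```python
def delta_rle_encode(values):
    """Delta-encode a series, then run-length-encode the deltas.

    A CPU gauge like [65, 66, 66, 66, 67, 67] becomes deltas
    [65, 1, 0, 0, 1, 0] and then runs [(65, 1), (1, 1), (0, 2), (1, 1), (0, 1)].
    """
    if not values:
        return []
    deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
    runs = []
    for d in deltas:
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + 1)     # extend the current run
        else:
            runs.append((d, 1))                 # start a new run
    return runs

def delta_rle_decode(runs):
    total, out = 0, []
    for delta, count in runs:
        for _ in range(count):
            total += delta
            out.append(total)
    return out
```

The small deltas also fit into far fewer bits than the absolute values when paired with a variable-length integer encoding, which is where most of the real savings come from.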
Gorilla compression — Developed at Facebook for floating-point time-series (monitoring metrics, sensor readings). It stores:
- The XOR of each value’s bit pattern with the previous value’s (consecutive floats are similar, so the XOR is mostly zero bits)
- A variable-length encoding of the few meaningful bits that remain
In Facebook’s Gorilla paper, this averaged about 1.37 bytes per data point on their production monitoring data, down from 16 bytes uncompressed.
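The XOR step at the heart of Gorilla can be sketched in a few lines (illustrative only; the real format adds control bits, leading-zero bookkeeping, and timestamp compression):

```python
import struct

def as_bits(x: float) -> int:
    """Reinterpret a 64-bit float's bit pattern as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_deltas(values):
    """Yield each value XORed with its predecessor's bit pattern."""
    prev = None
    for v in values:
        bits = as_bits(v)
        yield bits if prev is None else bits ^ prev
        prev = bits

# Consecutive readings differ only in a few mantissa bits, so the XOR is
# mostly zeros and can be stored with very few meaningful bits.
for x in xor_deltas([65.0, 65.0, 65.5, 66.0]):
    print(f"{x:064b}")
```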
A typical TSDB achieves 10-100x compression on raw data. A metric stored at 1-minute granularity for a year, across all of its tag combinations, might occupy only a few gigabytes instead of hundreds.
Compaction and Downsampling
Compaction happens automatically. As writes arrive and fill in-memory buffers, the TSDB flushes them to disk as immutable sorted files. Periodically, multiple files are merged into fewer, larger files. This is similar to LSM tree compaction but organized by time.
Downsampling or rollups is different—it’s about reducing precision for old data. A TSDB might run a background job: “For data older than 7 days, compute 1-hour aggregates (sum, avg, max, min, etc.) and discard the raw 1-minute data.” This saves storage while preserving the ability to answer questions like “Was there an issue 2 weeks ago?” at lower granularity.
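As a sketch of what such a rollup job does (simplified; real TSDBs run this incrementally and write the results into separate, longer-retention tables):

```python
from collections import defaultdict

def rollup_hourly(points):
    """Collapse raw (datetime, value) points into per-hour aggregates.

    Returns {hour_start: {"min": ..., "max": ..., "sum": ..., "count": ...}};
    averages can be derived later as sum / count.
    """
    buckets = defaultdict(lambda: {"min": float("inf"), "max": float("-inf"),
                                   "sum": 0.0, "count": 0})
    for ts, value in points:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        b = buckets[hour]
        b["min"] = min(b["min"], value)
        b["max"] = max(b["max"], value)
        b["sum"] += value
        b["count"] += 1
    return dict(buckets)
```

Once the hourly aggregates are durably written, the raw partitions they cover can simply be dropped, which is why rollups pair naturally with time-based partitioning.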
Popular Time-Series Databases
Let’s compare the major players:
| Database | Engine | Model | Best For | Notes |
|---|---|---|---|---|
| InfluxDB | TSM (custom LSM variant) | Push-based, tag-value model | Cloud-native metrics, IoT | Built-in downsampling, excellent query language (InfluxQL, Flux) |
| Prometheus | Custom in-memory + disk | Pull-based | Infrastructure monitoring | Scrapes metrics from targets, local storage per instance, simple and reliable |
| TimescaleDB | PostgreSQL extension | Relational + time-series | Hybrid workloads | Familiar SQL, excellent for mixed transactional + time-series |
| ClickHouse | MergeTree family | Columnar OLAP | Analytics on time-series | Extreme compression, billions of rows per second, SQL interface |
| QuestDB | Custom columnar | Columnar TSDB | High-frequency data | Ultra-low latency writes and queries, designed for modern hardware |
| Apache Druid | Columnar, segment-based | Real-time analytics | Ad-hoc querying on time-series | Strong aggregation queries, sub-second latency |
Choosing between them:
- Self-hosted and simple? Prometheus for metrics, InfluxDB for general time-series.
- Managed cloud service? InfluxDB Cloud or AWS Timestream.
- Already using PostgreSQL? TimescaleDB is a natural fit.
- Extreme write scale (millions/sec)? ClickHouse or QuestDB.
- Analytics with SQL? ClickHouse or Druid.
Query Patterns and Performance
Time-series databases are optimized for specific query patterns:
Range queries — “Give me all CPU readings for server-42 between 2 PM and 3 PM”
SELECT timestamp, cpu_usage
FROM metrics
WHERE host = 'server-42' AND timestamp BETWEEN '14:00' AND '15:00';
Time-based partitioning and columnar storage make these blazing fast. The TSDB loads only the relevant partition and CPU column.
Aggregations with time windows — “Show me average memory usage per minute for the last hour”
SELECT TIME_BUCKET('1 minute', timestamp) AS minute, AVG(memory_usage)
FROM metrics
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY minute;
TSDBs compute aggregates efficiently by scanning a single column in order.
Last-value queries — “What’s the current CPU for all servers?”
SELECT host, LAST(cpu_usage)
FROM metrics
WHERE timestamp > NOW() - INTERVAL '1 minute'
GROUP BY host;
These are typically served from in-memory caches of the latest value per series or from specialized indexes.
High-cardinality GROUP BY — “Show error rate by (region, service, error_type)”
This is harder. If you have 100 regions × 500 services × 1,000 error types, that’s 50 million possible combinations, and a GROUP BY across them gets expensive. Many TSDBs fall back to approximation (HyperLogLog sketches for distinct counts, for example) or cap how many series a single query may touch.
Pro tip: When designing time-series systems, think about your query patterns first. A TSDB optimized for range + aggregation queries might struggle with ad-hoc GROUP BY queries across high cardinality dimensions.
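To see why approximation helps, here is a bare-bones HyperLogLog (an illustrative sketch; production implementations add bias corrections and tuned hashing). It estimates the number of distinct series in fixed memory, no matter how many combinations exist:

```python
import hashlib

class TinyHLL:
    """Minimal HyperLogLog: 2**p registers, no small/large-range corrections."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str):
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        return alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)

hll = TinyHLL()
for region in range(100):
    for service in range(500):
        hll.add(f"errors{{region={region},service={service}}}")
print(round(hll.estimate()))   # close to 50,000, using only 4,096 small registers
```

The memory footprint stays fixed at 2**p registers whether there are fifty series or fifty million, which is exactly the property an exact GROUP BY lacks.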
Managing Cardinality
Cardinality is the number of unique combinations of tag values. In a monitoring system:
Metric: http_request_latency
Tags: region (5 values) × service (200 values) × endpoint (1000 values) × status_code (5 values)
Total combinations: 5M
High cardinality is the Achilles heel of time-series systems:
- Memory overhead — In-memory indexes to track all combinations consume RAM.
- Query complexity — Queries across high-cardinality dimensions become expensive.
- Cardinality explosion — Adding new tags can multiply combinations exponentially.
Best practices:
- Avoid unbounded tag values (e.g., user_id as a tag is dangerous if you have millions of users).
- Use bounded, low-cardinality tags (e.g., region, environment, service_name).
- For high-cardinality data (like user IDs), use fields/columns instead of tags, or store in a separate system.
- Monitor cardinality growth. Some TSDBs have built-in alerts for cardinality explosions.
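One practical way to act on that last point is to track the set of series keys at ingest time and reject, or at least alert on, new combinations past a budget. A simplified sketch with a hypothetical limit:

```python
class CardinalityGuard:
    """Tracks unique series keys and flags writers that exceed a budget."""

    def __init__(self, max_series_per_metric=100_000):
        self.max_series = max_series_per_metric
        self.series_by_metric = {}          # metric name -> set of tag tuples

    def admit(self, metric: str, tags: dict) -> bool:
        # Sort tags so {"a": 1, "b": 2} and {"b": 2, "a": 1} map to one series.
        key = tuple(sorted(tags.items()))
        seen = self.series_by_metric.setdefault(metric, set())
        if key in seen:
            return True
        if len(seen) >= self.max_series:
            # In a real pipeline: drop the point, re-route it, or page the owning team.
            return False
        seen.add(key)
        return True

guard = CardinalityGuard(max_series_per_metric=3)
guard.admit("http_request_latency", {"region": "us-east", "service": "api"})
```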
Retention, Tiering, and Cost
Time-series data is voluminous. Managing lifecycle and cost is critical.
Retention policies — TSDBs support automatic data expiration:
Keep raw data for 7 days
Keep 1-hour aggregates for 30 days
Keep daily aggregates for 1 year
Keep monthly aggregates forever
Tiered storage — Hot data (recent) on SSDs. Warm data (2-4 weeks old) on cheaper storage. Cold data on archival storage. Some TSDBs automate this; others require manual scripting.
Compression vs. compute — Heavier compression saves storage but costs CPU at read time. Bit-level codecs like Gorilla must decode value by value; block codecs such as LZ4 (ClickHouse’s default) trade a lower compression ratio for very fast decompression. Choose based on your read pattern.
Real-World Example: Infrastructure Monitoring Stack
Let’s build a monitoring system for a 5,000-server infrastructure:
Architecture:
- Prometheus in each region (scrapes metrics from servers every 15 seconds)
- Remote write to a centralized InfluxDB or cloud TSDB for long-term storage
- Grafana for visualization and dashboards
- Alertmanager for notifications
Metrics to track:
system.cpu.usage (gauge, %)
system.memory.usage (gauge, %)
system.disk.io.read_bytes (counter)
http.request.latency (histogram, ms)
application.errors (counter)
Cardinality design:
Tags: region (10), environment (3: dev/staging/prod), service (100), instance (5000 max)
Max combinations: 10 × 3 × 100 × 5000 = 15M
This is manageable if instance is the only truly variable tag.
Query example (PromQL):
rate(http_request_latency_sum[5m]) / rate(http_request_latency_count[5m])
Computes the 5-minute average latency by dividing cumulative sum by request count.
Retention:
- Raw 15-second data: 7 days
- 1-minute aggregates: 90 days
- Daily aggregates: 2 years
Volume and cost estimate: 5,000 servers × 30 metrics × one sample every 15 seconds ≈ 864 million data points per day, or roughly 26 billion per month. Managed TSDB pricing models vary widely (per million data points ingested, per host, or per GB stored), so multiply this volume by your vendor’s rates; note that the retention and downsampling schedule above directly shrinks the stored footprint.
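A quick back-of-the-envelope sizing sketch, assuming the sample rate above and a post-compression footprint of about 2 bytes per data point (an assumption for illustration, not a vendor figure):

```python
servers, metrics_per_server = 5_000, 30
samples_per_day = 24 * 60 * 60 // 15           # one sample every 15 seconds

points_per_day = servers * metrics_per_server * samples_per_day
bytes_per_point = 2                            # assumed, after TSDB compression
raw_retention_days = 7

print(f"{points_per_day / 1e6:.0f}M points/day")                      # ~864M
print(f"{points_per_day * bytes_per_point * raw_retention_days / 1e9:.1f} GB "
      f"of raw-resolution data at {raw_retention_days}-day retention")  # ~12 GB
```

At that footprint, disk for the raw tier is trivial; with managed services the dominant cost is usually the ingest pricing itself, which is another reason downsampling and retention matter.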
When NOT to Use a Time-Series Database
TSDBs are powerful but not a silver bullet:
Don’t use if:
- Your data isn’t time-ordered — TSDBs assume append-only, ordered writes. If you’re constantly updating historical records, use PostgreSQL.
- Your dataset is small — A TSDB adds operational complexity. If you have only millions of data points (not billions), a simple relational database works fine.
- You need complex relational queries — “Find all orders placed by users in region X who bought products in category Y and returned them within 30 days.” That’s transactional analysis, not time-series.
- You need strong ACID transactions — TSDBs prioritize write throughput over transactional semantics.
- Cardinality is unbounded — If every data point has a unique ID or label, TSDBs will struggle. Use blob storage or a data lake instead.
Key Takeaways
- Time-series workloads are fundamentally different — Append-only, time-ordered, high write volume. General-purpose databases struggle; purpose-built TSDBs optimize for these patterns using LSM trees, columnar storage, and intelligent compression.
- Compression is dramatic — Delta encoding, run-length encoding, and specialized algorithms like Gorilla achieve 10-100x compression. Time-series data is highly redundant; exploit it.
- Cardinality is the silent killer — High cardinality (millions of tag combinations) breaks many TSDBs. Design your tag schemas carefully. Bounded, low-cardinality tags are your friend.
- Retention and tiering save costs — Automatically downsampling and expiring old data can reduce storage by 90%. Use tiered storage (hot/warm/cold) strategically.
- Choose the right TSDB for your use case — Prometheus for pull-based metrics. InfluxDB for push-based general time-series. TimescaleDB if you have hybrid workloads. ClickHouse if you need analytics at scale.
- Query patterns matter — TSDBs are optimized for range queries and aggregations. Ad-hoc queries across high-cardinality dimensions are slow. Design your data model with your queries in mind.
Practice Scenarios
Scenario 1: IoT Sensor Mesh
You’re building a platform for managing 100,000 IoT devices spread across 50 cities. Each device sends temperature, humidity, and pressure readings every 30 seconds. Design the TSDB architecture:
- How would you partition the data?
- What tags would you use? (Watch cardinality!)
- What retention policy makes sense?
- How would you handle “give me the average temperature for each neighborhood” queries?
Scenario 2: Financial Tick Data
You’re storing stock market tick data: every trade for 5,000 securities. Peak volume is 5 million ticks per second. Design the system:
- Would Prometheus work here? Why or why not?
- What compression technique would you use for floating-point prices?
- How would you handle burst writes during market opens/closes?
- What’s your retention policy? (SEC regulations require 3-year history.)
Scenario 3: Multi-Tenant Observability
You’re building a SaaS observability platform. Each customer can emit metrics, logs, and traces. Customers range from startups (1000 data points/day) to enterprises (10B/day). You need to bill based on usage.
- How do you isolate customer data? (TSDB per customer or shared?)
- How do you prevent one customer’s cardinality explosion from affecting others?
- How do you fairly allocate compaction costs?
- What’s your SLA for query latency?
Looking Ahead
Time-series databases are the backbone of modern observability. In the next chapter, we’ll explore search engines and full-text search systems—another specialized database category. While TSDBs optimize for numeric aggregations and time ranges, search engines optimize for keyword matching and relevance ranking. Both are purpose-built for different patterns. Together with the relational systems we covered earlier, they form the foundation of a robust data layer.
As systems grow, you’ll use all these tools in concert: PostgreSQL for transactional data, Elasticsearch for search, and a TSDB for metrics. Understanding when and how to apply each is the mark of mature system design.