System Design Fundamentals

Object vs Block vs File Storage

Introduction

Imagine your team is building a platform that handles four distinct storage challenges simultaneously: users uploading high-resolution photos that need to be served globally, application servers generating gigabytes of logs that must be archived long-term, virtual machines running production workloads that require low-latency disk I/O, and DevOps teams sharing configuration files across regions. How do you store each? Do you use the same infrastructure? The answer reveals why storage isn’t one-size-fits-all.

This chapter explores three fundamentally different storage paradigms that dominate cloud infrastructure and distributed systems. We’ll move beyond the database concepts you learned in Chapter 8 and examine the specialized storage layers that power scalable applications. Understanding when to use block, file, or object storage—and when to combine them—is essential for designing systems that balance cost, performance, and operational simplicity.

Understanding the Three Storage Paradigms

Before diving into implementation details, let’s establish clear definitions. These paradigms differ in how they abstract storage and the interface they expose to applications.

Block Storage treats storage as a collection of fixed-size blocks (typically 4 KB) arranged sequentially. The storage system has no awareness of data structure—it simply maintains which blocks are allocated to which volume. Think of it as renting raw disk space. An operating system or database manages the organization, creating filesystems or allocation tables on top. Block storage is accessed at the volume level, not the individual file level. When you mount a block device, you format it with a filesystem and then use that filesystem to store files.

File Storage adds a filesystem layer that manages a hierarchical namespace. Files and directories have metadata (ownership, permissions, modification time), and the storage system itself understands these concepts. File storage is accessed via network protocols like NFS (Network File System) and SMB (Server Message Block); SMB's older dialect is still sometimes called CIFS (Common Internet File System). Multiple clients can mount the same filesystem simultaneously, making it ideal for shared access patterns.

Object Storage discards the filesystem hierarchy entirely. Instead, every piece of data is an “object”—a self-contained bundle containing the actual data plus metadata plus a unique identifier (typically an HTTP URL or a UUID). There are no directories, no files, no hierarchy. Objects are stored in a flat namespace and accessed via RESTful APIs (GET, PUT, DELETE). Each object is immutable once written; modifications mean creating a new object version.

Here’s a comparison of the access patterns:

Block Storage:    /dev/sda1 → [Filesystem] → /users/photos/vacation.jpg
File Storage:     Mount NFS → Browse /users/photos/ → Read/Write files
Object Storage:   PUT https://bucket.s3.amazonaws.com/users/photos/vacation.jpg

The Warehouse Analogy

To internalize these differences, imagine you’re renting warehouse space for different purposes.

With block storage, you rent an empty warehouse. You’re responsible for everything inside: installing shelving, organizing inventory, labeling boxes, tracking what’s where. The warehouse operator doesn’t care what you store or how you organize it. You get raw capacity. This flexibility means you can optimize the organization for your specific workloads. If you need to run a database with specific I/O patterns, block storage lets you tune everything. But it’s also your responsibility to manage it.

File storage is like renting from a company that already has a cabinet system installed—drawers, labels, indexing. You can store files in folders, create subfolders, and multiple people can access the same folder simultaneously. The cabinet company manages the basic organization and ensures nobody interferes with somebody else’s files (permissions). You get convenience at the cost of some flexibility. You can’t reorganize the cabinet structure itself; you work within the system provided.

Object storage breaks from the warehouse picture; think of a valet parking service instead. You hand over your car (the data). The attendant tags it with a unique ticket (the object ID/URL) and parks it wherever they want in their massive lot. When you need it back, you show your ticket. You don’t see or care how they organize the lot internally. They can move your car around, duplicate it across multiple lots (durability), or even let other people retrieve it if you share your ticket (public access). You don’t interact with the storage mechanism directly; you only interact with the valet service interface.

How Block Storage Works

Block storage appears to the operating system as a raw volume. The cloud provider (AWS EBS, GCP Persistent Disk, Azure Managed Disks) provisions storage and exposes it as an iSCSI LUN (Logical Unit Number) or a virtual block device.

graph TD
    A["Application/Database"] -->|Read/Write| B["Filesystem Layer"]
    B -->|Block I/O Requests| C["Block Device Driver"]
    C -->|SCSI Commands / iSCSI| D["Block Storage Service"]
    D -->|Distribute Across Disks| E["Physical Storage Backends"]

How it works technically:

  • An application (database, VM, etc.) needs storage. It creates or mounts a block device.
  • When the application reads/writes files, the filesystem translates those operations into block I/O requests: “read blocks 1024-1040” or “write data to blocks 2048-2064”.
  • These requests go through a device driver to the storage backend.
  • The storage system locates the blocks and reads or writes them, handling redundancy and replication.
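
To make the block abstraction concrete, here is a minimal sketch of issuing block-level reads directly against a device, with no filesystem involved. The device path, block size, and block range are assumptions for illustration; reading a real device requires root privileges, and a plain disk-image file works equally well for experimentation.

import os

BLOCK_SIZE = 4096                 # typical block size
DEVICE = '/dev/nvme1n1'           # assumed device path; a disk-image file also works

# Open the device read-only; at this layer there are no files or directories.
fd = os.open(DEVICE, os.O_RDONLY)
try:
    # "Read blocks 1024-1040" becomes a byte offset plus a length.
    start_block, num_blocks = 1024, 16
    data = os.pread(fd, num_blocks * BLOCK_SIZE, start_block * BLOCK_SIZE)
    print(f"Read {len(data)} bytes starting at block {start_block}")
finally:
    os.close(fd)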

Performance characteristics:

  • IOPS (Input/Output Operations Per Second): Block storage performance is quoted in IOPS. A provisioned volume might offer 3,000 to 64,000 IOPS depending on the tier.
  • Latency: Single-digit milliseconds (typically 1-10 ms) for cloud block storage, enabling real-time database operations.
  • Throughput: Measured in MB/s. A single EBS volume might sustain 250 MB/s; larger deployments use multiple volumes.

Durability mechanisms: Block storage achieves durability through:

  • Replication: Each block is automatically replicated across multiple physical disks and availability zones.
  • Snapshots: Point-in-time copies that enable disaster recovery.
  • RAID-like protection: Cloud providers implement distributed redundancy so that loss of multiple disks doesn’t cause data loss.
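
As a concrete example of the snapshot mechanism, the sketch below creates a point-in-time snapshot of a volume and restores it into a new volume using boto3; the volume ID and availability zone are placeholders.

import boto3

ec2 = boto3.client('ec2')

# Point-in-time copy of the volume (ID is a placeholder).
snapshot = ec2.create_snapshot(
    VolumeId='vol-1234567890abcdef0',
    Description='Nightly backup of the database volume',
)
print(snapshot['SnapshotId'], snapshot['State'])

# For disaster recovery, a fresh volume can be created from the snapshot.
restored = ec2.create_volume(
    AvailabilityZone='us-east-1a',
    SnapshotId=snapshot['SnapshotId'],
    VolumeType='gp3',
)
print(restored['VolumeId'])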

Example: Provisioning EBS for a database

# Create a 100 GB, high-IOPS volume in us-east-1a
aws ec2 create-volume \
  --size 100 \
  --volume-type gp3 \
  --iops 3000 \
  --throughput 125 \
  --availability-zone us-east-1a

# Attach to an EC2 instance
aws ec2 attach-volume \
  --volume-id vol-1234567890abcdef0 \
  --instance-id i-1234567890abcdef0 \
  --device /dev/sdf

# On the instance, format and mount (on Nitro-based instances the attached
# volume shows up as an NVMe device such as /dev/nvme1n1, not /dev/sdf)
sudo mkfs -t ext4 /dev/nvme1n1
sudo mkdir -p /data
sudo mount /dev/nvme1n1 /data

How File Storage Works

File storage (NAS) adds organizational structure on top of block storage. A central metadata server manages the filesystem hierarchy, permissions, and file locks. Multiple clients access the same filesystem over the network.

graph TD
    A["Client 1"] -->|NFS/SMB| D["Metadata Server"]
    B["Client 2"] -->|NFS/SMB| D
    C["Client 3"] -->|NFS/SMB| D
    D -->|Coordinate Locks| E["Lock Manager"]
    D -->|Read/Write Blocks| F["Underlying Block Storage"]
    E -->|Prevent Conflicts| A
    E -->|Prevent Conflicts| B
    E -->|Prevent Conflicts| C

How it works technically:

  • A client mounts a filesystem via NFS or SMB: mount -t nfs server:/export /mnt/shared
  • The client’s OS sends filesystem operations (open, read, write, close) to the NAS server.
  • The metadata server tracks which client has which file open, manages locks to prevent simultaneous writes to the same file, and routes I/O to the underlying block storage.
  • Multiple clients can read the same file simultaneously, but write access requires coordination.

Locking mechanisms:

  • Advisory locks: Processes ask permission before accessing a file; other processes are expected to respect these locks (see the sketch after this list).
  • Mandatory locks: The filesystem enforces locks, preventing unauthorized access.
  • Lease-based locks: The server grants short-term access leases that must be renewed; this helps handle client disconnections.
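
Here is a minimal sketch of advisory locking from Python using POSIX flock; the shared path is an assumption, and whether locks actually propagate over NFS depends on the protocol version and mount options, so treat this as illustrative rather than a recipe.

import fcntl

SHARED_FILE = '/mnt/shared/config/app.conf'   # assumed NFS-mounted path

# Advisory: every cooperating process must take the lock voluntarily;
# the filesystem does not stop a process that skips this step.
with open(SHARED_FILE, 'a') as f:
    fcntl.flock(f, fcntl.LOCK_EX)      # block until we hold the exclusive lock
    try:
        f.write('max_connections=200\n')
        f.flush()
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)  # release so other clients can write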

Performance characteristics:

  • Latency: 10-50 ms typically, higher than block storage due to network round trips and metadata server overhead.
  • Throughput: Shared across all connected clients. A single NAS might support 1-10 GB/s aggregate, divided among clients.
  • Scaling: Horizontal scaling is hard. You can’t simply add more NAS servers behind a single mount point; spreading one namespace across many servers requires a clustered or parallel filesystem, which is a different class of system.

Example: Setting up EFS (AWS managed NFS)

# Create an EFS filesystem
aws efs create-file-system \
  --performance-mode generalPurpose \
  --throughput-mode bursting

# Create a mount target in a subnet
aws efs create-mount-target \
  --file-system-id fs-12345678 \
  --subnet-id subnet-12345678 \
  --security-groups sg-12345678

# On an EC2 instance, mount the filesystem
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs

How Object Storage Works

Object storage fundamentally rethinks the storage model. Instead of exposing block storage or filesystem semantics, it offers a PUT/GET/DELETE API over HTTP.

graph TD
    A["Client"] -->|PUT Object| B["Object Storage API"]
    A -->|GET Object| B
    A -->|DELETE Object| B
    B -->|Distribute & Replicate| C["Distributed Storage Cluster"]
    C -->|Erasure Code| D["Physical Storage Nodes"]
    C -->|Metadata Index| E["Metadata Service"]
    E -->|Track: Bucket, Key, Size, Tags| F["Metadata DB"]

How it works technically:

When you PUT an object:

  1. The client sends HTTP request: PUT /bucket/path/to/object.jpg
  2. The API endpoint assigns the object a unique identifier (usually a hash-based key).
  3. The object is split into chunks, replicated or erasure-coded across storage nodes.
  4. Metadata (size, content-type, etag, custom headers) is indexed in a fast lookup service.
  5. The API returns a successful response.

When you GET an object:

  1. The client sends HTTP request: GET /bucket/path/to/object.jpg
  2. The API looks up the object in the metadata index.
  3. It retrieves the chunks from storage nodes and reassembles them.
  4. It returns the object with metadata headers.

Durability and redundancy:

Object storage uses erasure coding heavily. With 4+2 erasure coding:

  • The object is split into 4 data chunks and 2 parity chunks.
  • You can lose any 2 chunks and still recover the object (the original data can be reconstructed from any 4 of the 6 chunks).
  • This provides durability equivalent to 3x replication while using only 50% more space.

For example, AWS S3 advertises “11 9’s” of durability (99.999999999%): if you store 10 million objects, you can expect to lose a single object roughly once every 10,000 years on average.
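
Production systems use Reed-Solomon codes for the parity math. As a loose intuition only, here is a simplified sketch with a single XOR parity chunk (a 4+1 scheme that tolerates one lost chunk, not the 4+2 scheme described above):

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Split an object into 4 equal data chunks (padding handling omitted).
obj = b"0123456789abcdef" * 256
chunk_size = len(obj) // 4
chunks = [obj[i * chunk_size:(i + 1) * chunk_size] for i in range(4)]

# One XOR parity chunk; Reed-Solomon generalizes this to several parity chunks.
parity = chunks[0]
for c in chunks[1:]:
    parity = xor_bytes(parity, c)

# Simulate losing chunk 2 and rebuilding it from the survivors plus parity.
lost = 2
rebuilt = parity
for i, c in enumerate(chunks):
    if i != lost:
        rebuilt = xor_bytes(rebuilt, c)

assert rebuilt == chunks[lost]
print("Recovered the lost chunk from the surviving chunks and the parity chunk")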

Consistency model:

Object storage has traditionally offered eventual consistency (though some services, such as Amazon S3 since late 2020, now provide strong read-after-write consistency within a region):

  • A PUT succeeds, but a subsequent GET from a different region might not immediately return the new object.
  • Once an object is written, all GET requests for that object within a region are consistent.
  • This trade-off allows massive horizontal scalability and avoids expensive cross-node coordination on the write path.

Example: S3 API interactions

import boto3

s3_client = boto3.client('s3')

# PUT object
s3_client.put_object(
    Bucket='my-bucket',
    Key='users/photos/vacation.jpg',
    Body=open('vacation.jpg', 'rb'),
    ContentType='image/jpeg',
    Metadata={'photographer': 'alice', 'location': 'hawaii'}
)

# GET object
response = s3_client.get_object(Bucket='my-bucket', Key='users/photos/vacation.jpg')
image_data = response['Body'].read()

# DELETE object
s3_client.delete_object(Bucket='my-bucket', Key='users/photos/vacation.jpg')

# List objects with prefix
paginator = s3_client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='my-bucket', Prefix='users/photos/')
for page in pages:
    for obj in page.get('Contents', []):
        print(obj['Key'], obj['Size'])
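
One practical note on PUTs: very large objects (video files, dataset archives) are normally uploaded as multipart uploads rather than a single request. boto3's transfer layer handles this automatically once a file crosses a size threshold; a minimal sketch, with the bucket name and thresholds as assumptions:

import boto3
from boto3.s3.transfer import TransferConfig

s3_client = boto3.client('s3')

# Files above the threshold are split into parts and uploaded in parallel,
# with automatic retries of individual parts.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MB
    multipart_chunksize=16 * 1024 * 1024,   # 16 MB parts
)

s3_client.upload_file(
    'training_data.tar', 'my-bucket', 'datasets/training_data.tar',
    Config=config,
)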

When to Use Each Storage Type

Block Storage:

  • Databases (SQL, NoSQL): Need ACID guarantees and low-latency I/O.
  • Virtual machines: Each VM needs its own block device for the OS and data.
  • High-performance computing: Applications that perform random I/O at high frequencies.
  • Message queues: Systems like RabbitMQ or Kafka benefit from fast, predictable latency.

File Storage:

  • Shared home directories: Development teams accessing shared code bases (though Git is often better).
  • Content management systems: WordPress, Drupal accessing shared media libraries.
  • Legacy applications: Systems written assuming local filesystem access.
  • Machine learning training: Distributed training jobs reading datasets from a shared source.

Object Storage:

  • Media assets: Photos, videos, documents served to users.
  • Data lakes: Storing petabytes of raw data for analytics.
  • Static website hosting: Serving HTML, CSS, JS, images to users worldwide.
  • Backups and archives: Long-term retention of log files, database dumps.
  • Machine learning: Training datasets, model artifacts, inference input/output.
  • Data pipelines: Intermediate results stored between processing stages.

Performance, Cost, and Scalability Comparison

| Dimension | Block Storage | File Storage | Object Storage |
| --- | --- | --- | --- |
| Latency | 1-10 ms | 10-50 ms | 50-500 ms (API call overhead) |
| Throughput per client | 250+ MB/s | 100+ MB/s | Scales with API calls, typical 50+ MB/s |
| IOPS capacity | 64,000+ | 500-7,500 | Not applicable (request-based) |
| Cost per GB/month | $0.10-0.20 | $0.30 | $0.023 |
| Durability | 11 9’s via replication | 11 9’s via replication | 11 9’s via erasure coding |
| Scalability | Limited to attached volumes | Shared but difficult to scale beyond a single server | Unlimited horizontal scaling |
| Consistency | Strong | Strong | Eventual |
| Access pattern | Optimized for random I/O | Optimized for file operations | Optimized for sequential reads |
| Suitable dataset size | Up to 100s of TB | Up to 1000s of TB | Unlimited (petabytes) |

Migration Patterns and Hybrid Approaches

Real-world systems often use all three types, each optimized for its use case:

Example architecture:

           Application Servers
               (API Layer)
                    │
     ┌──────────────┼───────────────┐
     ↓              ↓               ↓
Block Storage  File Storage   Object Storage
 (Database)    (Shared Logs)  (User Uploads)

Migration scenario: You start with a monolithic application storing everything on a NAS filesystem. As you scale:

  1. Extract database: Move the database to block-storage-backed instances running PostgreSQL or MySQL.
  2. Extract static assets: Move user-uploaded photos to object storage (S3), updating application code to reference object URLs (see the sketch after this list).
  3. Keep shared configuration: Configuration files remain on the file storage for ease of access.
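
For step 2, the application typically stops streaming files from a local or NAS path and instead hands clients a link to the object. A minimal sketch that generates a time-limited presigned URL with boto3 (bucket and key are placeholders):

import boto3

s3_client = boto3.client('s3')

# After the photo lives in S3, serve a temporary link rather than the bytes.
url = s3_client.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'user-uploads', 'Key': 'users/photos/vacation.jpg'},
    ExpiresIn=3600,   # link is valid for one hour
)
print(url)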

Hybrid example: Data pipeline with all three storage types

import json

import boto3
import psycopg2

# Input data from object storage
s3_client = boto3.client('s3')
raw = s3_client.get_object(Bucket='raw-data', Key='logs/2024-01-01.jsonl')
events = [json.loads(line) for line in raw['Body'].read().splitlines()]

# Process and write to a block-storage-backed database
conn = psycopg2.connect("dbname=analytics user=admin host=db.internal")
cursor = conn.cursor()
for event in events:
    cursor.execute(
        "INSERT INTO events (timestamp, user_id, action) VALUES (%s, %s, %s)",
        (event['timestamp'], event['user_id'], event['action']),
    )
conn.commit()

# Write intermediate results to file storage for team access
summary_report = f"2024-01-01: processed {len(events)} events\n"
with open('/mnt/shared/results/daily_summary.txt', 'w') as f:
    f.write(summary_report)

Pro Tips and Design Considerations

Pro Tip: S3 pricing optimization

Object storage costs vary dramatically by access pattern:

  • Standard: $0.023/GB/month (frequently accessed)
  • Infrequent Access: $0.0125/GB/month (accessed less than once per month)
  • Glacier: $0.004/GB/month (archival, retrieval takes hours)

Set lifecycle policies to automatically transition old objects to cheaper tiers:

{
  "Rules": [
    {
      "ID": "ArchiveRule",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ]
    }
  ]
}
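
The same rules can be applied programmatically; a minimal boto3 sketch, with the bucket name as a placeholder:

import boto3

s3_client = boto3.client('s3')

# Transition objects to Standard-IA after 30 days and Glacier after 90 days.
s3_client.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'ArchiveRule',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},   # apply to every object in the bucket
            'Transitions': [
                {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                {'Days': 90, 'StorageClass': 'GLACIER'},
            ],
        }]
    },
)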

Did you know? Object storage is “write-once, read-many” (WORM) in practice. While you can overwrite objects, the underlying storage system is optimized for immutability. Some implementations actually prevent overwrites for compliance.

Consistency and application design: If your application assumes strong consistency (write, then immediately read), object storage will surprise you. Always design with eventual consistency in mind: wait for confirmations, use version IDs, or implement application-level consistency checks.
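
As one illustration of the version-ID approach, the sketch below assumes a bucket with versioning enabled: the writer records the version ID returned by the PUT, and readers request exactly that version so they can never observe an older copy (bucket and key are placeholders).

import boto3

s3_client = boto3.client('s3')

# Writer: on a versioned bucket, the PUT response includes a version ID.
put_resp = s3_client.put_object(
    Bucket='my-bucket',
    Key='reports/latest.json',
    Body=b'{"status": "done"}',
)
version_id = put_resp['VersionId']   # only present when versioning is enabled

# Reader: request that exact version instead of "whatever is latest".
get_resp = s3_client.get_object(
    Bucket='my-bucket',
    Key='reports/latest.json',
    VersionId=version_id,
)
print(get_resp['Body'].read())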

Key Takeaways

  • Block storage provides raw disk abstraction ideal for databases and VMs requiring low-latency, high-IOPS access patterns.
  • File storage adds filesystem semantics and shared access, suited for collaborative workflows but with limited horizontal scalability.
  • Object storage sacrifices filesystem hierarchy and strong consistency for unlimited scalability and minimal operational overhead, perfect for data lakes, media, and archives.
  • Cost scales inversely with access frequency: Block (expensive, fast), File (moderate), Object (cheap, tolerates higher latency).
  • Real systems use all three: Your architecture will likely include a database on block storage, shared configuration on file storage, and user media in object storage.
  • Consistency models drive design: Strong consistency (block/file) means coordinated access; eventual consistency (object) means designing for eventual correctness.

Practice Scenarios

Scenario 1: You’re building a photo-sharing app

Users upload photos, and the app needs to display them in real-time feeds. Photos are typically accessed once or twice then rarely again. Where would you store photos, and why? How would you structure the system? What if you later needed to apply ML-based image recognition to all uploaded photos asynchronously?

Scenario 2: A data analytics platform

Your platform ingests 10 TB of raw log data daily from thousands of sources, processes it through multi-stage pipelines, and serves aggregated insights via dashboards. Where would you store: raw logs, intermediate processing results, final analytics databases, and configuration files? How would you optimize for cost without sacrificing latency for critical queries?

Scenario 3: A distributed training system

Your ML platform trains models across a cluster of 100 GPU nodes. Each node needs to read a 1 TB training dataset, write periodic checkpoints, and log training progress. Block storage would be prohibitively expensive (100 TB provisioned). How would you structure storage? What consistency guarantees would you need for checkpointing to work correctly?

Looking Ahead

Now that you understand how data is stored, the next chapter explores specialized blob storage implementations and how different cloud providers optimize their storage services for specific workloads. We’ll see how understanding these trade-offs helps you build systems that scale efficiently from gigabytes to exabytes.