System Design Fundamentals

File Storage & Sharing

The Problem We’re Solving

Imagine you’re designing a system like Dropbox or Google Drive. Your users need to seamlessly upload files from their laptop, access them from their phone, share documents with colleagues, and have peace of mind that their data is safe. On the surface, it sounds simple: store files in the cloud. But at scale—supporting 50 million users with 10 million daily active users—the complexity multiplies. You need to handle unreliable networks, manage massive files (up to 10GB), detect and resolve conflicts when the same file is edited on two devices simultaneously, and guarantee your users never lose their data. This is file storage and sharing.

Functional Requirements

Let’s be concrete about what we’re building:

  • Upload and download files up to 10GB with resume capability
  • Folder hierarchy with full CRUD operations
  • Sharing and permissions — share via secure link, with specific users, granular access control (viewer, editor, owner)
  • File versioning — maintain up to 30 versions per file, restore to any version
  • Real-time sync across devices — changes on one device appear on others within seconds
  • Conflict resolution — handle simultaneous edits gracefully

Non-Functional Requirements

These are the characteristics that determine whether our system succeeds in production:

  • Durability: 99.999999999% (eleven 9s), i.e. roughly one object lost per 10 million stored objects every 10,000 years, on average
  • Availability: 99.9% uptime
  • Latency: under 200ms to acknowledge an upload request (on good networks; transfer time scales with file size), under 100ms for metadata operations
  • Consistency: eventual consistency for file content, strong consistency for metadata
  • Scale: support 10M DAU, 500PB total storage

Scale Estimation

Let’s do some quick math to understand the traffic we’re handling:

Daily uploads:

  • 10M DAU × 2 files/day average = 20M uploads/day
  • 20M ÷ 86,400 seconds = ~230 requests/second (steady state)
  • Peak traffic (3x) = ~700 requests/second

Storage:

  • Average file size: 500KB
  • Large files (100MB–10GB): separate handling
  • 500PB target storage = 500 × 10^15 bytes

Concurrent connections:

  • 10M DAU with sync agents = millions of WebSocket connections
  • Assuming an average session duration of 4 hours: 10M × 4 hours ÷ 24 hours ≈ 1.7M concurrent connections on average
| Metric | Value | Notes |
| --- | --- | --- |
| Daily uploads | 20M | Steady state ~230/s, peak ~700/s |
| Average file size | 500KB | Dominates upload volume |
| Large file uploads | 1% of total | But consume 40% of bandwidth |
| Metadata QPS | ~50K | Includes folder reads, sharing queries |
| Concurrent WebSocket connections | 1M+ | Sync agents on desktop/mobile clients |
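
The arithmetic above is simple enough to script. A minimal back-of-envelope sketch; the inputs and multipliers are the assumptions stated in this section, not measured values:

// Back-of-envelope capacity estimation (assumed inputs from the section above)
const DAU = 10_000_000;            // daily active users
const UPLOADS_PER_USER = 2;        // average files uploaded per user per day
const PEAK_MULTIPLIER = 3;         // peak vs. steady-state traffic
const AVG_SESSION_HOURS = 4;       // average time a sync agent stays connected

const uploadsPerDay = DAU * UPLOADS_PER_USER;                    // 20M
const steadyStateRps = uploadsPerDay / 86_400;                   // ~230 req/s
const peakRps = steadyStateRps * PEAK_MULTIPLIER;                // ~700 req/s
const avgConcurrentConnections = DAU * (AVG_SESSION_HOURS / 24); // ~1.7M

console.log({ uploadsPerDay, steadyStateRps, peakRps, avgConcurrentConnections });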

High-Level Architecture

graph TB
    Client["Desktop/Mobile Client<br/>(Sync Agent)"]
    APIGateway["API Gateway<br/>(Load Balanced)"]

    MetadataService["Metadata Service<br/>(PostgreSQL)"]
    BlockStorageService["Block Storage Service<br/>(Uploads, Deduplication)"]
    SyncService["Sync Service<br/>(WebSocket-based)"]

    MetadataCache["Metadata Cache<br/>(Redis)"]
    ObjectStorage["Object Storage<br/>(AWS S3)"]

    NotificationService["Notification Service<br/>(Push/Email)"]
    CDN["CDN<br/>(Public Shares)"]

    Client -->|REST API<br/>WebSocket| APIGateway
    APIGateway -->|Folder/Permission<br/>Queries| MetadataService
    APIGateway -->|Upload Chunks| BlockStorageService
    APIGateway -->|Sync State| SyncService

    MetadataService -->|Cache| MetadataCache
    BlockStorageService -->|Dedupe Check| MetadataCache
    BlockStorageService -->|Store Blocks| ObjectStorage

    SyncService -->|Updates| MetadataService
    SyncService -->|WebSocket Push| Client

    NotificationService -->|User Events| Client
    ObjectStorage -->|Public Files| CDN

This architecture separates concerns: metadata (what files you have) flows through PostgreSQL and is cached aggressively, while file content is chunked, deduplicated, and stored in object storage. The sync service maintains real-time awareness of changes via WebSocket, pushing updates to connected clients.

Deep Dive: Chunked Uploads and Deduplication

Here’s where we handle the realities of the internet: network failures happen, file transfers take time, and many users upload similar files.

Chunking Strategy

Instead of uploading a 5GB video file as a single blob, we split it into 4MB blocks:

  1. Client computes SHA-256 hash of each block
  2. Client sends block metadata (file_id, block_index, hash) to server
  3. Server responds: “block exists” (skip) or “upload needed”
  4. Client uploads needed blocks in parallel (e.g., 4 concurrent connections)
  5. If network fails mid-upload, client resumes with only missing blocks
// Client-side chunking logic (pseudocode)
const CHUNK_SIZE = 4 * 1024 * 1024; // 4MB

async function uploadFile(fileId: string, file: File) {
  // Split the file into fixed-size chunks and hash each one
  const chunks = [];
  for (let i = 0; i < file.size; i += CHUNK_SIZE) {
    const blob = file.slice(i, i + CHUNK_SIZE);
    const hash = await computeSHA256(blob);
    chunks.push({ index: i / CHUNK_SIZE, hash, blob });
  }

  // Ask the server which chunks it already has
  const response = await api.post('/check-blocks', {
    file_id: fileId,
    blocks: chunks.map(c => ({ index: c.index, hash: c.hash })),
  });
  const existingHashes = new Set(response.existingHashes);

  // Upload only the missing chunks, in parallel
  const needed = chunks.filter(c => !existingHashes.has(c.hash));
  await Promise.all(
    needed.map(c => uploadChunk(fileId, c.index, c.blob))
  );
}
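
The server side of step 3 is a single lookup against the block registry. A minimal sketch, assuming an Express-style HTTP handler; the route name is taken from the client code above, and `db.blocks` is the same illustrative data-access layer used in the deduplication code below:

import express from 'express';

const app = express();
app.use(express.json());

// Hypothetical /check-blocks handler: report which submitted block hashes already exist
app.post('/check-blocks', async (req, res) => {
  const { blocks } = req.body as { file_id: string; blocks: { index: number; hash: string }[] };

  // Look up all submitted hashes in one query against the content-addressed block registry
  const hashes = blocks.map(b => b.hash);
  const found = await db.blocks.findMany({ hash: { in: hashes } });

  // The client uploads only blocks whose hash is not in this set
  res.json({ existingHashes: found.map(b => b.hash) });
});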

Block-Level Deduplication

Here’s a key insight: if two users upload the same movie file, we only store it once on disk. Using content-addressable storage:

  • Store blocks by their SHA-256 hash (not by filename)
  • When a new block arrives, compute its hash and check if it exists
  • If exists, just add a reference in the metadata layer; no disk write needed
  • Multiple users’ files can reference the same underlying block

This reduces storage by 30–50% in practice, especially for shared data like software distributions, media files, and corporate documents.

// Server-side deduplication
interface Block {
  hash: string;          // SHA-256 of the block contents
  size: number;
  storage_path: string;  // e.g. s3://file-blocks/blocks/<hash>
  ref_count: number;     // number of file_blocks rows referencing this block
}

interface FileBlock {
  file_id: string;
  block_index: number;
  block_hash: string;   // Reference to Block table
}

async function ingestBlock(file_id: string, block_index: number, blob: Buffer) {
  const hash = sha256(blob);

  // Content-addressed lookup: is an identical block already stored?
  let existing = await db.blocks.findOne({ hash });

  if (!existing) {
    // New block: upload to S3 under its hash, then register it
    await s3.putObject({
      Bucket: 'file-blocks',
      Key: `blocks/${hash}`,
      Body: blob
    });
    existing = await db.blocks.insert({
      hash,
      size: blob.length,
      storage_path: `s3://file-blocks/blocks/${hash}`,
      ref_count: 0
    });
  }

  // Record the reference from this file to the block and bump the refcount
  await db.file_blocks.insert({ file_id, block_index, block_hash: hash });
  await db.blocks.increment('ref_count', { hash });
}
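
The reverse path matters too: when a file or version is deleted, its block references are removed, and a block becomes garbage once nothing points at it. A sketch of that cleanup using the same illustrative data-access layer as above; batching, locking, and deferred garbage collection are omitted:

// Remove a file's block references and reclaim blocks no longer referenced by anyone
async function releaseFileBlocks(file_id: string) {
  const refs = await db.file_blocks.find({ file_id });
  await db.file_blocks.delete({ file_id });

  for (const ref of refs) {
    const block = await db.blocks.decrement('ref_count', { hash: ref.block_hash });
    if (block.ref_count === 0) {
      // No file references this block any more: delete the object and its registry row.
      // In production this is usually deferred (periodic GC) to tolerate concurrent ingests.
      await s3.deleteObject({ Bucket: 'file-blocks', Key: `blocks/${ref.block_hash}` });
      await db.blocks.delete({ hash: ref.block_hash });
    }
  }
}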

File Versioning and Reconstruction

Every time a user saves a file, we create a new version. The trick: we don’t re-store all blocks. We store a version manifest (a list of which blocks comprise that version):

CREATE TABLE file_versions (
  version_id UUID PRIMARY KEY,
  file_id UUID,
  created_at TIMESTAMP,
  created_by UUID,
  block_map JSONB,  -- { "0": "hash1", "1": "hash2", ... }
  size BIGINT,
  metadata JSONB
);

To restore version 5 of a file, we fetch its block_map, retrieve all referenced blocks from object storage, and reassemble them. This is far cheaper than storing every version as a complete copy.
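
Restoring a version is therefore just reading the manifest and stitching blocks back together. A rough sketch, using the same simplified s3 and db helpers as the ingestion code above:

// Reassemble a specific version of a file from its block manifest
async function restoreVersion(version_id: string): Promise<Buffer> {
  const version = await db.file_versions.findOne({ version_id });
  const blockMap: Record<string, string> = version.block_map; // { "0": "hash1", "1": "hash2", ... }

  // Fetch blocks in index order and concatenate them
  const indices = Object.keys(blockMap).map(Number).sort((a, b) => a - b);
  const parts: Buffer[] = [];
  for (const i of indices) {
    const obj = await s3.getObject({ Bucket: 'file-blocks', Key: `blocks/${blockMap[i]}` });
    parts.push(obj.Body as Buffer);
  }
  return Buffer.concat(parts);
}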

Sync Protocol and Conflict Resolution

The sync agent on the client runs continuously. Here’s how it detects and syncs changes:

graph LR
    A["Client<br/>Local State<br/>v=42"] -->|Poll or<br/>WebSocket| B["Sync Service<br/>Server State<br/>v=100"]
    B -->|Diff<br/>operations| A
    A -->|Apply<br/>operations<br/>locally| A
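
In other words, the client tracks the last server version it has applied and asks only for what changed since then. A minimal sketch of that loop; the `/changes` endpoint, field names, and the `api`/`socket` helpers are illustrative, in the same spirit as the client code earlier:

// Client-side delta sync: pull operations newer than the last applied server version
let localVersion = 42; // last server version this client has applied

async function pullChanges() {
  const { operations, serverVersion } = await api.get('/changes', { since: localVersion });
  for (const op of operations) {
    await applyLocally(op); // create/update/delete local files and folders
  }
  localVersion = serverVersion;
}

// WebSocket push triggers an immediate pull; a 30-second timer is the fallback
socket.on('change', pullChanges);
setInterval(pullChanges, 30_000);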

Conflict Detection: When the same file is modified on two devices simultaneously:

  1. Device A edits file at timestamp 1000, uploads new blocks
  2. Device B edits same file at timestamp 1001 (but hasn’t synced yet), uploads different blocks
  3. Server receives both, detects both claim to be the latest version
  4. Server marks Device B’s version as a conflict copy: “My File (Device B’s conflict 2025-02-14).txt”
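
One common way to detect this on the server: every upload carries the version it was based on, and a mismatch means another device committed first. A hedged sketch; the `current_version_id` column and the `makeConflictName`/`uuid` helpers are hypothetical, introduced here only to illustrate the check:

// Server-side conflict check: compare the client's base version with the file's current head
async function commitUpload(file_id: string, baseVersionId: string, newVersionId: string, device: string) {
  const file = await db.files.findOne({ file_id });

  if (file.current_version_id !== baseVersionId) {
    // Another device committed since this one last synced: keep both copies (step 4 above)
    const conflictCopyName = makeConflictName(file.name, device); // hypothetical helper, e.g. "My File (Device B's conflict 2025-02-14).txt"
    await db.files.insert({ ...file, file_id: uuid(), name: conflictCopyName, current_version_id: newVersionId });
    return { status: 'conflict', name: conflictCopyName };
  }

  // Fast-forward: this upload was based on the latest version
  await db.files.update({ file_id }, { current_version_id: newVersionId });
  return { status: 'ok' };
}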

Sync Frequency Trade-off: We could poll every second, but that is ~86,400 requests per user per day, or nearly a trillion requests/day across 10M DAU. Instead, we use WebSocket for real-time events and let clients poll periodically (every 30 seconds) as a fallback. This balances latency with resource consumption.

Sharing and Permissions Model

We implement a straightforward ACL (Access Control List) model:

CREATE TABLE sharing (
  share_id UUID PRIMARY KEY,
  resource_id UUID,           -- file_id or folder_id
  resource_type VARCHAR,       -- 'file' or 'folder'
  shared_by_user_id UUID,
  shared_with_user_id UUID,   -- NULL for link-based shares
  permission_level VARCHAR,   -- 'viewer', 'editor', 'owner'
  shareable_link_token VARCHAR, -- For link-based sharing
  link_expiry TIMESTAMP,       -- Optional
  created_at TIMESTAMP
);

Permission levels:

  • Viewer: read-only access, cannot modify or share
  • Editor: can modify files, cannot change permissions
  • Owner: full control, can delete, change permissions, revoke access

Link-based shares (shareable links) are stored as special rows with shared_with_user_id = NULL and include optional password protection and expiry timestamps.
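
Checking access is then a lookup against this table. A simplified sketch; `db.sharing` is the same illustrative data-access layer used earlier, and the recursive walk up parent folders (for inherited folder shares) is omitted:

type Permission = 'viewer' | 'editor' | 'owner';

// Can this user perform an action that requires `required` permission on the resource?
async function hasPermission(userId: string, resourceId: string, required: Permission): Promise<boolean> {
  const rank: Record<Permission, number> = { viewer: 1, editor: 2, owner: 3 };

  const share = await db.sharing.findOne({
    resource_id: resourceId,
    shared_with_user_id: userId,
  });
  if (!share) return false; // no direct grant; a full check would also walk parent folders and file ownership

  return rank[share.permission_level as Permission] >= rank[required];
}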

Metadata Database Design

PostgreSQL handles all metadata, sharded by user_id for horizontal scaling:

-- files table
CREATE TABLE files (
  file_id UUID PRIMARY KEY,
  user_id UUID,
  parent_folder_id UUID,
  name VARCHAR,
  size BIGINT,
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  deleted_at TIMESTAMP  -- Soft delete
);
CREATE INDEX idx_files_user_folder ON files (user_id, parent_folder_id);

-- file_blocks table (links files to blocks)
CREATE TABLE file_blocks (
  id BIGSERIAL PRIMARY KEY,
  file_id UUID,
  block_index INT,
  block_hash VARCHAR(64)  -- SHA-256
);
CREATE INDEX idx_file_blocks_file ON file_blocks (file_id);

-- blocks table (content-addressed storage registry)
CREATE TABLE blocks (
  hash VARCHAR(64) PRIMARY KEY,
  size BIGINT,
  uploaded_at TIMESTAMP,
  ref_count INT,
  storage_path VARCHAR
);

Queries are optimized for the common path: “show me all files in this folder” (fast with the index on user_id + parent_folder_id).
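
For example, the folder listing can be served by that index plus the soft-delete filter. A sketch using a node-postgres-style query helper (assumed here for illustration):

// "Show me all files in this folder", served by the (user_id, parent_folder_id) index
async function listFolder(userId: string, folderId: string) {
  return db.query(
    `SELECT file_id, name, size, updated_at
       FROM files
      WHERE user_id = $1
        AND parent_folder_id = $2
        AND deleted_at IS NULL
      ORDER BY name`,
    [userId, folderId]
  );
}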

Scaling Strategies

Metadata Sharding: Partition by user_id hash across 256 database instances. This ensures one user’s operations don’t compete with another’s.
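
The routing itself is a one-liner: hash the user_id and map it onto one of the shards. A sketch assuming 256 logical shards and a connection pool per shard (the `shardPools` array is hypothetical):

import { createHash } from 'crypto';

const SHARD_COUNT = 256;

// Deterministically map a user to a metadata shard
function shardFor(userId: string): number {
  const digest = createHash('sha256').update(userId).digest();
  return digest.readUInt32BE(0) % SHARD_COUNT;
}

// Usage: pick the connection pool for this user's metadata queries
// const pool = shardPools[shardFor(userId)];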

Caching Layer: Redis cache for:

  • Active file listings (TTL: 5 minutes)
  • Permission checks (TTL: 10 minutes)
  • Block existence checks (TTL: 1 hour)
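
A typical cache-aside pattern for the permission check, sketched with an assumed node-redis v4-style client; the key format is illustrative, and the TTL mirrors the list above:

// Cache-aside permission lookup: Redis first, metadata DB on miss, 10-minute TTL
async function cachedPermission(userId: string, resourceId: string): Promise<string | null> {
  const key = `perm:${userId}:${resourceId}`;

  const cached = await redis.get(key);
  if (cached !== null) return cached === 'none' ? null : cached;

  const share = await db.sharing.findOne({ resource_id: resourceId, shared_with_user_id: userId });
  const level = share ? share.permission_level : 'none'; // cache negative results too

  await redis.set(key, level, { EX: 600 }); // 10-minute TTL, per the list above
  return share ? level : null;
}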

Object Storage Scaling: S3 is effectively infinite—no action needed. The limiting factor is our metadata database and API gateway, both of which scale horizontally.

Upload vs. Download Path: Route uploads to one fleet of servers optimized for write performance (SSD for temp staging), and downloads to a separate fleet optimized for read performance (cached).

CDN for Public Shares: When a file is shared publicly via link, automatically replicate to CDN edge nodes. This keeps bandwidth costs down and latency low for end users worldwide.

Key Trade-offs

| Trade-off | Option A | Option B | Our Choice |
| --- | --- | --- | --- |
| Dedup level | Block-level (complex) | File-level (simple) | Block-level (saves 40% storage at cost of metadata overhead) |
| Sync frequency | Real-time WebSocket | Polling every 30s | Hybrid (WebSocket for awareness, polling for resilience) |
| Conflict handling | Last-writer-wins | Conflict copies | Conflict copies (less data loss, clearer UX) |
| Version retention | Keep all versions forever | 30-day limit | 30-day limit (trade storage for compliance concerns) |

Pro Tips for Interviews

  • Version Control Semantics: Be ready to discuss immutable snapshots (Git-like) vs. delta-based versioning. Immutable is simpler to reason about but more expensive.
  • Eventual Consistency: Acknowledge that multi-device syncing means your system is eventually consistent, not strongly consistent. A file uploaded on one device may take a few seconds to appear elsewhere.
  • CAP Theorem Trade-offs: Discuss what happens when a partition occurs between a client and server—we typically choose availability (client queues operations) over consistency.

Practice Exercise

Extend the system for:

  1. Collaborative editing — multiple users editing the same document simultaneously (think Google Docs). How would you handle concurrent edits, merge conflicts, and maintain a consistent view?
  2. Offline-first mode — allow users to work offline, queue operations locally, and sync when connectivity returns. How do you detect conflicts in offline scenarios?
  3. Incremental sync — instead of syncing entire files, send only the deltas (changed bytes). What data structure would you use?

Connection to the Next Pattern

You’ve now designed a system where users can reliably store and share files across devices. In the next section, we’ll tackle an equally complex challenge: real-time communication. As you’ll see, the patterns you’ve learned here—chunking for efficiency, deduplication for cost, conflict resolution for consistency—will reappear, adapted for the demands of messaging at scale.