Vertical Partitioning
Vertical partitioning, also called column-based splitting, involves dividing a database table by columns rather than rows. Instead of splitting data horizontally across multiple servers based on a partition key, vertical partitioning separates different groups of columns into different tables or databases. This technique is particularly valuable when different columns have vastly different access patterns, sizes, or availability requirements.
Understanding Vertical Partitioning
In a traditional monolithic database, all columns of a table reside together. A user profile table might contain frequently accessed fields (username, email) alongside rarely accessed fields (bio, profile_image_blob), or fields with different performance characteristics.
Vertical partitioning addresses a fundamental mismatch: why keep all columns together when they have different usage patterns?
Consider a simple users table:
CREATE TABLE users (
user_id BIGINT,
username VARCHAR(255),
email VARCHAR(255),
password_hash VARCHAR(255),
created_at TIMESTAMP,
last_login TIMESTAMP,
bio TEXT,
profile_image BLOB,
preferences JSON,
account_balance DECIMAL,
payment_method VARCHAR(100)
);
Different columns have dramatically different characteristics:
- Hot data:
user_id,username,emailare accessed on nearly every request - Warm data:
password_hash,last_login,account_balanceare accessed during authentication and transactions - Cold data:
bio,profile_imageare accessed infrequently, only when viewing a detailed profile - Different sizes:
profile_imagecould be megabytes whileusernameis hundreds of bytes
Vertical partitioning splits this into multiple tables optimized for their access patterns.
Partitioning Strategies
Column-Based Splitting
The simplest vertical partitioning strategy divides columns into logical groups:
-- Hot data: users_core
CREATE TABLE users_core (
user_id BIGINT PRIMARY KEY,
username VARCHAR(255),
email VARCHAR(255),
created_at TIMESTAMP,
last_login TIMESTAMP
);
-- Warm data: users_auth
CREATE TABLE users_auth (
user_id BIGINT PRIMARY KEY REFERENCES users_core(user_id),
password_hash VARCHAR(255),
account_balance DECIMAL,
payment_method VARCHAR(100)
);
-- Cold data: users_profile
CREATE TABLE users_profile (
user_id BIGINT PRIMARY KEY REFERENCES users_core(user_id),
bio TEXT,
profile_image BLOB,
preferences JSON
);
With this structure, queries fetching user details for a homepage can query only users_core, avoiding I/O from loading megabyte-sized profile images.
Hot and Cold Data Separation
A common pattern distinguishes between frequently and infrequently accessed data:
graph TB
subgraph Original["Original Table: users (all columns)"]
UA[user_id, username, email, created_at]
UB[password_hash, last_login, account_balance]
UC[bio, profile_image, preferences]
end
subgraph Partitioned["After Vertical Partitioning"]
direction TB
H["π₯ Hot Table: users_hot<br/>(Frequently accessed)"]
W["β οΈ Warm Table: users_warm<br/>(Occasionally accessed)"]
C["βοΈ Cold Table: users_cold<br/>(Rarely accessed)"]
end
Original -->|Split by access pattern| Partitioned
H -->|SSD/Memory| Hot["Fast tier storage"]
W -->|SSD| Warm["Standard storage"]
C -->|Blob storage/Archive| Cold["Cost-effective storage"]
Hot data includes columns accessed on most requests:
- User IDs, usernames, email addresses
- Frequently checked status fields
- Recent activity timestamps
Warm data includes columns accessed occasionally:
- Encrypted passwords (accessed only during authentication)
- Financial balances (accessed during transactions)
- Audit logs (accessed for compliance)
Cold data includes columns rarely accessed:
- Long-form text (bios, descriptions)
- Large binary files (profile pictures, documents)
- Historical archives
Separating these allows different storage and caching strategies. Hot data can be cached aggressively, warm data kept in standard databases, and cold data archived to cheaper object storage.
Partitioning by Access Patterns
Instead of temperature-based separation, partition by functional access patterns:
Authentication system accesses:
- user_id, email, password_hash
Profile viewing accesses:
- user_id, username, profile_image, bio, preferences
Payment system accesses:
- user_id, account_balance, payment_method
Analytics system accesses:
- user_id, created_at, last_login, preferences
Each system queries different subsets. Separating these allows:
- Independent scaling: The authentication table can be replicated across more servers than the profile table
- Specialized storage: Payment data might use encrypted storage, profile images use CDN-backed blob storage
- Different consistency models: Authentication data requires strong consistency; profile updates can be eventually consistent
- Targeted indexing: Build indexes optimized for each partitionβs queries
Normalization vs. Denormalization in Vertical Partitioning
Vertical partitioning intersects with normalization. When columns are split across tables, relationships emerge.
Normalized Approach
Partition with foreign key relationships:
-- Base entity
CREATE TABLE users (
user_id BIGINT PRIMARY KEY,
username VARCHAR(255),
email VARCHAR(255)
);
-- Vertically partitioned: settings
CREATE TABLE user_settings (
user_id BIGINT PRIMARY KEY REFERENCES users(user_id),
theme VARCHAR(50),
notifications_enabled BOOLEAN,
language VARCHAR(10)
);
-- Query: Fetch user with settings
SELECT u.*, us.theme, us.language
FROM users u
LEFT JOIN user_settings us ON u.user_id = us.user_id
WHERE u.user_id = 123;
Advantages:
- No data duplication
- Consistency enforced by foreign keys
- Clean schema design
Disadvantages:
- Requires joins across partitions (performance cost)
- Latency increases if partitions are on different servers
- Network round-trips for related data
Denormalized Approach
Replicate key identifiers to avoid joins:
CREATE TABLE users_core (
user_id BIGINT PRIMARY KEY,
username VARCHAR(255),
email VARCHAR(255)
);
CREATE TABLE users_settings (
user_id BIGINT PRIMARY KEY,
username VARCHAR(255), -- denormalized copy
email VARCHAR(255), -- denormalized copy
theme VARCHAR(50),
notifications_enabled BOOLEAN,
language VARCHAR(10)
);
-- Query: Fetch settings without join
SELECT theme, language FROM user_settings
WHERE user_id = 123;
Advantages:
- No joins required, queries are faster
- Partition is self-contained
- Better for distributed systems
Disadvantages:
- Data duplication increases storage
- Consistency challenges (username changes must update both tables)
- Requires application logic or triggers for synchronization
In distributed systems, denormalization often wins. The cost of joins across network boundaries exceeds the benefit of normalized storage.
Combining Vertical and Horizontal Partitioning
Vertical and horizontal partitioning are complementary. Combine them for maximum optimization:
User Table (Vertical Split):
βββ users_core (user_id, username, email)
β βββ Horizontal: Partition by user_id range
β βββ Shard 1: user_id 1-1M
β βββ Shard 2: user_id 1M-2M
β βββ Shard 3: user_id 2M-3M
β
βββ users_auth (user_id, password_hash, account_balance)
β βββ Horizontal: Partition by user_id range
β βββ Shard 1: user_id 1-1M
β βββ Shard 2: user_id 1M-2M
β βββ Shard 3: user_id 2M-3M
β
βββ users_profile (user_id, bio, profile_image)
βββ Horizontal: Partition by user_id range
βββ Shard 1: user_id 1-1M
βββ Shard 2: user_id 1M-2M
βββ Shard 3: user_id 2M-3M
Each partition (vertical split) is then sharded horizontally, creating a two-dimensional distribution:
- Vertical: Group related columns
- Horizontal: Distribute each group across servers
This maximizes benefits:
users_core (hot data)
βββ Small, frequently accessed
βββ Shard 1: users 1-1M (Memory-cached, SSD)
βββ Shard 2: users 1M-2M (Memory-cached, SSD)
βββ Shard 3: users 2M-3M (Memory-cached, SSD)
users_profile (cold data)
βββ Large, rarely accessed
βββ Shard 1: users 1-1M (Object storage)
βββ Shard 2: users 1M-2M (Object storage)
βββ Shard 3: users 2M-3M (Object storage)
The hot data shards can be small and fast; cold data shards can use cheaper storage.
Practical Implementation Patterns
API Layer Routing
The application layer must know which partition to query:
class UserService:
def get_user_core(self, user_id):
"""Fetch hot data only"""
return db_hot.query(
"SELECT * FROM users_core WHERE user_id = %s",
(user_id,)
)
def get_user_profile(self, user_id):
"""Fetch cold data when needed"""
return db_cold.query(
"SELECT * FROM users_profile WHERE user_id = %s",
(user_id,)
)
def get_user_full(self, user_id):
"""Fetch all data (multiple queries)"""
core = self.get_user_core(user_id)
profile = self.get_user_profile(user_id)
# Merge in application
return {**core, **profile}
Caching Strategy
Different partitions benefit from different caching strategies:
# Hot data: Cache aggressively
cache.set(
f"user:core:{user_id}",
user_core_data,
ttl=3600 # 1 hour
)
# Warm data: Cache moderately
cache.set(
f"user:auth:{user_id}",
user_auth_data,
ttl=300 # 5 minutes
)
# Cold data: Cache rarely or not at all
# Profile images: Cache via CDN, not in-process cache
Trade-Offs and Considerations
Advantages
- Reduced I/O: Queries accessing only hot data donβt read cold columns
- Faster caching: Smaller partitions fit in memory more easily
- Flexible storage tiers: Place hot on SSD, cold on cheaper blob storage
- Independent optimization: Optimize each partition for its access pattern
- Query performance: Smaller tables mean faster scans and indexes
Disadvantages
- Increased complexity: Application must route to correct partitions
- Join overhead: Reconstructing full records requires multiple queries
- Data consistency: Maintaining consistency across partitions is harder
- Schema changes: Adding columns requires updating partition logic
- Operational complexity: Managing multiple tables instead of one
When to Use Vertical Partitioning
Vertical partitioning is most valuable when:
- Access patterns diverge significantly: Some columns accessed constantly, others rarely
- Column sizes vary dramatically: Some columns are large (images, blobs), others tiny
- Different consistency requirements: Some columns need strong consistency, others can be eventually consistent
- Storage cost matters: Expensive to keep massive columns in hot storage
- Scalability bottlenecked by I/O: Reducing data volume per query improves performance
When columns are uniformly accessed and similar size, horizontal partitioning alone is usually sufficient.
Real-World Examples
E-Commerce Product Catalog
products_meta (hot)
βββ product_id, name, price, stock_level
βββ Accessed on every product page load
products_details (warm)
βββ product_id, description, specifications
βββ Accessed when user views full product page
products_media (cold)
βββ product_id, images, videos, 3d_models
βββ Accessed when user wants to see gallery
products_reviews (warm)
βββ product_id, review_count, average_rating
βββ Accessed on product page (but count only)
βββ Detailed reviews in separate partition
Social Media User Profile
profile_core (hot)
βββ user_id, username, follower_count
βββ Accessed on feed, mentions, search
profile_contact (warm)
βββ user_id, email, phone, address
βββ Accessed during settings, rarely
profile_content (warm)
βββ user_id, bio, profile_picture_url
βββ Accessed when viewing profile
profile_media (cold)
βββ user_id, photos, videos, archives
βββ Accessed when user browses own content
Key Takeaways
- Vertical partitioning splits tables by columns, separating data with different access patterns
- Hot/cold separation allows cost-effective storage and caching strategies
- Combine vertical and horizontal partitioning for maximum scalability
- Application logic must route requests to the correct partition
- Denormalization often wins in distributed systems to avoid expensive joins
- Use vertical partitioning when access patterns, sizes, or consistency requirements diverge significantly
Connection to Next Section
Successful vertical partitioning requires choosing which columns to group togetherβa decision that mirrors the challenge of horizontal partitioning: choosing the right partition key. In the next section, we explore how to select partition keys that balance data distribution, access patterns, and consistency requirements.