System Design Fundamentals

Resource Right-Sizing

The Apartment Problem

Your monitoring dashboard shows that your fleet of 20 m5.2xlarge EC2 instances runs at 15% CPU utilization and 30% memory usage on average. You’re paying $0.384/hour per instance—about $5,606/month for the entire fleet (assuming roughly 730 hours/month)—but only using a fraction of their capacity.

What if you downsize to m5.large instances (one-quarter the resources)? You’d pay $0.096/hour per instance, or about $1,402/month total. Your application runs without performance degradation. You just saved roughly $4,205/month, or about $50,500 per year, by choosing smaller boxes.
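The arithmetic is worth sanity-checking yourself. A minimal sketch, assuming a 730-hour month and the on-demand rates quoted above:

```python
# Back-of-the-envelope right-sizing savings. The 730 hours/month figure
# and the hourly rates mirror the example in the text.
def monthly_cost(instances: int, hourly_rate: float, hours: float = 730) -> float:
    """Fleet cost per month at a flat on-demand hourly rate."""
    return instances * hourly_rate * hours

before = monthly_cost(20, 0.384)  # 20 x m5.2xlarge
after = monthly_cost(20, 0.096)   # 20 x m5.large
savings = before - after

print(f"before: ${before:,.0f}/mo, after: ${after:,.0f}/mo, "
      f"saving ${savings:,.0f}/mo (${savings * 12:,.0f}/yr)")
```

Plug in your own fleet size and rates before trusting any projected savings; regional pricing and partial-month usage shift these numbers.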

This is the power of right-sizing: matching your provisioned resources to your actual workload needs. In our experience, right-sizing is the single most impactful cost optimization technique in cloud environments. It’s not flashy—it doesn’t involve architectural redesigns or clever algorithms. But it saves money immediately and consistently.

Right-sizing is not a one-time activity. Workloads evolve, traffic patterns change, and efficiency improves over time. Effective organizations treat right-sizing as an ongoing practice: measure → analyze → resize → monitor → repeat.

What Is Right-Sizing?

Right-sizing means allocating the minimum resources necessary to meet your performance requirements. Three failure modes exist:

Over-provisioning (most common): You allocate more resources than your workload needs. The application performs well, but you waste money on unused capacity. Teams often over-provision “just in case” or because they don’t measure actual usage.

Under-provisioning: You allocate fewer resources than needed. Your application experiences performance degradation: slow response times, request timeouts, out-of-memory errors. This is obvious and usually gets fixed immediately.

Mis-provisioning: You provision the right amount of capacity but the wrong instance type for your workload. A memory-optimized instance (r5.large) for a compute-bound workload costs more than a compute-optimized instance (c5.large) and delivers worse performance.

Right-sizing addresses all three. It requires honest measurement of actual usage, not assumptions about what you might need.

The Right-Sizing Cycle

A repeatable process prevents guessing:

  1. Measure: Collect actual usage metrics for compute, memory, storage, network, and I/O.
  2. Analyze: Compare provisioned capacity to measured usage. Identify over-provisioned resources.
  3. Resize: Modify instance types, counts, or configurations. Update capacity reservations.
  4. Monitor: Track performance metrics after changes. Ensure no degradation occurred.
  5. Repeat: Re-measure every 1–3 months as workloads evolve.

This cycle runs continuously. Your goal is a cost curve that descends over time as you learn and optimize, not one that flatlines from day one.
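The “Analyze” step of the cycle can be sketched in a few lines. The thresholds and resource records below are illustrative assumptions, not a standard API:

```python
# Sketch of the "Analyze" step: flag resources whose measured usage sits
# well below their provisioned capacity. Thresholds are assumptions.
def flag_overprovisioned(resources, cpu_threshold=0.4, mem_threshold=0.5):
    """Return names of resources using far less than provisioned capacity."""
    flagged = []
    for r in resources:
        cpu_ratio = r["cpu_used"] / r["cpu_provisioned"]
        mem_ratio = r["mem_used"] / r["mem_provisioned"]
        if cpu_ratio < cpu_threshold and mem_ratio < mem_threshold:
            flagged.append(r["name"])
    return flagged

fleet = [
    # web-1 mirrors the opening example: 15% CPU, 30% memory
    {"name": "web-1", "cpu_used": 1.2, "cpu_provisioned": 8,
     "mem_used": 9.6, "mem_provisioned": 32},
    # batch-1 is genuinely busy and should not be flagged
    {"name": "batch-1", "cpu_used": 6.5, "cpu_provisioned": 8,
     "mem_used": 28, "mem_provisioned": 32},
]
print(flag_overprovisioned(fleet))
```

In practice the input records would come from your monitoring system rather than hard-coded dictionaries, and the thresholds should reflect your own headroom policy.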

Resources to Right-Size

Most cloud environments have multiple layers where right-sizing applies:

Compute Instances (EC2, Compute Engine, VMs): CPU, memory, storage.

Databases (RDS, Cloud SQL, managed databases): Instance type, storage volume, IOPS, replica counts.

Container Resources (Kubernetes): CPU requests/limits, memory requests/limits per pod.

Storage Volumes (EBS, Persistent Disks): IOPS, throughput, capacity.

Serverless Concurrency: Lambda reserved concurrency, Cloud Function invocation limits.

Cache Layers (Redis, Memcached): Node size, node count, eviction policies.

Each layer has its own measurement approach and trade-offs.

Measuring Actual Usage: The Toolbox

You can’t right-size what you don’t measure. Here’s how to collect real data:

CloudWatch (AWS): Collect metrics for EC2 instances—CPUUtilization, MemoryUtilization (requires CloudWatch agent), NetworkIn/NetworkOut, DiskReadBytes/DiskWriteBytes. Set metrics to 1-minute granularity and collect for at least 1 month to capture weekly and monthly patterns.

AWS Compute Optimizer: Analyzes your EC2 instances, RDS databases, Auto Scaling Groups, and Lambda functions. Provides rightsizing recommendations with savings calculations. For example: “Downsize i-0a1b2c3d4e5f6g7h8 from m5.2xlarge to m5.large, save 75% on compute costs, estimated CPU utilization will remain below 20%.”

Kubernetes Vertical Pod Autoscaler (VPA): Analyzes actual CPU and memory usage for containerized workloads. Recommends request/limit adjustments. Can auto-update resource requests based on observed usage patterns.

GCP Recommender: Similar to AWS Compute Optimizer. Analyzes Compute Engine instances and provides rightsizing recommendations.

Custom Monitoring (Prometheus/Grafana): If you run self-managed infrastructure, scrape application metrics (memory allocated, heap size, connections) and infrastructure metrics (CPU, memory) using Prometheus. Correlate them to understand application behavior.

Pro tip: Measure for at least 1 month before resizing. Most workloads have weekly cycles (higher traffic on weekdays, lower on weekends) and monthly cycles (month-end batches, reporting runs). One week of data misses important patterns.
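Once you have a month of samples, summarize them before deciding anything. A minimal sketch with synthetic data (in practice the samples come from CloudWatch, Prometheus, or whatever stack you run):

```python
import statistics

# Summarize a utilization series (percent, 0-100) before resizing.
def summarize(samples):
    """Average, P95 (nearest-rank), and max of a utilization series."""
    ordered = sorted(samples)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return {"avg": statistics.mean(samples), "p95": p95, "max": ordered[-1]}

# Synthetic weekday-heavy pattern: five busy days, two quiet days, x5 weeks
week = [35, 40, 38, 42, 37] * 5 + [12, 15] * 5
print(summarize(week))
```

Note how the average (~31%) hides the busy-day peak (42%); this is exactly why one week of quiet data, or an average alone, leads to undersizing.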

Instance Family Selection: Choosing the Right Tool

Cloud providers organize instance types into families, each optimized for specific workload characteristics. Choosing the right family is crucial:

| Family | Best For | Example Use Cases | Cost Notes |
| --- | --- | --- | --- |
| General Purpose (M-series) | Balanced workloads | Web servers, small databases, development | Moderate cost, versatile |
| Compute Optimized (C-series) | CPU-bound workloads | Batch processing, scientific computing, video encoding | Higher cost than M, justified for CPU-heavy work |
| Memory Optimized (R-series) | Memory-bound workloads | Caching layers, in-memory databases, analytics | Expensive per vCPU, justified for memory-heavy work |
| Storage Optimized (I-series) | I/O-bound workloads | NoSQL databases, data warehouses, Elasticsearch | Very expensive, justified only for I/O-intensive work |

Choose poorly and you’re wasting money on the wrong resource type. A Redis cluster running on memory-optimized instances is efficient. A Redis cluster running on compute-optimized instances costs more and performs worse.
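The selection logic in the table reduces to a small lookup. A toy illustration (the family letters follow AWS naming; the bottleneck labels are assumptions for this sketch, not a real API):

```python
# Illustrative mapping from a workload's dominant bottleneck to an AWS
# instance-family letter, following the table above.
FAMILY_FOR = {
    "balanced": "m",  # general purpose
    "cpu": "c",       # compute optimized
    "memory": "r",    # memory optimized
    "io": "i",        # storage optimized
}

def pick_family(bottleneck: str) -> str:
    """Return the instance-family letter for a dominant bottleneck."""
    return FAMILY_FOR[bottleneck]

# A Redis cluster is memory-bound, so it belongs on the R family
print(pick_family("memory"))
```

The hard part in practice is the input, not the lookup: profiling to discover which resource actually saturates first.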

Did you know? Graviton processors (AWS’s custom ARM-based chips) offer 20–40% better price-performance than traditional Intel/AMD instances for many workloads. An m6g.large (Graviton) offers similar performance to an m5.large (Intel) but costs about 20% less. The catch: you need to ensure your application runs on ARM (most modern applications do).

Container Right-Sizing: Kubernetes Requests and Limits

Kubernetes lets you specify CPU and memory requests and limits for each container. These are levers for right-sizing:

Requests: The amount of resources Kubernetes guarantees for a container. The scheduler uses requests to decide which nodes can fit a pod. If you set requests too low (e.g., 50m CPU when you actually need 500m), the scheduler may overpack the node, leaving your container starved of CPU under contention.

Limits: The maximum resources a container can consume. If you set limits too low, the container gets killed when it exceeds the limit (out-of-memory, SIGKILL). If you set limits too high, you allocate unused capacity.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example
    image: example:latest
    resources:
      requests:
        cpu: 250m      # Requesting 0.25 vCPU
        memory: 256Mi  # Requesting 256 MB
      limits:
        cpu: 500m      # Limit to 0.5 vCPU
        memory: 512Mi  # Limit to 512 MB

Right-sizing Kubernetes involves:

  1. Set requests accurately: Start with estimates, then observe actual usage. Adjust requests based on monitoring.
  2. Avoid the “noisy neighbor” problem: A pod that sets no requests or limits can consume all spare resources on its node, starving sibling pods.
  3. Use Vertical Pod Autoscaler: VPA analyzes actual usage and recommends request adjustments, then auto-updates them.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: example
  updatePolicy:
    updateMode: "Auto"  # Auto-update requests

Database Right-Sizing

Database instances are often over-provisioned because sizing decisions are made early in development before actual usage patterns emerge:

RDS Instance Sizing: Start with the smallest instance that handles peak load, then measure actual CPU and memory usage. AWS RDS metrics show CPU, memory, and IOPS. If CPU averages 20% and memory averages 40%, you can likely downsize. However, account for maintenance windows, backups, and future growth (20% headroom is reasonable).

DynamoDB Provisioned Capacity: You reserve read and write capacity units (RCUs and WCUs). Each RCU supports one strongly consistent read of up to 4 KB per second (or two eventually consistent reads); each WCU supports one write of up to 1 KB per second. If your application uses 50 RCUs consistently but you provisioned 500, you’re wasting money. Solution: switch to on-demand pricing if usage is variable, or reduce provisioned capacity to match actual usage.
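The capacity-unit math can be sketched directly (strongly consistent reads shown; eventually consistent reads cost half as many RCUs):

```python
import math

# DynamoDB capacity math: one RCU covers one strongly consistent read of
# up to 4 KB per second; one WCU covers one write of up to 1 KB per second.
def required_rcu(reads_per_sec: float, item_kb: float) -> int:
    """RCUs for strongly consistent reads; item size rounds up to 4 KB."""
    return math.ceil(reads_per_sec * math.ceil(item_kb / 4))

def required_wcu(writes_per_sec: float, item_kb: float) -> int:
    """WCUs for writes; item size rounds up to the next 1 KB."""
    return math.ceil(writes_per_sec * math.ceil(item_kb / 1))

# 50 strongly consistent reads/s of 3 KB items needs just 50 RCUs,
# so a 500-RCU table would be 10x over-provisioned.
print(required_rcu(50, 3), required_wcu(20, 2.5))
```

Run this against your actual item sizes and request rates; the rounding to 4 KB (reads) and 1 KB (writes) boundaries is why large items are disproportionately expensive.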

Read Replicas: Adding read replicas to RDS or Aurora doesn’t increase cost linearly—you pay for additional instances, but they don’t need to match the primary’s capacity. A primary handling 100 write requests/second needs high CPU for writes, but replicas handling only reads can be smaller instances.

Caching Layers: ElastiCache (Redis/Memcached) sizing depends on working set size (the data you actively access). If you cache 10 GB of data, a 50 GB node is oversized. Measure hit ratios and cache usage; adjust node size accordingly.
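Sizing a cache node from its working set can be sketched as a smallest-fit search. The node sizes and 25% headroom below are illustrative assumptions, not ElastiCache specifics:

```python
# Sketch: pick the smallest cache node that fits the working set plus
# headroom for engine overhead and growth. Sizes (GB) are illustrative.
NODE_SIZES_GB = [1.5, 3, 6, 13, 26, 52]

def pick_node(working_set_gb: float, headroom: float = 0.25) -> float:
    """Smallest node covering working_set_gb * (1 + headroom)."""
    needed = working_set_gb * (1 + headroom)
    for size in NODE_SIZES_GB:
        if size >= needed:
            return size
    raise ValueError("working set exceeds largest node; shard across nodes")

# A 10 GB working set needs 12.5 GB with headroom, so the 13 GB node fits;
# a 50 GB node for the same workload would be heavily oversized.
print(pick_node(10))
```

Pair this with hit-ratio monitoring: a shrinking hit ratio after downsizing means the working set estimate was too low.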

The Right-Sizing Cadence

Workloads change. What was right-sized three months ago might be oversized today. Establish a cadence:

  • New applications: Do a first right-sizing pass after 2 weeks of production traffic (enough to capture weekly patterns), and again after a full month once monthly patterns are visible.
  • Mature applications: Review quarterly. Set calendar reminders.
  • After traffic changes: Spikes or drops warrant immediate review.
  • Seasonal workloads: Review before peak and off-peak seasons.

Automate this where possible. Set up AWS Compute Optimizer to run weekly and email recommendations. Configure Kubernetes VPA to auto-update requests. Use Kubecost to track container resource efficiency.

Right-Sizing Trade-Offs

Right-sizing isn’t free of risks and trade-offs:

Risk of Under-Provisioning: If you downsize too aggressively, performance degrades during traffic spikes. Solution: apply a buffer (e.g., right-size to P95 usage, not average usage) and maintain headroom.

Buffer Strategy: Most teams use P95 or P99 metrics as their sizing target, not average. If your average CPU is 20% but P95 is 60%, size for P95 to handle spikes without degradation. The P99 approach adds more headroom but costs more.
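Sizing to P95 plus headroom is simple arithmetic. A sketch using the figures from the text (60% P95 CPU; the 20% headroom default is an assumption):

```python
# Size to P95 utilization plus headroom rather than to the average.
def target_capacity(current_vcpus: float, p95_util: float,
                    headroom: float = 0.2) -> float:
    """vCPUs needed so P95 load runs at (1 - headroom) utilization."""
    busy_at_p95 = current_vcpus * p95_util  # vCPUs actually busy at P95
    return busy_at_p95 / (1 - headroom)

# An 8-vCPU instance at 60% P95 has 4.8 busy vCPUs; with 20% headroom you
# need about 6 vCPUs, so a 4-vCPU size is too small for this workload.
print(round(target_capacity(8, 0.60), 2))
```

Had you sized to the 20% average instead, the same formula would suggest 2 vCPUs, which is exactly the kind of aggressive downsize that degrades under spikes.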

Effort vs Savings: Downsizing a Lambda function from 1,024 MB to 512 MB halves that function’s compute cost. But if the function costs only $5/month and the measurement and optimization take 4 engineering hours, the ROI is terrible. Focus right-sizing efforts on your highest-cost resources first.
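The ROI check is a one-line payback calculation. A sketch (the $100/hour engineering rate is an illustrative assumption):

```python
# Payback period for a right-sizing task: engineering cost divided by
# the monthly savings it produces. Hourly rate is an assumption.
def payback_months(eng_hours: float, hourly_rate: float,
                   monthly_savings: float) -> float:
    """Months until the engineering cost is repaid by savings."""
    return (eng_hours * hourly_rate) / monthly_savings

# 4 hours at $100/hr to halve a $5/month Lambda bill: 160-month payback.
print(payback_months(4, 100, 2.50))
# The same 4 hours trimming $4,200/month off a fleet pays back immediately.
print(round(payback_months(4, 100, 4200), 3))
```

A simple rule of thumb: if payback exceeds your review cadence (say, one quarter), the optimization probably isn’t worth doing yet.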

Graviton Migration Effort: Graviton instances require your application to run on ARM. Most modern languages support ARM, but some dependencies don’t. The cost savings (20–40%) must justify the migration effort.

Pro tip: Right-sizing is an asymptotic process. The first 20% of effort yields 80% of savings (the low-hanging fruit). The last 80% of effort yields 20% of savings (diminishing returns). Know when to stop.

Key Takeaways

  1. Measure actual usage before resizing: Assumptions lead to mistakes. Collect metrics for at least 1 month, capturing weekly and monthly patterns.

  2. Right-sizing is an ongoing practice: Set a cadence (quarterly review for mature applications) and revisit regularly as workloads evolve.

  3. Choose the right instance family: Compute-optimized for CPU-bound work, memory-optimized for memory-bound work, general-purpose for balanced workloads.

  4. Container right-sizing prevents over-provisioning: Set Kubernetes requests accurately using VPA recommendations. Avoid the noisy neighbor problem.

  5. Database right-sizing requires monitoring: RDS, DynamoDB, and caching layers all have specific measurement approaches. Focus on actual usage vs provisioned capacity.

  6. Buffer for spikes, not average usage: Size to P95 or P99 metrics, not average, to maintain performance during traffic increases.

Practice Scenarios

Scenario 1: Your application runs on 10 t3.xlarge instances. CloudWatch shows average CPU is 25% and memory is 35%. Using AWS Compute Optimizer, you receive a recommendation to downsize to t3.large (assuming maintained performance). Calculate the annual savings and estimate the risk if actual peak usage is 10% higher than measured.

Scenario 2: Your Kubernetes cluster has 50 pods with resource requests of 500m CPU and 512 Mi memory each. Vertical Pod Autoscaler recommends reducing requests to 200m CPU and 256 Mi memory on average. Calculate the freed-up cluster capacity and determine how many additional pods you could schedule without adding nodes.


Next: Right-sizing identifies over-provisioned resources. Once you’ve right-sized, the next decision is commitment: should you use on-demand pricing for maximum flexibility or reserved instances for maximum savings?