System Design Fundamentals

WebSockets & Real-Time


Why Chat Apps Need More Than HTTP

Imagine you’re building a messaging app like WhatsApp or Discord. Your users expect messages to appear instantly—no refreshing, no waiting. With HTTP (which we covered earlier), here’s the problem: the client always has to ask the server for new messages. “Do you have anything for me?” … “No.” … “How about now?” … “Still no.” This approach is inefficient and creates delays.

If you refresh a chat window every second to check for new messages, each client sends up to 86,400 HTTP requests per day, even when nothing has changed. That’s a lot of wasted network traffic. More importantly, the user experience suffers: there’s still a lag of up to one poll interval between when someone sends a message and when the recipient sees it.
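To see how quickly polling adds up, here is a back-of-the-envelope calculation. The function name is made up for illustration, and it assumes the chat window stays open all day.

```javascript
// Requests a single client sends per day when polling at a fixed interval.
function pollingRequestsPerDay(intervalSeconds) {
  const secondsPerDay = 24 * 60 * 60;
  return Math.floor(secondsPerDay / intervalSeconds);
}

console.log(pollingRequestsPerDay(1));  // 86400
console.log(pollingRequestsPerDay(30)); // 2880
```

Stretching the interval cuts the traffic, but the worst-case delivery delay grows by the same factor.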

What we need is a two-way street where the server can push new messages to the client without waiting for the client to ask. That’s where WebSockets come in. They let us build truly real-time applications where data flows bidirectionally, instantly, and continuously.

Understanding WebSocket Fundamentals

WebSockets are a communication protocol that provides a persistent, full-duplex connection between a client (usually a browser) and a server over a single TCP connection. Let’s unpack that:

  • Persistent connection: Once established, the connection stays open. You don’t have to open and close connections for every message.
  • Full-duplex: Both the client and server can send messages at the same time, independently. Neither has to wait for the other to finish.
  • Single TCP connection: Everything happens over one long-lived connection, which is more efficient than opening many short-lived HTTP connections.

WebSocket communication starts with an HTTP handshake—this is the clever part. The client sends a special HTTP request asking to “upgrade” the connection to WebSocket protocol. If the server agrees, they both switch protocols on the same TCP connection. From that moment on, they communicate using the WebSocket protocol, not HTTP.

Here’s how it differs from alternatives you might use:

| Approach | How It Works | Latency | Server Load | Best For |
| --- | --- | --- | --- | --- |
| HTTP Polling | Client repeatedly asks “any updates?” | High (depends on poll interval) | High (many requests) | Simple, infrequent updates |
| Long-Polling | Client waits for server response; server holds request until data available | Medium | Medium (fewer connections) | Fallback for old browsers |
| Server-Sent Events (SSE) | Server pushes updates to client over HTTP; unidirectional | Low | Low | One-way feeds (notifications, live scores) |
| WebSocket | Persistent bidirectional connection; both sides can send anytime | Very Low | Low (fewer connections) | Real-time chat, collaborative apps, multiplayer games |

When should you use each? Use WebSockets when you need true real-time, bidirectional communication. Use SSE when the server only needs to push data (like news feeds). Use long-polling as a fallback for older browsers. Short-interval HTTP polling is rarely the right choice for real-time features today, although it remains adequate when updates are infrequent.
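The same rule of thumb can be written as a small function. This is only a sketch of the guidance above: the function name and its option names are hypothetical, and a real decision would also weigh infrastructure and team experience.

```javascript
// Pick a transport from two questions: does the client need to send
// frequently, and what does the browser support?
function chooseTransport({ bidirectional, webSocketSupported = true, sseSupported = true }) {
  if (bidirectional) {
    return webSocketSupported ? 'websocket' : 'long-polling';
  }
  // Server-push only: SSE is simpler, with long-polling as the fallback.
  return sseSupported ? 'sse' : 'long-polling';
}

console.log(chooseTransport({ bidirectional: true }));  // websocket
console.log(chooseTransport({ bidirectional: false })); // sse
```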

A Simple Analogy: Letters vs. Phone Calls

Think of HTTP as sending letters through the mail. You write a letter (request), mail it (send HTTP request), and wait for a reply (receive response). The other side can never write to you first. If you need frequent updates, you have to keep mailing “anything new?” letters and checking your mailbox for the answers.

WebSocket is like a phone call. You dial the number (handshake), the connection is established, and then both of you can talk freely, at any time, without hanging up and calling back. You can interrupt each other, send messages rapidly, and the conversation flows naturally. The phone line stays open as long as you need it.

In a chat app, HTTP is the postal service. WebSocket is the telephone.

How WebSockets Actually Work

The WebSocket journey has several stages. Let’s walk through them:

The Handshake: Upgrading from HTTP

When a client wants to use WebSockets, it sends an HTTP request that looks something like this:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

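This request has special headers: Upgrade and Connection tell the server “we want to switch protocols.” Sec-WebSocket-Key is a random, base64-encoded 16-byte value. It is not a password or an authentication token; the server hashes it into the Sec-WebSocket-Accept response header to prove that it actually understands WebSocket, which keeps caches and non-WebSocket servers from accidentally accepting the handshake.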

If the server agrees, it responds:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The 101 status code means “we’re switching protocols.” From this point forward, the TCP connection is no longer HTTP—it’s WebSocket.
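The Sec-WebSocket-Accept value in the response is not arbitrary. RFC 6455 defines it as the base64-encoded SHA-1 hash of the client’s Sec-WebSocket-Key concatenated with a fixed GUID. A minimal Node.js sketch (the function name computeAcceptValue is ours, not part of any library) reproduces the value from the handshake shown above:

```javascript
const crypto = require('node:crypto');

// Fixed GUID defined by RFC 6455; every WebSocket server uses this same value.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derive the Sec-WebSocket-Accept header from the client's Sec-WebSocket-Key.
function computeAcceptValue(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

console.log(computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ=='));
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client performs the same computation and drops the connection if the server’s answer does not match.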

Understanding WebSocket Frames

Once the handshake completes, data is sent as frames. A frame is a small packet containing:

  • A header indicating the type of data (text, binary, ping, pong, close, etc.)
  • The length of the payload
  • The actual payload (the message)
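  • A mask bit and, on client-to-server frames, a 4-byte masking key (clients must mask every frame they send, which protects intermediaries such as proxies from cache-poisoning attacks)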

This frame format is much more efficient than HTTP. You’re not sending headers for every message, just a small frame header: 2 bytes for a short server-to-client message, 6 bytes for a short client-to-server message (the extra 4 bytes are the masking key), and at most 14 bytes for very large payloads.
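To make the frame layout concrete, here is a simplified Node.js sketch that builds a single unmasked text frame (the server-to-client direction) and applies client-side masking to a payload. The function names are ours; real libraries such as ws also handle fragmentation, control frames, and binary data, which this sketch leaves out.

```javascript
// Build one complete, unmasked text frame.
// First byte 0x81 = FIN bit set (final fragment) + opcode 0x1 (text).
function encodeTextFrame(text) {
  const payload = Buffer.from(text, 'utf8');
  let header;
  if (payload.length < 126) {
    // Length fits in the 7-bit field of the second byte.
    header = Buffer.from([0x81, payload.length]);
  } else if (payload.length < 65536) {
    // 126 signals that a 16-bit length follows.
    header = Buffer.alloc(4);
    header[0] = 0x81;
    header[1] = 126;
    header.writeUInt16BE(payload.length, 2);
  } else {
    // 127 signals that a 64-bit length follows.
    header = Buffer.alloc(10);
    header[0] = 0x81;
    header[1] = 127;
    header.writeBigUInt64BE(BigInt(payload.length), 2);
  }
  return Buffer.concat([header, payload]);
}

// Client-to-server payloads are XORed byte by byte with a 4-byte masking key.
function maskPayload(payload, maskKey) {
  const masked = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    masked[i] = payload[i] ^ maskKey[i % 4];
  }
  return masked;
}

console.log(encodeTextFrame('Hello').toString('hex'));
// 810548656c6c6f
```

The five-character message costs only two header bytes. Masking is its own inverse, so the server recovers the payload by applying the same XOR with the key it reads from the frame.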

Connection Lifecycle

Here’s what happens during a WebSocket connection:

  1. Open event: After the handshake succeeds, both client and server emit an “open” event
  2. Message exchange: Either side can send messages at any time; the other side receives a “message” event
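  3. Ping/Pong: Periodically, one side (usually the server, since the browser WebSocket API cannot send pings itself) sends a ping; the receiver auto-responds with a pong. This keeps idle connections from being dropped by intermediaries and lets the sender detect peers that have silently disappeared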
  4. Close event: Either side can initiate closure; gracefully closes the connection
  5. Error event: Network problems or protocol violations trigger an error event
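The ping/pong step is usually implemented as a server-side heartbeat. The sketch below follows the pattern commonly used with the ws library, under these assumptions: each socket object exposes ping() and terminate() (as ws sockets do), and isAlive is a flag we attach ourselves, not a built-in property.

```javascript
// One heartbeat sweep: drop sockets that never answered the previous ping,
// then ping everyone else and wait for their pong before the next sweep.
function heartbeatSweep(clients) {
  for (const socket of clients) {
    if (socket.isAlive === false) {
      socket.terminate(); // no pong arrived since the last sweep
      continue;
    }
    socket.isAlive = false; // the 'pong' handler flips this back to true
    socket.ping();
  }
}

// Wiring it up with ws (assumed setup, shown as comments so the
// sketch runs without the library installed):
//
//   server.on('connection', (socket) => {
//     socket.isAlive = true;
//     socket.on('pong', () => { socket.isAlive = true; });
//   });
//   setInterval(() => heartbeatSweep(server.clients), 30000);
```

The 30-second interval is an arbitrary choice for illustration; shorter intervals detect dead peers sooner at the cost of more traffic.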

Here’s a diagram comparing the message flow between HTTP polling and WebSocket:

graph TD
    A["Client Needs Updates"] -->|HTTP Request 1| B["Server"]
    B -->|Response| A
    A -->|HTTP Request 2| B
    B -->|Response| A
    A -->|HTTP Request 3| B
    B -->|Response| A
    A -->|Keep polling| B

    style A fill:#e1f5ff
    style B fill:#f3e5f5

Now contrast that with WebSocket:

graph TD
    A["Client"] -->|HTTP Upgrade Request| B["Server"]
    B -->|101 Switching Protocols| A
    A <-->|WebSocket Connection| B
    A -->|Message 1| B
    B -->|Message 2| A
    A -->|Message 3| B
    B -->|Message 4| A
    A <-->|Persistent, bidirectional| B

    style A fill:#e1f5ff
    style B fill:#f3e5f5

Scaling WebSocket Servers

A single server can handle thousands of concurrent WebSocket connections (one per user). But when you have millions of users, you need multiple servers. This creates a challenge: when User A on Server 1 sends a message to User B on Server 2, how does Server 1 tell Server 2 about it?

The solution is a pub/sub (publish/subscribe) system, usually Redis. Here’s how it works:

  • Each server maintains WebSocket connections with its connected users
  • When a message arrives at any server, that server publishes it to a Redis channel (e.g., “chat:room123”)
  • All servers subscribe to that channel
  • When a server receives a published message, it checks: “Do I have any clients in this room?” If yes, it sends the message to them
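The steps above can be sketched in a few lines. To keep the example self-contained, an in-process EventEmitter stands in for Redis; that substitution is an assumption for illustration only, and a real deployment would publish and subscribe through a Redis client instead. The names createChatServer, join, and publish are hypothetical, and each server here subscribes lazily when its first local client joins a room.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for Redis pub/sub (assumption: replace with a real Redis client).
const broker = new EventEmitter();

// Each chat server knows only about its own local connections.
function createChatServer(name) {
  const rooms = new Map(); // room id -> Set of local sockets

  // "Do I have any clients in this room?" If yes, send to them.
  function deliverLocally(roomId, payload) {
    const members = rooms.get(roomId);
    if (!members) return;
    for (const socket of members) socket.send(payload);
  }

  return {
    name,
    join(roomId, socket) {
      if (!rooms.has(roomId)) {
        rooms.set(roomId, new Set());
        // Subscribe to the room's channel when the first local client joins.
        broker.on(`chat:${roomId}`, (payload) => deliverLocally(roomId, payload));
      }
      rooms.get(roomId).add(socket);
    },
    // Called when one of this server's clients sends a chat message.
    publish(roomId, message) {
      broker.emit(`chat:${roomId}`, JSON.stringify(message));
    },
  };
}

// User B is connected to server 2; User A sends through server 1.
const server1 = createChatServer('server-1');
const server2 = createChatServer('server-2');
const received = [];
server2.join('room123', { send: (payload) => received.push(payload) });
server1.publish('room123', { username: 'Alice', text: 'hi' });
console.log(received); // [ '{"username":"Alice","text":"hi"}' ]
```

Server 1 never needs to know where User B is connected; it only publishes to the channel, and whichever servers hold clients for that room deliver the message.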

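Another important detail: sticky sessions (session affinity in your load balancer). An established WebSocket stays on one server for its whole lifetime, but the initial handshake and every reconnect pass through the load balancer again. If your servers keep per-user state in memory, or your client library falls back to long-polling (where every request must reach the same server), configure affinity so a returning user lands on the server that already knows them. Affinity does not prevent the reconnect itself, so clients still need reconnect logic.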

graph LR
    U1["User A<br/>Server 1"] <-->|WebSocket| S1["Server 1"]
    U2["User B<br/>Server 2"] <-->|WebSocket| S2["Server 2"]
    S1 <-->|Redis Pub/Sub| Redis["Redis"]
    S2 <-->|Redis Pub/Sub| Redis
    S1 -.->|Message from A| S2
    S2 -.->|Delivers to B| U2

    style U1 fill:#c8e6c9
    style U2 fill:#c8e6c9
    style S1 fill:#bbdefb
    style S2 fill:#bbdefb
    style Redis fill:#ffe0b2

Building Real-Time Systems with WebSockets

Let’s look at practical examples and see what real code looks like.

A Simple Chat Application

Here’s how you’d open a WebSocket connection in JavaScript:

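const socket = new WebSocket('ws://localhost:8000/chat');

socket.addEventListener('open', () => {
  console.log('Connected to chat server');
  socket.send(JSON.stringify({ type: 'join', username: 'Alice' }));
});

socket.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'message') {
    // The server broadcasts the chat text in the `text` field
    console.log(`${data.username}: ${data.text}`);
  } else {
    // 'user-joined' and 'user-left' events carry only a username
    console.log(`${data.username} ${data.type === 'user-joined' ? 'joined' : 'left'}`);
  }
  // Update the UI with the new message
});

socket.addEventListener('close', () => {
  console.log('Disconnected from server');
});

socket.addEventListener('error', (error) => {
  console.error('WebSocket error:', error);
});

// Send a message (send() throws if the socket is still connecting, so check first)
function sendMessage(text) {
  if (socket.readyState !== WebSocket.OPEN) return;
  socket.send(JSON.stringify({
    type: 'message',
    text: text,
    timestamp: new Date().toISOString()
  }));
}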
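The client above does not try to reconnect when the connection drops. A common approach is exponential backoff with jitter, so that thousands of clients do not all reconnect at the same instant after a server restart. The helper below is a sketch; its name, default values, and the injectable random parameter are our own choices for illustration.

```javascript
// Delay before reconnect attempt N: doubles each time, capped at maxMs,
// with random jitter so clients spread out instead of reconnecting in lockstep.
function reconnectDelay(attempt, baseMs = 500, maxMs = 30000, random = Math.random) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  // Pick a point in the upper half of the window: [exponential / 2, exponential).
  return Math.round(exponential / 2 + random() * (exponential / 2));
}

// Browser usage (sketch, assuming a connect() function that creates the socket
// and resets `attempt` to 0 in its 'open' handler):
//
//   socket.addEventListener('close', () => {
//     setTimeout(connect, reconnectDelay(attempt++));
//   });
```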

On the server side (Node.js with the ws library):

const WebSocket = require('ws');
const server = new WebSocket.Server({ port: 8000 });
const users = {}; // username -> socket

server.on('connection', (socket) => {
  let username;

  socket.on('message', (data) => {
    let message;
    try {
      message = JSON.parse(data.toString());
    } catch {
      return; // Ignore malformed JSON instead of crashing the server
    }

    if (message.type === 'join') {
      username = message.username;
      users[username] = socket;
      broadcast({ type: 'user-joined', username });
    } else if (message.type === 'message' && username) {
      // Only clients that have joined may post
      broadcast({
        type: 'message',
        username,
        text: message.text,
        timestamp: message.timestamp
      });
    }
  });

  socket.on('close', () => {
    if (!username) return; // Client disconnected before joining
    delete users[username];
    broadcast({ type: 'user-left', username });
  });
});

// Send a payload to every client whose connection is still open
function broadcast(data) {
  server.clients.forEach((client) => {
    if (client.readyState === WebSocket.OPEN) {
      client.send(JSON.stringify(data));
    }
  });
}

Real-World Use Cases

Live dashboards: A stock trading platform needs to show price updates instantly. WebSocket pushes new prices to all connected traders without delay.

Collaborative editing: Google Docs-style editing where multiple users edit simultaneously. WebSocket sends keystroke events in real-time, and a conflict resolution algorithm merges concurrent edits.

Multiplayer games: Player positions, actions, and game state must sync in milliseconds. WebSocket provides the latency required.

Notifications: When a server event occurs (payment processed, someone liked your post), the server pushes it to the client immediately.

Trade-Offs: When WebSocket is Right (and When It’s Not)

WebSockets are powerful, but they’re not always the best choice.

Advantages: Bidirectional, low latency, low overhead, persistent connection, instant updates.

Disadvantages: Stateful connections are harder to scale than stateless HTTP. Servers must manage connection state, handle reconnects, and coordinate across multiple servers. Every open connection holds a socket, a file descriptor, and some memory on the server even when idle, so a server with 10,000 concurrent users must be sized and tuned for 10,000 open connections. Also, some NAT devices, firewalls, or proxies may block WebSocket connections or drop idle ones (though this is rare today).

When to use alternatives:

  • If you only need server-to-client updates (one-way), Server-Sent Events are simpler and use HTTP semantics. They’re easier to cache and route.
  • If updates are infrequent (once per minute or less), HTTP polling or long-polling are adequate and easier to implement.
  • If you need historical data, HTTP is better because requests are stateless and cacheable.

Pro tip: Many production systems use a hybrid approach. They start with long-polling for compatibility, then upgrade to WebSocket if the browser supports it and the use case demands real-time performance.

Key Takeaways

  • WebSocket enables true real-time, bidirectional communication over a single persistent TCP connection, unlike HTTP, which is request-response and requires the client to initiate every exchange.
  • The WebSocket handshake begins with an HTTP upgrade request; once the server agrees (101 Switching Protocols), both parties switch to WebSocket protocol.
  • WebSocket frames are lightweight (small overhead), making it efficient for frequent message exchanges.
  • To scale WebSocket servers, use a pub/sub system (like Redis) so messages can be delivered across multiple servers, and use sticky sessions to route reconnects back to the original server.
  • Choose WebSocket for truly real-time, bidirectional apps (chat, collaborative editing, multiplayer games); use SSE for server-push-only scenarios; use long-polling only as a fallback.
  • Managing connection state (tracking who’s connected, handling disconnects, reconnects) is more complex than stateless HTTP, so weigh the complexity against your real-time requirements.

Practice Scenarios

Scenario 1: You’re building a customer support chat system where agents answer customer inquiries. Why is WebSocket better than HTTP polling here? What architecture would you use to ensure a message from a customer reaches any available agent server?

Scenario 2: Your product team asks for a feature showing live notification badges (e.g., “3 new messages”) that update instantly when users get messages. Would you use WebSocket or Server-Sent Events, and why?

Scenario 3: Design a simple multiplayer tic-tac-toe game. Sketch out the WebSocket messages that would flow between players and the server when a player makes a move. What happens if Player A and Player B both click the same cell at the exact same millisecond?

What’s Next

Now that you understand how WebSocket enables real-time bidirectional communication, we need to zoom out and understand the underlying transport layer. WebSocket sits on top of TCP, but TCP itself has trade-offs. Some applications (like video streaming or live sports) don’t need guaranteed order or reliability; they just need speed. That’s where UDP comes in. In the next chapter, we’ll compare TCP and UDP and explore when you’d use each.