RPC and gRPC Fundamentals
The Problem with Service-to-Service Communication
Imagine you’re running a microservices architecture at scale. Your user service needs to fetch account data, which requires calling the billing service. Your order service needs to validate inventory, so it calls the catalog service. Maybe a hundred times per second. Maybe a thousand.
You’ve built these services with REST — the same REST API you expose to your mobile clients and web browsers. And it works. But you’re starting to notice something: your internal service-to-service communication is a bottleneck. Each request serializes data to JSON, sends it over HTTP/1.1 (which establishes a new connection or reuses one inefficiently), deserializes on the other end, and waits for a response. The overhead compounds with every call. JSON payloads are verbose. Latency creeps up. Your infra team is scaling database replicas to handle the load, but the real issue isn’t data — it’s communication.
This is where Remote Procedure Calls (RPC) come in. The idea is simple but powerful: you want to call a function on a remote server as if it were running locally. You don’t want to think about HTTP, JSON, or network details. You just invoke a method, pass arguments, and get back a result. gRPC, Google’s modern open-source RPC framework, takes this concept and optimizes it for the real world. Today, it powers high-frequency service-to-service communication at Netflix, Spotify, Google, Dropbox, and thousands of other companies. It’s not meant to replace REST for public APIs — it’s meant to replace REST for the parts where it was never ideal: internal communication where you control both sides, where performance matters, and where you need features like bidirectional streaming.
What is RPC? The Abstraction
A Remote Procedure Call is a fundamental abstraction that hides network communication behind a function call interface. Here’s the model:
- Client side: You have a “client stub” — a local proxy that looks like the real function you want to call.
- You invoke the stub with your arguments, just like a normal function call.
- The stub marshals (serializes) your arguments into bytes suitable for network transmission.
- The network layer sends those bytes to a server.
- Server side: A “server stub” receives the bytes, unmarshals (deserializes) them back into native objects, and invokes the actual function.
- The real function runs and returns a result.
- The server stub marshals the result and sends it back.
- The client stub receives the response, unmarshals it, and returns it to your code as if it were a local call.
From your perspective, you just call user_service.get_user(user_id) and get back a user object. The network, serialization, and deserialization are invisible.
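To make the flow concrete, here is a toy sketch in plain Python (not real gRPC): the "network" is just a function call, JSON stands in for the wire format, and the hypothetical get_user_impl plays the remote function.

import json

def get_user_impl(user_id):
    # The real function, living on the "server"
    return {"id": user_id, "name": "Ada"}

def server_stub(wire_bytes):
    # Server stub: unmarshal, invoke the real function, marshal the result
    args = json.loads(wire_bytes)
    return json.dumps(get_user_impl(**args)).encode()

class UserServiceStub:
    # Client stub: looks and feels like a local function
    def get_user(self, user_id):
        wire = json.dumps({"user_id": user_id}).encode()  # marshal arguments
        reply = server_stub(wire)  # stand-in for the network hop
        return json.loads(reply)   # unmarshal the result

user = UserServiceStub().get_user(42)
print(user["name"])  # "Ada": the network plumbing stayed invisible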
This abstraction has a long history. Sun Microsystems created Sun RPC in the 1980s. Then came CORBA (distributed objects), DCOM (Microsoft’s version), XML-RPC (RPC over HTTP with XML), SOAP (heavy-weight enterprise RPC), Thrift (Facebook’s lightweight binary RPC), and finally gRPC (Google’s high-performance, modern RPC).
gRPC: RPC for the Modern Era
gRPC stands for “gRPC Remote Procedure Call” (the “g” originally stood for “Google”). It’s open-source, created by Google in 2015, and designed specifically for microservices and modern infrastructure.
The key innovations in gRPC:
HTTP/2 as the transport: Unlike typical REST APIs (which usually run over HTTP/1.1), gRPC is built on HTTP/2. This means:
- Multiplexing: Many requests/responses can flow over a single TCP connection simultaneously.
- Binary framing: Data is sent in compact binary frames, not text.
- Header compression: HTTP headers (which are often redundant) are compressed with HPACK.
- Streams: A single connection carries many independent, long-lived streams; this (rather than HTTP/2's server-push feature) is what gRPC's streaming patterns are built on.
Protocol Buffers for serialization: Instead of JSON or XML, gRPC uses Protocol Buffers (protobuf) — a schema-based binary serialization format. You define your message types in a .proto file, and the protobuf compiler generates language-specific code (Python, Go, Java, etc.). Messages are smaller, faster to serialize/deserialize, and strongly typed. Plus, protobufs handle backward and forward compatibility elegantly.
Multiple communication patterns:
- Unary: Traditional request-response (one call, one response).
- Server streaming: Client sends one request, server streams multiple responses.
- Client streaming: Client streams multiple requests, server sends one response.
- Bidirectional streaming: Both client and server stream messages simultaneously.
These patterns make gRPC incredibly flexible. Need to push real-time updates to clients? Bidirectional streaming. Need to upload a large file in chunks? Client streaming. Need to fetch paginated data? Server streaming.
An Analogy
Think of communication styles:
- REST is like sending letters. Each request is a complete, self-contained package. The letter includes headers (metadata), a body (payload), and is human-readable. It’s great for one-off communication or public APIs where you don’t control both sides. Mail carriers don’t need to know the letter’s content to deliver it.
- gRPC is like a phone call. You establish a connection (TCP) once, then have a direct, continuous conversation. You can talk, listen, or both simultaneously. Messages are compressed shorthand (protobuf) — both parties agreed ahead of time on what each abbreviation means (the schema). It’s fast and efficient, but only works if both parties speak the same language.
- Protocol Buffers are like shorthand. Instead of spelling everything out (“I need to retrieve user information with ID 42”), you both agreed: field 1 = user_id, field 2 = action. So the message is just “1:42 2:retrieve”. Much shorter, much faster to parse.
How Protocol Buffers Work
You define your data structures in .proto files using a simple, language-agnostic syntax:
syntax = "proto3";

package user_service;

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
  repeated string roles = 4;
  bool is_active = 5;
}

message GetUserRequest {
  int32 user_id = 1;
}

message GetUserResponse {
  User user = 1;
  string timestamp = 2;
}
Notice the field numbers (1, 2, 3, and so on). These are crucial for backward compatibility. If version 1 of your schema has fields 1-5 and you add a new field in version 2, you assign it number 6 (say, string phone = 6;). Old clients simply skip the unknown field, and new clients read it when present. This is why protobuf handles versioning so well.
The protobuf compiler generates code in your target language:
# Simplified sketch of the generated Python class (you don't write this;
# the real generated code is descriptor-based, but behaves like this)
class User:
    def __init__(self, id=0, name="", email="", roles=None, is_active=False):
        self.id = id
        self.name = name
        self.email = email
        self.roles = roles if roles is not None else []
        self.is_active = is_active
    # Serialization/deserialization methods are generated too
You then use these generated classes in your gRPC services.
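For instance, assuming the compiler produced a user_service_pb2 module from the .proto above, round-tripping a message is two method calls (SerializeToString and ParseFromString are the standard protobuf Python APIs):

import user_service_pb2

user = user_service_pb2.User(
    id=1, name="Ada", email="[email protected]", roles=["admin"])
data = user.SerializeToString()  # compact binary encoding

same_user = user_service_pb2.User()
same_user.ParseFromString(data)  # lossless round trip
print(same_user.name)            # "Ada"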
Defining gRPC Services
You define the actual RPC methods in your .proto file:
service UserService {
  rpc GetUser (GetUserRequest) returns (GetUserResponse);
  rpc ListUsers (ListUsersRequest) returns (stream User);
  rpc CreateUser (stream CreateUserRequest) returns (CreateUserResponse);
  rpc ProcessUsers (stream User) returns (stream ProcessResult);
}
The stream keyword indicates which communication pattern you’re using. This definition is then compiled into both client and server interfaces.
The Four Communication Patterns
Let’s see how these patterns look in practice:
# Unary: request-response (one call, one response)
response = stub.GetUser(GetUserRequest(user_id=123))
print(response.user.name)

# Server streaming: one request, a stream of responses
for user in stub.ListUsers(ListUsersRequest(page_size=100)):
    process(user)  # process() is a placeholder handler

# Client streaming: a stream of requests, one response
requests = [CreateUserRequest(name="Alice"), CreateUserRequest(name="Bob")]
response = stub.CreateUser(iter(requests))
print(response.created_count)

# Bidirectional streaming: both sides stream
def user_stream():
    for user_id in [1, 2, 3, 4, 5]:
        yield User(id=user_id)

for result in stub.ProcessUsers(user_stream()):
    print(result.status)
HTTP/2 and Performance
Why is gRPC so fast? HTTP/2 multiplexing is a game-changer. In HTTP/1.1, each request either opens a new connection (expensive) or waits for the previous response (blocking). With HTTP/2:
HTTP/1.1: [Request 1] ----wait----> [Response 1]
[Request 2] ----wait----> [Response 2]
[Request 3] ----wait----> [Response 3]
HTTP/2: [Request 1] ---↓
[Request 2] ---↓ (single connection)
[Request 3] ---↓
↓ [Response 1]
↓ [Response 2]
↓ [Response 3]
All three requests can be in-flight simultaneously, interleaved at the frame level. This is especially powerful for streaming scenarios.
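You can see multiplexing from Python directly: grpcio's .future() form of a stub method starts a call without blocking, so all three requests below are in flight at once over a single channel (this sketch assumes the UserService stub defined later in this section):

import grpc
import user_service_pb2
import user_service_pb2_grpc

channel = grpc.insecure_channel("localhost:50051")  # one TCP connection
stub = user_service_pb2_grpc.UserServiceStub(channel)

# Start three calls without waiting; HTTP/2 interleaves their frames.
calls = [stub.GetUser.future(user_service_pb2.GetUserRequest(user_id=i))
         for i in (1, 2, 3)]
for call in calls:
    print(call.result().user.name)  # block only when collecting results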
Here’s an illustrative performance comparison (exact numbers vary with payload shape and network conditions, but the ratios are typical):
| Metric | REST (JSON) | gRPC (Protobuf) |
|---|---|---|
| Message size (user object) | 2.1 KB | 0.3 KB |
| Serialization time | 0.8 ms | 0.05 ms |
| Deserialization time | 1.2 ms | 0.08 ms |
| Round-trip latency (10 hops) | 150 ms | 45 ms |
| Connections for 100 concurrent clients | 100 | 4-8 |
The reasons for these differences:
- Binary format: Protobuf is binary, not text. Serializing and parsing binary data is faster and produces smaller output.
- Schema-driven: Because both sides know the schema, you don’t need to include type information or field names in every message.
- Connection reuse: HTTP/2 multiplexing means far fewer TCP connections, reducing overhead.
- Field numbering: You transmit field numbers (1-3 bytes) instead of field names (“user_id” = 7 bytes); the sketch below shows the difference.
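A back-of-the-envelope sketch in plain Python makes the last two points concrete: protobuf packs the field number and wire type into a single tag byte, while JSON repeats the field name in every message.

import json

field_number, wire_type, value = 1, 0, 42          # wire type 0 = varint
tag = (field_number << 3) | wire_type              # 0x08: field 1, varint
protobuf_bytes = bytes([tag, value])               # b'\x08*': 2 bytes total
json_bytes = json.dumps({"user_id": 42}).encode()  # 15 bytes

print(len(protobuf_bytes), len(json_bytes))        # 2 vs 15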
Advanced gRPC Features
Interceptors are gRPC’s equivalent to middleware. You can intercept and modify requests/responses:
import grpc

class AuthInterceptor(grpc.UnaryUnaryClientInterceptor):
    def intercept_unary_unary(self, continuation, call_details, request):
        # Rebuild the call details with an extra metadata entry (headers);
        # grpc passes namedtuple-like details, so _replace works here
        metadata = list(call_details.metadata or [])
        metadata.append(("authorization", "Bearer token"))
        return continuation(call_details._replace(metadata=metadata), request)

channel = grpc.intercept_channel(channel, AuthInterceptor())
Use interceptors for logging, authentication, rate limiting, and observability.
Load Balancing is more complex with gRPC than REST. With HTTP/1.1 REST, an L4 load balancer can spread connections and, since each connection carries one request at a time, effectively spread requests round-robin. With gRPC’s long-lived, multiplexed HTTP/2 connections, balancing connections isn’t enough; you need L7 (application-layer) load balancing or client-side load balancing. This is why tools like Envoy (a proxy) became essential in microservices — they understand HTTP/2 and gRPC.
Health Checking uses the grpc.health.v1 protocol. Services expose a Health RPC that reports their status, and load balancers poll it:
service Health {
  rpc Check (HealthCheckRequest) returns (HealthCheckResponse);
  rpc Watch (HealthCheckRequest) returns (stream HealthCheckResponse);
}
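On the Python side, the grpcio-health-checking package ships a ready-made servicer implementing this protocol; a minimal sketch, assuming a grpc server object like the one in the example later in this section:

from grpc_health.v1 import health, health_pb2, health_pb2_grpc

health_servicer = health.HealthServicer()
health_pb2_grpc.add_HealthServicer_to_server(health_servicer, server)

# Mark this service as healthy; balancers polling Check() will now see SERVING.
health_servicer.set("user_service.UserService",
                    health_pb2.HealthCheckResponse.SERVING)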
Deadlines and Timeouts: Every gRPC call can have a deadline. If the server doesn’t respond within the deadline, the call is cancelled. This prevents cascading timeouts in deeply nested service calls.
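In Python, the deadline is expressed as the timeout argument on any stub call; a minimal sketch, reusing the stub from the example later in this section:

import grpc

try:
    response = stub.GetUser(
        user_service_pb2.GetUserRequest(user_id=123),
        timeout=0.5)  # deadline: 500 ms from now
except grpc.RpcError as err:
    if err.code() == grpc.StatusCode.DEADLINE_EXCEEDED:
        print("server took too long; the call was cancelled")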
Error Handling uses standard gRPC status codes instead of HTTP status codes:
OK (0): Success
CANCELLED (1): Operation cancelled by caller
UNKNOWN (2): Unknown error
INVALID_ARGUMENT (3): Invalid argument
DEADLINE_EXCEEDED (4): Deadline exceeded before completion
NOT_FOUND (5): Resource not found
ALREADY_EXISTS (6): Resource already exists
PERMISSION_DENIED (7): Caller does not have permission
RESOURCE_EXHAUSTED (8): Some resource has been exhausted
FAILED_PRECONDITION (9): Operation preconditions not met
ABORTED (10): Operation aborted
OUT_OF_RANGE (11): Out of range
UNIMPLEMENTED (12): Operation not implemented
INTERNAL (13): Internal error
UNAVAILABLE (14): Service currently unavailable
DATA_LOSS (15): Unrecoverable data loss or corruption
UNAUTHENTICATED (16): Unauthenticated request
These codes map neatly to operational concerns, not just HTTP semantics.
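In Python, a server signals these codes with context.abort(), and a client reads them off the raised grpc.RpcError; a sketch against the UserService defined below (db is a placeholder data layer):

# Server side (inside a servicer method): abort with a code and message
def GetUser(self, request, context):
    user = db.get_user(request.user_id)  # db is a placeholder
    if user is None:
        context.abort(grpc.StatusCode.NOT_FOUND, "no such user")
    return user_service_pb2.GetUserResponse(user=user)

# Client side: the code and details ride on the exception
try:
    stub.GetUser(user_service_pb2.GetUserRequest(user_id=999))
except grpc.RpcError as err:
    print(err.code(), err.details())  # StatusCode.NOT_FOUND no such user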
REST vs gRPC: When to Use What
| Dimension | REST | gRPC |
|---|---|---|
| Primary use case | Public APIs, browser clients | Internal service-to-service |
| Performance | Good (HTTP/1.1) | Excellent (HTTP/2, binary) |
| Human readability | Excellent (JSON) | Poor (binary protobuf) |
| Streaming | Limited (chunked encoding) | Native (unary, server, client, bidirectional) |
| Browser support | Native | Requires gRPC-Web |
| Debugging | Easy (curl, browser DevTools) | Harder (need gRPC tools) |
| Learning curve | Shallow | Moderate |
| Language support | Excellent | Excellent |
| Versioning | Tricky (content negotiation) | Easy (field numbering) |
| Type safety | Weak (parsed JSON) | Strong (generated code) |
The practical answer: Use gRPC for internal communication between services you control. Use REST for public APIs, especially those consumed by third parties or browsers. In a hybrid world, use gRPC-Gateway — a tool that generates a REST API from your gRPC service definition. You write the gRPC service once, and the gateway automatically handles REST-to-gRPC translation. Best of both worlds.
Practical Example: A User Service
Here’s a complete example. The .proto file:
syntax = "proto3";

package user_service;

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
  repeated string roles = 4;
}

message GetUserRequest { int32 user_id = 1; }
message GetUserResponse { User user = 1; }
message CreateUserRequest { string name = 1; string email = 2; }
message CreateUserResponse { int32 created_count = 1; }

service UserService {
  rpc GetUser (GetUserRequest) returns (GetUserResponse);
  rpc CreateUser (stream CreateUserRequest) returns (CreateUserResponse);
}
The Python server (using the generated code):
from concurrent import futures

import grpc

import user_service_pb2
import user_service_pb2_grpc

class UserServicer(user_service_pb2_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        # Fetch the user from the database (db is a placeholder data layer)
        user = db.get_user(request.user_id)
        return user_service_pb2.GetUserResponse(user=user)

    def CreateUser(self, request_iterator, context):
        created_count = 0
        for request in request_iterator:
            db.create_user(request.name, request.email)
            created_count += 1
        return user_service_pb2.CreateUserResponse(created_count=created_count)

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
user_service_pb2_grpc.add_UserServiceServicer_to_server(UserServicer(), server)
server.add_insecure_port('[::]:50051')
server.start()
server.wait_for_termination()  # block so the process keeps serving
The Python client:
import grpc
import user_service_pb2
import user_service_pb2_grpc
channel = grpc.insecure_channel('localhost:50051')  # matches the server's insecure port; use grpc.secure_channel() with TLS credentials in production
stub = user_service_pb2_grpc.UserServiceStub(channel)
# Unary call
response = stub.GetUser(user_service_pb2.GetUserRequest(user_id=123))
print(response.user.name)
# Client streaming
def requests():
    for name, email in [("Alice", "[email protected]"), ("Bob", "[email protected]")]:
        yield user_service_pb2.CreateUserRequest(name=name, email=email)

response = stub.CreateUser(requests())
print(f"Created {response.created_count} users")
gRPC in Your Architecture
Did you know? Many companies run gRPC over encrypted connections (TLS) even internally. The TLS overhead is modest, gRPC still comfortably outperforms typical REST setups, and the security benefits of encrypting internal traffic are worth it.
Pro tip: Use gRPC for service-to-service communication, but always consider exposing a REST API (via gRPC-Gateway or a separate service) for external partners, mobile apps, and anything that can’t use gRPC.
Strengths and Tradeoffs
gRPC strengths:
- Performance is exceptional — binary serialization, HTTP/2 multiplexing, and reduced connection overhead.
- Streaming is a first-class feature — not a hack like chunked encoding in REST.
- Type safety — generated code catches errors at compile time.
- Excellent code generation — write your service once, generate clients and servers in multiple languages.
- Schema versioning is elegant — field numbering handles backward and forward compatibility.
gRPC weaknesses:
- Not human-readable — you can’t curl a gRPC endpoint and see what’s happening (you need grpcurl or similar).
- Browser support is limited — gRPC-Web exists, but it’s a bridge technology, not native.
- Learning curve — understanding HTTP/2, protobufs, and the gRPC model takes time.
- Ecosystem is younger — fewer libraries and tools compared to REST.
- Harder to debug — binary protocols aren’t as transparent as JSON over HTTP.
Key Takeaways
- RPC abstracts network communication: Call remote functions as if they were local, hiding serialization, deserialization, and network details.
- gRPC is RPC optimized for microservices: HTTP/2 multiplexing, protocol buffers, and streaming patterns make it ideal for high-throughput service-to-service communication.
- HTTP/2 enables efficiency: Multiplexing multiple requests over one connection reduces latency and connection overhead compared to HTTP/1.1.
- Protocol Buffers provide schema-driven serialization: Binary format, strong typing, and field numbering create compact, fast, and version-safe messages.
- Streaming patterns unlock new capabilities: Server streaming, client streaming, and bidirectional streaming enable use cases REST struggles with.
- Use gRPC internally, REST externally: gRPC for services you control, REST for public APIs and browser clients. Use gRPC-Gateway for dual support.
Practice Scenarios
Scenario 1: Real-Time Data Pipeline
You’re building a real-time analytics system. Raw events flow from your web services to an analytics service, which aggregates and streams results back to dashboards. REST with polling would be inefficient. Design a gRPC solution using server streaming for the analytics service and unary calls for event ingestion. What happens if a dashboard client disconnects mid-stream?

Scenario 2: Large File Upload Service
Your platform needs to accept large CSV files (100+ MB) for batch operations. HTTP POST with a single request becomes risky — if the connection drops halfway, you’ve lost everything. Design a gRPC solution using client streaming. How do you handle errors mid-stream? What do you return to the client?

Scenario 3: Migration from REST
You have a mature REST API serving both internal services and external partners. You want to add gRPC for internal use without breaking the existing API. How do you structure this? Consider using gRPC-Gateway to generate a REST endpoint from your gRPC service definition, allowing you to maintain backward compatibility while gaining gRPC performance internally.
What’s Next?
gRPC is powerful for internal communication, but what about querying complex, interconnected data from client applications? REST APIs require multiple calls and careful versioning. In the next section, we’ll explore GraphQL, which takes a different approach: let clients query exactly the data they need, with one request, using a declarative query language.