Web Services Architecture

Deep dive into every major API paradigm — how each protocol works under the hood, when to use each, and how they compare.


Protocol Comparison Overview

graph TD
    A[Client needs data] --> B{Use case?}
    B -->|Public API, CRUD, browser-native| C[REST]
    B -->|Flexible queries, complex frontends| D[GraphQL]
    B -->|Internal service-to-service, streaming| E[gRPC]
    B -->|Real-time bidirectional| F[WebSocket]
    B -->|Server pushes only, notifications| G[SSE]
    B -->|TypeScript full-stack only| H[tRPC]
    B -->|Event notification to external systems| I[Webhooks]
    B -->|Legacy enterprise integration| J[SOAP]

| Protocol | Transport | Format | Direction | Browser Native | Best For |
|---|---|---|---|---|---|
| REST | HTTP/1.1, HTTP/2 | JSON (typically) | Req/Res | ✅ | Public APIs, CRUD, resource modeling |
| GraphQL | HTTP/1.1, HTTP/2 | JSON | Req/Res + Subscription | ✅ | Complex frontends, data aggregation |
| gRPC | HTTP/2 only | Protocol Buffers (binary) | Req/Res + Streaming | ⚠️ (needs proxy) | Internal microservices, high-throughput |
| SOAP | HTTP, SMTP, TCP | XML | Req/Res | ❌ | Legacy enterprise, financial services |
| WebSocket | WS (TCP upgrade) | Any (text/binary) | Full-duplex | ✅ | Real-time chat, gaming, collaboration |
| SSE | HTTP/1.1, HTTP/2 | Text (UTF-8) | Server → Client only | ✅ | Feeds, notifications, AI streaming |
| Webhooks | HTTP POST | JSON (typically) | Server → Client push | ❌ | Event-driven integrations, automation |
| tRPC | HTTP/WebSocket | JSON | Req/Res + Subscription | ✅ (Node/TS only) | TypeScript full-stack monorepos |

REST (Representational State Transfer)

Roy Fielding defined REST in his 2000 doctoral dissertation as an architectural style — not a protocol — built on six constraints that, when applied together, produce a scalable, stateless, and cacheable web service.

The Six Architectural Constraints

1. Client–Server Separation

The client and server evolve independently. The server manages data storage and business logic; the client manages the user interface and user state. Neither depends on the other's implementation details — only the shared API contract.

This decoupling allows frontend teams to swap frameworks (React → Vue) or mobile clients to evolve, without requiring backend changes, and vice versa.

2. Stateless

Every request from client to server must contain all information necessary to understand and process the request. The server stores no session state between requests.

❌ Stateful (server stores session):
POST /login       → server creates session, returns cookie
GET /dashboard    → server reads session to identify user

✅ Stateless (client carries state):
GET /dashboard
Authorization: Bearer eyJhbGciOiJSUzI1NiJ9...

Consequences:

  • Scalability: any server instance can handle any request, so no sticky sessions are needed
  • Reliability: no session state to lose if a server crashes
  • Overhead: every request must carry auth credentials and context (larger payloads)

3. Cacheable

Responses must declare whether they are cacheable or not. When responses are cacheable, clients and intermediaries (CDNs, proxies) can serve them without contacting the server.

Key HTTP cache headers:

| Header | Purpose | Example |
|---|---|---|
| Cache-Control | Directives for caching behavior | Cache-Control: max-age=3600, public |
| ETag | Fingerprint of resource version | ETag: "d8e8fca2dc0f896fd7cb4cb0031ba249" |
| Last-Modified | When resource last changed | Last-Modified: Wed, 22 Apr 2026 12:00:00 GMT |
| Vary | Which headers affect the cache key | Vary: Accept-Encoding, Authorization |

Conditional requests let clients validate their cache:

GET /users/42
If-None-Match: "d8e8fca2dc0f896fd7cb4cb0031ba249"

→ 304 Not Modified (body omitted — client uses cached copy)
→ 200 OK + new ETag + new body (cache miss — resource changed)

4. Uniform Interface

The single most important constraint. It defines four sub-principles:

4a. Resource Identification in Requests — every resource has a stable URI:

/users                        → collection of users
/users/42                     → specific user
/users/42/orders              → orders belonging to user 42
/users/42/orders/7/items      → items in that order

4b. Manipulation via Representations — clients hold representations (JSON, XML, HTML), not live objects. The client modifies the representation and sends it back.

4c. Self-Descriptive Messages — each request/response carries enough metadata to describe how to process it: Content-Type, method, status code, cache directives.

4d. HATEOAS — see section below.

5. Layered System

Clients cannot tell whether they're connected directly to the server or an intermediary (load balancer, CDN, API gateway, caching proxy). Each layer only knows about the adjacent layer.

This enables transparent insertion of:

  • CDNs for caching at the edge
  • API gateways for auth, rate limiting, routing
  • Load balancers for distributing traffic
  • Service meshes for observability and mTLS

6. Code on Demand (optional)

The only optional constraint. Servers can temporarily extend client functionality by transferring executable code (e.g., JavaScript). Rarely relevant in modern API design.

HTTP Methods and Idempotency

| Method | Semantics | Idempotent | Safe | Common Use |
|---|---|---|---|---|
| GET | Retrieve resource(s) | ✅ | ✅ | Read data |
| HEAD | GET without body (check existence/metadata) | ✅ | ✅ | Cache validation |
| POST | Create a new resource; non-idempotent actions | ❌ | ❌ | Create, submit form, trigger action |
| PUT | Replace entire resource (upsert) | ✅ | ❌ | Full update |
| PATCH | Partial update | ❌* | ❌ | Partial update |
| DELETE | Remove resource | ✅ | ❌ | Delete |
| OPTIONS | Discover allowed methods (used for CORS preflight) | ✅ | ✅ | CORS |

* PATCH can be designed idempotently but is not required to be.

Safe = no side effects (read-only). Idempotent = making the same request N times has the same effect as making it once.
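
To make the idempotency definition concrete, here is a small in-memory sketch. The UserStore class and its method names are hypothetical and only mirror the HTTP semantics; no real server or framework is involved.

```typescript
type User = { name: string };

class UserStore {
  private users = new Map<string, User>();
  private nextId = 1;

  // POST /users: every call creates a new resource, so a retry changes state again.
  post(user: User): string {
    const id = String(this.nextId++);
    this.users.set(id, user);
    return id;
  }

  // PUT /users/:id: replaces the resource at a known URI; a retry leaves the same state.
  put(id: string, user: User): void {
    this.users.set(id, user);
  }

  // DELETE /users/:id: the first call removes the user; later calls find nothing to remove.
  delete(id: string): boolean {
    return this.users.delete(id);
  }

  count(): number {
    return this.users.size;
  }
}

const store = new UserStore();
store.put("42", { name: "Alice" });
store.put("42", { name: "Alice" }); // retried PUT: still exactly one user
store.post({ name: "Bob" });
store.post({ name: "Bob" });        // retried POST: a second Bob now exists
```

This is why a client can safely retry PUT and DELETE after a timeout, while a retried POST usually needs extra protection such as an idempotency key.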

HTTP Status Codes

| Range | Category | Key Codes |
|---|---|---|
| 2xx | Success | 200 OK, 201 Created, 202 Accepted, 204 No Content |
| 3xx | Redirection | 301 Moved Permanently, 304 Not Modified |
| 4xx | Client Error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable Entity, 429 Too Many Requests |
| 5xx | Server Error | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout |

Common Status Code Mistakes

  • Never return 200 OK with an error in the body — clients must parse every body to detect errors
  • Use 401 for unauthenticated, 403 for authenticated but unauthorized
  • Use 422 (not 400) when the request is syntactically valid but semantically wrong (e.g. invalid field value)
  • 404 means "resource not found", not "I don't know" — don't use it as a catch-all

PATCH Semantics: JSON Patch vs JSON Merge Patch

PATCH is the most nuanced HTTP method. The two dominant formats behave very differently:

JSON Merge Patch (RFC 7396) — simple, intuitive; send only the fields you want to change:

PATCH /users/42 HTTP/1.1
Content-Type: application/merge-patch+json

{"email": "new@example.com", "phone": null}

Server merges the patch with the existing resource: email is updated, phone is removed (explicit null), all other fields are unchanged.

Limitation: you cannot set a field to null and leave it present — null always means "remove." This makes JSON Merge Patch unusable for APIs where null is a meaningful value.
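
The merge rules are short enough to write out. The following is a sketch of the algorithm described in RFC 7396; the Json type and the function names are this example's own, not part of any library.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function mergePatch(target: Json | undefined, patch: Json): Json {
  // A non-object patch (including an array) replaces the target wholesale.
  if (!isObject(patch)) return patch;

  // Copy the target so the caller's object is not mutated.
  const result: { [key: string]: Json } = isObject(target) ? { ...target } : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      // null always means "remove this member".
      delete result[key];
    } else {
      // Objects merge recursively; everything else overwrites.
      result[key] = mergePatch(result[key], value);
    }
  }
  return result;
}
```

Applying `{"email": "new@example.com", "phone": null}` to a user removes phone, overwrites email, and leaves every other member untouched. Arrays are never merged element by element, which is the array limitation listed in the comparison table.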

JSON Patch (RFC 6902) — explicit operations array, more powerful but more complex:

PATCH /users/42 HTTP/1.1
Content-Type: application/json-patch+json

[
  { "op": "replace", "path": "/email", "value": "new@example.com" },
  { "op": "remove", "path": "/phone" },
  { "op": "add", "path": "/addresses/1", "value": {"city": "Berlin"} },
  { "op": "test", "path": "/version", "value": 3 }
]

Operations: add, remove, replace, move, copy, test. The test operation enables optimistic concurrency — the patch fails atomically if the tested value doesn't match.
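
As an illustration of the atomic behaviour, the sketch below applies a subset of JSON Patch (test, replace, add, remove on object members only) to a deep copy and throws on the first failure, so the caller's document is never partially updated. It is a simplified stand-in, not a full RFC 6902 implementation: array indices, move, and copy are left out, and the names are assumptions.

```typescript
type Op =
  | { op: "test" | "replace" | "add"; path: string; value: unknown }
  | { op: "remove"; path: string };

type Doc = { [key: string]: unknown };

// Split a JSON Pointer into unescaped tokens ("~1" means "/", "~0" means "~").
function pointerTokens(path: string): string[] {
  if (!path.startsWith("/")) throw new Error(`unsupported path: ${path}`);
  return path.slice(1).split("/").map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
}

function applyPatch(doc: Doc, ops: Op[]): Doc {
  // Work on a deep copy so a failing operation leaves the input untouched.
  const copy = JSON.parse(JSON.stringify(doc)) as Doc;
  for (const operation of ops) {
    const parts = pointerTokens(operation.path);
    const key = parts.pop() as string;
    let parent: Doc = copy;
    for (const part of parts) {
      const next = parent[part];
      if (typeof next !== "object" || next === null || Array.isArray(next)) {
        throw new Error(`path not found: ${operation.path}`);
      }
      parent = next as Doc;
    }
    switch (operation.op) {
      case "test":
        // Structural comparison via JSON text; sensitive to member order.
        if (JSON.stringify(parent[key]) !== JSON.stringify(operation.value)) {
          throw new Error(`test failed at ${operation.path}`);
        }
        break;
      case "replace":
        if (!(key in parent)) throw new Error(`cannot replace missing ${operation.path}`);
        parent[key] = operation.value;
        break;
      case "add":
        parent[key] = operation.value;
        break;
      case "remove":
        if (!(key in parent)) throw new Error(`cannot remove missing ${operation.path}`);
        delete parent[key];
        break;
    }
  }
  return copy;
}
```

Because every operation runs against the copy, a failing test anywhere in the list discards the earlier operations as well, which is what makes a version check usable for optimistic concurrency.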

| Dimension | JSON Merge Patch | JSON Patch |
|---|---|---|
| RFC | 7396 | 6902 |
| Content-Type | application/merge-patch+json | application/json-patch+json |
| Format | Partial JSON object | Array of operations |
| Set field to null | ❌ (null = remove) | {"op": "replace", "path": "/x", "value": null} |
| Array operations | Replace entire array only | Add/remove individual elements |
| Atomicity | No built-in check | test operation for optimistic locking |
| Complexity | Low: just send partial object | Higher: must construct operation array |
| Adoption | More common (merge-style partial updates, e.g. GitHub's REST API) | Less common; used when precision needed |

Practical Recommendation

Most APIs use JSON Merge Patch for simplicity. Use JSON Patch only when you need array element manipulation, optimistic concurrency via test, or the ability to distinguish "set to null" from "remove."

HATEOAS

Hypermedia as the Engine of Application State — the highest constraint of REST. Responses include hyperlinks that describe what actions are available next. Clients need no prior knowledge of URL structure; they navigate by following links.

{
  "id": 42,
  "name": "Alice",
  "email": "alice@example.com",
  "_links": {
    "self":   { "href": "/users/42", "method": "GET" },
    "orders": { "href": "/users/42/orders", "method": "GET" },
    "update": { "href": "/users/42", "method": "PUT" },
    "delete": { "href": "/users/42", "method": "DELETE" }
  }
}

Benefits: API is self-documenting; server can change URL structure without breaking clients; workflow steps are discoverable.
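
A client written against such a response looks up the next request by link relation instead of assembling URLs itself. The sketch below assumes the `_links` shape from the example response; the `follow` helper is hypothetical.

```typescript
interface Link {
  href: string;
  method: string;
}

interface Resource {
  _links: Record<string, Link>;
}

// Resolve the next request from a link relation rather than a hard-coded URL.
function follow(resource: Resource, rel: string): Link {
  const link = resource._links[rel];
  if (link === undefined) {
    // The server did not advertise this transition for the current state.
    throw new Error(`server did not offer the "${rel}" transition`);
  }
  return link;
}

const user: Resource = {
  _links: {
    self: { href: "/users/42", method: "GET" },
    orders: { href: "/users/42/orders", method: "GET" },
  },
};
const nextRequest = follow(user, "orders");
```

If the server later moves orders to a different path, this client keeps working as long as the relation name stays the same.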

In practice: very few production APIs implement full HATEOAS. Most APIs reach Level 2 of the Richardson Maturity Model (proper HTTP verbs) and stop there.

Richardson Maturity Model

A framework for measuring how RESTful an API actually is:

| Level | Name | What It Adds | Example |
|---|---|---|---|
| 0 | Swamp of POX | Single endpoint, single method | POST /api with XML body specifying action |
| 1 | Resources | Multiple URIs, but still single HTTP verb | POST /users, POST /users/42 |
| 2 | HTTP Verbs | Uses GET/POST/PUT/DELETE meaningfully | GET /users/42, DELETE /users/42 |
| 3 | Hypermedia | Responses contain links for navigation (HATEOAS) | JSON with _links section |

Roy Fielding stated that Level 3 is the pre-condition of REST. Most production APIs sit at Level 2 — which is fine for practical purposes, even if technically not "truly RESTful."


GraphQL

Facebook created GraphQL in 2012 and open-sourced it in 2015. It is a query language for your API and a runtime for executing those queries — giving clients the power to ask for exactly what they need and nothing more.

Core Concept: Single Endpoint

Unlike REST's resource-per-endpoint model, GraphQL exposes a single endpoint (typically POST /graphql) that accepts queries describing the exact shape of data needed.

# REST requires 3 round trips:
# GET /users/42
# GET /users/42/posts
# GET /posts/7/comments

# GraphQL fetches all in one request:
query {
  user(id: 42) {
    name
    email
    posts(limit: 5) {
      title
      publishedAt
      comments(limit: 3) {
        body
        author { name }
      }
    }
  }
}

Type System and Schema

Everything in GraphQL is strongly typed. The schema is the single source of truth — it describes every piece of data the API can return and every operation clients can perform.

Scalar Types

Built-in primitives: Int, Float, String, Boolean, ID. Custom scalars can be defined (e.g., DateTime, URL, JSON).

Object Types

type User {
  id: ID!                  # ! = non-nullable
  name: String!
  email: String!
  createdAt: DateTime!
  posts: [Post!]!          # non-null list of non-null Posts
}

type Post {
  id: ID!
  title: String!
  body: String
  author: User!
  tags: [String!]!
}

Special Root Types

type Query {
  user(id: ID!): User
  users(limit: Int = 20, offset: Int = 0): [User!]!
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(id: ID!, input: UpdateUserInput!): User!
  deleteUser(id: ID!): Boolean!
}

type Subscription {
  userCreated: User!
  messageReceived(roomId: ID!): Message!
}

Other Type Categories

| Type | Purpose | Example |
|---|---|---|
| Input | Arguments to mutations | input CreateUserInput { name: String!, email: String! } |
| Enum | Fixed set of values | enum Status { ACTIVE INACTIVE SUSPENDED } |
| Interface | Shared fields across types | interface Node { id: ID! } |
| Union | Type can be one of many | union SearchResult = User \| Post \| Comment |
| Fragment | Reusable field selection | fragment UserFields on User { id name email } |

Queries, Mutations, Subscriptions

Query — read data. Resolvers can be called in parallel:

query GetDashboard {
  currentUser {
    name
    notifications(unread: true) { id title }
  }
  trending { title views }
}

Mutation: write data. Top-level mutation fields execute serially, in the order they appear in the request:

mutation CreatePost($input: CreatePostInput!) {
  createPost(input: $input) {
    id
    title
    author { name }
  }
}

Subscription — real-time data via WebSocket (typically). Server pushes updates when events occur:

subscription OnMessageReceived($roomId: ID!) {
  messageReceived(roomId: $roomId) {
    id body sender { name } sentAt
  }
}

Resolvers

Resolvers are functions that produce data for each field in the schema. GraphQL executes a query by walking its selection tree from the root down: a field's resolver runs only after its parent has resolved. Each field resolver receives:

  1. parent: resolved value of the parent field
  2. args: arguments passed to this field
  3. context: shared object (DB connection, auth user, DataLoaders)
  4. info: query metadata (field name, selection set, schema)

const resolvers = {
  Query: {
    user: async (_, { id }, { db }) => db.users.findById(id),
    users: async (_, { limit, offset }, { db }) =>
      db.users.findAll({ limit, offset }),
  },
  User: {
    // Parent resolver returned a user object; now resolve its posts field
    posts: async (user, { limit }, { db }) =>
      db.posts.findByUserId(user.id, limit),
  },
  Mutation: {
    createUser: async (_, { input }, { db }) => db.users.create(input),
  },
};

The N+1 Problem

The most common GraphQL performance trap. Without optimization, resolving a list of N users and their posts triggers 1 + N queries:

Query: users(limit: 20)    → SELECT * FROM users LIMIT 20          (1 query)
  User[0].posts            → SELECT * FROM posts WHERE user_id = 1  (1 query)
  User[1].posts            → SELECT * FROM posts WHERE user_id = 2  (1 query)
  ...
  User[19].posts           → SELECT * FROM posts WHERE user_id = 20 (1 query)
                                                                TOTAL: 21 queries

Real-world impact compounds with nesting — posts fetching authors fetching their posts can generate hundreds of queries for a single GraphQL request.

DataLoader — The Solution

Facebook's DataLoader batches and caches loads within a single request using Node.js's event loop tick:

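import DataLoader from 'dataloader';

// Build fresh loaders for every incoming request (NOT once at application startup)
// and expose them to resolvers through the GraphQL context as `loaders`.
// `db` and the `Post` type belong to the application's data layer and are assumed to exist.
function createLoaders() {
  return {
    postsByUser: new DataLoader(async (userIds: readonly string[]) => {
      // Single batch query: SELECT * FROM posts WHERE user_id IN (1, 2, ..., 20)
      const posts = await db.posts.findByUserIds(userIds);
      // Group once, then return results in the same order as the input keys
      const byUser = new Map<string, Post[]>();
      for (const post of posts) {
        const list = byUser.get(post.userId) ?? [];
        list.push(post);
        byUser.set(post.userId, list);
      }
      return userIds.map(id => byUser.get(id) ?? []);
    }),
  };
}

// In resolver: these 20 calls become ONE SQL query
const resolvers = {
  User: {
    posts: (user, _, { loaders }) =>
      loaders.postsByUser.load(user.id),  // batched automatically
  },
};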

Result: 21 queries → 2 queries (one for users, one batch for all posts).
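
To show where the batching comes from, here is a dependency-free sketch of the same idea. It is not DataLoader's actual implementation: TinyLoader is a hypothetical class that collects every key requested during the current synchronous run and dispatches them together in one microtask, whereas the real library uses a more careful scheduling strategy and offers many more options.

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (error: unknown) => void;
}

class TinyLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private cache = new Map<K, Promise<V>>();
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    // Per-instance cache: the same key is only ever fetched once.
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;

    const promise = new Promise<V>((resolve, reject) => {
      // The first key queued schedules a single flush for the whole batch.
      if (this.queue.length === 0) {
        Promise.resolve().then(() => this.flush());
      }
      this.queue.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    return promise;
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      // One call for all keys; results must come back in key order.
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, index) => item.resolve(values[index] as V));
    } catch (error) {
      batch.forEach((item) => item.reject(error));
    }
  }
}
```

Twenty sibling resolvers calling `load` in the same pass therefore produce one call to the batch function, and repeated keys are served from the per-instance cache.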

DataLoader Instance Per Request

Create a new DataLoader instance for each request. DataLoader caches results for the duration of a request — sharing across requests will serve stale data.

Directives

Directives annotate schema elements or control query execution:

type User {
  email: String! @deprecated(reason: "Use contactEmail instead")
  contactEmail: String!
  password: String! @auth(requires: ADMIN)  # custom directive
}

# Built-in execution directives:
query GetUser($showEmail: Boolean!, $skipPhone: Boolean!) {
  user(id: 42) {
    name
    email @include(if: $showEmail)   # conditionally include field
    phone @skip(if: $skipPhone)      # conditionally skip field
  }
}

Introspection

GraphQL APIs are self-documenting — clients can query the schema itself:

{
  __schema {
    types { name kind }
  }
  __type(name: "User") {
    fields { name type { name kind } }
  }
}

Introspection powers tools like GraphiQL, Apollo Studio, and GraphQL Playground. Disable introspection in production for security-sensitive APIs.

Query Complexity and Depth Limiting

Without limits, a malicious client can craft exponentially expensive queries:

# Denial-of-service via deeply nested query:
{ user { friends { friends { friends { friends { ... } } } } } }

Protect with:

  • Depth limiting: reject queries deeper than N levels (graphql-depth-limit)
  • Complexity analysis: assign costs to fields; reject queries over a budget (graphql-validation-complexity)
  • Query whitelisting (persisted queries): only allow pre-approved queries in production
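
Depth limiting itself is a small recursive check. The sketch below runs on a simplified, hypothetical selection tree rather than the real graphql-js AST, and the function names are this example's own; a production server would normally use a validation rule from a library such as graphql-depth-limit.

```typescript
// Simplified stand-in for a parsed selection set (not the graphql-js AST).
interface Field {
  name: string;
  selections?: Field[];
}

// Depth of the deepest field, counting the top-level fields as depth 1.
function depthOf(fields: Field[]): number {
  let deepest = 0;
  for (const field of fields) {
    deepest = Math.max(deepest, 1 + depthOf(field.selections ?? []));
  }
  return deepest;
}

// Reject the query before any resolver runs.
function assertWithinDepth(fields: Field[], maxDepth: number): void {
  const depth = depthOf(fields);
  if (depth > maxDepth) {
    throw new Error(`query depth ${depth} exceeds limit ${maxDepth}`);
  }
}
```

The important property is that the check happens during validation, so an abusive query costs the server a tree walk rather than a cascade of database calls.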

Federation

GraphQL Federation lets multiple teams own separate subgraphs that compose into a unified supergraph — one schema, one endpoint, distributed implementation.

┌─────────────────────────────────────────────┐
│           Apollo Router (Supergraph)         │
│     Single endpoint: POST /graphql           │
└────────┬──────────────┬──────────────────────┘
         │              │
   ┌─────▼─────┐  ┌─────▼──────┐  ┌──────────┐
   │  Users     │  │  Products  │  │  Orders  │
   │  Subgraph  │  │  Subgraph  │  │ Subgraph │
   │  (Team A)  │  │  (Team B)  │  │ (Team C) │
   └───────────┘  └────────────┘  └──────────┘

Key concepts:

  • Entities: types that can be extended across subgraphs, identified by a @key directive
  • __resolveReference: resolver that hydrates an entity from a key passed by the router
  • @external: field defined in another subgraph, referenced here
  • Each subgraph is independently deployable; the subgraph schemas are composed into a supergraph schema ahead of time, and the router plans each incoming query across the subgraphs at request time

Federation vs Schema Stitching

Before Federation, schema stitching was the primary approach to composing multiple GraphQL services. They solve the same problem differently:

| Dimension | Schema Stitching | Federation |
|---|---|---|
| Composition | Gateway merges schemas at runtime | Router composes via a supergraph schema |
| Type ownership | Gateway defines cross-service types | Each subgraph owns its types via @key |
| Coupling | Gateway knows about all subgraphs' internal types | Subgraphs are self-contained; router only knows entities |
| Deployment | Change in one subgraph may require gateway redeploy | Subgraphs deploy independently |
| Conflict resolution | Manual: gateway resolves field name conflicts | Automatic: @override, @provides, @shareable directives |
| Tooling | GraphQL Tools (@graphql-tools/stitch) | Apollo Router, Apollo Studio, Cosmo Router |
| Status | Still works; no longer recommended for new projects | Industry standard for multi-team GraphQL |

When stitching still makes sense: small teams, legacy services being gradually migrated, or when you need to compose third-party GraphQL APIs you don't control (federation requires subgraphs to add @key directives).

Error Handling

GraphQL errors behave fundamentally differently from REST:

Partial responses — in REST, an error means the entire response fails. In GraphQL, individual fields can fail while the rest of the response succeeds:

{
  "data": {
    "user": {
      "name": "Alice",
      "email": "alice@example.com",
      "creditScore": null
    }
  },
  "errors": [
    {
      "message": "Unauthorized to access creditScore",
      "locations": [{ "line": 5, "column": 5 }],
      "path": ["user", "creditScore"],
      "extensions": {
        "code": "UNAUTHORIZED",
        "classification": "AUTHORIZATION"
      }
    }
  ]
}

The data field contains whatever succeeded; errors contains what failed. The client must handle both.
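
On the client, handling both usually means indexing the errors by the path of the field that failed, so the UI can render the data that arrived and flag only the affected fields. A minimal sketch, assuming the response shape shown above (the interface and function names are illustrative):

```typescript
interface GraphQLErrorEntry {
  message: string;
  path?: (string | number)[];
  extensions?: { code?: string };
}

interface GraphQLResponse<T> {
  data?: T | null;
  errors?: GraphQLErrorEntry[];
}

// Index errors by the dotted path of the failing field, e.g. "user.creditScore".
function errorsByPath<T>(response: GraphQLResponse<T>): Map<string, GraphQLErrorEntry> {
  const byPath = new Map<string, GraphQLErrorEntry>();
  for (const error of response.errors ?? []) {
    // Errors without a path (for example validation errors) land under "".
    byPath.set((error.path ?? []).join("."), error);
  }
  return byPath;
}
```

A component rendering `creditScore` can then check `errorsByPath(response).has("user.creditScore")` and show a placeholder, while name and email render normally.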

Error extensions — the extensions field is the standard way to add machine-readable error metadata:

// Apollo Server — throw typed error with extensions
import { GraphQLError } from 'graphql';

throw new GraphQLError('Order not found', {
  extensions: {
    code: 'NOT_FOUND',
    http: { status: 404 },
    orderId: input.id,
    traceId: ctx.traceId,
  },
});

Error masking — in production, mask internal errors to prevent leaking implementation details:

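// Apollo Server 4: format errors for production
// (typeDefs, resolvers, and logger are assumed to be defined elsewhere)
import { ApolloServer } from '@apollo/server';

const server = new ApolloServer({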
  typeDefs,
  resolvers,
  formatError: (formattedError, error) => {
    // Log full error internally
    logger.error(error);
    // Return sanitized error to client
    if (formattedError.extensions?.code === 'INTERNAL_SERVER_ERROR') {
      return { message: 'Internal server error', extensions: { code: 'INTERNAL_SERVER_ERROR' } };
    }
    return formattedError;
  },
});

Error classification patterns:

| Code | Meaning | HTTP Equivalent |
|---|---|---|
| BAD_USER_INPUT | Invalid query variables | 400 |
| UNAUTHENTICATED | Missing or invalid auth | 401 |
| FORBIDDEN | Authenticated but not authorized | 403 |
| NOT_FOUND | Resource doesn't exist | 404 |
| GRAPHQL_VALIDATION_FAILED | Query doesn't match schema | 400 |
| PERSISTED_QUERY_NOT_FOUND | Unknown query hash (APQ miss) | 400 |
| INTERNAL_SERVER_ERROR | Unhandled server error | 500 |

Caching

GraphQL caching is fundamentally harder than REST caching because requests use POST with dynamic query bodies — HTTP caches can't distinguish between different queries to the same /graphql endpoint.

HTTP-level caching (limited):

  • GET requests for queries: GET /graphql?query={user(id:42){name}}&variables={} is cacheable by a CDN, but URL length limits apply
  • Automatic Persisted Queries (APQ) solve this: GET /graphql?extensions={"persistedQuery":{"sha256Hash":"abc..."}}&variables={"id":"42"} is short, cacheable, and CDN-friendly

Client-side normalized caching (Apollo Client):

Apollo Client maintains an in-memory normalized cache keyed by __typename:id:

Cache store:
  User:42  → { __typename: "User", id: "42", name: "Alice", email: "alice@example.com" }
  Post:7   → { __typename: "Post", id: "7", title: "Hello", author: { __ref: "User:42" } }
  Post:8   → { __typename: "Post", id: "8", title: "World", author: { __ref: "User:42" } }

When a mutation updates User:42 and its response includes the entity's id and changed fields, every query displaying that user re-renders automatically, with no manual cache invalidation. This is the primary DX advantage of GraphQL over REST for complex frontends.
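
The normalization step can be sketched in a few lines. This is an illustration of the idea, not Apollo Client's implementation: NormalizedCache is a hypothetical class, it handles nested single entities only (no arrays, no custom key fields), and it ignores queries and re-rendering entirely.

```typescript
interface Entity {
  __typename: string;
  id: string;
  [field: string]: unknown;
}

type Ref = { __ref: string };

function isEntity(value: unknown): value is Entity {
  return typeof value === "object" && value !== null && "__typename" in value && "id" in value;
}

class NormalizedCache {
  private store = new Map<string, Record<string, unknown>>();

  // Flatten an entity tree: nested entities are stored once and replaced by references.
  write(entity: Entity): Ref {
    const key = `${entity.__typename}:${entity.id}`;
    // Merge into any existing record so partial results do not drop known fields.
    const record: Record<string, unknown> = { ...(this.store.get(key) ?? {}) };
    for (const [field, value] of Object.entries(entity)) {
      record[field] = isEntity(value) ? this.write(value) : value;
    }
    this.store.set(key, record);
    return { __ref: key };
  }

  read(key: string): Record<string, unknown> | undefined {
    return this.store.get(key);
  }
}
```

Because both posts hold only a reference to `User:42`, writing a mutation result for that user changes what every post's author resolves to, without touching the post records.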

Cache policies:

| Policy | Behavior | Use Case |
|---|---|---|
| cache-first | Read from cache; network only on miss | Default; best for mostly-static data |
| network-only | Always fetch; update cache | Dashboards, real-time displays |
| cache-and-network | Return cache immediately, then update with network | Instant UI + fresh data |
| no-cache | Fetch without reading or updating cache | One-off queries, sensitive data |

Server-side caching:

  • Response-level: cache full GraphQL responses keyed by query hash + variables (Redis)
  • Resolver-level: cache individual resolver results (DataLoader already provides per-request caching; add Redis for cross-request caching)
  • @cacheControl directive (Apollo): per-field cache hints

type Product @cacheControl(maxAge: 3600) {
  id: ID!
  name: String!
  price: Float! @cacheControl(maxAge: 60)    # price changes more often
  reviews: [Review!]! @cacheControl(maxAge: 300)
}

gRPC

gRPC is a high-performance, open-source RPC framework, originally developed at Google, that uses Protocol Buffers as its interface definition language and serialization format and HTTP/2 as its transport protocol. It has been a CNCF project since 2017.

Protocol Buffers (Protobuf)

Protobuf is a language-neutral, platform-neutral binary serialization format. Compared to JSON:

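| Property | JSON | Protobuf |
|---|---|---|
| Format | Text (UTF-8) | Binary |
| Size | ~1x baseline | Often cited as 3–10x smaller (payload-dependent) |
| Parse speed | ~1x baseline | Often cited as 5–10x faster (implementation-dependent) |
| Schema | Optional (JSON Schema) | Required (.proto file) |
| Human-readable | ✅ | ❌ (need tools) |
| Schema evolution | Manual / fragile | Built-in field numbering |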
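
Much of the size difference comes from the wire format: field names are replaced by small numeric tags, and integers are stored as variable-length "varints". The sketch below hand-encodes a single integer field to show the mechanics. It covers only non-negative integers with wire type 0 and is not a substitute for generated Protobuf code; the function names are this example's own.

```typescript
// Encode a non-negative integer as a base-128 varint:
// 7 payload bits per byte, high bit set on every byte except the last.
function encodeVarint(value: number): number[] {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const bytes: number[] = [];
  let remaining = value;
  do {
    const low7 = remaining % 128;
    remaining = Math.floor(remaining / 128);
    bytes.push(remaining > 0 ? low7 | 0x80 : low7);
  } while (remaining > 0);
  return bytes;
}

// Encode one varint field: tag = (fieldNumber << 3) | wireType 0, then the value.
function encodeVarintField(fieldNumber: number, value: number): number[] {
  return [...encodeVarint(fieldNumber * 8), ...encodeVarint(value)];
}
```

A field numbered 4 holding the value 150 encodes to the three bytes 0x20 0x96 0x01, whereas the JSON text `"created_at":150` takes 16 bytes. The field number, not the name, is what goes on the wire, which is also why renaming a field is safe but renumbering it is not.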

A .proto service definition:

syntax = "proto3";
package com.example.users;

// Message types
message User {
  string id        = 1;
  string name      = 2;
  string email     = 3;
  int64  created_at = 4;
}

message GetUserRequest  { string user_id = 1; }
message CreateUserRequest {
  string name  = 1;
  string email = 2;
}
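message UserList { repeated User users = 1; }

// Illustrative placeholder definitions for the request and stream types
// used by the service below; the field names are assumptions.
message ListUsersRequest { int32 page_size = 1; }
message ChatMessage {
  string sender = 1;
  string body   = 2;
}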

// Service definition
service UserService {
  // Unary
  rpc GetUser(GetUserRequest) returns (User);

  // Server streaming
  rpc ListUsers(ListUsersRequest) returns (stream User);

  // Client streaming
  rpc CreateUsersBulk(stream CreateUserRequest) returns (UserList);

  // Bidirectional streaming
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}

The protoc compiler, combined with per-language plugins, generates strongly typed client stubs and server interfaces for Go, Java, Python, C++, Node.js, Rust, Kotlin, Swift, and more.

HTTP/2 Features Exploited by gRPC

| HTTP/2 Feature | What It Enables |
|---|---|
| Multiplexing | Multiple RPC calls on one TCP connection; no HTTP-level head-of-line blocking between requests (TCP-level packet loss still stalls every stream on the connection) |
| Binary framing | Headers and data sent as binary frames, which is more efficient than HTTP/1.1 text headers |
| Header compression (HPACK) | Repeated headers (auth token, content-type) sent as index references after first use; header-size reductions of 85–90% are commonly cited |
| Full-duplex streams | Client and server can send frames simultaneously on the same stream |
| Flow control | Prevents fast producers from overwhelming slow consumers per-stream |
| Server push | Server can pre-emptively send resources (an HTTP/2 feature that gRPC does not use) |

The Four RPC Types

Unary RPC

rpc GetUser(GetUserRequest) returns (User);

Classic request-response: the client sends one message and the server sends one message back. Comparable to a single REST request/response exchange.

Server Streaming RPC

rpc WatchLogs(WatchRequest) returns (stream LogEntry);

The client sends one request; the server streams multiple responses. Useful for: live logs, large dataset export, real-time feeds.

Client Streaming RPC

rpc UploadMetrics(stream MetricPoint) returns (UploadSummary);

The client streams multiple messages; the server collects them and returns one response. Useful for: telemetry ingestion, file uploads chunked by the client, batch writes.

Bidirectional Streaming RPC

rpc BidirectionalChat(stream ChatMessage) returns (stream ChatMessage);

Both sides can send and receive messages in any order over a long-lived connection. The two streams operate independently. Useful for: chat, collaborative editing, real-time games, audio/video signaling.

Deadlines and Cancellation

Every gRPC call should set a deadline: the absolute time by which the client requires a response. Servers should check whether the deadline has already passed before starting expensive work, and stop work once the call is cancelled.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
resp, err := client.GetUser(ctx, &pb.GetUserRequest{UserId: "42"})

Deadlines propagate through the entire call chain — if service A calls service B calls service C, all three respect the same deadline window, preventing one slow downstream call from causing timeouts at every layer.
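
The arithmetic behind propagation is simple: each hop converts the shared absolute deadline into the time still left and uses that as the timeout for its own downstream call. A sketch with a hypothetical helper (gRPC libraries do this for you when the incoming call context is passed along):

```typescript
// Convert an absolute deadline into the remaining budget for the next hop.
// Throws if the budget is already spent, so no downstream call is attempted.
function remainingBudgetMs(deadlineEpochMs: number, nowEpochMs: number): number {
  const remaining = deadlineEpochMs - nowEpochMs;
  if (remaining <= 0) {
    throw new Error("DEADLINE_EXCEEDED");
  }
  return remaining;
}

// Service A starts at t = 0 with a 5 s deadline.
const deadline = 0 + 5000;
// Service B picks the request up at t = 1200 ms and calls C with what is left.
const budgetForC = remainingBudgetMs(deadline, 1200);
```

Here service C gets 3800 ms rather than a fresh 5 s, so the slowest service in the chain can never outlive the caller's patience.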

Interceptors

Interceptors wrap gRPC method invocations — the gRPC equivalent of middleware:

// Unary server interceptor for logging
func loggingInterceptor(ctx context.Context, req interface{},
  info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (interface{}, error) {
  start := time.Now()
  resp, err := handler(ctx, req)
  log.Printf("Method: %s | Duration: %v | Error: %v",
    info.FullMethod, time.Since(start), err)
  return resp, err
}

// Register:
s := grpc.NewServer(
  grpc.UnaryInterceptor(loggingInterceptor),
  grpc.StreamInterceptor(streamLoggingInterceptor),
)

Common interceptors: authentication, tracing (OpenTelemetry), logging, metrics, panic recovery, rate limiting, deadline enforcement.

Load Balancing

Because gRPC multiplexes many RPCs over a single TCP connection, L4 (TCP) load balancing distributes connections, not RPCs. A single long-lived connection from service A to a single pod of service B bypasses all other pods.

Solutions:

  • L7 (application-layer) load balancing: the proxy understands HTTP/2 streams and distributes individual RPCs (Envoy, nginx, gRPC-aware load balancers)
  • Client-side load balancing: the gRPC client resolves all backend IPs (via DNS), maintains connections to each, and distributes RPCs itself
  • Headless services in Kubernetes: DNS returns all pod IPs; combined with gRPC client-side round-robin
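
Client-side round-robin, the simplest of these policies, can be sketched as follows. RoundRobinPicker is a hypothetical class for illustration; real gRPC clients ship this as a built-in policy and also handle connection health and re-resolution, which the sketch leaves out.

```typescript
class RoundRobinPicker {
  private next = 0;
  private readonly backends: string[];

  constructor(backends: string[]) {
    if (backends.length === 0) throw new Error("no backends resolved");
    this.backends = [...backends];
  }

  // Choose the backend for the next RPC (not for the next connection).
  pick(): string {
    const backend = this.backends[this.next] as string;
    this.next = (this.next + 1) % this.backends.length;
    return backend;
  }
}

// Addresses as a headless service lookup might return them (example values).
const picker = new RoundRobinPicker(["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"]);
const firstFour = [picker.pick(), picker.pick(), picker.pick(), picker.pick()];
```

The key difference from L4 balancing is the unit of distribution: the choice is made per RPC, so one long-lived client spreads its calls across every pod.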

gRPC-Web (Browser Bridge)

Browsers cannot make native HTTP/2 gRPC calls (no access to HTTP/2 frames or trailers). gRPC-Web bridges this gap with a protocol translation proxy.

flowchart LR
    B[Browser\ngRPC-Web Client] -->|HTTP/1.1 or HTTP/2\nContent-Type: application/grpc-web| P[Envoy Proxy\ngRPC-Web Filter]
    P -->|Native HTTP/2 gRPC| S[gRPC Server]

How it works:

  1. The browser client uses the grpc-web package or @connectrpc/connect-web to make gRPC calls
  2. Calls are sent over standard HTTP as application/grpc-web (binary) or application/grpc-web-text (base64)
  3. Envoy proxy (or a Connect protocol server) translates to native gRPC
  4. The server sees standard gRPC requests, so no code changes are needed

// Browser client using Connect (modern alternative to grpc-web)
import { createClient } from "@connectrpc/connect";
import { createGrpcWebTransport } from "@connectrpc/connect-web";
import { UserService } from "./gen/users_connect";

const transport = createGrpcWebTransport({
  baseUrl: "https://api.example.com",
});

const client = createClient(UserService, transport);
const user = await client.getUser({ userId: "42" });

gRPC-Web limitations:

  • Only unary and server-streaming RPCs (no client-streaming or bidirectional)
  • Requires a proxy (Envoy, Connect, nginx) unless using Connect protocol natively
  • Slightly higher latency due to protocol translation

Connect protocol (from Buf) is the modern alternative: supports gRPC, gRPC-Web, and a new Connect protocol natively — all three over a single HTTP endpoint, with browser support without a proxy for the Connect wire format.


SOAP / XML-RPC

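SOAP (originally "Simple Object Access Protocol") is an XML-based messaging protocol that was the dominant style for enterprise web services before REST-style APIs became the default. It is still deeply embedded in enterprise systems, financial services, healthcare (HL7), and government integrations.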

Protocol Structure

A SOAP message is an XML document with a mandatory Envelope, optional Header, and mandatory Body:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
  xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:usr="http://example.com/users">
  <soap:Header>
    <usr:AuthToken>abc123</usr:AuthToken>
  </soap:Header>
  <soap:Body>
    <usr:GetUser>
      <usr:UserId>42</usr:UserId>
    </usr:GetUser>
  </soap:Body>
</soap:Envelope>

WSDL (Web Services Description Language)

WSDL is SOAP's IDL — an XML document that describes the service completely: operations, input/output message types, bindings (how operations map to protocols), and endpoints. It serves the same role as OpenAPI for REST or .proto files for gRPC.

<wsdl:definitions name="UserService" ...>
  <wsdl:types>
    <xs:schema>
      <xs:element name="GetUserRequest">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="UserId" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </wsdl:types>
  <wsdl:message name="GetUserInput">
    <wsdl:part name="parameters" element="tns:GetUserRequest"/>
  </wsdl:message>
  <wsdl:portType name="UserServicePortType">
    <wsdl:operation name="GetUser">
      <wsdl:input message="tns:GetUserInput"/>
      <wsdl:output message="tns:GetUserOutput"/>
    </wsdl:operation>
  </wsdl:portType>
</wsdl:definitions>

SOAP vs REST

| Dimension | SOAP | REST |
|---|---|---|
| Payload | XML (verbose) | JSON (compact) |
| Contract | WSDL (machine-readable) | OpenAPI (optional) |
| Transport | HTTP, SMTP, TCP | HTTP only |
| State | Stateful or stateless | Stateless |
| Security | WS-Security (powerful but complex) | OAuth 2.0, JWT, mTLS |
| Error handling | soap:Fault (standardized) | HTTP status codes (convention-based) |
| Tooling | Mature but heavy | Light and universal |
| Still used for | Banking, insurance, health (HL7), government | Virtually everything new |

XML-RPC predates SOAP — a simpler, less extensible ancestor using XML payloads over HTTP POST. Effectively obsolete.


WebSocket

WebSocket provides a persistent, full-duplex TCP connection between client and server, established via an HTTP upgrade handshake. Once established, either side can send messages at any time with minimal overhead.

Handshake

# Client initiates upgrade:
GET /ws HTTP/1.1
Host: api.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

# Server confirms upgrade:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
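
The Sec-WebSocket-Accept value in the server response is derived from the client's key: RFC 6455 specifies appending a fixed GUID to the key, hashing with SHA-1, and Base64-encoding the digest. A minimal Python sketch of that derivation (the function name `compute_accept` is illustrative, not part of any library):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every compliant server uses the same value.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Key taken from the client request in the handshake above.
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

Run against the key from the client request above, this produces the Sec-WebSocket-Accept value shown in the server response. The check proves the server understood the WebSocket handshake; it is not an authentication mechanism.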

After the handshake, the connection is no longer HTTP. Data flows as frames, each carrying a header of only 2 to 14 bytes:

| Frame Type | Description |
|---|---|
| Text frame | UTF-8 text message |
| Binary frame | Raw bytes (audio, video, protobuf) |
| Ping frame | Heartbeat probe (either endpoint may send one) |
| Pong frame | Heartbeat response |
| Close frame | Graceful connection termination |

Connection Management

The primary operational challenge of WebSocket is connection state management:

  • Heartbeats (ping/pong): detect dead connections that appear open at the TCP layer. Servers should send pings every 30–60s; if no pong arrives, close and clean up.
  • Reconnection: clients should implement exponential backoff when the connection drops. Libraries like reconnecting-websocket handle this automatically.
  • Backpressure: if a slow client can't consume messages fast enough, the sender's buffer fills. Monitor bufferedAmount on the sending socket (the browser WebSocket API and the Node.js ws library both expose it), or implement application-level flow control.
  • Horizontal scaling: WebSocket connections are stateful and sticky. A message sent by user A (connected to server 1) destined for user B (connected to server 2) must be routed between servers via a pub/sub layer (Redis Pub/Sub, Kafka).
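
The reconnection bullet above can be sketched as a delay schedule. This is a minimal illustration, assuming a 1-second base, a 30-second cap, and "full jitter" (a uniformly random wait below the exponential ceiling); the function names and default values are assumptions, not taken from any particular library:

```python
import random
from typing import List, Optional


def backoff_ceilings(base: float = 1.0, cap: float = 30.0, attempts: int = 6) -> List[float]:
    """Upper bound in seconds for each retry: base * 2**n, never above cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(attempts)]


def reconnect_delays(base: float = 1.0, cap: float = 30.0, attempts: int = 6,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Pick a random wait below each ceiling so clients do not reconnect in lockstep."""
    rng = rng or random.Random()
    return [rng.uniform(0, ceiling) for ceiling in backoff_ceilings(base, cap, attempts)]


if __name__ == "__main__":
    print(backoff_ceilings())   # deterministic ceilings
    print(reconnect_delays())   # jittered waits, different on every run
```

The jitter matters most after a server restart, when every client loses its connection at the same moment; without it, all of them would retry on the same schedule.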

When to Use WebSocket

  • Interactive real-time features: chat, collaborative document editing, multiplayer gaming
  • Financial data: live order books, tick-by-tick price feeds
  • IoT: bidirectional device control with low latency
  • When the client sends frequent data to the server (>1 msg/second)

Server-Sent Events (SSE)

SSE is a web standard (originally a W3C specification, now part of the WHATWG HTML Living Standard) for server-to-client streaming over plain HTTP. Unlike WebSocket, there is no protocol upgrade: it is just a long-lived HTTP response with Content-Type: text/event-stream.

Protocol

Server response:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

id: 1
event: message
data: {"type": "notification", "text": "Hello!"}

id: 2
event: update
data: {"user": "alice", "status": "online"}

: heartbeat comment (ignored by client)

SSE message fields:

| Field | Purpose |
|---|---|
| data: | The message payload (can span multiple lines) |
| event: | Custom event type (client listens via addEventListener) |
| id: | Message ID; sent as Last-Event-ID header on reconnect |
| : (comment) | Ignored by client; used for keepalive pings |
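
To make the framing concrete, here is a small Python sketch that serializes and parses events using the fields above. It is deliberately simplified (for example, it handles only "\n" line endings and ignores the retry: field), and the function names `format_sse` and `parse_sse` are illustrative:

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one event in text/event-stream framing."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload becomes one "data:" line per line of text.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def parse_sse(stream: str) -> list:
    """Parse a complete text/event-stream string into a list of event dicts."""
    events = []
    current = {"data": []}
    for line in stream.split("\n"):
        if line == "":
            # Blank line: dispatch the event if it carried any data.
            if current["data"]:
                events.append({
                    "id": current.get("id"),
                    "event": current.get("event", "message"),
                    "data": "\n".join(current["data"]),
                })
            current = {"data": []}
        elif line.startswith(":"):
            continue  # comment line, e.g. a keepalive ping
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "data":
                current["data"].append(value)
            elif field in ("id", "event"):
                current[field] = value
    return events
```

Events without an event: field default to the type "message", which is why the browser's generic message listener receives them.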

Auto-Reconnection

SSE's killer feature: if the connection drops, the browser automatically reconnects and sends the Last-Event-ID header — the server can resume from where it left off. No client code required.

const source = new EventSource('/events');

source.addEventListener('message', e => console.log(e.data));
source.addEventListener('update', e => handleUpdate(JSON.parse(e.data)));
source.onerror = e => console.error('SSE error', e);
// Reconnection happens automatically — no manual retry logic needed
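
The browser side is automatic, but the server still has to honor Last-Event-ID for the resume to work. A minimal server-side sketch in Python, assuming event IDs are increasing integers and the server keeps a recent event log in memory (both are assumptions for illustration; `events_to_replay` is a hypothetical helper):

```python
from typing import List, Optional, Tuple


def events_to_replay(event_log: List[Tuple[int, str]],
                     last_event_id: Optional[str]) -> List[Tuple[int, str]]:
    """Return the (id, data) pairs a reconnecting client has not seen yet."""
    if last_event_id is None:
        return []  # first connection: nothing to replay, only live events follow
    try:
        last_seen = int(last_event_id)
    except ValueError:
        return []  # unrecognized ID: treat it like a fresh connection
    return [(event_id, data) for event_id, data in event_log if event_id > last_seen]
```

A handler would read the Last-Event-ID request header, write the replayed events to the stream first, and then continue with live events.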

HTTP/2 SSE

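Under HTTP/1.1, browsers limit each domain to about 6 connections, and that budget is shared across tabs. Six open tabs that each hold an SSE connection exhaust it, so further SSE streams and XHR/fetch requests to that domain queue behind them. Under HTTP/2, all SSE streams multiplex over a single TCP connection, and the limit becomes the negotiated maximum number of concurrent streams (100 by default).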

AI Streaming

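SSE is the de facto standard for LLM token streaming. OpenAI, Anthropic, and most other LLM APIs stream completions in the text/event-stream format because data flows in one direction (server → client) and SSE is simpler than WebSocket. One caveat: these APIs typically stream the response to a POST request, which the browser's EventSource cannot issue, so clients usually read the stream with fetch and handle retries themselves instead of relying on auto-reconnect.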


Webhooks

Webhooks are HTTP POST callbacks — the server pushes events to client-registered URLs instead of the client polling for changes. "Don't call us, we'll call you."

Flow

sequenceDiagram
    participant Client
    participant YourServer
    participant WebhookConsumer

    Client->>YourServer: Register webhook URL
    Note over YourServer: Event occurs (payment, commit, signup)
    YourServer->>WebhookConsumer: POST /webhook {"event": "payment.succeeded", ...}
    WebhookConsumer-->>YourServer: 200 OK (within 5s)
    Note over WebhookConsumer: Queue event for async processing

Production Webhook Pattern

Respond immediately, process asynchronously:

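# Assumes FastAPI; WEBHOOK_SECRET and process_event are defined elsewhere in the app.
import json
from fastapi import BackgroundTasks, FastAPI, HTTPException, Request

app = FastAPI()

@app.post("/webhook")
async def webhook_handler(request: Request, background_tasks: BackgroundTasks):
    # 1. Validate the signature FIRST, over the raw bytes exactly as received
    body = await request.body()
    if not verify_signature(request.headers, body, WEBHOOK_SECRET):
        raise HTTPException(status_code=401, detail="invalid signature")
    # 2. Queue the work and return 200 immediately; processing runs after the response is sent
    background_tasks.add_task(process_event, json.loads(body))
    return {"status": "accepted"}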

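Never do slow work (DB queries, API calls) in the webhook handler. Return a 2xx within the sender's timeout (5 seconds in the flow above; the exact limit varies by provider) or the sender will treat the delivery as failed and retry.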

Security: Signature Verification

Every webhook provider should sign payloads. Verify before processing:

import hmac, hashlib

def verify_signature(headers: dict, body: bytes, secret: str) -> bool:
    expected = hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    received = headers.get("X-Signature-256", "").removeprefix("sha256=")
    return hmac.compare_digest(expected, received)
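
For testing a receiver, it helps to see the sender's side of the same scheme. The sketch below assumes the "sha256=<hex digest>" header format used above; `sign_payload` and `is_valid` are illustrative names, and real providers differ in header name and encoding:

```python
import hashlib
import hmac
import json


def sign_payload(body: bytes, secret: str) -> str:
    """Build the header value a sender would attach (assumed format: sha256=<hex>)."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def is_valid(header_value: str, body: bytes, secret: str) -> bool:
    """Receiver-side check, equivalent in spirit to verify_signature above."""
    return hmac.compare_digest(sign_payload(body, secret), header_value)


if __name__ == "__main__":
    secret = "example-secret"  # placeholder value for the demo only
    body = b'{"event":"payment.succeeded"}'
    header = sign_payload(body, secret)
    print(is_valid(header, body, secret))
    # Re-serializing the parsed JSON changes the bytes (whitespace), so the check fails.
    print(is_valid(header, json.dumps(json.loads(body)).encode(), secret))
```

The second check in the demo is the reason to verify against the raw request bytes: a parsed and re-serialized payload is semantically identical but no longer matches the signature.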

Reliability Patterns

| Pattern | Purpose |
|---|---|
| Idempotency key | Deduplicate retried deliveries by storing processed event IDs |
| Exponential backoff retries | Sender retries on non-2xx: immediately, 5s, 30s, 5m, 30m, 2h |
| Dead letter queue | After N retries, move to DLQ for manual inspection |
| Event replay | Allow consumers to re-request past events by ID |
| CloudEvents format | Standard envelope: id, source, type, time, data |
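
The idempotency-key pattern can be sketched in a few lines of Python. This version keeps processed IDs in an in-memory set purely for illustration; a real consumer would more likely use a database table with a unique constraint. The class name and the "id" field are assumptions:

```python
from typing import Callable


class IdempotentConsumer:
    """Run the handler at most once per event ID, even if the sender retries."""

    def __init__(self, handler: Callable[[dict], None]):
        self._handler = handler
        self._processed_ids = set()

    def deliver(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = event["id"]
        if event_id in self._processed_ids:
            return False  # already handled: acknowledge without reprocessing
        self._handler(event)
        # Record the ID only after the handler succeeds, so a failed
        # attempt can still be retried by the sender.
        self._processed_ids.add(event_id)
        return True
```

Either way the receiver should answer duplicates with a 2xx, otherwise the sender keeps retrying an event that has already been handled.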

tRPC

tRPC lets TypeScript full-stack teams build APIs where type safety flows automatically from server to client — no code generation, no schema files, no out-of-sync types.

How It Works

  1. Define procedures on the server (TypeScript functions)
  2. Export the router's type
  3. Import and use that type on the client
  4. TypeScript infers input/output types automatically

The client never imports server implementation code — only the type. At runtime, tRPC serializes calls over HTTP (queries → GET/POST, mutations → POST, subscriptions → WebSocket).

Routers and Procedures

// server/routers/users.ts
import { z } from 'zod';
import { router, publicProcedure, protectedProcedure } from '../trpc';

export const userRouter = router({
  // Query — GET /trpc/users.getById
  getById: publicProcedure
    .input(z.object({ id: z.string() }))
    .query(async ({ input, ctx }) => {
      return ctx.db.user.findUnique({ where: { id: input.id } });
    }),

  // Mutation — POST /trpc/users.create
  create: protectedProcedure
    .input(z.object({ name: z.string(), email: z.string().email() }))
    .mutation(async ({ input, ctx }) => {
      return ctx.db.user.create({ data: input });
    }),
});

// server/routers/_app.ts
export const appRouter = router({
  users: userRouter,
  posts: postRouter,
  comments: commentRouter,
});

export type AppRouter = typeof appRouter;  // ← this is all the client needs

Client Usage

// client/trpc.ts
import { createTRPCReact } from '@trpc/react-query';
import type { AppRouter } from '../server/routers/_app';

export const trpc = createTRPCReact<AppRouter>();

// In a React component:
function UserProfile({ userId }: { userId: string }) {
  // Fully typed: input, output, error — all inferred from server code
  const { data, isLoading } = trpc.users.getById.useQuery({ id: userId });
  // data is typed as: User | null | undefined
  // Change server return type → TypeScript error here immediately
}

Context and Middleware

// Context: per-request shared state (auth user, DB, etc.)
export const createContext = async ({ req, res }: CreateNextContextOptions) => ({
  db: prisma,
  session: await getSession({ req }),
});

// Middleware: wraps procedures with reusable logic
const isAuthenticated = middleware(({ ctx, next }) => {
  if (!ctx.session?.user) throw new TRPCError({ code: 'UNAUTHORIZED' });
  return next({ ctx: { ...ctx, user: ctx.session.user } });
});

// Protected procedure: any procedure using this is automatically auth-gated
const protectedProcedure = publicProcedure.use(isAuthenticated);

tRPC vs Alternatives

| Dimension | tRPC | REST + OpenAPI | GraphQL |
|---|---|---|---|
| Type safety | ✅ Automatic, zero-gen | ⚠️ Code generation required | ⚠️ Code generation required |
| Language support | TypeScript/JS only | Universal | Universal |
| Schema file | ❌ None (types are the schema) | OpenAPI YAML/JSON | .graphql SDL |
| Learning curve | Low (just TypeScript) | Low | High |
| Client flexibility | ❌ Must use tRPC client | ✅ Any HTTP client | ✅ Any GraphQL client |
| Over/under-fetching | Field selection not built-in | Full response always | ✅ Client specifies fields |
| Best for | TypeScript monorepos (T3 stack, Next.js) | Public APIs, polyglot | Complex multi-client frontends |

Choosing the Right API Paradigm

Is this a public API consumed by external developers or third parties?
→ REST (universal, familiar, broad tooling)

Is the frontend complex with multiple clients fetching different data shapes?
→ GraphQL (eliminates over/under-fetching, empowers frontend teams)

Is this internal service-to-service communication with high throughput?
→ gRPC (fastest, binary, streaming support, code-gen clients)

Does the data need to flow in real time in both directions?
→ WebSocket (full-duplex, persistent)

Does the server push updates to passive clients (feeds, notifications)?
→ SSE (simpler than WebSocket, HTTP-native, auto-reconnect)

Is the entire stack TypeScript and owned by one team?
→ tRPC (zero boilerplate, type-safe end-to-end)

Does an external system need to notify you when events occur?
→ Webhooks (event-driven push, polling eliminated)

Is this a legacy enterprise or regulated domain (banking, healthcare)?
→ SOAP (accept the complexity; interoperability with existing systems)

It Is Not Either-Or

Real systems commonly use multiple paradigms together: a public REST API for external consumers, gRPC internally between microservices, GraphQL for the customer-facing frontend, WebSocket for real-time features, and webhooks for third-party integrations.


HTTP/2 and HTTP/3 (QUIC)

Nearly all of the API paradigms above ride on top of HTTP (WebSocket starts there and then upgrades), so understanding how the transport has evolved is essential.

HTTP/2 (2015, RFC 7540; superseded by RFC 9113)

HTTP/2 is the minimum transport for gRPC and significantly improves REST/GraphQL performance.

| Feature | HTTP/1.1 | HTTP/2 |
|---|---|---|
| Framing | Text-based | Binary frames |
| Multiplexing | ❌ (one request at a time per TCP connection) | ✅ Multiple streams per connection |
| Header compression | ❌ | ✅ HPACK |
| Server push | ❌ | ✅ (rarely used in practice) |
| Connection limit | 6 per origin (browser) | 1 TCP connection, many concurrent streams (up to the negotiated maximum) |
| Head-of-line blocking | ✅ At HTTP layer | ❌ At HTTP layer, but still present at the TCP layer |

The TCP head-of-line blocking problem: if a single TCP packet is lost, ALL HTTP/2 streams on that connection stall until retransmission completes. This is the fundamental limitation HTTP/3 solves.

HTTP/3 (2022, RFC 9114)

HTTP/3 replaces TCP with QUIC (UDP-based transport with built-in TLS 1.3).

graph TB
    subgraph "HTTP/2 Stack"
        H2[HTTP/2] --> TLS2[TLS 1.2/1.3]
        TLS2 --> TCP[TCP]
        TCP --> IP1[IP]
    end
    subgraph "HTTP/3 Stack"
        H3[HTTP/3] --> QUIC[QUIC\nbuilt-in TLS 1.3]
        QUIC --> UDP[UDP]
        UDP --> IP2[IP]
    end

Key improvements:

| Feature | HTTP/2 (TCP) | HTTP/3 (QUIC) |
|---|---|---|
| Head-of-line blocking | ✅ TCP-level HOL | ❌ Each QUIC stream is delivered independently |
| Connection setup | TCP handshake + TLS handshake (2–3 RTT) | 0-RTT or 1-RTT (TLS built into QUIC) |
| Connection migration | ❌ New connection on network change | ✅ Connection ID survives IP change |
| Packet loss recovery | Entire connection stalls | Only affected stream pauses |
| Congestion control | Kernel TCP (cubic/bbr) | User-space (pluggable, per-connection) |

Connection migration is particularly impactful for mobile APIs: when a phone switches from WiFi to cellular, HTTP/2 drops the TCP connection and must re-handshake. HTTP/3's connection ID persists across network changes — the connection continues seamlessly.

0-RTT resumption: returning clients can send data in the very first packet by reusing a previously negotiated TLS session. Crucial for latency-sensitive API calls on mobile networks.

0-RTT Replay Risk

0-RTT data can be replayed by a network attacker. Only use 0-RTT for idempotent operations (GET). Non-idempotent operations (POST) should wait for the full handshake.
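
One way to enforce this rule at the application layer is sketched below. It assumes a TLS-terminating proxy marks early-data requests with the Early-Data: 1 header defined in RFC 8470 and that header names arrive in that exact casing; the function name is hypothetical. Rejecting with 425 (Too Early) tells the client to retry after the handshake completes:

```python
from typing import Mapping, Optional

# Methods treated as safe to replay in this sketch.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})
TOO_EARLY = 425  # HTTP 425 Too Early (RFC 8470)


def early_data_rejection(method: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return 425 if the request arrived as 0-RTT data and is not safe to replay, else None."""
    sent_early = headers.get("Early-Data") == "1"
    if sent_early and method.upper() not in SAFE_METHODS:
        return TOO_EARLY
    return None
```

Note that a GET is only replay-safe if the endpoint behind it really has no side effects, so the method check is a starting point and not a guarantee.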

gRPC and HTTP/3: gRPC currently requires HTTP/2. Experimental gRPC-over-HTTP/3 implementations exist (for example, ones built on QUIC libraries such as quic-go), but the core gRPC specification does not officially support HTTP/3 yet. If it does, the independent-stream property of QUIC would eliminate the transport-level head-of-line blocking that currently affects multiplexed gRPC connections.

Content Negotiation

Content negotiation lets client and server agree on response format:

# Client requests JSON, can accept XML as fallback
GET /v2/orders/42 HTTP/1.1
Accept: application/json, application/xml;q=0.9, */*;q=0.1
Accept-Language: en-US, fr;q=0.5
Accept-Encoding: gzip, br

# Server responds with chosen representation
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Language: en-US
Content-Encoding: br
Vary: Accept, Accept-Language, Accept-Encoding

The Vary header tells caches which request headers affect the response — critical for correct caching behavior.
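
Server-side selection from an Accept header can be sketched as follows. This simplified Python version handles q-values and wildcards but ignores other media-type parameters; `parse_accept` and `negotiate` are illustrative names, and production code would normally rely on the web framework's own negotiation:

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # malformed q-value: treat as not acceptable
        ranges.append((media_range, q))
    return ranges


def _specificity(media_range: str, candidate: str) -> int:
    """0 = no match, 1 = */*, 2 = type/*, 3 = exact match."""
    if media_range == candidate:
        return 3
    range_type, _, range_subtype = media_range.partition("/")
    if range_subtype == "*" and range_type == candidate.partition("/")[0]:
        return 2
    return 1 if media_range == "*/*" else 0


def negotiate(accept_header: str, supported: List[str]) -> Optional[str]:
    """Pick the supported type with the highest q; earlier entries in supported win ties."""
    ranges = parse_accept(accept_header) or [("*/*", 1.0)]
    best, best_q = None, 0.0
    for candidate in supported:
        matches = [(_specificity(r, candidate), q) for r, q in ranges]
        matches = [m for m in matches if m[0] > 0]
        if not matches:
            continue
        q = max(matches)[1]  # the most specific matching range decides the q-value
        if q > best_q:
            best, best_q = candidate, q
    return best
```

A result of None corresponds to responding with 406 Not Acceptable, although many APIs fall back to a default representation instead.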

API versioning via content negotiation:

Accept: application/vnd.example.v2+json

This is the most RESTful versioning approach (no URL pollution) but less discoverable than URI versioning.


Architectural Patterns

Backend for Frontend (BFF)

The BFF pattern creates a dedicated API gateway per client type — each frontend gets an API layer optimized for its specific data needs.

flowchart LR
    subgraph Clients
        M[Mobile App]
        W[Web App]
        TV[Smart TV]
    end
    subgraph BFF Layer
        MB[Mobile BFF\nGo / Node.js]
        WB[Web BFF\nNode.js]
        TB[TV BFF\nNode.js]
    end
    subgraph Backend Services
        US[User Service]
        PS[Product Service]
        OS[Order Service]
    end

    M --> MB
    W --> WB
    TV --> TB
    MB --> US & PS & OS
    WB --> US & PS & OS
    TB --> US & PS

Why BFF over a single gateway:

  • Mobile needs minimal payloads; web needs rich data, and one API can't optimize for both
  • Each BFF aggregates multiple backend calls into one client-optimized response
  • Teams can deploy BFFs independently; breaking a mobile BFF doesn't affect web
  • Authentication/session management can differ per client type

BFF vs GraphQL: GraphQL solves the over/under-fetching problem with client-specified queries, potentially eliminating the need for separate BFFs. However, BFF is still valuable when:

  • Clients need significantly different business logic (not just different fields)
  • The team wants to contain complexity behind a simple REST API per client
  • Backend services expose gRPC, and the BFF translates to REST/JSON for browser clients

GraphQL Persisted Queries

Persisted queries replace arbitrary client-sent GraphQL strings with pre-registered query IDs — improving security, performance, and bandwidth.

# Without persisted queries — client sends full query string
POST /graphql
{"query": "query GetUser($id: ID!) { user(id: $id) { name email posts { title } } }", "variables": {"id": "42"}}

# With persisted queries — client sends only the hash
POST /graphql
{"extensions": {"persistedQuery": {"version": 1, "sha256Hash": "ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4062f09b5bad56a6585"}}, "variables": {"id": "42"}}

Automatic persisted queries (APQ) flow (Apollo):

  1. Client sends query hash only
  2. If server doesn't recognize the hash → returns PersistedQueryNotFound
  3. Client retries with full query string + hash
  4. Server stores the mapping; subsequent requests use hash only

Benefits:

  • Security: in locked-down mode, the server rejects any query not in the allowlist, which prevents arbitrary query attacks
  • Bandwidth: hash (64 chars) replaces potentially multi-KB query strings
  • CDN caching: hash-based GET requests are cacheable at edge (GET /graphql?extensions={...}&variables={...})
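
The server side of the APQ flow reduces to a hash-keyed lookup. The Python sketch below is an illustration, not Apollo's implementation: the class and function names are hypothetical, and the store is an in-memory dict where a real deployment would likely use a shared cache:

```python
import hashlib
from typing import Dict, Optional


def query_hash(query: str) -> str:
    """SHA-256 hex digest of the query string (64 characters)."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


class PersistedQueryStore:
    """Resolve a query from its hash, registering it when the full text is supplied."""

    def __init__(self) -> None:
        self._queries: Dict[str, str] = {}

    def resolve(self, sha256_hash: str, query: Optional[str] = None) -> str:
        if query is not None:
            # Retry path: the client sent hash + full query; verify before storing.
            if query_hash(query) != sha256_hash:
                raise ValueError("provided hash does not match query")
            self._queries[sha256_hash] = query
            return query
        try:
            return self._queries[sha256_hash]
        except KeyError:
            # Signals the client to retry with the full query string.
            raise LookupError("PersistedQueryNotFound") from None
```

Verifying the hash before storing matters: without it, a client could register one query under another query's hash and poison the mapping for everyone else.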

gRPC Health Checking Protocol

gRPC defines a standardized health checking protocol (grpc.health.v1) for load balancers and orchestrators:

syntax = "proto3";
package grpc.health.v1;

service Health {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
  rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}

message HealthCheckRequest {
  string service = 1;  // empty string = overall server health
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;
  }
  ServingStatus status = 1;
}

# Check health with grpcurl
grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check

# Check specific service
grpcurl -plaintext -d '{"service": "orders.OrderService"}' \
  localhost:50051 grpc.health.v1.Health/Check

# Kubernetes gRPC health probe (k8s 1.24+)
# In pod spec:
# livenessProbe:
#   grpc:
#     port: 50051
#     service: ""

gRPC Server Reflection

Server reflection allows tools like grpcurl to discover services without .proto files — the gRPC equivalent of OpenAPI's /swagger.json:

// Enable reflection in Go gRPC server
import "google.golang.org/grpc/reflection"

s := grpc.NewServer()
pb.RegisterOrderServiceServer(s, &server{})
reflection.Register(s)  // enables runtime schema discovery

# Discover all services (requires reflection)
grpcurl -plaintext localhost:50051 list

# Describe a specific service
grpcurl -plaintext localhost:50051 describe orders.OrderService

# Describe a message type
grpcurl -plaintext localhost:50051 describe orders.Order

Disable Reflection in Production

Like GraphQL introspection, gRPC reflection exposes your entire API surface. Disable it in production or restrict to authorized callers only.


API Performance Patterns

Request Compression

# Client sends compressed body
POST /v2/orders HTTP/1.1
Content-Encoding: gzip
Content-Type: application/json

# Client requests compressed response
GET /v2/orders HTTP/1.1
Accept-Encoding: gzip, br

Brotli (br) achieves 15–25% better compression than gzip for JSON/text payloads, but at its higher quality levels it needs noticeably more CPU to compress. Most CDNs pre-compress static assets with Brotli. For dynamic API responses, gzip (or Brotli at a low quality level) is usually the better trade-off: faster compression, slightly larger output.
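
The effect on a repetitive JSON payload is easy to measure with Python's standard library. This sketch covers gzip only, since Brotli needs a third-party package; the helper names and the sample payload are made up for illustration:

```python
import gzip
import json


def compress_json(payload, level: int = 6) -> bytes:
    """Serialize to compact JSON and gzip it, as a client sending Content-Encoding: gzip would."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw, compresslevel=level)


def decompress_json(body: bytes):
    """Reverse of compress_json: gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(body).decode("utf-8"))


if __name__ == "__main__":
    orders = [{"id": i, "status": "shipped", "currency": "USD"} for i in range(200)]
    raw_size = len(json.dumps(orders, separators=(",", ":")).encode("utf-8"))
    print(raw_size, len(compress_json(orders)))  # raw vs compressed size in bytes
```

List-shaped API responses compress well because the same keys repeat in every element; tiny payloads may not be worth compressing at all, since the gzip header adds a fixed overhead.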

Connection Pooling

HTTP/1.1 clients should maintain a connection pool to avoid the overhead of TCP+TLS handshakes per request:

| Setting | Typical Value | Notes |
|---|---|---|
| Pool size (per host) | 20–100 | Match to expected concurrency |
| Idle timeout | 30–90s | Close idle connections to free resources |
| Max lifetime | 5–10 min | Prevent sticky connections to a single backend |
| Health check interval | 10s | Detect dead connections proactively |

HTTP/2 clients typically use a single connection per host carrying many concurrent streams (bounded by the server's SETTINGS_MAX_CONCURRENT_STREAMS), so connection pooling is less critical but still relevant for fault tolerance (maintain 2–3 connections).

ETag-Based Conditional Requests

First request:
  GET /v2/orders/42 → 200 OK, ETag: "abc123"

Subsequent request:
  GET /v2/orders/42
  If-None-Match: "abc123"
  → 304 Not Modified (no body, use cached copy)
  → 200 OK + new ETag (resource changed, here is new version)

ETags reduce bandwidth and server load. For mutable resources, use strong ETags (exact byte-for-byte match). For semantic equivalence, use weak ETags (W/"abc123").
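
A server-side sketch of the exchange above, in Python. It assumes the ETag is derived from a hash of the response body (one common choice among several) and returns a (status, headers, body) tuple; `make_etag` and `conditional_get` are hypothetical helpers:

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response bytes (quoted, as the header requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes,
                    if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return 304 with an empty body when the client's cached ETag still matches."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        if "*" in candidates or any(tag.removeprefix("W/") == etag for tag in candidates):
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

Hashing the body saves bandwidth but not the work of producing the response; deriving the ETag from a version column or last-modified timestamp avoids that cost as well.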

Async Request Collapsing (Request Deduplication)

When multiple clients request the same resource simultaneously, collapse them into a single backend request:

Time T=0:  Client A → GET /products/42
Time T=1ms: Client B → GET /products/42  (same key, collapse)
Time T=2ms: Client C → GET /products/42  (same key, collapse)
Time T=50ms: Backend returns → fan out to A, B, C

Result: 1 backend call instead of 3

Implemented in: Nginx (proxy_cache_lock), Varnish (request coalescing, optionally combined with grace mode), Cloudflare, Envoy.
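
The same idea can be applied inside an application process. Below is a minimal asyncio sketch of the technique (often called "single flight"); the class name and the fake backend are assumptions for illustration. It does not cache results: once a fetch completes, the next request for that key triggers a new backend call.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class RequestCollapser:
    """Share one in-flight backend call among all concurrent callers of the same key."""

    def __init__(self, fetch: Callable[[str], Awaitable[Any]]):
        self._fetch = fetch
        self._in_flight: Dict[str, "asyncio.Task[Any]"] = {}

    async def get(self, key: str) -> Any:
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key: start the backend request.
            task = asyncio.create_task(self._fetch(key))
            self._in_flight[key] = task
            # Forget the task once it finishes so later requests fetch fresh data.
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        # Every caller, first or not, waits on the same task.
        return await task


async def demo():
    backend_calls = 0

    async def fetch_product(key: str) -> dict:
        nonlocal backend_calls
        backend_calls += 1
        await asyncio.sleep(0.05)  # simulated backend latency
        return {"key": key}

    collapser = RequestCollapser(fetch_product)
    results = await asyncio.gather(*(collapser.get("/products/42") for _ in range(3)))
    return backend_calls, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

One caveat of this simple version: if a caller waiting on the shared task is cancelled, the cancellation propagates to the task and the other waiters see it too. Wrapping the await in asyncio.shield is one way to avoid that.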


Benchmarks: Protocol Performance

Approximate comparisons under controlled conditions. Real-world performance depends heavily on payload, network, and implementation.

| Metric | REST (JSON/HTTP2) | GraphQL (JSON/HTTP2) | gRPC (Protobuf/HTTP2) |
|---|---|---|---|
| Serialization size (1KB logical payload) | ~1.2 KB | ~1.0 KB (no over-fetching) | ~0.4 KB |
| Serialization time | ~1x baseline | ~1x | ~0.1–0.3x (binary) |
| Latency (unary, same DC) | ~1–5ms | ~2–8ms (resolver overhead) | ~0.5–2ms |
| Throughput (single connection) | Limited by JSON encoding cost (and by HOL blocking when served over HTTP/1.1) | Same as REST | Higher (multiplexed, binary) |
| Browser support | ✅ Native | ✅ Native | ⚠️ grpc-web proxy required |
| Streaming | ❌ (SSE for server-push) | ✅ Subscriptions (WS) | ✅ 4 streaming types |

When Performance Matters Less

For most CRUD APIs, the difference between REST and gRPC latency is negligible compared to database query time. Choose the paradigm based on developer experience and client requirements, not raw protocol speed — unless you're building a low-latency trading system or processing millions of internal RPCs per second.