API Gateway Architecture: Complete Implementation Guide for Production Systems
Master API gateway architecture with this comprehensive production guide. Learn routing patterns, authentication strategies, rate limiting, caching, load balancing, and security best practices. Includes implementation examples for Kong, AWS API Gateway, Azure API Management, and NGINX with real-world production configurations, monitoring strategies, and performance optimization techniques.
Introduction
Your microservices architecture is growing. You now have 15 services, each implementing its own authentication, rate limiting, and logging. A security audit reveals three services using outdated JWT libraries. Performance testing shows duplicate calls because caching isn't standardized. Your ops team is drowning in configuration drift across services.
This is exactly why API gateways exist.
An API gateway acts as the single entry point for all client requests, centralizing cross-cutting concerns like authentication, rate limiting, caching, and routing. Instead of each service reimplementing these responsibilities, the gateway handles them once—consistently, efficiently, and securely.
The impact is substantial: teams adopting gateway patterns commonly report significantly less duplicated backend code, faster incident response (one choke point to inspect instead of fifteen services), and shorter time-to-market for new APIs.
This comprehensive guide covers API gateway architecture from fundamentals to production deployment, with real-world examples using Kong, AWS API Gateway, Azure API Management, and NGINX.
What is an API Gateway?
Core Responsibilities
An API gateway sits between clients and backend services, handling:
1. Request Routing
Client → API Gateway → Service Discovery → Backend Service
2. Authentication & Authorization
- JWT validation
- OAuth 2.0 flows
- API key management
- mTLS verification
3. Rate Limiting & Throttling
- Per-client limits
- Global quotas
- Burst handling
4. Response Transformation
- Protocol translation (REST ↔ gRPC)
- Response aggregation
- Data masking
5. Caching
- Response caching
- Cache invalidation
- Edge caching
6. Monitoring & Logging
- Request/response logging
- Metrics collection
- Distributed tracing
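These responsibilities compose into a per-request pipeline: authenticate, rate-limit, then route. A minimal sketch in Python (all names here are illustrative, not from any gateway product):

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)

def check_auth(req):
    # 2. Authentication: reject requests without a bearer token
    return req.headers.get("Authorization", "").startswith("Bearer ")

def check_rate_limit(req, counts, limit=100):
    # 3. Rate limiting: per-client request counter
    client = req.headers.get("X-Client-Id", "anonymous")
    counts[client] = counts.get(client, 0) + 1
    return counts[client] <= limit

def route(req, routes):
    # 1. Routing: longest-prefix match against registered services
    matches = [p for p in routes if req.path.startswith(p)]
    return routes[max(matches, key=len)] if matches else None

def handle(req, routes, counts):
    if not check_auth(req):
        return 401, None
    if not check_rate_limit(req, counts):
        return 429, None
    upstream = route(req, routes)
    if upstream is None:
        return 404, None
    return 200, upstream  # 4-6. transform/cache/log would wrap this call

routes = {"/users": "user-service", "/orders": "order-service"}
counts = {}
ok = Request("/users/42", {"Authorization": "Bearer abc"})
print(handle(ok, routes, counts))                    # (200, 'user-service')
print(handle(Request("/users/42"), routes, counts))  # (401, None)
```

Real gateways implement the same chain as plugin phases; the point is that every concern runs once, in one place, for every service behind it.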
Gateway vs Reverse Proxy
| Feature | Reverse Proxy (NGINX) | API Gateway |
|---|---|---|
| Routing | ✅ Basic | ✅ Advanced (path, header, query) |
| Load Balancing | ✅ | ✅ |
| SSL Termination | ✅ | ✅ |
| Authentication | ⚠️ Limited | ✅ Full (JWT, OAuth, API keys) |
| Rate Limiting | ⚠️ Basic | ✅ Advanced (per-user, per-endpoint) |
| Response Caching | ✅ | ✅ |
| API Analytics | ❌ | ✅ |
| Service Discovery | ❌ | ✅ |
| Circuit Breaking | ❌ | ✅ |
Architecture Patterns
1. Single Gateway Pattern
Use case: Small to medium applications, single team
┌─────────┐
│ Clients │
└────┬────┘
│
┌────▼────────────┐
│ API Gateway │
└────┬────┬───┬───┘
│ │ │
┌──▼─┐ │ ┌▼──┐
│Auth│ │ │DB │
└──┬─┘ │ └───┘
│ │
┌──▼───▼──┐
│ Service │
└─────────┘
Pros:
- Simple architecture
- Single point of configuration
- Easy to reason about
Cons:
- Single point of failure
- Scaling limitations
- Team bottlenecks
2. Gateway Per Team Pattern
Use case: Large organizations, multiple autonomous teams
┌──────────────────┐
│ Clients │
└────┬─────────┬───┘
│ │
┌────▼─────┐ ┌▼──────────┐
│ Gateway │ │ Gateway │
│ (Team A) │ │ (Team B) │
└────┬─────┘ └┬──────────┘
│ │
┌────▼────┐ ┌─▼─────────┐
│Services │ │ Services │
│(Team A) │ │ (Team B) │
└─────────┘ └───────────┘
Pros:
- Team autonomy
- Independent deployments
- Isolated failures
Cons:
- Client complexity
- Duplicate configuration
- Cross-team coordination
3. Backend for Frontend (BFF) Pattern
Use case: Multiple client types (web, mobile, IoT)
┌────┐ ┌─────┐ ┌────┐
│Web │ │Mobile│ │IoT │
└─┬──┘ └──┬──┘ └─┬──┘
│ │ │
┌─▼──┐ ┌──▼──┐ ┌▼───┐
│Web │ │Mobile│ │IoT │
│BFF │ │ BFF │ │BFF │
└─┬──┘ └──┬──┘ └┬───┘
│ │ │
└───────┼──────┘
│
┌─────▼──────┐
│ Services │
└────────────┘
Pros:
- Optimized for each client
- Independent evolution
- Better performance
Cons:
- More complexity
- Code duplication
- Maintenance overhead
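A BFF's main job is aggregation: one client round-trip fans out to several backend services and returns a payload shaped for that client. A sketch with stubbed services (the service functions and field names are hypothetical):

```python
# Stub upstream services (hypothetical responses)
def user_service(user_id):
    return {"id": user_id, "name": "Ada", "avatar_url": "/img/ada.png"}

def order_service(user_id):
    return [{"order_id": 1, "total": 42.0}, {"order_id": 2, "total": 13.5}]

def mobile_bff_profile(user_id):
    """Mobile BFF: aggregate two calls, keep only the fields mobile renders."""
    user = user_service(user_id)
    orders = order_service(user_id)
    return {
        "name": user["name"],  # avatar dropped: mobile home screen doesn't show it
        "order_count": len(orders),
        "total_spent": sum(o["total"] for o in orders),
    }

print(mobile_bff_profile(7))  # {'name': 'Ada', 'order_count': 2, 'total_spent': 55.5}
```

The web BFF would make the same two calls but keep the avatar and full order list; each BFF evolves with its client instead of forcing one response shape on everyone.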
4. Micro Gateway Pattern
Use case: Kubernetes/service mesh environments
┌─────────────┐
│ Ingress │
└──────┬──────┘
│
┌──────▼──────┐
│ API Gateway │ (Global)
└──┬───┬───┬──┘
│ │ │
┌──▼┐ ┌▼─┐ ┌▼──┐
│MG1│ │MG2│ │MG3│ (Micro Gateways)
└─┬─┘ └┬──┘ └┬──┘
│ │ │
┌─▼─┐ ┌▼──┐ ┌▼──┐
│Svc│ │Svc│ │Svc│
└───┘ └───┘ └───┘
Pros:
- Decentralized control
- Service-level policies
- Better resilience
Cons:
- Complex orchestration
- Harder debugging
- More moving parts
Implementation: Kong Gateway
Installation (Docker)
# docker-compose.yml
version: '3.8'

services:
  kong-database:
    image: postgres:15
    environment:
      POSTGRES_DB: kong
      POSTGRES_USER: kong
      POSTGRES_PASSWORD: kong
    volumes:
      - kong_data:/var/lib/postgresql/data
    networks:
      - kong-net

  kong-migrations:
    image: kong:3.5
    command: kong migrations bootstrap
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong
    depends_on:
      - kong-database
    networks:
      - kong-net

  kong:
    image: kong:3.5
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: 0.0.0.0:8001
      KONG_ADMIN_GUI_URL: http://localhost:8002
    ports:
      - "8000:8000"   # Proxy
      - "8443:8443"   # Proxy SSL
      - "8001:8001"   # Admin API
      - "8002:8002"   # Admin GUI
    depends_on:
      - kong-database
      - kong-migrations
    networks:
      - kong-net

volumes:
  kong_data:

networks:
  kong-net:
Basic Configuration
# Start Kong
docker-compose up -d
# Create a service
curl -i -X POST http://localhost:8001/services \
  --data name=user-service \
  --data url=http://backend:3000

# Create a route
curl -i -X POST http://localhost:8001/services/user-service/routes \
  --data 'paths[]=/users' \
  --data name=user-route

# Test
curl http://localhost:8000/users
Authentication: JWT Plugin
# Enable JWT plugin
curl -X POST http://localhost:8001/services/user-service/plugins \
--data "name=jwt"
# Create consumer
curl -X POST http://localhost:8001/consumers \
  --data "username=john"

# Create JWT credential
curl -X POST http://localhost:8001/consumers/john/jwt \
  --data "key=john-api-key" \
  --data "secret=my-secret-key"
Generate JWT (Node.js):
const jwt = require('jsonwebtoken');

// Kong matches the credential by its `iss` claim (the credential `key`),
// so put it in the payload rather than the header.
const token = jwt.sign(
  {
    iss: 'john-api-key',
    sub: 'john',
    exp: Math.floor(Date.now() / 1000) + 3600,
  },
  'my-secret-key'
);
console.log(token);
Make authenticated request:
curl -H "Authorization: Bearer <JWT_TOKEN>" \
http://localhost:8000/users
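What the gateway verifies here is plain HMAC: an HS256 JWT is `base64url(header).base64url(payload)` signed with HMAC-SHA256. A standard-library sketch of both sides (illustrative only; use a maintained JWT library in production):

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_hs256(token: str, secret: str) -> bool:
    try:
        header, body, sig = token.split(".")
    except ValueError:
        return False  # malformed token
    expected = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):  # constant-time compare
        return False
    payload = json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    return payload.get("exp", 0) > time.time()          # reject expired tokens

token = sign_hs256({"iss": "john-api-key", "exp": int(time.time()) + 3600}, "my-secret-key")
print(verify_hs256(token, "my-secret-key"))  # True
print(verify_hs256(token, "wrong-secret"))   # False
```

The constant-time comparison and the expiry check are the two details naive implementations most often get wrong.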
Rate Limiting
# Apply rate limiting (100 requests per minute)
curl -X POST http://localhost:8001/services/user-service/plugins \
--data "name=rate-limiting" \
--data "config.minute=100" \
--data "config.policy=local"
# Per-consumer rate limiting
curl -X POST http://localhost:8001/plugins \
  --data "name=rate-limiting" \
  --data "consumer.id=<CONSUMER_ID>" \
  --data "config.minute=50"
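Under the hood, a `config.policy=local` style limit reduces to a per-client counter or token bucket. A token-bucket sketch, with timestamps passed explicitly so the refill math is visible (illustrative only, not Kong's implementation):

```python
import time

class TokenBucket:
    """Allows a burst of `capacity` requests, then refills at `rate` tokens/sec."""
    def __init__(self, rate, capacity, now=None):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 100 requests/minute with a burst of 5
bucket = TokenBucket(rate=100 / 60, capacity=5, now=0.0)
print([bucket.allow(now=0.0) for _ in range(6)])  # [True, True, True, True, True, False]
print(bucket.allow(now=1.0))                      # True — ~1.67 tokens refilled after 1s
```

The `local` policy keeps such state per gateway node; `redis` policies share it across nodes at the cost of an extra network hop.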
Response Caching
# Enable proxy caching
curl -X POST http://localhost:8001/services/user-service/plugins \
--data "name=proxy-cache" \
--data "config.strategy=memory" \
--data "config.content_type[]=application/json" \
--data "config.cache_ttl=300"
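The plugin above stores responses in memory and expires them after `config.cache_ttl` seconds. The core mechanism fits in a few lines (illustrative sketch, not the plugin's code):

```python
import time

class ResponseCache:
    """TTL cache keyed by (method, path): the essence of a memory cache strategy."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}  # key -> (expires_at, response)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry and entry[0] > now:
            return entry[1]           # cache hit
        self.store.pop(key, None)     # expired or missing
        return None

    def put(self, key, response, now=None):
        now = time.monotonic() if now is None else now
        self.store[key] = (now + self.ttl, response)

cache = ResponseCache(ttl=300)  # 5 minutes, like config.cache_ttl=300
cache.put(("GET", "/users"), '{"users": []}', now=0)
print(cache.get(("GET", "/users"), now=10))   # '{"users": []}' — hit
print(cache.get(("GET", "/users"), now=301))  # None — expired
```

The production concerns a real plugin adds on top are cache-key design (query params, auth headers) and invalidation, covered later in this guide.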
Request Transformation
# Add static headers to upstream requests
curl -X POST http://localhost:8001/services/user-service/plugins \
  --data "name=request-transformer" \
  --data "config.add.headers=X-Gateway:Kong"

# Note: a value like $(uuidgen) is expanded once by your shell, so every request
# would carry the same ID. For per-request IDs, use the correlation-id plugin:
curl -X POST http://localhost:8001/services/user-service/plugins \
  --data "name=correlation-id" \
  --data "config.header_name=X-Request-ID"

# Remove sensitive headers from responses
curl -X POST http://localhost:8001/services/user-service/plugins \
  --data "name=response-transformer" \
  --data "config.remove.headers=X-Internal-Token"
Implementation: AWS API Gateway
REST API Configuration (Terraform)
# api_gateway.tf
# (aws_lambda_function.users, aws_cognito_user_pool.main and
# aws_cloudwatch_log_group.api are assumed to be defined elsewhere)

resource "aws_api_gateway_rest_api" "main" {
  name        = "production-api"
  description = "Production API Gateway"

  endpoint_configuration {
    types = ["REGIONAL"]
  }
}

# Users resource
resource "aws_api_gateway_resource" "users" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  parent_id   = aws_api_gateway_rest_api.main.root_resource_id
  path_part   = "users"
}

# GET /users method
resource "aws_api_gateway_method" "get_users" {
  rest_api_id   = aws_api_gateway_rest_api.main.id
  resource_id   = aws_api_gateway_resource.users.id
  http_method   = "GET"
  authorization = "COGNITO_USER_POOLS"
  authorizer_id = aws_api_gateway_authorizer.cognito.id

  request_parameters = {
    "method.request.querystring.limit"  = false
    "method.request.querystring.offset" = false
  }
}

# Lambda integration
resource "aws_api_gateway_integration" "users_lambda" {
  rest_api_id             = aws_api_gateway_rest_api.main.id
  resource_id             = aws_api_gateway_resource.users.id
  http_method             = aws_api_gateway_method.get_users.http_method
  integration_http_method = "POST"
  type                    = "AWS_PROXY"
  uri                     = aws_lambda_function.users.invoke_arn
}

# Cognito authorizer
resource "aws_api_gateway_authorizer" "cognito" {
  name          = "cognito"
  rest_api_id   = aws_api_gateway_rest_api.main.id
  type          = "COGNITO_USER_POOLS"
  provider_arns = [aws_cognito_user_pool.main.arn]
}

# Usage plan with rate limiting
resource "aws_api_gateway_usage_plan" "standard" {
  name = "standard-plan"

  api_stages {
    api_id = aws_api_gateway_rest_api.main.id
    stage  = aws_api_gateway_stage.prod.stage_name
  }

  throttle_settings {
    rate_limit  = 100
    burst_limit = 200
  }

  quota_settings {
    limit  = 1000000
    period = "MONTH"
  }
}

# Deploy
resource "aws_api_gateway_deployment" "prod" {
  rest_api_id = aws_api_gateway_rest_api.main.id

  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_method.get_users.id,
      aws_api_gateway_integration.users_lambda.id,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_stage" "prod" {
  deployment_id = aws_api_gateway_deployment.prod.id
  rest_api_id   = aws_api_gateway_rest_api.main.id
  stage_name    = "prod"

  # Enable caching
  cache_cluster_enabled = true
  cache_cluster_size    = "0.5" # 0.5 GB

  # Enable X-Ray tracing
  xray_tracing_enabled = true

  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.api.arn
    format          = jsonencode({ requestId = "$context.requestId", status = "$context.status" })
  }
}
HTTP API (API Gateway v2)
# HTTP API (cheaper, simpler)
# (aws_lambda_function.users, aws_cognito_user_pool.main and
# aws_cloudwatch_log_group.api are assumed to be defined elsewhere)
resource "aws_apigatewayv2_api" "http" {
  name          = "http-api"
  protocol_type = "HTTP"

  cors_configuration {
    allow_origins = ["https://example.com"]
    allow_methods = ["GET", "POST", "PUT", "DELETE"]
    allow_headers = ["Authorization", "Content-Type"]
    max_age       = 3600
  }
}

resource "aws_apigatewayv2_integration" "lambda" {
  api_id                 = aws_apigatewayv2_api.http.id
  integration_type       = "AWS_PROXY"
  integration_uri        = aws_lambda_function.users.invoke_arn
  payload_format_version = "2.0"
}

resource "aws_apigatewayv2_route" "get_users" {
  api_id             = aws_apigatewayv2_api.http.id
  route_key          = "GET /users"
  target             = "integrations/${aws_apigatewayv2_integration.lambda.id}"
  authorization_type = "JWT"
  authorizer_id      = aws_apigatewayv2_authorizer.jwt.id
}

# JWT authorizer (audience/issuer values are illustrative)
resource "aws_apigatewayv2_authorizer" "jwt" {
  api_id           = aws_apigatewayv2_api.http.id
  authorizer_type  = "JWT"
  identity_sources = ["$request.header.Authorization"]
  name             = "jwt-authorizer"

  jwt_configuration {
    audience = ["api.example.com"]
    issuer   = "https://cognito-idp.us-east-1.amazonaws.com/${aws_cognito_user_pool.main.id}"
  }
}

resource "aws_apigatewayv2_stage" "prod" {
  api_id      = aws_apigatewayv2_api.http.id
  name        = "prod"
  auto_deploy = true

  default_route_settings {
    throttling_rate_limit  = 100
    throttling_burst_limit = 200
  }

  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.api.arn
    format          = jsonencode({ requestId = "$context.requestId", status = "$context.status" })
  }
}
Implementation: Azure API Management
Bicep Deployment
// apim.bicep
param location string = resourceGroup().location
param apimName string = 'prod-apim'

resource apim 'Microsoft.ApiManagement/service@2023-05-01-preview' = {
  name: apimName
  location: location
  sku: {
    name: 'Developer'
    capacity: 1
  }
  properties: {
    publisherEmail: 'admin@example.com'
    publisherName: 'Example Corp'
  }
}

// Backend service
resource backend 'Microsoft.ApiManagement/service/backends@2023-05-01-preview' = {
  parent: apim
  name: 'user-service'
  properties: {
    url: 'https://backend.example.com'
    protocol: 'http'
    tls: {
      validateCertificateChain: true
      validateCertificateName: true
    }
  }
}

// API definition
resource api 'Microsoft.ApiManagement/service/apis@2023-05-01-preview' = {
  parent: apim
  name: 'users-api'
  properties: {
    displayName: 'Users API'
    path: 'users'
    protocols: ['https']
    subscriptionRequired: true
    serviceUrl: 'https://backend.example.com'
  }
}

// GET /users operation
resource getUsers 'Microsoft.ApiManagement/service/apis/operations@2023-05-01-preview' = {
  parent: api
  name: 'get-users'
  properties: {
    displayName: 'Get Users'
    method: 'GET'
    urlTemplate: '/'
    responses: [
      {
        statusCode: 200
        description: 'Success'
      }
    ]
  }
}

// Policy: JWT validation + rate limiting
resource policy 'Microsoft.ApiManagement/service/apis/operations/policies@2023-05-01-preview' = {
  parent: getUsers
  name: 'policy'
  properties: {
    value: '''
      <policies>
        <inbound>
          <base />
          <!-- JWT validation -->
          <validate-jwt header-name="Authorization" failed-validation-httpcode="401">
            <openid-config url="https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration" />
            <required-claims>
              <claim name="aud">
                <value>api.example.com</value>
              </claim>
            </required-claims>
          </validate-jwt>
          <!-- Rate limiting: 100 calls per minute -->
          <rate-limit calls="100" renewal-period="60" />
          <!-- Caching -->
          <cache-lookup vary-by-developer="false" vary-by-developer-groups="false" downstream-caching-type="none">
            <vary-by-query-parameter>limit</vary-by-query-parameter>
            <vary-by-query-parameter>offset</vary-by-query-parameter>
          </cache-lookup>
          <!-- Add correlation ID -->
          <set-header name="X-Correlation-ID" exists-action="override">
            <value>@(Guid.NewGuid().ToString())</value>
          </set-header>
        </inbound>
        <backend>
          <base />
        </backend>
        <outbound>
          <base />
          <!-- Cache response for 5 minutes -->
          <cache-store duration="300" />
          <!-- Remove internal headers -->
          <set-header name="X-Internal-Token" exists-action="delete" />
        </outbound>
        <on-error>
          <base />
        </on-error>
      </policies>
    '''
    format: 'xml'
  }
}

// Product (usage tier)
resource product 'Microsoft.ApiManagement/service/products@2023-05-01-preview' = {
  parent: apim
  name: 'standard'
  properties: {
    displayName: 'Standard Tier'
    description: 'Standard API access'
    subscriptionRequired: true
    approvalRequired: false
    state: 'published'
    subscriptionsLimit: 1000
  }
}

// Link API to product
resource productApi 'Microsoft.ApiManagement/service/products/apis@2023-05-01-preview' = {
  parent: product
  name: api.name
}
Implementation: NGINX as API Gateway
Configuration
# /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 4096;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format json_combined escape=json '{'
        '"time_local":"$time_local",'
        '"remote_addr":"$remote_addr",'
        '"request":"$request",'
        '"status":$status,'
        '"body_bytes_sent":$body_bytes_sent,'
        '"request_time":$request_time,'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_addr":"$upstream_addr"'
    '}';

    access_log /var/log/nginx/access.log json_combined;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    # Rate limiting zones
    limit_req_zone $binary_remote_addr zone=global:10m rate=100r/m;
    limit_req_zone $http_authorization zone=per_user:10m rate=50r/m;

    # Caching
    proxy_cache_path /var/cache/nginx/api
        levels=1:2
        keys_zone=api_cache:10m
        max_size=1g
        inactive=60m
        use_temp_path=off;

    # Upstream services
    upstream user_service {
        least_conn;
        server backend1:3000 max_fails=3 fail_timeout=30s;
        server backend2:3000 max_fails=3 fail_timeout=30s;
        server backend3:3000 max_fails=3 fail_timeout=30s backup;
        keepalive 32;
    }

    # JWT validation (requires the Lua/OpenResty modules and lua-resty-jwt)
    lua_shared_dict jwks 1m;
    lua_package_path "/etc/nginx/lua/?.lua;;";

    server {
        listen 80;
        server_name api.example.com;

        # Redirect HTTP to HTTPS
        return 301 https://$server_name$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name api.example.com;

        ssl_certificate /etc/nginx/ssl/cert.pem;
        ssl_certificate_key /etc/nginx/ssl/key.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers on;

        # Security headers
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-Frame-Options "DENY" always;
        add_header X-XSS-Protection "1; mode=block" always;

        # CORS
        add_header Access-Control-Allow-Origin "https://example.com" always;
        add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, OPTIONS" always;
        add_header Access-Control-Allow-Headers "Authorization, Content-Type" always;
        add_header Access-Control-Max-Age "3600" always;

        if ($request_method = 'OPTIONS') {
            return 204;
        }

        # Health check endpoint
        location /health {
            access_log off;
            add_header Content-Type text/plain;
            return 200 "healthy\n";
        }

        # Users API
        location /users {
            # Rate limiting
            limit_req zone=global burst=20 nodelay;
            limit_req zone=per_user burst=10 nodelay;

            # JWT validation (Lua)
            access_by_lua_block {
                local jwt = require "resty.jwt"
                local jwt_token = ngx.var.http_authorization
                if not jwt_token then
                    ngx.status = 401
                    ngx.say('{"error":"Missing Authorization header"}')
                    ngx.exit(401)
                end
                jwt_token = jwt_token:gsub("Bearer ", "")
                local jwt_obj = jwt:verify("your-secret-key", jwt_token)
                if not jwt_obj.verified then
                    ngx.status = 401
                    ngx.say('{"error":"Invalid token"}')
                    ngx.exit(401)
                end
            }

            # Caching
            proxy_cache api_cache;
            proxy_cache_key "$request_uri$http_authorization";
            proxy_cache_valid 200 5m;
            proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
            proxy_cache_background_update on;
            proxy_cache_lock on;

            # Caveat: declaring add_header inside a location disables inheritance
            # of the server-level add_header directives above, so the security and
            # CORS headers must be re-declared here if /users responses need them.
            add_header X-Cache-Status $upstream_cache_status;

            # Proxy settings
            proxy_pass http://user_service;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Request-ID $request_id;

            # Timeouts
            proxy_connect_timeout 5s;
            proxy_send_timeout 10s;
            proxy_read_timeout 10s;

            # Buffering
            proxy_buffering on;
            proxy_buffer_size 4k;
            proxy_buffers 8 4k;
        }
    }
}
Security Best Practices
1. Authentication Strategies
API Keys (Simplest)
✅ Use for: Internal services, low-security public APIs
❌ Avoid for: User authentication, sensitive data
JWT (Recommended)
✅ Use for: User authentication, microservices
❌ Avoid for: Sessions needing instant revocation — a JWT stays valid until it expires, so keep tokens short-lived
OAuth 2.0 (Most Secure)
✅ Use for: Third-party integrations, delegated access
❌ Avoid for: Simple internal APIs (overkill)
mTLS (Maximum Security)
✅ Use for: Service-to-service, zero-trust networks
❌ Avoid for: Public-facing APIs (complexity)
2. Input Validation
# NGINX: block obviously malicious patterns (a crude first line of defense —
# a dedicated WAF such as ModSecurity or AWS WAF is far more robust)
location /api {
    # Block common SQL injection keywords
    # (will also false-positive on legitimate URIs containing these words)
    if ($request_uri ~* "(union|select|insert|update|delete|drop|exec|script)") {
        return 403;
    }

    # Block path traversal
    if ($request_uri ~* "\.\./") {
        return 403;
    }

    # Limit request body size
    client_max_body_size 1m;

    proxy_pass http://backend;
}
3. DDoS Protection
Rate Limiting Tiers:
Global: 1000 req/s (all traffic)
Per-IP: 100 req/min
Per-User: 50 req/min
Per-Endpoint: Varies (e.g., /login → 5 req/min)
Circuit Breaking (Kong): the open-source distribution does not bundle a circuit-breaker plugin; the usual approach is passive health checks on an upstream, which eject failing targets, or a third-party plugin. Assuming an upstream object named user-service-upstream already exists:
curl -X PATCH http://localhost:8001/upstreams/user-service-upstream \
  --data "healthchecks.passive.unhealthy.http_failures=5" \
  --data "healthchecks.passive.unhealthy.timeouts=3"
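Whatever component enforces it, a circuit breaker is a small state machine: closed → open after N consecutive failures, open → half-open after a timeout, half-open → closed after M successes. A sketch with the thresholds used above (5 failures, 3 successes, 30 s):

```python
class CircuitBreaker:
    """closed -> open after `failure_threshold` consecutive failures;
    open -> half-open after `timeout` seconds; half-open -> closed
    after `success_threshold` successes (any failure reopens it)."""
    def __init__(self, failure_threshold=5, success_threshold=3, timeout=30.0):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow(self, now):
        if self.state == "open" and now - self.opened_at >= self.timeout:
            self.state, self.successes = "half-open", 0  # let a trial request through
        return self.state != "open"

    def record(self, success, now):
        if success:
            self.failures = 0
            if self.state == "half-open":
                self.successes += 1
                if self.successes >= self.success_threshold:
                    self.state = "closed"
        else:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state, self.opened_at = "open", now
                self.failures = 0

cb = CircuitBreaker()
for _ in range(5):
    cb.record(success=False, now=0)
print(cb.allow(now=1))   # False — circuit open, requests short-circuited
print(cb.allow(now=31))  # True — half-open, a trial request goes through
```

While open, the gateway fails fast with a 503 instead of queueing requests against a dead upstream, which is what prevents cascading failures.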
4. Secrets Management
❌ DON'T:
# Bad: hardcoded secrets
environment:
  JWT_SECRET: "my-secret-123"
  DB_PASSWORD: "password"
✅ DO:
# Good: resolve from a secrets manager at deploy time
environment:
  JWT_SECRET: ${AWS_SECRETS_MANAGER:prod/jwt-secret}
  DB_PASSWORD: ${VAULT:database/password}
Performance Optimization
1. Connection Pooling
Kong:
# /etc/kong/kong.conf
upstream_keepalive_pool_size = 100
upstream_keepalive_max_requests = 1000
upstream_keepalive_idle_timeout = 60
NGINX:
upstream backend {
    server backend:3000;
    keepalive 32;  # Connection pool
}

location / {
    proxy_http_version 1.1;
    proxy_set_header Connection "";  # Enable keepalive
    proxy_pass http://backend;
}
2. Response Compression
gzip on;
gzip_vary on;
gzip_min_length 1024;
gzip_types
    text/plain
    text/css
    text/xml
    text/javascript
    application/json
    application/javascript
    application/rss+xml;
3. Caching Strategies
Cache-Control Headers:
Public endpoints: Cache-Control: public, max-age=300
User-specific: Cache-Control: private, max-age=60
Dynamic data: Cache-Control: no-cache
Sensitive data: Cache-Control: no-store
Cache Invalidation (Kong):
# Invalidate cache for a specific entry
curl -X DELETE http://localhost:8001/proxy-cache/{cache-key}

# Purge all cached entities
curl -X DELETE http://localhost:8001/proxy-cache
4. Load Balancing Algorithms
| Algorithm | Use Case |
|---|---|
| Round Robin | Equal server capacity |
| Least Connections | Varying request complexity |
| IP Hash | Session affinity needed |
| Weighted | Heterogeneous servers |
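The first two strategies in the table are easy to sketch (illustrative, not any particular proxy's implementation):

```python
import itertools

def round_robin(servers):
    """Cycle through servers in order — assumes roughly equal capacity."""
    it = itertools.cycle(servers)
    return lambda: next(it)

def least_connections(active):
    """Pick the server with the fewest in-flight requests —
    better when request cost varies widely."""
    return min(active, key=active.get)

rr = round_robin(["s1", "s2", "s3"])
print([rr() for _ in range(4)])                        # ['s1', 's2', 's3', 's1']
print(least_connections({"s1": 7, "s2": 2, "s3": 5}))  # 's2'
```

IP hash replaces the cycle with `hash(client_ip) % len(servers)`, and weighted round robin repeats each server in the cycle proportionally to its weight.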
Monitoring & Observability
Metrics to Track
Gateway Metrics:
- Request rate (req/s)
- Error rate (4xx, 5xx %)
- Latency (p50, p95, p99)
- Cache hit rate
Upstream Metrics:
- Response time
- Failure rate
- Circuit breaker state
Resource Metrics:
- CPU usage
- Memory usage
- Connection pool exhaustion
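The latency percentiles above are order statistics over a window of samples. A nearest-rank sketch (monitoring systems typically estimate quantiles from histogram buckets instead, as `histogram_quantile` does):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value >= p% of the samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 240, 13, 14, 16, 18, 13, 900]  # one slow outlier
print(percentile(latencies_ms, 50))  # 14  — typical request
print(percentile(latencies_ms, 99))  # 900 — tail dominated by the outlier
```

This is why the list asks for p95/p99 and not just an average: the mean of those samples (~125 ms) describes no actual request, while the tail percentiles expose the outliers users actually feel.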
Prometheus Integration (Kong)
# kong.yml
plugins:
  - name: prometheus
    config:
      per_consumer: true
Query Examples:
# Request rate by service
sum by (service) (rate(kong_http_requests_total[5m]))

# Error rate (share of 5xx responses)
sum(rate(kong_http_requests_total{code=~"5.."}[5m])) / sum(rate(kong_http_requests_total[5m]))

# Latency (95th percentile)
histogram_quantile(0.95, rate(kong_latency_bucket[5m]))

# Cache hit rate
sum(rate(kong_bandwidth{type="cache-hit"}[10m])) / sum(rate(kong_bandwidth[10m]))
Distributed Tracing
OpenTelemetry Integration:
# AWS API Gateway (Terraform)
resource "aws_api_gateway_stage" "prod" {
  xray_tracing_enabled = true
}

# Kong (declarative config)
plugins:
  - name: opentelemetry
    config:
      endpoint: "http://jaeger:4318/v1/traces"
      resource_attributes:
        service.name: "api-gateway"
Alerting Rules
# Prometheus alerts
groups:
  - name: api_gateway
    rules:
      - alert: HighErrorRate
        expr: rate(kong_http_requests_total{code=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"

      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(kong_latency_bucket[5m])) > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High latency (p95 > 1s)"

      - alert: LowCacheHitRate
        expr: rate(kong_bandwidth{type="cache-hit"}[10m]) / rate(kong_bandwidth[10m]) < 0.5
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 50%"
Production Checklist
Security
- HTTPS/TLS 1.3 enabled
- Strong authentication (JWT/OAuth/mTLS)
- Rate limiting configured
- Input validation active
- CORS properly configured
- Security headers added
- Secrets in vault (not config files)
- API keys rotated regularly
Performance
- Connection pooling enabled
- Response caching configured
- Compression enabled
- Load balancing configured
- Timeouts set appropriately
- Circuit breakers implemented
- Health checks configured
Observability
- Access logs enabled
- Metrics collection configured
- Distributed tracing active
- Dashboards created
- Alerts configured
- Log aggregation setup
High Availability
- Multiple gateway instances
- Database/state replication
- Health checks implemented
- Graceful shutdown configured
- Auto-scaling enabled
- Disaster recovery plan
Operations
- CI/CD pipeline configured
- Blue-green deployment ready
- Rollback procedure documented
- Runbooks created
- On-call rotation setup
Cost Optimization
AWS API Gateway Pricing (2025)
REST API:
- $3.50 per million requests
- $0.09/GB data transfer out
- Cache: $0.020/hour per GB
HTTP API (70% cheaper):
- $1.00 per million requests
- No caching support
- Use for: Simple proxying
Self-Hosted (Kong/NGINX)
Costs:
- EC2 instances: ~$150/month (t3.large x2)
- Load balancer: ~$20/month
- Database: ~$50/month (RDS t3.small)
Break-even point: ~50M requests/month
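That break-even figure checks out arithmetically once data transfer is included. Assuming the costs listed above and an average response of about 10 KB (the response size is an assumption, not from the pricing table):

```python
# Assumptions: REST API pricing above, average response ~10 KB (hypothetical)
self_hosted_monthly = 150 + 20 + 50                  # EC2 + load balancer + RDS = $220

request_cost_per_million = 3.50                      # REST API, per 1M requests
gb_per_million = 10_000 * 1_000_000 / 1e9            # 10 KB x 1M requests = 10 GB
transfer_cost_per_million = 0.09 * gb_per_million    # = $0.90

break_even_millions = self_hosted_monthly / (request_cost_per_million + transfer_cost_per_million)
print(round(break_even_millions))  # 50 -> ~50M requests/month
```

Below that volume the managed gateway is cheaper and carries no operational burden; above it, self-hosting wins on unit cost but you take on patching, scaling, and on-call for the gateway itself.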
Conclusion
API gateways are critical infrastructure for modern applications. They centralize cross-cutting concerns, improve security, enable observability, and reduce backend complexity.
Key takeaways:
- Choose the right pattern - Single gateway for simplicity, BFF for client optimization, micro-gateways for scale
- Security is paramount - Use strong authentication, validate inputs, implement rate limiting
- Cache aggressively - 5-minute caching can reduce load by 80%+
- Monitor everything - Track gateway, upstream, and resource metrics
- Plan for failure - Circuit breakers, health checks, graceful degradation
- Start simple - Add complexity only when needed
Whether you choose a managed solution (AWS API Gateway, Azure APIM) or self-hosted (Kong, NGINX), the principles remain the same: security, performance, and reliability.
Additional Resources
- Kong Gateway: https://docs.konghq.com/
- AWS API Gateway: https://docs.aws.amazon.com/apigateway/
- Azure API Management: https://docs.microsoft.com/azure/api-management/
- NGINX: https://nginx.org/en/docs/
- OpenAPI Specification: https://spec.openapis.org/
- OAuth 2.0: https://oauth.net/2/
- JWT Best Practices: https://datatracker.ietf.org/doc/html/rfc8725
Written by StaticBlock Editorial
StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.