Error Handling and Logging - Production Patterns for Robust Applications

Effective error handling and logging are critical for maintaining production systems, reducing mean time to recovery (MTTR), and providing excellent user experience. This comprehensive guide covers production-ready patterns for handling errors gracefully, implementing structured logging, and building observable systems that surface issues before they impact users.

Why Error Handling Matters

User Experience: Proper error handling prevents application crashes, provides helpful feedback, and maintains data integrity when things go wrong.

Debugging Efficiency: Well-structured logs reduce debugging time from hours to minutes by providing context about what happened, when, and why.

Reliability: Graceful degradation and circuit breaker patterns keep applications functional even when dependencies fail.

Stripe processes 100M+ API requests daily with 99.99% uptime by implementing comprehensive error handling and real-time error tracking that alerts engineers within seconds of issues.

Error Handling Fundamentals

Error Types and Classification

Different error types require different handling strategies:

// Operational errors - expected failures (network issues, validation errors)
class OperationalError extends Error {
  constructor(
    message: string,
    public statusCode: number = 500,
    public code?: string
  ) {
    super(message);
    this.name = 'OperationalError';
  }
}

// Programmer errors - bugs that should be fixed
class ProgrammerError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ProgrammerError';
  }
}

// Validation errors
class ValidationError extends OperationalError {
  constructor(
    message: string,
    public fields: Record<string, string[]>
  ) {
    super(message, 400, 'VALIDATION_ERROR');
    this.name = 'ValidationError';
  }
}

// Not found errors
class NotFoundError extends OperationalError {
  constructor(resource: string, id: string) {
    super(`${resource} with id ${id} not found`, 404, 'NOT_FOUND');
    this.name = 'NotFoundError';
  }
}

// Unauthorized errors
class UnauthorizedError extends OperationalError {
  constructor(message: string = 'Unauthorized') {
    super(message, 401, 'UNAUTHORIZED');
    this.name = 'UnauthorizedError';
  }
}

// Database errors
class DatabaseError extends OperationalError {
  constructor(message: string, public originalError: Error) {
    super(message, 500, 'DATABASE_ERROR');
    this.name = 'DatabaseError';
  }
}

Express Error Handling Middleware

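import express, { Request, Response, NextFunction } from 'express';
import { logger } from './logger';
// Assumed module paths: the error classes defined above and a shared Prisma client
import { OperationalError, ValidationError, NotFoundError } from './errors';
import { prisma } from './prisma';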
// Async handler wrapper - catches async errors
export const asyncHandler = (
fn: (req: Request, res: Response, next: NextFunction) => Promise<any>
) => {
return (req: Request, res: Response, next: NextFunction) => {
Promise.resolve(fn(req, res, next)).catch(next);
};
};
// Error handling middleware
export const errorHandler = (
err: Error,
req: Request,
res: Response,
next: NextFunction
) => {
  // Log error with context. Headers and body are deliberately left out:
  // they can carry credentials (Authorization header, cookies, passwords).
  logger.error('Request error', {
    error: {
      name: err.name,
      message: err.message,
      stack: err.stack
    },
    request: {
      method: req.method,
      url: req.url,
      params: req.params,
      query: req.query,
      user: req.user?.id
    }
  });
// Handle operational errors
if (err instanceof OperationalError) {
return res.status(err.statusCode).json({
error: {
code: err.code,
message: err.message,
...(err instanceof ValidationError && { fields: err.fields })
}
});
}
// Handle known framework errors
if (err.name === 'CastError') {
return res.status(400).json({
error: {
code: 'INVALID_ID',
message: 'Invalid ID format'
}
});
}
if (err.name === 'JsonWebTokenError') {
return res.status(401).json({
error: {
code: 'INVALID_TOKEN',
message: 'Invalid authentication token'
}
});
}
// Handle Prisma errors
if (err.name === 'PrismaClientKnownRequestError') {
const prismaError = err as any;
if (prismaError.code === 'P2002') {
  return res.status(409).json({
    error: {
      code: 'DUPLICATE_ENTRY',
      message: 'Resource already exists',
      field: prismaError.meta?.target?.[0]
    }
  });
}

if (prismaError.code === 'P2025') {
  return res.status(404).json({
    error: {
      code: 'NOT_FOUND',
      message: 'Resource not found'
    }
  });
}

}
// Programmer errors - don't expose internal details
logger.error('Unhandled error', {
error: {
name: err.name,
message: err.message,
stack: err.stack
}
});
res.status(500).json({
error: {
code: 'INTERNAL_ERROR',
message: 'An internal error occurred'
}
});
};
// 404 handler
export const notFoundHandler = (
req: Request,
res: Response,
next: NextFunction
) => {
res.status(404).json({
error: {
code: 'NOT_FOUND',
message: `Route ${req.method} ${req.url} not found`
}
});
};
// Usage
const app = express();
// Routes
app.get('/api/users/:id', asyncHandler(async (req, res) => {
const user = await prisma.user.findUnique({
where: { id: req.params.id }
});
if (!user) {
throw new NotFoundError('User', req.params.id);
}
res.json(user);
}));
// Error handlers (must be last)
app.use(notFoundHandler);
app.use(errorHandler);

Try-Catch Best Practices

// ❌ Bad - swallows errors
async function getUser(id: string) {
  try {
    return await prisma.user.findUnique({ where: { id } });
  } catch (error) {
    console.log('Error fetching user');
    return null;
  }
}

// ✅ Good - propagates errors
async function getUser(id: string) {
  try {
    return await prisma.user.findUnique({ where: { id } });
  } catch (error) {
    logger.error('Failed to fetch user', { userId: id, error });
    // `error` is `unknown` in a catch clause, so cast it before wrapping
    throw new DatabaseError('Failed to fetch user', error as Error);
  }
}

// ✅ Good - handles specific errors
async function createUser(data: CreateUserDto) {
  try {
    return await prisma.user.create({ data });
  } catch (error) {
    // P2002 is Prisma's unique-constraint violation code
    if ((error as { code?: string }).code === 'P2002') {
      throw new ValidationError('Email already exists', {
        email: ['Email is already registered']
      });
    }

    // Log the error only: `data` may contain credentials
    logger.error('Failed to create user', { error });
    throw new DatabaseError('Failed to create user', error as Error);
  }
}

// ✅ Good - cleanup in finally
// Requires: import { promises as fs } from 'fs';
async function processFile(filePath: string) {
  const fileHandle = await fs.open(filePath, 'r');
  try {
    const content = await fileHandle.readFile('utf-8');
    return processContent(content);
  } catch (error) {
    logger.error('Failed to process file', { filePath, error });
    throw error;
  } finally {
    // Always cleanup
    await fileHandle.close();
  }
}
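
In TypeScript, the value caught by a catch clause is typed as unknown under strict settings, so reading error.code or error.message directly will not compile. One way to handle this is a pair of small helpers. The sketch below assumes two hypothetical functions, toError and hasErrorCode, which come from neither Prisma nor any other library:

```typescript
// Hypothetical helpers for working with `unknown` values caught in a catch clause.

/** Convert any thrown value into an Error so it can be logged or wrapped safely. */
function toError(value: unknown): Error {
  if (value instanceof Error) return value;
  if (typeof value === 'string') return new Error(value);
  try {
    return new Error(JSON.stringify(value));
  } catch {
    // JSON.stringify throws on circular structures; fall back to String()
    return new Error(String(value));
  }
}

/** Type guard for errors that carry a string `code` property. */
function hasErrorCode(value: unknown): value is Error & { code: string } {
  return value instanceof Error && typeof (value as { code?: unknown }).code === 'string';
}
```

With these in place, a catch block can check hasErrorCode(error) && error.code === 'P2002' and pass toError(error) to DatabaseError without casts.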

Structured Logging

Structured logging uses JSON format instead of plain text, enabling powerful querying and analysis.
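
To make the querying benefit concrete, here is a minimal sketch that uses no logging library. The names formatLogLine and filterLogLines are hypothetical. Each entry is a single JSON line, so it can be parsed and filtered by field without regular expressions:

```typescript
// Hypothetical helpers; they only illustrate why JSON lines are easy to query.
type LogEntry = { timestamp: string; level: string; message: string } & Record<string, unknown>;

/** Serialize one log entry as a single JSON line. */
function formatLogLine(level: string, message: string, meta: Record<string, unknown> = {}): string {
  const entry: LogEntry = { ...meta, timestamp: new Date().toISOString(), level, message };
  return JSON.stringify(entry);
}

/** Parse JSON lines and keep the entries matching a predicate. */
function filterLogLines(lines: string[], predicate: (entry: LogEntry) => boolean): LogEntry[] {
  return lines.map(line => JSON.parse(line) as LogEntry).filter(predicate);
}

const sampleLines = [
  formatLogLine('http', 'HTTP Request', { url: '/api/users/1', statusCode: 200, duration: 12 }),
  formatLogLine('http', 'HTTP Request', { url: '/api/orders', statusCode: 503, duration: 950 })
];

// Field-level query: every request that ended in a server error
const serverErrors = filterLogLines(
  sampleLines,
  entry => typeof entry.statusCode === 'number' && entry.statusCode >= 500
);
```

A log aggregation system performs the same kind of field-level query at scale.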

Winston Logger Setup

import winston from 'winston';
const logLevels = {
error: 0,
warn: 1,
info: 2,
http: 3,
debug: 4
};
const logColors = {
error: 'red',
warn: 'yellow',
info: 'green',
http: 'magenta',
debug: 'blue'
};
winston.addColors(logColors);

const consoleFormat = winston.format.combine(
  winston.format.timestamp({ format: 'YYYY-MM-DD HH:mm:ss' }),
  winston.format.colorize({ all: true }),
  winston.format.printf(info => {
    const { timestamp, level, message, ...meta } = info;
    return `${timestamp} [${level}]: ${message} ${
      Object.keys(meta).length ? JSON.stringify(meta, null, 2) : ''
    }`;
  })
);

const jsonFormat = winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
);
export const logger = winston.createLogger({
levels: logLevels,
level: process.env.LOG_LEVEL || 'info',
transports: [
// Console output for development
new winston.transports.Console({
format: process.env.NODE_ENV === 'production' ? jsonFormat : consoleFormat
}),
// File output for production
new winston.transports.File({
filename: 'logs/error.log',
level: 'error',
format: jsonFormat
}),
new winston.transports.File({
filename: 'logs/combined.log',
format: jsonFormat
})
]
});
// Request logging middleware
export const requestLogger = (req: Request, res: Response, next: NextFunction) => {
const startTime = Date.now();
res.on('finish', () => {
const duration = Date.now() - startTime;
logger.http('HTTP Request', {
  method: req.method,
  url: req.url,
  statusCode: res.statusCode,
  duration,
  userAgent: req.get('user-agent'),
  ip: req.ip,
  userId: req.user?.id
});

});
next();
};

Contextual Logging

// Add context to all logs within a request
import { AsyncLocalStorage } from 'async_hooks';
const asyncLocalStorage = new AsyncLocalStorage<Map<string, any>>();
// Middleware to create request context
export const contextMiddleware = (
req: Request,
res: Response,
next: NextFunction
) => {
const context = new Map<string, any>();
context.set('requestId', generateRequestId());
context.set('userId', req.user?.id);
context.set('method', req.method);
context.set('url', req.url);
asyncLocalStorage.run(context, () => next());
};
// Logger with automatic context
class ContextLogger {
private getContext(): Record<string, any> {
const context = asyncLocalStorage.getStore();
if (!context) return {};
return Object.fromEntries(context.entries());

}
info(message: string, meta?: Record<string, any>) {
logger.info(message, { ...this.getContext(), ...meta });
}
error(message: string, meta?: Record<string, any>) {
logger.error(message, { ...this.getContext(), ...meta });
}
warn(message: string, meta?: Record<string, any>) {
logger.warn(message, { ...this.getContext(), ...meta });
}
debug(message: string, meta?: Record<string, any>) {
logger.debug(message, { ...this.getContext(), ...meta });
}
}
export const contextLogger = new ContextLogger();
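
// Usage
app.use(contextMiddleware);

app.get('/api/users/:id', asyncHandler(async (req, res) => {
  // All logs automatically include requestId, userId, method, url
  contextLogger.info('Fetching user');

  const user = await prisma.user.findUnique({
    where: { id: req.params.id }
  });

  // findUnique resolves to null when no record matches
  if (!user) {
    throw new NotFoundError('User', req.params.id);
  }

  // A distinct key avoids overwriting the requesting user's ID from the context
  contextLogger.info('User fetched successfully', { fetchedUserId: user.id });
  res.json(user);
}));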
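
The contextMiddleware above calls generateRequestId(), which the snippet does not define. One possible implementation is sketched below. It assumes a Node.js version that provides crypto.randomUUID. The optional incoming parameter and its validation rule are illustrative assumptions, intended for reusing an ID forwarded by an upstream proxy:

```typescript
import { randomUUID } from 'node:crypto';

/**
 * Return a request ID. If an upstream ID is supplied and looks safe to log,
 * reuse it so traces line up across services; otherwise generate a UUID.
 */
function generateRequestId(incoming?: string): string {
  // Assumed rule: 1-128 characters from a conservative character set
  if (incoming && /^[A-Za-z0-9._-]{1,128}$/.test(incoming)) {
    return incoming;
  }
  return randomUUID();
}
```

Called with no argument, as in the middleware above, it always generates a fresh UUID. To propagate an upstream ID, pass a header value such as req.get('x-request-id') (the header name is an assumption).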

Log Levels and Best Practices

// ERROR - Critical issues requiring immediate attention
logger.error('Payment processing failed', {
  orderId: order.id,
  amount: order.total,
  error: error.message,
  stack: error.stack
});
// WARN - Potential issues that don't stop execution
logger.warn('Cache miss for frequently accessed data', {
key: 'user:profile:123',
hitRate: 0.65
});
// INFO - Important business events
logger.info('Order created', {
orderId: order.id,
userId: order.userId,
total: order.total,
items: order.items.length
});
// HTTP - Request/response logging
logger.http('API request', {
method: 'GET',
url: '/api/products',
statusCode: 200,
duration: 45
});
// DEBUG - Detailed diagnostic information
logger.debug('Database query executed', {
query: 'SELECT * FROM users WHERE id = ?',
params: [userId],
duration: 12
});
// ❌ Bad - logging sensitive data
logger.info('User login', {
email: user.email,
password: user.password // Never log passwords!
});
// ✅ Good - redact sensitive data
logger.info('User login', {
email: user.email,
userId: user.id
});
// ❌ Bad - logging in loops
users.forEach(user => {
logger.info('Processing user', { userId: user.id });
processUser(user);
});

// ✅ Good - aggregate logging
const startTime = Date.now();
logger.info('Processing users', {
  count: users.length
});
users.forEach(user => processUser(user));
logger.info('Users processed', {
  count: users.length,
  duration: Date.now() - startTime
});
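
The advice above to keep sensitive data out of logs can also be enforced centrally rather than at each call site. A minimal sketch follows. The redact helper and its key list are hypothetical, and it is intended for plain JSON-like metadata only:

```typescript
// Illustrative key list (an assumption); extend it for your own payloads.
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'cookie', 'secret', 'apikey']);
const MAX_DEPTH = 5;

/** Return a copy of plain JSON-like metadata with sensitive values masked. */
function redact(value: unknown, depth: number = 0): unknown {
  if (value === null || typeof value !== 'object') return value; // primitives pass through
  if (depth >= MAX_DEPTH) return '[TRUNCATED]'; // bounds recursion, including circular references
  if (Array.isArray(value)) return value.map(item => redact(item, depth + 1));

  const result: Record<string, unknown> = {};
  for (const [key, inner] of Object.entries(value)) {
    result[key] = SENSITIVE_KEYS.has(key.toLowerCase())
      ? '[REDACTED]'
      : redact(inner, depth + 1);
  }
  return result;
}
```

Wrapping metadata before it reaches the logger, for example in the ContextLogger methods, means a stray password field is masked even if a call site forgets to remove it.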

Error Monitoring and Alerting

Sentry Integration

import * as Sentry from '@sentry/node';
import { ProfilingIntegration } from '@sentry/profiling-node';
Sentry.init({
dsn: process.env.SENTRY_DSN,
environment: process.env.NODE_ENV,
integrations: [
new ProfilingIntegration(),
new Sentry.Integrations.Http({ tracing: true }),
new Sentry.Integrations.Express({ app })
],
tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
profilesSampleRate: 0.1,
beforeSend(event, hint) {
// Filter sensitive data
if (event.request) {
delete event.request.cookies;
delete event.request.headers?.['authorization'];
}
return event;

}
});
// Request handler must be first
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());
// Routes
app.get('/api/users/:id', async (req, res, next) => {
  const transaction = Sentry.startTransaction({
    op: 'http.server',
    name: 'GET /api/users/:id'
  });

  try {
    const span = transaction.startChild({
      op: 'db.query',
      description: 'Fetch user from database'
    });

    const user = await prisma.user.findUnique({
      where: { id: req.params.id }
    });

    span.finish();

    if (!user) {
      throw new NotFoundError('User', req.params.id);
    }

    res.json(user);
  } catch (error) {
    // Add context to error
    Sentry.setContext('user', {
      id: req.params.id,
      requestedBy: req.user?.id
    });
    Sentry.captureException(error);

    // Express 4 does not forward errors thrown in async handlers to the
    // error middleware, so pass the error to next() instead of rethrowing
    next(error);
  } finally {
    transaction.finish();
  }
});
// Error handler must be after routes
app.use(Sentry.Handlers.errorHandler());
app.use(errorHandler);

Custom Error Tracking

class ErrorTracker {
  private errors = new Map<string, {
    count: number;
    lastOccurrence: Date;
    samples: Error[];
  }>();

  track(error: Error): void {
    const key = `${error.name}:${error.message}`;
    const existing = this.errors.get(key);

    if (existing) {
      existing.count++;
      existing.lastOccurrence = new Date();

      // Keep at most 5 samples (the first ones seen)
      if (existing.samples.length < 5) {
        existing.samples.push(error);
      }

      // Alert once when the error becomes frequent, not on every later occurrence
      if (existing.count === 10) {
        this.alertFrequentError(key, existing).catch(alertError => {
          logger.error('Failed to send frequent-error alert', { alertError });
        });
      }
    } else {
      this.errors.set(key, {
        count: 1,
        lastOccurrence: new Date(),
        samples: [error]
      });
    }
  }

private async alertFrequentError(
key: string,
data: { count: number; lastOccurrence: Date; samples: Error[] }
): Promise<void> {
logger.error('Frequent error detected', {
errorKey: key,
occurrences: data.count,
lastOccurrence: data.lastOccurrence,
sample: {
message: data.samples[0].message,
stack: data.samples[0].stack
}
});
// Send to alerting system (PagerDuty, Slack, etc.)
await this.sendAlert({
  severity: 'high',
  title: 'Frequent Error Detected',
  description: `Error "${key}" occurred ${data.count} times in the last hour`,
  context: data.samples[0]
});

}
getStats(): Array<{ key: string; count: number; lastOccurrence: Date }> {
return Array.from(this.errors.entries()).map(([key, data]) => ({
key,
count: data.count,
lastOccurrence: data.lastOccurrence
}));
}
reset(): void {
this.errors.clear();
}
}
export const errorTracker = new ErrorTracker();
// Reset stats hourly
setInterval(() => errorTracker.reset(), 60 * 60 * 1000);
// Use in error handler
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
errorTracker.track(err);
errorHandler(err, req, res, next);
});

Graceful Degradation Patterns

Circuit Breaker

Prevent cascading failures by stopping requests to failing services:

class CircuitBreaker {
  private failures = 0;
  private halfOpenSuccesses = 0;
  private lastFailureTime?: Date;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private threshold: number = 5,
    private timeout: number = 60000, // 1 minute
    // Interpreted here as the number of trial successes required to close again
    private halfOpenRequests: number = 3
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      // Allow trial requests once the timeout has passed
      if (
        this.lastFailureTime &&
        Date.now() - this.lastFailureTime.getTime() > this.timeout
      ) {
        this.state = 'half-open';
        this.halfOpenSuccesses = 0;
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }

    try {
      const result = await fn();

      if (this.state === 'half-open') {
        // Close only after enough trial requests have succeeded
        this.halfOpenSuccesses++;
        if (this.halfOpenSuccesses >= this.halfOpenRequests) {
          this.state = 'closed';
          this.failures = 0;
        }
      } else {
        // A success while closed resets the consecutive-failure count
        this.failures = 0;
      }

      return result;
    } catch (error) {
      this.failures++;
      this.lastFailureTime = new Date();

      // Any failure during a half-open trial reopens the circuit immediately
      if (this.state === 'half-open' || this.failures >= this.threshold) {
        this.state = 'open';
        logger.warn('Circuit breaker opened', {
          failures: this.failures,
          threshold: this.threshold
        });
      }

      throw error;
    }
  }

  getState(): string {
    return this.state;
  }
}

// Usage
const paymentServiceBreaker = new CircuitBreaker(5, 60000);

async function processPayment(orderId: string, amount: number) {
  try {
    return await paymentServiceBreaker.execute(async () => {
      return await paymentService.charge({ orderId, amount });
    });
  } catch (error) {
    if (error instanceof Error && error.message === 'Circuit breaker is OPEN') {
      // Fallback behavior
      logger.warn('Payment service unavailable, queueing for later', {
        orderId,
        amount
      });

      await queuePayment({ orderId, amount });
      return { status: 'queued' };
    }

    throw error;
  }
}

Retry with Exponential Backoff

async function retry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelay: number = 1000
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      const message = error instanceof Error ? error.message : String(error);

      if (attempt === maxAttempts) {
        logger.error('All retry attempts failed', {
          attempts: maxAttempts,
          error: message
        });
        break;
      }

      // Exponential backoff with up to 10% jitter
      const delay = baseDelay * Math.pow(2, attempt - 1);
      const jitter = Math.random() * delay * 0.1;
      const totalDelay = delay + jitter;

      logger.warn('Retry attempt failed', {
        attempt,
        maxAttempts,
        nextRetryIn: totalDelay,
        error: message
      });

      await new Promise(resolve => setTimeout(resolve, totalDelay));
    }
  }

  throw lastError;
}
// Usage
const user = await retry(
() => externalAPI.fetchUser(userId),
3,
1000
);

Timeout Wrapper

async function withTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number,
  timeoutError: string = 'Operation timed out'
): Promise<T> {
  let timeoutId: NodeJS.Timeout;

  const timeout = new Promise<never>((_, reject) => {
    timeoutId = setTimeout(() => {
      reject(new Error(timeoutError));
    }, timeoutMs);
  });

  try {
    // The race only stops waiting; it does not cancel the underlying operation
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timeoutId!);
  }
}

// Usage (wrapped in a function so the fallback value can be returned)
async function fetchDataWithFallback() {
  try {
    return await withTimeout(
      slowExternalAPI.fetchData(),
      5000,
      'External API timed out after 5 seconds'
    );
  } catch (error) {
    if (error instanceof Error && error.message.includes('timed out')) {
      // Handle timeout
      logger.warn('API request timed out, using cached data');
      return getCachedData();
    }
    throw error;
  }
}

Log Aggregation and Analysis

Elasticsearch and Kibana

import { Client } from '@elastic/elasticsearch';
const esClient = new Client({
node: process.env.ELASTICSEARCH_URL || 'http://localhost:9200'
});
// Winston transport for Elasticsearch
import TransportStream from 'winston-transport';

class ElasticsearchTransport extends TransportStream {
  async log(info: any, callback: () => void) {
    // winston-transport convention: signal that the entry has been handled
    setImmediate(() => this.emit('logged', info));

    try {
      await esClient.index({
        index: `logs-${new Date().toISOString().slice(0, 10)}`,
        document: {
          '@timestamp': new Date().toISOString(),
          level: info.level,
          message: info.message,
          ...info
        }
      });
    } catch (error) {
      // Fall back to the console so a logging outage never crashes the app
      console.error('Failed to send log to Elasticsearch:', error);
    }

    callback();
  }
}

logger.add(new ElasticsearchTransport());
// Query logs
async function searchLogs(query: string, from: Date, to: Date) {
const { hits } = await esClient.search({
index: 'logs-*',
body: {
query: {
bool: {
must: [
{
query_string: {
query
}
},
{
range: {
'@timestamp': {
gte: from.toISOString(),
lte: to.toISOString()
}
}
}
]
}
},
sort: [{ '@timestamp': 'desc' }],
size: 100
}
});
return hits.hits.map(hit => hit._source);
}

Real-World Examples

Airbnb's Error Handling

Airbnb uses multi-layered error handling:

Client-side: Fallback UI components for failed loads
API Gateway: Circuit breakers for backend services
Service Layer: Retry logic with exponential backoff
Monitoring: Real-time error rate alerts triggering auto-rollback

Their system recovers from partial outages without user-visible errors, maintaining booking functionality even when search is degraded.

Stripe's Logging Strategy

Stripe implements comprehensive logging:

Structured JSON logs: Every request logged with 50+ metadata fields
Log sampling: 100% for errors, 1% for successful requests at scale
Retention: 30 days hot, 1 year cold storage
Alerting: Automated alerts on error rate spikes (>1% increase)

This enables Stripe to debug payment issues across billions of transactions with average resolution time under 15 minutes.
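
The sampling policy described above (keep every error, keep a small fraction of successes) can be sketched in a few lines. This illustrates the idea only and is not Stripe's implementation. The names shouldLogRequest and SamplingPolicy, and the choice to treat any status of 400 or above as an error, are assumptions:

```typescript
// Hypothetical policy shape; rates are probabilities between 0 and 1.
interface SamplingPolicy {
  errorRate: number;
  successRate: number;
}

// Mirrors the "100% for errors, 1% for successful requests" figures quoted above.
const DEFAULT_POLICY: SamplingPolicy = { errorRate: 1.0, successRate: 0.01 };

/**
 * Decide whether a finished request should be logged.
 * `random` is injectable so the decision can be tested deterministically.
 */
function shouldLogRequest(
  statusCode: number,
  policy: SamplingPolicy = DEFAULT_POLICY,
  random: () => number = Math.random
): boolean {
  // Assumption: any 4xx or 5xx response counts as an error
  const rate = statusCode >= 400 ? policy.errorRate : policy.successRate;
  return random() < rate;
}
```

In the requestLogger middleware shown earlier, the logger.http call could be guarded with shouldLogRequest(res.statusCode). Recording the applied rate alongside each sampled entry lets dashboards scale counts back up.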

Netflix's Chaos Engineering

Netflix intentionally injects failures to test error handling:

Chaos Monkey: Randomly terminates instances
Latency Monkey: Adds artificial delays
Conformity Monkey: Shuts down non-compliant instances

Their error handling ensures zero user impact during individual service failures, maintaining 99.99% streaming uptime across 200M+ subscribers.
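
The same idea can be applied at a much smaller scale in a test environment, by wrapping a dependency call so that it sometimes fails or slows down. The sketch below is not Netflix's tooling. The withChaos function and its options are hypothetical, and it should stay out of production code paths:

```typescript
// Hypothetical fault-injection options for test environments.
interface ChaosOptions {
  failureRate: number;      // probability (0 to 1) that a call throws
  maxDelayMs: number;       // upper bound for the injected latency
  random?: () => number;    // injectable for deterministic tests
}

/** Wrap an async operation so it randomly fails or is delayed. */
function withChaos<T>(fn: () => Promise<T>, options: ChaosOptions): () => Promise<T> {
  const random = options.random ?? Math.random;

  return async () => {
    // Inject latency first, as a slow dependency would
    const delay = Math.floor(random() * options.maxDelayMs);
    if (delay > 0) {
      await new Promise<void>(resolve => setTimeout(resolve, delay));
    }

    // Then possibly fail instead of calling the real operation
    if (random() < options.failureRate) {
      throw new Error('Injected failure (chaos test)');
    }

    return fn();
  };
}
```

Running a wrapped call through the retry, timeout, and circuit breaker helpers from earlier sections is a quick way to check that the fallbacks actually trigger.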

Conclusion

Robust error handling and comprehensive logging are essential for production systems. Implement structured logging with appropriate levels, handle errors gracefully with circuit breakers and retries, and use error monitoring tools like Sentry to catch issues before users report them.

Key patterns - custom error types for different scenarios, async error handling with middleware, contextual logging with request IDs, and graceful degradation with fallbacks - provide the foundation for resilient applications. Monitor error rates, set up intelligent alerting, and use log aggregation to quickly diagnose and resolve production issues.

Effective observability reduces MTTR by 80%+ while improving user experience and system reliability.