Error Handling and Logging - Production Patterns for Robust Applications

Master error handling and logging with structured logging, error monitoring, exception handling patterns, log aggregation, alerting strategies, and observability best practices for production applications.

StaticBlock Editorial
21 min read

Effective error handling and logging are critical for maintaining production systems, reducing mean time to recovery (MTTR), and providing an excellent user experience. This guide covers production-ready patterns for handling errors gracefully, implementing structured logging, and building observable systems that surface issues before they impact users.

Why Error Handling Matters

User Experience: Proper error handling prevents application crashes, provides helpful feedback, and maintains data integrity when things go wrong.

Debugging Efficiency: Well-structured logs reduce debugging time from hours to minutes by providing context about what happened, when, and why.

Reliability: Graceful degradation and circuit breaker patterns keep applications functional even when dependencies fail.

Stripe processes 100M+ API requests daily with 99.99% uptime by implementing comprehensive error handling and real-time error tracking that alerts engineers within seconds of issues.

Error Handling Fundamentals

Error Types and Classification

Different error types require different handling strategies:

// Operational errors - expected failures (network issues, validation errors)
class OperationalError extends Error {
  constructor(
    message: string,
    public statusCode: number = 500,
    public code?: string
  ) {
    super(message);
    this.name = 'OperationalError';
  }
}

// Programmer errors - bugs that should be fixed
class ProgrammerError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ProgrammerError';
  }
}

// Validation errors
class ValidationError extends OperationalError {
  constructor(
    message: string,
    public fields: Record<string, string[]>
  ) {
    super(message, 400, 'VALIDATION_ERROR');
    this.name = 'ValidationError';
  }
}

// Not found errors
class NotFoundError extends OperationalError {
  constructor(resource: string, id: string) {
    super(`${resource} with id ${id} not found`, 404, 'NOT_FOUND');
    this.name = 'NotFoundError';
  }
}

// Unauthorized errors
class UnauthorizedError extends OperationalError {
  constructor(message: string = 'Unauthorized') {
    super(message, 401, 'UNAUTHORIZED');
    this.name = 'UnauthorizedError';
  }
}

// Database errors
class DatabaseError extends OperationalError {
  constructor(message: string, public originalError: Error) {
    super(message, 500, 'DATABASE_ERROR');
    this.name = 'DatabaseError';
  }
}
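One payoff of this classification is that a single function can decide what is safe to show a client. The sketch below is a minimal, self-contained illustration of that idea. `AppError`, `isOperational`, and `toClientResponse` are hypothetical names introduced here, and `AppError` is only a simplified stand-in for the `OperationalError` class above.

```typescript
// Hypothetical, simplified stand-in for the OperationalError class above.
class AppError extends Error {
  constructor(
    message: string,
    public statusCode: number = 500,
    public code: string = 'INTERNAL_ERROR'
  ) {
    super(message);
    this.name = 'AppError';
    // Keeps instanceof working even when compiling to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Type guard: true only for errors the application threw deliberately.
function isOperational(err: unknown): err is AppError {
  return err instanceof AppError;
}

interface ClientResponse {
  status: number;
  body: { code: string; message: string };
}

// Operational errors keep their message; anything else is masked.
function toClientResponse(err: unknown): ClientResponse {
  if (isOperational(err)) {
    return {
      status: err.statusCode,
      body: { code: err.code, message: err.message }
    };
  }
  return {
    status: 500,
    body: { code: 'INTERNAL_ERROR', message: 'An internal error occurred' }
  };
}
```

An unexpected `TypeError` passed to `toClientResponse` comes back as a generic 500, so stack traces and internal messages never reach the response body.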

Express Error Handling Middleware

import express, { Request, Response, NextFunction } from 'express';
import { logger } from './logger';

// Async handler wrapper - catches async errors
export const asyncHandler = (
  fn: (req: Request, res: Response, next: NextFunction) => Promise<any>
) => {
  return (req: Request, res: Response, next: NextFunction) => {
    Promise.resolve(fn(req, res, next)).catch(next);
  };
};

// Error handling middleware
export const errorHandler = (
  err: Error,
  req: Request,
  res: Response,
  next: NextFunction
) => {
  // Log error with context. Headers and body are left out on purpose:
  // they can carry credentials and personal data.
  logger.error('Request error', {
    error: {
      name: err.name,
      message: err.message,
      stack: err.stack
    },
    request: {
      method: req.method,
      url: req.url,
      params: req.params,
      query: req.query,
      user: req.user?.id
    }
  });

  // Handle operational errors
  if (err instanceof OperationalError) {
    return res.status(err.statusCode).json({
      error: {
        code: err.code,
        message: err.message,
        ...(err instanceof ValidationError && { fields: err.fields })
      }
    });
  }

  // Handle known framework errors
  if (err.name === 'CastError') {
    return res.status(400).json({
      error: { code: 'INVALID_ID', message: 'Invalid ID format' }
    });
  }

  if (err.name === 'JsonWebTokenError') {
    return res.status(401).json({
      error: { code: 'INVALID_TOKEN', message: 'Invalid authentication token' }
    });
  }

  // Handle Prisma errors
  if (err.name === 'PrismaClientKnownRequestError') {
    const prismaError = err as any;

    // P2002: unique constraint violation
    if (prismaError.code === 'P2002') {
      return res.status(409).json({
        error: {
          code: 'DUPLICATE_ENTRY',
          message: 'Resource already exists',
          field: prismaError.meta?.target?.[0]
        }
      });
    }

    // P2025: record not found
    if (prismaError.code === 'P2025') {
      return res.status(404).json({
        error: {
          code: 'NOT_FOUND',
          message: 'Resource not found'
        }
      });
    }
  }

  // Programmer errors - don't expose internal details
  logger.error('Unhandled error', {
    error: {
      name: err.name,
      message: err.message,
      stack: err.stack
    }
  });

  res.status(500).json({
    error: {
      code: 'INTERNAL_ERROR',
      message: 'An internal error occurred'
    }
  });
};

// 404 handler
export const notFoundHandler = (
  req: Request,
  res: Response,
  next: NextFunction
) => {
  res.status(404).json({
    error: {
      code: 'NOT_FOUND',
      message: `Route ${req.method} ${req.url} not found`
    }
  });
};

// Usage
const app = express();

// Routes
app.get('/api/users/:id', asyncHandler(async (req, res) => {
  const user = await prisma.user.findUnique({
    where: { id: req.params.id }
  });

  if (!user) {
    throw new NotFoundError('User', req.params.id);
  }

  res.json(user);
}));

// Error handlers (must be last)
app.use(notFoundHandler);
app.use(errorHandler);

Try-Catch Best Practices

// ❌ Bad - swallows errors
async function getUser(id: string) {
  try {
    return await prisma.user.findUnique({ where: { id } });
  } catch (error) {
    console.log('Error fetching user');
    return null;
  }
}

// ✅ Good - propagates errors
async function getUser(id: string) {
  try {
    return await prisma.user.findUnique({ where: { id } });
  } catch (error) {
    logger.error('Failed to fetch user', { userId: id, error });
    throw new DatabaseError('Failed to fetch user', error as Error);
  }
}

// ✅ Good - handles specific errors
async function createUser(data: CreateUserDto) {
  try {
    return await prisma.user.create({ data });
  } catch (error) {
    // P2002 is Prisma's unique-constraint violation code
    if ((error as { code?: string }).code === 'P2002') {
      throw new ValidationError('Email already exists', {
        email: ['Email is already registered']
      });
    }

    logger.error('Failed to create user', { data, error });
    throw new DatabaseError('Failed to create user', error as Error);
  }
}

// ✅ Good - cleanup in finally
// (fs is the promise-based API: import { promises as fs } from 'fs')
async function processFile(filePath: string) {
  const fileHandle = await fs.open(filePath, 'r');

  try {
    const content = await fileHandle.readFile('utf-8');
    return processContent(content);
  } catch (error) {
    logger.error('Failed to process file', { filePath, error });
    throw error;
  } finally {
    // Always cleanup
    await fileHandle.close();
  }
}
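In strict TypeScript the value bound in a `catch` clause is typed `unknown`, because JavaScript lets code throw anything, not only `Error` objects. A small pair of helpers makes the patterns above type-safe. This is a sketch only: `toError` and `getErrorCode` are hypothetical helper names, not part of any library mentioned in this article.

```typescript
// Normalizes whatever was thrown into an Error instance.
function toError(value: unknown): Error {
  if (value instanceof Error) return value;
  if (typeof value === 'string') return new Error(value);
  try {
    return new Error(JSON.stringify(value));
  } catch {
    // JSON.stringify throws on circular structures
    return new Error(String(value));
  }
}

// Reads a string `code` property (such as Prisma's 'P2002') if present.
function getErrorCode(value: unknown): string | undefined {
  if (typeof value === 'object' && value !== null && 'code' in value) {
    const code = (value as { code: unknown }).code;
    return typeof code === 'string' ? code : undefined;
  }
  return undefined;
}
```

Inside a `catch (error)` block, `getErrorCode(error) === 'P2002'` replaces the untyped `error.code` access, and `toError(error)` gives a value that can be passed wherever an `Error` is required.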

Structured Logging

Structured logging emits each entry as a machine-readable record, typically JSON, instead of free-form text, so log tools can filter and aggregate on individual fields.
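To make the idea concrete before bringing in a library, here is a minimal sketch of what a structured entry looks like. `formatLogLine` is a hypothetical helper written for this illustration; it is not how Winston formats entries internally.

```typescript
type LogLevel = 'error' | 'warn' | 'info' | 'debug';

// Builds one JSON log line; `now` is injectable so output is reproducible.
function formatLogLine(
  level: LogLevel,
  message: string,
  meta: Record<string, unknown> = {},
  now: Date = new Date()
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...meta
  });
}

const line = formatLogLine(
  'info',
  'Order created',
  { orderId: 'ord_123', total: 49.99 },
  new Date('2024-01-01T00:00:00.000Z')
);
console.log(line);
// {"timestamp":"2024-01-01T00:00:00.000Z","level":"info","message":"Order created","orderId":"ord_123","total":49.99}
```

Because every field is a named key, a query such as "all entries where orderId is ord_123" becomes a field lookup rather than a regular expression over text.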

Winston Logger Setup

import winston from 'winston';
import type { Request, Response, NextFunction } from 'express';

const logLevels = {
  error: 0,
  warn: 1,
  info: 2,
  http: 3,
  debug: 4
};

const logColors = {
  error: 'red',
  warn: 'yellow',
  info: 'green',
  http: 'magenta',
  debug: 'blue'
};

winston.addColors(logColors);

const consoleFormat = winston.format.combine(
  winston.format.timestamp({ format: 'YYYY-MM-DD HH:mm:ss' }),
  winston.format.colorize({ all: true }),
  winston.format.printf(info => {
    const { timestamp, level, message, ...meta } = info;
    return `${timestamp} [${level}]: ${message} ${
      Object.keys(meta).length ? JSON.stringify(meta, null, 2) : ''
    }`;
  })
);

const jsonFormat = winston.format.combine(
  winston.format.timestamp(),
  winston.format.errors({ stack: true }),
  winston.format.json()
);

export const logger = winston.createLogger({
  levels: logLevels,
  level: process.env.LOG_LEVEL || 'info',
  transports: [
    // Console output: JSON in production, colorized text in development
    new winston.transports.Console({
      format: process.env.NODE_ENV === 'production' ? jsonFormat : consoleFormat
    }),
    // File output for production
    new winston.transports.File({
      filename: 'logs/error.log',
      level: 'error',
      format: jsonFormat
    }),
    new winston.transports.File({
      filename: 'logs/combined.log',
      format: jsonFormat
    })
  ]
});

// Request logging middleware
export const requestLogger = (req: Request, res: Response, next: NextFunction) => {
  const startTime = Date.now();

  res.on('finish', () => {
    const duration = Date.now() - startTime;

    logger.http('HTTP Request', {
      method: req.method,
      url: req.url,
      statusCode: res.statusCode,
      duration,
      userAgent: req.get('user-agent'),
      ip: req.ip,
      userId: req.user?.id
    });
  });

  next();
};

Contextual Logging

// Add context to all logs within a request
import { AsyncLocalStorage } from 'async_hooks';
import { randomUUID } from 'crypto';

const asyncLocalStorage = new AsyncLocalStorage<Map<string, any>>();

// Middleware to create request context
export const contextMiddleware = (
  req: Request,
  res: Response,
  next: NextFunction
) => {
  const context = new Map<string, any>();

  context.set('requestId', randomUUID());
  context.set('userId', req.user?.id);
  context.set('method', req.method);
  context.set('url', req.url);

  asyncLocalStorage.run(context, () => next());
};

// Logger with automatic context
class ContextLogger {
  private getContext(): Record<string, any> {
    const context = asyncLocalStorage.getStore();
    if (!context) return {};

    return Object.fromEntries(context.entries());
  }

  info(message: string, meta?: Record<string, any>) {
    logger.info(message, { ...this.getContext(), ...meta });
  }

  error(message: string, meta?: Record<string, any>) {
    logger.error(message, { ...this.getContext(), ...meta });
  }

  warn(message: string, meta?: Record<string, any>) {
    logger.warn(message, { ...this.getContext(), ...meta });
  }

  debug(message: string, meta?: Record<string, any>) {
    logger.debug(message, { ...this.getContext(), ...meta });
  }
}

export const contextLogger = new ContextLogger();

// Usage
app.use(contextMiddleware);

app.get('/api/users/:id', asyncHandler(async (req, res) => {
  // All logs automatically include requestId, userId, method, url
  contextLogger.info('Fetching user');

  const user = await prisma.user.findUnique({
    where: { id: req.params.id }
  });

  if (!user) {
    throw new NotFoundError('User', req.params.id);
  }

  // Use a distinct key: `userId` would overwrite the requester's ID from the context
  contextLogger.info('User fetched successfully', { fetchedUserId: user.id });

  res.json(user);
}));
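The property that makes this work is that `AsyncLocalStorage` keeps a separate store per asynchronous call chain, even when requests interleave. The following sketch shows that behavior without Express. The names `storage`, `currentRequestId`, `handle`, and `demo` are hypothetical and exist only for this illustration.

```typescript
import { AsyncLocalStorage } from 'async_hooks';

interface RequestContext {
  requestId: string;
}

const storage = new AsyncLocalStorage<RequestContext>();

// Reads the current request ID without it being passed as an argument.
function currentRequestId(): string | undefined {
  return storage.getStore()?.requestId;
}

// Simulates async work (a database call, say) inside a request.
async function handle(): Promise<string | undefined> {
  await new Promise<void>(resolve => setTimeout(resolve, 5));
  return currentRequestId();
}

// Two concurrent "requests", each run in its own context.
async function demo(): Promise<(string | undefined)[]> {
  return Promise.all([
    storage.run({ requestId: 'req-a' }, handle),
    storage.run({ requestId: 'req-b' }, handle)
  ]);
}
```

`demo()` resolves to `['req-a', 'req-b']`: each chain sees its own store after the `await`, and code running outside any `run()` call sees `undefined`.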

Log Levels and Best Practices

// ERROR - Critical issues requiring immediate attention
logger.error('Payment processing failed', {
  orderId: order.id,
  amount: order.total,
  error: error.message,
  stack: error.stack
});

// WARN - Potential issues that don't stop execution
logger.warn('Cache miss for frequently accessed data', {
  key: 'user:profile:123',
  hitRate: 0.65
});

// INFO - Important business events
logger.info('Order created', {
  orderId: order.id,
  userId: order.userId,
  total: order.total,
  items: order.items.length
});

// HTTP - Request/response logging
logger.http('API request', {
  method: 'GET',
  url: '/api/products',
  statusCode: 200,
  duration: 45
});

// DEBUG - Detailed diagnostic information
logger.debug('Database query executed', {
  query: 'SELECT * FROM users WHERE id = ?',
  params: [userId],
  duration: 12
});

// ❌ Bad - logging sensitive data
logger.info('User login', {
  email: user.email,
  password: user.password // Never log passwords!
});

// ✅ Good - leave sensitive data out
logger.info('User login', {
  email: user.email,
  userId: user.id
});

// ❌ Bad - logging in loops
users.forEach(user => {
  logger.info('Processing user', { userId: user.id });
  processUser(user);
});

// ✅ Good - aggregate logging
const startTime = Date.now();
logger.info('Processing users', { count: users.length, startTime });
users.forEach(user => processUser(user));
logger.info('Users processed', {
  count: users.length,
  duration: Date.now() - startTime
});
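Leaving sensitive fields out by hand works until someone logs a whole request body. A defensive complement is to scrub metadata before it reaches the logger. The sketch below assumes plain JSON-like data; `SENSITIVE_KEYS` and `redact` are hypothetical names, and the key list is an example rather than a complete inventory.

```typescript
// Example key list; extend it to match the fields your application handles.
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'cookie', 'secret']);

// Returns a deep copy with sensitive values replaced; the input is not mutated.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(item => redact(item));
  }
  if (typeof value === 'object' && value !== null) {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value)) {
      const inner = (value as Record<string, unknown>)[key];
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(inner);
    }
    return out;
  }
  return value;
}
```

Calling `redact({ email: 'a@example.com', password: 'hunter2' })` keeps the email and replaces the password with `'[REDACTED]'`. Matching is case-insensitive, so an `Authorization` header is caught as well.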

Error Monitoring and Alerting

Sentry Integration

import * as Sentry from '@sentry/node';
import { ProfilingIntegration } from '@sentry/profiling-node';

// This snippet uses the @sentry/node v7-style API (Sentry.Handlers,
// Sentry.Integrations, startTransaction). `app` is the Express app created earlier.
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  integrations: [
    new ProfilingIntegration(),
    new Sentry.Integrations.Http({ tracing: true }),
    new Sentry.Integrations.Express({ app })
  ],
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
  profilesSampleRate: 0.1,
  beforeSend(event, hint) {
    // Filter sensitive data
    if (event.request) {
      delete event.request.cookies;
      delete event.request.headers?.['authorization'];
    }

    return event;
  }
});

// Request handler must be first
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());

// Routes (wrapped in asyncHandler so the rethrown error reaches the error middleware)
app.get('/api/users/:id', asyncHandler(async (req, res) => {
  const transaction = Sentry.startTransaction({
    op: 'http.server',
    name: 'GET /api/users/:id'
  });

  try {
    const span = transaction.startChild({
      op: 'db.query',
      description: 'Fetch user from database'
    });

    const user = await prisma.user.findUnique({
      where: { id: req.params.id }
    });

    span.finish();

    if (!user) {
      throw new NotFoundError('User', req.params.id);
    }

    res.json(user);
  } catch (error) {
    // Add context to error
    Sentry.setContext('user', {
      id: req.params.id,
      requestedBy: req.user?.id
    });

    Sentry.captureException(error);
    throw error;
  } finally {
    transaction.finish();
  }
}));

// Error handler must be after routes
app.use(Sentry.Handlers.errorHandler());
app.use(errorHandler);

Custom Error Tracking

class ErrorTracker {
  private errors = new Map<string, {
    count: number;
    lastOccurrence: Date;
    samples: Error[];
  }>();

  track(error: Error): void {
    const key = `${error.name}:${error.message}`;
    const existing = this.errors.get(key);

    if (existing) {
      existing.count++;
      existing.lastOccurrence = new Date();

      // Keep the 5 most recent samples
      existing.samples.push(error);
      if (existing.samples.length > 5) {
        existing.samples.shift();
      }

      // Alert once, when the count first reaches the threshold,
      // instead of on every occurrence after it
      if (existing.count === 10) {
        this.alertFrequentError(key, existing).catch(alertError => {
          logger.error('Failed to send alert', { error: alertError });
        });
      }
    } else {
      this.errors.set(key, {
        count: 1,
        lastOccurrence: new Date(),
        samples: [error]
      });
    }
  }

  private async alertFrequentError(
    key: string,
    data: { count: number; lastOccurrence: Date; samples: Error[] }
  ): Promise<void> {
    logger.error('Frequent error detected', {
      errorKey: key,
      occurrences: data.count,
      lastOccurrence: data.lastOccurrence,
      sample: {
        message: data.samples[0].message,
        stack: data.samples[0].stack
      }
    });

    // Send to alerting system (PagerDuty, Slack, etc.)
    await this.sendAlert({
      severity: 'high',
      title: 'Frequent Error Detected',
      description: `Error "${key}" occurred ${data.count} times since the last hourly reset`,
      context: data.samples[0]
    });
  }

  // Placeholder: replace with a call to your alerting system
  private async sendAlert(alert: {
    severity: string;
    title: string;
    description: string;
    context: Error;
  }): Promise<void> {
    logger.warn('Alert triggered', { title: alert.title, severity: alert.severity });
  }

  getStats(): Array<{ key: string; count: number; lastOccurrence: Date }> {
    return Array.from(this.errors.entries()).map(([key, data]) => ({
      key,
      count: data.count,
      lastOccurrence: data.lastOccurrence
    }));
  }

  reset(): void {
    this.errors.clear();
  }
}

export const errorTracker = new ErrorTracker();

// Reset stats hourly
setInterval(() => errorTracker.reset(), 60 * 60 * 1000);

// Use in error handler
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
  errorTracker.track(err);
  errorHandler(err, req, res, next);
});
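Because the tracker clears its counts on a fixed hourly timer, a burst that straddles a reset is split across two periods and may never reach the threshold. A sliding window avoids that by counting only events newer than a cutoff. The class below is one possible sketch; `SlidingWindowCounter` is a hypothetical name, and the timestamp parameter is injectable purely so the behavior can be checked deterministically.

```typescript
// Counts events per key over a sliding time window.
class SlidingWindowCounter {
  private timestamps = new Map<string, number[]>();

  constructor(private windowMs: number) {}

  // Records one event and returns how many fall inside the window.
  record(key: string, now: number = Date.now()): number {
    const cutoff = now - this.windowMs;
    const recent = (this.timestamps.get(key) ?? []).filter(t => t > cutoff);
    recent.push(now);
    this.timestamps.set(key, recent);
    return recent.length;
  }
}
```

With a one-hour window, `counter.record(key) >= 10` then means "ten or more occurrences in the last sixty minutes" no matter when the process started. Memory grows with the number of events per window, so a very noisy key may need a cap on stored timestamps.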

Graceful Degradation Patterns

Circuit Breaker

Prevent cascading failures by stopping requests to failing services:

class CircuitBreaker {
  private failures = 0;
  private lastFailureTime?: Date;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private threshold: number = 5,
    private timeout: number = 60000, // 1 minute
    // Not enforced in this simplified version: a single successful
    // half-open request closes the circuit
    private halfOpenRequests: number = 3
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      // Check if timeout has passed
      if (
        this.lastFailureTime &&
        Date.now() - this.lastFailureTime.getTime() > this.timeout
      ) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }

    try {
      const result = await fn();

      // Success - close the circuit and clear the failure count,
      // so the threshold applies to consecutive failures
      this.state = 'closed';
      this.failures = 0;

      return result;
    } catch (error) {
      this.failures++;
      this.lastFailureTime = new Date();

      // A failed trial request in half-open state reopens the circuit
      // immediately; otherwise open once the threshold is reached
      if (this.state === 'half-open' || this.failures >= this.threshold) {
        this.state = 'open';
        logger.warn('Circuit breaker opened', {
          failures: this.failures,
          threshold: this.threshold
        });
      }

      throw error;
    }
  }

  getState(): string {
    return this.state;
  }
}

// Usage
const paymentServiceBreaker = new CircuitBreaker(5, 60000);

async function processPayment(orderId: string, amount: number) {
  try {
    return await paymentServiceBreaker.execute(async () => {
      return await paymentService.charge({ orderId, amount });
    });
  } catch (error) {
    if (error instanceof Error && error.message === 'Circuit breaker is OPEN') {
      // Fallback behavior
      logger.warn('Payment service unavailable, queueing for later', {
        orderId,
        amount
      });

      await queuePayment({ orderId, amount });
      return { status: 'queued' };
    }

    throw error;
  }
}
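Detecting the open circuit by comparing `error.message` to a string is fragile: rewording the message silently disables the fallback. A dedicated error class lets callers branch with `instanceof` instead. The sketch below shows that variation; `CircuitOpenError` and `withFallback` are hypothetical names and are not part of the breaker shown above.

```typescript
// Hypothetical error type a breaker could throw while it is open.
class CircuitOpenError extends Error {
  constructor() {
    super('Circuit breaker is open');
    this.name = 'CircuitOpenError';
    // Keeps instanceof working even when compiling to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Runs `primary`; uses `fallback` only when the circuit is open.
// Any other failure is rethrown so real errors are not hidden.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>
): Promise<T> {
  try {
    return await primary();
  } catch (error) {
    if (error instanceof CircuitOpenError) {
      return fallback();
    }
    throw error;
  }
}
```

In the payment example, `primary` would wrap the breaker call and `fallback` would queue the payment and return `{ status: 'queued' }`.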

Retry with Exponential Backoff

async function retry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelay: number = 1000
): Promise<T> {
  let lastError: Error;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;

      if (attempt === maxAttempts) {
        logger.error('All retry attempts failed', {
          attempts: maxAttempts,
          error: lastError.message
        });
        break;
      }

      // Exponential backoff with jitter
      const delay = baseDelay * Math.pow(2, attempt - 1);
      const jitter = Math.random() * delay * 0.1;
      const totalDelay = delay + jitter;

      logger.warn('Retry attempt failed', {
        attempt,
        maxAttempts,
        nextRetryIn: totalDelay,
        error: lastError.message
      });

      await new Promise(resolve => setTimeout(resolve, totalDelay));
    }
  }

  throw lastError!;
}

// Usage
const user = await retry(
  () => externalAPI.fetchUser(userId),
  3,
  1000
);
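It helps to see the actual wait times this formula produces. The sketch below pulls the delay calculation into a standalone function so the schedule can be inspected. `backoffDelay` is a hypothetical helper, and the `random` parameter is added here only so the jitter can be pinned.

```typescript
// Delay before the retry that follows a failed attempt (attempt starts at 1).
// `random` is injectable so the jitter can be pinned in tests.
function backoffDelay(
  attempt: number,
  baseDelay: number = 1000,
  jitterRatio: number = 0.1,
  random: () => number = Math.random
): number {
  const delay = baseDelay * Math.pow(2, attempt - 1);
  return delay + random() * delay * jitterRatio;
}

// With jitter pinned to zero the schedule is 1000, 2000, 4000 ms.
const schedule = [1, 2, 3].map(attempt => backoffDelay(attempt, 1000, 0.1, () => 0));
console.log(schedule); // [ 1000, 2000, 4000 ]
```

With real jitter each delay lands up to 10% above these values, which keeps many clients that failed at the same moment from retrying in lockstep.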

Timeout Wrapper

async function withTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number,
  timeoutError: string = 'Operation timed out'
): Promise<T> {
  let timeoutId: NodeJS.Timeout;

  const timeout = new Promise<never>((_, reject) => {
    timeoutId = setTimeout(() => {
      reject(new Error(timeoutError));
    }, timeoutMs);
  });

  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timeoutId!);
  }
}

// Usage (wrapped in a function so that `return` is valid)
async function fetchDataWithFallback() {
  try {
    return await withTimeout(
      slowExternalAPI.fetchData(),
      5000,
      'External API timed out after 5 seconds'
    );
  } catch (error) {
    if (error instanceof Error && error.message.includes('timed out')) {
      // Handle timeout
      logger.warn('API request timed out, using cached data');
      return getCachedData();
    }
    throw error;
  }
}
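Matching on the text "timed out" has the same weakness as any string comparison on error messages. One alternative, sketched below, is to reject with a dedicated error type. `TimeoutError` and `withTypedTimeout` are hypothetical names for this illustration.

```typescript
// Hypothetical error type carrying the timeout that was exceeded.
class TimeoutError extends Error {
  constructor(public timeoutMs: number) {
    super(`Operation timed out after ${timeoutMs} ms`);
    this.name = 'TimeoutError';
    // Keeps instanceof working even when compiling to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

async function withTypedTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(timeoutMs)), timeoutMs);
  });

  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the timer whether the race was won or lost.
    clearTimeout(timer);
  }
}
```

Callers can then write `if (error instanceof TimeoutError)`. Note that in either version the race only stops waiting: the underlying operation keeps running unless it supports cancellation of its own.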

Log Aggregation and Analysis

Elasticsearch and Kibana

import { Client } from '@elastic/elasticsearch';

const esClient = new Client({ node: process.env.ELASTICSEARCH_URL || 'http://localhost:9200' });

// Winston transport for Elasticsearch
import TransportStream from 'winston-transport';

class ElasticsearchTransport extends TransportStream {
  async log(info: any, callback: () => void) {
    try {
      await esClient.index({
        index: `logs-${new Date().toISOString().slice(0, 10)}`,
        document: {
          '@timestamp': new Date().toISOString(),
          level: info.level,
          message: info.message,
          ...info
        }
      });
    } catch (error) {
      console.error('Failed to send log to Elasticsearch:', error);
    }

    callback();
  }
}

logger.add(new ElasticsearchTransport());

// Query logs
async function searchLogs(query: string, from: Date, to: Date) {
  const { hits } = await esClient.search({
    index: 'logs-*',
    body: {
      query: {
        bool: {
          must: [
            { query_string: { query } },
            {
              range: {
                '@timestamp': {
                  gte: from.toISOString(),
                  lte: to.toISOString()
                }
              }
            }
          ]
        }
      },
      sort: [{ '@timestamp': 'desc' }],
      size: 100
    }
  });

  return hits.hits.map(hit => hit._source);
}

Real-World Examples

Airbnb's Error Handling

Airbnb uses multi-layered error handling:

  • Client-side: Fallback UI components for failed loads
  • API Gateway: Circuit breakers for backend services
  • Service Layer: Retry logic with exponential backoff
  • Monitoring: Real-time error rate alerts triggering auto-rollback

Their system recovers from partial outages without user-visible errors, maintaining booking functionality even when search is degraded.

Stripe's Logging Strategy

Stripe implements comprehensive logging:

  • Structured JSON logs: Every request logged with 50+ metadata fields
  • Log sampling: 100% for errors, 1% for successful requests at scale
  • Retention: 30 days hot, 1 year cold storage
  • Alerting: Automated alerts on error rate spikes (>1% increase)

This enables Stripe to debug payment issues across billions of transactions with average resolution time under 15 minutes.
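The sampling idea in the list above (keep every error, keep a small fraction of successes) is simple to express in code. The function below is a generic sketch of level-based sampling and makes no claim about how Stripe implements it. `shouldLog` is a hypothetical name, and `random` is injectable so the decision can be tested.

```typescript
type Level = 'error' | 'warn' | 'info' | 'debug';

// Keeps every error and a fixed fraction of everything else.
function shouldLog(
  level: Level,
  sampleRate: number = 0.01,
  random: () => number = Math.random
): boolean {
  if (level === 'error') return true;
  return random() < sampleRate;
}
```

A request logger would call `shouldLog(level)` before emitting an entry. Sampling per request rather than per log line keeps all entries for a sampled request together, which is usually what debugging needs.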

Netflix's Chaos Engineering

Netflix intentionally injects failures to test error handling:

  • Chaos Monkey: Randomly terminates instances
  • Latency Monkey: Adds artificial delays
  • Conformity Monkey: Shuts down non-compliant instances

Their error handling ensures zero user impact during individual service failures, maintaining 99.99% streaming uptime across 200M+ subscribers.

Conclusion

Robust error handling and comprehensive logging are essential for production systems. Implement structured logging with appropriate levels, handle errors gracefully with circuit breakers and retries, and use error monitoring tools like Sentry to catch issues before users report them.

Key patterns - custom error types for different scenarios, async error handling with middleware, contextual logging with request IDs, and graceful degradation with fallbacks - provide the foundation for resilient applications. Monitor error rates, set up intelligent alerting, and use log aggregation to quickly diagnose and resolve production issues.

Effective observability reduces MTTR by 80%+ while improving user experience and system reliability.

Written by StaticBlock Editorial

StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.