Error Handling and Logging - Production Patterns for Robust Applications
Master error handling and logging with structured logging, error monitoring, exception handling patterns, log aggregation, alerting strategies, and observability best practices for production applications.
Effective error handling and logging are critical for maintaining production systems, reducing mean time to recovery (MTTR), and providing an excellent user experience. This guide covers production-ready patterns for handling errors gracefully, implementing structured logging, and building observable systems that surface issues before they impact users.
Why Error Handling Matters
User Experience: Proper error handling prevents application crashes, provides helpful feedback, and maintains data integrity when things go wrong.
Debugging Efficiency: Well-structured logs reduce debugging time from hours to minutes by providing context about what happened, when, and why.
Reliability: Graceful degradation and circuit breaker patterns keep applications functional even when dependencies fail.
Stripe processes 100M+ API requests daily with 99.99% uptime by implementing comprehensive error handling and real-time error tracking that alerts engineers within seconds of issues.
Error Handling Fundamentals
Error Types and Classification
Different error types require different handling strategies:
// Operational errors - expected failures (network issues, validation errors)
class OperationalError extends Error {
constructor(
message: string,
public statusCode: number = 500,
public code?: string
) {
super(message);
this.name = 'OperationalError';
}
}
// Programmer errors - bugs that should be fixed
class ProgrammerError extends Error {
constructor(message: string) {
super(message);
this.name = 'ProgrammerError';
}
}
// Validation errors
class ValidationError extends OperationalError {
constructor(
message: string,
public fields: Record<string, string[]>
) {
super(message, 400, 'VALIDATION_ERROR');
this.name = 'ValidationError';
}
}
// Not found errors
class NotFoundError extends OperationalError {
constructor(resource: string, id: string) {
super(`${resource} with id ${id} not found`, 404, 'NOT_FOUND');
this.name = 'NotFoundError';
}
}
// Unauthorized errors
class UnauthorizedError extends OperationalError {
constructor(message: string = 'Unauthorized') {
super(message, 401, 'UNAUTHORIZED');
this.name = 'UnauthorizedError';
}
}
// Database errors
class DatabaseError extends OperationalError {
constructor(message: string, public originalError: Error) {
super(message, 500, 'DATABASE_ERROR');
this.name = 'DatabaseError';
}
}
Express Error Handling Middleware
import express, { Request, Response, NextFunction } from 'express';
import { logger } from './logger';
// Async handler wrapper - catches async errors
export const asyncHandler = (
fn: (req: Request, res: Response, next: NextFunction) => Promise<any>
) => {
return (req: Request, res: Response, next: NextFunction) => {
Promise.resolve(fn(req, res, next)).catch(next);
};
};
// Error handling middleware
export const errorHandler = (
err: Error,
req: Request,
res: Response,
next: NextFunction
) => {
// Log error with context
logger.error('Request error', {
error: {
name: err.name,
message: err.message,
stack: err.stack
},
request: {
method: req.method,
url: req.url,
// Headers and body are deliberately not logged: they can contain credentials and personal data
params: req.params,
query: req.query,
user: req.user?.id
}
});
// Handle operational errors
if (err instanceof OperationalError) {
return res.status(err.statusCode).json({
error: {
code: err.code,
message: err.message,
...(err instanceof ValidationError && { fields: err.fields })
}
});
}
// Handle known framework errors
if (err.name === 'CastError') {
return res.status(400).json({
error: {
code: 'INVALID_ID',
message: 'Invalid ID format'
}
});
}
if (err.name === 'JsonWebTokenError') {
return res.status(401).json({
error: {
code: 'INVALID_TOKEN',
message: 'Invalid authentication token'
}
});
}
// Handle Prisma errors
if (err.name === 'PrismaClientKnownRequestError') {
const prismaError = err as any;
if (prismaError.code === 'P2002') {
return res.status(409).json({
error: {
code: 'DUPLICATE_ENTRY',
message: 'Resource already exists',
field: prismaError.meta?.target?.[0]
}
});
}
if (prismaError.code === 'P2025') {
return res.status(404).json({
error: {
code: 'NOT_FOUND',
message: 'Resource not found'
}
});
}
}
// Programmer errors - already logged above with full context; don't expose internal details
res.status(500).json({
error: {
code: 'INTERNAL_ERROR',
message: 'An internal error occurred'
}
});
};
// 404 handler
export const notFoundHandler = (
req: Request,
res: Response,
next: NextFunction
) => {
res.status(404).json({
error: {
code: 'NOT_FOUND',
message: `Route ${req.method} ${req.url} not found`
}
});
};
// Usage
const app = express();
// Routes
app.get('/api/users/:id', asyncHandler(async (req, res) => {
const user = await prisma.user.findUnique({
where: { id: req.params.id }
});
if (!user) {
throw new NotFoundError('User', req.params.id);
}
res.json(user);
}));
// Error handlers (must be last)
app.use(notFoundHandler);
app.use(errorHandler);
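The status-code mapping inside errorHandler can also be pulled out into a pure function, which makes it testable without starting an Express server. The sketch below is one possible shape, not part of the middleware above: toErrorResponse and ErrorResponse are hypothetical names, and the error classes are trimmed copies of the earlier ones so the block stands alone.

```typescript
// Trimmed copies of the error classes defined earlier, so this block is self-contained.
class OperationalError extends Error {
  readonly statusCode: number;
  readonly code: string;

  constructor(message: string, statusCode = 500, code = 'OPERATIONAL_ERROR') {
    super(message);
    this.name = 'OperationalError';
    this.statusCode = statusCode;
    this.code = code;
  }
}

class NotFoundError extends OperationalError {
  constructor(resource: string, id: string) {
    super(`${resource} with id ${id} not found`, 404, 'NOT_FOUND');
    this.name = 'NotFoundError';
  }
}

// Hypothetical response shape, mirroring the JSON bodies sent by errorHandler.
interface ErrorResponse {
  status: number;
  body: { error: { code: string; message: string } };
}

// Hypothetical helper: maps any thrown value to a client-safe response.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof OperationalError) {
    // Operational errors carry a message that is safe to show to clients.
    return {
      status: err.statusCode,
      body: { error: { code: err.code, message: err.message } }
    };
  }
  // Anything else is treated as a programmer error: hide the details.
  return {
    status: 500,
    body: { error: { code: 'INTERNAL_ERROR', message: 'An internal error occurred' } }
  };
}

const notFound = toErrorResponse(new NotFoundError('User', '42'));
const unexpected = toErrorResponse(new TypeError('x is undefined'));
console.log(notFound.status, notFound.body.error.code);
console.log(unexpected.status, unexpected.body.error.message);
```

With this split, the Express middleware only needs to log the error and send the returned status and body.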
Try-Catch Best Practices
// ❌ Bad - swallows errors
async function getUser(id: string) {
try {
return await prisma.user.findUnique({ where: { id } });
} catch (error) {
console.log('Error fetching user');
return null;
}
}
// ✅ Good - propagates errors
async function getUser(id: string) {
try {
return await prisma.user.findUnique({ where: { id } });
} catch (error) {
logger.error('Failed to fetch user', { userId: id, error });
throw new DatabaseError('Failed to fetch user', error);
}
}
// ✅ Good - handles specific errors
async function createUser(data: CreateUserDto) {
try {
return await prisma.user.create({ data });
} catch (error) {
if (error.code === 'P2002') {
throw new ValidationError('Email already exists', {
email: ['Email is already registered']
});
}
// Log only non-sensitive fields: data may include a password
logger.error('Failed to create user', { email: data.email, error });
throw new DatabaseError('Failed to create user', error);
}
}
// ✅ Good - cleanup in finally
async function processFile(filePath: string) {
const fileHandle = await fs.open(filePath, 'r');
try {
const content = await fileHandle.readFile('utf-8');
return processContent(content);
} catch (error) {
logger.error('Failed to process file', { filePath, error });
throw error;
} finally {
// Always cleanup
await fileHandle.close();
}
}
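Under TypeScript's strict mode (since 4.4, via useUnknownInCatchVariables), the value in a catch clause has type unknown, so reading error.code or passing error to a constructor that expects an Error requires narrowing first. The helpers below are a minimal sketch of that narrowing; toError, hasCode, and describeFailure are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: turn any thrown value into an Error instance.
function toError(value: unknown): Error {
  if (value instanceof Error) return value;
  if (typeof value === 'string') return new Error(value);
  try {
    return new Error(JSON.stringify(value));
  } catch {
    // JSON.stringify throws on circular structures and BigInt values.
    return new Error(String(value));
  }
}

// Hypothetical type guard for errors that carry a string `code` (for example 'P2002').
function hasCode(value: unknown): value is { code: string } {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { code?: unknown }).code === 'string'
  );
}

// Runs `fn` and reports how it failed, without assuming the thrown value's type.
function describeFailure(fn: () => void): string {
  try {
    fn();
    return 'ok';
  } catch (error) {
    if (hasCode(error) && error.code === 'P2002') {
      return 'duplicate';
    }
    return toError(error).message;
  }
}

console.log(describeFailure(() => { throw 'boom'; }));
console.log(describeFailure(() => { throw Object.assign(new Error('dup'), { code: 'P2002' }); }));
```

In the earlier examples, the same idea would mean passing toError(error) to DatabaseError and guarding the P2002 check with hasCode(error).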
Structured Logging
Structured logging uses JSON format instead of plain text, enabling powerful querying and analysis.
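To make the contrast with plain text concrete, here is a small sketch that emits one JSON object per line and then filters entries by field. It uses no logging library; formatEntry and LogEntry are hypothetical names, and the order data is invented for the example.

```typescript
type LogEntry = {
  timestamp: string;
  level: string;
  message: string;
  [key: string]: unknown;
};

// Hypothetical formatter: one JSON object per line.
function formatEntry(
  level: string,
  message: string,
  meta: Record<string, unknown> = {},
  now: Date = new Date()
): string {
  const entry: LogEntry = { ...meta, timestamp: now.toISOString(), level, message };
  return JSON.stringify(entry);
}

const lines = [
  formatEntry('info', 'Order created', { orderId: 'ord_1', total: 40 }),
  formatEntry('error', 'Payment failed', { orderId: 'ord_2', total: 95 }),
  formatEntry('error', 'Payment failed', { orderId: 'ord_3', total: 12 })
];

// Because each line is JSON, filtering by field needs no regular expressions.
const largeFailures = lines
  .map(line => JSON.parse(line) as LogEntry)
  .filter(entry => entry.level === 'error' && Number(entry.total) > 50)
  .map(entry => entry.orderId);

console.log(largeFailures); // [ 'ord_2' ]
```

A log aggregation backend applies the same kind of field-based filter at much larger scale.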
Winston Logger Setup
import winston from 'winston';
const logLevels = {
error: 0,
warn: 1,
info: 2,
http: 3,
debug: 4
};
const logColors = {
error: 'red',
warn: 'yellow',
info: 'green',
http: 'magenta',
debug: 'blue'
};
winston.addColors(logColors);
const consoleFormat = winston.format.combine(
winston.format.timestamp({ format: 'YYYY-MM-DD HH:mm:ss' }),
winston.format.colorize({ all: true }),
winston.format.printf(info => {
const { timestamp, level, message, ...meta } = info;
return `${timestamp} [${level}]: ${message} ${Object.keys(meta).length ? JSON.stringify(meta, null, 2) : ''}`;
})
);
const jsonFormat = winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
);
export const logger = winston.createLogger({
levels: logLevels,
level: process.env.LOG_LEVEL || 'info',
transports: [
// Console output for development
new winston.transports.Console({
format: process.env.NODE_ENV === 'production' ? jsonFormat : consoleFormat
}),
// File output for production
new winston.transports.File({
filename: 'logs/error.log',
level: 'error',
format: jsonFormat
}),
new winston.transports.File({
filename: 'logs/combined.log',
format: jsonFormat
})
]
});
// Request logging middleware
export const requestLogger = (req: Request, res: Response, next: NextFunction) => {
const startTime = Date.now();
res.on('finish', () => {
const duration = Date.now() - startTime;
logger.http('HTTP Request', {
method: req.method,
url: req.url,
statusCode: res.statusCode,
duration,
userAgent: req.get('user-agent'),
ip: req.ip,
userId: req.user?.id
});
});
next();
};
Contextual Logging
// Add context to all logs within a request
import { AsyncLocalStorage } from 'async_hooks';
const asyncLocalStorage = new AsyncLocalStorage<Map<string, any>>();
// Middleware to create request context
export const contextMiddleware = (
req: Request,
res: Response,
next: NextFunction
) => {
const context = new Map<string, any>();
context.set('requestId', generateRequestId());
context.set('userId', req.user?.id);
context.set('method', req.method);
context.set('url', req.url);
asyncLocalStorage.run(context, () => next());
};
// Logger with automatic context
class ContextLogger {
private getContext(): Record<string, any> {
const context = asyncLocalStorage.getStore();
if (!context) return {};
return Object.fromEntries(context.entries());
}
info(message: string, meta?: Record<string, any>) {
logger.info(message, { ...this.getContext(), ...meta });
}
error(message: string, meta?: Record<string, any>) {
logger.error(message, { ...this.getContext(), ...meta });
}
warn(message: string, meta?: Record<string, any>) {
logger.warn(message, { ...this.getContext(), ...meta });
}
debug(message: string, meta?: Record<string, any>) {
logger.debug(message, { ...this.getContext(), ...meta });
}
}
export const contextLogger = new ContextLogger();
// Usage
app.use(contextMiddleware);
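app.get('/api/users/:id', asyncHandler(async (req, res) => {
// All logs automatically include requestId, userId, method, url
contextLogger.info('Fetching user');
const user = await prisma.user.findUnique({
where: { id: req.params.id }
});
// findUnique resolves to null when no row matches
if (!user) {
throw new NotFoundError('User', req.params.id);
}
// A separate key avoids overwriting the requesting user's userId from the context
contextLogger.info('User fetched successfully', { fetchedUserId: user.id });
res.json(user);
}));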
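The property that makes this work is that AsyncLocalStorage keeps a separate store for each asynchronous call chain, even when several requests interleave on the event loop. The sketch below demonstrates that without Express; requestContext, currentRequestId, handle, and demo are hypothetical names, and the timers stand in for real I/O.

```typescript
import { AsyncLocalStorage } from 'async_hooks';

const requestContext = new AsyncLocalStorage<{ requestId: string }>();

// Reads the context of whichever async call chain is currently executing.
function currentRequestId(): string | undefined {
  return requestContext.getStore()?.requestId;
}

// Simulates one request: enters its own context, waits, then reads the context back.
async function handle(requestId: string, delayMs: number): Promise<string | undefined> {
  return requestContext.run({ requestId }, async () => {
    // Yield to the event loop so the two simulated requests interleave.
    await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    return currentRequestId();
  });
}

// Starts both simulated requests concurrently; the slower one starts first.
async function demo(): Promise<Array<string | undefined>> {
  return Promise.all([handle('req-a', 20), handle('req-b', 5)]);
}

demo().then(ids => console.log(ids));
```

Each call sees its own requestId even though the second finishes before the first, and code running outside any run() callback sees no store at all.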
Log Levels and Best Practices
// ERROR - Critical issues requiring immediate attention
logger.error('Payment processing failed', {
orderId: order.id,
amount: order.total,
error: error.message,
stack: error.stack
});
// WARN - Potential issues that don't stop execution
logger.warn('Cache miss for frequently accessed data', {
key: 'user:profile:123',
hitRate: 0.65
});
// INFO - Important business events
logger.info('Order created', {
orderId: order.id,
userId: order.userId,
total: order.total,
items: order.items.length
});
// HTTP - Request/response logging
logger.http('API request', {
method: 'GET',
url: '/api/products',
statusCode: 200,
duration: 45
});
// DEBUG - Detailed diagnostic information
logger.debug('Database query executed', {
query: 'SELECT * FROM users WHERE id = ?',
params: [userId],
duration: 12
});
// ❌ Bad - logging sensitive data
logger.info('User login', {
email: user.email,
password: user.password // Never log passwords!
});
// ✅ Good - redact sensitive data
logger.info('User login', {
email: user.email,
userId: user.id
});
// ❌ Bad - logging in loops
users.forEach(user => {
logger.info('Processing user', { userId: user.id });
processUser(user);
});
// ✅ Good - aggregate logging
const startTime = Date.now();
logger.info('Processing users', {
count: users.length,
startTime
});
users.forEach(user => processUser(user));
logger.info('Users processed', {
count: users.length,
duration: Date.now() - startTime
});
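Leaving sensitive fields out by hand is easy to forget, so some teams add a redaction step before metadata reaches the logger. The following is a minimal sketch of that idea for plain JSON-like metadata; redact, SENSITIVE_KEYS, and MAX_DEPTH are hypothetical names, and the key list is an assumption to adapt per application.

```typescript
// Assumed list of field names to mask; compared case-insensitively.
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'cookie', 'secret']);
const MAX_DEPTH = 5;

// Hypothetical helper: returns a copy of `value` with sensitive fields masked.
// Intended for plain objects and arrays; class instances such as Date or Error
// are not preserved and should be converted before being passed in.
function redact(value: unknown, depth = 0): unknown {
  if (typeof value !== 'object' || value === null) return value;
  // Stop descending on very deep (or circular) structures.
  if (depth >= MAX_DEPTH) return '[Truncated]';
  if (Array.isArray(value)) return value.map(item => redact(item, depth + 1));

  const result: Record<string, unknown> = {};
  for (const [key, inner] of Object.entries(value)) {
    result[key] = SENSITIVE_KEYS.has(key.toLowerCase())
      ? '[REDACTED]'
      : redact(inner, depth + 1);
  }
  return result;
}

const loginMeta = {
  email: 'user@example.com',
  password: 'hunter2',
  headers: { Authorization: 'Bearer abc', accept: 'application/json' }
};

console.log(JSON.stringify(redact(loginMeta)));
```

The input object is left unmodified, so the same data can still be used by the request handler after it has been logged.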
Error Monitoring and Alerting
Sentry Integration
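// Note: this example targets the Sentry Node SDK v7 API (Sentry.Handlers, Sentry.Integrations,
// startTransaction). Version 8 and later replaced these, so check the migration guide before upgrading.
import * as Sentry from '@sentry/node';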
import { ProfilingIntegration } from '@sentry/profiling-node';
Sentry.init({
dsn: process.env.SENTRY_DSN,
environment: process.env.NODE_ENV,
integrations: [
new ProfilingIntegration(),
new Sentry.Integrations.Http({ tracing: true }),
new Sentry.Integrations.Express({ app })
],
tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
profilesSampleRate: 0.1,
beforeSend(event, hint) {
// Filter sensitive data
if (event.request) {
delete event.request.cookies;
delete event.request.headers?.['authorization'];
}
return event;
}
});
// Request handler must be first
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());
// Routes
app.get('/api/users/:id', async (req, res) => {
const transaction = Sentry.startTransaction({
op: 'http.server',
name: 'GET /api/users/:id'
});
try {
const span = transaction.startChild({
op: 'db.query',
description: 'Fetch user from database'
});
const user = await prisma.user.findUnique({
where: { id: req.params.id }
});
span.finish();
if (!user) {
throw new NotFoundError('User', req.params.id);
}
res.json(user);
} catch (error) {
// Add context to error
Sentry.setContext('user', {
id: req.params.id,
requestedBy: req.user?.id
});
Sentry.captureException(error);
throw error;
} finally {
transaction.finish();
}
});
// Error handler must be after routes
app.use(Sentry.Handlers.errorHandler());
app.use(errorHandler);
Custom Error Tracking
class ErrorTracker {
private errors = new Map<string, {
count: number;
lastOccurrence: Date;
samples: Error[];
}>();
track(error: Error): void {
const key = `${error.name}:${error.message}`;
const existing = this.errors.get(key);
if (existing) {
existing.count++;
existing.lastOccurrence = new Date();
// Keep the 5 most recent samples
existing.samples.push(error);
if (existing.samples.length > 5) {
existing.samples.shift();
}
} else {
this.errors.set(key, {
count: 1,
lastOccurrence: new Date(),
samples: [error]
});
}
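// Alert once, when the count first reaches the threshold, instead of on every later occurrence
if (existing && existing.count === 10) {
this.alertFrequentError(key, existing).catch(alertError => {
logger.error('Failed to send alert', { errorKey: key, alertError });
});
}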
}
private async alertFrequentError(
key: string,
data: { count: number; lastOccurrence: Date; samples: Error[] }
): Promise<void> {
logger.error('Frequent error detected', {
errorKey: key,
occurrences: data.count,
lastOccurrence: data.lastOccurrence,
sample: {
message: data.samples[0].message,
stack: data.samples[0].stack
}
});
// Send to alerting system (PagerDuty, Slack, etc.)
await this.sendAlert({
severity: 'high',
title: 'Frequent Error Detected',
description: `Error "${key}" occurred ${data.count} times in the last hour`,
context: data.samples[0]
});
}
getStats(): Array<{ key: string; count: number; lastOccurrence: Date }> {
return Array.from(this.errors.entries()).map(([key, data]) => ({
key,
count: data.count,
lastOccurrence: data.lastOccurrence
}));
}
reset(): void {
this.errors.clear();
}
}
export const errorTracker = new ErrorTracker();
// Reset stats hourly
setInterval(() => errorTracker.reset(), 60 * 60 * 1000);
// Use in error handler
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
errorTracker.track(err);
errorHandler(err, req, res, next);
});
Graceful Degradation Patterns
Circuit Breaker
Prevent cascading failures by stopping requests to failing services:
class CircuitBreaker {
private failures = 0;
private lastFailureTime?: Date;
private state: 'closed' | 'open' | 'half-open' = 'closed';
constructor(
private threshold: number = 5,
private timeout: number = 60000, // 1 minute
private halfOpenRequests: number = 3
) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === 'open') {
// Check if timeout has passed
if (
this.lastFailureTime &&
Date.now() - this.lastFailureTime.getTime() > this.timeout
) {
this.state = 'half-open';
this.failures = 0;
} else {
throw new Error('Circuit breaker is OPEN');
}
}
try {
const result = await fn();
// Any success closes the circuit and clears the failure count,
// so only consecutive failures can trip the breaker
this.state = 'closed';
this.failures = 0;
return result;
} catch (error) {
this.failures++;
this.lastFailureTime = new Date();
// A failed trial request in half-open re-opens the circuit immediately
if (this.state === 'half-open' || this.failures >= this.threshold) {
this.state = 'open';
logger.warn('Circuit breaker opened', {
failures: this.failures,
threshold: this.threshold
});
}
throw error;
}
}
getState(): string {
return this.state;
}
}
// Usage
const paymentServiceBreaker = new CircuitBreaker(5, 60000);
async function processPayment(orderId: string, amount: number) {
try {
return await paymentServiceBreaker.execute(async () => {
return await paymentService.charge({ orderId, amount });
});
} catch (error) {
if (error.message === 'Circuit breaker is OPEN') {
// Fallback behavior
logger.warn('Payment service unavailable, queueing for later', {
orderId,
amount
});
await queuePayment({ orderId, amount });
return { status: 'queued' };
}
throw error;
}
}
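Matching on error.message to detect an open circuit is brittle, because any change to the message text silently disables the fallback. One alternative is a dedicated error class that callers check with instanceof. The compact sketch below illustrates this under stated assumptions: CircuitOpenError, SimpleBreaker, and chargeOrQueue are hypothetical names, the clock is injected so the behaviour can be tested without waiting, and logging is omitted.

```typescript
// Hypothetical error type so callers can detect an open circuit with instanceof.
class CircuitOpenError extends Error {
  constructor() {
    super('Circuit breaker is OPEN');
    this.name = 'CircuitOpenError';
  }
}

class SimpleBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    // Once the cooldown has elapsed, the next call is a half-open trial.
    const halfOpen =
      this.openedAt !== null && this.now() - this.openedAt >= this.cooldownMs;
    if (this.openedAt !== null && !halfOpen) {
      throw new CircuitOpenError(); // fail fast while cooling down
    }
    try {
      const result = await fn();
      // Success closes the circuit and clears the failure count.
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (error) {
      this.failures++;
      // A failed trial call, or too many consecutive failures, (re)opens the circuit.
      if (halfOpen || this.failures >= this.threshold) {
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}

// The caller falls back only when the breaker itself rejected the call.
async function chargeOrQueue(
  breaker: SimpleBreaker,
  charge: () => Promise<unknown>
): Promise<'charged' | 'queued'> {
  try {
    await breaker.execute(charge);
    return 'charged';
  } catch (error) {
    if (error instanceof CircuitOpenError) return 'queued';
    throw error;
  }
}
```

Errors thrown by the payment call itself still propagate to the caller, so a real failure is never mistaken for an open circuit.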
Retry with Exponential Backoff
async function retry<T>(
fn: () => Promise<T>,
maxAttempts: number = 3,
baseDelay: number = 1000
): Promise<T> {
let lastError: Error;
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
return await fn();
} catch (error) {
lastError = error;
if (attempt === maxAttempts) {
logger.error('All retry attempts failed', {
attempts: maxAttempts,
error: error.message
});
break;
}
// Exponential backoff with jitter
const delay = baseDelay * Math.pow(2, attempt - 1);
const jitter = Math.random() * delay * 0.1;
const totalDelay = delay + jitter;
logger.warn('Retry attempt failed', {
attempt,
maxAttempts,
nextRetryIn: totalDelay,
error: error.message
});
await new Promise(resolve => setTimeout(resolve, totalDelay));
}
}
throw lastError!;
}
// Usage
const user = await retry(
() => externalAPI.fetchUser(userId),
3,
1000
);
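The delay arithmetic in retry is easier to reason about when isolated. The sketch below extracts it into a pure function with an injectable random source; backoffDelay and its jitterFraction and random parameters are hypothetical names, not part of the function above.

```typescript
// Hypothetical helper: delay before the retry that follows failed attempt `attempt` (1-based).
function backoffDelay(
  attempt: number,
  baseDelay: number,
  jitterFraction: number = 0.1,
  random: () => number = Math.random
): number {
  // The delay doubles with each attempt: base, 2x base, 4x base, ...
  const delay = baseDelay * Math.pow(2, attempt - 1);
  // Jitter adds up to jitterFraction of the delay so clients do not retry in lockstep.
  return delay + random() * delay * jitterFraction;
}

// With jitter disabled the schedule is exactly base, 2x base, 4x base, ...
const schedule = [1, 2, 3, 4].map(attempt => backoffDelay(attempt, 1000, 0));
console.log(schedule); // [ 1000, 2000, 4000, 8000 ]
```

With the defaults used earlier (maxAttempts of 3 and baseDelay of 1000), retry sleeps only after the first and second attempts, so the total wait before giving up is between 3000 and 3300 ms.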
Timeout Wrapper
async function withTimeout<T>(
promise: Promise<T>,
timeoutMs: number,
timeoutError: string = 'Operation timed out'
): Promise<T> {
let timeoutId: NodeJS.Timeout;
const timeout = new Promise<never>((_, reject) => {
timeoutId = setTimeout(() => {
reject(new Error(timeoutError));
}, timeoutMs);
});
try {
return await Promise.race([promise, timeout]);
} finally {
clearTimeout(timeoutId!);
}
}
// Usage
try {
const result = await withTimeout(
slowExternalAPI.fetchData(),
5000,
'External API timed out after 5 seconds'
);
} catch (error) {
if (error.message.includes('timed out')) {
// Handle timeout
logger.warn('API request timed out, using cached data');
return getCachedData();
}
throw error;
}
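Promise.race only stops the caller from waiting; the underlying operation keeps running after the timeout fires. When the operation accepts an AbortSignal (as fetch does), the timeout can cancel it instead. The following is a sketch of that variant; withAbortableTimeout and slowOperation are hypothetical names, and slowOperation is a stand-in for a real signal-aware call.

```typescript
// Hypothetical variant of withTimeout that cancels the operation when time runs out.
async function withAbortableTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // The operation is expected to reject when the signal aborts.
    return await operation(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}

// Stand-in for a slow call that honours the signal.
function slowOperation(durationMs: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve('done'), durationMs);
    signal.addEventListener(
      'abort',
      () => {
        clearTimeout(timer); // stop the underlying work
        reject(new Error('Operation aborted'));
      },
      { once: true }
    );
  });
}

withAbortableTimeout(signal => slowOperation(10, signal), 200)
  .then(result => console.log(result));
```

This only helps when the operation actually observes the signal; for calls that ignore it, the Promise.race approach above remains the fallback.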
Log Aggregation and Analysis
Elasticsearch and Kibana
import { Client } from '@elastic/elasticsearch';
const esClient = new Client({
node: process.env.ELASTICSEARCH_URL || 'http://localhost:9200'
});
// Winston transport for Elasticsearch
import TransportStream from 'winston-transport';
class ElasticsearchTransport extends TransportStream {
async log(info: any, callback: () => void) {
try {
await esClient.index({
index: `logs-${new Date().toISOString().slice(0, 10)}`,
document: {
'@timestamp': new Date().toISOString(),
level: info.level,
message: info.message,
...info
}
});
} catch (error) {
console.error('Failed to send log to Elasticsearch:', error);
}
callback();
}
}
logger.add(new ElasticsearchTransport());
// Query logs
async function searchLogs(query: string, from: Date, to: Date) {
const { hits } = await esClient.search({
index: 'logs-*',
body: {
query: {
bool: {
must: [
{
query_string: {
query
}
},
{
range: {
'@timestamp': {
gte: from.toISOString(),
lte: to.toISOString()
}
}
}
]
}
},
sort: [{ '@timestamp': 'desc' }],
size: 100
}
});
return hits.hits.map(hit => hit._source);
}
Real-World Examples
Airbnb's Error Handling
Airbnb uses multi-layered error handling:
- Client-side: Fallback UI components for failed loads
- API Gateway: Circuit breakers for backend services
- Service Layer: Retry logic with exponential backoff
- Monitoring: Real-time error rate alerts triggering auto-rollback
Their system recovers from partial outages without user-visible errors, maintaining booking functionality even when search is degraded.
Stripe's Logging Strategy
Stripe implements comprehensive logging:
- Structured JSON logs: Every request logged with 50+ metadata fields
- Log sampling: 100% for errors, 1% for successful requests at scale
- Retention: 30 days hot, 1 year cold storage
- Alerting: Automated alerts on error rate spikes (>1% increase)
This enables Stripe to debug payment issues across billions of transactions with average resolution time under 15 minutes.
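The sampling rule in the list above (keep every error, keep a small fraction of successes) can be sketched in a few lines. This is an illustration of the general technique, not Stripe's implementation; shouldLogRequest and its parameters are hypothetical names.

```typescript
// Hypothetical sampling decision: always keep failures, sample successes.
function shouldLogRequest(
  failed: boolean,
  successSampleRate: number = 0.01,
  random: () => number = Math.random
): boolean {
  if (failed) return true; // 100% of errors
  return random() < successSampleRate; // about 1% of successful requests by default
}

// Rough check of the sampling rate over many simulated successful requests.
let kept = 0;
const total = 100000;
for (let i = 0; i < total; i++) {
  if (shouldLogRequest(false)) kept++;
}
console.log(`kept ${kept} of ${total} successful requests`);
```

Because the decision is random per request, the kept count varies from run to run around 1% of the total.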
Netflix's Chaos Engineering
Netflix intentionally injects failures to test error handling:
- Chaos Monkey: Randomly terminates instances
- Latency Monkey: Adds artificial delays
- Conformity Monkey: Shuts down non-compliant instances
Their error handling ensures zero user impact during individual service failures, maintaining 99.99% streaming uptime across 200M+ subscribers.
Conclusion
Robust error handling and comprehensive logging are essential for production systems. Implement structured logging with appropriate levels, handle errors gracefully with circuit breakers and retries, and use error monitoring tools like Sentry to catch issues before users report them.
Key patterns - custom error types for different scenarios, async error handling with middleware, contextual logging with request IDs, and graceful degradation with fallbacks - provide the foundation for resilient applications. Monitor error rates, set up intelligent alerting, and use log aggregation to quickly diagnose and resolve production issues.
Effective observability can substantially reduce MTTR while improving user experience and system reliability.
Written by StaticBlock Editorial
StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.