Database Query Optimization: PostgreSQL and MySQL Performance Tuning Guide
Master database query optimization with this comprehensive guide to PostgreSQL and MySQL performance tuning. Learn indexing strategies, query plan analysis with EXPLAIN, connection pooling, table partitioning, vacuum and maintenance operations, and production-ready optimization techniques. Includes real-world examples, benchmarks, and monitoring strategies for high-traffic applications.
Introduction
Database queries are often the primary bottleneck in web applications. A poorly optimized query can take seconds instead of milliseconds, causing timeout errors, degraded user experience, and wasted infrastructure costs. Even with modern hardware, inefficient queries can bring an application to its knees.
This comprehensive guide covers practical query optimization techniques for PostgreSQL and MySQL, two of the most popular relational databases. You'll learn indexing strategies, query plan analysis, connection pooling, partitioning, and production-ready optimization patterns backed by real-world examples and benchmarks.
Understanding Query Performance
The Query Lifecycle
1. Parse SQL  →  2. Generate Plan  →  3. Execute            →  4. Return Results
   (syntax)        (optimizer)          (execution engine)       (network transfer)
Bottlenecks typically occur in:
- Query planning: Complex joins, subqueries
- Execution: Missing indexes, full table scans
- Data transfer: Large result sets
Key Performance Metrics
1. Query Execution Time
-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 12345;
-- MySQL
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 12345;
2. Rows Examined vs. Rows Returned
Ideal ratio: 1:1 (examined == returned)
Bad ratio: 1000:1 (examined 1000 rows, returned 1)
3. Index Usage
-- PostgreSQL: Check if indexes are used
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, tablename;
-- MySQL: Check index cardinality
SHOW INDEX FROM orders;
Indexing Strategies
When to Create Indexes
✅ Create indexes for:
- Primary keys (automatic)
- Foreign keys
- Frequently filtered columns (WHERE clause)
- Frequently joined columns
- Columns in ORDER BY or GROUP BY
❌ Avoid indexes for:
- Small tables (< 1000 rows)
- Columns with low cardinality (e.g., boolean)
- Columns frequently updated
- Wide columns (large text/blobs)
Index Types
1. B-Tree Index (Default)
Use for: Equality and range queries
-- PostgreSQL & MySQL
CREATE INDEX idx_users_email ON users(email);
-- Range queries
SELECT * FROM users WHERE created_at > '2025-01-01';
Performance:
Without index: 15,000ms (full table scan)
With index: 12ms (index scan)
Improvement: 1,250x faster
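This kind of gap is easy to reproduce on any engine. Here is a minimal sketch using Python's built-in sqlite3 module (an illustration only: syntax and plan wording differ from PostgreSQL and MySQL, and the timings above come from a much larger table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite's analogue of EXPLAIN;
    # the human-readable node description is the last column of each row
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM users WHERE email = 'user42@example.com'"
before = plan(query)   # reports a full scan of the table
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(query)    # reports a search using idx_users_email
print(before)
print(after)
```

The same query goes from a scan over every row to a lookup on the new index, which is exactly the shift behind the benchmark numbers above.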
2. Composite Index
Use for: Multi-column queries
-- Order matters! Most selective column first
CREATE INDEX idx_orders_user_status
ON orders(user_id, status, created_at);
-- ✅ GOOD: Uses index fully
SELECT * FROM orders
WHERE user_id = 123 AND status = 'pending'
ORDER BY created_at DESC;
-- ⚠️ PARTIAL: Uses only user_id
SELECT * FROM orders
WHERE user_id = 123
ORDER BY created_at DESC;
-- ❌ BAD: Doesn't use index (missing user_id prefix)
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC;
Rule of thumb: Index columns in order of:
1. Equality conditions (WHERE id = ?)
2. Range conditions (WHERE date > ?)
3. Sort columns (ORDER BY)
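The leftmost-prefix rule can be verified straight from the plan output. A small sketch with Python's sqlite3 (illustrative; PostgreSQL and MySQL print plans differently, but the prefix behavior is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "status TEXT, created_at TEXT)"
)
conn.execute(
    "CREATE INDEX idx_orders_user_status ON orders(user_id, status, created_at)"
)

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# predicate covers the leading columns: the composite index is usable
full = plan("SELECT * FROM orders WHERE user_id = 1 AND status = 'pending'")

# leading user_id column missing: the planner falls back to a scan
no_prefix = plan("SELECT * FROM orders WHERE status = 'pending'")

print(full)
print(no_prefix)
```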
3. Partial Index (PostgreSQL)
Use for: Filtering a subset of rows
-- Index only active users
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';
-- Index only recent orders (the predicate must be immutable, so use a
-- literal cutoff and recreate the index periodically; NOW() is not allowed
-- in a partial index predicate)
CREATE INDEX idx_recent_orders
ON orders(user_id, created_at)
WHERE created_at > '2025-01-01';
Benefits:
- Smaller index size
- Faster writes (fewer index updates)
- Faster reads (more targeted)
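SQLite also supports partial indexes, so the behavior is easy to demonstrate from Python; this is an illustration, and the PostgreSQL syntax above is what you would use in production:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT)")
# partial index: only rows with status = 'active' are indexed
conn.execute("CREATE INDEX idx_active_users ON users(email) WHERE status = 'active'")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# the query predicate implies the index predicate, so the index is usable
active = plan(
    "SELECT * FROM users WHERE email = 'user@example.com' AND status = 'active'")

# this predicate does not imply status = 'active'; the index cannot be used
inactive = plan(
    "SELECT * FROM users WHERE email = 'user@example.com' AND status = 'inactive'")

print(active)
print(inactive)
```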
4. Covering Index
Use for: Avoiding table lookups
-- PostgreSQL
CREATE INDEX idx_users_covering
ON users(email) INCLUDE (name, created_at);
-- Query satisfied entirely by index
SELECT email, name, created_at
FROM users
WHERE email = 'user@example.com';
-- MySQL (include all columns in index)
CREATE INDEX idx_users_covering
ON users(email, name, created_at);
5. Full-Text Index
PostgreSQL (GIN):
-- Add tsvector column
ALTER TABLE articles
ADD COLUMN search_vector tsvector;
-- Update search vector
UPDATE articles
SET search_vector = to_tsvector('english', title || ' ' || body);
-- Create index
CREATE INDEX idx_articles_search
ON articles USING GIN(search_vector);
-- Query
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('postgresql & optimization');
MySQL (FULLTEXT):
-- Create full-text index
CREATE FULLTEXT INDEX idx_articles_search
ON articles(title, body);
-- Query
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('postgresql optimization' IN BOOLEAN MODE);
EXPLAIN: Understanding Query Plans
PostgreSQL EXPLAIN
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;
Reading EXPLAIN output:
Limit  (cost=1234.56..1234.60 rows=10 width=32) (actual time=45.123..45.145 rows=10 loops=1)
  ->  Sort  (cost=1234.56..1235.89 rows=532 width=32) (actual time=45.120..45.125 rows=10 loops=1)
        Sort Key: (count(o.id)) DESC
        ->  HashAggregate  (cost=1200.00..1220.00 rows=532 width=32)
              ->  Hash Left Join  (cost=50.00..1180.00 rows=5320 width=24)
                    ->  Seq Scan on users u  (cost=0.00..25.00 rows=532 width=20)
                          Filter: (created_at > '2025-01-01'::date)
                    ->  Hash  (cost=30.00..30.00 rows=1000 width=8)
                          ->  Seq Scan on orders o  (cost=0.00..30.00 rows=1000 width=8)
Key metrics:
- cost: Estimated units (not real time)
- rows: Estimated rows
- actual time: Real execution time (ms)
- loops: Number of times node executed
Red flags:
- Seq Scan on large tables (add index!)
- High loops count (nested loops on large datasets)
- Large difference between estimated and actual rows
MySQL EXPLAIN
EXPLAIN FORMAT=JSON
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;
Output:
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | u | NULL | ALL | NULL | NULL | NULL | NULL | 5320 | 33.33 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | o | NULL | ref | user_id | user_id | 5 | u.id | 10 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
Key columns:
- type: Join type (const > eq_ref > ref > range > index > ALL)
- possible_keys: Indexes considered
- key: Index actually used
- rows: Estimated rows examined
- Extra: Additional info (watch for "Using filesort", "Using temporary")
Red flags:
- type = ALL (full table scan)
- Extra = Using filesort (expensive sort)
- Extra = Using temporary (temp table created)
Common Query Optimization Patterns
1. Avoiding N+1 Queries
❌ BAD: N+1 Query Problem:
# Get users
users = db.query("SELECT * FROM users LIMIT 100")

# For each user, get orders (100 queries!)
for user in users:
    orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")
    user.orders = orders

# Total: 101 queries
✅ GOOD: Single JOIN:
# One query with JOIN
result = db.query("""
SELECT
u.id, u.name, u.email,
o.id as order_id, o.total, o.created_at
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.id IN (SELECT id FROM users LIMIT 100)
""")
# Total: 1 query
# (note: MySQL does not allow LIMIT inside an IN subquery; wrap it in a
# derived table there)
✅ BETTER: Separate queries with IN clause:
# Get users
users = db.query("SELECT * FROM users LIMIT 100")
user_ids = [u.id for u in users]
# Get all orders in one query
orders = db.query("""
SELECT * FROM orders
WHERE user_id IN (%s)
""", user_ids)
# Group by user_id in application code

# Total: 2 queries (predictable performance)
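The "2 queries" pattern can be checked concretely by counting round trips. A sketch with Python's sqlite3; the query helper and the tiny schema are hypothetical stand-ins for whatever data-access layer you use:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 9.5), (2, 1, 3.0), (3, 2, 7.25);
""")

query_count = 0

def query(sql, args=()):
    # stand-in for a data-access layer; counts round trips to the database
    global query_count
    query_count += 1
    return conn.execute(sql, args).fetchall()

# query 1: the page of users
users = query("SELECT id, name FROM users")
user_ids = [u[0] for u in users]

# query 2: all of their orders at once via an IN list
placeholders = ",".join("?" * len(user_ids))
orders = query(
    f"SELECT id, user_id, total FROM orders WHERE user_id IN ({placeholders})",
    user_ids,
)

# group in application code
orders_by_user = {}
for order_id, user_id, total in orders:
    orders_by_user.setdefault(user_id, []).append((order_id, total))

print(query_count)  # 2, regardless of how many users were fetched
```

With the N+1 version, the count would be 1 + len(users); here it stays at 2 no matter how large the page is.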
2. Pagination
❌ BAD: OFFSET pagination on large datasets:
-- Page 1: Fast (5ms)
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;
-- Page 1000: Slow (2,500ms) - scans 20,000 rows!
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 20000;
✅ GOOD: Cursor-based pagination:
-- Page 1
SELECT * FROM orders
WHERE id > 0
ORDER BY id DESC
LIMIT 20;
-- Returns: ids 100000-99981, cursor = 99981
-- Page 2 (using cursor)
SELECT * FROM orders
WHERE id < 99981
ORDER BY id DESC
LIMIT 20;
-- Fast regardless of page number!
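A runnable sketch of cursor-based (keyset) pagination, using Python's sqlite3 for illustration; the page function and cursor handling are one possible shape, not a prescribed API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO orders (total) VALUES (?)", [(float(i),) for i in range(100)]
)

def page(cursor=None, size=20):
    # keyset pagination: filter on the last id seen instead of using OFFSET
    if cursor is None:
        rows = conn.execute(
            "SELECT id FROM orders ORDER BY id DESC LIMIT ?", (size,))
    else:
        rows = conn.execute(
            "SELECT id FROM orders WHERE id < ? ORDER BY id DESC LIMIT ?",
            (cursor, size))
    ids = [r[0] for r in rows]
    return ids, (ids[-1] if ids else None)

page1, cursor = page()        # ids 100..81, cursor = 81
page2, cursor = page(cursor)  # ids 80..61
print(page1[0], page1[-1], page2[0])  # 100 81 80
```

Every page is a single index range scan on the primary key, so page 1000 costs the same as page 1.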
3. Aggregations
❌ BAD: COUNT(*) on large tables:
-- Slow: Full table scan
SELECT COUNT(*) FROM orders;
-- Time: 15 seconds on 10M rows
✅ GOOD: Approximate count:
-- PostgreSQL: Fast estimate
SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE relname = 'orders';
-- Time: 1ms
-- MySQL: Fast estimate
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_NAME = 'orders';
-- Time: 1ms
✅ BETTER: Materialized count:
-- Create counter table
CREATE TABLE order_stats (
date DATE PRIMARY KEY,
total_count BIGINT
);
-- Update via trigger or cron job
INSERT INTO order_stats (date, total_count)
VALUES (CURRENT_DATE, (SELECT COUNT(*) FROM orders))
ON CONFLICT (date) DO UPDATE
SET total_count = EXCLUDED.total_count;
-- Query: Instant
SELECT total_count FROM order_stats
WHERE date = CURRENT_DATE;
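The counter-table pattern ports directly to application code. A sketch with Python's sqlite3 (SQLite's upsert syntax happens to match PostgreSQL's ON CONFLICT here; MySQL would use ON DUPLICATE KEY UPDATE instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_stats (date TEXT PRIMARY KEY, total_count INTEGER);
""")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 1001)])

def refresh_stats(day):
    # the expensive COUNT(*) runs once here (e.g. from a cron job),
    # not on every read
    conn.execute("""
        INSERT INTO order_stats (date, total_count)
        VALUES (?, (SELECT COUNT(*) FROM orders))
        ON CONFLICT (date) DO UPDATE SET total_count = excluded.total_count
    """, (day,))

refresh_stats("2025-06-01")
refresh_stats("2025-06-01")  # idempotent thanks to the upsert

count = conn.execute(
    "SELECT total_count FROM order_stats WHERE date = ?", ("2025-06-01",)
).fetchone()[0]
print(count)  # 1000
```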
4. Subquery Optimization
❌ BAD: Correlated subquery:
-- Runs subquery for EACH row (slow!)
SELECT
u.name,
(SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count
FROM users u;
✅ GOOD: JOIN or CTE:
-- Single pass with JOIN
SELECT
u.name,
COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
-- Or with CTE (more readable)
WITH order_counts AS (
SELECT user_id, COUNT(*) as count
FROM orders
GROUP BY user_id
)
SELECT
u.name,
COALESCE(oc.count, 0) as order_count
FROM users u
LEFT JOIN order_counts oc ON u.id = oc.user_id;
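Both forms must return the same result set, which is worth asserting whenever you rewrite a correlated subquery. A quick check with Python's sqlite3 and an illustrative three-user schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'bob'), (3, 'eve');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
""")

# correlated subquery: conceptually re-evaluated per user row
correlated = conn.execute("""
    SELECT u.name,
           (SELECT COUNT(*) FROM orders WHERE user_id = u.id) AS order_count
    FROM users u
    ORDER BY u.id
""").fetchall()

# single-pass LEFT JOIN + GROUP BY rewrite
joined = conn.execute("""
    SELECT u.name, COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

print(correlated)  # [('ada', 2), ('bob', 1), ('eve', 0)]
print(joined)
```

Note the LEFT JOIN is what preserves users with zero orders; an INNER JOIN would silently drop 'eve' and the two queries would no longer agree.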
Table Partitioning
Range Partitioning (PostgreSQL)
-- Create partitioned table
CREATE TABLE orders (
id BIGSERIAL,
user_id BIGINT,
total DECIMAL(10,2),
created_at TIMESTAMP NOT NULL,
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Create partitions (one per month)
CREATE TABLE orders_2025_01 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE orders_2025_02 PARTITION OF orders
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
CREATE TABLE orders_2025_03 PARTITION OF orders
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');
-- Queries automatically use correct partition
SELECT * FROM orders
WHERE created_at BETWEEN '2025-02-01' AND '2025-02-28';
-- Only scans orders_2025_02 partition!
Benefits:
- Faster queries (scan fewer rows)
- Easier maintenance (drop old partitions)
- Better index efficiency
Range Partitioning (MySQL 8.0+)
CREATE TABLE orders (
id BIGINT AUTO_INCREMENT,
user_id BIGINT,
total DECIMAL(10,2),
-- DATETIME, not TIMESTAMP: MySQL partitioning only permits UNIX_TIMESTAMP()
-- on TIMESTAMP columns, so YEAR()/MONTH() require a DATETIME here
created_at DATETIME NOT NULL,
PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at) * 100 + MONTH(created_at)) (
PARTITION p_2025_01 VALUES LESS THAN (202502),
PARTITION p_2025_02 VALUES LESS THAN (202503),
PARTITION p_2025_03 VALUES LESS THAN (202504),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
Connection Pooling
Why Connection Pooling Matters
Without pooling:
Each request:
1. Open TCP connection (50ms)
2. Authenticate (30ms)
3. Execute query (10ms)
4. Close connection (20ms)
Total per request: 110ms
With pooling:
Each request:
1. Get connection from pool (1ms)
2. Execute query (10ms)
3. Return to pool (1ms)
Total per request: 12ms
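The pool mechanics themselves fit in a few lines of Python. This is a toy for illustration, not a replacement for PgBouncer or ProxySQL (no health checks, no overflow handling), and the ConnectionPool class is invented here:

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Toy pool: pay the connection-setup cost once, then reuse."""

    def __init__(self, factory, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())  # open all connections up front

    @contextmanager
    def connection(self, timeout=5):
        conn = self._pool.get(timeout=timeout)  # cheap handoff, no handshake
        try:
            yield conn
        finally:
            self._pool.put(conn)  # return to the pool instead of closing

pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=3)

with pool.connection() as conn:
    result = conn.execute("SELECT 1 + 1").fetchone()[0]

print(result)              # 2
print(pool._pool.qsize())  # 3: the connection went back to the pool
```

The blocking get with a timeout is also what gives you backpressure: when all connections are busy, requests queue instead of opening unbounded connections to the database.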
PostgreSQL: PgBouncer
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; pool settings
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3
; timeouts
server_idle_timeout = 600
server_lifetime = 3600
Application connection:
# Connect to PgBouncer (not directly to PostgreSQL)
DATABASE_URL = "postgresql://user:pass@localhost:6432/mydb"
MySQL: ProxySQL
-- Install ProxySQL
-- Configure backend servers
INSERT INTO mysql_servers(hostgroup_id, hostname, port)
VALUES (1, '127.0.0.1', 3306);
-- Configure users
INSERT INTO mysql_users(username, password, default_hostgroup)
VALUES ('app_user', 'password', 1);
-- Configure connection pool
UPDATE global_variables SET variable_value='25'
WHERE variable_name='mysql-max_connections';
LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL USERS TO DISK;
Vacuum and Maintenance
PostgreSQL VACUUM
-- Manual vacuum
VACUUM ANALYZE orders;
-- Aggressive vacuum (reclaims space)
VACUUM FULL orders;
-- Auto-vacuum settings
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.05,
autovacuum_analyze_scale_factor = 0.02,
autovacuum_vacuum_cost_delay = 10
);
-- Monitor vacuum activity
SELECT
schemaname,
relname,
last_vacuum,
last_autovacuum,
n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
MySQL OPTIMIZE TABLE
-- Reclaim space and rebuild indexes
OPTIMIZE TABLE orders;
-- Check table for errors
CHECK TABLE orders;
-- Analyze table statistics
ANALYZE TABLE orders;
-- Show table status
SHOW TABLE STATUS LIKE 'orders';
Monitoring Query Performance
PostgreSQL: pg_stat_statements
-- Enable extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 10 slowest queries
SELECT
calls,
mean_exec_time,
max_exec_time,
total_exec_time,
query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Queries with most time spent
SELECT
calls,
total_exec_time,
mean_exec_time,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Reset statistics
SELECT pg_stat_statements_reset();
MySQL: Performance Schema
-- Enable performance schema (my.cnf)
-- performance_schema = ON
-- Top 10 slowest statements
SELECT
DIGEST_TEXT as query,
COUNT_STAR as exec_count,
AVG_TIMER_WAIT/1000000000 as avg_ms,
MAX_TIMER_WAIT/1000000000 as max_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;
-- Queries with table scans
SELECT
DIGEST_TEXT,
SUM_NO_INDEX_USED,
SUM_NO_GOOD_INDEX_USED
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_NO_INDEX_USED > 0
ORDER BY SUM_NO_INDEX_USED DESC;
Production Optimization Checklist
Schema Design
- Proper data types (INT vs BIGINT, VARCHAR vs TEXT)
- Normalized tables (avoid duplication)
- Denormalized aggregates (for frequent queries)
- Appropriate constraints (NOT NULL, CHECK, UNIQUE)
Indexing
- Primary keys on all tables
- Foreign keys indexed
- Composite indexes for common queries
- Partial indexes where applicable
- Remove unused indexes
- Monitor index bloat
Queries
- No SELECT *
- LIMIT clauses on large result sets
- Avoid N+1 queries
- Use prepared statements
- Cursor-based pagination for large datasets
- Batch inserts/updates
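Batching is the easiest win on that list. A sketch with Python's sqlite3 showing one statement and one transaction for many rows, instead of a commit per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(f"event-{i}",) for i in range(500)]

# one prepared statement and one transaction instead of 500
# individually committed single-row INSERTs
with conn:
    conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)

inserted = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(inserted)  # 500
```

On a real server the same idea shows up as multi-row VALUES lists, COPY (PostgreSQL), or LOAD DATA (MySQL); the win comes from amortizing round trips and commits.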
Configuration
- Appropriate shared_buffers (PostgreSQL: 25% of RAM)
- Proper work_mem (PostgreSQL: RAM / max_connections / 2)
- Adequate max_connections
- Connection pooling enabled
- Query timeout configured
- Slow query logging enabled
Maintenance
- Regular VACUUM (PostgreSQL)
- Regular ANALYZE
- Index rebuilding schedule
- Partition old data
- Archive strategy for historical data
- Backup and recovery tested
Monitoring
- Query performance tracking
- Connection pool metrics
- Disk I/O monitoring
- Replication lag (if applicable)
- Table bloat monitoring
- Alerting on slow queries
Troubleshooting Common Issues
Issue 1: Slow Queries Suddenly
Diagnosis:
-- PostgreSQL: Check for missing statistics
SELECT
schemaname,
tablename,
last_analyze
FROM pg_stat_user_tables
WHERE last_analyze < NOW() - INTERVAL '7 days';
-- MySQL: Check table statistics
SHOW TABLE STATUS WHERE Name = 'orders';
Solution:
-- PostgreSQL
ANALYZE orders;
-- MySQL
ANALYZE TABLE orders;
Issue 2: High CPU Usage
Diagnosis:
-- PostgreSQL: Find CPU-intensive queries
SELECT
pid,
now() - query_start as duration,
query
FROM pg_stat_activity
WHERE state = 'active'
AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration DESC;
-- MySQL: Find running queries
SHOW PROCESSLIST;
Solution: Optimize queries, add indexes, or kill long-running queries
Issue 3: Connection Exhaustion
Diagnosis:
-- PostgreSQL
SELECT count(*) FROM pg_stat_activity;
SHOW max_connections;
-- MySQL
SHOW STATUS LIKE 'Threads_connected';
SHOW VARIABLES LIKE 'max_connections';
Solution: Implement connection pooling, increase max_connections, or find connection leaks
Benchmarking Results
Index Impact (10M row table)
| Query Type | No Index | B-Tree Index | Improvement |
|---|---|---|---|
| Equality (WHERE id = ?) | 8,500ms | 3ms | 2,833x |
| Range (WHERE date > ?) | 12,300ms | 45ms | 273x |
| Join | 45,000ms | 120ms | 375x |
| ORDER BY | 18,000ms | 250ms | 72x |
Connection Pooling Impact
| Metric | Direct | With Pooler | Improvement |
|---|---|---|---|
| Connections/sec | 150 | 2,500 | 16.6x |
| Latency (p95) | 450ms | 28ms | 16x |
| Resource usage | High | Low | 70% reduction |
Best Practices Summary
- Index strategically - Cover common queries, avoid over-indexing
- Analyze query plans - Use EXPLAIN regularly
- Avoid SELECT * - Only fetch needed columns
- Use connection pooling - Essential for production
- Partition large tables - By date or other logical boundaries
- Monitor continuously - Track slow queries and resource usage
- Regular maintenance - VACUUM, ANALYZE, OPTIMIZE
- Test at scale - Performance degrades non-linearly
- Cache aggressively - Redis/Memcached for hot data
- Profile first - Measure before optimizing
Conclusion
Database optimization is an iterative process. Start with the low-hanging fruit—add missing indexes, fix N+1 queries, enable connection pooling—then progressively tackle more complex optimizations like partitioning and query rewriting.
The techniques in this guide can reduce query times from seconds to milliseconds, increase throughput by 10-100x, and dramatically reduce infrastructure costs. However, optimization is never "done"—monitor continuously, benchmark regularly, and stay vigilant as data grows.
Key Takeaways:
- Indexes are critical but can't solve everything
- EXPLAIN is your friend - use it religiously
- N+1 queries are the most common performance killer
- Connection pooling is mandatory for production
- Partitioning extends table lifespan at scale
- Monitoring catches issues before users do
- Regular maintenance prevents degradation
- Test with production-scale data early
Master these fundamentals, and your database will scale gracefully from thousands to millions of users.
Additional Resources
- PostgreSQL Performance: https://www.postgresql.org/docs/current/performance-tips.html
- MySQL Optimization: https://dev.mysql.com/doc/refman/8.0/en/optimization.html
- Use The Index, Luke: https://use-the-index-luke.com/
- PgBouncer: https://www.pgbouncer.org/
- ProxySQL: https://proxysql.com/
- pg_stat_statements: https://www.postgresql.org/docs/current/pgstatstatements.html
Written by StaticBlock Editorial
StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.