Database Query Optimization: PostgreSQL and MySQL Performance Tuning Guide
Master database query optimization with this comprehensive guide to PostgreSQL and MySQL performance tuning. Learn indexing strategies, query plan analysis with EXPLAIN, connection pooling, table partitioning, vacuum and maintenance operations, and production-ready optimization techniques. Includes real-world examples, benchmarks, and monitoring strategies for high-traffic applications.
Introduction
Database queries are often the primary bottleneck in web applications. A poorly optimized query can take seconds instead of milliseconds, causing timeout errors, degraded user experience, and wasted infrastructure costs. Even with modern hardware, inefficient queries can bring an application to its knees.
This comprehensive guide covers practical query optimization techniques for PostgreSQL and MySQL, two of the most popular relational databases. You'll learn indexing strategies, query plan analysis, connection pooling, partitioning, and production-ready optimization patterns backed by real-world examples and benchmarks.
Understanding Query Performance
The Query Lifecycle
1. Parse SQL  →  2. Generate Plan  →  3. Execute            →  4. Return Results
   (syntax)        (optimizer)          (execution engine)       (network transfer)
Bottlenecks typically occur in:
- Query planning: Complex joins, subqueries
- Execution: Missing indexes, full table scans
- Data transfer: Large result sets
Key Performance Metrics
1. Query Execution Time
-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 12345;
-- MySQL
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 12345;
2. Rows Examined vs. Rows Returned
Ideal ratio: 1:1 (examined == returned)
Bad ratio: 1000:1 (examined 1000 rows, returned 1)
3. Index Usage
-- PostgreSQL: Check if indexes are used
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, tablename;
-- MySQL: Check index cardinality
SHOW INDEX FROM orders;
Indexing Strategies
When to Create Indexes
✅ Create indexes for:
- Primary keys (automatic)
- Foreign keys
- Frequently filtered columns (WHERE clause)
- Frequently joined columns
- Columns in ORDER BY or GROUP BY
❌ Avoid indexes for:
- Small tables (< 1000 rows)
- Columns with low cardinality (e.g., boolean)
- Columns frequently updated
- Wide columns (large text/blobs)
Index Types
1. B-Tree Index (Default)
Use for: Equality and range queries
-- PostgreSQL & MySQL
CREATE INDEX idx_users_email ON users(email);
-- Range queries
SELECT * FROM users WHERE created_at > '2025-01-01';
Performance:
Without index: 15,000ms (full table scan)
With index: 12ms (index scan)
Improvement: 1,250x faster
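This kind of gap is easy to reproduce on any engine. Here is a minimal sketch using Python's built-in sqlite3 module (an illustration only: syntax and plan wording differ from PostgreSQL and MySQL, and the timings above come from a much larger table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite's analogue of EXPLAIN;
    # the human-readable node description is the last column of each row
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM users WHERE email = 'user42@example.com'"
before = plan(query)   # reports a full scan of the table
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(query)    # reports a search using idx_users_email
print(before)
print(after)
```

The same query goes from a scan over every row to a lookup on the new index, which is exactly the shift behind the benchmark numbers above.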
2. Composite Index
Use for: Multi-column queries
-- Order matters! Most selective column first
CREATE INDEX idx_orders_user_status
ON orders(user_id, status, created_at);
-- ✅ GOOD: Uses index fully
SELECT * FROM orders
WHERE user_id = 123 AND status = 'pending'
ORDER BY created_at DESC;
-- ⚠️ PARTIAL: Uses only user_id
SELECT * FROM orders
WHERE user_id = 123
ORDER BY created_at DESC;
-- ❌ BAD: Doesn't use index (missing user_id prefix)
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC;
Rule of thumb: Index columns in order of:
1. Equality conditions (WHERE id = ?)
2. Range conditions (WHERE date > ?)
3. Sort columns (ORDER BY)
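The leftmost-prefix rule can be verified straight from the plan output. A small sketch with Python's sqlite3 (illustrative; PostgreSQL and MySQL print plans differently, but the prefix behavior is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "status TEXT, created_at TEXT)"
)
conn.execute(
    "CREATE INDEX idx_orders_user_status ON orders(user_id, status, created_at)"
)

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# predicate covers the leading columns: the composite index is usable
full = plan("SELECT * FROM orders WHERE user_id = 1 AND status = 'pending'")

# leading user_id column missing: the planner falls back to a scan
no_prefix = plan("SELECT * FROM orders WHERE status = 'pending'")

print(full)
print(no_prefix)
```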
3. Partial Index (PostgreSQL)
Use for: Filtering a subset of rows
-- Index only active users
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';
-- Index only recent orders (the predicate must be immutable, so use a
-- literal cutoff and recreate the index periodically; NOW() is not allowed
-- in a partial index predicate)
CREATE INDEX idx_recent_orders
ON orders(user_id, created_at)
WHERE created_at > '2025-01-01';
Benefits:
- Smaller index size
- Faster writes (fewer index updates)
- Faster reads (more targeted)
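SQLite also supports partial indexes, so the behavior is easy to demonstrate from Python; this is an illustration, and the PostgreSQL syntax above is what you would use in production:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT)")
# partial index: only rows with status = 'active' are indexed
conn.execute("CREATE INDEX idx_active_users ON users(email) WHERE status = 'active'")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# the query predicate implies the index predicate, so the index is usable
active = plan(
    "SELECT * FROM users WHERE email = 'user@example.com' AND status = 'active'")

# this predicate does not imply status = 'active'; the index cannot be used
inactive = plan(
    "SELECT * FROM users WHERE email = 'user@example.com' AND status = 'inactive'")

print(active)
print(inactive)
```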
4. Covering Index
Use for: Avoiding table lookups
-- PostgreSQL
CREATE INDEX idx_users_covering
ON users(email) INCLUDE (name, created_at);
-- Query satisfied entirely by index
SELECT email, name, created_at
FROM users
WHERE email = 'user@example.com';
-- MySQL (include all columns in index)
CREATE INDEX idx_users_covering
ON users(email, name, created_at);
5. Full-Text Index
PostgreSQL (GIN):
-- Add tsvector column
ALTER TABLE articles
ADD COLUMN search_vector tsvector;
-- Update search vector
UPDATE articles
SET search_vector = to_tsvector('english', title || ' ' || body);
-- Create index
CREATE INDEX idx_articles_search
ON articles USING GIN(search_vector);
-- Query
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('postgresql & optimization');
MySQL (FULLTEXT):
-- Create full-text index
CREATE FULLTEXT INDEX idx_articles_search
ON articles(title, body);
-- Query
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('postgresql optimization' IN BOOLEAN MODE);
EXPLAIN: Understanding Query Plans
PostgreSQL EXPLAIN
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;
Reading EXPLAIN output:
Limit  (cost=1234.56..1234.60 rows=10 width=32) (actual time=45.123..45.145 rows=10 loops=1)
  ->  Sort  (cost=1234.56..1235.89 rows=532 width=32) (actual time=45.120..45.125 rows=10 loops=1)
        Sort Key: (count(o.id)) DESC
        ->  HashAggregate  (cost=1200.00..1220.00 rows=532 width=32)
              ->  Hash Left Join  (cost=50.00..1180.00 rows=5320 width=24)
                    ->  Seq Scan on users u  (cost=0.00..25.00 rows=532 width=20)
                          Filter: (created_at > '2025-01-01'::date)
                    ->  Hash  (cost=30.00..30.00 rows=1000 width=8)
                          ->  Seq Scan on orders o  (cost=0.00..30.00 rows=1000 width=8)
Key metrics:
- cost: Estimated units (not real time)
- rows: Estimated rows
- actual time: Real execution time (ms)
- loops: Number of times node executed
Red flags:
- Seq Scan on large tables (add index!)
- High loops count (nested loops on large datasets)
- Large difference between estimated and actual rows
MySQL EXPLAIN
EXPLAIN FORMAT=JSON
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;
Output:
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | u | NULL | ALL | NULL | NULL | NULL | NULL | 5320 | 33.33 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | o | NULL | ref | user_id | user_id | 5 | u.id | 10 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
Key columns:
- type: Join type (const > eq_ref > ref > range > index > ALL)
- possible_keys: Indexes considered
- key: Index actually used
- rows: Estimated rows examined
- Extra: Additional info (watch for "Using filesort", "Using temporary")
Red flags:
- type = ALL (full table scan)
- Extra = Using filesort (expensive sort)
- Extra = Using temporary (temp table created)
Common Query Optimization Patterns
1. Avoiding N+1 Queries
❌ BAD: N+1 Query Problem:
# Get users
users = db.query("SELECT * FROM users LIMIT 100")

# For each user, get orders (100 queries!)
for user in users:
    orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")
    user.orders = orders

# Total: 101 queries
✅ GOOD: Single JOIN:
# One query with JOIN
result = db.query("""
SELECT
u.id, u.name, u.email,
o.id as order_id, o.total, o.created_at
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.id IN (SELECT id FROM users LIMIT 100)
""")
# Total: 1 query
# (note: MySQL does not allow LIMIT inside an IN subquery; wrap it in a
# derived table there)
✅ BETTER: Separate queries with IN clause:
# Get users
users = db.query("SELECT * FROM users LIMIT 100")
user_ids = [u.id for u in users]
# Get all orders in one query
orders = db.query("""
SELECT * FROM orders
WHERE user_id IN (%s)
""", user_ids)
# Group by user_id in application code

# Total: 2 queries (predictable performance)
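The "2 queries" pattern can be checked concretely by counting round trips. A sketch with Python's sqlite3; the query helper and the tiny schema are hypothetical stand-ins for whatever data-access layer you use:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 9.5), (2, 1, 3.0), (3, 2, 7.25);
""")

query_count = 0

def query(sql, args=()):
    # stand-in for a data-access layer; counts round trips to the database
    global query_count
    query_count += 1
    return conn.execute(sql, args).fetchall()

# query 1: the page of users
users = query("SELECT id, name FROM users")
user_ids = [u[0] for u in users]

# query 2: all of their orders at once via an IN list
placeholders = ",".join("?" * len(user_ids))
orders = query(
    f"SELECT id, user_id, total FROM orders WHERE user_id IN ({placeholders})",
    user_ids,
)

# group in application code
orders_by_user = {}
for order_id, user_id, total in orders:
    orders_by_user.setdefault(user_id, []).append((order_id, total))

print(query_count)  # 2, regardless of how many users were fetched
```

With the N+1 version, the count would be 1 + len(users); here it stays at 2 no matter how large the page is.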
2. Pagination
❌ BAD: OFFSET pagination on large datasets:
-- Page 1: Fast (5ms)
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;
-- Page 1000: Slow (2,500ms) - scans 20,000 rows!
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 20000;
✅ GOOD: Cursor-based pagination:
-- Page 1
SELECT * FROM orders
WHERE id > 0
ORDER BY id DESC
LIMIT 20;
-- Returns: ids 100000-99981, cursor = 99981
-- Page 2 (using cursor)
SELECT * FROM orders
WHERE id < 99981
ORDER BY id DESC
LIMIT 20;
-- Fast regardless of page number!
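A runnable sketch of cursor-based (keyset) pagination, using Python's sqlite3 for illustration; the page function and cursor handling are one possible shape, not a prescribed API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO orders (total) VALUES (?)", [(float(i),) for i in range(100)]
)

def page(cursor=None, size=20):
    # keyset pagination: filter on the last id seen instead of using OFFSET
    if cursor is None:
        rows = conn.execute(
            "SELECT id FROM orders ORDER BY id DESC LIMIT ?", (size,))
    else:
        rows = conn.execute(
            "SELECT id FROM orders WHERE id < ? ORDER BY id DESC LIMIT ?",
            (cursor, size))
    ids = [r[0] for r in rows]
    return ids, (ids[-1] if ids else None)

page1, cursor = page()        # ids 100..81, cursor = 81
page2, cursor = page(cursor)  # ids 80..61
print(page1[0], page1[-1], page2[0])  # 100 81 80
```

Every page is a single index range scan on the primary key, so page 1000 costs the same as page 1.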
3. Aggregations
❌ BAD: COUNT(*) on large tables:
-- Slow: Full table scan
SELECT COUNT(*) FROM orders;
-- Time: 15 seconds on 10M rows
✅ GOOD: Approximate count:
-- PostgreSQL: Fast estimate
SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE relname = 'orders';
-- Time: 1ms
-- MySQL: Fast estimate
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_NAME = 'orders';
-- Time: 1ms
✅ BETTER: Materialized count:
-- Create counter table
CREATE TABLE order_stats (
date DATE PRIMARY KEY,
total_count BIGINT
);
-- Update via trigger or cron job
INSERT INTO order_stats (date, total_count)
VALUES (CURRENT_DATE, (SELECT COUNT(*) FROM orders))
ON CONFLICT (date) DO UPDATE
SET total_count = EXCLUDED.total_count;
-- Query: Instant
SELECT total_count FROM order_stats
WHERE date = CURRENT_DATE;
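The counter-table pattern ports directly to application code. A sketch with Python's sqlite3 (SQLite's upsert syntax happens to match PostgreSQL's ON CONFLICT here; MySQL would use ON DUPLICATE KEY UPDATE instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_stats (date TEXT PRIMARY KEY, total_count INTEGER);
""")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 1001)])

def refresh_stats(day):
    # the expensive COUNT(*) runs once here (e.g. from a cron job),
    # not on every read
    conn.execute("""
        INSERT INTO order_stats (date, total_count)
        VALUES (?, (SELECT COUNT(*) FROM orders))
        ON CONFLICT (date) DO UPDATE SET total_count = excluded.total_count
    """, (day,))

refresh_stats("2025-06-01")
refresh_stats("2025-06-01")  # idempotent thanks to the upsert

count = conn.execute(
    "SELECT total_count FROM order_stats WHERE date = ?", ("2025-06-01",)
).fetchone()[0]
print(count)  # 1000
```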
4. Subquery Optimization
❌ BAD: Correlated subquery:
-- Runs subquery for EACH row (slow!)
SELECT
u.name,
(SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count
FROM users u;
✅ GOOD: JOIN or CTE:
-- Single pass with JOIN
SELECT
u.name,
COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
-- Or with CTE (more readable)
WITH order_counts AS (
SELECT user_id, COUNT(*) as count
FROM orders
GROUP BY user_id
)
SELECT
u.name,
COALESCE(oc.count, 0) as order_count
FROM users u
LEFT JOIN order_counts oc ON u.id = oc.user_id;
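Both forms must return the same result set, which is worth asserting whenever you rewrite a correlated subquery. A quick check with Python's sqlite3 and an illustrative three-user schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'bob'), (3, 'eve');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
""")

# correlated subquery: conceptually re-evaluated per user row
correlated = conn.execute("""
    SELECT u.name,
           (SELECT COUNT(*) FROM orders WHERE user_id = u.id) AS order_count
    FROM users u
    ORDER BY u.id
""").fetchall()

# single-pass LEFT JOIN + GROUP BY rewrite
joined = conn.execute("""
    SELECT u.name, COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

print(correlated)  # [('ada', 2), ('bob', 1), ('eve', 0)]
print(joined)
```

Note the LEFT JOIN is what preserves users with zero orders; an INNER JOIN would silently drop 'eve' and the two queries would no longer agree.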
Table Partitioning
Range Partitioning (PostgreSQL)
-- Create partitioned table
CREATE TABLE orders (
id BIGSERIAL,
user_id BIGINT,
total DECIMAL(10,2),
created_at TIMESTAMP NOT NULL,
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Create partitions (one per month)
CREATE TABLE orders_2025_01 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE orders_2025_02 PARTITION OF orders
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
CREATE TABLE orders_2025_03 PARTITION OF orders
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');
-- Queries automatically use correct partition
SELECT * FROM orders
WHERE created_at BETWEEN '2025-02-01' AND '2025-02-28';
-- Only scans orders_2025_02 partition!
Benefits:
- Faster queries (scan fewer rows)
- Easier maintenance (drop old partitions)
- Better index efficiency
Range Partitioning (MySQL 8.0+)
CREATE TABLE orders (
id BIGINT AUTO_INCREMENT,
user_id BIGINT,
total DECIMAL(10,2),
-- DATETIME, not TIMESTAMP: MySQL partitioning only permits UNIX_TIMESTAMP()
-- on TIMESTAMP columns, so YEAR()/MONTH() require a DATETIME here
created_at DATETIME NOT NULL,
PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at) * 100 + MONTH(created_at)) (
PARTITION p_2025_01 VALUES LESS THAN (202502),
PARTITION p_2025_02 VALUES LESS THAN (202503),
PARTITION p_2025_03 VALUES LESS THAN (202504),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
Connection Pooling
Why Connection Pooling Matters
Without pooling:
Each request:
1. Open TCP connection (50ms)
2. Authenticate (30ms)
3. Execute query (10ms)
4. Close connection (20ms)
Total per request: 110ms
With pooling:
Each request:
1. Get connection from pool (1ms)
2. Execute query (10ms)
3. Return to pool (1ms)
Total per request: 12ms
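The pool mechanics themselves fit in a few lines of Python. This is a toy for illustration, not a replacement for PgBouncer or ProxySQL (no health checks, no overflow handling), and the ConnectionPool class is invented here:

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Toy pool: pay the connection-setup cost once, then reuse."""

    def __init__(self, factory, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())  # open all connections up front

    @contextmanager
    def connection(self, timeout=5):
        conn = self._pool.get(timeout=timeout)  # cheap handoff, no handshake
        try:
            yield conn
        finally:
            self._pool.put(conn)  # return to the pool instead of closing

pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=3)

with pool.connection() as conn:
    result = conn.execute("SELECT 1 + 1").fetchone()[0]

print(result)              # 2
print(pool._pool.qsize())  # 3: the connection went back to the pool
```

The blocking get with a timeout is also what gives you backpressure: when all connections are busy, requests queue instead of opening unbounded connections to the database.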
PostgreSQL: PgBouncer
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; pool settings
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3
; timeouts
server_idle_timeout = 600
server_lifetime = 3600
Application connection:
# Connect to PgBouncer (not directly to PostgreSQL)
DATABASE_URL = "postgresql://user:pass@localhost:6432/mydb"
MySQL: ProxySQL
-- Install ProxySQL
-- Configure backend servers
INSERT INTO mysql_servers(hostgroup_id, hostname, port)
VALUES (1, '127.0.0.1', 3306);
-- Configure users
INSERT INTO mysql_users(username, password, default_hostgroup)
VALUES ('app_user', 'password', 1);
-- Configure connection pool
UPDATE global_variables SET variable_value='25'
WHERE variable_name='mysql-max_connections';
LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL USERS TO DISK;
Vacuum and Maintenance
PostgreSQL VACUUM
-- Manual vacuum
VACUUM ANALYZE orders;
-- Aggressive vacuum (reclaims space)
VACUUM FULL orders;
-- Auto-vacuum settings
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.05,
autovacuum_analyze_scale_factor = 0.02,
autovacuum_vacuum_cost_delay = 10
);
-- Monitor vacuum activity
SELECT
schemaname,
relname,
last_vacuum,
last_autovacuum,
n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
MySQL OPTIMIZE TABLE
-- Reclaim space and rebuild indexes
OPTIMIZE TABLE orders;
-- Check table for errors
CHECK TABLE orders;
-- Analyze table statistics
ANALYZE TABLE orders;
-- Show table status
SHOW TABLE STATUS LIKE 'orders';
Monitoring Query Performance
PostgreSQL: pg_stat_statements
-- Enable extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 10 slowest queries
SELECT
calls,
mean_exec_time,
max_exec_time,
total_exec_time,
query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Queries with most time spent
SELECT
calls,
total_exec_time,
mean_exec_time,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Reset statistics
SELECT pg_stat_statements_reset();
MySQL: Performance Schema
-- Enable performance schema (my.cnf)
-- performance_schema = ON
-- Top 10 slowest statements
SELECT
DIGEST_TEXT as query,
COUNT_STAR as exec_count,
AVG_TIMER_WAIT/1000000000 as avg_ms,
MAX_TIMER_WAIT/1000000000 as max_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;
-- Queries with table scans
SELECT
DIGEST_TEXT,
SUM_NO_INDEX_USED,
SUM_NO_GOOD_INDEX_USED
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_NO_INDEX_USED > 0
ORDER BY SUM_NO_INDEX_USED DESC;
Production Optimization Checklist
Schema Design
- Proper data types (INT vs BIGINT, VARCHAR vs TEXT)
- Normalized tables (avoid duplication)
- Denormalized aggregates (for frequent queries)
- Appropriate constraints (NOT NULL, CHECK, UNIQUE)
Indexing
- Primary keys on all tables
- Foreign keys indexed
- Composite indexes for common queries
- Partial indexes where applicable
- Remove unused indexes
- Monitor index bloat
Queries
- No SELECT *
- LIMIT clauses on large result sets
- Avoid N+1 queries
- Use prepared statements
- Cursor-based pagination for large datasets
- Batch inserts/updates
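Batching is the easiest win on that list. A sketch with Python's sqlite3 showing one statement and one transaction for many rows, instead of a commit per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(f"event-{i}",) for i in range(500)]

# one prepared statement and one transaction instead of 500
# individually committed single-row INSERTs
with conn:
    conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)

inserted = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(inserted)  # 500
```

On a real server the same idea shows up as multi-row VALUES lists, COPY (PostgreSQL), or LOAD DATA (MySQL); the win comes from amortizing round trips and commits.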
Configuration
- Appropriate shared_buffers (PostgreSQL: 25% of RAM)
- Proper work_mem (PostgreSQL: RAM / max_connections / 2)
- Adequate max_connections
- Connection pooling enabled
- Query timeout configured
- Slow query logging enabled
Maintenance
- Regular VACUUM (PostgreSQL)
- Regular ANALYZE
- Index rebuilding schedule
- Partition old data
- Archive strategy for historical data
- Backup and recovery tested
Monitoring
- Query performance tracking
- Connection pool metrics
- Disk I/O monitoring
- Replication lag (if applicable)
- Table bloat monitoring
- Alerting on slow queries
Troubleshooting Common Issues
Issue 1: Slow Queries Suddenly
Diagnosis:
-- PostgreSQL: Check for missing statistics
SELECT
schemaname,
tablename,
last_analyze
FROM pg_stat_user_tables
WHERE last_analyze < NOW() - INTERVAL '7 days';
-- MySQL: Check table statistics
SHOW TABLE STATUS WHERE Name = 'orders';
Solution:
-- PostgreSQL
ANALYZE orders;
-- MySQL
ANALYZE TABLE orders;
Issue 2: High CPU Usage
Diagnosis:
-- PostgreSQL: Find CPU-intensive queries
SELECT
pid,
now() - query_start as duration,
query
FROM pg_stat_activity
WHERE state = 'active'
AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration DESC;
-- MySQL: Find running queries
SHOW PROCESSLIST;
Solution: Optimize queries, add indexes, or kill long-running queries
Issue 3: Connection Exhaustion
Diagnosis:
-- PostgreSQL
SELECT count(*) FROM pg_stat_activity;
SHOW max_connections;
-- MySQL
SHOW STATUS LIKE 'Threads_connected';
SHOW VARIABLES LIKE 'max_connections';
Solution: Implement connection pooling, increase max_connections, or find connection leaks
Benchmarking Results
Index Impact (10M row table)
| Query Type | No Index | B-Tree Index | Improvement |
|---|---|---|---|
| Equality (WHERE id = ?) | 8,500ms | 3ms | 2,833x |
| Range (WHERE date > ?) | 12,300ms | 45ms | 273x |
| Join | 45,000ms | 120ms | 375x |
| ORDER BY | 18,000ms | 250ms | 72x |
Connection Pooling Impact
| Metric | Direct | With Pooler | Improvement |
|---|---|---|---|
| Connections/sec | 150 | 2,500 | 16.6x |
| Latency (p95) | 450ms | 28ms | 16x |
| Resource usage | High | Low | 70% reduction |
Best Practices Summary
- Index strategically - Cover common queries, avoid over-indexing
- Analyze query plans - Use EXPLAIN regularly
- Avoid SELECT * - Only fetch needed columns
- Use connection pooling - Essential for production
- Partition large tables - By date or other logical boundaries
- Monitor continuously - Track slow queries and resource usage
- Regular maintenance - VACUUM, ANALYZE, OPTIMIZE
- Test at scale - Performance degrades non-linearly
- Cache aggressively - Redis/Memcached for hot data
- Profile first - Measure before optimizing
Conclusion
Database optimization is an iterative process. Start with the low-hanging fruit—add missing indexes, fix N+1 queries, enable connection pooling—then progressively tackle more complex optimizations like partitioning and query rewriting.
The techniques in this guide can reduce query times from seconds to milliseconds, increase throughput by 10-100x, and dramatically reduce infrastructure costs. However, optimization is never "done"—monitor continuously, benchmark regularly, and stay vigilant as data grows.
Key Takeaways:
- Indexes are critical but can't solve everything
- EXPLAIN is your friend - use it religiously
- N+1 queries are the most common performance killer
- Connection pooling is mandatory for production
- Partitioning extends table lifespan at scale
- Monitoring catches issues before users do
- Regular maintenance prevents degradation
- Test with production-scale data early
Master these fundamentals, and your database will scale gracefully from thousands to millions of users.
Additional Resources
- PostgreSQL Performance: https://www.postgresql.org/docs/current/performance-tips.html
- MySQL Optimization: https://dev.mysql.com/doc/refman/8.0/en/optimization.html
- Use The Index, Luke: https://use-the-index-luke.com/
- PgBouncer: https://www.pgbouncer.org/
- ProxySQL: https://proxysql.com/
- pg_stat_statements: https://www.postgresql.org/docs/current/pgstatstatements.html
Written by StaticBlock Editorial
StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.