Database Query Optimization: PostgreSQL and MySQL Performance Tuning Guide

Master database query optimization with this comprehensive guide to PostgreSQL and MySQL performance tuning. Learn indexing strategies, query plan analysis with EXPLAIN, connection pooling, table partitioning, vacuum and maintenance operations, and production-ready optimization techniques. Includes real-world examples, benchmarks, and monitoring strategies for high-traffic applications.

StaticBlock Editorial
19 min read

Introduction

Database queries are often the primary bottleneck in web applications. A poorly optimized query can take seconds instead of milliseconds, causing timeout errors, degraded user experience, and wasted infrastructure costs. Even with modern hardware, inefficient queries can bring an application to its knees.

This comprehensive guide covers practical query optimization techniques for PostgreSQL and MySQL—the two most popular relational databases in 2025. You'll learn indexing strategies, query plan analysis, connection pooling, partitioning, and production-ready optimization patterns backed by real-world examples and benchmarks.

Understanding Query Performance

The Query Lifecycle

1. Parse SQL → 2. Generate Plan → 3. Execute → 4. Return Results
   (syntax)      (optimizer)      (execution    (network
                                   engine)       transfer)

Bottlenecks typically occur in:

  • Query planning: Complex joins, subqueries
  • Execution: Missing indexes, full table scans
  • Data transfer: Large result sets

Key Performance Metrics

1. Query Execution Time

-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 12345;

-- MySQL
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 12345;

2. Rows Examined vs. Rows Returned

Ideal ratio: 1:1 (examined == returned)
Bad ratio: 1000:1 (examined 1000 rows, returned 1)

3. Index Usage

-- PostgreSQL: Check if indexes are used
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, tablename;

-- MySQL: Check index cardinality
SHOW INDEX FROM orders;

Indexing Strategies

When to Create Indexes

Create indexes for:

  • Primary keys (automatic)
  • Foreign keys
  • Frequently filtered columns (WHERE clause)
  • Frequently joined columns
  • Columns in ORDER BY or GROUP BY

Avoid indexes for:

  • Small tables (< 1000 rows)
  • Columns with low cardinality (e.g., boolean)
  • Columns frequently updated
  • Wide columns (large text/blobs)

Index Types

1. B-Tree Index (Default)

Use for: Equality and range queries

-- PostgreSQL & MySQL
CREATE INDEX idx_users_email ON users(email);

-- Range queries
SELECT * FROM users WHERE created_at > '2025-01-01';

Performance:

Without index: 15,000ms (full table scan)
With index:    12ms (index scan)
Improvement:   1,250x faster
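
The same scan-versus-index distinction can be observed locally. A sqlite3 sketch, using SQLite's EXPLAIN QUERY PLAN as a stand-in for the PostgreSQL/MySQL planners (table and index names are illustrative):

```python
import sqlite3

# In-memory SQLite database as a stand-in; the SCAN-vs-index distinction
# mirrors PostgreSQL's Seq Scan vs Index Scan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output describes the strategy
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

query = "SELECT * FROM users WHERE email = 'user42@example.com'"
before = plan(query)  # full table scan: the plan contains "SCAN"

conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(query)   # now an index lookup via idx_users_email

print(before)
print(after)
```

Re-running the same query planner check before and after index creation is a habit worth building: it confirms the optimizer actually picked up the new index.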

2. Composite Index

Use for: Multi-column queries

-- Order matters! Most selective column first
CREATE INDEX idx_orders_user_status
ON orders(user_id, status, created_at);

-- ✅ GOOD: Uses index fully
SELECT * FROM orders
WHERE user_id = 123 AND status = 'pending'
ORDER BY created_at DESC;

-- ⚠️ PARTIAL: Uses only user_id
SELECT * FROM orders
WHERE user_id = 123
ORDER BY created_at DESC;

-- ❌ BAD: Doesn't use index (missing user_id prefix)
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC;

Rule of thumb: Index columns in order of:

  1. Equality conditions (WHERE id = ?)
  2. Range conditions (WHERE date > ?)
  3. Sort columns (ORDER BY)
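
The leftmost-prefix rule behind this ordering can be verified with a quick sqlite3 sketch (composite B-tree indexes behave the same way here as in PostgreSQL and MySQL; schema names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    status TEXT,
    created_at TEXT
)""")
conn.execute(
    "CREATE INDEX idx_orders_user_status "
    "ON orders(user_id, status, created_at)")

def plan(sql):
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Leading column (user_id) present: the composite index is usable
good = plan("SELECT * FROM orders WHERE user_id = 123 AND status = 'pending'")
# Leading column missing: falls back to a full scan
bad = plan("SELECT * FROM orders WHERE status = 'pending'")

print(good)
print(bad)
```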

3. Partial Index (PostgreSQL)

Use for: Filtering a subset of rows

-- Index only active users
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';

-- Index only recent orders. PostgreSQL requires the predicate to be
-- IMMUTABLE, so NOW() is not allowed here; use a fixed cutoff and
-- recreate the index periodically.
CREATE INDEX idx_recent_orders
ON orders(user_id, created_at)
WHERE created_at > '2025-01-01';

Benefits:

  • Smaller index size
  • Faster writes (fewer index updates)
  • Faster reads (more targeted)
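
SQLite supports partial indexes too, so the behavior is easy to see locally: the planner only picks the index when the query's WHERE clause implies the index predicate. An illustrative sketch (names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT)")
# Partial index: only rows WHERE status = 'active' are indexed
conn.execute(
    "CREATE INDEX idx_active_users ON users(email) WHERE status = 'active'")

def plan(sql):
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

hit = plan("SELECT * FROM users "
           "WHERE email = 'user@example.com' AND status = 'active'")
miss = plan("SELECT * FROM users "
            "WHERE email = 'user@example.com' AND status = 'banned'")

print(hit)   # query implies the predicate: index is used
print(miss)  # predicate not implied: full scan
```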

4. Covering Index

Use for: Avoiding table lookups

-- PostgreSQL
CREATE INDEX idx_users_covering
ON users(email) INCLUDE (name, created_at);

-- Query satisfied entirely by index
SELECT email, name, created_at
FROM users
WHERE email = 'user@example.com';

-- MySQL (no INCLUDE clause; put all columns in the index key)
CREATE INDEX idx_users_covering
ON users(email, name, created_at);

5. Full-Text Index

PostgreSQL (GIN):

-- Add tsvector column
ALTER TABLE articles
ADD COLUMN search_vector tsvector;

-- Update search vector
UPDATE articles
SET search_vector = to_tsvector('english', title || ' ' || body);

-- Create index
CREATE INDEX idx_articles_search
ON articles USING GIN(search_vector);

-- Query
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('postgresql & optimization');

MySQL (FULLTEXT):

-- Create full-text index
CREATE FULLTEXT INDEX idx_articles_search
ON articles(title, body);

-- Query
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('postgresql optimization' IN BOOLEAN MODE);

EXPLAIN: Understanding Query Plans

PostgreSQL EXPLAIN

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;

Reading EXPLAIN output:

Limit  (cost=1234.56..1234.60 rows=10 width=32) (actual time=45.123..45.145 rows=10 loops=1)
  ->  Sort  (cost=1234.56..1235.89 rows=532 width=32) (actual time=45.120..45.125 rows=10 loops=1)
        Sort Key: (count(o.id)) DESC
        ->  HashAggregate  (cost=1200.00..1220.00 rows=532 width=32)
              ->  Hash Left Join  (cost=50.00..1180.00 rows=5320 width=24)
                    ->  Seq Scan on users u  (cost=0.00..25.00 rows=532 width=20)
                          Filter: (created_at > '2025-01-01'::date)
                    ->  Hash  (cost=30.00..30.00 rows=1000 width=8)
                          ->  Seq Scan on orders o  (cost=0.00..30.00 rows=1000 width=8)

Key metrics:

  • cost: Estimated units (not real time)
  • rows: Estimated rows
  • actual time: Real execution time (ms)
  • loops: Number of times node executed

Red flags:

  • Seq Scan on large tables (add index!)
  • High loops count (nested loops on large datasets)
  • Large difference between estimated and actual rows

MySQL EXPLAIN

EXPLAIN FORMAT=JSON
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;

Output:

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | u     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 5320 |    33.33 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | o     | NULL       | ref  | user_id       | user_id | 5    | u.id |   10 |   100.00 | NULL                                               |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+

Key columns:

  • type: Join type (const > eq_ref > ref > range > index > ALL)
  • possible_keys: Indexes considered
  • key: Index actually used
  • rows: Estimated rows examined
  • Extra: Additional info (watch for "Using filesort", "Using temporary")

Red flags:

  • type = ALL (full table scan)
  • Extra = Using filesort (expensive sort)
  • Extra = Using temporary (temp table created)

Common Query Optimization Patterns

1. Avoiding N+1 Queries

❌ BAD: N+1 Query Problem:

# Get users
users = db.query("SELECT * FROM users LIMIT 100")

# For each user, get orders (100 queries!)
for user in users:
    orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")
    user.orders = orders

Total: 101 queries

✅ GOOD: Single JOIN:

# One query with JOIN
result = db.query("""
    SELECT
        u.id, u.name, u.email,
        o.id as order_id, o.total, o.created_at
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    WHERE u.id IN (SELECT id FROM users LIMIT 100)
""")

Total: 1 query

✅ BETTER: Separate queries with IN clause:

# Get users
users = db.query("SELECT * FROM users LIMIT 100")
user_ids = [u.id for u in users]

# Get all orders in one query
orders = db.query("SELECT * FROM orders WHERE user_id IN (%s)", user_ids)

# Group by user_id in application code

Total: 2 queries (predictable performance)
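
Here is the two-query pattern end to end as a runnable sqlite3 sketch (schema and data are hypothetical); note the placeholder list is built from `?` markers rather than interpolated values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 4)])
conn.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)",
                 [(1, 10.0), (1, 20.0), (2, 5.0)])

# Query 1: the page of users
users = conn.execute("SELECT id, name FROM users LIMIT 100").fetchall()
user_ids = [u[0] for u in users]

# Query 2: all their orders at once, via a parameterized IN list
placeholders = ",".join("?" * len(user_ids))
rows = conn.execute(
    f"SELECT user_id, id, total FROM orders "
    f"WHERE user_id IN ({placeholders}) ORDER BY user_id, id",
    user_ids).fetchall()

# Group in application code: user_id -> list of (order_id, total)
orders_by_user = {}
for user_id, order_id, total in rows:
    orders_by_user.setdefault(user_id, []).append((order_id, total))

print(orders_by_user)  # two queries total, regardless of user count
```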

2. Pagination

❌ BAD: OFFSET pagination on large datasets:

-- Page 1: Fast (5ms)
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;

-- Page 1000: Slow (2,500ms) - scans 20,000 rows!
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 20000;

✅ GOOD: Cursor-based pagination:

-- Page 1
SELECT * FROM orders
WHERE id > 0
ORDER BY id DESC
LIMIT 20;
-- Returns: ids 100000-99981, cursor = 99981

-- Page 2 (using cursor)
SELECT * FROM orders
WHERE id < 99981
ORDER BY id DESC
LIMIT 20;
-- Fast regardless of page number!
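
The cursor-based pattern translates directly to application code. A runnable sqlite3 sketch, where `page` is a hypothetical helper that threads the last-seen id through as the cursor:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (id) VALUES (?)",
                 [(i,) for i in range(1, 101)])

def page(cursor=None, size=20):
    """Keyset pagination: filter on the last-seen id instead of OFFSET."""
    if cursor is None:
        rows = conn.execute(
            "SELECT id FROM orders ORDER BY id DESC LIMIT ?",
            (size,)).fetchall()
    else:
        rows = conn.execute(
            "SELECT id FROM orders WHERE id < ? ORDER BY id DESC LIMIT ?",
            (cursor, size)).fetchall()
    ids = [r[0] for r in rows]
    next_cursor = ids[-1] if ids else None
    return ids, next_cursor

page1, cur = page()      # ids 100..81; cursor = 81
page2, cur = page(cur)   # ids 80..61 - same cost at any depth
print(page1[0], page1[-1], page2[0], page2[-1])
```

The trade-off: cursor pagination cannot jump to an arbitrary page number, which is why it pairs naturally with "load more" or infinite-scroll UIs.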

3. Aggregations

❌ BAD: COUNT(*) on large tables:

-- Slow: Full table scan
SELECT COUNT(*) FROM orders;
-- Time: 15 seconds on 10M rows

✅ GOOD: Approximate count:

-- PostgreSQL: Fast estimate
SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE relname = 'orders';
-- Time: 1ms

-- MySQL: Fast estimate
SELECT TABLE_ROWS FROM information_schema.TABLES
WHERE TABLE_NAME = 'orders';
-- Time: 1ms

✅ BETTER: Materialized count:

-- Create counter table
CREATE TABLE order_stats (
    date DATE PRIMARY KEY,
    total_count BIGINT
);

-- Update via trigger or cron job
INSERT INTO order_stats (date, total_count)
VALUES (CURRENT_DATE, (SELECT COUNT(*) FROM orders))
ON CONFLICT (date) DO UPDATE SET total_count = EXCLUDED.total_count;

-- Query: Instant
SELECT total_count FROM order_stats WHERE date = CURRENT_DATE;
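
The counter-table pattern can be exercised locally: SQLite (3.24+) accepts the same `ON CONFLICT ... DO UPDATE` upsert syntax as PostgreSQL. A sketch with hypothetical table names:

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE order_stats (date TEXT PRIMARY KEY, total_count INTEGER)")
conn.executemany("INSERT INTO orders (id) VALUES (?)",
                 [(i,) for i in range(1, 51)])

def refresh_stats():
    # In production this runs from a trigger or cron job; the upsert keeps
    # exactly one row per day no matter how often it fires.
    conn.execute("""
        INSERT INTO order_stats (date, total_count)
        VALUES (?, (SELECT COUNT(*) FROM orders))
        ON CONFLICT(date) DO UPDATE SET total_count = excluded.total_count
    """, (date.today().isoformat(),))

refresh_stats()
count = conn.execute(
    "SELECT total_count FROM order_stats WHERE date = ?",
    (date.today().isoformat(),)).fetchone()[0]
print(count)
```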

4. Subquery Optimization

❌ BAD: Correlated subquery:

-- Runs subquery for EACH row (slow!)
SELECT
    u.name,
    (SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count
FROM users u;

✅ GOOD: JOIN or CTE:

-- Single pass with JOIN
SELECT
    u.name,
    COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

-- Or with CTE (more readable)
WITH order_counts AS (
    SELECT user_id, COUNT(*) as count
    FROM orders
    GROUP BY user_id
)
SELECT u.name, COALESCE(oc.count, 0) as order_count
FROM users u
LEFT JOIN order_counts oc ON u.id = oc.user_id;

Table Partitioning

Range Partitioning (PostgreSQL)

-- Create partitioned table
CREATE TABLE orders (
    id BIGSERIAL,
    user_id BIGINT,
    total DECIMAL(10,2),
    created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Create partitions (one per month)
CREATE TABLE orders_2025_01 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE orders_2025_02 PARTITION OF orders
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

CREATE TABLE orders_2025_03 PARTITION OF orders
    FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

-- Queries automatically use the correct partition
SELECT * FROM orders
WHERE created_at BETWEEN '2025-02-01' AND '2025-02-28';
-- Only scans the orders_2025_02 partition!

Benefits:

  • Faster queries (scan fewer rows)
  • Easier maintenance (drop old partitions)
  • Better index efficiency

Range Partitioning (MySQL 8.0+)

CREATE TABLE orders (
    id BIGINT AUTO_INCREMENT,
    user_id BIGINT,
    total DECIMAL(10,2),
    created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at) * 100 + MONTH(created_at)) (
    PARTITION p_2025_01 VALUES LESS THAN (202502),
    PARTITION p_2025_02 VALUES LESS THAN (202503),
    PARTITION p_2025_03 VALUES LESS THAN (202504),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

Connection Pooling

Why Connection Pooling Matters

Without pooling:

Each request:
1. Open TCP connection (50ms)
2. Authenticate (30ms)
3. Execute query (10ms)
4. Close connection (20ms)

Total per request: 110ms

With pooling:

Each request:
1. Get connection from pool (1ms)
2. Execute query (10ms)
3. Return to pool (1ms)

Total per request: 12ms
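
The arithmetic above is what a pool buys you: connections are opened once and reused. This is a minimal, hypothetical queue-backed pool in Python to make the mechanics concrete; real deployments should rely on PgBouncer, ProxySQL, or the driver's built-in pool rather than a hand-rolled one:

```python
import queue
import sqlite3

class ConnectionPool:
    """Illustrative fixed-size pool; not production code."""

    def __init__(self, factory, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())  # pay the connect cost once, up front

    def acquire(self, timeout=3):
        # Blocks until a connection is free; raises queue.Empty on timeout,
        # so exhaustion surfaces as an error instead of unbounded connects.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
conn = pool.acquire()
conn.execute("SELECT 1")
pool.release(conn)
```

The fixed size is the point: the database sees at most `size` connections no matter how many application threads exist, which is the same back-pressure PgBouncer's `default_pool_size` provides.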

PostgreSQL: PgBouncer

# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Pool settings
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3

# Timeouts
server_idle_timeout = 600
server_lifetime = 3600

Application connection:

# Connect to PgBouncer (not directly to PostgreSQL)
DATABASE_URL = "postgresql://user:pass@localhost:6432/mydb"

MySQL: ProxySQL

-- Install ProxySQL
-- Configure backend servers
INSERT INTO mysql_servers(hostgroup_id, hostname, port)
VALUES (1, '127.0.0.1', 3306);

-- Configure users
INSERT INTO mysql_users(username, password, default_hostgroup)
VALUES ('app_user', 'password', 1);

-- Configure connection pool
UPDATE global_variables SET variable_value='25'
WHERE variable_name='mysql-max_connections';

-- Apply and persist the configuration
LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL USERS TO DISK;

Vacuum and Maintenance

PostgreSQL VACUUM

-- Manual vacuum
VACUUM ANALYZE orders;

-- Aggressive vacuum (reclaims space, but takes an exclusive table lock)
VACUUM FULL orders;

-- Auto-vacuum settings
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_vacuum_cost_delay = 10
);

-- Monitor vacuum activity
SELECT schemaname, relname, last_vacuum, last_autovacuum, n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

MySQL OPTIMIZE TABLE

-- Reclaim space and rebuild indexes
OPTIMIZE TABLE orders;

-- Check table for errors
CHECK TABLE orders;

-- Analyze table statistics
ANALYZE TABLE orders;

-- Show table status
SHOW TABLE STATUS LIKE 'orders';

Monitoring Query Performance

PostgreSQL: pg_stat_statements

-- Enable extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries (by mean time)
SELECT calls, mean_exec_time, max_exec_time, total_exec_time, query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Queries with the most total time spent
SELECT calls, total_exec_time, mean_exec_time, query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();

MySQL: Performance Schema

-- Enable performance schema (my.cnf)
-- performance_schema = ON

-- Top 10 slowest statements (timer values are in picoseconds)
SELECT
    DIGEST_TEXT AS query,
    COUNT_STAR AS exec_count,
    AVG_TIMER_WAIT/1000000000 AS avg_ms,
    MAX_TIMER_WAIT/1000000000 AS max_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;

-- Queries with table scans
SELECT DIGEST_TEXT, SUM_NO_INDEX_USED, SUM_NO_GOOD_INDEX_USED
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_NO_INDEX_USED > 0
ORDER BY SUM_NO_INDEX_USED DESC;
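
Alongside these server-side views, a lightweight application-side timer can catch slow statements as they happen. A minimal Python sketch; the 100 ms threshold and the `timed_query` helper are illustrative, not part of any library:

```python
import sqlite3
import time

SLOW_QUERY_MS = 100  # illustrative threshold; tune per application
slow_log = []

def timed_query(conn, sql, params=()):
    # Record any statement slower than the threshold, as an
    # application-side complement to pg_stat_statements /
    # performance_schema.
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms >= SLOW_QUERY_MS:
        slow_log.append((round(elapsed_ms, 2), sql))
    return rows

conn = sqlite3.connect(":memory:")
rows = timed_query(conn, "SELECT 1")
print(rows, slow_log)
```

In a real service the `slow_log` append would be a structured log line or a metric, so alerting can fire on slow-query rate rather than on manual inspection.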

Production Optimization Checklist

Schema Design

  • Proper data types (INT vs BIGINT, VARCHAR vs TEXT)
  • Normalized tables (avoid duplication)
  • Denormalized aggregates (for frequent queries)
  • Appropriate constraints (NOT NULL, CHECK, UNIQUE)

Indexing

  • Primary keys on all tables
  • Foreign keys indexed
  • Composite indexes for common queries
  • Partial indexes where applicable
  • Remove unused indexes
  • Monitor index bloat

Queries

  • No SELECT *
  • LIMIT clauses on large result sets
  • Avoid N+1 queries
  • Use prepared statements
  • Cursor-based pagination for large datasets
  • Batch inserts/updates
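
The batching item above is easy to demonstrate: one `executemany` call inside a single transaction replaces a thousand individual INSERT round trips, each of which would otherwise pay per-statement (and per-commit) overhead. A sqlite3 sketch with a hypothetical `events` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(f"event-{i}",) for i in range(1000)]

# One statement, one transaction for the whole batch
with conn:
    conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```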

Configuration

  • Appropriate shared_buffers (PostgreSQL: 25% of RAM)
  • Proper work_mem (PostgreSQL: RAM / max_connections / 2)
  • Adequate max_connections
  • Connection pooling enabled
  • Query timeout configured
  • Slow query logging enabled

Maintenance

  • Regular VACUUM (PostgreSQL)
  • Regular ANALYZE
  • Index rebuilding schedule
  • Partition old data
  • Archive strategy for historical data
  • Backup and recovery tested

Monitoring

  • Query performance tracking
  • Connection pool metrics
  • Disk I/O monitoring
  • Replication lag (if applicable)
  • Table bloat monitoring
  • Alerting on slow queries

Troubleshooting Common Issues

Issue 1: Slow Queries Suddenly

Diagnosis:

-- PostgreSQL: Check for missing statistics
SELECT
    schemaname,
    tablename,
    last_analyze
FROM pg_stat_user_tables
WHERE last_analyze < NOW() - INTERVAL '7 days';

-- MySQL: Check table statistics
SHOW TABLE STATUS WHERE Name = 'orders';

Solution:

-- PostgreSQL
ANALYZE orders;

-- MySQL
ANALYZE TABLE orders;

Issue 2: High CPU Usage

Diagnosis:

-- PostgreSQL: Find CPU-intensive queries
SELECT
    pid,
    now() - query_start as duration,
    query
FROM pg_stat_activity
WHERE state = 'active'
AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration DESC;

-- MySQL: Find running queries
SHOW PROCESSLIST;

Solution: Optimize queries, add indexes, or kill long-running queries

Issue 3: Connection Exhaustion

Diagnosis:

-- PostgreSQL
SELECT count(*) FROM pg_stat_activity;
SHOW max_connections;

-- MySQL
SHOW STATUS LIKE 'Threads_connected';
SHOW VARIABLES LIKE 'max_connections';

Solution: Implement connection pooling, increase max_connections, or find connection leaks

Benchmarking Results

Index Impact (10M row table)

Query Type                 No Index    B-Tree Index    Improvement
Equality (WHERE id = ?)    8,500ms     3ms             2,833x
Range (WHERE date > ?)     12,300ms    45ms            273x
Join                       45,000ms    120ms           375x
ORDER BY                   18,000ms    250ms           72x

Connection Pooling Impact

Metric             Direct    With Pooler    Improvement
Connections/sec    150       2,500          16.6x
Latency (p95)      450ms     28ms           16x
Resource usage     High      Low            70% reduction

Best Practices Summary

  1. Index strategically - Cover common queries, avoid over-indexing
  2. Analyze query plans - Use EXPLAIN regularly
  3. Avoid SELECT * - Only fetch needed columns
  4. Use connection pooling - Essential for production
  5. Partition large tables - By date or other logical boundaries
  6. Monitor continuously - Track slow queries and resource usage
  7. Regular maintenance - VACUUM, ANALYZE, OPTIMIZE
  8. Test at scale - Performance degrades non-linearly
  9. Cache aggressively - Redis/Memcached for hot data
  10. Profile first - Measure before optimizing

Conclusion

Database optimization is an iterative process. Start with the low-hanging fruit—add missing indexes, fix N+1 queries, enable connection pooling—then progressively tackle more complex optimizations like partitioning and query rewriting.

The techniques in this guide can reduce query times from seconds to milliseconds, increase throughput by 10-100x, and dramatically reduce infrastructure costs. However, optimization is never "done"—monitor continuously, benchmark regularly, and stay vigilant as data grows.

Key Takeaways:

  1. Indexes are critical but can't solve everything
  2. EXPLAIN is your friend - use it religiously
  3. N+1 queries are the most common performance killer
  4. Connection pooling is mandatory for production
  5. Partitioning extends table lifespan at scale
  6. Monitoring catches issues before users do
  7. Regular maintenance prevents degradation
  8. Test with production-scale data early

Master these fundamentals, and your database will scale gracefully from thousands to millions of users.

Additional Resources

  • PostgreSQL Performance: https://www.postgresql.org/docs/current/performance-tips.html
  • MySQL Optimization: https://dev.mysql.com/doc/refman/8.0/en/optimization.html
  • Use The Index, Luke: https://use-the-index-luke.com/
  • PgBouncer: https://www.pgbouncer.org/
  • ProxySQL: https://proxysql.com/
  • pg_stat_statements: https://www.postgresql.org/docs/current/pgstatstatements.html

Written by StaticBlock Editorial

StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.