Database Indexing Strategies for High-Performance PostgreSQL

Why Database Indexing Matters

In production applications, database performance can make or break user experience. A query that takes 500ms instead of 50ms can cascade into timeout errors, frustrated users, and lost revenue. The difference often comes down to proper indexing.

Yet indexing remains one of the most misunderstood aspects of database optimization. Adding too many indexes slows down writes. Too few, and reads suffer. The key is understanding the trade-offs and choosing the right index type for your use case.

Understanding PostgreSQL Index Types

PostgreSQL offers several index types, each optimized for different query patterns:

B-tree Indexes (Default)

B-tree indexes are the workhorse of PostgreSQL. They handle equality and range queries efficiently and work with most data types.

-- Perfect for equality and range queries
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Handles these queries efficiently:
SELECT * FROM users WHERE email = 'user@example.com';
SELECT * FROM orders WHERE created_at >= '2025-01-01';
SELECT * FROM orders WHERE created_at BETWEEN '2025-01-01' AND '2025-01-31';

When to use B-tree:

Equality comparisons (=, IN)
Range queries (<, >, BETWEEN)
Pattern matching with left-anchored LIKE (LIKE 'prefix%')
Sorting operations (ORDER BY)

Hash Indexes

Hash indexes are optimized for simple equality comparisons. They're faster than B-tree for = operations but can't handle range queries.

-- Good for exact lookups on high-cardinality columns
CREATE INDEX idx_users_api_key_hash ON users USING HASH (api_key);
-- Efficient query:
SELECT * FROM users WHERE api_key = '8f3b2a1c...';
-- Won't use the hash index:
SELECT * FROM users WHERE api_key LIKE '8f3b%';

When to use Hash:

High-cardinality columns (many unique values)
Only equality queries needed
Space efficiency is important

GIN Indexes (Generalized Inverted Index)

GIN indexes excel at indexing composite values like arrays, JSONB, and full-text search.

-- Perfect for JSONB and array queries
CREATE INDEX idx_products_tags ON products USING GIN (tags);
CREATE INDEX idx_users_metadata ON users USING GIN (metadata);
-- Handles these efficiently:
SELECT * FROM products WHERE tags @> ARRAY['electronics', 'sale'];
SELECT * FROM users WHERE metadata @> '{"plan": "pro"}';
SELECT * FROM articles WHERE to_tsvector('english', content) @@ to_tsquery('database & indexing');

When to use GIN:

JSONB columns with complex queries
Array contains operations
Full-text search
Multi-value columns

GiST Indexes (Generalized Search Tree)

GiST indexes support geometric data types, ranges, and nearest-neighbor searches.

-- Spatial queries and range types
CREATE INDEX idx_locations_coords ON locations USING GIST (coordinates);
CREATE INDEX idx_events_daterange ON events USING GIST (date_range);
-- Efficient queries:
SELECT * FROM locations WHERE coordinates <-> point(40.7128, -74.0060) < 10;
SELECT * FROM events WHERE date_range @> '2025-01-25'::date;

When to use GiST:

Geometric data (PostGIS)
Range types and overlaps
Nearest-neighbor searches
Custom data types

Real-World Indexing Strategy

Let's optimize a real e-commerce database:

-- Orders table optimization
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    status VARCHAR(50) NOT NULL,
    total_amount DECIMAL(10,2),
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
    metadata JSONB
);
-- Common query: Find user's recent orders
CREATE INDEX idx_orders_user_created
ON orders(user_id, created_at DESC);
-- Status filtering
CREATE INDEX idx_orders_status_created
ON orders(status, created_at DESC)
WHERE status IN ('pending', 'processing');
-- JSONB metadata queries
CREATE INDEX idx_orders_metadata
ON orders USING GIN (metadata);
-- Full-text search on order notes
CREATE INDEX idx_orders_notes_fts
ON orders USING GIN (to_tsvector('english', notes));

Partial Indexes for Efficiency

Partial indexes only index rows matching a condition, saving space and improving performance:

-- Only index active users
CREATE INDEX idx_active_users_email
ON users(email)
WHERE deleted_at IS NULL;
-- Only index failed payments for admin review
CREATE INDEX idx_failed_payments
ON payments(created_at DESC)
WHERE status = 'failed';

Composite Index Column Order

The order of columns in composite indexes matters significantly:

-- Efficient for these queries:
CREATE INDEX idx_users_country_city_created
ON users(country, city, created_at DESC);
-- Uses the index efficiently:
SELECT * FROM users WHERE country = 'US' AND city = 'New York';
SELECT * FROM users WHERE country = 'US' AND city = 'New York' ORDER BY created_at DESC;
SELECT * FROM users WHERE country = 'US';
-- Does NOT use the index efficiently:
SELECT * FROM users WHERE city = 'New York';  -- city is not leftmost
SELECT * FROM users WHERE created_at > '2025-01-01';  -- created_at not leftmost

Rule of thumb: Place columns in order of:

Equality conditions (highest selectivity first)
Range conditions
Sort order

Monitoring Index Usage

Always verify your indexes are being used:

-- Check index usage statistics
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan as scans,
    idx_tup_read as tuples_read,
    idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;
-- Find unused indexes (candidates for removal)
SELECT
schemaname,
tablename,
indexname
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Analyze query plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE user_id = 123
AND status = 'pending'
ORDER BY created_at DESC
LIMIT 10;

Index Maintenance

Indexes require periodic maintenance to remain efficient:

-- Rebuild bloated indexes REINDEX INDEX CONCURRENTLY idx_orders_user_created; -- Update statistics for query planner ANALYZE orders;

-- Vacuum to reclaim space VACUUM ANALYZE orders;

Performance Impact: Before and After

Here's a real example from our production system:

Before indexing:

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 12345
    AND status = 'completed'
ORDER BY created_at DESC
LIMIT 20;
-- Seq Scan on orders (cost=0.00..458291.77 rows=20)
-- Planning Time: 0.342 ms
-- Execution Time: 2847.623 ms  ⬅️ Nearly 3 seconds!

After adding composite index:

CREATE INDEX idx_orders_user_status_created
ON orders(user_id, status, created_at DESC);
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 12345
AND status = 'completed'
ORDER BY created_at DESC
LIMIT 20;
-- Index Scan using idx_orders_user_status_created (cost=0.56..127.43 rows=20)
-- Planning Time: 0.398 ms
-- Execution Time: 0.847 ms  ⬅️ 3,362x faster!

Common Pitfalls to Avoid

Over-indexing: Every index slows down INSERT/UPDATE/DELETE operations
Duplicate indexes: Check for redundant indexes with overlapping columns
Wrong column order: Always put highest selectivity columns first
Ignoring index bloat: Rebuild indexes periodically with REINDEX
Not analyzing queries: Use EXPLAIN ANALYZE to verify index usage

Conclusion

Proper indexing is a balancing act between read and write performance. Start with indexes on foreign keys and frequently queried columns. Monitor usage statistics, and don't be afraid to drop unused indexes.

Remember: Measure, don't guess. Use EXPLAIN ANALYZE to verify performance improvements before deploying to production.

Key Takeaways

B-tree indexes handle most use cases (equality and ranges)
GIN indexes excel at JSONB, arrays, and full-text search
Composite index column order matters significantly
Partial indexes reduce overhead for filtered datasets
Monitor index usage and remove unused indexes
Always verify with EXPLAIN ANALYZE

Proper indexing can transform a slow database into a high-performance powerhouse. Invest the time to understand your query patterns and choose indexes wisely.