Database Indexing Strategies for High-Performance PostgreSQL
Deep dive into PostgreSQL indexing strategies including B-tree, Hash, GiST, and GIN indexes. Learn when to use each type and how to optimize query performance with real-world examples.
Why Database Indexing Matters
In production applications, database performance can make or break user experience. A query that takes 500ms instead of 50ms can cascade into timeout errors, frustrated users, and lost revenue. The difference often comes down to proper indexing.
Yet indexing remains one of the most misunderstood aspects of database optimization. Adding too many indexes slows down writes. Too few, and reads suffer. The key is understanding the trade-offs and choosing the right index type for your use case.
Understanding PostgreSQL Index Types
PostgreSQL offers several index types, each optimized for different query patterns:
B-tree Indexes (Default)
B-tree indexes are the workhorse of PostgreSQL. They handle equality and range queries efficiently and work with most data types.
-- Perfect for equality and range queries
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Handles these queries efficiently:
SELECT * FROM users WHERE email = 'user@example.com';
SELECT * FROM orders WHERE created_at >= '2025-01-01';
SELECT * FROM orders WHERE created_at BETWEEN '2025-01-01' AND '2025-01-31';
When to use B-tree:
- Equality comparisons (
=,IN) - Range queries (
<,>,BETWEEN) - Pattern matching with left-anchored
LIKE(LIKE 'prefix%') - Sorting operations (
ORDER BY)
Hash Indexes
Hash indexes are optimized for simple equality comparisons. They're faster than B-tree for = operations but can't handle range queries.
-- Good for exact lookups on high-cardinality columns
CREATE INDEX idx_users_api_key_hash ON users USING HASH (api_key);
-- Efficient query:
SELECT * FROM users WHERE api_key = '8f3b2a1c...';
-- Won't use the hash index:
SELECT * FROM users WHERE api_key LIKE '8f3b%';
When to use Hash:
- High-cardinality columns (many unique values)
- Only equality queries needed
- Space efficiency is important
GIN Indexes (Generalized Inverted Index)
GIN indexes excel at indexing composite values like arrays, JSONB, and full-text search.
-- Perfect for JSONB and array queries
CREATE INDEX idx_products_tags ON products USING GIN (tags);
CREATE INDEX idx_users_metadata ON users USING GIN (metadata);
-- Handles these efficiently:
SELECT * FROM products WHERE tags @> ARRAY['electronics', 'sale'];
SELECT * FROM users WHERE metadata @> '{"plan": "pro"}';
SELECT * FROM articles WHERE to_tsvector('english', content) @@ to_tsquery('database & indexing');
When to use GIN:
- JSONB columns with complex queries
- Array contains operations
- Full-text search
- Multi-value columns
GiST Indexes (Generalized Search Tree)
GiST indexes support geometric data types, ranges, and nearest-neighbor searches.
-- Spatial queries and range types
CREATE INDEX idx_locations_coords ON locations USING GIST (coordinates);
CREATE INDEX idx_events_daterange ON events USING GIST (date_range);
-- Efficient queries:
SELECT * FROM locations WHERE coordinates <-> point(40.7128, -74.0060) < 10;
SELECT * FROM events WHERE date_range @> '2025-01-25'::date;
When to use GiST:
- Geometric data (PostGIS)
- Range types and overlaps
- Nearest-neighbor searches
- Custom data types
Real-World Indexing Strategy
Let's optimize a real e-commerce database:
-- Orders table optimization
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
status VARCHAR(50) NOT NULL,
total_amount DECIMAL(10,2),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
metadata JSONB
);
-- Common query: Find user's recent orders
CREATE INDEX idx_orders_user_created
ON orders(user_id, created_at DESC);
-- Status filtering
CREATE INDEX idx_orders_status_created
ON orders(status, created_at DESC)
WHERE status IN ('pending', 'processing');
-- JSONB metadata queries
CREATE INDEX idx_orders_metadata
ON orders USING GIN (metadata);
-- Full-text search on order notes
CREATE INDEX idx_orders_notes_fts
ON orders USING GIN (to_tsvector('english', notes));
Partial Indexes for Efficiency
Partial indexes only index rows matching a condition, saving space and improving performance:
-- Only index active users
CREATE INDEX idx_active_users_email
ON users(email)
WHERE deleted_at IS NULL;
-- Only index failed payments for admin review
CREATE INDEX idx_failed_payments
ON payments(created_at DESC)
WHERE status = 'failed';
Composite Index Column Order
The order of columns in composite indexes matters significantly:
-- Efficient for these queries:
CREATE INDEX idx_users_country_city_created
ON users(country, city, created_at DESC);
-- Uses the index efficiently:
SELECT * FROM users WHERE country = 'US' AND city = 'New York';
SELECT * FROM users WHERE country = 'US' AND city = 'New York' ORDER BY created_at DESC;
SELECT * FROM users WHERE country = 'US';
-- Does NOT use the index efficiently:
SELECT * FROM users WHERE city = 'New York'; -- city is not leftmost
SELECT * FROM users WHERE created_at > '2025-01-01'; -- created_at not leftmost
Rule of thumb: Place columns in order of:
- Equality conditions (highest selectivity first)
- Range conditions
- Sort order
Monitoring Index Usage
Always verify your indexes are being used:
-- Check index usage statistics
SELECT
schemaname,
tablename,
indexname,
idx_scan as scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;
-- Find unused indexes (candidates for removal)
SELECT
schemaname,
tablename,
indexname
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Analyze query plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE user_id = 123
AND status = 'pending'
ORDER BY created_at DESC
LIMIT 10;
Index Maintenance
Indexes require periodic maintenance to remain efficient:
-- Rebuild bloated indexes
REINDEX INDEX CONCURRENTLY idx_orders_user_created;
-- Update statistics for query planner
ANALYZE orders;
-- Vacuum to reclaim space
VACUUM ANALYZE orders;
Performance Impact: Before and After
Here's a real example from our production system:
Before indexing:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 12345
AND status = 'completed'
ORDER BY created_at DESC
LIMIT 20;
-- Seq Scan on orders (cost=0.00..458291.77 rows=20)
-- Planning Time: 0.342 ms
-- Execution Time: 2847.623 ms ⬅️ Nearly 3 seconds!
After adding composite index:
CREATE INDEX idx_orders_user_status_created
ON orders(user_id, status, created_at DESC);
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 12345
AND status = 'completed'
ORDER BY created_at DESC
LIMIT 20;
-- Index Scan using idx_orders_user_status_created (cost=0.56..127.43 rows=20)
-- Planning Time: 0.398 ms
-- Execution Time: 0.847 ms ⬅️ 3,362x faster!
Common Pitfalls to Avoid
- Over-indexing: Every index slows down INSERT/UPDATE/DELETE operations
- Duplicate indexes: Check for redundant indexes with overlapping columns
- Wrong column order: Always put highest selectivity columns first
- Ignoring index bloat: Rebuild indexes periodically with REINDEX
- Not analyzing queries: Use EXPLAIN ANALYZE to verify index usage
Conclusion
Proper indexing is a balancing act between read and write performance. Start with indexes on foreign keys and frequently queried columns. Monitor usage statistics, and don't be afraid to drop unused indexes.
Remember: Measure, don't guess. Use EXPLAIN ANALYZE to verify performance improvements before deploying to production.
Key Takeaways
- B-tree indexes handle most use cases (equality and ranges)
- GIN indexes excel at JSONB, arrays, and full-text search
- Composite index column order matters significantly
- Partial indexes reduce overhead for filtered datasets
- Monitor index usage and remove unused indexes
- Always verify with EXPLAIN ANALYZE
Proper indexing can transform a slow database into a high-performance powerhouse. Invest the time to understand your query patterns and choose indexes wisely.
Related Articles
GraphQL API Design - Production Architecture and Best Practices for Scalable Systems
Master GraphQL API design covering schema design principles, resolver optimization, N+1 query prevention with DataLoader, authentication and authorization patterns, caching strategies, error handling, and production deployment for high-performance GraphQL systems.
Testing Strategies - Unit, Integration, and E2E Testing Best Practices for Production Quality
Comprehensive guide to testing strategies covering unit tests, integration tests, end-to-end testing, test-driven development, mocking patterns, testing pyramid, and production testing practices for reliable software delivery.
Monitoring and Observability - Production Systems Performance and Debugging at Scale
Master monitoring and observability covering metrics collection with Prometheus, distributed tracing with OpenTelemetry, log aggregation, alerting strategies, SLOs/SLIs, and production debugging techniques for reliable systems.
Written by StaticBlock Editorial
StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.