Database Optimization for High-Traffic Apps

Most “we need a bigger database” conversations are actually “we need to fix five queries” conversations. Hardware is cheap until it isn’t, and throwing RDS instance sizes at a slow application is the most expensive way to avoid reading a query plan.

This is the playbook we use when an IWWOMI client tells us their database is on fire. It assumes Postgres because that’s what most production systems we touch run on, but the principles port directly to MySQL, SQL Server, and anything else that speaks SQL.

The 80/20 rule is closer to 95/5

Pull the slow query log from any production database and you’ll find the same shape every time: a handful of queries account for the overwhelming majority of database time. Not 80/20 — more like 95/5. One bad query running 200 times a second will dwarf the cost of a hundred well-written ones.

This matters because it changes where you spend your effort. Don’t refactor the schema. Don’t migrate to a different database engine. Find the top ten queries by total time and fix those.

In Postgres, the pg_stat_statements extension gives you exactly this:

SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Sort by total_exec_time, not mean_exec_time. A query that takes 2 seconds but runs once a day matters less than one that takes 5ms but runs ten thousand times a minute.

If you cannot name your top five most expensive queries from memory, you are not optimizing — you are guessing.

Indexing: when to add, when to walk away

Indexes feel free. They are not. Every index is a parallel data structure the database must update on every INSERT, UPDATE, and DELETE that touches the indexed columns. This is write amplification, and it is the reason your seemingly innocent “let me index everything that might be queried” instinct will eventually destroy your write throughput.

A missing index on a 10M-row table can take a query from 5ms to 4 seconds. The same index on a write-heavy table can take a transaction from 2ms to 20ms. Both numbers matter.

Rules we follow:

Index columns used in WHERE, JOIN ON, and ORDER BY. If a column never appears in any of those, it doesn’t need an index.
For composite indexes, lead with the most selective column. An index on (tenant_id, created_at) is useless if every query filters by created_at only.
Use partial indexes when most rows are irrelevant. If 99% of your orders table is status = 'completed' and you only ever query status = 'pending', a partial index is 100x smaller and faster:

CREATE INDEX idx_orders_pending
ON orders (created_at)
WHERE status = 'pending';

Drop indexes that aren’t used. pg_stat_user_indexes tells you which indexes have an idx_scan count of zero. Those are pure overhead.

For a deeper treatment of how B-tree indexes actually work, use-the-index-luke.com is still the best resource on the internet.

The N+1 problem and why your ORM is lying to you

The N+1 query problem is the single most common performance bug we find in code reviews. It looks innocent:

posts = Post.objects.all()
for post in posts:
    print(post.author.name)

That’s one query for posts, then N queries for authors. With 500 posts you’ve just done 501 round trips to the database. At 2ms per round trip that’s a one-second response time from queries that should take 5ms total.

ORMs hide this because they make post.author look like a property access. It isn’t — it’s a network call. Every ORM has a solution: select_related / prefetch_related in Django, includes in Rails, eager loading in Sequelize, Include in EF Core. Learn yours and use it.

The fix:

posts = Post.objects.select_related('author').all()

One query. Done.

Reading EXPLAIN ANALYZE without crying

EXPLAIN ANALYZE is the single most important tool in your database toolkit. Most developers run it, see a wall of text, and close the tab. Don’t.

Take this query:

EXPLAIN ANALYZE
SELECT u.email, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
GROUP BY u.email;

A healthy plan reads from bottom to top and looks like this:

HashAggregate  (cost=... rows=12000) (actual time=180.2..183.4 rows=11842 loops=1)
  ->  Hash Right Join  (actual time=20.1..150.3 rows=84000 loops=1)
        Hash Cond: (o.user_id = u.id)
        ->  Seq Scan on orders o  (actual time=0.01..40.2 rows=500000 loops=1)
        ->  Hash  (actual time=18.9..18.9 rows=12000 loops=1)
              ->  Index Scan using idx_users_created_at on users u
                    (actual time=0.03..15.2 rows=12000 loops=1)

Three things to look at, in order:

Seq Scan on a big table. That Seq Scan on orders is reading 500K rows. If you usually filter orders by user, add an index on orders.user_id.
Row estimate vs actual. If the planner thinks a step returns 100 rows and it actually returns 100,000, your statistics are stale. Run ANALYZE.
The slowest node. Time is cumulative up the tree. Find the node where the gap between its actual time and its children’s actual time is largest. That’s where the work is happening.

Connection pooling: PgBouncer or bust

Postgres connections are heavy. Each one forks a backend process and allocates several megabytes of memory. A Postgres instance configured for 100 connections will fall over the moment your Node.js app fleet decides it needs 800.

Use PgBouncer. Put it between your app and Postgres. A minimal production config:

[databases]
appdb = host=db.internal port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

pool_mode = transaction
max_client_conn = 2000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600

server_tls_sslmode = require

Two settings matter most:

pool_mode = transaction is what you want 95% of the time. It lets one Postgres connection serve thousands of clients by handing the connection over between transactions. The catch: you cannot use session-level features like LISTEN/NOTIFY, prepared statements (without workarounds), or temporary tables.
default_pool_size = 25 means PgBouncer will open at most 25 connections per database+user pair. With 2000 client connections sharing 25 server connections, you’re getting 80x amplification on your Postgres connection budget.

For cloud-managed Postgres, see our cloud migration guide for notes on RDS Proxy and equivalent services.

Caching: Redis, Memcached, and the hardest problem in computer science

The two hard things in computer science are naming things, cache invalidation, and off-by-one errors. We can’t help with the first one.

Redis vs Memcached: for most cases, pick Redis. Memcached is simpler and slightly faster for pure key-value workloads, but Redis gives you sorted sets, pub/sub, Lua scripting, persistence, and replication. The performance gap is smaller than it used to be, and the feature gap matters.

Cache invalidation strategies, ranked by how much pain they cause:

TTL-only. Set a TTL and let stale data exist for that window. Simplest. Works when stale data is acceptable for 30 seconds to 5 minutes.
Write-through. On every write, update the cache and the database in the same transaction-ish flow. No staleness, but doubles the surface area for bugs.
Cache-aside with explicit invalidation. Read from cache, fall through to DB on miss. On writes, delete the cache key. The default for most apps. The trap: you must invalidate every key that derives from the changed data, and lists of things are easy to forget.
Event-driven invalidation. Publish change events from the DB (via WAL, CDC, or application events) and let cache consumers invalidate themselves. Powerful, but it’s a distributed system now.

Cache what’s expensive to compute and cheap to be wrong about. Don’t cache user balances. Do cache the home page feed.

Read replicas and sharding: know when

Read replicas are easy. Provision one, point read-only queries at it, done. The two gotchas:

Replication lag. A read replica is always a few milliseconds (or seconds, under load) behind the primary. If a user writes a comment and immediately reloads the page reading from a replica, the comment is missing. Either route post-write reads to the primary, or accept the eventual consistency.
You’re not actually reducing write load. Replicas replay every write. They scale reads, not writes.

Sharding is hard. Don’t do it until you have to. The signals that you have to:

Your primary’s write IOPS is saturated and vertical scaling has run out.
A single table is so large that maintenance operations (VACUUM, index builds, backups) take longer than your maintenance window.
You’re already doing application-level multi-tenancy and a tenant-keyed shard scheme is natural.

Sharding turns one operational problem into N operational problems. If you adopt it, consider whether the microservices architecture pattern of database-per-service gets you most of the benefit with less coordination cost.

Postgres-specific: VACUUM, partial indexes, JSONB

A few Postgres details that bite in production:

VACUUM is not optional. Postgres uses MVCC, which means updates and deletes don’t remove rows — they mark them dead. VACUUM reclaims that space. autovacuum handles this in the background, but on high-write tables the defaults are too conservative. Tune per-table:

ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);

This makes autovacuum trigger at 5% dead tuples instead of the default 20%. On a 100M-row table that’s the difference between vacuuming at 5M dead rows and 20M.

JSONB indexing. JSONB is great until you query inside it without an index and discover sequential scans on a 50GB table. GIN indexes make JSONB containment queries fast:

CREATE INDEX idx_events_payload ON events USING GIN (payload jsonb_path_ops);

-- This query now uses the index
SELECT * FROM events WHERE payload @> '{"type": "checkout"}';

Use jsonb_path_ops when you only need containment (@>); it’s smaller and faster than the default GIN operator class.

Partial indexes for soft-deletes. If your tables have a deleted_at column and every query filters WHERE deleted_at IS NULL, every index should be partial:

CREATE INDEX idx_users_email
ON users (email)
WHERE deleted_at IS NULL;

Smaller index, faster scans, and the planner uses it automatically when your query matches the predicate.

Monitoring is the optimization

You can’t optimize what you don’t measure. The minimum:

pg_stat_statements enabled and queried weekly.
Slow query log threshold at 100ms in development, 500ms in production.
Dashboards for connection count, replication lag, cache hit ratio (target: above 99%), and transaction rate.
Alerts on autovacuum lag, deadlocks per minute, and replica lag exceeding your application’s tolerance.

We covered the broader observability stack in DevOps best practices — database monitoring slots in there.

For deeper reading, the official Postgres performance docs are good and short.

Need a performance audit?

If your application is slow and you can’t tell whether the problem is the database, the application code, or the infrastructure, that’s exactly the audit IWWOMI runs most often. We dig into the query log, the plans, the indexes, and the schema, and come back with a prioritized list of the five things that will actually move the latency curve.

Get in touch and we’ll take a look.

Database Optimization for High-Traffic Apps: A Practical Guide

The 80/20 rule is closer to 95/5

Indexing: when to add, when to walk away

The N+1 problem and why your ORM is lying to you

Reading EXPLAIN ANALYZE without crying

Connection pooling: PgBouncer or bust

Caching: Redis, Memcached, and the hardest problem in computer science

Read replicas and sharding: know when

Postgres-specific: VACUUM, partial indexes, JSONB

Monitoring is the optimization

Need a performance audit?

Let's discuss your next project

Related posts

From Document Chaos to Living Data: Anatomy of a Desktop Bridge

From Public Cloud to Self-Hosted PaaS: A Migration Story

The Synthesis Wall: Frontier AI Without Sending Your Data