Most backend performance problems do not start in your TypeScript code. They start in your database. A feature ships, traffic grows, and suddenly an endpoint that felt instant in staging now takes three seconds in production. Nine times out of ten, the root cause is a missing or poorly shaped index.
Indexes sit at the border between your application design and your data model. If you already follow good layering and domain boundaries like in the Clean Architecture guide, understanding indexing is the next step that lets you ship features without slowly turning your database into a bottleneck.
This article walks through database indexing strategies every backend developer should know. We will focus on relational databases like PostgreSQL and MySQL, use concrete SQL examples, ASCII architecture diagrams, and mental models that are practical for everyday work. Think of this as pairing on performance with a senior engineer who has broken a few production systems so you do not have to.
If your work also touches vector search or semantic retrieval, treat this as a foundation for the storage side of your system and pair it with Vector databases for semantic search for the AI heavy workloads.
What an index really is in practice
At a high level, an index is a separate data structure that lets the database find rows faster without scanning the full table. In most OLTP databases the default index type is a B-tree. You can think of it as a sorted search tree over one or more columns.
Without an index, the database often does a full table scan:
- Reads every row.
- Evaluates the filter for each row.
- Keeps only the rows that match.
With a good index, the database can jump straight to the relevant slice of data, similar to how you use the index of a book instead of reading from page 1 each time.
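The book-index analogy can be sketched in a few lines of TypeScript. This is a toy cost model, not how a real B-tree works: the point is that a full scan touches every row, while a lookup over sorted keys can binary search.

```typescript
// Toy model: full scan vs. lookup over sorted keys.
// Real B-trees are more sophisticated, but the cost model is similar:
// scanning is O(n), searching sorted data is O(log n).

type Row = { id: number; email: string };

// Full table scan: touch every row until a match is found.
function fullScan(rows: Row[], email: string): Row | undefined {
  for (const row of rows) {
    if (row.email === email) return row;
  }
  return undefined;
}

// "Index" lookup: binary search over rows pre-sorted by email.
function indexLookup(sortedRows: Row[], email: string): Row | undefined {
  let lo = 0;
  let hi = sortedRows.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sortedRows[mid].email === email) return sortedRows[mid];
    if (sortedRows[mid].email < email) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

const rows: Row[] = [
  { id: 3, email: "ada@example.com" },
  { id: 1, email: "carol@example.com" },
  { id: 2, email: "bob@example.com" },
];
// Keeping this sorted copy up to date is exactly the write cost
// the database pays for each index on a table.
const sorted = [...rows].sort((a, b) => (a.email < b.email ? -1 : 1));
```

Maintaining the sorted copy is the hidden cost: every insert or update has to keep it in order, which is why indexes slow down writes.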
A simple before and after
Imagine this table:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL,
status TEXT NOT NULL, -- 'active', 'invited', 'disabled'
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Your backend runs this query on almost every request:
SELECT id, email
FROM users
WHERE email = $1;
If there is no index on email, the database may scan the entire users table each time. With a unique index:
CREATE UNIQUE INDEX users_email_key
ON users (email);
the database can perform a fast index lookup. Instead of touching every row, it walks the B-tree to the right leaf node and reads only the relevant page.
You do not need to understand B-tree internals to be productive, but you do need the habit of thinking about which queries are hot and which index supports each of them.
Where indexes fit in your backend architecture
In a typical Node or TypeScript backend, indexes live inside the persistence layer, but they affect the whole system.
Client → HTTP / RPC → Service / Use Case → Repository → Database → Indexes

- Service or use case layer decides which data is needed.
- Repository or data access layer encodes the actual queries.
- Database executes those queries and decides when to use which index.
- Indexes are the thing that makes your most important queries boringly fast.
If you already follow a layered approach similar to the one in Clean architecture in full-stack projects, a nice rule of thumb is:
- Design the domain and APIs first.
- Write down the concrete SQL queries your repositories will run.
- For each important query, design the index that will support it.
- Capture that index in a migration so it is versioned like the rest of your schema.
This keeps indexing as a normal design decision, not a late night emergency fix.
Core indexing strategies you will use constantly
You will see many index types in documentation, but a few cover almost all real world use cases:
- Primary key and unique indexes.
- Single column indexes.
- Composite (multi column) indexes.
- Covering indexes.
- Partial (filtered) indexes.
- Expression or functional indexes.
We will walk through each with queries that look like what you actually ship.
Primary key and unique indexes
Every serious table has a primary key. In PostgreSQL and MySQL, declaring a primary key automatically creates a unique index.
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
status TEXT NOT NULL,
total_cents INTEGER NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
The common pattern in your application is to load a single order by its primary key:
SELECT *
FROM orders
WHERE id = $1;
Thanks to the primary key index, that query is as fast as you can reasonably hope for. You will also use unique indexes for business invariants:
CREATE UNIQUE INDEX users_email_key
ON users (email);
This not only prevents duplicates at the database level, it also makes SELECT * FROM users WHERE email = $1 very efficient.
If you design APIs in a resource oriented style like in REST vs GraphQL for beginners, primary and unique indexes are the baseline that keep resource lookups predictable.
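On the application side, a unique index also gives you a reliable way to turn duplicate inserts into a clean domain error instead of a 500. A minimal sketch, assuming a node-postgres style error object that exposes the SQLSTATE as `code` (23505 is PostgreSQL's unique_violation):

```typescript
// PostgreSQL reports unique index violations with SQLSTATE 23505
// ("unique_violation"). Drivers such as node-postgres surface it
// on the thrown error as `err.code`.
const UNIQUE_VIOLATION = "23505";

// Map a low-level database error to a domain-level outcome.
// The { code?: string } error shape is an assumption modeled on
// node-postgres; adapt it to your driver.
function classifyInsertError(err: { code?: string }): "duplicate_email" | "unknown" {
  return err.code === UNIQUE_VIOLATION ? "duplicate_email" : "unknown";
}
```

Catching the violation and returning a 409 keeps the invariant enforcement in the database, where it is race-free, while still giving API clients a useful error.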
Single column indexes
A single column index is the simplest explicit index you will add.
CREATE INDEX orders_user_id_idx
ON orders (user_id);
This helps queries like:
SELECT *
FROM orders
WHERE user_id = $1
LIMIT 50;
The database can now find the subset of rows for that user_id without scanning the entire table. Single column indexes are useful for simple filters or foreign keys, but as your queries combine multiple filters and sort orders they start to show their limits. That is where composite indexes enter.
Composite (multi column) indexes
Composite indexes index more than one column, in a defined order.
Imagine a real dashboard query:
SELECT id, user_id, total_cents, created_at
FROM orders
WHERE user_id = $1
AND status = 'paid'
ORDER BY created_at DESC
LIMIT 50;The database wants to:
- Filter on user_id.
- Filter on status.
- Sort on created_at DESC.
You can support this with a composite index:
CREATE INDEX orders_user_status_created_idx
ON orders (user_id, status, created_at DESC);
The left to right rule is crucial here:
- The index can efficiently serve filters on (user_id) or (user_id, status).
- It cannot efficiently serve a query that only filters on status without user_id.
- Putting created_at last lets the index support the ORDER BY created_at DESC for rows that match user_id and status.
A simple mental model: think of this index as a phone book sorted by city, then last name, then first name. You can easily look up everyone with the last name "Lovelace" in "London" because the data is ordered by that prefix, but if you only know the first name you effectively lose the ordering benefit.
Design composite indexes by reading the query from WHERE and ORDER BY clauses and choosing an order that maximizes how often the leading columns are used together.
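The phone book model can be made concrete with a few lines of TypeScript (illustrative only). Once entries are sorted by (city, lastName, firstName), every leading-column prefix maps to one contiguous slice, while a first-name-only lookup has to check everything:

```typescript
// A composite index behaves like entries sorted by
// (city, lastName, firstName): any leading-column prefix maps to
// one contiguous, ordered slice.
type Entry = { city: string; lastName: string; firstName: string };

const key = (e: Entry) => `${e.city}|${e.lastName}|${e.firstName}`;

const entries: Entry[] = [
  { city: "London", lastName: "Lovelace", firstName: "Ada" },
  { city: "Berlin", lastName: "Noether", firstName: "Emmy" },
  { city: "London", lastName: "Turing", firstName: "Alan" },
  { city: "London", lastName: "Lovelace", firstName: "Ralph" },
].sort((a, b) => (key(a) < key(b) ? -1 : 1));

// Prefix lookup on the leading columns: matches are contiguous, so a
// real index can binary search to the start of the slice and stop as
// soon as it leaves it.
function byCityAndLastName(city: string, lastName: string): Entry[] {
  const start = entries.findIndex(
    (e) => e.city === city && e.lastName === lastName
  );
  if (start === -1) return [];
  const out: Entry[] = [];
  for (let i = start; i < entries.length; i++) {
    if (entries[i].city !== city || entries[i].lastName !== lastName) break;
    out.push(entries[i]);
  }
  return out;
}

// First-name-only lookup: the sort order gives no help, every entry
// must be checked. This is the "filter on status without user_id" case.
function byFirstName(firstName: string): Entry[] {
  return entries.filter((e) => e.firstName === firstName);
}
```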
Covering indexes that avoid table lookups
Indexes store at least the indexed columns and some reference to the row (like a pointer to the heap tuple in PostgreSQL). A covering index is one where the index contains all the columns the query needs to return, so the database can answer the query from the index alone.
In PostgreSQL you can add extra non key columns with INCLUDE:
CREATE INDEX orders_user_status_created_cover_idx
ON orders (user_id, status, created_at DESC)
INCLUDE (total_cents);
Now the dashboard query from before:
SELECT id, user_id, total_cents, created_at
FROM orders
WHERE user_id = $1
AND status = 'paid'
ORDER BY created_at DESC
LIMIT 50;
can be served entirely from the index pages. The database does not have to visit the underlying table for each matching row. That often gives a significant speed up for read heavy list endpoints.
As with any optimization there is a trade off. Larger indexes take more disk space and make writes a little slower. Use covering indexes on your hottest queries, not everywhere.
Partial (filtered) indexes
Some tables accumulate a lot of cold data over time. Think of sessions, soft deleted records, archived events, or long lived state machines. You often query only a small active subset. Partial indexes are perfect for this scenario.
Consider a sessions table:
CREATE TABLE sessions (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
token TEXT NOT NULL,
active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ NOT NULL
);
Most of your queries look like:
SELECT *
FROM sessions
WHERE user_id = $1
AND active = TRUE;
Instead of indexing every historical row, add a partial index:
CREATE INDEX sessions_active_user_idx
ON sessions (user_id)
WHERE active = TRUE;
Benefits:
- Index is smaller because it only includes active sessions.
- Writes are cheaper when sessions flip to inactive.
- The planner can still use a full table scan or another index for historical analytics.
Partial indexes are especially valuable for soft delete flags and status fields where only a narrow slice of rows is hot.
Expression and functional indexes
Sometimes your queries filter on expressions, not raw columns. If the expression does not match any index definition, the planner may ignore your index.
Example:
SELECT *
FROM users
WHERE LOWER(email) = LOWER($1);
You may have an index on email, but the query applies LOWER to it, which can prevent the index from being used. The fix is to declare an index that matches the expression:
CREATE INDEX users_lower_email_idx
ON users (LOWER(email));
Now the planner can use this index to satisfy case insensitive email lookups.
You can do the same for date bucketing, JSON fields, and other computed values:
CREATE INDEX events_occurred_on_idx
ON events (date_trunc('day', occurred_at));
CREATE INDEX events_meta_customer_id_idx
ON events ((meta->>'customer_id'));
Expression indexes are a great companion to analytics style queries. If you later move some of this workload to a vector database or search engine as described in Vector databases for semantic search, the same indexing mindset still applies: index the access pattern, not just the field.
A simple workflow for designing indexes from queries
The most important principle is simple:
You index queries, not tables.
Start from realistic use cases, not from abstract schemas.
Step 1: capture the real query
Imagine a feature specification:
- Show the last 50 paid orders for the current user in reverse chronological order.
The resulting query:
SELECT id, user_id, total_cents, created_at
FROM orders
WHERE user_id = $1
AND status = 'paid'
ORDER BY created_at DESC
LIMIT 50;
Write the query down explicitly in your design doc or code review. This may be raw SQL or the ORM query it maps to. The key is to see the exact WHERE, JOIN, and ORDER BY.
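In a repository, the captured query might look like the sketch below. The parameter style follows node-postgres conventions, which is an assumption about your stack; the point is that the SQL is explicit enough to design the matching index against in the same review.

```typescript
// A repository method that encodes the exact query from the spec.
// Keeping the SQL explicit makes the supporting index obvious.
interface OrderRow {
  id: number;
  user_id: number;
  total_cents: number;
  created_at: string;
}

// Returns a parameterized query object in the { text, values } shape
// that node-postgres accepts (an assumption; adapt to your driver).
function recentPaidOrdersQuery(userId: number, limit = 50) {
  return {
    text: `
      SELECT id, user_id, total_cents, created_at
      FROM orders
      WHERE user_id = $1
        AND status = 'paid'
      ORDER BY created_at DESC
      LIMIT $2
    `,
    values: [userId, limit] as const,
  };
}

// Hypothetical usage with a node-postgres style client:
// const { rows } = await client.query<OrderRow>(recentPaidOrdersQuery(42));
```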
Step 2: design the minimal supporting index
Based on the query, you might propose:
CREATE INDEX orders_user_status_created_cover_idx
ON orders (user_id, status, created_at DESC)
INCLUDE (total_cents);
This index:
- Supports filtering by user_id and status.
- Supports sorting by created_at DESC for that subset.
- Covers all selected columns, so it can become an index only scan.
For multi tenant systems, you might add tenant_id as the leading column to keep per tenant queries fast:
CREATE INDEX orders_tenant_user_status_created_idx
ON orders (tenant_id, user_id, status, created_at DESC);
Step 3: encode the index in a migration
Indexes should live next to the rest of your schema. Here is a minimal example migration file:
-- 2025_11_20_add_orders_indexes.sql
CREATE INDEX IF NOT EXISTS orders_user_status_created_cover_idx
ON orders (user_id, status, created_at DESC)
INCLUDE (total_cents);
This will fit naturally into the sort of architecture described in Clean architecture in full-stack projects, where database changes are part of a controlled pipeline.
Step 4: verify usage with EXPLAIN
Indexes help only if the planner chooses them. Get used to reading query plans. In PostgreSQL:
EXPLAIN ANALYZE
SELECT id, user_id, total_cents, created_at
FROM orders
WHERE user_id = 42
AND status = 'paid'
ORDER BY created_at DESC
LIMIT 50;
You want to see an Index Scan or Index Only Scan on orders_user_status_created_cover_idx instead of a Seq Scan on orders.
If you see a sequential scan:
- Confirm that the index exists and is not invalid.
- Check that the query text matches the index definition.
- Check row count and statistics; for tiny tables a full scan may be cheaper.
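One way to keep this honest over time is a small regression check that runs EXPLAIN against a realistic dataset and fails when a hot query falls back to a sequential scan. A sketch of the plan inspection half; feeding real EXPLAIN output into it from a test is left to your database client:

```typescript
// Given the text output of EXPLAIN, flag plans that fall back to a
// sequential scan on a table we expect to be served by an index.
// PostgreSQL renders sequential scans as "Seq Scan on <table>".
function seqScansOn(planText: string, table: string): boolean {
  return planText.includes(`Seq Scan on ${table}`);
}

// Abbreviated example plans, shaped like PostgreSQL EXPLAIN output.
const goodPlan = `
Limit
  ->  Index Only Scan using orders_user_status_created_cover_idx on orders
`;

const badPlan = `
Limit
  ->  Sort
        ->  Seq Scan on orders
`;
```

Remember the caveat from the list above: on tiny test fixtures the planner may legitimately choose a sequential scan, so run this kind of check against data volumes that resemble production.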
Reading query plans is similar in spirit to reading React profiler traces. It is uncomfortable at first, but once you can spot obvious problems it becomes a superpower. For a frontend analogue, see how we reason about work and bytes in React performance and bundle size optimization.
Common indexing pitfalls and how to avoid them
Over indexing everything
Indexes are not free:
- They consume disk.
- They slow down writes, updates, and deletes because each index must be updated.
- They increase planning complexity for the database.
Resist the urge to add a new index for every query. A good rule:
- Index primary keys and unique constraints by default.
- Add indexes for the handful of hottest read queries on each table.
- Use slow query logs in production to drive further indexing decisions.
Wrong column order in composite indexes
The most frequent bug in composite indexes is choosing the wrong column order. Remember:
- Leading columns should match the most common filters.
- If you always filter by tenant_id in a multi tenant system, put tenant_id first.
- If you sometimes filter only by status but not by user_id, you may need a separate index.
When in doubt, list your actual queries and see how often each pattern appears. Design indexes for the ones that matter under load.
Functions and casts on indexed columns
Wrapping indexed columns in functions can silently kill index usage:
SELECT *
FROM users
WHERE created_at::date = CURRENT_DATE;
The safe rewrite:
SELECT *
FROM users
WHERE created_at >= CURRENT_DATE
AND created_at < CURRENT_DATE + INTERVAL '1 day';
This version can use a plain index on created_at. If you truly need a more complex expression, add a matching expression index.
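If you build the same range on the application side, compute the day boundaries once and pass them as parameters, so the indexed column stays bare in the WHERE clause. A sketch using UTC day boundaries (an assumption; adjust to whatever time zone rules your app actually uses):

```typescript
// Compute [start, end) boundaries for a calendar day in UTC so the
// query can use `created_at >= $1 AND created_at < $2` against a
// plain index on created_at, with no function applied to the column.
function utcDayRange(d: Date): { start: Date; end: Date } {
  const start = new Date(
    Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate())
  );
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);
  return { start, end };
}
```

The half-open [start, end) shape matters: it matches the SQL rewrite above and avoids off-by-one bugs at midnight that a BETWEEN on timestamps invites.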
Ignoring the write side
Backend developers often focus on query latency and forget write performance. Every extra index makes inserts, updates, and deletes slightly more expensive.
For write heavy tables:
- Start with primary and essential foreign keys.
- Add only the most valuable read indexes.
- Consider summary or denormalized tables for heavy analytics instead of indexing every possible filter.
This is similar to the trade offs you make when designing APIs in REST vs GraphQL for beginners: flexibility and power always come with some cost, and you want that cost to be intentional.
Indexing in multi tenant systems
Most production SaaS systems are multi tenant. Nearly every key table includes a tenant_id or account_id column. Indexing strategies should reflect that reality.
Patterns that usually work well:
- Put tenant_id first in indexes that are used by tenant scoped queries.
- Combine tenant_id with other high cardinality columns like id, user_id, or external_id.
- Use partial indexes when only a subset of data per tenant is hot.
Example:
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
tenant_id BIGINT NOT NULL,
title TEXT NOT NULL,
status TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX documents_tenant_status_created_idx
ON documents (tenant_id, status, created_at DESC);
The common query:
SELECT id, title, created_at
FROM documents
WHERE tenant_id = $1
AND status = 'active'
ORDER BY created_at DESC
LIMIT 50;
is now a straightforward index scan. Combined with good caching and API design as covered in Next.js SEO best practices and React performance and bundle size optimization, you can keep tenant experiences responsive even as datasets grow.
Putting it all together in a backend workflow
Here is a simple indexing workflow that fits nicely into modern TypeScript backends:
Feature idea
↓
Write domain and API design
↓
List concrete queries per use case
↓
Design or re-use indexes for those queries
↓
Add indexes in migrations
↓
Verify with EXPLAIN on realistic data
↓
Monitor slow queries in production and iterate

This looks boring, which is exactly what you want. Indexing becomes a regular part of feature work. Surprises show up early in code review instead of late in an incident channel.
Conclusion
Database indexing is one of the highest leverage skills a backend developer can learn. You rarely get praise for the index that prevented a production incident, but you will absolutely feel the absence of that index when it is missing.
If you remember only a few things, let them be these: think in terms of query patterns, design composite and partial indexes to match those patterns, and always verify with real query plans. Combine that with solid architecture practices from Clean architecture in full-stack projects and performance awareness from React performance and bundle size optimization, and you will be far ahead of most teams when it comes to backend reliability.
Actionable takeaways
- Index concrete queries, not abstract tables: For each important use case, write out the actual SQL or ORM call, then design composite and covering indexes that align with the WHERE and ORDER BY clauses instead of sprinkling generic single column indexes.
- Make indexing part of feature design: When adding new endpoints, decide which queries will be hot, capture them in design docs, and add the supporting indexes in migrations so they are reviewed, tested, and deployed along with the feature.
- Use production signals to refine: Turn on slow query logging in production, run EXPLAIN ANALYZE on the worst offenders, and fix them with a combination of better query shapes and more precise indexes instead of reacting with random new indexes that slowly hurt write performance.
