Feature flags look deceptively simple. It is just a boolean in a config table, right? Then you start adding per-environment overrides, gradual rollout, user targeting, experiments, and audit trails. Before long, the tiny `if (FEATURE_X_ENABLED)` branch has quietly grown into an entire subsystem that can break your releases if you get it wrong.
In this guide, we will design and build a feature flag system from first principles. The goal is not to clone every knob that a commercial platform offers, but to give you a solid, production-ready core that you understand from top to bottom. We will look at architecture, data modeling, evaluation rules, caching, and observability, with TypeScript-flavored examples. If you like the way Clean Architecture for Fullstack and Error Handling Patterns for Distributed Systems break complex systems into predictable pieces, this article will feel like their younger sibling.
TL;DR
- Start with a clear data model - flags, environments, variants, and targeting rules stored in a central service.
- Keep evaluation fast and side effect free - a pure function that takes a flag definition plus context and returns a decision.
- Use server side evaluation by default - expose thin SDKs and HTTP endpoints to your apps.
- Add gradual rollout and targeting later - ship booleans first, then add percentage rollout and user rules.
- Treat flags as configuration, not code - version them, audit them, and clean them up when they are no longer needed.
By the end, you should have enough to build your own lean feature flag service that integrates cleanly with a Next.js or Node stack, and enough context to decide when it is time to move to a dedicated platform.
Why feature flags matter more than ever
Modern teams ship continuously. Releases are no longer rare events that get a full freeze window; they are part of the daily rhythm. That is great for users, but it raises the cost of mistakes. A bad deployment should be something you can neutralize quickly without rolling back the entire release pipeline.
Feature flags give you that safety valve:
- Turn features on or off per environment without redeploys.
- Roll out gradually to a percentage of users while watching metrics.
- Target specific customers, teams, or segments.
- Run experiments without cloning entire code paths.
This idea connects directly to concepts in Next.js Core Web Vitals in 2025 where you tune performance per route, or AI Summarized Dashboards where you want to introduce new AI powered widgets safely. Flags are just a structured way to control change.
Architecture - a small control plane for behavior
Let us sketch a simple architecture that balances control and complexity. We will focus on server side evaluation since it fits most web and backend heavy products.
+-------------------+
| Admin UI / CLI |
| (create flags, |
| set rules) |
+---------+---------+
|
v
+------+------+
| Flag API |
| Service |
+------+------+
|
v
+-----------------------+
| Database (flags, |
| rules, overrides) |
+-----------------------+
^
|
v
+-----------+ +-----------+ +-----------+
| Web App | | Backend | | Workers |
| (Next.js) | | Services | | / Cron |
+-----------+ +-----------+ +-----------+
\ | /
\ | /
+--------------+---------------+
|
v
Flag Evaluation
(SDK or HTTP endpoint)

Roles:
- The Flag API Service is your control plane. It stores definitions, rules, and current state.
- The Admin UI or CLI is how humans change flags.
- Your applications and workers call into the flag evaluation layer through an SDK or a small HTTP client.
You can host the Flag API alongside your existing backend if your scale is modest. Just keep the boundaries clear so you can extract it later if needed, similar to how you would treat billing or authentication boundaries that you might later replace with specialized services.
If the idea of separating a control plane from a data plane feels familiar, that is because it shows up in many other areas too, from Edge Functions and Serverless Compute to Document QA with RAG.
Data model - flags, variants, and rules
We will start with a relational model that can live comfortably in Postgres. At minimum, you need:
- A `feature_flags` table for the logical flag.
- An `environments` table if you have more than prod and staging.
- A `flag_variants` table when you support more than simple booleans.
- A `flag_rules` table to describe targeting logic per environment.
Here is one possible schema:
CREATE TABLE environments (
id uuid PRIMARY KEY,
key text UNIQUE NOT NULL, -- 'prod', 'staging'
name text NOT NULL
);
CREATE TABLE feature_flags (
id uuid PRIMARY KEY,
key text UNIQUE NOT NULL, -- 'new-dashboard'
name text NOT NULL,
description text,
created_by text,
created_at timestamptz NOT NULL DEFAULT now(),
archived boolean NOT NULL DEFAULT false
);
CREATE TABLE flag_variants (
id uuid PRIMARY KEY,
flag_id uuid REFERENCES feature_flags(id) ON DELETE CASCADE,
key text NOT NULL, -- 'on', 'off', 'control', 'variant-a'
value jsonb NOT NULL, -- arbitrary payload if needed
UNIQUE(flag_id, key)
);
CREATE TABLE flag_rules (
id uuid PRIMARY KEY,
flag_id uuid REFERENCES feature_flags(id) ON DELETE CASCADE,
environment_id uuid REFERENCES environments(id) ON DELETE CASCADE,
priority integer NOT NULL,
rule_type text NOT NULL, -- 'percentage', 'user-ids', 'segment'
conditions jsonb NOT NULL,
variant_key text NOT NULL,
enabled boolean NOT NULL DEFAULT true
);

Example conditions JSON for a percentage rule:
{
"rollout": 25
}

Or for a user id allowlist:
{
"userIds": ["user_123", "user_456"]
}

With this model, you can represent:
- A simple boolean flag - two variants `on` and `off`, plus a default rule.
- Gradual rollout - percentage rules that assign a variant based on hashing user ids.
- Targeted rollout - rules that match specific users or segments first, then fall back to percentage or default.
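To make that concrete, here is how a hydrated definition for such a flag might look once its rows, variants, and rules are joined in memory. The exact shape is illustrative; the formal types are defined in the next section.

```typescript
// Illustrative in-memory shape of one flag: an allowlist rule is checked
// first (priority 1), then a 25 percent rollout, then the default variant.
const newDashboard = {
  key: "new-dashboard",
  variants: [
    { key: "on", value: true },
    { key: "off", value: false },
  ],
  rules: [
    {
      type: "user-ids",
      userIds: ["user_123", "user_456"],
      variantKey: "on",
      priority: 1,
    },
    { type: "percentage", rollout: 25, variantKey: "on", priority: 2 },
  ],
  defaultVariantKey: "off",
};
```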
The evaluation function - pure and predictable
The heart of your system is a function that takes:
- A flag definition with its variants and rules for a given environment.
- A context object (user id, organization id, request attributes).
And returns:
- A resolved variant for that context.
- Optional metadata about which rule decided the outcome.
In TypeScript, you can think of it as:
type FlagVariant = {
key: string;
value: unknown;
};
type FlagRule =
| {
type: "percentage";
rollout: number; // 0 to 100
variantKey: string;
}
| {
type: "user-ids";
userIds: string[];
variantKey: string;
};
type FlagDefinition = {
key: string;
variants: FlagVariant[];
rules: FlagRule[];
defaultVariantKey: string;
};
type EvaluationContext = {
userId?: string;
environmentKey: string;
};
type EvaluationResult = {
variant: FlagVariant;
reason: string;
};

The evaluation logic can then be implemented as a pure function:
const hashToPercentage = (input: string): number => {
let hash = 0;
for (let i = 0; i < input.length; i += 1) {
hash = (hash * 31 + input.charCodeAt(i)) >>> 0;
}
return hash % 100;
};
export const evaluateFlag = (
flag: FlagDefinition,
context: EvaluationContext
): EvaluationResult => {
for (const rule of flag.rules) {
if (rule.type === "user-ids" && context.userId) {
if (rule.userIds.includes(context.userId)) {
const variant = flag.variants.find(
(v) => v.key === rule.variantKey
);
if (variant) {
return { variant, reason: "user-ids match" };
}
}
}
if (rule.type === "percentage" && context.userId) {
const bucket = hashToPercentage(
`${flag.key}:${context.environmentKey}:${context.userId}`
);
if (bucket < rule.rollout) {
const variant = flag.variants.find(
(v) => v.key === rule.variantKey
);
if (variant) {
return { variant, reason: "percentage rollout" };
}
}
}
}
const fallback =
flag.variants.find((v) => v.key === flag.defaultVariantKey) ??
flag.variants[0];
return { variant: fallback, reason: "default" };
};

By keeping this function pure and deterministic, you make it easy to test and reason about. There are no time-based or random elements; everything flows from hashing stable identifiers. This kind of determinism is exactly what you want in systems that directly affect production behavior, similar to retry logic or circuit breakers that you might design following the ideas in Error Handling Patterns for Distributed Systems.
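You can see that determinism directly by exercising the hash: a given user always lands in the same bucket, and across many users the buckets spread roughly evenly, so a rollout of 25 reaches close to a quarter of them. The flag and environment keys below are just examples, and the helper is copied so the snippet stands alone.

```typescript
// Same rolling hash as in the evaluation function: a stable bucket in
// [0, 100) per (flag, environment, user) triple.
const hashToPercentage = (input: string): number => {
  let hash = 0;
  for (let i = 0; i < input.length; i += 1) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0;
  }
  return hash % 100;
};

// The same input always produces the same bucket...
const bucketA = hashToPercentage("new-dashboard:prod:user_123");
const bucketB = hashToPercentage("new-dashboard:prod:user_123");

// ...and over many users, a rollout threshold of 25 catches roughly a
// quarter of them, without any stored per-user assignment.
const users = Array.from({ length: 1000 }, (_, i) => `user_${i}`);
const inRollout = users.filter(
  (id) => hashToPercentage(`new-dashboard:prod:${id}`) < 25
).length;
```

Because the assignment is a function of stable identifiers, raising the rollout from 25 to 50 keeps everyone who was already in the rollout inside it, which is exactly the behavior you want during a gradual release.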
Exposing flags to applications - HTTP or SDK
Once you have evaluation nailed down, you need a way for applications to ask questions like:
For user 123 in prod, is `new-dashboard` enabled?
The simplest integration is an HTTP endpoint in your Flag API service:
POST /v1/flags/evaluate
Content-Type: application/json
{
"environmentKey": "prod",
"userId": "user_123",
"flagKeys": ["new-dashboard", "ai-summary-widget"]
}

Response:
{
"flags": {
"new-dashboard": {
"variantKey": "on",
"value": true,
"reason": "percentage rollout"
},
"ai-summary-widget": {
"variantKey": "control",
"value": false,
"reason": "default"
}
}
}

Your SDKs in web apps and backend services can be very thin wrappers around this endpoint, with local caching to avoid hitting the API on every request.
In a Node or Next.js app, a minimal client could look like:
type EvaluateManyResponse = {
flags: Record<
string,
{ variantKey: string; value: unknown; reason: string }
>;
};
export const evaluateFlags = async (params: {
environmentKey: string;
userId: string;
flagKeys: string[];
}): Promise<EvaluateManyResponse> => {
const res = await fetch("https://flags.myapp.com/v1/flags/evaluate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(params),
});
if (!res.ok) {
throw new Error(`Flag evaluation failed with ${res.status}`);
}
return res.json();
};

On the frontend, you may wrap this in a React hook that prefetches relevant flags on page load, similar to how you might fetch configuration or content for a dashboard. If you already built something like AI Summarized Dashboards, the mental model is almost the same - pull configuration once, cache locally, and react to changes when they happen.
Caching and performance
If you call your flag service on every request, you will either spend more on latency than you save in safety, or you will overload the service. Instead, layer your caching thoughtfully:
- Database to Flag Service - cache flag definitions in memory, invalidate on change.
- Flag Service to Applications - cache evaluation results in memory or in a local data store such as Redis.
- Applications - prefetch flags for a user and cache them in process for a short TTL.
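The last layer - a short-lived in-process cache - can be sketched in a few lines. The 30 second TTL is an arbitrary choice, and a production version would also bound the map's size; treat this as a starting point rather than a finished cache.

```typescript
type CachedEntry<T> = { value: T; expiresAt: number };

// Tiny per-process cache with a fixed TTL, keyed by whatever string you
// choose (for example `${environment}:${userId}:${flagKey}`).
const createTtlCache = <T>(ttlMs: number) => {
  const entries = new Map<string, CachedEntry<T>>();
  return {
    get(key: string): T | undefined {
      const entry = entries.get(key);
      if (!entry || entry.expiresAt <= Date.now()) {
        // Drop stale entries lazily on read.
        entries.delete(key);
        return undefined;
      }
      return entry.value;
    },
    set(key: string, value: T): void {
      entries.set(key, { value, expiresAt: Date.now() + ttlMs });
    },
  };
};

// Usage: remember a flag decision for 30 seconds.
const decisions = createTtlCache<boolean>(30_000);
decisions.set("prod:user_123:new-dashboard", true);
```

A short TTL keeps the window between a flag change and its effect small, which matters most when you flip a flag off to stop an incident.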
One simple pattern is to periodically poll definitions in the Flag API service:
let cachedFlags: Map<string, FlagDefinition> = new Map();
export const refreshFlagCache = async () => {
// `db` is your query-builder handle (a Kysely-style API is shown here).
const rows = await db.selectFrom("feature_flags").selectAll().execute();
// Join rules and variants as needed and rebuild FlagDefinition objects
cachedFlags = hydrateDefinitions(rows);
};

Call `refreshFlagCache` on startup and on a schedule, or trigger it via a message from your admin UI when someone changes a flag. For large fleets, you will eventually want something more robust such as event streaming or a configuration service, but polling is often enough for teams that are just starting.
If you are running your Flag Service on serverless infrastructure, take the usual advice from Edge Functions and Serverless Compute Effectively in 2025 about cold starts and connection management, and weigh whether a long lived container is a better fit.
Safety, observability, and audit trails
Feature flags are a control plane for behavior, which means you want a clear story for:
- Who changed what, when, and why.
- What the system decided at runtime.
- How to debug surprising behavior quickly.
Focus on three areas:
- Audit logs in your admin UI - log every change to flag definitions and rules along with user identity and a reason field.
- Evaluation logs or metrics in your Flag Service - track evaluation counts, cache hit ratios, and the distribution of variant decisions.
- Change review - for sensitive flags such as billing or security, require code review like you would for schema changes.
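For the evaluation metrics, even a simple in-memory counter of variant decisions goes a long way. This is a sketch; a real service would export these counts to a metrics backend rather than keep them in a map.

```typescript
// Count how often each (flag, variant) pair is served, so you can spot a
// rollout that is not reaching the share of traffic you expect.
const variantCounts = new Map<string, number>();

const recordDecision = (flagKey: string, variantKey: string): void => {
  const key = `${flagKey}:${variantKey}`;
  variantCounts.set(key, (variantCounts.get(key) ?? 0) + 1);
};

// Call this wherever the evaluation result is produced.
recordDecision("new-dashboard", "on");
recordDecision("new-dashboard", "on");
recordDecision("new-dashboard", "off");
```

Comparing the observed `on`/`off` split against the configured rollout percentage is one of the fastest ways to catch a misconfigured rule.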
The techniques you use here look very similar to the ones in AI Maintain Code Quality and Reduce Bugs, where observability and controlled rollout are key to safe adoption of new behavior.
Cleaning up stale flags
Without discipline, flags tend to accumulate and slowly turn your codebase into a museum of half remembered experiments. The antidote is explicit lifecycle:
- When you add a flag, record an owner, an intended lifetime, and a cleanup condition.
- When a feature is fully rolled out, schedule a task to:
- Remove checks from the code.
- Archive or delete the flag definition.
- Remove associated rules and variants.
You can automate part of this with reports that show:
- Flags that have been at 100 percent rollout for more than N days.
- Flags that no longer have any references in code (using static analysis or search).
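The first of those reports can be sketched as a pure function over flag metadata. The field names here are assumptions for illustration, not columns from the earlier schema; adapt them to however you track rollout state.

```typescript
type FlagRolloutInfo = {
  key: string;
  rolloutPercent: number;
  fullyRolledOutSince?: Date; // set when rollout first reached 100
};

// Flags that have sat at 100 percent for longer than maxAgeDays are
// candidates for removing the code checks and archiving the definition.
const findStaleFlags = (
  flags: FlagRolloutInfo[],
  maxAgeDays: number,
  now: Date = new Date()
): string[] => {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return flags
    .filter(
      (f) =>
        f.rolloutPercent === 100 &&
        f.fullyRolledOutSince !== undefined &&
        f.fullyRolledOutSince.getTime() < cutoff
    )
    .map((f) => f.key);
};
```

Running a report like this on a schedule and posting the results where the team will see them turns cleanup from a heroic effort into routine maintenance.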
Pairing this cleanup work with regular refactoring and architecture maintenance, like you would for the boundaries described in Clean Architecture for Fullstack, keeps your system from turning into a tangle.
Putting it together in a Next.js app
Let us tie this back to a concrete scenario. Suppose you have a Next.js dashboard and you want to gate a new AI summary panel behind a flag for a handful of early access users.
- In your Flag Service:
  - Create a flag `ai-summary-widget` with variants `on` and `off`.
  - Add a rule for the `prod` environment that enables `on` for a small list of user ids.
- In your backend:
  - Add a small client that calls the evaluation endpoint with `environmentKey = 'prod'` and the current user id.
  - Cache the result in the request context or session.
- In your Next.js app:
  - Expose a helper `isAiSummaryEnabled` that reads the cached flag decision.
  - Use it to decide whether to render the panel in your page component.
You now have a switch that product and engineering can control in near real time, without redeploys, and with a full audit history. As that feature matures, you can extend the rules to percentage rollout, segment targeting, and eventually clean up the flag when everyone has it.
Actionable next steps
If you want to build your own feature flag system over the next couple of sprints, here is a pragmatic roadmap:
- Model your flags in the database - implement `feature_flags`, `flag_variants`, and `flag_rules`.
- Implement a pure evaluation function - as shown above, with at least user id based rollout and a deterministic hash.
- Expose an HTTP evaluation endpoint and a tiny SDK - integrate it into one backend service and one frontend to start.
- Add basic admin tooling - this can be as simple as a CLI or a minimal internal page that creates and edits flags.
- Invest in observability and cleanup - track changes, usage, and stale flags so complexity does not creep up silently.
From there, you can grow in the directions your product needs most - more advanced targeting, client side evaluation, or eventually migration to a dedicated platform if your scale or requirements justify it. The important part is that you will already understand the mechanics deeply, so you can judge tradeoffs instead of treating feature flags as a magical black box.
