Flagd on Kubernetes: Complete Production Guide to Architecture, Implementation and Troubleshooting

Introduction

Frequent deployments increase delivery speed but also raise the risk of exposing unstable features.
Traditional approaches couple deployment with release, making rollback slow and risky.

Feature flagging decouples deployment from release by controlling feature exposure at runtime. This guide presents a production-focused implementation of Flagd on Kubernetes, covering architecture, workflow, infrastructure design, hands-on steps, and troubleshooting of an issue encountered in practice.

Problem Statement

Common production challenges:

  • All-or-nothing releases (high blast radius)
  • Slow rollback (requires redeploy)
  • Limited experimentation (no safe canary/A-B)
  • Tight coupling (deploy = release)
  • Low visibility into feature decisions

Goal: Safe rollouts, instant rollback, controlled experimentation, and observability.

What is Flagd?

Flagd is an open-source, stateless feature flag evaluation service aligned with the OpenFeature specification.

Capabilities
  • Boolean and multivariate flags
  • Targeting via context (user, region, headers)
  • Canary rollouts and A/B testing
  • Instant kill switch
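Canary rollouts and A/B tests depend on sticky bucketing. The same user must land in the same bucket on every evaluation, and raising the rollout percentage should only add users. The sketch below illustrates that idea in plain Python. It is not Flagd's implementation: Flagd does this inside its fractional targeting operator with its own hashing. SHA-256, the "flag_key:targeting_key" input, and the function names are illustrative assumptions.

```python
import hashlib


def bucket(flag_key: str, targeting_key: str, buckets: int = 100) -> int:
    """Deterministically map a (flag, user) pair to a bucket in [0, buckets)."""
    # Illustrative hash choice; Flagd's fractional operator uses its own hashing.
    digest = hashlib.sha256(f"{flag_key}:{targeting_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def canary_variant(flag_key: str, targeting_key: str, rollout_percent: int) -> str:
    """Serve 'on' to roughly rollout_percent% of users and 'off' to everyone else."""
    # A user in bucket b is 'on' for every percentage above b, so widening the
    # rollout only adds users and never flips an existing 'on' user back to 'off'.
    return "on" if bucket(flag_key, targeting_key) < rollout_percent else "off"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    share = sum(canary_variant("new-feature", u, 10) == "on" for u in users) / len(users)
    print(f"share of users 'on' at a 10% rollout: {share:.1%}")
```

Because the bucket depends on the flag key as well as the user, different flags roll out to different subsets of users.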

Why Flagd

  • Kubernetes-native
  • Horizontally scalable (stateless)
  • Works with OpenFeature SDKs

Architecture Overview

The following diagram shows how Flagd integrates into Kubernetes and controls feature behavior at runtime.

Figure: Flagd architecture with ConfigMap-based configuration and application pods

Components
  • Developer/Git: defines and versions flags
  • ConfigMap/CRD: distributes configuration
  • Flagd Deployment: evaluates flags (HA)
  • Service: stable endpoint
  • Applications: query via OpenFeature
  • Observability: Prometheus/Grafana

How The Flow Works

  1. Developer defines flags in JSON/YAML
  2. CI/CD or GitOps applies ConfigMap/CRD
  3. Flagd watches and reloads changes dynamically
  4. Application queries Flagd via SDK
  5. Flagd evaluates and returns variant
  6. Application behavior changes without redeploy

Infrastructure Design (Production)

Core Resources
  • Deployment (flagd) with replicas ≥ 2
  • Service (ClusterIP)
  • ConfigMap/CRD
  • RBAC (read-only)
  • HPA (optional)
Deployment Models
  • Centralized service (simpler)
  • Sidecar (low latency)
  • DaemonSet (node-local)
High Availability
  • Multiple replicas
  • Probes and rolling updates
Networking & Security
  • Internal DNS via ClusterIP
  • mTLS with service mesh (optional)
  • NetworkPolicies
  • No secrets in flags
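A NetworkPolicy can restrict which workloads may call Flagd. The sketch below builds one as a Python dict and prints it as JSON, which kubectl apply -f - accepts. It assumes the Flagd pods carry the app=flagd label that kubectl create deployment sets, and that client namespaces are labeled flagd-client=true; that label and the policy name are hypothetical. NetworkPolicies are only enforced when the cluster's network plugin supports them.

```python
import json


def flagd_ingress_policy(namespace: str = "flagd",
                         client_label: tuple = ("flagd-client", "true")) -> dict:
    """Allow TCP 8013 (flag evaluation) to Flagd pods only from labeled namespaces."""
    key, value = client_label
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "flagd-allow-clients", "namespace": namespace},
        "spec": {
            # Select the Flagd pods (kubectl create deployment labels them app=flagd).
            "podSelector": {"matchLabels": {"app": "flagd"}},
            # With Ingress listed, inbound traffic not allowed by some policy is denied.
            "policyTypes": ["Ingress"],
            "ingress": [{
                # Hypothetical namespace label marking approved Flagd clients.
                "from": [{"namespaceSelector": {"matchLabels": {key: value}}}],
                "ports": [{"protocol": "TCP", "port": 8013}],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(flagd_ingress_policy(), indent=2))
```

Because only port 8013 is opened, a Prometheus scraper in another namespace would need its own rule for Flagd's management port.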

Step-by-Step Implementation (with Screenshots)

Step 0: Verify Cluster

kubectl get nodes

Expected: every node reports STATUS Ready.

Screenshot: kubectl get nodes output

Step 1: Create Flag Configuration

Save the following as flags.json (Step 2 loads the file by this name):

{
  "flags": {
    "new-feature": {
      "state": "ENABLED",
      "variants": {
        "on": true,
        "off": false
      },
      "defaultVariant": "off"
    }
  }
}
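A typo in this file changes behavior for every caller, so a quick local check before committing can help. The sketch below is a hypothetical pre-commit check, not Flagd's own validation (Flagd publishes a JSON schema for the full format). It only verifies the rules this guide relies on: a known state, a non-empty variants object, and a defaultVariant that names one of the variants.

```python
import json

VALID_STATES = {"ENABLED", "DISABLED"}


def validate_flags(config):
    """Return a list of human-readable problems in a Flagd-style flag configuration."""
    flags = config.get("flags")
    if not isinstance(flags, dict) or not flags:
        return ["top-level 'flags' object is missing or empty"]
    problems = []
    for key, flag in flags.items():
        if not isinstance(flag, dict):
            problems.append(f"{key}: flag definition must be an object")
            continue
        if flag.get("state") not in VALID_STATES:
            problems.append(f"{key}: state must be ENABLED or DISABLED")
        variants = flag.get("variants")
        if not isinstance(variants, dict) or not variants:
            problems.append(f"{key}: variants must be a non-empty object")
            continue
        default = flag.get("defaultVariant")
        if not isinstance(default, str) or default not in variants:
            problems.append(f"{key}: defaultVariant {default!r} is not one of {sorted(variants)}")
    return problems


def check_file(path="flags.json"):
    """Load a flag file from disk and validate it."""
    with open(path, encoding="utf-8") as fh:
        return validate_flags(json.load(fh))
```

For the file above, check_file() should return an empty list; a CI job can fail the build whenever the list is non-empty.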

Step 2: Create Namespace & ConfigMap

Create a dedicated namespace for Flagd and store the feature flag configuration in a ConfigMap.

kubectl create namespace flagd

kubectl create configmap flagd-config \
  --from-file=flags.json \
  -n flagd

Verify:

kubectl get configmap -n flagd

kubectl get configmap flagd-config -n flagd -o yaml

Step 3: Deploy Flagd

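kubectl create deployment flagd \
  --image=ghcr.io/open-feature/flagd:latest \
  -n flagd

kubectl expose deployment flagd --port=8013 -n flagd

kubectl create deployment only sets the container image. It does not mount the flagd-config ConfigMap or pass Flagd's start --uri arguments, so both must be added to the pod spec before Flagd can serve the flags from Step 1. kubectl expose creates the ClusterIP Service that Steps 4 and 5 rely on. In production, pin a specific image version instead of latest.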
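A fuller alternative to the kubectl create deployment step above is to generate a Deployment that mounts the flagd-config ConfigMap and starts Flagd with start --uri, plus a matching ClusterIP Service, and pipe them to kubectl apply. This is a minimal sketch under stated assumptions: the ConfigMap name (flagd-config) and key (flags.json) from Step 2, Flagd's default ports (8013 for evaluation, 8014 for management), its /healthz and /readyz endpoints, and an image whose entrypoint is the Flagd binary so only the arguments need to be set. Check these against the Flagd version you deploy.

```python
import json


def flagd_manifests(namespace="flagd", replicas=2,
                    image="ghcr.io/open-feature/flagd:latest"):
    """Build a Deployment and Service for Flagd that loads flags.json from a ConfigMap."""
    labels = {"app": "flagd"}
    container = {
        "name": "flagd",
        "image": image,  # pin a specific version tag in production
        # Assumes the image entrypoint is the Flagd binary, so only args are set.
        "args": ["start", "--uri", "file:/etc/flagd/flags.json"],
        "ports": [
            {"name": "flagd", "containerPort": 8013},       # flag evaluation
            {"name": "management", "containerPort": 8014},  # health checks and metrics
        ],
        "readinessProbe": {"httpGet": {"path": "/readyz", "port": 8014}},
        "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8014}},
        # Mount the whole ConfigMap as a directory (no subPath) so edits reach the file.
        "volumeMounts": [{"name": "flag-config", "mountPath": "/etc/flagd", "readOnly": True}],
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "flagd", "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [{"name": "flag-config", "configMap": {"name": "flagd-config"}}],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "flagd", "namespace": namespace},
        "spec": {
            "type": "ClusterIP",
            "selector": labels,
            "ports": [{"name": "flagd", "port": 8013, "targetPort": 8013}],
        },
    }
    # kubectl apply accepts a v1 List wrapping several objects.
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    print(json.dumps(flagd_manifests(), indent=2))
```

Saved under a hypothetical name such as flagd_manifests.py, it would be applied with python flagd_manifests.py | kubectl apply -f -. Mounting the ConfigMap as a directory rather than through subPath matters for live updates, because Kubernetes does not refresh subPath-mounted files when the ConfigMap changes.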

Step 4: Verify Pods and Service

kubectl get pods -n flagd

kubectl get svc -n flagd

Step 5: Port Forward & Test API

This command forwards port 8013 of the Flagd Service from inside the cluster to your local machine, so you can test it on localhost. Keep it running in its own terminal.

kubectl port-forward svc/flagd 8013:8013 -n flagd

Step 6: Evaluate Flag (Simulated App Call)

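In a second terminal, activate the Python virtual environment that holds the client dependencies (flagd-env in this setup) and run the test application, app.py:

source flagd-env/bin/activate

python app.py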
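The guide does not show app.py. A minimal stand-in using only the Python standard library could call Flagd's HTTP evaluation endpoint through the port-forward from Step 5. The endpoint path follows Flagd's HTTP (Connect) API for boolean flags, but treat it as an assumption to confirm against your Flagd version. In a real service, the OpenFeature Python SDK with the flagd provider is the usual client.

```python
"""A hypothetical stand-in for app.py: evaluates new-feature over Flagd's HTTP API."""
import json
import urllib.request

# Assumes the port-forward from Step 5 (localhost:8013 -> Flagd Service).
FLAGD_URL = "http://localhost:8013"
# Assumed path of Flagd's Connect/HTTP endpoint for boolean flags.
RESOLVE_BOOLEAN = "/flagd.evaluation.v1.Service/ResolveBoolean"


def build_resolve_request(flag_key, context=None, base_url=FLAGD_URL):
    """Build the POST request that asks Flagd to resolve one boolean flag."""
    body = json.dumps({"flagKey": flag_key, "context": context or {}}).encode("utf-8")
    return urllib.request.Request(
        base_url + RESOLVE_BOOLEAN,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_enabled(flag_key, context=None, default=False, base_url=FLAGD_URL, timeout=2.0):
    """Return the flag's value, or `default` if Flagd is unreachable or errors."""
    request = build_resolve_request(flag_key, context, base_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; invalid JSON raises ValueError.
        return default
    # Protobuf-style JSON may omit a false value, so a missing "value" means False.
    return bool(payload.get("value", False))


if __name__ == "__main__":
    if is_enabled("new-feature"):
        print("new-feature is ON: serving the new code path")
    else:
        print("new-feature is OFF: serving the existing code path")
```

When Flagd is unreachable or returns an error, is_enabled falls back to the caller's default, which mirrors how OpenFeature SDKs return the code default on evaluation errors.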

Step 7: Dynamic Update (No Restart)

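Edit the ConfigMap, change the defaultVariant of new-feature from "off" to "on", save, and rerun the application. No redeploy or pod restart is needed, provided flagd-config is mounted into the Flagd pod as a volume. The kubelet refreshes mounted ConfigMap files periodically (this can take up to the kubelet sync period plus its cache delay, often around a minute; subPath mounts are never refreshed), and Flagd reloads the file when it changes.

kubectl edit configmap flagd-config -n flagd

python app.py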
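Hand-editing with kubectl edit works for a quick test, but a scripted change is easier to review and keep in Git. A minimal sketch, assuming flags.json stays the source of truth: change the defaultVariant in the file, then regenerate the ConfigMap from it. The function names are hypothetical.

```python
import json


def set_default_variant(config: dict, flag_key: str, variant: str) -> dict:
    """Return a copy of a Flagd-style config with flag_key's defaultVariant replaced."""
    flag = config["flags"][flag_key]
    if variant not in flag["variants"]:
        raise ValueError(f"{variant!r} is not a variant of {flag_key!r}")
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    updated["flags"][flag_key]["defaultVariant"] = variant
    return updated


def toggle_file(path: str, flag_key: str, variant: str) -> None:
    """Rewrite the flag file in place, e.g. toggle_file("flags.json", "new-feature", "on")."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(set_default_variant(config, flag_key, variant), fh, indent=2)
        fh.write("\n")
```

After updating the file, the ConfigMap can be regenerated in place by piping kubectl create configmap flagd-config --from-file=flags.json -n flagd --dry-run=client -o yaml into kubectl apply -f -. Setting the variant back to "off" the same way acts as a kill switch.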

Step 8: Scale for High Availability

kubectl scale deployment flagd --replicas=3 -n flagd

kubectl get pods -n flagd

Step 9: Logs

kubectl logs -n flagd -l app=flagd

Troubleshooting (Real Issue Encountered)

Issue: Kubernetes Not Reachable

Error:

Unable to connect to the server:
dial tcp 192.168.x.x:8443: no route to host

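Root Cause

The local minikube cluster was not running, so kubectl could not reach the API server at the address recorded in the kubeconfig.

Fix

Check the cluster state, start minikube if it is stopped, and confirm the node reports Ready again: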

minikube status

minikube start

kubectl get nodes

Before vs After Using Flagd

Before                               | After
Feature release tied to deployment   | Runtime feature control
Slow rollback                        | Instant toggle
No user segmentation                 | Canary and A/B supported

Production Use Cases

  • Canary deployment (gradual rollout)
  • A/B testing (variant comparison)
  • Kill switch (instant disable)
  • Environment gating
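For the canary use case, Flagd expresses percentage rollouts with its fractional targeting operator. A hypothetical helper that generates such a flag definition for flags.json is sketched below. The variant names match Step 1; the assumption that fractional buckets by the evaluation context's targetingKey when no bucketing expression is given should be checked against the Flagd targeting documentation for your version.

```python
import json


def canary_flag(percent_on: int) -> dict:
    """Build a Flagd-style boolean flag that serves 'on' to about percent_on% of users."""
    if not 0 <= percent_on <= 100:
        raise ValueError("percent_on must be between 0 and 100")
    return {
        "state": "ENABLED",
        "variants": {"on": True, "off": False},
        # Used when the targeting rule does not produce a variant.
        "defaultVariant": "off",
        # Weighted split between variants; assumed to bucket by targetingKey.
        "targeting": {"fractional": [["on", percent_on], ["off", 100 - percent_on]]},
    }


if __name__ == "__main__":
    print(json.dumps({"flags": {"new-feature": canary_flag(10)}}, indent=2))
```

Raising percent_on in steps and reapplying the ConfigMap gives a gradual rollout. Clients need to send a stable targetingKey, such as a user ID, for each user to stay in the same bucket.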

Observability

Metrics

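  • Evaluation volume (how often each flag is evaluated)
  • Evaluation errors
  • Evaluation latency

Flagd serves Prometheus-format metrics on its management port (8014 by default) at /metrics. Exact metric names differ between Flagd versions, so read them from that endpoint before building dashboards or alerts.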

Stack

  • Prometheus
  • Grafana

Screenshot: Grafana dashboard (optional)
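To sanity-check counters before wiring Grafana, the sketch below reads Prometheus text-format metrics and sums one metric across all label sets. It assumes Flagd's management port (8014 by default) is reachable locally, for example through kubectl port-forward deploy/flagd 8014:8014 -n flagd, and that metrics are served at /metrics. The parser is deliberately simplified: it handles the common name{labels} value lines and ignores comments and timestamps.

```python
import urllib.request


def fetch_metrics(url: str = "http://localhost:8014/metrics", timeout: float = 2.0) -> str:
    """Download Prometheus text-format metrics (assumes a port-forward to port 8014)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sum_metric(text: str, name: str) -> float:
    """Sum every sample of one metric name across all label sets."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            sample_name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            sample_name, _, rest = line.partition(" ")
        if sample_name == name:
            total += float(rest.split()[0])  # the value; an optional timestamp may follow
    return total
```

For example, sum_metric(fetch_metrics(), name) with a counter name copied from the fetched output gives a quick total to compare against what Grafana shows.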

Best Practices

  • Assign an owner and expiry date to each flag
  • Use clear naming conventions
  • Always define defaultVariant
  • Avoid long-lived flags
  • Use GitOps for flag management
  • Monitor evaluation metrics
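Owners and expiry dates only help if something checks them. A minimal sketch, assuming a hypothetical ownership registry kept in Git next to flags.json (Flagd does not read it): a CI job reports flags that have no registry entry or have passed their expiry date. The registry layout, owner name, and dates are illustrative.

```python
from datetime import date

# Hypothetical ownership registry kept in Git next to flags.json; Flagd never reads it.
REGISTRY = {
    "new-feature": {"owner": "team-checkout", "expires": date(2025, 6, 30)},
}


def audit_flags(flag_keys, registry, today):
    """Report flags with no registry entry and flags past their expiry date."""
    unregistered = sorted(key for key in flag_keys if key not in registry)
    expired = sorted(
        key for key in flag_keys
        if key in registry and registry[key]["expires"] < today
    )
    return {"unregistered": unregistered, "expired": expired}


if __name__ == "__main__":
    print(audit_flags(["new-feature"], REGISTRY, date.today()))
```

Feeding it the keys of the flags object from flags.json and failing the build when either list is non-empty turns "avoid long-lived flags" into an enforced rule.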

Common Mistakes

  • Keeping flags forever
  • No monitoring
  • Poor naming conventions
  • Using flags for static configurations

Problems Solved by Flagd

Problem              | Solution
Risky releases       | Gradual rollout
Slow rollback        | Instant toggle
No experimentation   | A/B testing
Tight coupling       | Runtime control

Conclusion

Flagd enables safe and flexible feature releases by decoupling deployment from exposure. With Kubernetes and proper observability, teams can adopt progressive delivery with reduced risk and better control.
