{"id":31356,"date":"2026-05-19T15:00:58","date_gmt":"2026-05-19T09:30:58","guid":{"rendered":"https:\/\/opstree.com\/blog\/?p=31356"},"modified":"2026-05-19T17:59:59","modified_gmt":"2026-05-19T12:29:59","slug":"kubernetes-production-guide-architectur","status":"publish","type":"post","link":"https:\/\/opstree.com\/blog\/kubernetes-production-guide-architectur\/","title":{"rendered":"Flagd on Kubernetes: Complete Production Guide to Architecture, Implementation and Troubleshooting"},"content":{"rendered":"<div style=\"background: #f8fafc; padding: 18px; border: 1px solid #e2e8f0; border-radius: 6px; font-family: Inter, Arial, sans-serif; margin: 20px 0;\">\n<h2 style=\"margin-top: 0; font-size: 18px;\">Table of Contents<\/h2>\n<ol style=\"margin: 0; padding-left: 18px; line-height: 1.7; font-size: 14px;\">\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#introduction\">Introduction<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#problem-statement\">Problem Statement<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#what-is-flagd\">What is Flagd?<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#why-flagd\">Why Flagd<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#architecture-overview\">Architecture Overview<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#infrastructure-design-production\">Infrastructure Design (Production)<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#step-by-step-implementation\">Step-by-Step Implementation (with Screenshots)<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#troubleshooting\">Troubleshooting (Real Issue Encountered)<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#before-vs-after-using-flagd\">Before vs After Using Flagd<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" 
href=\"#production-use-cases\">Production Use Cases<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#observability\">Observability<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#best-practices\">Best Practices<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#common-mistakes\">Common Mistakes<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#problems-solved-by-flagd\">Problems Solved by Flagd<\/a><\/li>\n<li><a style=\"text-decoration: none; color: #2563eb;\" href=\"#conclusion\">Conclusion<\/a><\/li>\n<\/ol>\n<\/div>\n<h2 id=\"introduction\">Introduction<\/h2>\n<p>Frequent deployments increase delivery speed but also raise the risk of exposing unstable features.<br \/>\nTraditional approaches couple deployment with release, making rollback slow and risky.<\/p>\n<p>Feature flagging decouples deployment from release by controlling feature exposure at runtime. This guide walks through a production-focused implementation of Flagd on <a href=\"https:\/\/opstree.com\/blog\/a-quick-overview-of-kubernetes-architecture\/\" target=\"_blank\" rel=\"noopener\">Kubernetes<\/a>, covering architecture, workflow, infrastructure design, hands-on steps, and real troubleshooting.<\/p>\n<h2 id=\"problem-statement\">Problem Statement<\/h2>\n<p>Common production challenges:<\/p>\n<ul>\n<li>All-or-nothing releases (high blast radius)<\/li>\n<li>Slow <a href=\"https:\/\/opstree.com\/blog\/ecs-rollback-with-jenkins-active-choice-parameter\/\" target=\"_blank\" rel=\"noopener\">rollback<\/a> (requires redeploy)<\/li>\n<li>Limited experimentation (no safe canary or A\/B testing)<\/li>\n<li>Tight coupling (deploy = release)<\/li>\n<li>Low visibility into feature decisions<\/li>\n<\/ul>\n<p><strong>Goal:<\/strong> Safe rollouts, instant rollback, controlled experimentation, and observability.<\/p>\n<h2 id=\"what-is-flagd\">What is Flagd?<\/h2>\n<p>Flagd is an <a 
href=\"https:\/\/opstree.com\/opstree-open-source\/\" target=\"_blank\" rel=\"noopener\">open-source<\/a>, stateless feature flag evaluation service aligned with the OpenFeature specification.<\/p>\n<h5>Capabilities<\/h5>\n<ul>\n<li>Boolean and multivariate flags<\/li>\n<li>Targeting via context (user, region, headers)<\/li>\n<li>Canary rollouts and A\/B testing<\/li>\n<li>Instant kill switch<\/li>\n<\/ul>\n<h2 id=\"why-flagd\">Why Flagd<\/h2>\n<ul>\n<li>Kubernetes-native<\/li>\n<li>Horizontally scalable (stateless)<\/li>\n<li>Works with OpenFeature SDKs<\/li>\n<\/ul>\n<h2 id=\"architecture-overview\">Architecture Overview<\/h2>\n<p>The following diagram shows how Flagd integrates into Kubernetes and controls feature behavior at runtime.<\/p>\n<p><strong>Figure:<\/strong> Flagd architecture with ConfigMap-based configuration and application pods<\/p>\n<h5>Components<\/h5>\n<ul>\n<li>Developer\/Git: defines and versions flags<\/li>\n<li>ConfigMap\/CRD: distributes configuration<\/li>\n<li>Flagd Deployment: evaluates flags (HA)<\/li>\n<li>Service: stable endpoint<\/li>\n<li>Applications: query via OpenFeature<\/li>\n<li><a href=\"https:\/\/opstree.com\/blog\/mttr-is-high-because-your-observability-is-fragmented\/\" target=\"_blank\" rel=\"noopener\">Observability<\/a>: Prometheus\/Grafana<\/li>\n<\/ul>\n<h3>How The Flow Works<\/h3>\n<ol>\n<li>Developer defines flags in JSON\/YAML<\/li>\n<li><a href=\"https:\/\/opstree.com\/blog\/the-role-of-rbac-in-securing-your-ci-cd-pipeline\/\" target=\"_blank\" rel=\"noopener\">CI\/CD<\/a> or <a href=\"https:\/\/opstree.com\/blog\/why-gitops-is-so-exciting\/\" target=\"_blank\" rel=\"noopener\">GitOps<\/a> applies ConfigMap\/CRD<\/li>\n<li>Flagd watches and reloads changes dynamically<\/li>\n<li>Application queries Flagd via SDK<\/li>\n<li>Flagd evaluates and returns variant<\/li>\n<li>Application behavior changes without redeploy<\/li>\n<\/ol>\n<h2 id=\"infrastructure-design-production\">Infrastructure Design 
(Production)<\/h2>\n<h5>Core Resources<\/h5>\n<ul>\n<li>Deployment (flagd) with replicas \u2265 2<\/li>\n<li>Service (ClusterIP)<\/li>\n<li>ConfigMap\/CRD<\/li>\n<li>RBAC (read-only)<\/li>\n<li>HPA (optional)<\/li>\n<\/ul>\n<h5>Deployment Models<\/h5>\n<ul>\n<li>Centralized service (simpler)<\/li>\n<li>Sidecar (low latency)<\/li>\n<li>DaemonSet (node-local)<\/li>\n<\/ul>\n<h5>High Availability<\/h5>\n<ul>\n<li>Multiple replicas<\/li>\n<li>Probes and rolling updates<\/li>\n<\/ul>\n<h5>Networking &amp; Security<\/h5>\n<ul>\n<li>Internal DNS via ClusterIP<\/li>\n<li>mTLS with service mesh (optional)<\/li>\n<li>NetworkPolicies<\/li>\n<li>No secrets in flags<\/li>\n<\/ul>\n<h2 id=\"step-by-step-implementation\">Step-by-Step Implementation (with Screenshots)<\/h2>\n<h3>Step 0: Verify Cluster<\/h3>\n<p><code>kubectl get nodes<\/code><\/p>\n<p><strong>Expected:<\/strong> every node reports the <code>Ready<\/code> status.<\/p>\n<p><strong>Screenshot:<\/strong> kubectl get nodes output<\/p>\n<h3>Step 1: Create Flag Configuration<\/h3>\n<p>Save the following flag definition as <code>flags.json<\/code>; Step 2 loads it into a ConfigMap.<\/p>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">{\r\n  \"flags\": {\r\n    \"new-feature\": {\r\n      \"state\": \"ENABLED\",\r\n      \"variants\": {\r\n        \"on\": true,\r\n        \"off\": false\r\n      },\r\n      \"defaultVariant\": \"off\"\r\n    }\r\n  }\r\n}\r\n<\/pre>\n<h3 id=\"step-2-create-namespace-configmap\">Step 2: Create Namespace &amp; ConfigMap<\/h3>\n<p>Create a dedicated namespace for Flagd and store the feature flag configuration in a ConfigMap. The same namespace, <code>flagd-tests<\/code>, is used in every later step.<\/p>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">kubectl create namespace flagd-tests\r\n\r\nkubectl create configmap flagd-config \\\r\n  --from-file=flags.json \\\r\n  -n flagd-tests\r\n<\/pre>\n<p><strong>Verify:<\/strong><\/p>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; 
line-height: 1.6;\">kubectl get configmap -n flagd-tests\r\n\r\nkubectl get configmap flagd-config -n flagd-tests -o yaml\r\n<\/pre>\n<h3 id=\"step-3-deploy-flagd\">Step 3: Deploy Flagd<\/h3>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">kubectl create deployment flagd \\\r\n  --image=ghcr.io\/open-feature\/flagd:latest \\\r\n  -n flagd-tests\r\n<\/pre>\n<p>Note that <code>kubectl create deployment<\/code> creates only the Deployment. The <code>flagd<\/code> Service used in Steps 4 and 5 and the volume mount for the <code>flagd-config<\/code> ConfigMap have to be defined separately, for example in a Deployment and Service manifest.<\/p>\n<h3 id=\"step-4-verify-pods-service\">Step 4: Verify Pods and Service<\/h3>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">kubectl get pods -n flagd-tests\r\n\r\nkubectl get svc -n flagd-tests\r\n<\/pre>\n<h3 id=\"step-5-port-forward-test-api\">Step 5: Port Forward &amp; Test API<\/h3>\n<p>This command forwards port 8013 of the Flagd service running inside the <a href=\"https:\/\/opstree.com\/blog\/manage-your-kubernetes-cluster-much-more-with-buildpiper\/\" target=\"_blank\" rel=\"noopener\">Kubernetes cluster<\/a> to your local machine, allowing you to test it using <code>localhost:8013<\/code>.<\/p>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">kubectl port-forward svc\/flagd 8013:8013 -n flagd-tests\r\n<\/pre>\n<h3 id=\"step-6-evaluate-flag\">Step 6: Evaluate Flag (Simulated App Call)<\/h3>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">source flagd-env\/bin\/activate\r\n\r\npython app.py\r\n<\/pre>\n<h3 id=\"step-7-dynamic-update\">Step 7: Dynamic Update (No Restart)<\/h3>\n<p>Edit the ConfigMap, change the flag state, save the file, and rerun the application.<\/p>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">kubectl edit configmap flagd-config -n 
flagd-tests\r\n\r\npython app.py\r\n<\/pre>\n<h3 id=\"step-8-scale-high-availability\">Step 8: Scale for High Availability<\/h3>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">kubectl scale deployment flagd --replicas=3 -n flagd-tests\r\n\r\nkubectl get pods -n flagd-tests\r\n<\/pre>\n<h3 id=\"step-9-view-logs\">Step 9: Logs<\/h3>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">kubectl logs -n flagd-tests -l app=flagd\r\n<\/pre>\n<h2 id=\"troubleshooting\">Troubleshooting (Real Issue Encountered)<\/h2>\n<h3 id=\"issue-kubernetes-not-reachable\">Issue: Kubernetes Not Reachable<\/h3>\n<p><strong>Error:<\/strong><\/p>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">Unable to connect to the server:\r\ndial tcp 192.168.x.x:8443: no route to host\r\n<\/pre>\n<h3 id=\"root-cause\">Root Cause<\/h3>\n<ul>\n<li>Minikube cluster stopped<\/li>\n<li><a href=\"https:\/\/opstree.com\/blog\/what-is-kubernetes-api\/\" target=\"_blank\" rel=\"noopener\">Kubernetes API<\/a> unreachable<\/li>\n<\/ul>\n<h3 id=\"fix\">Fix<\/h3>\n<pre style=\"background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; font-size: 14px; line-height: 1.6;\">minikube status\r\n\r\nminikube start\r\n\r\nkubectl get nodes\r\n<\/pre>\n<h2 id=\"before-vs-after-using-flagd\">Before vs After Using Flagd<\/h2>\n<div style=\"overflow-x: auto; width: 100%; margin: 20px 0;\">\n<table style=\"width: 100%; min-width: 600px; border-collapse: collapse; border: 1px solid #e5e7eb; font-size: 14px; font-family: Inter, Arial, sans-serif;\">\n<thead>\n<tr style=\"background: #f8fafc;\">\n<th style=\"border: 1px solid #e5e7eb; padding: 12px; text-align: left;\">Before<\/th>\n<th style=\"border: 1px solid #e5e7eb; padding: 
12px; text-align: left;\">After<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Feature release tied to deployment<\/td>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Runtime feature control<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Slow rollback<\/td>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Instant toggle<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">No user segmentation<\/td>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Canary and A\/B supported<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h2 id=\"production-use-cases\">Production Use Cases<\/h2>\n<ul>\n<li>Canary deployment (gradual rollout)<\/li>\n<li>A\/B testing (variant comparison)<\/li>\n<li>Kill switch (instant disable)<\/li>\n<li>Environment gating<\/li>\n<\/ul>\n<h2 id=\"observability\">Observability<\/h2>\n<h3 id=\"metrics\">Metrics<\/h3>\n<ul>\n<li><code>flag_evaluations_total<\/code><\/li>\n<li><code>flag_errors_total<\/code><\/li>\n<li><code>evaluation_latency_seconds<\/code><\/li>\n<\/ul>\n<h3 id=\"observability-stack\">Stack<\/h3>\n<ul>\n<li>Prometheus<\/li>\n<li>Grafana<\/li>\n<li><strong>Screenshot:<\/strong> Grafana dashboard (optional)<\/li>\n<\/ul>\n<h2 id=\"best-practices\">Best Practices<\/h2>\n<ul>\n<li>Assign an owner and expiry date to each flag<\/li>\n<li>Use clear naming conventions<\/li>\n<li>Always define <code>defaultVariant<\/code><\/li>\n<li>Avoid long-lived flags<\/li>\n<li>Use GitOps for flag management<\/li>\n<li>Monitor evaluation metrics<\/li>\n<\/ul>\n<h2 id=\"common-mistakes\">Common Mistakes<\/h2>\n<ul>\n<li>Keeping flags forever<\/li>\n<li>No monitoring<\/li>\n<li>Poor naming conventions<\/li>\n<li>Using flags for static configurations<\/li>\n<\/ul>\n<h2 id=\"problems-solved-by-flagd\">Problems Solved by Flagd<\/h2>\n<div style=\"overflow-x: auto; width: 100%; margin: 20px 0;\">\n<table style=\"width: 100%; min-width: 
600px; border-collapse: collapse; border: 1px solid #e5e7eb; font-size: 14px; font-family: Inter, Arial, sans-serif; line-height: 1.6;\">\n<thead>\n<tr style=\"background: #f8fafc;\">\n<th style=\"border: 1px solid #e5e7eb; padding: 12px; text-align: left;\">Problem<\/th>\n<th style=\"border: 1px solid #e5e7eb; padding: 12px; text-align: left;\">Solution<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Risky releases<\/td>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Gradual rollout<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Slow rollback<\/td>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Instant toggle<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">No experimentation<\/td>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">A\/B testing<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Tight coupling<\/td>\n<td style=\"border: 1px solid #e5e7eb; padding: 12px;\">Runtime control<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>Flagd enables safe and flexible feature releases by decoupling deployment from exposure. 
With Kubernetes and proper observability, teams can adopt progressive delivery with reduced risk and better control.<\/p>\n<h2>References<\/h2>\n<ul>\n<li><a href=\"https:\/\/openfeature.dev\/\" target=\"_blank\" rel=\"noopener\">OpenFeature<\/a><\/li>\n<li><a href=\"https:\/\/flagd.dev\/\" target=\"_blank\" rel=\"noopener\">Flagd Docs<\/a><\/li>\n<li><a href=\"https:\/\/kubernetes.io\/docs\/home\/\" target=\"_blank\" rel=\"noopener\">Kubernetes Docs<\/a><\/li>\n<\/ul>\n<h3>Related Solutions<\/h3>\n<ul>\n<li><a href=\"https:\/\/opstree.com\/aws-consulting-services\/\" target=\"_blank\" rel=\"noopener\">AWS Consulting Services<\/a><\/li>\n<li><a href=\"https:\/\/opstree.com\/solutions\/observability-and-system-reliability\/\" target=\"_blank\" rel=\"noopener\">unified observability solution<\/a><\/li>\n<li><a href=\"https:\/\/opstree.com\/services\/devops-and-devsecops-services\/\" target=\"_blank\" rel=\"noopener\">DevSecOps Automation Services<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Table of Contents Introduction Problem Statement What is Flagd? 
Why Flagd Architecture Overview Infrastructure Design (Production) Step-by-Step Implementation (with Screenshots) Troubleshooting (Real Issue Encountered) Before vs After Using Flagd Production Use Cases Observability Best Practices Common Mistakes Problems Solved by Flagd Conclusion Introduction Frequent deployments increase delivery speed but also raise the risk of exposing [&hellip;]<\/p>\n","protected":false},"author":244582730,"featured_media":31362,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[768739351],"tags":[741475795,768739634,502915258,553793656,343865,768739612],"blocksy_meta":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2026\/05\/Untitled-design-30.png","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pfDBOm-89K","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/31356"}],"collection":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/users\/244582730"}],"replies":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/comment
s?post=31356"}],"version-history":[{"count":9,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/31356\/revisions"}],"predecessor-version":[{"id":31375,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/31356\/revisions\/31375"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/media\/31362"}],"wp:attachment":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/media?parent=31356"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/categories?post=31356"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/tags?post=31356"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}