Istio Circuit Breaker – When Failure is a Better Option


What is a Circuit Breaker?

We often hear that an electrical device stopped working because a circuit breaker tripped. A circuit breaker is an electrical switch designed to protect a circuit from damage caused by overcurrent, overload, or a short circuit. Its basic function is to interrupt the flow of current once a fault is detected.

What is a Circuit Breaker in Microservices Architecture?

We have looked at the circuit breaker as an electrical device. A microservices architecture needs an equivalent. Why do we need a circuit breaker in a microservices architecture, and how can we implement one? Let’s illustrate with an example.

Consider several microservices and the dependencies between them. Service A depends on service B, and service B depends on services F and E. Service F relies on a third-party application over which we have no control.

Let’s consider that service A provides the following features:

  • Current account balance
  • Credit card details 
  • Investment details 

Normally, the third-party application responds in about 1 second. Suppose that, because of a problem on their side, its response time rises to 5 seconds. Each call from service F to the third party now takes 5 seconds instead of 1, so F holds its requests open for longer. Service B, which waits on F, holds its own requests in turn, and service A is delayed while it waits for B. To put illustrative numbers on it: at a steady 100 requests per second, a 1-second dependency keeps roughly 100 requests in flight, while a 5-second one keeps roughly 500, each occupying a connection or thread. The slowdown therefore cascades and degrades the other features of service A, when ideally only the credit card feature, which service B handles, should have been affected.

Let’s Implement the Circuit Breaker

To address such scenarios, we can implement circuit breakers. Essentially, we can adhere to the following two principles:

  • Remote communication should fail quickly in the event of an issue, rather than consuming resources while waiting for a response that may never arrive.
  • If a dependency consistently fails, it is better to halt further requests until the dependency has recovered.

Now suppose every remote call carries a deadline (a timeout). If the third-party application’s response time rises from 1 second to 5 seconds, the deadline expires and the call fails quickly. Only the feature of service A that depends on the third-party application is affected. Resources are no longer tied up waiting for the third-party response, so the other features of service A are not harmed.
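
In Istio, one way to express such a deadline is the timeout field on a VirtualService HTTP route. The sketch below is illustrative only: the resource name, the host third-party.example.com, and the 2-second budget are assumptions, and the host would typically also need a ServiceEntry so that the mesh knows about it.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: third-party-timeout        # hypothetical name
  namespace: default
spec:
  hosts:
  - third-party.example.com        # hypothetical external host
  http:
  - route:
    - destination:
        host: third-party.example.com
    # Fail the request if no response arrives within 2 seconds,
    # instead of holding the caller for the full 5 seconds.
    timeout: 2s
```

When the deadline expires, the sidecar answers the caller itself (Envoy reports a route timeout as a 504), so the calling service gets its connection back promptly.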

How to Achieve Circuit Breaker With Istio?

Istio is a service mesh that integrates essential network services into the infrastructure, providing features such as service discovery and policy enforcement to regulate communication among services within the mesh.

One of the advantages of Istio is its ability to handle failures by implementing circuit breakers. Networks are inherently unreliable, and Istio’s circuit breaker feature helps mitigate the cascading effects of failures.

Let’s consider an example within my organization where we’ve implemented Istio’s circuit breaker. Our microservices interact with several third-party applications external to our organization. Since we lack control over these third-party applications, any outages they experience directly impact our microservices. When these external applications encounter issues, our dependent microservices face delays or fail to receive responses, resulting in a buildup of requests on our pods. This situation leads to a cascading impact on our entire application.

To address this issue, we sought to implement Istio’s circuit breaker feature. However, upon reviewing Istio’s documentation and related blogs, we discovered that they typically recommend tripping the circuit breaker for the entire domain, which isn’t suitable for our scenario. In this blog, we’ll explain how we implemented API-based circuit breakers to address our specific needs.
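
For contrast, a domain-wide breaker of the kind those guides describe looks roughly like the sketch below. Outlier detection sits in the top-level trafficPolicy, so it applies to every request sent to the host, whichever API path is called. The resource name and host are placeholders, and the thresholds are arbitrary.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: third-party-domain-breaker   # hypothetical name
  namespace: default
spec:
  host: third-party.example.com      # hypothetical external host
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5   # eject after 5 consecutive 5xx responses
      interval: 10s             # how often hosts are evaluated
      baseEjectionTime: 1m      # minimum time an ejected host stays out
      maxEjectionPercent: 100   # allow every host to be ejected
```

With this shape, failures on one API can eject the host for all APIs, which is the behaviour a per-API setup is meant to avoid.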

Istio Circuit Breaker Components

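  • ServiceEntry: registers the external third-party host with the mesh
  • VirtualService: routes each API path to a named subset
  • DestinationRule: defines the subsets and the circuit breaker (outlier detection) policy for each one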

Walkthrough With a Sample Application

Let’s suppose we have two source applications (our microservices) and a destination application (a third-party application) with the following APIs.

  • /dest/200
  • /dest/500
  • /dest/504
  • /dest/502

Similarly, the source application has corresponding APIs to interact with the destination application APIs.

For our demonstration, we will continuously call the /src/200 API. Meanwhile, we will start calling the /src/500 API to trigger the circuit breaker specifically for the /src/500 endpoint.

The continuous calls to the /src/200 API keep returning 200 responses.

Here we call the /src/500 endpoint and receive 500 responses. Once the number of consecutive failures exceeds the threshold configured for Istio’s circuit breaker, the breaker trips: the sidecar returns 503 responses itself and stops forwarding calls to the destination application for the configured ejection period.


The Istio proxy log confirms this: alongside the 503 status it records the UH (no healthy upstream) response flag, because the circuit breaker tripped after the consecutive-error threshold was breached.

[2022-04-18T01:48:28.608Z] "GET /dest/500 HTTP/1.1" 503 UH "-" "-" 0 19 0 - "-" "Go-http-client/1.1" "4b6f0a59-e4bc-9200-a1d7-f875c2f063bc" "dest-app.dsdev." "-" - - 54.182.0.45:80 10.0.31.122:33336 - dest-500

Here we call both source APIs, /src/200 and /src/500, in parallel. In the second part of the screenshot, the /src/500 calls first return 500 responses. Once the circuit breaker trips, Istio returns 503 (Service Unavailable) and stops sending those requests to the destination application.

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
 name: dest-app
 namespace: default
spec:
 hosts:
 - dest-app.dsdev.xyx.com
 ports:
   - number: 80
     name: http-port
     protocol: HTTP
   - number: 443
     name: https-port
     protocol: HTTPS
 resolution: DNS
 location: MESH_EXTERNAL

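The snippet above declares a ServiceEntry for the destination application’s domain, which makes the external host known to the mesh. For more detail on ServiceEntry, refer to the Istio documentation. The VirtualService and DestinationRule that follow route each API path to its own subset and attach a separate circuit breaker policy to each subset.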

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
 name: dest-app
 namespace: default
spec:
 hosts:
   - dest-app.dsdev.xyx.com
 http:
 - name: "dest-200"
   match:
   - uri:
       prefix: /dest/200
     port: 80
   route:
   - destination:
       host: dest-app.dsdev.xyx.com
       port:
         number: 443
       subset: health200
 - name: "dest-500"
   match:
   - uri:
       prefix: /dest/500
     port: 80
   route:
   - destination:
       host: dest-app.dsdev.xyx.com
       port:
         number: 443
       subset: error500
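---
# Each subset is built as its own upstream cluster in the sidecar,
# so outlier detection is tracked separately per API route.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
 name: dest-app
 namespace: default
spec:
 host: dest-app.dsdev.xyx.com
 subsets:
 - name: health200
   trafficPolicy:
     tls:
       mode: SIMPLE                  # originate TLS to the destination
       sni: dest-app.dsdev.xyx.com
     outlierDetection:
       baseEjectionTime: 20s
       # High threshold, so this subset is unlikely to trip during the demo.
       # Note: consecutiveErrors is deprecated in recent Istio releases in
       # favour of consecutive5xxErrors and consecutiveGatewayErrors.
       consecutiveErrors: 200
       interval: 20s
       maxEjectionPercent: 100
 - name: error500
   trafficPolicy:
     loadBalancer:
       simple: ROUND_ROBIN
     connectionPool:
       tcp:
         maxConnections: 1           # at most one connection per destination host
         connectTimeout: 1000ms      # give up on the TCP connect after 1 second
     tls:
       mode: SIMPLE
       sni: dest-app.dsdev.xyx.com
     outlierDetection:
       baseEjectionTime: 1m          # minimum time an ejected host stays out
       consecutiveGatewayErrors: 3   # counts consecutive 502/503/504 responses
       interval: 20s                 # how often hosts are evaluated
       maxEjectionPercent: 100       # allow every host to be ejected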
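
The same pattern extends to the other APIs listed earlier. As a sketch only, the fragments below show what a dedicated breaker for /dest/504 might look like. The route name dest-504 and the subset name error504 are hypothetical, and the thresholds simply copy those of the error500 subset. These are fragments to merge into the existing dest-app resources, not standalone manifests.

```yaml
# Fragment 1: an extra entry under spec.http in the dest-app VirtualService
http:
- name: "dest-504"                 # hypothetical route name
  match:
  - uri:
      prefix: /dest/504
    port: 80
  route:
  - destination:
      host: dest-app.dsdev.xyx.com
      port:
        number: 443
      subset: error504             # hypothetical subset, defined below
---
# Fragment 2: an extra entry under spec.subsets in the dest-app DestinationRule
subsets:
- name: error504
  trafficPolicy:
    tls:
      mode: SIMPLE
      sni: dest-app.dsdev.xyx.com
    outlierDetection:
      consecutiveGatewayErrors: 3  # 504 is a gateway error, so it counts here
      interval: 20s
      baseEjectionTime: 1m
      maxEjectionPercent: 100
```

Because each API gets its own subset, a run of 504 responses on this route would trip only the error504 breaker and leave the other routes alone.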

Conclusion

To summarize, failing fast is sometimes the better option during an outage: it keeps the rest of your application running even when certain features are unavailable. Istio offers a convenient way to implement this without any changes to the code base.


Blog Pundits: Sanjeev Pandey and Sandeep Rawat


Author: Ashwani Singh

Senior DevOps | AWS Architect | Docker | Python | Bash | ElasticSearch | MongoDB | CICD | Automation | Microservices
