Before diving into what circuit breakers really are, let's understand why they exist in the first place. Every pattern exists to solve a particular problem, so this blog goes through the problem statement first and then dives into the solution. It will be a long ride. If you are busy and just want to see the code, feel free to jump to the last section.
In any complex software architecture, it's pretty common for systems to make remote calls to software running in different processes, probably on different machines across a network.
Let's assume there are two services, Service 1 and Service 2. Each service is hosted on separate hardware, and they communicate over the network. Let's take a look at the setup visually 🦊
In an ideal world, this setup works as expected. One service sends a request to another service and gets a response back. Everything works great, and your software has 100% 🤩 availability.
But in reality, software systems don't work like this. There will be failures, caused by transient errors, network partitions, and so on. Now let's see what happens when Service 2 becomes unhealthy and stops responding: if we make a request from Service 1, no response comes back. 😞
The circuit breaker pattern prevents a service from repeatedly attempting an operation (in our case, calling another service) that is likely to fail, so it stops wasting CPU cycles and other resources on doomed calls.
Why can't we just keep retrying and hope the service eventually responds? Because every retry holds a connection and other resources until it times out, and the extra traffic makes it even harder for the struggling service to recover.
In electrical terms, a circuit breaker is a safety device that interrupts the flow of current when it detects abnormal conditions, failing fast before your house burns down. 🔥
We can apply the same technique to software by wrapping dangerous operations with a component that can circumvent calls when the system is not healthy.
A circuit breaker can be in one of the three states.
CLOSED: In the normal “closed” state, the circuit breaker executes operations as usual. These can be calls out to another system, or internal operations that are subject to timeouts or other execution failures. If a call succeeds, nothing extraordinary happens. If it fails, however, the circuit breaker makes a note of the failure. Once the number of failures (or the frequency of failures, in more sophisticated cases) exceeds a threshold, the circuit breaker trips and “opens” the circuit.
OPEN: When the circuit is “open”, calls to the circuit breaker fail immediately, without any attempt to execute the real operation. After a suitable amount of time, the circuit breaker decides that the operation has a chance of succeeding, so it goes into the “half-open” state.
HALF-OPEN: In this state, the next call to the circuit breaker is allowed to execute the operation. Should the call succeed, the circuit breaker resets and returns to the “closed” state. Should it fail, the circuit breaker returns to the “open” state and waits for another timeout.
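The three states and the moves between them fit in a tiny transition function. This is only a sketch to make the state machine concrete: `BreakerState`, `BreakerEvent`, and `nextState` are names I made up for illustration, and the edge from HALF_OPEN back to OPEN on a failed trial call follows the usual description of the pattern.

```typescript
// Hypothetical names, used only to illustrate the state machine.
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";
type BreakerEvent =
  | "failureThresholdExceeded" // too many failures while CLOSED
  | "resetTimeoutElapsed" // the waiting period in OPEN is over
  | "trialSucceeded" // a call in HALF_OPEN worked
  | "trialFailed"; // a call in HALF_OPEN failed

function nextState(state: BreakerState, event: BreakerEvent): BreakerState {
  switch (state) {
    case "CLOSED":
      return event === "failureThresholdExceeded" ? "OPEN" : "CLOSED";
    case "OPEN":
      return event === "resetTimeoutElapsed" ? "HALF_OPEN" : "OPEN";
    case "HALF_OPEN":
      if (event === "trialSucceeded") return "CLOSED";
      if (event === "trialFailed") return "OPEN";
      return "HALF_OPEN";
  }
}
```

Events that make no sense in a given state (for example `trialSucceeded` while CLOSED) simply leave the state unchanged.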
Let's take a look at it visually 🤖
- Start the services.
- Double-click on Service 2 to make it unhealthy.
- Start making calls.
- After a few failures, the circuit breaker will transition to the OPEN state and all calls will be blocked at the circuit breaker.
- After 1 second, the circuit breaker will transition to the HALF_OPEN state and calls will be allowed through to Service 2 again.
- Now, if we double-click on Service 2 again to make it healthy, it will start working and the circuit breaker will transition to the CLOSED state.
- For the sake of the demo, I have reduced the transition time to 1 second, but the right value depends on the service and should be chosen wisely.
Implementation
Now that we have understood the concept, let's move on and implement it. While reading and collecting content for this blog, I found Opossum, a really popular circuit breaker library for Node.js.
So I went ahead, reverse-engineered the code, and picked out the crucial features in order to keep things simple. Feel free to clone and extend it or add missing features.
Let's go through requirements first:
- As a consumer of the library, I should be able to wrap or call my dangerous functions with the circuit breaker.
- The circuit breaker should emit events in case of failures and state transitions. This is important for tracking and emitting metrics.
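The second requirement boils down to the usual emitter and listener pair. Here is a minimal sketch of how a consumer could turn state transitions into metrics. `StateReporter` is a made-up stand-in that only tracks state, not the real breaker, and the event names are assumptions for this example.

```typescript
import { EventEmitter } from "events";

// Hypothetical stand-in for the breaker: it only tracks state and emits events.
class StateReporter extends EventEmitter {
  private state: "CLOSED" | "OPEN" = "CLOSED";

  trip(): void {
    if (this.state !== "OPEN") {
      this.state = "OPEN";
      this.emit("open"); // emitted once per transition, not once per call
    }
  }

  reset(): void {
    if (this.state !== "CLOSED") {
      this.state = "CLOSED";
      this.emit("closed");
    }
  }
}

// A consumer subscribes once and records every transition (e.g. as a metric).
const transitions: string[] = [];
const reporter = new StateReporter();
reporter.on("open", () => transitions.push("open"));
reporter.on("closed", () => transitions.push("closed"));

reporter.trip();
reporter.trip(); // no event: already OPEN
reporter.reset();
console.log(transitions); // [ 'open', 'closed' ]
```

Emitting only on actual transitions keeps the metrics clean: a dashboard sees one "open" event per outage rather than one per rejected call.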
Let's define a few of the parameters that the circuit breaker will accept.
    type CircuitBreakerParameters = {
      capacity?: number; // Maximum number of concurrent calls (read by the constructor below)
      timeout?: number; // Time in ms after which a pending call is treated as failed
      failureThresholdPercentage?: number; // Percentage of failed calls (e.g. 50) at which the circuit opens
      halfOpenTimeout?: number; // Time in ms after which an OPEN circuit transitions to HALF-OPEN
      successThreshold?: number; // Successful calls required in HALF-OPEN before transitioning to CLOSED
    };
Also, as discussed above, the circuit breaker can be in one of the three states.
    enum State {
      OPEN = "OPEN",
      CLOSED = "CLOSED",
      HALF_OPEN = "HALF-OPEN",
    }
Since we need to emit events on state transitions and failures, we can extend our CircuitBreaker class from EventEmitter, which is available in Node.js.
    import { EventEmitter } from "events";
    // Semaphore and CircuitBreakerError are small helpers defined elsewhere in the project.

    class CircuitBreaker extends EventEmitter {
      action: Function; // Actual function which needs to be run
      options: CircuitBreakerParameters;
      state: State;
      private semaphore: Semaphore; // Ignore: this limits concurrent access to the action being executed
      private failureCount: number;
      private successCount: number;
      private totalCalls: number; // Track total calls to calculate failure rate
      private lastFailureTime: number | null;

      constructor(action: Function, options: CircuitBreakerParameters = {}) {
        super();
        // Validate the action before anything else is set up
        if (!action) {
          throw new CircuitBreakerError(
            "No action provided. Please provide something to execute",
          );
        }
        this.action = action;
        // `??` keeps an explicit 0 (e.g. timeout: 0 skips the timeout check in call)
        this.options = {
          capacity: Number.isInteger(options.capacity)
            ? options.capacity
            : Number.MAX_SAFE_INTEGER,
          timeout: options.timeout ?? 1000,
          failureThresholdPercentage: options.failureThresholdPercentage ?? 50,
          successThreshold: options.successThreshold ?? 2,
          halfOpenTimeout: options.halfOpenTimeout ?? 5000,
        };
        // Initially the circuit breaker will be in CLOSED state
        this.state = State.CLOSED;
        this.semaphore = new Semaphore(this.options.capacity!);
        this.failureCount = 0;
        this.successCount = 0;
        this.totalCalls = 0;
        this.lastFailureTime = null;
      }
Now we will focus on the actual core part.
      close() {
        // When transitioning to CLOSED, reset all counters so that new failures can be tracked
        if (this.state !== State.CLOSED) {
          this.state = State.CLOSED;
          this.failureCount = 0;
          this.successCount = 0;
          this.totalCalls = 0;
          this.lastFailureTime = null;
          // Emit the "closed" event. Clients can use it to publish metrics
          this.emit("closed");
        }
      }

      open() {
        if (this.state !== State.OPEN) {
          this.state = State.OPEN;
          this.lastFailureTime = Date.now();
          this.emit("open");
          // Schedule the transition to HALF-OPEN after halfOpenTimeout ms
          setTimeout(() => this.halfOpen(), this.options.halfOpenTimeout!);
        }
      }

      // Called by the timer in open(). Minimal version; the "halfOpen" event name is an assumption
      halfOpen() {
        if (this.state === State.OPEN) {
          this.state = State.HALF_OPEN;
          this.successCount = 0;
          this.emit("halfOpen");
        }
      }

      fire<T extends any[]>(...args: T) {
        return this.call(this.action, ...args);
      }

      // `context` is used as `this` when the action is invoked
      async call(context: unknown, ...rest: unknown[]) {
        const args = rest.slice();

        // Event for tracking the actual function execution
        this.emit("fire", args);

        if (this.state === State.OPEN) {
          throw new CircuitBreakerError("CircuitBreaker is open");
        }

        if (this.state === State.HALF_OPEN && !this.semaphore.test()) {
          throw new CircuitBreakerError(
            "CircuitBreaker is half-open and at capacity",
          );
        }

        this.totalCalls++; // Increment total calls

        let timeout: ReturnType<typeof setTimeout> | undefined;
        let timeoutError = false;

        try {
          // Acquire the semaphore before executing the action
          await this.semaphore.acquire();

          const result = await new Promise(async (resolve, reject) => {
            // To check if the action takes more time than expected
            if (this.options.timeout) {
              timeout = setTimeout(() => {
                // A timeout is considered a failure, so the failure count increases
                timeoutError = true;
                this.semaphore.release();
                this.recordFailure();
                reject(new Error(`Timed out after ${this.options.timeout}ms`));
              }, this.options.timeout);
            }

            try {
              // `await` handles both promises and plain return values
              const actionResult = await this.action.apply(context, args);

              if (!timeoutError) {
                // The action finished in time: this is a success
                clearTimeout(timeout);
                this.semaphore.release();
                this.recordSuccess();
                resolve(actionResult);
              }
            } catch (error) {
              if (!timeoutError) {
                clearTimeout(timeout);
                this.semaphore.release();
                this.recordFailure();
                reject(error);
              }
            }
          });

          return result;
        } catch (error) {
          this.emit("failure", error);
          throw error;
        }
      }

      private calculateFailureRate(): number {
        if (this.totalCalls === 0) return 0;
        return (this.failureCount / this.totalCalls) * 100;
      }

      private recordSuccess() {
        if (this.state === State.HALF_OPEN) {
          this.successCount++;
          if (this.successCount >= this.options.successThreshold!) {
            this.close();
          }
        }
      }

      private recordFailure() {
        this.failureCount++;
        const failureRate = this.calculateFailureRate();
        // A failed trial call in HALF-OPEN re-opens the circuit right away
        if (
          this.state === State.HALF_OPEN ||
          failureRate >= this.options.failureThresholdPercentage!
        ) {
          this.open();
        }
      }
    }
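The class relies on two helpers that this post never shows, Semaphore and CircuitBreakerError. Their real implementations live in the repo. What follows is only my guess at a minimal shape for them, based on the three methods the breaker calls (`test`, `acquire`, and `release`), so treat every detail as an assumption.

```typescript
// Hypothetical helper: the error type the breaker throws for its own rejections.
class CircuitBreakerError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CircuitBreakerError";
  }
}

// Hypothetical counting semaphore with the three methods used by the breaker.
class Semaphore {
  private available: number;
  private waiters: Array<() => void> = [];

  constructor(capacity: number) {
    this.available = capacity;
  }

  // True when a permit could be taken right now (does not take it).
  test(): boolean {
    return this.available > 0;
  }

  // Resolves once the caller holds a permit.
  acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  // Gives a permit back, handing it straight to the oldest waiter if there is one.
  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next();
    } else {
      this.available++;
    }
  }
}
```

With the default capacity of `Number.MAX_SAFE_INTEGER`, the semaphore never blocks, which is why the class comment says it can be ignored on a first read.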
Let's focus on the call function.
- If the timeout fires before the action returns a response, the circuit breaker records this as a failure and increments failureCount.
- If the failure rate reaches failureThresholdPercentage, the circuit breaker transitions to the OPEN state.
- In the OPEN state, we start a setTimeout that transitions to the HALF_OPEN state, to check whether the action can now be performed successfully.
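To make the threshold rule concrete, here is the same arithmetic as a pair of standalone functions. `shouldTrip` and `minimumCalls` are hypothetical names that do not exist in the class above. `minimumCalls` sketches a common refinement: with the rule as written, a single failed call right after startup gives a failure rate of 100% and opens the circuit immediately.

```typescript
// Same formula as calculateFailureRate: failed calls as a percentage of all calls.
function failureRate(failureCount: number, totalCalls: number): number {
  return totalCalls === 0 ? 0 : (failureCount / totalCalls) * 100;
}

// Hypothetical helper: decides whether the circuit should open.
// minimumCalls (an assumption, default 1) delays the decision until enough calls were seen.
function shouldTrip(
  failureCount: number,
  totalCalls: number,
  thresholdPercentage: number,
  minimumCalls: number = 1,
): boolean {
  if (totalCalls < minimumCalls) return false;
  return failureRate(failureCount, totalCalls) >= thresholdPercentage;
}

shouldTrip(1, 1, 50); // true: 100% >= 50%, one bad call is enough
shouldTrip(1, 1, 50, 5); // false: fewer than 5 calls observed so far
shouldTrip(3, 5, 50, 5); // true: 60% >= 50%
shouldTrip(2, 5, 50, 5); // false: 40% < 50%
```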
The rest of the code is pretty much self-explanatory. If you are interested in diving deeper, please have a look at the GitHub repo Circuit-Breaker. That covers the core functionality of the circuit breaker. One important feature that most circuit breaker libraries provide is the option to pass a fallback function that runs when the call fails.
I am too lazy to implement it 🥱 Maybe I will add it to the repo later.
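For the curious, here is a rough idea of how a fallback could be layered on top without touching the breaker at all. It is only a sketch: `withFallback` and its signature are illustrative names of my own, not part of the repo or of any library.

```typescript
// Hypothetical helper: run `primary`, and if it rejects (an open circuit, a timeout,
// or a real error), return whatever `fallback` produces instead.
async function withFallback<A extends unknown[], R>(
  primary: (...args: A) => Promise<R>,
  fallback: (error: unknown, ...args: A) => R | Promise<R>,
  ...args: A
): Promise<R> {
  try {
    return await primary(...args);
  } catch (error) {
    // The fallback receives the error, so it can tell an open circuit from other failures.
    return fallback(error, ...args);
  }
}

// Usage idea, assuming `breaker` is a CircuitBreaker wrapping a user lookup:
//   const user = await withFallback(
//     (id: string) => breaker.fire(id),
//     (_error, id) => ({ id, name: "unknown" }), // e.g. a cached or default value
//     "42",
//   );
```

Keeping the fallback outside the breaker means it never affects the failure counters: the breaker still sees the failed call, and only the caller gets the substitute value.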
Phewww, that's pretty much it! If you're still reading, you're either a true warrior of resilience (just like our Circuit Breaker 😄) or you just really love animations. Either way, thanks for sticking around! ☮️