Microservices Communication: Hystrix As The Jon Snow

In the previous Microservice tutorial, we learned how to use the Zuul API gateway. In this tutorial, we will learn about Hystrix, which acts as a circuit breaker for our services. Is the term "circuit breaker" new to you in the context of software architecture? Don't worry, I will discuss it in detail.


But before that, let's discuss an incident that is well known to anyone who works on a support project (Monolith).


Birth of the Night’s King:

Folks who are on-call support, how many times has this happened to you? You get a call in the night saying the system is not responding and it is a priority 1 issue. You wake up at an odd hour, open your laptop, and check the health check pages. You find that some servers are down and some show a huge memory spike. So you immediately take a thread dump and all the other necessary details, then restart all the servers in the pool. After the restart, things look quite normal and you go back to sleep. If you are lucky, you get a good sleep; if you are unlucky, you may face the same scenario again in the morning.

So the next day, you and your team research why this happened: what was the root cause of the birth of the White Walkers, which ate up all the precious resources until the servers eventually became unresponsive?

You may find a resource leak somewhere, maybe at the code level: someone forgot to close a precious resource such as a connection, or there were unnecessary open threads, or there was a blocking session in the database, etc.

But hold on, why couldn't we find this resource leak (the birth of the Night King) in the first place? Why does the Night King grow up silently, so that we are notified only once he is in action?

It opens our eyes to a problem in our architecture (King's Landing): there is no technique for early detection of a resource leak (no Jon Snow to watch the Wall!!!).


A practical Scenario.
Let's examine a simple scenario that can cause this type of incident. Say we have an architecture where Service A and Service B depend on Service C. Both Service A and Service B query the Service C API to get some result. Service C uses an underlying database to fetch the result, but unfortunately the programmer did not close the connection in a finally block; he closed it inside the try block.

Now in production, if any error occurs in Service C regarding the database connection or query, it does not release the connection, so the connection never goes back to the connection pool (and a connection pool has finite resources). Service A and Service B are not aware of this, so they keep querying Service C as requests come in, and Service C eats up the free connections from the pool one by one. After a certain time, all the connections are eaten up, there is no free connection left in the pool, and the White Walkers (Service C) have eaten up your system. Restarting all the servers gives you relief for some time, but if the Service C error continues (a programming fault), you might have to wake up again in the morning (the Night King is back).
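To make the leak concrete, here is a minimal sketch of the Service C mistake. It assumes a toy pool instead of a real JDBC pool; TinyPool, PooledConnection, and LeakDemo are hypothetical names invented for this illustration. The first method closes the connection inside the try block, the second lets try-with-resources (the modern form of closing in finally) do it.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical stand-in for a finite connection pool; not a real JDBC pool. */
final class TinyPool {
    private final Semaphore permits;

    TinyPool(int size) {
        this.permits = new Semaphore(size);
    }

    /** Hands out a connection, or fails when the pool is exhausted. */
    PooledConnection acquire() {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("connection pool exhausted");
        }
        return new PooledConnection(this);
    }

    void release() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}

/** A connection that goes back to the pool when closed. */
final class PooledConnection implements AutoCloseable {
    private final TinyPool pool;
    private boolean closed;

    PooledConnection(TinyPool pool) {
        this.pool = pool;
    }

    /** Simulates a query that always fails, as in the Service C scenario. */
    String query(String sql) {
        throw new IllegalStateException("query failed: " + sql);
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            pool.release();
        }
    }
}

final class LeakDemo {
    /** Buggy: close() sits inside the try block, so a failing query skips it. */
    static String leakyLookup(TinyPool pool) {
        try {
            PooledConnection c = pool.acquire();
            String result = c.query("select ...");
            c.close(); // never reached when query() throws
            return result;
        } catch (IllegalStateException e) {
            return "error";
        }
    }

    /** Safe: try-with-resources closes the connection on every path. */
    static String safeLookup(TinyPool pool) {
        try (PooledConnection c = pool.acquire()) {
            return c.query("select ...");
        } catch (IllegalStateException e) {
            return "error";
        }
    }

    public static void main(String[] args) {
        TinyPool leaky = new TinyPool(3);
        TinyPool safe = new TinyPool(3);
        for (int i = 0; i < 3; i++) {
            leakyLookup(leaky);
            safeLookup(safe);
        }
        System.out.println("leaky pool free connections: " + leaky.available());
        System.out.println("safe pool free connections: " + safe.available());
    }
}
```

After three failing calls, the leaky pool has 0 free connections left while the safe pool still has all 3. That is exactly how the pool gets eaten up silently.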

It all happens because Service A and Service B are not aware that Service C is not responding the way it should. If they were aware, they could simply stop querying it, and we would not have faced this situation. This is where the concept of the Circuit Breaker (in GOT, the Night's Watch) comes in.

Resource Leak--Birth of Night King


Circuit Breaker Pattern :

The Circuit Breaker concept is the same as in an electrical circuit. When the circuit is closed, electrons flow through it, but if anything unusual happens, the breaker trips and the circuit opens, so no electrons flow. This gives the circuit time to recover; after a certain amount of time the circuit closes again and the flow of electrons continues.

Netflix Hystrix is a framework that works on the same principle.

It constantly monitors the calls to a dependent service. If calls to that service keep failing or timing out beyond a configured threshold, it trips the circuit, so no further calls flow to the dependent service. This gives the dependent service time to recover. During that time a fallback policy applies, and all requests go to the fallback path. After a certain amount of time, Hystrix lets a trial request through; if it succeeds, the circuit is closed again and requests flow as usual.
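The trip, wait, and recover cycle can be sketched in a few lines of plain Java. This is a toy, not Hystrix's API: SimpleCircuitBreaker and its parameters are hypothetical names. It trips after a number of consecutive failures, whereas Hystrix works with an error percentage over a rolling window. The clock is injected only to make the behaviour easy to try out.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** A toy circuit breaker; names and thresholds are illustrative, not Hystrix's API. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before tripping
    private final long openMillis;      // how long to stay open before a trial call
    private final LongSupplier clock;   // injectable clock, handy for experiments

    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // Once the waiting period is over, allow a trial call.
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /**
     * Runs the remote call when allowed, otherwise answers from the fallback.
     * Synchronized for simplicity, so calls are serialized in this sketch.
     */
    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // short-circuit: do not touch the dependency
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0;
            state = State.CLOSED; // a successful call closes the circuit
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}

final class CircuitBreakerDemo {
    public static void main(String[] args) {
        long[] now = {0};
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 1000, () -> now[0]);
        Supplier<String> failing = () -> {
            throw new IllegalStateException("Service C is down");
        };
        Supplier<String> fallback = () -> "fallback";

        breaker.call(failing, fallback);
        breaker.call(failing, fallback);
        System.out.println(breaker.state()); // OPEN

        now[0] = 1000; // the waiting period has passed
        System.out.println(breaker.call(() -> "real answer", fallback)); // real answer
        System.out.println(breaker.state()); // CLOSED
    }
}
```

While the circuit is open, the failing dependency is not called at all, which is what gives Service C room to recover.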

Please note that we can enable Hystrix (Jon Snow, the King in the North) in Spring Cloud. Previously, it was supported only at the service and component level, that is, on classes annotated with @Service or @Component. With the latest version, it is supported on @Controller as well.

Hystrix as Jon Snow



Coding Time:

Let's recap the EmployeeDashBoardService. It calls EmployeeSearchService to find an employee based on the id. Currently, if EmployeeSearchService is unavailable, EmployeeDashBoardService does not get a result and shows an error. We want to show a default employee value instead when EmployeeSearchService is not available, so we have to make the following changes in EmployeeDashBoardService.


Step 1 : Add the Hystrix dependency to pom.xml



<dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-starter-hystrix</artifactId>
</dependency>


Step 2 : Add @EnableCircuitBreaker on top of the EmployeeDashBoardService class to enable Hystrix for this service.

package com.example.EmployeeDashBoardService;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.netflix.feign.EnableFeignClients;
import org.springframework.context.annotation.Bean;
import org.springframework.web.client.RestTemplate;

@EnableDiscoveryClient
@EnableCircuitBreaker
@EnableFeignClients
@SpringBootApplication
public class EmployeeDashBoardService {

   public static void main(String[] args) {
      SpringApplication.run(EmployeeDashBoardService.class, args);
   }

   @Bean
   public RestTemplate restTemplate(RestTemplateBuilder builder) {
      return builder.build();
   }
}


Step 3 : Now we will change EmployeeInfoController.java so that it is Hystrix enabled.

package com.example.EmployeeDashBoardService.controller;

import java.util.Collection;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.cloud.context.config.annotation.RefreshScope;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

import com.example.EmployeeDashBoardService.domain.model.EmployeeInfo;
import com.netflix.appinfo.InstanceInfo;
import com.netflix.discovery.EurekaClient;
import com.netflix.discovery.shared.Application;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;

@RefreshScope
@RestController
public class EmployeeInfoController {
   
    @Autowired
    private RestTemplate restTemplate;
   
    @Autowired
    private EurekaClient eurekaClient;
   
    @Value("${service.employyesearch.serviceId}")
    private String employeeSearchServiceId;


   @RequestMapping("/dashboard/{myself}")
   @HystrixCommand(fallbackMethod="defaultMe")
   public EmployeeInfo findme(@PathVariable Long myself){
      Application application = eurekaClient.getApplication(employeeSearchServiceId);
       InstanceInfo instanceInfo = application.getInstances().get(0);
       String url = "http://"+instanceInfo.getIPAddr()+ ":"+instanceInfo.getPort()+"/"+"employee/find/"+myself;
       System.out.println("URL" + url);
       EmployeeInfo emp = restTemplate.getForObject(url, EmployeeInfo.class);
       System.out.println("RESPONSE " + emp);
       return emp;
   }
   
   private EmployeeInfo defaultMe(Long id){
      EmployeeInfo info = new EmployeeInfo();
      info.setEmployeeId(id);
      info.setName("Hystrix fallback");
      info.setCompanyInfo("Netfilx");
      info.setDesignation("Fallback");
      return info;
   }
   
   
   @RequestMapping("/dashboard/peers")
   public  Collection<EmployeeInfo> findPeers(){
      Application application = eurekaClient.getApplication(employeeSearchServiceId);
       InstanceInfo instanceInfo = application.getInstances().get(0);
       String url = "http://"+instanceInfo.getIPAddr()+ ":"+instanceInfo.getPort()+"/"+"employee/findall";
       System.out.println("URL" + url);
       Collection<EmployeeInfo> list= restTemplate.getForObject(url, Collection.class);
        System.out.println("RESPONSE " + list);
       return list;
   }
}




Carefully note the method named findme. It actually calls the EmployeeSearchService, so I put the @HystrixCommand(fallbackMethod="defaultMe") annotation on top of it. By doing so we instruct Spring to proxy this method, so that if any error occurs or the EmployeeSearchService is not available, the call goes to the fallback method and shows the default value rather than an error.

For that, we add the attribute fallbackMethod="defaultMe", where defaultMe is the fallback method. Please note that its parameters and return type must match those of the findme method; otherwise you will face an error saying that no such fallback method was found. Internally it uses Spring AOP, which intercepts the method call.

If the EmployeeSearchService is not available, it calls the defaultMe method and returns the default employee.
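To get a feel for what "proxy this method" means, here is a rough sketch that uses a plain JDK dynamic proxy. It only illustrates the interception idea and is not how Hystrix is implemented; EmployeeLookup and FallbackProxy are hypothetical names, and a String stands in for EmployeeInfo to keep the example self-contained.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.LongFunction;

/** Hypothetical lookup contract standing in for the call to EmployeeSearchService. */
interface EmployeeLookup {
    String findMe(long id);
}

final class FallbackProxy {
    /** Wraps target so that any failure in findMe is answered by the fallback. */
    static EmployeeLookup withFallback(EmployeeLookup target, LongFunction<String> fallback) {
        return (EmployeeLookup) Proxy.newProxyInstance(
                EmployeeLookup.class.getClassLoader(),
                new Class<?>[] { EmployeeLookup.class },
                (proxy, method, args) -> {
                    try {
                        // Normal path: delegate to the real method.
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        if (!method.getName().equals("findMe")) {
                            throw e.getCause(); // only findMe has a fallback
                        }
                        // The fallback receives the same argument as the original call.
                        return fallback.apply((Long) args[0]);
                    }
                });
    }

    public static void main(String[] args) {
        EmployeeLookup down = id -> {
            throw new IllegalStateException("EmployeeSearchService is down");
        };
        EmployeeLookup guarded = withFallback(down, id -> "Hystrix fallback for employee " + id);
        System.out.println(guarded.findMe(2)); // Hystrix fallback for employee 2
    }
}
```

The caller only ever sees the guarded object, so it gets a default answer instead of an exception. That is also why the fallback needs the same parameters and a compatible return type.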

Let's check the same.

Start the Config server, the Eureka server, and the EmployeeDashBoard service. I intentionally did not start the EmployeeSearchService, so it is unavailable when we call the findme method.

If you hit the following URL




You will see the following response, as the actual EmployeeSearchService is down.

{
  "employeeId":2,
  "name":"Hystrix fallback",
  "practiceArea":null,
  "designation":"Fallback",
  "companyInfo":"Netfilx"
}


Microservices Tutorial: Ribbon as a Load balancer


In the previous Microservice tutorial, we learned how to communicate with another Microservice using Feign as a REST client and the Eureka server for service discovery.

In all cases so far, we considered only one instance of a Microservice calling one instance of a dependent Microservice (the EmployeeDashBoard service calling the EmployeeSearch service).
This is good for demo purposes or when you are practicing how to develop Microservices.
In production, that is certainly not the case. We break a Monolith application into Microservices precisely so that we can scale each service based on its load. A single instance of a service is unimaginable in production, so what we generally do is use a load balancer that balances the load among multiple instances of a service.


Before digging into Ribbon, the client side load balancer for Microservice architecture, let's discuss how our old-fashioned Java EE services, AKA the Monolith, handle load balancing.


Server Side Load Balancing :  In Java EE architecture, we deploy our war/ear files to multiple application servers, then we create a pool of servers and put a load balancer (NetScaler) in front of it, which has a public IP. The client makes a request using that public IP, and NetScaler decides which internal application server to forward the request to, using a round robin or sticky session algorithm. We call this server side load balancing.

Server Side Load Balancing


Problem : The problem with server side load balancing is that if one or more servers stop responding, we have to manually remove those servers from the load balancer by updating its IP table.
Another problem is that we have to implement a failover policy to give the client a seamless experience.
Microservices, however, typically do not use server side load balancing. They use client side load balancing.


Client Side Load Balancing : To understand client side load balancing, let's recap the Microservice architecture. We generally set up a service discovery such as Eureka or Consul, where each service instance registers itself when bootstrapped. The Eureka server maintains a service registry that holds all the instances of each service as a key/value map, where the {service id} of your Microservice serves as the key and its instances serve as the value.

When one Microservice wants to communicate with another, it looks up the service registry using DiscoveryClient, and the Eureka server returns all the instances of the called Microservice to the caller service. Now it is the caller service's headache to decide which instance to call. This is where client side load balancing steps in. The client side load balancer applies an algorithm, such as round robin or zone specific, to pick which instance of the called service to invoke.

The advantage is that the service registry keeps itself up to date: if one instance goes down, it is removed from the registry. So when the client side load balancer talks to the Eureka server, it always gets the updated list, and unlike server side load balancing there is no manual intervention needed to remove an instance.

Another advantage is that, because the load balancer sits on the client side, you can control its load balancing algorithm programmatically.

Ribbon provides this facility so we will use Ribbon for Client side Load balancing.
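Before wiring in Ribbon, the idea itself can be sketched in plain Java: a key/value registry of instances per service id, plus a round robin pick on the caller's side. This is a toy and does not use Ribbon's or Eureka's API; RoundRobinBalancer and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy client side balancer over an in-memory registry; not Ribbon's or Eureka's API. */
final class RoundRobinBalancer {
    // service id -> instances, mirroring the key/value registry idea
    private final Map<String, List<String>> registry = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    /** Replaces the known instances of a service, as a registry refresh would. */
    void update(String serviceId, List<String> instances) {
        registry.put(serviceId, Collections.unmodifiableList(new ArrayList<>(instances)));
    }

    /** Picks the next instance of the service in round robin order. */
    String choose(String serviceId) {
        List<String> instances = registry.getOrDefault(serviceId, Collections.emptyList());
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instance available for " + serviceId);
        }
        int n = counters.computeIfAbsent(serviceId, k -> new AtomicInteger()).getAndIncrement();
        return instances.get(Math.floorMod(n, instances.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer balancer = new RoundRobinBalancer();
        balancer.update("EmployeeSearch", Arrays.asList("192.168.0.103:8080", "localhost:8082"));
        System.out.println(balancer.choose("EmployeeSearch")); // 192.168.0.103:8080
        System.out.println(balancer.choose("EmployeeSearch")); // localhost:8082
        System.out.println(balancer.choose("EmployeeSearch")); // 192.168.0.103:8080

        // One instance goes down; the refreshed list no longer contains it.
        balancer.update("EmployeeSearch", Arrays.asList("localhost:8082"));
        System.out.println(balancer.choose("EmployeeSearch")); // localhost:8082
    }
}
```

Because the choice happens in the caller, swapping round robin for another rule only means changing the choose method. That is the "programmatic control" mentioned above.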



Client Side Load Balancing






Coding Time

We will configure Ribbon in our EmployeeDashBoardService, which will communicate with Eureka to fetch the EmployeeSearchService instances.

Step 1: To enable Ribbon in EmployeeDashBoard we have to add the following dependency in pom.xml

<dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-starter-ribbon</artifactId>
</dependency>

Step 2:  Now we have to enable Ribbon so that it can load balance the EmployeeSearch application. For that, we need to put @RibbonClient(name="EmployeeSearch") on top of the EmployeeServiceProxy interface. By doing this, we instruct Spring Boot to communicate with the Eureka server and get the list of instances for the service id EmployeeSearch. Please note that this is the {service-id} of the EmployeeSearch application.

package com.example.EmployeeDashBoardService.controller;

import java.util.Collection;

import org.springframework.cloud.netflix.feign.FeignClient;
import org.springframework.cloud.netflix.ribbon.RibbonClient;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;

import com.example.EmployeeDashBoardService.domain.model.EmployeeInfo;



@FeignClient(name="EmployeeSearch" )
@RibbonClient(name="EmployeeSearch")
public interface EmployeeServiceProxy {
   
   @RequestMapping("/employee/find/{id}")
   public EmployeeInfo findById(@PathVariable(value="id") Long id);
   
   @RequestMapping("/employee/findall")
   public Collection<EmployeeInfo> findAll();

}


Our Ribbon Client is ready now.

Testing time:

Start the Config server and the Eureka server first.
Then start the EmployeeSearchService; it will come up on port 8080, as we mentioned in bootstrap.properties.
Now run another instance, but this time start it with -Dserver.port=8082, so that the second instance comes up on port 8082.

After that, run the EmployeeDashBoard service.

Now check the Eureka server GUI; it will look like the following





Now if you hit the following URL


You can see the following response

{
   "employeeId": 1,
   "name": "Shamik  Mitra",
   "practiceArea": "Java",
   "designation": "Architect",
   "companyInfo": "Cognizant"
}

Now open the EmployeeDashBoard console; you can see the following lines printed there

DynamicServerListLoadBalancer for client EmployeeSearch initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=EmployeeSearch,current list of Servers=[192.168.0.103:8080, localhost:8082],Load balancer stats=Zone stats: {defaultzone=[Zone:defaultzone;    Instance count:2;    Active connections count: 0;    Circuit breaker tripped count: 0;    Active connections per server: 0.0;]
},Server stats: [[Server:localhost:8082;    Zone:defaultZone;    Total Requests:0;    Successive connection failure:0;    Total blackout seconds:0;    Last connection made:Thu Jan 01 05:30:00 IST 1970;    First connection made: Thu Jan 01 05:30:00 IST 1970;    Active Connections:0;    total failure count in last (1000) msecs:0;    average resp time:0.0;    90 percentile resp time:0.0;    95 percentile resp time:0.0;    min resp time:0.0;    max resp time:0.0;    stddev resp time:0.0]
, [Server:192.168.0.103:8080;    Zone:defaultZone;    Total Requests:0;    Successive connection failure:0;    Total blackout seconds:0;    Last connection made:Thu Jan 01 05:30:00 IST 1970;    First connection made: Thu Jan 01 05:30:00 IST 1970;    Active Connections:0;    total failure count in last (1000) msecs:0;    average resp time:0.0;    90 percentile resp time:0.0;    95 percentile resp time:0.0;    min resp time:0.0;    max resp time:0.0;    stddev resp time:0.0]
]}ServerList:org.springframework.cloud.netflix.ribbon.eureka.DomainExtractingServerList@a1df28c
2017-08-04 22:56:47.180  INFO 3293 --- [erListUpdater-0] c.netflix.config.ChainedDynamicProperty  : Flipping property: EmployeeSearch.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647