Search This Blog

Sunday, September 14, 2014

Dynamic Service Discovery with Apache Curator

Service Discovery is a core part of most Service Oriented Architectures in particular and Distributed Network Component Architectures in general.

Traditional usages to find the location of a service have been to use Local file system, Shared File System, Database or some 3rd party Service Registry that statically define service locations. In many cases, one knows exactly the number of services that will be used and their deployment locations thus leading to a statically defined registry as an ideal candidate for discovery. In larger scale systems statically defined locations can become harder to manage as -
  • Number of services in the ecosystem grow (micro-services)
  • Number of deployment 'environments' grow - Test Environment 1, Test Enviroment 2...etc
  • Elasticity scalability becomes desirable - More important with Cloud based environments
  • Multiple services versions need to be supported simultaneously (A/B test)
Enter the concept of Dynamic Service Registration and Discovery. Services on becoming available register themselves as a member ready to participate with some 'registry'. Service Consumers are notified of the availability of the new service and avail its usage. If the service terminates for whatever reason, the Service Consumer is notified of the same and can stop availing its usage. This is not a novel concept and has likely been addressed in different ways.

Any Dynamic Service Registration System presents the following challenges -
  • Availability - The Service Registry can become the SPOF (Single Point Of Failure) for an entire ecosystem
  • Reliability - Keeping the registry and thus the service consumers in sync with which service instance is currently available. A service might go down and if the registry and service consumer are not notified immediately, service calls might fail.
  • Generality - Registry that supports multiple operating systems and technologies.
Again, the above is solvable by different ways. The benefit we have is that there are a number of tools that intrinsically support the requirements of a dynamic registry such as ZooKeeper, Netflix Eureka, Linked In's Dynamic Discovery, Doozer etc.  Jason Wilder has an excellent BLOG on Open Source Service Discovery and a comparison of popular open source service registries.

I have been looking into ZooKeeper and thus will share a few of my findings on using it but before I do the same, I would like to take a moment to discuss Load Balancing.

In standard H/A architectures a Load Balancer serves to balance traffic across the different service instances as shown below -

The same works well but poses the following challenges -
  • A new Load Balancer for every service and every environment
  • An additional hop between a Service Consumer and Service
  • Adding/Removing services per need and time to do the same
  • Intelligent Routing and Supporting Service Degradation
A Dynamic Service Registry could help with the mentioned concerns  where different instances of Service C register themselves with ZooKeeper and Service A and Service B on obtaining locations of Service C execute calls directly to them  -

ZooKeeper's Ephermal nodes serve well to maintain registered service instances as an Ephermal node only exists as long the client is alive, so if a client dies, the Ephermal node is deleted.  It is possible to use ZooKeeper directly and develop code that will register services, discover registered service etc. However, there is a library called Apache Curator that was initially created by Netflix to make writing ZooKeeper based application easier and more reliable. It was then incorporated as an Apache Project. Curator provides 'recipes' to common use cases one would use ZooKeeper for, i.e., Leader Election, Queue, Semaphore, Distributed Locks etc. In addition to these 'recipes', it also provides a mechanism for Service Registration and Discovery.

The remainder of this BLOG will look at providing a simple example that demonstrates the use of Apache Curator in the context of a simple JAX-RS web service that gets registered and is subsequently dynamically discovered by clients. For the sake of the example, I will continue to use the Notes JAX-RS Web Service that I have used on previous posts.

Service Registration -

In the example domain, all services will be registered under the ZooKeeper path "/services".  When the Notes Service starts, it registers itself with ZooKeeper as an ephermal node in the path "/services/notes/XXXXXXXX", where XXX indicates the ephermal instance.

A simple class called ServiceRegistrar is responsible for registering the Notes Service:
public class ServiceRegistrarImpl implements ServiceRegistrar {
  private static final String BASE_PATH = "/services";
  private static final String SERVICE_NAME = "notes";  

  private final CuratorFramework client; // A Curator Client
  private ServiceDiscovery<InstanceDetails> serviceDiscovery;
  // This instance of the service
  private ServiceInstance<InstanceDetails> thisInstance;
  private final JsonInstanceSerializer<InstanceDetails> serializer; 
  public ServiceRegistrarImpl(CuratorFramework client, int servicePort) throws UnknownHostException, Exception {
    this.client = client;
    serializer = new JsonInstanceSerializer<>(InstanceDetails.class); // Payload Serializer
    UriSpec uriSpec = new UriSpec("{scheme}://{address}:{port}"); // Scheme, address and port
    thisInstance = ServiceInstance.<InstanceDetails>builder().name(SERVICE_NAME)
      .address(InetAddress.getLocalHost().getHostAddress()) // Service information 
      .payload(new InstanceDetails()).port(servicePort) // Port and payload
      .build(); // this instance definition

  public void registerService() throws Exception {
    serviceDiscovery = ServiceDiscoveryBuilder.builder(InstanceDetails.class)
    serviceDiscovery.start(); // Registers this service
  public void close() throws IOException {
    try {
    catch (Exception e) {"Error Closing Discovery", e);
A Servlet Listener is defined whose responsibility is to start the registrar -
public class ServiceRegistrarListener implements ServletContextListener {
  private static final Logger LOG = Logger.getLogger(ServiceRegistrarListener.class);
  public void contextDestroyed(ServletContextEvent sc) {  

  public void contextInitialized(ServletContextEvent sc) {
    try {
    catch (Exception e) {
      LOG.error("Error Registering Service", e);
      throw new RuntimeException("Exception Registering Service", e);

Service Discovery -
A Simple Discoverer class is shown below which uses the Curator library in obtaining instances of registered Notes Services. Curator provides a ServiceDiscovery class that is made aware of a Service Registry. A ServiceProvider is then used to obtain registered service instances for a particular service.
public class ServiceDiscoverer {
  private static final Logger LOG = Logger.getLogger(ServiceDiscovery.class);
  private static final String BASE_PATH = "/services";

  private final CuratorFramework curatorClient;

  private final ServiceDiscovery<InstanceDetails> serviceDiscovery;

  private final ServiceProvider<InstanceDetails> serviceProvider;

  private final JsonInstanceSerializer<InstanceDetails> serializer;

  private final List<Closeable> closeAbles = Lists.newArrayList();

  public ServiceDiscoverer(String zookeeperAddress, String serviceName) throws Exception {
    curatorClient = CuratorFrameworkFactory.newClient(zookeeperAddress,
      new ExponentialBackoffRetry(1000, 3)); // Ideally this would be injected

    serializer = new JsonInstanceSerializer<>(InstanceDetails.class); // Payload Serializer

    serviceDiscovery = ServiceDiscoveryBuilder.builder(InstanceDetails.class).client(curatorClient)
        .basePath(BASE_PATH).serializer(serializer).build(); // Service Discovery

    serviceProvider = serviceDiscovery.serviceProviderBuilder().serviceName(serviceName).build(); // Service Provider for a particular service

  public void start() {
    try {
      closeAbles.add(0, serviceDiscovery);
      closeAbles.add(0, serviceProvider);
    catch (Exception e) {
      throw new RuntimeException("Error starting Service Discoverer", e);

  public void close() {
    for (Closeable closeable : closeAbles) {
      // Close all
  public ServiceInstance<InstanceDetails> getServiceUrl() {
    try {
      return instance = serviceProvider.getInstance();
    catch (Exception e) {
      throw new RuntimeException("Error obtaining service url", e);
The ServiceProvider is used in the Notes Service Client as follows:
public class NotesClientImpl implements NotesClient {
  private static final Logger LOG = Logger.getLogger(NotesClientImpl.class);

  public static final String SERVICE_NAME = "notes";

  private final Client webServiceClient; // JAX-RS Client

  private final ServiceDiscoverer serviceDiscoverer; // Service Disoverer

   * @param getNotesServerUrl() Server URI
   * @throws Exception
  public NotesClientImpl(String zookeeperAddress) throws Exception {
    serviceDiscoverer = new ServiceDiscoverer(zookeeperAddress, SERVICE_NAME);

    ClientConfig config = new ClientConfig();
    webServiceClient = ClientBuilder.newClient(config);

  public NoteResult create(Note note) {
    ServiceInstance<InstanceDetails> instance = serviceDiscoverer.getServiceInstance();
    try {
      return webServiceClient
          .post(Entity.entity(note, MediaType.APPLICATION_XML_TYPE), NoteResult.class);
    catch (ProcessingException e) {
      // If a ProcessingException occurs, the current ServiceInstance is marked as down
      throw e;
In the above code, a ServiceInstance is obtained from the ServiceDiscoverer. It is important to 'not' locally hold onto the instance but use it during the processing of a call and then let go of it. If a occurs say due to a Service Instance dying abruptly during processing for the request, then, notifying Curator to not use that instance for a certain period of time is the recommended direction. There is a time out delay between when the service instance dies and ZooKeeper times out the ephermal node associated with the service. So unless the instance is marked as being in an error state on the service client, the ServiceDiscoverer would continue to provide the downed service instance for usage.

Integration Test -

A simple Integration Test has been shown where a number of Notes Server instances are started up and they register themselves with a test ZooKeeper instance. The Notes Client is able to round robin to each of those instances. Each of the servers are designed to randomly die after a period of time leading to them being un-registered .When a Notes Server Instance goes down, it will no longer be invoked as part of the test. There is a Notes Server started that runs for a long time and acts as a fall back for demonstration completeness.
public class SimpleIntegrationTest {
  private TestingServer testServer; // ZooKeeper Server

  private NotesServer longRunningNotesServer; // A Notes Server that does not die for the demo
  private static final int MAX_SERVERS = 10; // Max Servers to use
  public void setUp() throws Exception {
    testServer = new TestingServer(9001);   // Start the Curator ZooKeeper Test Server

  public void tearDown() throws IOException {...}

  public void integration() throws Exception {
    CyclicBarrier barrier = new CyclicBarrier(MAX_SERVERS + 1);
    for (Integer port : ports()) { // Start Instances of Notes Server on different ports
      new NotesServer(port, barrier).start(); // Start an Instance of the Notes Server at the provided port
    longRunningNotesServer = new NotesServer(9210, 500000L, null); // Start a Long Running Notes Server
    barrier.await(); // Wait for all Servers start
    NotesClient notesClient = new NotesClientImpl(testServer.getConnectString());
    List<NoteResult> noteResults = Lists.newArrayList();
    int i = 0;
    int maxNotes = 10000;
    while (i < maxNotes) {
      try {
        Note note = new Note("" + i, "" + i);
        NoteResult result = notesClient.create(note);
      catch (Exception e) {
        e.printStackTrace(); // If an error occurs on one instance, try again

    // Ensure that all Notes Servers have been utilized
    assertEquals(MAX_SERVERS +1, notesClient.getServiceInstancesUsed().size());
  private static Set<Integer> ports() {
    // Provide a unique set of ports for running multiple Notes Server

Thoughts -
  • ZooKeeper is CP so in the event of Partition, writes can suffer.
  • ZooKeeper is SPOF of the ecosystem. Should have a contingency strategy.
    • Linked In's Dynamic Discovery Architecture addresses something like this but has a lot of moving parts
    • Simplicity of a Load Balancer makes one think YAGNI
      • Tools like Puppet Labs might make a lesser case for burden of standing up environments as well
  • Service Discovery allow clients to be 'smart'. Support for Service Degradation, alerting, and monitoring kind of go hand in hand.
  • Service Registration and De-Registration can be smartly built as well. For example, a Service could itself determine it's not healthy for whatever reason and de-registerer itself from the registry.
  • Supporting multiple environments where an ecosystem is self discoverable is simpler than having to provide overrides across different environments.
  • Managing  downed nodes with Curator needs to be handled carefully. Idempotence is your friend here.
  • Curator can definitely help with intelligent load balancing and different Load Balancing Strategies
    • Round Robin
    • Random
    • Custom one that involves Service Degradation
  • Curator provides a REST service that helps non-java clients wishing to participate in service registration and discovery.
  • If Netflix, Linked In and Amazon do something like this, it warrants looking into :-)


Anonymous said...

thanks for your document but I can't download your source code

The requested resource () is not available.

Sanjay Acharya said...

Sorry but the repo I hosted the code is no longer present and this is one BLOG whose content I don't seem to have to share anymore.