Node.js clustered app with autodiscovery. Part 1 – gathering the blocks.

In today’s article I’d like to share my experience of building a Node.js application that can form a cluster with other copies of itself, even if it doesn’t know where those copies are located when it starts.

That sounds complicated, so let’s start by talking about clusters.

What are those clusters?

Many developers have heard of Elasticsearch (see part 1 and part 2 of my articles on how to set up an Elasticsearch cluster on AWS), the open-source search engine that lets you build a powerful search system around your data. One of Elasticsearch’s strengths is that it can run as a cluster. This means that if you have, for example, three servers, each running its own copy of ES, they can still work as one entity: one big brain that uses the power of all three servers. This makes ES quite easy to scale.

In ES this is done through configuration: the module responsible for discovery is called Zen, and its settings live in the elasticsearch.yml configuration file.
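To show the shape of the key setting, here is a small Node.js helper that renders it as one elasticsearch.yml line. The function name zenUnicastHostsLine is hypothetical and not part of Elasticsearch. The sketch assumes the older setting name discovery.zen.ping.unicast.hosts used by Zen; Elasticsearch 7.0 and later replaced it with discovery.seed_hosts.

```javascript
// Hypothetical helper (not part of Elasticsearch): renders the Zen unicast
// discovery setting as a single elasticsearch.yml line.
function zenUnicastHostsLine(hosts) {
  if (!Array.isArray(hosts) || hosts.length === 0) {
    throw new Error("at least one host is required");
  }
  // YAML accepts a JSON-style flow sequence, so each host is emitted as a JSON string.
  const list = hosts.map((h) => JSON.stringify(String(h))).join(", ");
  return `discovery.zen.ping.unicast.hosts: [${list}]`;
}

// Example: three servers, one with an explicit transport port (9300 is the default).
console.log(zenUnicastHostsLine(["10.0.0.1", "10.0.0.2", "10.0.0.3:9300"]));
// prints: discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3:9300"]
```

The IP addresses are placeholders. Each node would get the same list, so every node knows where to look for the others.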

The important option is discovery.zen.ping.unicast.hosts. It tells ES at which IP addresses we expect to meet our “cluster friends”, the other nodes of the cluster (when you run the cluster inside one private network, you don’t need to set it). When the instances start, they find each other and form a cluster, which looks like this:

[Figure: Elasticsearch cluster]

As you can see, there are three instances, all forming one cluster. One of them is marked with a star, which means it is the master.
One practical outcome: as clients, we can send documents for indexing to any node, and the cluster decides which node will process and store them. This makes the interaction very convenient and hides the cluster’s internals from us.
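Elasticsearch’s _cat/nodes API also marks the elected master with a star in its master column. The following sketch asks any single node which node is the master. It assumes a node reachable over HTTP and selects only the name and master columns with h=name,master. The function names parseCatNodes and findMaster are hypothetical.

```javascript
const http = require("http");

// Parses the plain-text output of GET /_cat/nodes?h=name,master.
// Each line holds a node name followed by a flag; "*" marks the elected master.
function parseCatNodes(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const parts = line.split(/\s+/);
      const flag = parts.pop();
      return { name: parts.join(" "), isMaster: flag === "*" };
    });
}

// Asks one node (any node will do) which node is currently the master.
function findMaster(host, port, callback) {
  http
    .get({ host, port, path: "/_cat/nodes?h=name,master" }, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (body += chunk));
      res.on("end", () => {
        const master = parseCatNodes(body).find((node) => node.isMaster);
        callback(null, master ? master.name : null);
      });
    })
    .on("error", callback);
}
```

For example, findMaster("127.0.0.1", 9200, (err, name) => console.log(err || name)) against a local node (9200 is the default HTTP port) should print the name of the starred node. You can ask any node, because every node shares the same view of the cluster.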

So I started wondering: how could I do the same thing in Node.js? And, even more importantly, how could I run such a cluster without giving it the list of all possible cluster nodes at startup?

Node.js clustering

First of all, to repeat: in this article, by cluster I mean several Node apps running on different hosts that together form one big brain (not Node’s built-in cluster module, which forks worker processes on a single machine).

In one of the Node Weekly newsletters I learned about the node-discover library, which lets developers form clusters out of Node applications.

Basic usage is simple: you require the library and call its constructor. Under the hood it sends UDP broadcast (or multicast) messages, something like “Hey, other IP addresses in the private network, I’m here and I want to be part of your cluster family”. The other instances listen for these messages and form a cluster.
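The sketch below shows how such an instance is typically wired up. It assumes the event names from the node-discover README (promotion, demotion, added, removed, master) and assumes the node objects passed to the handlers carry an address field. The helper attachClusterLogging is hypothetical. A plain EventEmitter stands in for the library, so the sketch runs without the library installed.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: logs cluster membership changes. `d` is meant to be the
// object returned by node-discover's Discover(); any EventEmitter works for the demo.
function attachClusterLogging(d, log = console.log) {
  d.on("promotion", () => log("This node was promoted to master"));
  d.on("demotion", () => log("This node is no longer master"));
  // Assumption: the node object passed to these handlers has an `address` field.
  d.on("added", (node) => log(`Node joined: ${node && node.address}`));
  d.on("removed", (node) => log(`Node left: ${node && node.address}`));
  d.on("master", (node) => log(`New master in control: ${node && node.address}`));
  return d;
}

// Stand-in for a real Discover() instance, so the sketch runs without the library.
const demo = attachClusterLogging(new EventEmitter());
demo.emit("added", { address: "10.0.0.2" }); // prints "Node joined: 10.0.0.2"
```

With the library installed, the real call would be attachClusterLogging(require("node-discover")()). Called without options, the constructor uses the library’s default local-network discovery.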

But there was a blocker. When I found the library, it assumed you would run the cluster inside one private network, and I wanted to run it across different networks. I searched for an option and didn’t find one, so I created an issue. The author of the library, Dan, confirmed that this wasn’t possible yet, but he was so kind and productive that it became possible within two hours! Dan used the same approach as Elasticsearch: unicast messaging.

I tested it, and it worked. Great! One problem remained, though: you still need to pass all the possible IP addresses when discovery starts. Around that time I heard about etcd.
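The remaining problem is easy to see in code. The following sketch builds options for node-discover’s unicast mode. It assumes the library’s unicast option, which accepts an array or a comma-separated string of peer addresses. The helper unicastOptions and the PEERS environment variable are hypothetical.

```javascript
// Hypothetical helper: normalizes a peer list (array or comma-separated
// string) into options for node-discover's unicast mode.
function unicastOptions(peers) {
  const list = Array.isArray(peers) ? peers : String(peers || "").split(",");
  const unique = [...new Set(list.map((p) => String(p).trim()).filter(Boolean))];
  if (unique.length === 0) {
    // In unicast mode a node only talks to the listed addresses,
    // so an empty list means it can never find a peer.
    throw new Error("unicast discovery needs at least one known peer address");
  }
  return { unicast: unique };
}

// Usage with the real library (not executed here; PEERS is a hypothetical variable):
//   const Discover = require("node-discover");
//   const d = Discover(unicastOptions(process.env.PEERS));
```

Every address has to be known before the process starts, which is exactly the limitation described above.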

Etcd, what is that?

Etcd (the name stands for a distributed /etc) is a distributed key-value store for shared configuration. Three of its strengths stand out for me (the others are merely strong):

  1. It is designed to work in clusters (meaning you should launch more than one instance of etcd, preferably on several hosts)
  2. It’s written in Go, and I like Go
  3. It uses the Raft consensus algorithm to make sure the data in the cluster can be trusted. I don’t know exactly how it works, but it sounds solid.

Etcd exposes a REST API: you talk to it with plain HTTP requests and get JSON back.

You can store a key by sending a PUT request to /v2/keys/<key name>, with the value in a form field named value.
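Here is a minimal Node.js version of that request, using only the built-in http module. It assumes a local etcd that speaks the v2 keys API on its default client port, 2379. The names ETCD, buildSetRequest and setKey are hypothetical.

```javascript
const http = require("http");
const { URLSearchParams } = require("url");

const ETCD = { host: "127.0.0.1", port: 2379 }; // assumption: local etcd, default client port

// Builds a request for etcd's v2 keys API: PUT /v2/keys/<key> with a
// form-encoded "value" field, like `curl -XPUT .../v2/keys/message -d value="Hello world"`.
function buildSetRequest(key, value) {
  const body = new URLSearchParams({ value: String(value) }).toString();
  const path = "/v2/keys/" + key.replace(/^\/+/, "").split("/").map(encodeURIComponent).join("/");
  return {
    method: "PUT",
    path,
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      "Content-Length": Buffer.byteLength(body),
    },
    body,
  };
}

// Sends the request and hands etcd's parsed JSON reply to the callback.
function setKey(key, value, callback) {
  const { body, ...options } = buildSetRequest(key, value);
  const req = http.request({ ...ETCD, ...options }, (res) => {
    let data = "";
    res.setEncoding("utf8");
    res.on("data", (chunk) => (data += chunk));
    res.on("end", () => {
      let reply;
      try {
        reply = JSON.parse(data);
      } catch (err) {
        return callback(err);
      }
      callback(null, reply);
    });
  });
  req.on("error", callback);
  req.end(body);
}
```

For example, setKey("message", "Hello world", console.log) stores the key and logs etcd’s reply.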

You can read the value back with a GET request to the same URL. Etcd responds with a JSON document that describes the key, including its value.
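A sketch of reading a key back follows, under the same assumptions (local etcd, v2 keys API, port 2379). It separates parsing from the HTTP call so the reply handling is easy to see. The names parseGetResponse and getKey are hypothetical.

```javascript
const http = require("http");

const ETCD = { host: "127.0.0.1", port: 2379 }; // assumption: local etcd, default client port

// Extracts the value from an etcd v2 GET reply such as
// {"action":"get","node":{"key":"/message","value":"Hello world","modifiedIndex":4,"createdIndex":4}}
function parseGetResponse(text) {
  const doc = JSON.parse(text);
  if (doc.errorCode !== undefined) {
    // Errors come back as JSON too, e.g. errorCode 100 with message "Key not found".
    throw new Error(`etcd error ${doc.errorCode}: ${doc.message}`);
  }
  return doc.node.value;
}

// Reads one key and passes its value (or an error) to the callback.
function getKey(key, callback) {
  const path = "/v2/keys/" + key.replace(/^\/+/, "").split("/").map(encodeURIComponent).join("/");
  http
    .get({ ...ETCD, path }, (res) => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (data += chunk));
      res.on("end", () => {
        let value;
        try {
          value = parseGetResponse(data);
        } catch (err) {
          return callback(err);
        }
        callback(null, value);
      });
    })
    .on("error", callback);
}
```

For example, getKey("message", (err, value) => console.log(err || value)) prints the stored value, or the etcd error if the key does not exist.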

And what I liked the most is the watch mechanism: you can subscribe to a key and receive an update each time its value changes. From the web perspective this is done with simple long polling.

Whenever the key is updated, the pending request completes and returns the new value, and you simply start the next poll.
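The long-polling loop can be sketched as follows. The sketch assumes the v2 keys API’s wait=true and waitIndex query parameters, plus a local etcd on port 2379. The names watchPath, nextWaitIndex and watchKey are hypothetical. Passing waitIndex as the last seen modifiedIndex plus one is one way to avoid missing changes that happen between two polls.

```javascript
const http = require("http");

const ETCD = { host: "127.0.0.1", port: 2379 }; // assumption: local etcd, default client port

// Long-polling watch URL. waitIndex asks etcd for the first change at or after
// that index, so changes made between two polls are not lost.
function watchPath(key, waitIndex) {
  const base = "/v2/keys/" + key.replace(/^\/+/, "").split("/").map(encodeURIComponent).join("/");
  return base + "?wait=true" + (waitIndex === undefined ? "" : "&waitIndex=" + waitIndex);
}

// Decides where the next poll should start: right after the change we just saw.
// On an etcd error (e.g. the index was already cleared from its history) start fresh.
function nextWaitIndex(doc, current) {
  if (doc.errorCode !== undefined) return undefined;
  return doc.node ? doc.node.modifiedIndex + 1 : current;
}

// Watches a key until the process exits, calling onChange(action, node) for every update.
function watchKey(key, onChange, waitIndex) {
  const retry = () => setTimeout(() => watchKey(key, onChange, waitIndex), 1000);
  http
    .get({ ...ETCD, path: watchPath(key, waitIndex) }, (res) => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (data += chunk));
      res.on("end", () => {
        let doc;
        try {
          doc = JSON.parse(data);
        } catch (err) {
          return retry(); // empty or cut-off reply: wait a moment, then poll again
        }
        if (doc.node) onChange(doc.action, doc.node);
        watchKey(key, onChange, nextWaitIndex(doc, waitIndex)); // start the next long poll
      });
    })
    .on("error", retry);
}
```

For example, watchKey("message", (action, node) => console.log(action, node.value)) logs the action and new value each time the key is written.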

Another thing worth mentioning: etcd supports not only keys but directories as well, so you can create a directory and subscribe to changes of anything inside it.
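Working with a directory mostly means adding query parameters. The sketch assumes the v2 keys API’s dir=true (create), recursive=true (list everything below) and wait=true&recursive=true (watch any child). It also assumes the listing reply nests children under node.nodes. The directory name mydir and the function names are hypothetical.

```javascript
// Request paths for an etcd v2 directory; "mydir" below is only an example name.
function dirPaths(dir) {
  const base = "/v2/keys/" + dir.replace(/^\/+/, "");
  return {
    create: base + "?dir=true", // PUT: create an empty directory
    list: base + "?recursive=true", // GET: read everything below it
    watch: base + "?wait=true&recursive=true", // GET: long-poll changes to any child
  };
}

// Flattens the JSON reply of the "list" request into { relativeKey: value }.
function childValues(doc) {
  const prefix = doc.node.key + "/";
  const out = {};
  const walk = (node) => {
    for (const child of node.nodes || []) {
      if (child.dir) walk(child); // nested directory: descend into it
      else out[child.key.slice(prefix.length)] = child.value;
    }
  };
  walk(doc.node);
  return out;
}

console.log(dirPaths("mydir").watch); // prints: /v2/keys/mydir?wait=true&recursive=true
```

The watch path works like the single-key watch, except that a write to any key inside the directory ends the pending request.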

OK, so all the building blocks are gathered. In theory we have everything we need, and now we can build the clustered Node.js app.

Please follow the link to Part 2, where I show a demo setup of the clustered Node.js app with autodiscovery.