Elasticsearch cluster on AWS. Part 2 – configuring Elasticsearch.

In this part we will cover the Elasticsearch configuration and, finally, launch and test our cluster. I assume you've already read and completed the steps from Elasticsearch cluster on AWS. Part 1 – preparing the environment.

Let's assume that the Homer instance will be the leading one (just because it will hold the additional Elasticsearch monitoring plugin). Now, let's install and configure Elasticsearch there. The installation is inspired by this gist. You can find the latest Elasticsearch version here.

First of all, we need Java to be installed, and then Elasticsearch itself.
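The exact commands from the gist aren't reproduced in this post; here is a minimal sketch for Ubuntu, assuming OpenJDK 7 and an Elasticsearch 1.x .deb package (the version number and download URL are placeholders – grab the current ones from the download page):

    # Java first – OpenJDK 7 is enough for Elasticsearch 1.x
    sudo apt-get update
    sudo apt-get install -y openjdk-7-jre-headless

    # then Elasticsearch itself, installed from the official .deb package
    wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.deb
    sudo dpkg -i elasticsearch-1.4.2.deb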

To make Elasticsearch start on server boot, we should execute the following command:
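On Ubuntu the .deb package installs an init script, so it's a single update-rc.d call:

    sudo update-rc.d elasticsearch defaults 95 10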

Now, let’s install the elasticsearch-cloud-aws plugin.
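The plugin is versioned per Elasticsearch release, so treat the version below as a placeholder and check the plugin's compatibility table first; a sketch for an Elasticsearch 1.4.x node:

    cd /usr/share/elasticsearch
    # the plugin version has to match the Elasticsearch release (placeholder below)
    sudo bin/plugin --install elasticsearch/elasticsearch-cloud-aws/2.4.1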

Installing the elasticsearch-cloud-aws plugin

Another thing we should install is the kopf plugin, the one we will use to get an overview of our cluster.
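kopf is installed from GitHub via the same plugin script; the branch below is a placeholder, pick the one that matches your Elasticsearch version (see the kopf README):

    cd /usr/share/elasticsearch
    # the 1.x branch of kopf works with Elasticsearch 1.x
    sudo bin/plugin --install lmenezes/elasticsearch-kopf/1.x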

Installing the kopf plugin

Now, we should update the elasticsearch configuration file.

Back up the default file:
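A plain copy is enough (the backup name is arbitrary):

    sudo cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.orig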


Now let's create the new one; for homer it should look like this:
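The original file isn't preserved in this post, so below is a sketch that matches the parameters described after it; the cluster name, AWS credentials, region and IP are placeholders you have to replace with your own values:

    cluster.name: es-cluster                      # placeholder – any name shared by all nodes
    node.name: "homer"

    # path.data: "/srv/elasticsearch/data"        # custom data path, commented out – we use the default

    # additional configuration
    bootstrap.mlockall: true                      # keep the heap locked in memory (see /etc/default/elasticsearch below)
    indices.fielddata.cache.size: 40%             # assumption – limit the search-time data kept in the heap

    # AWS discovery
    cloud.aws.access_key: YOUR_AWS_ACCESS_KEY     # placeholder
    cloud.aws.secret_key: YOUR_AWS_SECRET_KEY     # placeholder
    cloud.aws.region: eu-west-1                   # placeholder – the region your instances run in
    discovery.type: ec2                           # let the cloud-aws plugin find the other nodes
    discovery.zen.ping.multicast.enabled: false   # multicast doesn't work on EC2
    network.publish_host: 54.93.0.1               # placeholder – the Elastic IP attached to homer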

This content should be saved in /etc/elasticsearch/elasticsearch.yml.


Let's go over the configuration parameters one by one:

  • path.data: “/srv/elasticsearch/data” – here you can set a custom path for storing data; it's useful when you add an additional large partition to your server and want to keep the data there. In our case it's commented out, because we use the default storage location.
  • # additional configuration section – in this section we tell Elasticsearch how it should work with our memory. This configuration tells it not to keep too much of the data it uses during searches in the heap. Besides that, there is an option from the standard Elasticsearch configuration recommendations (bootstrap.mlockall: true, which prevents the heap from being swapped out).
  • # AWS discovery – in the AWS discovery section we tell the instance how it should look for the other cluster nodes. Basically, it's the configuration of our elasticsearch-cloud-aws plugin.
  • node.name – the name of our node.
  • discovery.zen.ping.multicast.enabled – we tell the current node not to try to discover the other nodes via multicast (multicast is not available on EC2, so discovery is handled by the plugin instead).
  • network.publish_host – here we should put the public IP of our server (the one we attached with the help of an Elastic IP). This is the IP the other nodes will use to interact with this one.

Important thing about the elasticsearch-cloud-aws plugin – if you decided to configure your AWS security group strictly (e.g. not to allow 3rd-party IPs to access the servers inside), make sure you allow access not only for the public IPs of the cluster nodes, but also for their private ones. The private IP is the IP of the instance inside the AWS network; you can find it here:

Instance private IP

The thing is that when your node finds the others, it will try to interact with them using their internal IPs. That's a bug/issue in the current implementation of the plugin.

We're almost there; the only thing left is the additional configuration file, where we tell Elasticsearch how to use our system resources.

The /etc/default/elasticsearch one.

The updated content should look like this:
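A sketch of the relevant lines, assuming an instance with 8 GB of RAM (adjust the heap size to roughly half of your own RAM):

    # give Elasticsearch about 50% of the server's RAM
    ES_HEAP_SIZE=4g

    # allow the whole heap to be locked in memory, required for bootstrap.mlockall to work
    MAX_LOCKED_MEMORY=unlimited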

Two parameters were updated:

  • ES_HEAP_SIZE – as the Elasticsearch developers suggest, set this to about 50% of the RAM your server has.
  • MAX_LOCKED_MEMORY – this parameter makes our earlier configuration value (bootstrap.mlockall: true) actually work.

Those parameters are important if you use a powerful instance, because by default Elasticsearch uses a 1 GB heap, which sooner or later leads to out-of-memory errors under real load.

Now we can start our Elasticsearch instance.
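With the .deb package it's just the init script:

    sudo service elasticsearch start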

Now repeat the same installation steps on the other two nodes; their configs in our case look almost the same:
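The per-node files aren't reproduced here; the only values that differ from homer's elasticsearch.yml are the node name and the published address (the name and IP below are placeholders):

    node.name: "marge"               # placeholder – the second node's name
    network.publish_host: 54.93.0.2  # placeholder – the Elastic IP attached to that node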


After the configuration you should start Elasticsearch the same way as on homer. Give the nodes a minute to boot, and then we can check our cluster via the web interface.

Just open http://homer:9200/_plugin/kopf in the browser:

Cluster overview via kopf

Here you can see that the cluster state is green, which is good, and that all three nodes have joined the cluster.

We made it. Now you're free to send indexing requests to any node; they will be spread over the others. You can also query any node and get results. Internally, the cluster will distribute the indexed data, spread the indexing tasks between the nodes, and so on.

Just to be sure that it's working, let's quickly configure Logstash to push some data to the ES cluster.

The Logstash operations were done on my local machine, not on the Elasticsearch nodes. I won't cover the Logstash installation – you can download and unpack it yourself – here is the config I used to generate some events:
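The original config isn't preserved in this post; here is a minimal Logstash 1.4-style sketch that generates a bunch of test events and sends them to the cluster over HTTP (the host and event count are placeholders):

    input {
      # synthetic test events, just to have something to index
      generator {
        count   => 10000
        message => "test event from logstash"
      }
    }

    output {
      elasticsearch {
        host     => "homer"   # any node of the cluster will do
        protocol => "http"
      }
    }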

Run Logstash:
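Assuming the config above was saved as es-test.conf inside the unpacked Logstash directory:

    bin/logstash -f es-test.conf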

Now let's go back to the kopf plugin; this is what we see:

Filled cluster overview via kopf

The solid green squares are the primary shards. As you can see, they are distributed across the cluster, which is good, because indexing takes place on the node where the primary shard is located. This way we can be sure that the indexing load is distributed over the cluster.

That's all – now your Elasticsearch cluster is fully operational!

As homework, install the Kibana plugin somewhere, so you can browse the data from your Elasticsearch cluster.

Kibana