Elasticsearch cluster on AWS. Part 2 – configuring Elasticsearch.

In this part we will cover the Elasticsearch configuration and, finally, launch and test our cluster. I assume that you’ve already read and completed the steps from Elasticsearch cluster on AWS. Part 1 – preparing the environment.

Let’s assume that the Homer instance will be the leading one (just because it will hold the additional Elasticsearch monitoring plugin). Now, let’s install and configure Elasticsearch there. The installation is inspired by this gist. You can find the latest Elasticsearch version here.

First of all, we need Java to be installed.
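
On Ubuntu, something like the following should do the job; the package name and the Elasticsearch version below are assumptions – take the latest version from the link above:

# install Java (an OpenJDK JRE is enough for Elasticsearch 1.x)
sudo apt-get update
sudo apt-get install -y openjdk-7-jre-headless

# download and install the Elasticsearch .deb package (substitute the latest version)
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.deb
sudo dpkg -i elasticsearch-1.4.2.deb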

To make Elasticsearch start automatically on server boot, we should execute the next command:
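
On Ubuntu/Debian this is typically (a sketch, assuming the .deb installation from above):

sudo update-rc.d elasticsearch defaults 95 10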

Now, let’s install the elasticsearch-cloud-aws plugin.
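
A minimal sketch of the install command, assuming Elasticsearch 1.4.x – the plugin version (2.4.x here) must match your Elasticsearch version, so check the plugin’s compatibility table first:

sudo /usr/share/elasticsearch/bin/plugin --install elasticsearch/elasticsearch-cloud-aws/2.4.1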

Installing the elasticsearch-cloud-aws plugin

Another thing we should install is the kopf plugin, which we will use to get an overview of our cluster.
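
Again a sketch – the kopf branch/version below is an assumption, pick the one matching your Elasticsearch version from the kopf README:

sudo /usr/share/elasticsearch/bin/plugin --install lmenezes/elasticsearch-kopf/1.0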

Installing the kopf plugin

Now, we should update the elasticsearch configuration file.

Backup the default file:
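
For example:

sudo cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak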

 

Now let’s create the new one; for homer it should look like this:
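
The original file contents are not reproduced here, so below is a minimal sketch assembled from the parameters described further down and from Part 1. The cluster name, region, security group name, credentials and IP are placeholders – substitute your own values (the access/secret key pair is the one of the elasticsearch AWS user created in Part 1):

# paths
# path.data: "/srv/elasticsearch/data"

# additional configuration
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"

# AWS discovery
cloud.aws.access_key: "YOUR_ACCESS_KEY"
cloud.aws.secret_key: "YOUR_SECRET_KEY"
cloud.aws.region: "eu-west-1"
plugin.mandatory: "cloud-aws"
cluster.name: "simpsons"
node.name: "homer"
discovery.type: "ec2"
discovery.ec2.groups: "elasticsearch-cluster"
discovery.ec2.host_type: "public_ip"
discovery.ec2.ping_timeout: "30s"
discovery.zen.ping.multicast.enabled: false
network.publish_host: "xxx.xxx.xxx.xxx"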

This content should be saved in /etc/elasticsearch/elasticsearch.yml.

 

Let’s go through the configuration parameters one by one:

  • path.data: “/srv/elasticsearch/data” – here you can set a custom path for storing data; it’s useful when you add an additional large partition to your server and want to store data there. In our case it’s commented out, because we use the default storage.
  • # additional configuration section – in this section we tell Elasticsearch how it should work with our memory. This configuration tells it not to keep too much of the data it uses during search in the heap memory. Besides that, there is a configuration option from the standard Elasticsearch configuration recommendations.
  • # AWS discovery – in the AWS discovery section we tell the instance how it should discover the other cluster nodes. Basically this is the configuration of our elasticsearch-cloud-aws plugin.
  • node.name – the name of our node
  • discovery.zen.ping.multicast.enabled – we tell the current node that it should not try to find the other nodes via multicast
  • network.publish_host – here we should state the public IP of our server (the one we attached with the help of an Elastic IP). This is the IP which the other nodes will use to interact with this one.

An important thing for the elasticsearch-cloud-aws plugin – if you decided to configure your AWS security group strictly (e.g. not to allow 3rd-party IPs to access the servers inside), be sure that you allow access to the servers not only from the public IPs of the cluster nodes, but also from the private ones. The private IP is the IP of the instance inside the AWS network; you can find it here:

Instance private IP

The thing is that when your node finds the others, it also tries to interact with them using their internal IPs. That’s a bug/issue in the current implementation of the plugin.
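
If you manage the security group from the AWS CLI rather than the console, such a rule might look like this (the group id and CIDR are placeholders; 9300 is the default Elasticsearch transport port used for node-to-node communication):

aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 9300 --cidr 172.31.0.0/16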

We’re almost there; the only thing left is the additional configuration file, where we tell Elasticsearch how to use our system resources.

The /etc/default/elasticsearch one.

The updated content should look like this:
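
A sketch, assuming an instance with 8 GB of RAM:

# /etc/default/elasticsearch – only the changed lines
ES_HEAP_SIZE=4g
MAX_LOCKED_MEMORY=unlimited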

Two parameters were updated:

  • ES_HEAP_SIZE – as the Elasticsearch developers suggest, you should set this to 50% of the RAM your server has.
  • MAX_LOCKED_MEMORY – this parameter makes our previous configuration value (bootstrap.mlockall: true) work.

These parameters are important if you use a powerful instance, because by default Elasticsearch would use only 1 GB of heap memory, which would lead to heap memory errors under load.

Now we can start our Elasticsearch instance:
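
For example (assuming the init script installed by the .deb package):

sudo service elasticsearch start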

The configs for the other two nodes in our case should look this way:
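
The original configs are not reproduced here. Assuming the other two instances from Part 1 follow the same naming scheme, only a couple of values differ from homer’s file (the node names and IPs below are placeholders); the /etc/default/elasticsearch changes are presumably the same on every node:

# second node – /etc/elasticsearch/elasticsearch.yml (only the differing lines)
node.name: "marge"
network.publish_host: "yyy.yyy.yyy.yyy"

# third node – /etc/elasticsearch/elasticsearch.yml (only the differing lines)
node.name: "bart"
network.publish_host: "zzz.zzz.zzz.zzz"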

 

After the configuration you should start Elasticsearch in the same way as on homer. Let’s give the nodes a minute to boot, and then we can check our cluster via the web interface.

Just open http://homer:9200/_plugin/kopf in the browser:

Cluster overview via kopf

Here you can see that the cluster state is green, which is good, and that there are three nodes in the cluster.

We made it. Now you’re free to send indexing requests to any node; they will be spread over the others. You can also query any node and receive the result. Internally the cluster interacts with its nodes, distributes the indexed data, distributes the indexing tasks and so on.

Just to be sure that it’s working, let’s quickly configure logstash to push some data to the ES cluster.

The logstash operations were done on my local machine, not on the elasticsearch nodes. I won’t cover the logstash installation – you can download and unpack it yourself – I’ll just show the config I used to generate some events:
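
The original config is not included here; a minimal sketch that reads events from stdin and pushes them to one of the cluster nodes might look like this (the stdin input, the host IP and the file name logstash.conf used below are assumptions; the elasticsearch output syntax is the logstash 1.4/1.5 one):

input {
  stdin { }
}
output {
  elasticsearch {
    host => "xxx.xxx.xxx.xxx"
    protocol => "http"
  }
  stdout { codec => rubydebug }
}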

Run the logstash:
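
For example (with the assumed file name from the sketch above); every line typed on stdin becomes an event in the cluster:

bin/logstash -f logstash.conf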

Now let’s go back to the kopf plugin; this is what we see:

Filled cluster overview via kopf

The solid green squares you see are the primary shards. As you can see, they are distributed across the cluster, which is good, because the indexing process takes place on the node where the primary shard is located. This way we can be sure that the indexing load will be distributed over our cluster.

That’s all, now your elasticsearch cluster is fully operational!

As homework – install the Kibana plugin somewhere, so you can browse the data from your elasticsearch cluster.

Kibana

 

 


  • Santi Aime

    Hi Pavel, nice post!

    In the section when you set up the elasticsearch.yml, do you use the credentials of the elasticsearch aws user that you created in part 1 to configure cloud.aws.access_key?

    Thanks

    • Hi Santi,

      Yes, those are the credentials from the picture “User credentials”, but the screenshot is showing the credentials of a different user.

      • Santi Aime

        Great article Pavel, I have my cluster up and running.
        Regards

  • Abdul Wahid

    I followed everything, but when I checked the status of elasticsearch (“sudo service elasticsearch status”) it says “* elasticsearch is not running”, even though it said OK when I started the elasticsearch service.

    • Hi Abdul,

      Try to check the /var/log/elasticsearch/ folder and analyse the errors, if there are any.

      • Abdul Wahid

        Just checked the log directory but it’s empty. Also, I tried to access the elasticsearch instance via the elastic IP but it’s not accessible.

        • Sorry, then I’m not able to help you.

          If there was an elasticsearch start – failed or successful – then there should be logs.

  • Nick Maslov

    thanks, it works – I also used the tags to explicitly point to ec2 instances

  • Vipul Sharma

    You have created an elasticsearch cluster, but you are using only one ES host in the logstash configuration. If that host goes down, logstash will go down. I am trying to use cloud-aws in logstash as well to discover the hosts in the cluster, but I am failing.
    Can you please help me if you know about it?

  • satyam naolekar

    Hi Pavel,
    Thanks for the illustrative tutorial on this topic. I am really grateful.
    I have followed everything in the 1st and 2nd tutorials, but I am facing one problem with this setup.
    When I start the nodes on different machines, they are not able to communicate, and if I look into kopf I am able to see only one node, while I was expecting to see 3 nodes associated with a single cluster.
    Can you give any pointers on that?

    PS:
    I have taken care of the changed values in elasticsearch.yml:

    discovery.ec2.availability_zones: “us-west-2b”
    cloud.aws.region: “us-west”
    discovery.ec2.groups: “elasticsearch-cluster”

    –Thanks and regards
    Satyam Naolekar

    • Hi Satyam,

      What you could try – check the logs in /var/log/elasticsearch/ . There, during the cluster start, the aws plugin will state which instances it has found and considers to be cluster members.

      • satyam naolekar

        Hi Pavel,

        Thanks for the reply.
        I checked the logs, but I couldn’t find any useful information; it looks like it is not even searching for the nodes on the other machines. I feel I am missing some setting but could not figure out what.
        I will try to explain my system here:

        I have created 3 micro instances, all under a common group, and discovery is set as
        discovery.ec2.groups: “elasticsearch-cluster”,
        but even then it is not searching.

        I am putting log file info, please see if you can help.
        Thanks a lot in advance.

        –Regards
        Satyam Naoleakar
        ################################################
        This is the log for the first instance:

        [2015-01-30 09:50:13,215][WARN ][common.jna ] Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).

        [2015-01-30 09:50:13,319][INFO ][node ] [ES-1] version[1.4.2], pid[1463], build[927caff/2014-12-16T14:11:12Z]

        [2015-01-30 09:50:13,319][INFO ][node ] [ES-1] initializing …

        [2015-01-30 09:50:13,339][INFO ][plugins ] [ES-1] loaded [cloud-aws], sites [kopf]

        [2015-01-30 09:50:16,982][INFO ][node ] [ES-1] initialized

        [2015-01-30 09:50:16,982][INFO ][node ] [ES-1] starting …

        [2015-01-30 09:50:17,044][INFO ][transport ] [ES-1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/54.153.178.41:9300]}

        [2015-01-30 09:50:17,052][INFO ][discovery ] [ES-1] testindex/y0LasEwDT9mnCexsQsU2OA

        [2015-01-30 09:50:47,052][WARN ][discovery ] [ES-1] waited for 30s and no initial state was set by the discovery

        [2015-01-30 09:50:47,058][INFO ][http ] [ES-1] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/54.153.178.41:9200]}

        [2015-01-30 09:50:47,058][INFO ][node ] [ES-1] started

        [2015-01-30 09:50:47,911][INFO ][cluster.service ] [ES-1] new_master [ES-1][y0LasEwDT9mnCexsQsU2OA][ip-172-31-42-46][inet[/54.153.178.41:9300]]{maste=true}, reason: zen-disco-join (elected_as_master)

        [2015-01-30 09:50:47,932][INFO ][gateway ] [ES-1] recovered [0] indices into cluster_state

        ################################
        This is a log for 2nd instance

        [2015-01-30 09:51:14,281][WARN ][common.jna ] Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).

        [2015-01-30 09:51:14,385][INFO ][node ] [ES-2] version[1.4.2], pid[1640], build[927caff/2014-12-16T14:11:12Z]

        [2015-01-30 09:51:14,385][INFO ][node ] [ES-2] initializing …

        [2015-01-30 09:51:14,398][INFO ][plugins ] [ES-2] loaded [cloud-aws], sites []

        [2015-01-30 09:51:17,953][INFO ][node ] [ES-2] initialized

        [2015-01-30 09:51:17,953][INFO ][node ] [ES-2] starting …

        [2015-01-30 09:51:18,010][INFO ][transport ] [ES-2] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/54.153.128.244:9300]}

        [2015-01-30 09:51:18,017][INFO ][discovery ] [ES-2] testindex/cx8Y8I5sQi6MPb9avJDOVw

        [2015-01-30 09:51:48,017][WARN ][discovery ] [ES-2] waited for 30s and no initial state was set by the discovery

        [2015-01-30 09:51:48,020][INFO ][http ] [ES-2] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/54.153.128.244:9200]}

        [2015-01-30 09:51:48,020][INFO ][node ] [ES-2] started

        #################################################

        Sample from my elasticsearch.yml file on the 3rd node:

        # paths
        # path.data: “/srv/elasticsearch/data”

        # additional configuration
        bootstrap.mlockall: true
        indices.fielddata.cache.size: “30%”
        indices.cache.filter.size: “30%”

        # AWS discovery
        cloud.aws.access_key: “XXX”
        cloud.aws.secret_key: “XXX”
        plugin.mandatory: “cloud-aws”
        cluster.name: “impartus”
        node.name: “ES-2”
        node.master: false
        discovery.type: “ec2”
        discovery.ec2.groups: “elasticsearch-cluster”
        discovery.ec2.host_type: “public_ip”
        discovery.ec2.ping_timeout: “30s”
        cloud.aws.region: “us-west”
        discovery.ec2.availability_zones: “us-west-2b”
        discovery.zen.ping.multicast.enabled: false
        network.publish_host: “54.153.178.244”

      • satyam naolekar

        Thanks Pavel, I have found the issue – actually the region name was wrong.
        It should be
        cloud.aws.region: “us-west-2” – now all the nodes are able to communicate with each other.

        Thanks for the tutorial.

        Regards
        Satyam

  • gg4u

    Not working anymore? I set up ES on one single node, as a test, with ES 2.0.0. Now the *great* thing is: it says “Starting Elasticsearch Server [ OK ]”, but I cannot curl to port 9200: access denied. I found out that the service is actually not starting at all, and reports a problem with the configuration YAML file.

    sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch -d -p /var/run/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch

    [user]@[host]:/etc/elasticsearch# [2015-11-04 14:28:16,210][INFO ][bootstrap ] es.default.config is no longer supported. elasticsearch.yml must be placed in the config directory and cannot be renamed.

    Any help on this?

    • Hi gg4u,

      Sorry, I still use 1.7.3, had no experience with ES 2.

      Regards,

      • gg4u

        Hi Pavel, thanks anyway for coming back to me. Have you got a suggestion on how to attempt a downgrade without losing all the configuration? Also, I would like to put ES on a single 1GB instance, hosting also the web app and the web servers. The index I have is 0.5GB, and I use ES only for search, no analytics – that is, I should use more or less 20% of the indexes, as searches follow Zipf’s law (the ES blog gives that estimate). Considering that there is an SSD drive, do you think it would be enough? Which parameters should I take into account, or any suggestion to tweak the configuration accordingly? Thank you!

        • I do not recommend mixing different services on one server when it is possible to avoid it. But if you really want to do that – the only thing you can adjust is the max HEAP memory, I think; make it something like 350 MB.

          Regarding the downgrade – for now I don’t know if ES 2 and ES 1.* have a compatible format; I think not. But I would recommend you to install ES 2 anyway – in case you don’t need a cluster, it should be pretty easy, even if the documentation is new. And this version is better as a long-term solution.

          • gg4u

            Hi Pavel, I took a closer look. Here is what I did:
            1. Found that the path.data directory set in the YAML file was not working, probably due to permissions. ES could not create the directory path.data/mycluster.
            path.logs was working instead (and manually set to /var/log/elasticsearch as pointed out in the ES guidelines).

            I commented out path.data and moved to the next debug step; now I wonder: where will the data directory be stored once I feed in data?

            2. Found out that the mandatory plugin was not created, due to:
            cloud-aws ERROR: Could not find plugin descriptor ‘plugin-descriptor.properties’ in plugin zip
            Tried to use cloud-aws-2.2.0 but got the same error.
            I commented it out.

            3. Consequently, I commented out all the EC2/AWS discovery settings.

            I am now able to start ES. Curl to localhost:9200 is OK, but it still fails from the public IP.

            4. Tried to allow CORS in the YAML file and allow origins “*” as suggested here:
            http://stackoverflow.com/questions/27508721/elasticsearch-on-ec2-cannot-hit-public-iptimeout

            However, no solution: I can curl from localhost, not from the public IP.

            If you have suggestions, much appreciated.
            Hope this info can help other people who got stuck like me.

          • Amstrong Huang

            hello,

            I had the same problem you did. I could see my ES but couldn’t connect to it via the public IP. The solution is:

            1. Follow https://github.com/andrewpuch/elasticsearch-logstash-kibana-tutorial but adapt it for version 2.0, obviously.

            2. In the config files, copy and paste everything just as the manual says, but add this at the end: http.host: 0

            3. Enjoy.

  • shravan

    Hi all,
    I have successfully installed the ELK stack on EC2, on a t2.small instance with 2 GB of RAM.
    I installed JDK 1.8, Tomcat 7, Elasticsearch 1.7.3, logstash 1.5.4 and kibana 4.1.2-linux-x64, and also installed the head, bigdesk and AWS cloud plugins.
    I am able to load the Tomcat log file (catalina.out); it loads successfully into elasticsearch and the data is also visualized in kibana.
    But when I try to filter the Tomcat log data by “LOGLEVEL” using a custom pattern in “grok”, it is not working, and my log file contains almost a week of data.
    I am using the default patterns.
    Please help me with this issue, and with how to write custom patterns for log files. My logstash conf file looks like this:

    input {
      file {
        path => “/usr/share/apache-tomcat-8.0.23/logs”
        type => “tomcatlogs”
        start_position => “beginning”
      }
    }
    filter {
      if [type] == “tomcatlogs” {
        grok {
          patterns_dir => “/opt/ELK/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-0.3.0/patterns/grok-patterns”
          match => { “message” => “%{TIMESTAMP_ISO8601:logtime} %{LOGLEVEL:SEVERE} %{GREEDYDATA:other}” }
          # match => { “message” => “%{SYSLOGTIMESTAMP:syslog_timestamp} %{IP:IPAddress}/%{IP:SourceIP} %{GREEDYDATA:Message}” }
        }
        date {
          match => [ “logtime”, “MMM d HH:mm:ss”, “MMM dd HH:mm:ss” ]
        }
      }
    }
    output {
      elasticsearch {
        host => “localhost”
        protocol => “http”
        port => “9200”
      }
    }

    Please suggest a solution for this issue.

    Thanks,
    Shravan K.

    • Hi Shravan,

      I think first you need to check if your logstash+ES pair is working. Try to force logstash to read from a static file, without any grok patterns; when you have this working, start the next step.
      Then point logstash to the dynamic logs directory and check that it is working.
      Then add the grok matching, where necessary.

      You can check if your matching works here: http://grokconstructor.appspot.com/do/match .

      The main point – work step by step, do not try to setup everything right away.

      Regards,

  • Varun

    Awesome post!! I have one doubt: since you launched an ES cluster, do we have to specify all three ES node IPs in the logstash server output, or just one ES node IP which would then replicate the data to the remaining two? If the latter, what will happen if that one node goes down? Also, is it possible to use an ELB (Elastic Load Balancer) to ensure HA of the ES cluster?

    • Hi,

      In theory you should point your output to some load balancer, to have an option to fail over. In the example case we indeed point the output only to one host, but it’s just an example.

  • martyc

    This is very helpful.
    I’ve been trying this using Docker on AWS machines. Each of the three machines works and I can go to 9200 on each etc., though they do not seem to be aware of each other, or discovery isn’t working correctly.
    I have them all with the same cluster name, security group & VPC, with each one’s IP address as its publish_host. Any ideas?

    • Hi martyc,

      Unfortunately I don’t have any experience with Docker ES configuration.

      However, you are right, the machines should be able to access each other. Try to log in on machine N1 (if bash is available) and telnet to machine N2 on port 9200.

      Another thing – if you are using docker compose, I think you can use the hostnames in the ES configuration, which would encapsulate the IPs.

      • martyc

        Thank you, I will try hostnames to see if they work too; if they do it will be very cool. My docker-compose file works well with hostnames as constraints, so maybe they will work for the configuration also.
        I solved my issue by adding network.bind_host: 0.0.0.0, which is required for Docker to work. Your tutorial has been very helpful.

  • Cyril Prince

    Worked on my AWS Linux Machine

    cluster.name: myES_Cluster
    node.name: ESNODE_CYR
    node.master: true
    node.data: true
    transport.host: localhost
    transport.tcp.port: 9300
    http.port: 9200
    network.host: 0.0.0.0
    discovery.zen.minimum_master_nodes:

    I have tried this in elasticsearch.yml (key: value) and it worked fine for me. But it took 2 days to fix it 😉 – going through the ES docs is so tough.