Elasticsearch cluster on AWS. Part 1 - preparing the environment.

In this article, I will show how to build a solid Elasticsearch cluster on AWS. Part 1 covers creating the AWS EC2 instances and preparing the environment for the Elasticsearch cluster setup.

First of all, to build the cluster we should launch 3 EC2 instances. Why 3 and not 2? After this small discussion I found that a cluster of 2 instances is not much better than a single-instance cluster.
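One common way to see why 2 nodes add little over 1 is majority-quorum arithmetic. The sketch below is only an illustration under that assumption (the linked discussion may argue the point differently); the function names are hypothetical and nothing here talks to Elasticsearch itself.

```python
def quorum(node_count: int) -> int:
    """Smallest strict majority of the given number of nodes."""
    return node_count // 2 + 1


def tolerated_failures(node_count: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return node_count - quorum(node_count)


if __name__ == "__main__":
    # Compare cluster sizes of 1, 2 and 3 nodes.
    for n in (1, 2, 3):
        print(n, quorum(n), tolerated_failures(n))
```

With 2 nodes the majority is still 2, so losing either node leaves no majority, the same failure tolerance as a single node. With 3 nodes one node can be lost.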

As the operating system we will use Ubuntu.

Creating the instances

Selecting the operating system

As our cluster is not a production one, we will use the m3.medium instance type. If you’re building a production cluster, you probably want plenty of CPU cores (so indexing is faster) and plenty of RAM (so your big indices can be held in memory during searches).

Selecting the instance type

Configuring the instance

On step 4, it may be reasonable to add some storage to your instance, as you’re going to index a lot and therefore store a lot of data. In our case we will leave it as is, but keep in mind that you can do that. Here you can read how to mount an EBS volume to your instance.

Adding the storage

Tagging the instance

It’s reasonable to put the Elasticsearch cluster into a separate security group:

Adding the new security group

Configuring the security group

Next steps: reviewing the instances we are going to launch and choosing the key pair to access them.

Reviewing the instances

Pointing the key pair

The instances are launching:

The instances are launching

Overview of created instances

Now we should create 3 Elastic IPs and assign them to our instances, so we can restart the instances without fear that they will receive new IPs. Also, let’s agree on how to name our instances; in my case they are Homer, Marge and Bart.
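The article does this through the AWS console. For readers who prefer scripting, here is a minimal sketch of the same two steps (allocate an address, associate it with an instance). It assumes a client object that behaves like boto3's EC2 client and instances running in a VPC; the function name and the instance IDs you pass in are hypothetical.

```python
def attach_elastic_ips(ec2_client, instance_ids_by_name):
    """Allocate one Elastic IP per instance and associate it.

    ec2_client is assumed to behave like boto3.client("ec2").
    Returns a dict mapping each instance name to its new public IP.
    """
    public_ips = {}
    for name, instance_id in instance_ids_by_name.items():
        # Step 1: allocate a new Elastic IP (VPC scope is an assumption).
        allocation = ec2_client.allocate_address(Domain="vpc")
        # Step 2: bind the allocated address to the instance.
        ec2_client.associate_address(
            AllocationId=allocation["AllocationId"],
            InstanceId=instance_id,
        )
        public_ips[name] = allocation["PublicIp"]
    return public_ips
```

Passing the client in as a parameter keeps the function easy to try against a stub before pointing it at a real account.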

 

Allocating the elastic IP

Allocating the new address

Associate address 1

Associate address 2

Associated addresses overview

The instances overview

One more thing to make the configuration even more convenient: we will update our local /etc/hosts to associate the instance names with their IPs.
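As an illustration of what those /etc/hosts lines look like, the sketch below renders one "IP, tab, hostname" line per instance. The addresses are placeholders from the 203.0.113.0/24 documentation range, not real Elastic IPs, and the lowercase hostnames and the function name are assumptions.

```python
def render_hosts_entries(ip_by_hostname):
    """Build /etc/hosts lines: one 'IP<TAB>hostname' line per instance."""
    lines = [f"{ip}\t{hostname}" for hostname, ip in ip_by_hostname.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses; substitute the Elastic IPs of your instances.
    example = {
        "homer": "203.0.113.10",
        "marge": "203.0.113.11",
        "bart": "203.0.113.12",
    }
    print(render_hosts_entries(example), end="")
```

The printed lines can then be appended to the local /etc/hosts by hand.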

And now we will set the hostname on each of the instances (you should do it while you are logged in to the instance).

(the same should be done for the other instances)

After that, we will see the hostname when we’re logged in to the host:

Overview of the hostname on the instance

Another thing we should do in the AWS console: create an IAM user, so the cluster instances get access to our AWS account and are able to find each other.

To do that, open the IAM Management Console and create a new user.

Creating the user

Entering the user names

User credentials

You should write down those credentials, because we will use them during the instance configuration. The IAM Management Console allows you to download them.

Plus, you should give the user read-only permissions, so it can read the list of EC2 instances.
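The article picks a read-only policy template in the console. As a hedged illustration of what a minimal policy of that kind could look like, the sketch below builds an IAM policy document that allows only listing EC2 instances. Limiting it to the single ec2:DescribeInstances action is an assumption about what discovery needs, not the template the article uses, and the function name is hypothetical.

```python
import json


def read_only_ec2_policy():
    """Return an IAM policy document that only allows listing EC2 instances."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: listing instances is all that discovery requires.
                "Action": ["ec2:DescribeInstances"],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into a custom policy.
    print(json.dumps(read_only_ec2_policy(), indent=2))
```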

User permissions

User policy template

User policy overview

That’s all. Now we are ready to start the cluster configuration.

Elasticsearch cluster on AWS. Part 2 – configuring the elasticsearch.

  • Mark Kerzner

    Pavel,

    thank you for the manual. As always, there is a lot of information available on any subject, but not all of it is clear and to the point. Yours is. One question though: I like the idea of assigning Elastic IPs. Now I can stop the cluster when I don’t need it, and hope that it will come up correctly later on. I could potentially use this approach with Hadoop clusters as well. However, does it mean that my machines are going to communicate with each other through external IPs, and if so, will Amazon charge for such communications? Or will Amazon figure out that it should use the internal ones? Or better yet, will the internal ones also be constant?

    Thank you – Спасибо

    Mark

    • Hi Mark,

      Thanks for the feedback.

      If I understand correctly, yes: the communication between the servers will go through the external IPs. If the servers are located in the same region, there are no additional costs.
      The internal IPs are dynamic, so I prefer not to use them.

      • Mark Kerzner

        Pavel,

        if you are correct, life is much easier. So, there is no cost, because external IPs in the same region are resolved into internal ones, and the security group rules would be the same for external IPs as for internal ones. That is, once you open all ports within the security group, they will be open for the external IPs as well.

        I just could not find the place in the documentation where this is confirmed explicitly, such as maybe here, http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-instance-addressing.html.

        This really would be a big deal for me, because it would change the instructions I give to the students when I teach Hadoop 🙂

        Thank you,
        Mark

        • Hi Mark,

          Unfortunately I can’t say how it works internally, but as far as I know, traffic over external IPs in the same region is not billable.

          • Mark Kerzner

            Pavel, what will you say to this quote from Amazon:

            Data Transfer OUT From Amazon EC2 To:
            Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon SES, Amazon SQS, or Amazon SimpleDB in the same AWS Region: $0.00 per GB
            Amazon EC2, Amazon RDS, Amazon Redshift or Amazon ElastiCache instances, Amazon Elastic Load Balancing, or Elastic Network Interfaces in the same Availability Zone:
              Using a private IP address: $0.00 per GB
              Using a public or Elastic IP address: $0.01 per GB

          • S Srini

            Thanks for the interesting article, Pavel. Mark’s comment on charges for the use of Elastic IP addresses even within the same zone is still valid. This is an unknown amount, which can be significant per month. How do we address Hadoop services breaking if I start/stop the EC2 clusters?

      • Mark Kerzner