✨Configure Hadoop and start cluster services using Ansible Playbook📌!!!

Mar 21, 2021

Introduction

We use Ansible to automate configuration management, deployment and other IT operations by simply writing playbooks. It is an open-source tool that increases our productivity at a large scale, saving us a lot of time and hassle when we need to perform configuration management on multiple nodes.

In this article, we’ll automate a Hadoop cluster setup with the help of Ansible. To keep things simple, our cluster comprises two EC2 instances on AWS, one acting as the master node (namenode) and the other as the data node, plus one Red Hat VM that serves as the Ansible controller node.

Setting up Ansible

If you already have Ansible set up, you can skip this section.

First, run the ansible --version command to check what version of Ansible you have installed. If this command does not run, then you can install Ansible using pip (Python needs to be installed for Ansible). To install Ansible with pip, you can run pip3 install ansible.
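The check described above can be sketched as a small shell snippet. It only reports what it finds and leaves the actual installation to you; the variable name ansible_status is purely illustrative.

```shell
# Report whether Ansible is available on the controller node.
if command -v ansible >/dev/null 2>&1; then
    ansible_status="installed"
    ansible --version
else
    ansible_status="missing"
    echo "Ansible not found. Install it with: pip3 install ansible"
fi
```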

Next, we need to create an inventory file that holds the IP addresses of all our managed nodes. Create the inventory file at any location (e.g. vi /root/ip.txt), preferably in a directory where you can later keep your Ansible configuration file too.

In the inventory, [namenode] and [datanode] are group labels that we can refer to when writing our playbook. You can name your labels as you like.
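As an illustration, an inventory with these two groups might look like the sketch below. The IP addresses (taken from a reserved documentation range), the ec2-user login and the key path are placeholders, not values from the original setup. The file is written to the current directory by default, whereas the example path used above is /root/ip.txt.

```shell
# Write a sample inventory. Every host value below is a placeholder.
INVENTORY="${INVENTORY:-./ip.txt}"   # the article's example path is /root/ip.txt

cat > "$INVENTORY" <<'EOF'
[namenode]
203.0.113.10 ansible_user=ec2-user ansible_ssh_private_key_file=/root/hadoop-key.pem

[datanode]
203.0.113.11 ansible_user=ec2-user ansible_ssh_private_key_file=/root/hadoop-key.pem
EOF

# Show the result so it can be reviewed
cat "$INVENTORY"
```

Each group can hold more than one host, so adding further data nodes later only means adding lines under [datanode].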

Next, create a directory for your Ansible configuration file:

[root@localhost ~]# mkdir /etc/ansible

In this directory, create a configuration file (vim ansible.cfg) and add your settings to it, at minimum the path to the inventory file created earlier.
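A minimal sketch of what such an ansible.cfg might contain is shown below. The settings are assumptions chosen for a typical EC2 setup (inventory at /root/ip.txt, sudo-based privilege escalation), not the original file. The sample is staged in the current directory so it can be reviewed before being copied into /etc/ansible.

```shell
# Stage a sample config in the current directory; all values are assumptions.
cat > ansible.cfg.sample <<'EOF'
[defaults]
# Path to the inventory file created earlier
inventory = /root/ip.txt
# Skip the interactive SSH host-key prompt on first connection
host_key_checking = False

[privilege_escalation]
# Run tasks as root through sudo (needed for package installs)
become = True
become_method = sudo
become_user = root
become_ask_pass = False
EOF

# Once reviewed, copy it into place:
#   cp ansible.cfg.sample /etc/ansible/ansible.cfg
```

Disabling host key checking is convenient for a short-lived lab cluster, but you may prefer to leave it enabled on anything long-running.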

With this, our setup for Ansible is complete. We can now start writing our playbook to configure Hadoop.

Creating Playbook

Create a directory as your workspace, for example mkdir /hadoopws. Inside this workspace, create a playbook (extension .yml), for example vim hadoop.yml. For this task, I created two separate playbooks, one for the master node and one for the data node.

To see the code of the playbooks, visit https://github.com/swapnil0309/Setting-Up-Hadoop-Cluster-with-Ansible.git
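The real tasks live in the repository linked above. Purely to illustrate the file layout, the sketch below creates a workspace and a skeleton playbook that targets the namenode group. The file name namenode.yml, the /nn directory and the single task are hypothetical and are not taken from the author's playbooks.

```shell
# Create a workspace and a skeleton playbook for the master node.
WORKSPACE="${WORKSPACE:-./hadoopws}"   # the article's example path is /hadoopws
mkdir -p "$WORKSPACE"

cat > "$WORKSPACE/namenode.yml" <<'EOF'
# Illustrative skeleton only; the real tasks are in the linked repository.
- hosts: namenode
  tasks:
    - name: Create a directory for namenode metadata (hypothetical path)
      file:
        path: /nn
        state: directory

# A second playbook (or a second play) would target the datanode group.
EOF

# Validate the playbook structure without touching any host:
#   ansible-playbook --syntax-check "$WORKSPACE/namenode.yml"
```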

Let’s check the connectivity between the control node and the managed nodes (i.e. the namenode and datanode).
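One way to run this check is with Ansible's ping module, wrapped here in a small helper. The function name check_nodes is illustrative, and the group names assume the inventory labels used earlier.

```shell
# Ping every host in the given inventory groups with Ansible's ping module.
# The function name check_nodes is illustrative.
check_nodes() {
    for group in "$@"; do
        echo "Checking group: $group"
        # Stop at the first group that cannot be reached
        ansible "$group" -m ping || return 1
    done
}

# Usage (run on the controller node once the inventory is in place):
#   check_nodes namenode datanode
```

Note that the ping module is not an ICMP ping: it verifies that Ansible can log in over SSH and find a usable Python on the managed node, and a reachable host answers with "pong".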