Configure a Hadoop Cluster Using Ansible

Deepanshu Yadav
Mar 16, 2021

Prerequisites: Ansible installed on your system.

If it is not installed, you can install it with pip3 install ansible. Alternatively, you can use yum: first configure the EPEL repository and then run yum install ansible (this is the better way).
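For reference, the commands look like this (a sketch assuming a CentOS/RHEL system where the epel-release package is available from the distro's extras repository):

# Option 1: install via pip
pip3 install ansible

# Option 2 (preferred): enable EPEL, then install via yum
yum install -y epel-release
yum install -y ansible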

🔥Now let’s see how to write the Ansible playbook.🔥

First, configure the inventory file with the IPs of the systems you want to use as the namenode (master) and datanode (slave). In my inventory, 192.168.0.101 is the master and 192.168.0.110 is the slave. See below 👇
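The inventory screenshot isn’t reproduced here; a minimal sketch in INI format (the connection variables and password placeholder are illustrative, not from the original):

# /etc/ansible/hosts (the default inventory path; adjust if you use a custom one)
192.168.0.101   ansible_user=root   ansible_ssh_pass=<your-password>
192.168.0.110   ansible_user=root   ansible_ssh_pass=<your-password>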

Next, you have to create some Hadoop configuration files that we will be uploading to the master and slave systems. There are two types of files, core-site.xml and hdfs-site.xml, one pair for the master and one for the slaves.

I have a separate folder named task11 where I keep all these files so that they can be used by the playbook.

(screenshot: my task11 workspace)
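The folder layout would look something like this (the playbook filename hadoop.yml is illustrative; the original screenshot only shows the workspace):

task11/
├── hadoop.yml
├── hdfs-site-master.xml
├── core-site-master.xml
├── hdfs-site-slave.xml
└── core-site-slave.xml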

The four files are below👇

hdfs-site-master.xml
core-site-master.xml
hdfs-site-slave.xml
core-site-slave.xml
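The original post shows these files as screenshots. Below is a minimal sketch of what each one contains, inferred from the playbook at the end of this post: the directories /nn and /dn2 and the master IP 192.168.0.101 come from the playbook, while the port 9001 is an assumption, so use whichever port you chose.

hdfs-site-master.xml:

<configuration>
  <property>
    <!-- directory where the namenode stores its metadata -->
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>

core-site-master.xml:

<configuration>
  <property>
    <!-- address the namenode listens on; port 9001 is an assumption -->
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>

hdfs-site-slave.xml:

<configuration>
  <property>
    <!-- directory where the datanode stores HDFS blocks -->
    <name>dfs.data.dir</name>
    <value>/dn2</value>
  </property>
</configuration>

core-site-slave.xml:

<configuration>
  <property>
    <!-- points the datanode at the master; the port must match the master's -->
    <name>fs.default.name</name>
    <value>hdfs://192.168.0.101:9001</value>
  </property>
</configuration>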

Now it’s time to write the playbook:

In the first part of the playbook (the full code is at the end of this post), I configured the master/namenode. The hosts field is the IP of the system that I want to be the master.

In the tasks part, I first copied the Java and Hadoop RPMs, which are already present on my controller node, to the master (the managed node), and then installed them using the shell module.

Then I created the directory /nn, which I want to be the main storage folder of the master. After that, I uploaded the master’s hdfs-site.xml and core-site.xml files that we made, using the template module.

In the last part, I first formatted the namenode folder (/nn) and then started the namenode service using the shell and command modules.

At the end, I used the debug module to print the output of the jps command, which shows whether the master is configured properly or not.

Now it’s time to write the second part of the playbook, i.e. configuring the datanode 👇

All the steps here are the same; only the names of the uploaded files and the IP of the system you want as the datanode/slave are different.

Now it’s time to run the playbook.
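Assuming the playbook is saved as hadoop.yml inside the task11 folder (the actual filename isn’t shown in the original), the command would be:

ansible-playbook hadoop.yml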

🍤So, this playbook will configure the cluster as you can see above. Finally, you can go to your master or slave system and check the cluster report; it would look like the output below.🍤
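The report screenshot isn’t reproduced here. On either node you can run the Hadoop 1.x admin report command, which prints the cluster’s configured capacity and lists each live datanode, confirming that the slave has joined:

hadoop dfsadmin -report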

🔰Below is the playbook code you can use. Just change the IPs of the namenode and datanode. Also change the IPs in the configuration files and the names of the storage folders if you want.🔰

- hosts: 192.168.0.101        # namenode / master
  tasks:
    # Copy the Hadoop and JDK RPMs from the controller node
    - copy:
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/hadoop-1.2.1-1.x86_64.rpm"
    - copy:
        src: "/root/jdk-8u171-linux-x64.rpm"
        dest: "/root/jdk-8u171-linux-x64.rpm"
    # Install the JDK first, then Hadoop
    - shell: "rpm -ivh /root/jdk-8u171-linux-x64.rpm"
    - shell: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm"
    # Storage directory for the namenode
    - file:
        path: "/nn"
        state: directory
    # Upload the master configuration files
    - template:
        src: hdfs-site-master.xml
        dest: /etc/hadoop/hdfs-site.xml
    - template:
        src: core-site-master.xml
        dest: /etc/hadoop/core-site.xml
    # Format the namenode directory and start the namenode service
    - shell:
        cmd: "echo Y | hadoop namenode -format"
      ignore_errors: True
    - command: "hadoop-daemon.sh start namenode"
    # Print the output of jps to verify the namenode is running
    - command: "jps"
      register: x
    - debug:
        var: x

- hosts: 192.168.0.110        # datanode / slave
  tasks:
    # Same copy-and-install steps as the master
    - copy:
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/hadoop-1.2.1-1.x86_64.rpm"
    - copy:
        src: "/root/jdk-8u171-linux-x64.rpm"
        dest: "/root/jdk-8u171-linux-x64.rpm"
    - shell: "rpm -ivh /root/jdk-8u171-linux-x64.rpm"
    - shell: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm"
    # Storage directory for the datanode
    - file:
        path: "/dn2"
        state: directory
    # Upload the slave configuration files
    - template:
        src: hdfs-site-slave.xml
        dest: /etc/hadoop/hdfs-site.xml
    - template:
        src: core-site-slave.xml
        dest: /etc/hadoop/core-site.xml
    # Start the datanode service and verify with jps
    - command: "hadoop-daemon.sh start datanode"
    - command: "jps"
      register: y
    - debug:
        var: y
