Hadoop Administration: Installation scripts for Apache Hadoop (2.6.0) on Ubuntu Utopic Unicorn as a multi-node cluster
March 8, 2015
Recently, I published a quick step-by-step guide on deploying an Apache Hadoop (2.6.0) single-node cluster on an Ubuntu Utopic Unicorn (14.10) image; you can get the full installation video here.
Here, the full deployment details for an Apache Hadoop (2.6.0) multi-node cluster are provided. The primary hardware requirements for the setup are:
1. VMware Player/Workstation (if Windows/Linux) or VMware Fusion (if OS X)
2. More than 4 GB of RAM for the primary OS
3. More than 60 GB of disk space
4. Intel VT-x capable processor
5. Ubuntu/CentOS/Red Hat/SUSE OS image (as the guest OS)
Now, the step-by-step multi-node Hadoop clustering scripts follow.
Check the IP address of the master and each slave node:
$ ifconfig
Namenode  > hadoopmaster > 192.168.23.132
Datanodes > hadoopslave1 > 192.168.23.133
          > hadoopslave2 > 192.168.23.134
          > hadoopslave3 > 192.168.23.135
Clone the Hadoop single-node cluster VM as hadoopmaster
Hadoopmaster Node
$ sudo gedit /etc/hosts
192.168.23.132 hadoopmaster
192.168.23.133 hadoopslave1
192.168.23.134 hadoopslave2
192.168.23.135 hadoopslave3
$ sudo gedit /etc/hostname
hadoopmaster
$ cd /usr/local/hadoop/etc/hadoop
$ sudo gedit core-site.xml
replace localhost with hadoopmaster
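After the edit, the default filesystem URI should point at the master instead of the local machine. A minimal sketch, assuming the single-node guide used the fs.defaultFS property on port 9000:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoopmaster:9000</value>
</property>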
$ sudo gedit hdfs-site.xml
change the dfs.replication value from 1 to 3 (the number of datanodes)
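The replication property should then look like this (a sketch of the change above; dfs.replication is the standard HDFS property name):
<property>
<name>dfs.replication</name>
<value>3</value>
</property>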
$ sudo gedit yarn-site.xml
add the following configuration
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoopmaster:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoopmaster:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoopmaster:8050</value>
</property>
</configuration>
$ sudo gedit mapred-site.xml.template
change the property name mapreduce.framework.name to mapred.job.tracker
and change its value from yarn to hadoopmaster:54311
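The edited property should end up as follows (a sketch of the change just described):
<property>
<name>mapred.job.tracker</name>
<value>hadoopmaster:54311</value>
</property>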
$ sudo rm -rf /usr/local/hadoop/hadoop_data
(this clears the single-node HDFS data so every cloned node starts with a clean storage directory)
Shut down the hadoopmaster node
Clone Hadoopmaster Node as hadoopslave1, hadoopslave2, hadoopslave3
Hadoopslave Node (the following configuration should be done on each slave node)
$ sudo gedit /etc/hostname
hadoopslave<node number> (i.e. hadoopslave1, hadoopslave2, or hadoopslave3)
$ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
$ sudo chown -R trainer:trainer /usr/local/hadoop
$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
remove the dfs.namenode.name.dir property section (slave nodes run only datanodes, not the namenode)
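After these edits, each slave's hdfs-site.xml should contain roughly the following (a sketch, assuming the property names and paths from the single-node guide):
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
</configuration>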
reboot all nodes
Hadoopmaster Node
$ sudo gedit /usr/local/hadoop/etc/hadoop/masters
hadoopmaster
$ sudo gedit /usr/local/hadoop/etc/hadoop/slaves
remove localhost and add
hadoopslave1
hadoopslave2
hadoopslave3
$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
remove the dfs.datanode.data.dir property section (the master runs only the namenode, not a datanode)
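The master's hdfs-site.xml should then contain roughly the following (a sketch, under the same assumptions as on the slaves):
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
</configuration>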
$ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
$ sudo chown -R trainer:trainer /usr/local/hadoop
$ ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopmaster
$ ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopslave1
$ ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopslave2
$ ssh-copy-id -i ~/.ssh/id_dsa.pub trainer@hadoopslave3
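These commands assume the DSA key pair created during the single-node setup already exists in ~/.ssh; if it does not, generate one first (a sketch):
$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
The ssh/exit round-trips below confirm passwordless login to each node and cache the host keys.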
$ ssh hadoopmaster
$ exit
$ ssh hadoopslave1
$ exit
$ ssh hadoopslave2
$ exit
$ ssh hadoopslave3
$ exit
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
(start-all.sh still works in 2.6.0 but is deprecated in favour of the two scripts above)
$ jps (run on the master and on each of the 3 datanodes to confirm the daemons are up)
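On a healthy cluster, jps should report roughly the following daemons (process IDs will vary); this assumes the master runs no datanode, as configured above:
hadoopmaster: NameNode, SecondaryNameNode, ResourceManager
hadoopslave1/2/3: DataNode, NodeManager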
To check the Hadoop web consoles:
http://<hadoopmaster-ip>:8088/ (YARN ResourceManager)
http://<hadoopmaster-ip>:50070/ (HDFS NameNode)
http://<hadoopmaster-ip>:50090/ (Secondary NameNode)
http://<hadoopslave-ip>:50075/ (DataNode web UI, on each slave)
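As a final smoke test, you can run one of the bundled example jobs from the master (a sketch; the jar path assumes the standard 2.6.0 layout under /usr/local/hadoop):
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5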