Install a multi-node Cloudera Hadoop cluster (CDH 5.4.0) manually
This document guides you through installing a multi-node Cloudera Hadoop cluster (CDH 5.4.0) without Cloudera Manager.
In this tutorial I have used two CentOS 6.6 virtual machines: master.hadoop.com and slave.hadoop.com.
Prerequisites:
CentOS 6.X
JDK 1.7.x is required for CDH to work. If you have a lower JDK version, uninstall it and install JDK 1.7.x.
Master machine – master.hadoop.com (192.168.111.130)
Daemons we are going to install on the master:
Namenode
HistoryServer
Slave machine – slave.hadoop.com (192.168.111.131)
Daemons we are going to install on the slave:
Resource Manager (Yarn)
Node-manager
Secondary Namenode
Datanode
Important configuration before proceeding further: add the hostname and IP information of both machines to the /etc/hosts file on each host.
[root@master ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.111.130 master.hadoop.com
192.168.111.131 slave.hadoop.com
[root@slave ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.111.130 master.hadoop.com
192.168.111.131 slave.hadoop.com
Please verify that both hosts can ping each other.
Also, please stop the firewall and disable SELinux.
To stop the firewall on CentOS:
service iptables stop && chkconfig iptables off
To disable SELinux:
vim /etc/selinux/config
Once the file is open, verify that “SELINUX=disabled” is set.
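If you prefer to do this from the command line, here is a minimal sketch: it rewrites the SELINUX line in the config file (which takes effect only after a reboot) and switches the running system to permissive mode so you don’t have to reboot right away.
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # persists across reboots
setenforce 0   # switch the running session to permissive mode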
1. Dates should be in sync
Please make sure that the master and slave machines’ dates are in sync; if they are not, configure NTP to keep them synchronized.
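A minimal NTP sketch for CentOS 6, assuming the default public NTP pool is reachable from both machines (run on master and slave):
yum -y install ntp ntpdate   # install the NTP daemon and the one-shot sync tool
ntpdate pool.ntp.org         # do an initial time sync before starting the daemon
service ntpd start           # start the NTP service
chkconfig ntpd on            # start NTP automatically on boot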
2. Passwordless SSH must be set up from master to slave
To set up passwordless SSH, follow the procedure below:
2a. Generate rsa key pair using ssh-keygen command
[root@master conf]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)?
2b. Copy the generated public key to slave.hadoop.com
[root@master conf]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave.hadoop.com
Now try logging into the machine, with "ssh 'root@slave.hadoop.com'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
2c. Now try connecting to slave.hadoop.com using ssh
[root@master conf]# ssh root@slave.hadoop.com
Last login: Fri Apr 24 14:20:43 2015 from master.hadoop.com
[root@slave ~]# logout
Connection to slave.hadoop.com closed.
[root@master conf]#
That’s it! You have successfully configured passwordless ssh between master and slave node.
3. Internet connection
Please make sure that you have a working internet connection, as we are going to download CDH packages in the next steps.
4. Install the CDH repository
4a. Download the CDH repo RPM
[root@master ~]# wget http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm
4b. Install the CDH repo RPM downloaded in the above step
[root@master ~]# yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Local Package Process
....
Complete!
4c. Repeat the same steps on the slave node
[root@slave ~]# wget http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm
[root@slave ~]# yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Local Package Process
......
Complete!
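Optionally, you can also import Cloudera’s GPG key on both nodes so that later yum installs can verify package signatures. The key URL below follows the same archive.cloudera.com layout as the one-click-install RPM above; treat it as an assumption and adjust it if your repository location differs.
rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera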
5. Install and deploy ZooKeeper.
[root@master ~]# yum -y install zookeeper-server
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Install Process
.....
Complete!
5a. Create the ZooKeeper data directory and apply permissions
[root@master ~]# mkdir -p /var/lib/zookeeper
[root@master ~]# chown -R zookeeper /var/lib/zookeeper/
5b. Init zookeeper and start the service
[root@master ~]# service zookeeper-server init
No myid provided, be sure to specify it in /var/lib/zookeeper/myid if using non-standalone
[root@master ~]# service zookeeper-server start
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Starting zookeeper ... STARTED
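To confirm ZooKeeper is answering, you can send it the standard ruok four-letter command. This assumes nc (netcat) is installed and ZooKeeper is listening on its default port 2181; a healthy server replies with imok.
echo ruok | nc localhost 2181   # expected reply: imok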
6. Install namenode on master machine
yum -y install hadoop-hdfs-namenode
7. Install secondary namenode on slave machine
yum -y install hadoop-hdfs-secondarynamenode
8. Install resource manager on slave machine
yum -y install hadoop-yarn-resourcemanager
9. Install nodemanager, datanode & mapreduce on slave node
yum -y install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
10. Install history server and yarn proxyserver on master machine
yum -y install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
11. On both machines you can install the hadoop-client package
yum -y install hadoop-client
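Once the packages are installed, a quick sanity check on either node is to print the Hadoop version; for this tutorial it should report the CDH 5.4.0 build.
hadoop version   # first line should read: Hadoop 2.6.0-cdh5.4.0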
Now we are done with the installation, it’s time to deploy HDFS!
1. On each node, execute the commands below:
[root@master ~]# cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
[root@master ~]# alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
[root@master ~]# alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
[root@slave ~]# cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
[root@slave ~]# alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
[root@slave ~]# alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
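To verify that the custom configuration directory is now the active one, inspect the alternatives entry on each node; /etc/hadoop/conf.my_cluster should be listed as the current choice.
alternatives --display hadoop-conf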
2. Let’s configure the HDFS properties now.
Go to the /etc/hadoop/conf/ directory on the master node and edit the property files below:
2a. vim /etc/hadoop/conf/core-site.xml
Add the below lines in it under the <configuration> tag
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master.hadoop.com:8020</value>
</property>
2b. vim /etc/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hadoop</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/1/dfs/nn,file:///nfsmount/dfs/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
</property>
<property>
  <name>dfs.namenode.http-address</name>
  <value>192.168.111.130:50070</value>
  <description>The address and the base port on which the dfs NameNode Web UI will listen.</description>
</property>
3. scp core-site.xml and hdfs-site.xml to slave.hadoop.com at /etc/hadoop/conf/
[root@master conf]# scp core-site.xml hdfs-site.xml slave.hadoop.com:/etc/hadoop/conf/
core-site.xml                                 100% 1001     1.0KB/s   00:00
hdfs-site.xml                                 100% 1669     1.6KB/s   00:00
[root@master conf]#
4. Create local directories:
On the master host, run the below commands:
mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn
chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
chmod go-rx /data/1/dfs/nn /nfsmount/dfs/nn
On the slave host, run the below commands:
mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
5. Format the namenode:
sudo -u hdfs hdfs namenode -format
6. Start hdfs services
Run the below command on both master and slave
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do service $x start ; done
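As a quick check once the services are up (a hedged extra step, not part of the original procedure), you can ask the NameNode for a cluster report; it should list one live datanode, slave.hadoop.com.
sudo -u hdfs hdfs dfsadmin -report   # look for the live datanode entry in the output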
7. Create the HDFS /tmp directory
Run on any of the Hadoop nodes
[root@slave ~]# sudo -u hdfs hadoop fs -mkdir /tmp
[root@slave ~]# sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Congratulations! You have deployed HDFS successfully.
Deploy YARN
1. Prepare the YARN configuration properties
Replace your /etc/hadoop/conf/mapred-site.xml with the below contents on the master host
[root@master conf]# cat mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>
2. Replace your /etc/hadoop/conf/yarn-site.xml with the below contents on the master host
[root@master conf]# cat yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data/1/yarn/local,file:///data/2/yarn/local,file:///data/3/yarn/local</value>
  </property>
  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:///data/1/yarn/logs,file:///data/2/yarn/logs,file:///data/3/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://master.hadoop.com:8020/var/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>slave.hadoop.com</value>
  </property>
</configuration>
3. Copy the modified files to the slave machine.
[root@master conf]# scp mapred-site.xml yarn-site.xml slave.hadoop.com:/etc/hadoop/conf/
mapred-site.xml                               100% 1086     1.1KB/s   00:00
yarn-site.xml                                 100% 2787     2.7KB/s   00:00
[root@master conf]#
4. Configure the local directories for YARN
To be done on the machine running the YARN NodeManager, i.e. slave.hadoop.com in our case. (Note that yarn-site.xml above references only /data/1 through /data/3; the /data/4 directories created below are harmless but stay unused unless you also add them to the configuration.)
[root@slave ~]# mkdir -p /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local
[root@slave ~]# mkdir -p /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs
[root@slave ~]# chown -R yarn:yarn /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local
[root@slave ~]# chown -R yarn:yarn /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs
5. Configure the history server.
Add the below properties to mapred-site.xml
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master.hadoop.com:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master.hadoop.com:19888</value>
</property>
6. Configure proxy settings for history server
Add the below properties to /etc/hadoop/conf/core-site.xml
<property>
  <name>hadoop.proxyuser.mapred.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapred.hosts</name>
  <value>*</value>
</property>
7. Copy modified files to slave.hadoop.com
[root@master conf]# scp mapred-site.xml core-site.xml slave.hadoop.com:/etc/hadoop/conf/
mapred-site.xml                               100% 1299     1.3KB/s   00:00
core-site.xml                                 100% 1174     1.2KB/s   00:00
[root@master conf]#
8. Create history directories and set permissions
[root@master conf]# sudo -u hdfs hadoop fs -mkdir -p /user/history
[root@master conf]# sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
[root@master conf]# sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history
9. Create log directories and set permissions
[root@master conf]# sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
[root@master conf]# sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
10. Verify the HDFS file structure
[root@master conf]# sudo -u hdfs hadoop fs -ls -R /
drwxrwxrwt   - hdfs   hadoop          0 2015-04-25 01:16 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2015-04-25 02:52 /user
drwxrwxrwt   - mapred hadoop          0 2015-04-25 02:52 /user/history
drwxr-xr-x   - hdfs   hadoop          0 2015-04-25 02:53 /var
drwxr-xr-x   - hdfs   hadoop          0 2015-04-25 02:53 /var/log
drwxr-xr-x   - yarn   mapred          0 2015-04-25 02:53 /var/log/hadoop-yarn
[root@master conf]#
11. Start YARN and the JobHistory server
On slave.hadoop.com
[root@slave ~]# sudo service hadoop-yarn-resourcemanager start
starting resourcemanager, logging to /var/log/hadoop-yarn/yarn-yarn-resourcemanager-slave.hadoop.com.out
Started Hadoop resourcemanager:                            [  OK  ]
[root@slave ~]#
[root@slave ~]# sudo service hadoop-yarn-nodemanager start
starting nodemanager, logging to /var/log/hadoop-yarn/yarn-yarn-nodemanager-slave.hadoop.com.out
Started Hadoop nodemanager:                                [  OK  ]
[root@slave ~]#
On master.hadoop.com
[root@master conf]# sudo service hadoop-mapreduce-historyserver start
starting historyserver, logging to /var/log/hadoop-mapreduce/mapred-mapred-historyserver-master.hadoop.com.out
15/04/25 02:56:01 INFO hs.JobHistoryServer: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting JobHistoryServer
STARTUP_MSG:   host = master.hadoop.com/192.168.111.130
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.6.0-cdh5.4.0
STARTUP_MSG:   classpath =
STARTUP_MSG:   build = http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271; compiled by 'jenkins' on 2015-04-21T19:18Z
STARTUP_MSG:   java = 1.7.0_79
- - -
************************************************************/
Started Hadoop historyserver:                              [  OK  ]
[root@master conf]#
12. Create an HDFS home directory for the user who will run MapReduce jobs (kuldeep in this example)
[root@master conf]# sudo -u hdfs hadoop fs -mkdir /user/kuldeep
[root@master conf]# sudo -u hdfs hadoop fs -chown kuldeep /user/kuldeep
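To exercise the whole stack end to end, you can submit one of the example jobs that ship with the hadoop-mapreduce package as that user. This is a hedged sketch: it assumes the OS user kuldeep exists on the node you run it from and that the examples jar is in its default CDH package location.
sudo -u kuldeep hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 10   # small pi-estimation job: 2 maps, 10 samples each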
13. Important: don’t forget to set the core Hadoop services to start automatically when the OS boots.
On master.hadoop.com
[root@master conf]# sudo chkconfig hadoop-hdfs-namenode on
[root@master conf]# sudo chkconfig hadoop-mapreduce-historyserver on
On slave.hadoop.com
[root@slave ~]# sudo chkconfig hadoop-yarn-resourcemanager on
[root@slave ~]# sudo chkconfig hadoop-hdfs-secondarynamenode on
[root@slave ~]# sudo chkconfig hadoop-yarn-nodemanager on
[root@slave ~]# sudo chkconfig hadoop-hdfs-datanode on
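To double-check the runlevel settings on either node, list the chkconfig entries for the Hadoop services:
chkconfig --list | grep hadoop   # each enabled hadoop-* service should show 2:on 3:on 4:on 5:on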
Final step: check the UIs
Namenode UI
Yarn UI
Job History Server UI
Secondary Namenode UI
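If you kept the ports used in this tutorial, the UIs should be reachable at the addresses below. Ports 50070 and 19888 are set explicitly in the configuration above; 8088 (ResourceManager) and 50090 (Secondary NameNode) are the stock Hadoop defaults, so treat them as assumptions if you have overridden them.
Namenode UI: http://master.hadoop.com:50070
Yarn UI: http://slave.hadoop.com:8088
Job History Server UI: http://master.hadoop.com:19888
Secondary Namenode UI: http://slave.hadoop.com:50090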