
Saturday, October 26, 2013

Setup Hadoop 2.x (2.2.0) on Linux (Fedora 19)

Prerequisites -

1- Install Java (openjdk-7-jdk or oracle-jdk-7)
2- Install openssh-server

Hadoop Installation Process

1- Add Hadoop Group and User (run these as root, or prefix with sudo)
$ groupadd hadoop
$ useradd hduser -g hadoop
$ passwd hduser
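
You can quickly verify the new account and its group membership; it should report hadoop as the primary group:
$ id hduser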

2- After the user is created, log in as hduser and set up passwordless SSH (the passphrase must be empty, otherwise the Hadoop start scripts will prompt for it)
$ ssh-keygen -t rsa -P ''

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
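
If ssh localhost still prompts for a password at this point, the usual cause is key file permissions; OpenSSH ignores an authorized_keys file that is group- or world-accessible, so tightening the permissions normally fixes it:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys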

3- Download Hadoop 2.2.0 and extract it
$ wget http://www.trieuvan.com/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ tar xvfz hadoop-2.2.0.tar.gz



4- Setup Hadoop Environment Variables
$ vi ~/.bashrc
and add the following lines at the end of the file.

#Hadoop variables
export JAVA_HOME=/soft_install/jdk1.7.0_05/
export HADOOP_INSTALL=/home/hduser/hadoop-2.2.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
###end of paste
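
The JAVA_HOME above is only an example for a JDK unpacked under /soft_install; if you installed Fedora's OpenJDK package from the prerequisites instead, point JAVA_HOME at the JVM directory under /usr/lib/jvm (the exact directory name varies by build), for example:
$ ls /usr/lib/jvm/
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk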

$ cd /home/hduser/hadoop-2.2.0/etc/hadoop
$ vi hadoop-env.sh

hadoop-env.sh already exists; modify its JAVA_HOME line to point at your JDK:

#modify JAVA_HOME
export JAVA_HOME=/soft_install/jdk1.7.0_05/
$ . ~/.bashrc

$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /home/hduser/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar


Congratulations! Hadoop is installed.


Post-Installation Configuration for Hadoop

$ cd /home/hduser/hadoop-2.2.0/etc/hadoop
$ vi core-site.xml
#Paste the following between the <configuration> tags

<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>
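
fs.default.name still works in Hadoop 2.2.0 but is deprecated in favor of fs.defaultFS, so you may prefer the newer key with the same value:

<property>
   <name>fs.defaultFS</name>
   <value>hdfs://localhost:9000</value>
</property>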

$ vi yarn-site.xml

#Paste the following between the <configuration> tags

<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>


$ cp mapred-site.xml.template mapred-site.xml

$ vi mapred-site.xml
#Paste the following between the <configuration> tags

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>

$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
$ cd /home/hduser/hadoop-2.2.0/etc/hadoop
$ vi hdfs-site.xml

Paste the following between the <configuration> tags

<property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/hduser/mydata/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/hduser/mydata/hdfs/datanode</value>
 </property>
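
Before formatting, it is worth confirming that the directories referenced above (created in the earlier mkdir step) exist and are owned by hduser:
$ ls -ld ~/mydata/hdfs/namenode ~/mydata/hdfs/datanode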

Format Namenode
$ hdfs namenode -format
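
If the format succeeds, the namenode directory configured above should now contain a current/ subdirectory holding the initial metadata (a VERSION file and an initial fsimage); a quick check, assuming the paths used earlier:
$ ls ~/mydata/hdfs/namenode/current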


Start Hadoop Service
$ start-dfs.sh
$ start-yarn.sh
$ jps
If everything is successful, jps will list the following services (the process IDs will differ):
8379 DataNode
9097 NodeManager
8805 ResourceManager
8202 NameNode
9253 Jps
8599 SecondaryNameNode
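
As an optional smoke test, you can run one of the MapReduce examples bundled with the distribution (the path below assumes the HADOOP_INSTALL set earlier) and watch it through the web UIs: the NameNode UI listens on http://localhost:50070 and the ResourceManager UI on http://localhost:8088 by default.

$ hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5

When you are done, stop the daemons in the reverse order they were started:
$ stop-yarn.sh
$ stop-dfs.sh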