
Sunday, 9 February 2014

Hadoop and Ubuntu - step 4

Step 4, installing and configuring Hadoop, is the last step in creating a single-node Hadoop setup.

Step 1 and step 2 of this setup covered installing a Linux-based OS (Ubuntu) for Hadoop, since we opted for Linux instead of Windows, along with the reasons for that preference, and then installing and configuring our chosen Java, Oracle Java. The steps in this series are:
STEP 1 - Choose and configure a (Linux) OS on the machine of your choice
STEP 2 - Install Java and configure it on the machine
STEP 3 - Configure SSH and an SSH user on Ubuntu

STEP 4 - Download and Configure Hadoop

Log in as hduser. Download the Hadoop 2.x tar file (this guide uses hadoop-2.2.0.tar.gz) from any of the Apache download mirrors.

Extract the Hadoop .tar.gz file into /usr/local, rename the extracted folder to hadoop, and change its owner to hduser. Run all commands in a terminal:

cd Downloads

sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local

cd /usr/local

sudo mv hadoop-2.2.0 hadoop

sudo chown -R hduser:hadoop hadoop
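If you prefer to script the extract-and-rename step, here is a minimal bash sketch. The function name install_hadoop is hypothetical, and it assumes the archive's top-level folder matches the file name (hadoop-2.2.0 for hadoop-2.2.0.tar.gz). It does not call sudo or chown itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract a Hadoop tarball into PREFIX and rename the
# versioned folder (e.g. hadoop-2.2.0) to plain "hadoop".
# Assumes the archive's top-level folder is named like the file minus .tar.gz.
install_hadoop() {
  local tarball=$1 prefix=$2
  local dir
  dir=$(basename "$tarball" .tar.gz)   # hadoop-2.2.0.tar.gz -> hadoop-2.2.0

  # Refuse to clobber an existing installation.
  if [ -e "$prefix/hadoop" ]; then
    echo "$prefix/hadoop already exists; not overwriting" >&2
    return 1
  fi

  tar xzf "$tarball" -C "$prefix" || return 1
  mv "$prefix/$dir" "$prefix/hadoop"
}
```

To use it for /usr/local, source the file from a root shell (for example after `sudo -i`), run `install_hadoop /home/hduser/Downloads/hadoop-2.2.0.tar.gz /usr/local`, and then run the same chown command as above.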

Update hduser's .bashrc file. The file belongs to hduser, so sudo is not needed to edit it:
cd ~

gedit .bashrc

Append the text below to the end of the file. Set JAVA_HOME to your actual JDK folder name, something like "jdk-7-i386" (check in /usr/lib/jvm).

#Hadoop variables

export JAVA_HOME=/usr/lib/jvm/jdk/

export HADOOP_INSTALL=/usr/local/hadoop

export PATH=$PATH:$HADOOP_INSTALL/bin

export PATH=$PATH:$HADOOP_INSTALL/sbin

export HADOOP_MAPRED_HOME=$HADOOP_INSTALL

export HADOOP_COMMON_HOME=$HADOOP_INSTALL

export HADOOP_HDFS_HOME=$HADOOP_INSTALL

export YARN_HOME=$HADOOP_INSTALL

#end of update
Save and close
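If you are unsure which folder JAVA_HOME should point to, one common approach is to resolve the real path of the java binary with readlink -f and strip the trailing bin/java. The bash sketch below does the string part; java_home_from_binary is a hypothetical name, and it assumes a JDK layout in which the resolved binary sits either in the JDK's bin folder or in its jre/bin folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the resolved path of a java binary into a
# JAVA_HOME value, e.g. /usr/lib/jvm/jdk/jre/bin/java -> /usr/lib/jvm/jdk
java_home_from_binary() {
  local home=${1%/bin/java}   # drop the trailing /bin/java
  home=${home%/jre}           # a JRE inside a JDK: go up to the JDK folder
  printf '%s\n' "$home"
}
```

Typical usage is `java_home_from_binary "$(readlink -f "$(command -v java)")"`, which follows symlinks such as /usr/bin/java and /etc/alternatives/java before stripping. The result has no trailing slash; /usr/lib/jvm/jdk and /usr/lib/jvm/jdk/ work equally well as JAVA_HOME.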

Now open hadoop-env.sh and change its existing export JAVA_HOME line so that it points at the same JDK folder:
gksudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/jdk/

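Save and close. Then reboot the system and log in as hduser again. (Logging out and back in, or just opening a new terminal, is enough for the .bashrc changes to take effect.)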
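Once the new variables are loaded, a small bash check like the one below can confirm they are set. The function name check_hadoop_env and the choice of variables to test are illustrative assumptions based on the .bashrc lines above.

```shell
#!/usr/bin/env bash
# Illustrative check (hypothetical name): report Hadoop-related variables
# from .bashrc that are empty, and whether $HADOOP_INSTALL/bin is on PATH.
# Returns 0 when everything looks fine, 1 otherwise.
check_hadoop_env() {
  local status=0 var
  for var in JAVA_HOME HADOOP_INSTALL HADOOP_COMMON_HOME HADOOP_HDFS_HOME \
             HADOOP_MAPRED_HOME YARN_HOME; do
    if [ -z "${!var}" ]; then          # ${!var}: bash indirect expansion
      echo "not set: $var"
      status=1
    fi
  done
  # Wrap PATH in colons so the first and last entries match too.
  case ":$PATH:" in
    *":$HADOOP_INSTALL/bin:"*) ;;
    *) echo "PATH does not contain \$HADOOP_INSTALL/bin"; status=1 ;;
  esac
  return "$status"
}
```

Paste it into a terminal and run `check_hadoop_env`; no output means the variables look fine.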

Now verify the Hadoop installation from the terminal:
hadoop version

This should print something like the following:

Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar


If you see this, congratulations! Hadoop is now installed. If not, leave me a comment on the contact page.
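To run the same check from a script rather than by eye, the first line of the output can be parsed. This is a minimal bash sketch; parse_hadoop_version is a hypothetical name, and it assumes the first line has the form "Hadoop <version>" as in the sample above.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): read `hadoop version` output on
# stdin and print the version number from the first line ("Hadoop 2.2.0").
# Exits non-zero if the first line does not start with "Hadoop".
parse_hadoop_version() {
  awk 'NR == 1 && $1 == "Hadoop" { print $2; found = 1 }
       END { exit !found }'
}
```

For the release used here, `hadoop version | parse_hadoop_version` prints 2.2.0.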

Next, configure Hadoop by updating its XML files.

Open core-site.xml and add the following between the <configuration> and </configuration> tags:
gksudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

Save and close the file. (fs.default.name is the older, deprecated name for fs.defaultFS; Hadoop 2.x still accepts it.)
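Editing in gedit works fine; if you would rather make these XML changes from the terminal, here is a minimal bash sketch. add_property is a hypothetical helper: it assumes the closing </configuration> tag is on a line of its own (check your file first) and does not check whether the property already exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: insert a <property> block just before the first line
# containing </configuration>. Assumes that tag sits on its own line and that
# the property is not already present (no duplicate check is done).
# Note: awk -v interprets backslash escapes, so values containing
# backslashes would be altered.
add_property() {
  local file=$1 name=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  awk -v n="$name" -v v="$value" '
    /<\/configuration>/ && !done {
      print "<property>"
      print "<name>" n "</name>"
      print "<value>" v "</value>"
      print "</property>"
      done = 1
    }
    { print }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back with cat so the original file's owner and permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `add_property /usr/local/hadoop/etc/hadoop/core-site.xml fs.default.name hdfs://localhost:9000` adds the property above. Since hduser owns /usr/local/hadoop after the earlier chown, it can run without sudo.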

Open yarn-site.xml and add the following between the <configuration> and </configuration> tags:
gksudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Save and close the file.
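In Hadoop 2.2.0 the shuffle service is named mapreduce_shuffle (earlier releases used mapreduce.shuffle), and the class property's name has to embed that same service name, i.e. yarn.nodemanager.aux-services.mapreduce_shuffle.class. A quick bash check for this mismatch is sketched below; check_aux_service_class is a hypothetical name, and it assumes a single service whose <value> line directly follows its <name> line, as in the snippet above.

```shell
#!/usr/bin/env bash
# Illustrative check (hypothetical name): read the service name configured in
# yarn.nodemanager.aux-services and confirm that a matching
# yarn.nodemanager.aux-services.<service>.class property exists.
check_aux_service_class() {
  local file=$1 service
  service=$(grep -F -A1 '<name>yarn.nodemanager.aux-services</name>' "$file" \
            | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' | head -n 1)
  if [ -z "$service" ]; then
    echo "yarn.nodemanager.aux-services not found" >&2
    return 1
  fi
  if grep -qF "<name>yarn.nodemanager.aux-services.${service}.class</name>" "$file"; then
    echo "ok: class property matches service '$service'"
  else
    echo "no yarn.nodemanager.aux-services.${service}.class property" >&2
    return 1
  fi
}
```

Run it as `check_aux_service_class /usr/local/hadoop/etc/hadoop/yarn-site.xml`.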

Open mapred-site.xml.template and add the following between the <configuration> and </configuration> tags:
gksudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml.template

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

Save the file as mapred-site.xml in the /usr/local/hadoop/etc/hadoop/ directory and close it. Hadoop reads mapred-site.xml; the .template file itself is not loaded.
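Instead of using "Save As", you can also copy the template from the terminal and then edit the copy. A minimal bash sketch; make_mapred_site is a hypothetical name, and it deliberately refuses to overwrite an existing mapred-site.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create mapred-site.xml from the shipped template in
# the given config directory, without overwriting an existing file.
make_mapred_site() {
  local dir=${1:-/usr/local/hadoop/etc/hadoop}
  if [ -e "$dir/mapred-site.xml" ]; then
    echo "$dir/mapred-site.xml already exists; leaving it unchanged" >&2
    return 0
  fi
  cp "$dir/mapred-site.xml.template" "$dir/mapred-site.xml"
}
```

Run `make_mapred_site`, then open /usr/local/hadoop/etc/hadoop/mapred-site.xml in gedit and add the mapreduce.framework.name property.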

Now create the local directories that the NameNode and DataNode will use for storage:
cd ~

mkdir -p mydata/hdfs/namenode

mkdir -p mydata/hdfs/datanode
Now open hdfs-site.xml and add the following between the <configuration> and </configuration> tags:
gksudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>

Save and close the file.
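The directories created earlier and the file: paths in hdfs-site.xml have to point to the same place. The bash sketch below creates both directories and prints the matching values to paste, so the two stay in sync. make_hdfs_dirs is a hypothetical name; its default base path matches the ~/mydata/hdfs layout used here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create NameNode/DataNode storage directories under a
# base path and print the matching values for hdfs-site.xml.
make_hdfs_dirs() {
  local base=${1:-$HOME/mydata/hdfs}
  mkdir -p "$base/namenode" "$base/datanode" || return 1
  printf 'dfs.namenode.name.dir = file:%s\n' "$base/namenode"
  printf 'dfs.datanode.data.dir = file:%s\n' "$base/datanode"
}
```

Run as hduser (whose home is /home/hduser), `make_hdfs_dirs` prints the same file:/home/hduser/mydata/hdfs/... values used above.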

Next, format HDFS before its first use and start the Hadoop services. Format only once: formatting again erases the existing HDFS metadata.
hdfs namenode -format

start-dfs.sh

start-yarn.sh
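Because formatting erases existing HDFS metadata, it can help to guard the format command so it only runs on a NameNode directory that has never been formatted. The sketch assumes that formatting creates a current/VERSION file under the dfs.namenode.name.dir directory, which is the case for a formatted NameNode directory; needs_format is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed (exit 0) only if the NameNode directory does
# not yet contain current/VERSION, i.e. it has apparently never been formatted.
needs_format() {
  local dir=${1:-$HOME/mydata/hdfs/namenode}
  [ ! -f "$dir/current/VERSION" ]
}
```

Used as `needs_format && hdfs namenode -format` before start-dfs.sh and start-yarn.sh, this skips the format on later runs.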

Verify that the Hadoop daemons are running with:
jps

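Output similar to the following should appear (the process IDs will differ). A DataNode line should also be present, because start-dfs.sh starts a DataNode alongside the NameNode and SecondaryNameNode: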

2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
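To check the jps output automatically, the bash sketch below prints any expected daemon that is absent. The function name missing_daemons and its daemon list (the five daemons that start-dfs.sh and start-yarn.sh launch on a single node) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given jps output, print each expected Hadoop daemon
# that is missing, one per line. Returns 1 if anything is missing.
missing_daemons() {
  local jps_out=$1 daemon status=0
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>"; compare the second field exactly so that
    # NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$jps_out" |
         awk -v d="$daemon" '$2 == d { f = 1 } END { exit !f }'; then
      echo "$daemon"
      status=1
    fi
  done
  return "$status"
}
```

Run it as `missing_daemons "$(jps)"`; no output means all five daemons are up. Fed the sample listing above, it would print DataNode, since that listing has no DataNode line.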

This completes the setup of a single-node Hadoop cluster.

You can also learn about using Hadoop and about Hadoop's architecture on BabaGyan.com.