Monday, 10 February 2014

Solutions for issues faced during hadoop configuration

Getting started with Hadoop as a beginner is not straightforward. There are many steps and issues that one has to overcome. I have been through them, so I am putting this up for everyone to refer to. A step-by-step guide for Hadoop configuration is also available here - STEP 1, STEP 2, STEP 3 and STEP 4.


1. Which Hadoop to fetch:
There are two flavors of Hadoop - 1.x and 2.x.
1.x is the initial line, while 2.x was developed in parallel and includes the YARN engine. So, go for a Hadoop 2.x version.
You can find more about Hadoop here

2. Which machine to use:
The initial options are Windows and Linux. Since SSH will be used extensively, prefer a flavor of Linux for Hadoop. It also eliminates the need to license each instance/node that you create.
Prefer Ubuntu if you are mainly a Windows user, since you will not feel completely lost in the Unix-like environment. Also, there is a lot of online help for Ubuntu.
Use this guide for downloading Ubuntu and installing it on a VM

3. Actual machines or Virtual machines:
This is pretty easy to decide: virtual machines, of course. You will need at least one physical machine with a recent configuration and at least 4 GB of RAM for the VMs to run.

4. Which Virtualization environment:
There are many options, but the most popular are VirtualBox by Oracle and VMware. VirtualBox is free and open source, and the online support for it is good enough, so prefer VirtualBox.
You can find how to set up the box here

5. Which Java to use:
The most common Java on Linux-based systems is OpenJDK, and Oracle JDK is always available as well. Choose a Java version as per the Hadoop documentation. It is best to go for Oracle JDK, but pick an older, well-tested version of it.

You can find how to install Java here
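To confirm which Java ended up on a node, a small check along these lines may help. It is only a sketch: the function name java_vendor is made up for illustration, and it assumes the usual banner text, where OpenJDK builds print "OpenJDK" and Oracle builds print "Java(TM)".

```shell
#!/bin/sh
# Classify the text printed by `java -version` (passed in as $1).
# java_vendor is a hypothetical helper name, not part of any JDK.
java_vendor() {
    case "$1" in
        *OpenJDK*) echo "OpenJDK" ;;
        *"Java(TM)"*) echo "Oracle JDK" ;;
        *) echo "unknown" ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes its banner to stderr, so fold it into stdout
    java_vendor "$(java -version 2>&1)"
else
    echo "java is not on the PATH"
fi
```

The version number itself is on the first line of the `java -version` output, so it can be compared by eye against what the Hadoop documentation recommends.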

Major Issues:

1. Java and Ubuntu - 32 bit or 64 bit
If your machine is a recent 64 bit one, you may be tempted to go for a 64 bit version of the OS as well as Java. But just don't go for it yet.

The Hadoop native libraries are compiled for 32 bit, and if you are using a 64 bit OS you may run into problems and warnings such as:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

The solution to this is to recompile the Hadoop native libraries on your 64 bit machine. Have a look at the native libraries page and the building native libraries for Hadoop page. But then, it will be a lot better to use a 32 bit OS instead, won't it?
(shout yes!!!)
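If you are not sure whether your setup has this mismatch, a quick check like the following may help. It is a rough sketch: the helper names os_bits and elf_bits are made up, and the library location (HADOOP_HOME, defaulting here to /usr/local/hadoop, with lib/native/libhadoop.so under it) is an assumption that you should adjust to your own install.

```shell
#!/bin/sh
# Report whether the OS userland is 32 or 64 bit.
os_bits() {
    getconf LONG_BIT
}

# Classify one line of `file` output for a shared library as 32 or 64 bit.
# $1: a line such as the one printed by `file libhadoop.so`
elf_bits() {
    case "$1" in
        *"ELF 64-bit"*) echo 64 ;;
        *"ELF 32-bit"*) echo 32 ;;
        *) echo unknown ;;
    esac
}

# ASSUMPTION: install location and library path; change these for your setup.
lib="${HADOOP_HOME:-/usr/local/hadoop}/lib/native/libhadoop.so"
if [ -e "$lib" ]; then
    # -L makes `file` follow the symlink to the real library file
    echo "native library: $(elf_bits "$(file -L "$lib")") bit"
else
    echo "native library not found at $lib"
fi
echo "operating system: $(os_bits) bit"
```

If the two numbers differ, the warning above is to be expected.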

2. VirtualBox - low graphics mode
An Ubuntu guest in VirtualBox may show an error saying that low graphics mode is on if you are using 32 bit Ubuntu. This is due to the missing Guest Additions, which come with VirtualBox.
You will have to load the Linux Guest Additions CD image and run it. For this, the initial step is to set up Ubuntu to build kernel modules:

sudo apt-get install dkms
Then load the Guest Additions CD:
sudo mount /dev/cdrom /cdrom
sudo sh ./
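After the installer has run and the VM has been rebooted, one way to see whether the Guest Additions took effect is to look for their kernel module. This is only a sketch: the helper name module_loaded is made up, and it assumes the module is called vboxguest and that the guest exposes /proc/modules, as Linux does.

```shell
#!/bin/sh
# Succeeds if the named kernel module appears in a /proc/modules style list.
# $1: module name, $2: file to read (defaults to /proc/modules)
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# ASSUMPTION: the Guest Additions kernel module is named vboxguest.
if module_loaded vboxguest; then
    echo "vboxguest module is loaded"
else
    echo "vboxguest module is not loaded"
fi
```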

3. VirtualBox - mouse pointer appears a little above the actual point
This is due to a missing patch. You can have a look at it here.

To fix this, download the VBoxGuest-linux.c.patch file from the link above. Then run these commands on your Ubuntu virtual machine:
cp VBoxGuest-linux.c.patch /usr/src/vboxguest-4.1.16/vboxguest/VBoxGuest-linux.c
/etc/init.d/vboxadd setup
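The path above contains the Guest Additions version (4.1.16), which may differ on your VM. A small sketch such as this one lists the vboxguest source directories actually present, so the cp target can be adjusted. The function name vboxguest_src_dirs is made up for illustration, and /usr/src is assumed as the base directory because that is what the command above uses.

```shell
#!/bin/sh
# Print the vboxguest source directories found under a base directory.
# $1: base directory (defaults to /usr/src, as in the cp command above)
vboxguest_src_dirs() {
    base="${1:-/usr/src}"
    for dir in "$base"/vboxguest-*; do
        # When nothing matches, the pattern stays unexpanded; skip it
        if [ -d "$dir" ]; then
            echo "$dir"
        fi
    done
    return 0
}

vboxguest_src_dirs
```

If nothing is printed, the Guest Additions sources are probably not installed under /usr/src on that machine.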
You can also learn about usage of Hadoop and about Hadoop architecture on