
Sunday, 23 February 2014

Exploring Cloud platform on Google

Exploring the Cloud platform on Google requires us to enable billing in the Google Cloud Console settings. Bitnami, by contrast, was free for an hour. Last time we looked at the Bitnami cloud service and the Ghost blog. It has been a busy 8-10 weeks, mostly spent dealing with Google Cloud Platform (GCP).

Google Cloud offers the following products:

Google Compute Engine (GCE)
Google App Engine (GAE)
Google Cloud SQL
Google Cloud Storage
Google Cloud Datastore
Google BigQuery
Google Cloud DNS
Google Cloud Endpoints
Google Translate API
Google Prediction API
Google Deployment Manager
Google Cloud SDK

I have been creating Virtual Machines (VMs) for my own purposes on Bitnami, AWS, Google, etc. In doing so, I found that while Bitnami was absolutely easy to start with, Google was a touch difficult. However, the amount and quality of documentation available for Google was superb.

To create and manage a VM on Google, we need the following resources:

Google Compute Engine (GCE)
A local Unix/Linux system OR a Windows system with Cygwin
Python
Google Cloud SDK
VNC Viewer on the local system

To create a VM, navigate to https://console.developers.google.com. There you will find "Compute Engine" on the left navigation panel. Open Compute Engine and, from its sub-tabs, you can create a VM, add disks to it, take snapshots, write network rules, perform load balancing, etc.

Once the VM is up and running, you need to use GCUTIL from the Google Cloud SDK to access the VM. You can then SSH into the VM and start a VNC server. Once the VNC server is up and running, you can easily access the VM's UI from VNC Viewer and perform any additional activities you want.
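A minimal sketch of that flow, assuming a Debian/Ubuntu-based VM image (the project ID and instance name below are placeholders):

gcutil --project=my-project ssh my-vm
# on the VM: install and start a VNC server
sudo apt-get install tightvncserver
vncserver :1
# then point VNC Viewer on your local system at <vm-external-ip>:5901
# (this assumes a firewall rule allowing tcp:5901; an SSH tunnel works too)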

Soon enough, we will probably look at Amazon Web Services. Then we will have enough material ready for a comparison of the cloud compute service providers. If possible, we will look at a few more providers.

Monday, 17 February 2014

Tool: Project Stages Tracker in excel for Startups - free to use and share

Tool: Project Stages Tracker in Excel for Startups - free to use and share. In Project Management, there are many phases or stages, such as Opportunity Discovery, Contract Negotiation, Contract Sign-off, Development, Testing and Implementation. Sometimes it is all followed by Support, Enhancement and Maintenance.

There are different tools - free and paid - that cover each phase of project execution well. However, for early-stage start-ups or small businesses, it is not feasible to commit to any available tool until there is visibility of how the business is going to grow.

For this reason and many others, we have created a simple-to-use tracking Excel sheet - it spans from Opportunity to Delivery of Projects.

This is a free-to-use-and-share Excel spreadsheet by BabaGyan.
There are three sheets - Report, Tracker and Values. The Report tab provides you with a summary of your business. The Values sheet has the list of values that you can use; currently it is defined for the Stages and Products/Solutions on BabaGyan.com.
The Tracker sheet is the one where you will list all your actual business opportunities and projects.

Good luck in using this. Free to download and share. You can provide your feedback or queries on our Contact page or in the comments on this post. Here is the Project Tracker Excel file to download.

Sunday, 16 February 2014

Exploring Ghost Blogging platform on Bitnami

Exploring the Ghost blogging platform on Bitnami. Ghost is a free, open-source blogging platform based on Node.js. It is maintained on Bitnami, so I decided to visit Bitnami and try posting something on BabaGyan.com via Ghost. Simple review - good to start :)

It was a fun little project :)

Here are some screens:

Screenshots: the Ghost editor, a Ghost blog post for BabaGyan.com, and Ghost user info for BabaGyan.com.

Monday, 10 February 2014

Solutions for issues faced during hadoop configuration

Solutions for issues faced during Hadoop configuration - getting started with Hadoop as a beginner is not so straightforward. There are many steps and issues that one has to overcome. I have done it, so I am putting this up for everyone to refer to. A step-by-step guide for Hadoop configuration is also available here - STEP 1, STEP 2, STEP 3 and STEP 4.

Solutions for issues faced during hadoop configuration

1. Which Hadoop to fetch:
There are two flavors of Hadoop - 1.x and 2.x.
1.x is the original line, while 2.x is a parallel line that includes the YARN engine. So go for a Hadoop 2.x version.
You can find more about Hadoop here

2. Which machine to use:
Initial options are Windows and Linux. Since SSH will be used extensively, prefer a flavor of Linux for Hadoop. It will also eliminate the need to license each instance/node that you create.
Prefer Ubuntu if you are an extensive Windows user, since you will not feel completely lost in the Unix-like environment. Also, there is a lot of online help for Ubuntu.
Use this guide for downloading Ubuntu and installing it on VM

3. Actual machines or Virtual machines:
I guess this is pretty easy to decide. Virtual machines, of course. You will need at least one actual machine with a recent configuration and at least 4GB of RAM for the VMs to run.

4. Which Virtualization environment:
There are many options, but the most popular are VirtualBox by Oracle and VMware. VirtualBox is free and open source, and online support for it is good enough, so prefer VirtualBox.
You can find how to set up the box here

5. Which Java to use:
The most common Java for Linux-based systems is OpenJDK, and there is always Oracle JDK available. Choose a Java version as per the Hadoop documentation. It is best to go for Oracle JDK, but an older and tested version of Java.

You can find how to install java here

Major Issues:

1. Java and Ubuntu - 32 bit or 64 bit
If your machine is a recent 64-bit one, you may be tempted to go for a 64-bit version of the OS as well as of Java. But just don't go for it yet.

Hadoop native libraries are compiled for 32 bit and if you are using 64 bit OS, you may run into problems and errors such as:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

The solution to this is recompiling the Hadoop native libraries on your 64-bit machine. Have a look at the native libraries page and the building native libraries for Hadoop page. But then, it will be a lot better to use a 32-bit OS instead, isn't it?
(shout yes!!!)
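If you do stay on 64-bit, here is a rough sketch of that rebuild on Ubuntu (the package list and paths are assumptions; Hadoop's BUILDING.txt has the authoritative steps):

# build prerequisites (protobuf 2.5.0 is also required for Hadoop 2.2.0)
sudo apt-get install build-essential cmake maven zlib1g-dev libssl-dev
# from the Hadoop source distribution
cd hadoop-2.2.0-src
mvn package -Pdist,native -DskipTests -Dtar
# the rebuilt native libraries land under hadoop-dist/target/hadoop-2.2.0/lib/native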

2. Virtual Box - low graphics mode
VirtualBox may run into a "running in low-graphics mode" error if you are using 32-bit Ubuntu. This is due to a missing guest plugin which comes with VirtualBox.
You will have to load and run the Linux Guest Additions CD image. The initial step is installing DKMS so Ubuntu can build the kernel modules:

sudo apt-get install dkms

Then insert the Guest Additions CD, mount it and run the installer:
sudo mount /dev/cdrom /cdrom
cd /cdrom
sudo sh ./VBoxLinuxAdditions.run


3. Virtual Box - mouse pointer appears a little above the actual point
This is due to a missing patch. You can have a look at it here.

For fixing this, download the VBoxGuest-linux.c.patch patch file from the above link. Then run these commands on your Ubuntu virtual machine (the vboxguest version in the path may differ on your system):
sudo cp VBoxGuest-linux.c.patch /usr/src/vboxguest-4.1.16/vboxguest/VBoxGuest-linux.c
sudo /etc/init.d/vboxadd setup
You can also learn about usage of Hadoop and about Hadoop architecture on BabaGyan.com.

Sunday, 9 February 2014

Hadoop and Ubuntu - step 4

Hadoop and Ubuntu - step 4 - Install and configure Hadoop, the last step for creating a single-node Hadoop setup.

Step 1 of the setup is available here and step 2 is available here. In those steps we took a look at installing a Linux-based OS (Ubuntu) for Hadoop, as we opted for Linux instead of Windows, saw the reasons for that preference, and installed and configured our chosen Java - Oracle Java.
STEP 1 - Choose and configure (Linux) OS of choice on Machine of Choice
STEP 2 - Install Java and configure it on the machine
STEP 3 - Configure SSH and user for SSH on Ubuntu

STEP 4 - Download and Configure Hadoop

Log in as HDUser. Download the Hadoop 2.x tar file from any mirror here
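If you prefer the terminal, you can fetch the tarball directly (a small sketch; the Apache archive URL below is one possible source, any mirror works):

cd ~/Downloads
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz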

Uncompress the Hadoop tar.gz file and move it to /usr/local. We will also change the owner. Use the Terminal for all commands.

cd Downloads

sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local

cd /usr/local

sudo mv hadoop-2.2.0 hadoop

sudo chown -R hduser:hadoop hadoop

Update the HDUser's .bashrc file
cd ~


gksudo gedit .bashrc

Update the end of the file with the text below. Use the jdk folder name that matches the actual folder - something like "jdk-7-i386" (check in /usr/lib/jvm)


#Hadoop variables

export JAVA_HOME=/usr/lib/jvm/jdk/

export HADOOP_INSTALL=/usr/local/hadoop

export PATH=$PATH:$HADOOP_INSTALL/bin

export PATH=$PATH:$HADOOP_INSTALL/sbin

export HADOOP_MAPRED_HOME=$HADOOP_INSTALL

export HADOOP_COMMON_HOME=$HADOOP_INSTALL

export HADOOP_HDFS_HOME=$HADOOP_INSTALL

export YARN_HOME=$HADOOP_INSTALL

#end of update
Save and close

Now open hadoop-env.sh for updating Java Home (JAVA_HOME)
gksudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/jdk/

Save and close. Reboot the system and login with HDUser again.

Now, verify the Hadoop installation from the terminal
hadoop version

This should give something like below

Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768

Compiled by hortonmu on 2013-10-07T06:28Z

Compiled with protoc 2.5.0

From source with checksum 79e53ce7994d1628b240f09af91e1af4

This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar


If you get it, congratulations!!! Hadoop is now successfully installed. If not, drop me a comment on the contact page.

Now we configure it by updating its XML files.

Open core-site.xml and add the given text between <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

Save and Close the file

Open yarn-site.xml and add the given text between <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Save and Close the file

Open mapred-site.xml.template and add the given text between <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml.template

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

Save the file as mapred-site.xml in /usr/local/hadoop/etc/hadoop/ directory and Close the file
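Alternatively (a small sketch of the same step), you can copy the template first and then edit the copy directly:

cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
gksudo gedit mapred-site.xml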

Let's now create the NameNode and DataNode directories through the terminal
cd ~

mkdir -p mydata/hdfs/namenode

mkdir -p mydata/hdfs/datanode
Now, update hdfs-site.xml and add the given text between <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>

Next, we will format the hdfs for our first use of Hadoop and start the Hadoop Services
hdfs namenode -format

start-dfs.sh

start-yarn.sh

Verify the Hadoop nodes running by
jps

The below should appear in output

2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
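As an optional smoke test (a minimal sketch, using the examples jar that ships with Hadoop 2.2.0 under the layout used in this guide), you can run one of the bundled MapReduce jobs:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5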

This completes the set up steps for Hadoop.

You can also learn about usage of Hadoop and about Hadoop architecture on BabaGyan.com.

Saturday, 8 February 2014

Hadoop and Ubuntu - step 3

Hadoop and Ubuntu - step 3 - Configure SSH and user for SSH on Ubuntu

Step 1 of the setup is available here and step 2 is available here. In those steps we took a look at installing a Linux-based OS (Ubuntu) for Hadoop, as we opted for Linux instead of Windows, saw the reasons for that preference, and installed and configured our chosen Java - Oracle Java.
STEP 1- Choose and configure (Linux) OS of choice on Machine of Choice
STEP 2 - Install Java and configure it on the machine

STEP 3 - Configure SSH and user for SSH on Ubuntu

This step is pretty straightforward. We create a user and a user group. All Hadoop cluster nodes will use a similar user name and will be part of the same group. Let's call the group hadoop and the user hduser. Then we will generate an SSH RSA key pair with no passphrase (for ease of access by Hadoop).

Use the Ubuntu Terminal window and the commands below.

Create group

sudo addgroup hadoop

Create User and add it to the group
sudo adduser --ingroup hadoop hduser


Login as HdUser and generate SSH key
su - hduser

ssh-keygen -t rsa -P ""

Generating public/private rsa key pair. Enter file in which to save the key (/home/hduser/.ssh/id_rsa): Created directory '/home/hduser/.ssh'. Your identification has been saved in /home/hduser/.ssh/id_rsa. Your public key has been saved in /home/hduser/.ssh/id_rsa.pub. The key fingerprint is: 7b:62:<<more hex codes>>


hduser@ubuntu The key's randomart image is: <<some image>>

Store the generated key
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
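If the SSH test below still prompts for a password, the .ssh permissions may be too open; a minimal sketch of tightening them (optional, depending on your setup):

chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys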

Test SSH
ssh localhost

The authenticity of host 'localhost (::1)' can't be established. RSA key fingerprint is c7:47:55:<<more hex code>>. 


Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added 'localhost' (RSA) to the list of known hosts. Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux Ubuntu 10.04 LTS <<info>>


This completes step 3. You can also learn about usage of Hadoop and about Hadoop architecture on BabaGyan.com.

Sunday, 2 February 2014

Hadoop and Ubuntu - step 2 - Install Oracle Java for Hadoop setup.

In step 1 of the setup, available here, we took a look at installing a Linux-based OS (Ubuntu) for Hadoop, as we opted for Linux instead of Windows. We also saw the reasons for the preference.
STEP 1- Choose and configure (Linux) OS of choice on Machine of Choice
STEP 2 - Install Java and configure it on the machine
Available choices for Java are OpenJDK or Oracle JAVA. I preferred Oracle Java. Follow the below instructions for Oracle Java configuration on Ubuntu.

Download Oracle Java from its official download page. The version should be compatible with your OS and machine type (32 or 64 bit). It will now sit in some folder such as Downloads.

Uncompress it from Terminal window using

tar -xvf jdk-7u2-linux-x64.tar.gz

Only the JRE is available as software through the official Ubuntu channels; we want the latest JRE together with the latest JDK from Oracle, hence the manual download. The uncompressed directory (its name depends on the downloaded version, here jdk1.7.0) should live in /usr/lib under jvm, so let's move it there using:
sudo mkdir -p /usr/lib/jvm


sudo mv ./jdk1.7.0_02 /usr/lib/jvm/jdk1.7.0

Set environment variables for Java. Open and append file /etc/profile with below code
JAVA_HOME=/usr/lib/jvm/jdk1.7.0


PATH=$PATH:$HOME/bin:$JAVA_HOME/bin


JRE_HOME=/usr/lib/jvm/jdk1.7.0/jre


PATH=$PATH:$HOME/bin:$JRE_HOME/bin


export JAVA_HOME


export JRE_HOME


export PATH


Reboot now. After this, Ubuntu has to be told that the JDK is available, so run the commands below
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1


sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1


sudo update-alternatives --config java
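The --config step for Javac follows the same pattern (shown here for completeness):

sudo update-alternatives --config javac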


Do the same for Javac, as sketched above. That's it, done. You can now check the Java and Javac versions as

java -version

java version "1.7.0"


Java(TM) SE Runtime Environment (build 1.7.0-b147)


Java HotSpot(TM) Client VM (build 21.0-b17, mixed mode) 


This completes step 2. You can also learn about usage of Hadoop and about Hadoop architecture on BabaGyan.com.

Hadoop and Ubuntu - step 1

Hadoop and Ubuntu - step 1 for setting up Hadoop using the Ubuntu OS

There are basically two methods of using Hadoop:
1. Configure Hadoop on Windows - this involves use of the Hadoop setup, the Cygwin tool, Java and Eclipse.
I configured this initially on my laptop; however, when I tried to perform the same configuration on another machine (to be used as another Hadoop node), Cygwin broke down.
As a result, I was not able to complete the whole setup of Hadoop.

2. Configure Hadoop on Linux - Because of the above experience, I decided to go with the Linux based OS for Hadoop.

Using a Linux-based OS is the best approach for the reasons below:
1. Hadoop is designed for Linux-based systems (yes, it is)
2. Hadoop requires SSH, which is simple to configure in Linux (it requires Cygwin on Windows - Cygwin basically gives a feel and experience of Linux on Windows)
3. Hadoop is a system that is naturally more secure on a secure OS - exhibit A: Linux.
STEP 1 - Choose and configure (Linux) OS of choice on Machine of Choice

I chose Ubuntu - freely and easily available, with good GUI-based support for a heavy Windows user.

I chose to perform the installation on Virtual Box - Open Source Virtualization tool by Oracle.

Download and install VirtualBox on your machine.

Download and install Ubuntu in the Virtual Box as a Guest OS
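If you prefer the command line, the same VM can also be created with VBoxManage, VirtualBox's own CLI (a rough sketch only; the VM name, memory size, disk size and ISO path are placeholders):

# create and register a 32-bit Ubuntu VM
VBoxManage createvm --name "ubuntu-hadoop" --ostype Ubuntu --register
VBoxManage modifyvm "ubuntu-hadoop" --memory 2048 --cpus 2 --nic1 nat
# create a ~20 GB virtual disk and attach it along with the Ubuntu installer ISO
VBoxManage createhd --filename ubuntu-hadoop.vdi --size 20000
VBoxManage storagectl "ubuntu-hadoop" --name "SATA" --add sata
VBoxManage storageattach "ubuntu-hadoop" --storagectl "SATA" --port 0 --device 0 --type hdd --medium ubuntu-hadoop.vdi
VBoxManage storageattach "ubuntu-hadoop" --storagectl "SATA" --port 1 --device 0 --type dvddrive --medium ~/Downloads/ubuntu.iso
VBoxManage startvm "ubuntu-hadoop"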

Get ready for further set up for Hadoop on Ubuntu

This completes step 1.

You can also learn about usage of Hadoop and about Hadoop architecture on BabaGyan.com.

Saturday, 1 February 2014

Usage of Hadoop

Usage of Hadoop's framework in industry - applicable to any industry.

Hadoop was originally conceived to solve the problem of storing large quantities of data at a very low cost, even for the biggies like Google, Facebook or Yahoo. Low-cost implementation of something that is new and planned to be used often is very important.

Analytic applications are the main output of Hadoop, and the application categories can be broadly clubbed into three to four buckets.
1. Refine the data

Irrespective of the vertical/domain, Hadoop is being used to refine large quantities of data into informational data that is more manageable. The resulting data is loaded into existing systems and can be simply accessed by traditional tools. Note that this new set of data is much richer than the earlier existing data set.
2. Analysis of Data

The Data Analysis use case is often where enterprises start, by capturing data that was previously being discarded (exhaust data such as web logs, social media data, etc.). This data can be clubbed with other data and used to build more applications that use the trends found in it to make decisions.
3. Enhancement of applications

Existing applications (web, desktop or mobile) can be further enhanced by use of the data which Hadoop can provide. This can be used to give the user a better, customized service so that the user comes to them instead of a competitor. Simply understanding user patterns can achieve this for companies.

Also read - Hadoop architecture

Hadoop's architecture: a bird's-eye view

Hadoop's architecture, a bird's-eye view, as described by Mike Olson - CEO of Cloudera

Hadoop is designed to run on a large number of machines that don't share any memory or disks.

That means you can buy a whole bunch of commodity servers, slap them in a rack, and run the Hadoop software on each one.

When you want to load all of your organization's data into Hadoop, what the software does is bust that data into pieces that it then spreads across your different servers. There's no one place where you go to talk to all of your data; Hadoop keeps track of where the data resides.
And because there are multiple copies stored, data on a server that goes offline or dies can be automatically replicated from a known good copy.

In a centralized database system, you have got one big disk connected to four or eight or 16 big processors. But that is as much horsepower as you can bring to bear. In a Hadoop cluster, every one of those servers has two or four or eight CPU cores.

You can run your indexing job by sending your code to each of the dozens of servers in your cluster, and each server operates on its own little piece of the data. Results are then delivered back to you in a unified whole. That's MapReduce: you map the operation out to all of those servers and then you reduce the results back into a single result set.
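As a toy illustration of that map/reduce flow (a sketch only; it uses Hadoop Streaming with plain shell tools, and the jar path assumes the Hadoop 2.2.0 layout from the setup posts):

# put a text file into HDFS
hdfs dfs -mkdir -p /user/hduser/input
hdfs dfs -put mytext.txt /user/hduser/input
# map: split lines into words; reduce: count each word (reducer input arrives sorted by key)
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
  -input /user/hduser/input -output /user/hduser/output \
  -mapper 'tr " " "\n"' -reducer 'uniq -c'
# view the result
hdfs dfs -cat /user/hduser/output/part-*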

Architecturally, the reason you're able to deal with lots of data is because Hadoop spreads it out. And the reason you're able to ask complicated computational questions is because you have got all of these processors, working in parallel, harnessed together.