Monday, 30 June 2014

Case Study: Multiply Revenue as well as Customer Base

The Case:
This case study shows how almost any industry can multiply its revenue, grow its customer base, and build a larger base of loyal customers with a higher customer satisfaction ratio.

The Company is a multi-national sports and lifestyle shoes manufacturer and retailer which has stores spread across Asia and Europe. In its 15 years of existence, Company had created a loyal customer base primarily in the form of trendy teens and middle-aged (25-40 years old) men and women.
While the footwear was mid-range to premium, growth never stalled for the first 14 years. In fact, they recorded growth of 12% over the five-year period from 2007 to 2012.

Prior to 2005, their only sales channel was in the form of company owned physical stores spread across Asia. Then in 2005, they expanded their presence through the franchise route to Europe. They also sell through multi-brand retail outlets and also have tie-ups with online retailers.
In 2008, they launched their website, followed by a mobile app in March 2013. The web platform served as an end-to-end channel, allowing potential customers to browse the variety of footwear, make a choice and complete the order by making a payment. The mobile app, however, was essentially a browsing platform with no functionality to complete an order.

In early 2013, seeing the boom of social media as a channel, they quickly designed a social media campaign in which people could dedicate a love message to an upset partner on the Company's Facebook page.
Though the campaign was highly successful and widely appreciated in terms of popularity, it did not translate into enhanced sales. Their social presence was restricted to this campaign.

In November 2013, the CEO and founder of the Company realized the need to use digital channels intelligently to achieve the business strategy, and as a result an expert in digital strategy and implementation was brought in.

The case study is to help the Company digitally transform using the below three-point strategy:
1. Design a digital strategy leveraging the digital channels and assets to maximize revenue for the Company.
2. Utilize technology to leverage the data from all channels to enhance the customer experience.
3. Define the technology architecture to support the digital strategy.
The Solution:
1. Design a digital strategy leveraging the digital channels and assets to maximize revenue for the Company.

Develop fully functional mobile and web apps
  • Redesign the web and mobile apps with new content and layout to attract and engage target customers, across all mobile operating systems
  • Enable orders and payments from within the apps at every point of engagement
Promote the apps on social media to the target audience and loyal customers
  • Reach out on all social media platforms and promote the new apps with attractive contests for the target audience
  • Especially reach out to existing customers to expand the loyal customer base
Track app usage, user experience and feedback at stores
  • Create campaigns/surveys on the website and apps, track usage trends from the campaigns, and track satisfaction feedback from users at stores as well as on social media
Combine all information into a "Data Lake" and top it up with an actionable analytics dashboard for maximizing revenue
  • Create the Data Lake from all web and app information, user feedback, carts, orders and other sources
  • Provide decision makers with actionable dashboards using advanced analytics
2. How can the Company utilize technology to leverage the data from all channels to enhance the customer experience?

Data from all channels and sources into the Data Lake
  • Pull all data from the apps and website - usage tracking, orders, carts and payments - into a single destination Data Lake
  • The Data Lake is then ready for advanced analytics, enabling dashboards over historical data from social media, payments and campaigns
Actionable dashboards on top of the Data Lake
  • Dashboards built on the Data Lake will highlight user-experience points, both negative and positive, across all channels
  • Action items can be assigned from the dashboards to channel managers so that user experience is iteratively enhanced at all levels and in all channels
  • The dashboards and reports can also be used effectively to convey action items for maximizing revenue
Immediate use, effectiveness tracking and iterative enhancement
  • Historical data in the Data Lake will make the dashboards useful from day one
  • With incremental data and action-item tracking, the solution is enhanced iteratively, enabling effective tracking of the return on investment in analytics
3. The technology architecture to support the digital strategy

Cloud infrastructure for creating the Data Lake
  • Given the variety of data sources and the sheer volume of data, a scalable cloud infrastructure is required for creating and storing the Data Lake
    • Integration engine - tools like Informatica, Talend or Pentaho
    • Cloud engine - a PaaS such as Heroku, Google App Engine or Red Hat OpenShift
    • Storage - Google BigQuery, Google Cloud Storage, AWS S3 or AWS Redshift
Advanced data analytics tools with a statistical engine
  • Since user experience must be enhanced, sentiment analysis needs to be part of the analytics solution on the Data Lake
    • Statistical engine - the R language and engine
    • Scripting engine - Python
  • Other advanced analytics techniques, with ad-hoc data exploration on all devices, should be incorporated to suggest immediate action items for maximizing revenue and improving user experience
    • Analytics and presentation engine - QlikView, Tableau or Pentaho BI

Sunday, 23 February 2014

Exploring Cloud platform on Google

Exploring the Cloud platform on Google requires us to enable billing in the Google Cloud Console settings. (Bitnami, by contrast, was free for an hour.) Last time we looked at the Bitnami cloud service and the Ghost blog. It has been a busy 8-10 weeks, mostly spent dealing with Google Cloud Platform (GCP).

Google Cloud has below products:

Google Compute Engine (GCE)
Google App Engine (GAE)
Google Cloud SQL
Google Cloud Storage
Google Cloud Datastore
Google BigQuery
Google Cloud DNS
Google Cloud Endpoints
Google Translate API
Google Prediction API
Google Deployment Manager
Google Cloud SDK

I have been creating Virtual Machines (VMs) for my own purposes on Bitnami, AWS, Google, etc. I found that while Bitnami was absolutely easy to start with, Google was a touch difficult. However, the amount and quality of documentation available for Google was superb.

To create and manage a VM on Google, we need below resources:

Google Compute Engine (GCE)
A local Unix/Linux system OR a Windows system with Cygwin
Python
Google Cloud SDK
VNC Viewer on the local system

To create a VM, navigate to the Cloud Console. There you will find "Compute Engine" on the left navigation panel. Click open Compute Engine, and from each of its sub-tabs you can create a VM, add a disk to it, take snapshots, write network rules, perform load balancing, etc.

Once the VM is up and running, you need to use gcutil from the Google Cloud SDK to access it. You can then SSH into the VM and start a VNC server. Once the VNC server is up and running, you can easily access the VM's UI from VNC Viewer and perform any additional activities you want.
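As a non-runnable sketch of that flow (the instance and project names here are placeholders, and the flags are from the 2014-era gcutil tool, so verify them against your SDK version):

```shell
# Placeholder project/instance names - adjust to your own setup.
# Create a VM with the legacy gcutil tool that shipped with the Cloud SDK.
gcutil --project=my-project addinstance my-vm --zone=us-central1-a

# SSH into the VM via gcutil, then start a VNC server inside it.
gcutil --project=my-project ssh my-vm
# (now on the VM)
vncserver :1

# Back on the local machine, point VNC Viewer at the VM's address, display :1.
```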

Soon enough, we will probably look at Amazon Web Services. Then we will have enough ready for a comparison of the cloud compute service providers. If possible, we will look at a few more providers.

Monday, 17 February 2014

Tool: Project Stages Tracker in excel for Startups - free to use and share

In project management there are many phases or stages, such as opportunity discovery, contract negotiation, contract sign-off, development, testing and implementation. Sometimes these are all followed by support, enhancement and maintenance.

There are different tools, free and paid, that handle each phase of project execution well. However, for an early start-up or small business, it is not feasible to adopt such a tool until there is visibility into how the business is going to grow.

For this reason and many others, we have created a simple-to-use tracking Excel sheet - it spans from opportunity to delivery of projects.

This is a free-to-use-and-share EXCEL spreadsheet by BabaGyan.
There are three sheets - Report, Tracker and Values. The Report tab provides a summary of your business. The Values tab has the list of values you can use; currently it is defined for Stages and Products/Solutions.
The Tracker sheet is the one where you list all your actual business opportunities and projects.

Good luck in using this. Free to download and share. You can provide your feedback or queries on our Contact page or in the comments on this post. Here is the Project Tracker Excel file to download.

Sunday, 16 February 2014

Exploring the Ghost blogging platform on Bitnami. Ghost is a free, open-source blogging platform based on Node.js. It is maintained on Bitnami, so I decided to visit Bitnami and try posting something via Ghost. Simple review - good to start :)

It was a fun little project :)

Here are some screens:

This is the Editor

Monday, 10 February 2014

Solutions for issues faced during hadoop configuration

Solutions for issues faced during Hadoop configuration - getting started with Hadoop as a beginner is not so straightforward. There are many steps and issues that one has to overcome. I have done it, so I am putting this up for everyone to refer to. A step-by-step guide for Hadoop configuration is also available here - STEP 1, STEP 2, STEP 3 and STEP 4.


1. Which Hadoop to fetch:
There are two flavors of hadoop - 1.x and 2.x.
The 1.x line is the original, while 2.x is a parallel version that includes the YARN engine. So, go for a Hadoop 2.x version.
You can find more about hadoop here

2. Which machine to use:
The initial options are Windows and Linux. Since SSH will be used extensively, prefer a flavor of Linux for Hadoop. It also eliminates the need to license each instance/node that you create.
Prefer Ubuntu if you are an extensive Windows user, since you will not feel completely lost in its Unix-like environment. There is also a lot of online help for Ubuntu.
Use this guide for downloading Ubuntu and installing it on a VM.

3. Actual machines or Virtual machines:
I guess this is pretty easy to decide. Virtual machines, of course. You will need at least one physical machine with a recent configuration and at least 4 GB of RAM for the VMs to run.

4. Which Virtualization environment:
There are many options, but the most popular are VirtualBox by Oracle and VMware. VirtualBox is free and open source, and online support for it is good enough, so prefer VirtualBox.
You can find how to set up the box here

5. Which Java to use:
The most common Java for Linux-based systems is OpenJDK, and Oracle JDK is always available. Choose a Java version as per the Hadoop documentation. It is best to go with Oracle JDK, but an older, tested version of Java.

You can find how to install java here

Major Issues:

1. Java and Ubuntu - 32 bit or 64 bit
If your machine is a recent 64-bit one, you may be tempted to go for a 64-bit version of the OS as well as Java. But don't go for it just yet.

Hadoop native libraries are compiled for 32 bit and if you are using 64 bit OS, you may run into problems and errors such as:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

The solution is recompiling the Hadoop native libraries on your 64-bit machine. Have a look at the native libraries page and the building native libraries for Hadoop page. But then, it would be a lot better to use a 32-bit OS instead, wouldn't it?
(shout yes!!!)

2. Virtual Box - low graphics mode
VirtualBox may throw a "low graphics mode" error if you are using 32-bit Ubuntu. This is due to a missing guest plugin that comes with VirtualBox.
You will have to mount the Linux Guest Additions CD image and run it. The initial step is enabling Ubuntu to build kernel modules:

sudo apt-get install dkms
then load the Guest addon CD
sudo mount /dev/cdrom /cdrom
sudo sh ./

3. Virtual Box - mouse pointer appears slightly above the actual point
This is due to a missing patch. You can have a look at it here.

For fixing this, download the VBoxGuest-linux.c.patch patch file from above link. Then run these commands on your Ubuntu virtual machine
cp VBoxGuest-linux.c.patch /usr/src/vboxguest-4.1.16/vboxguest/VBoxGuest-linux.c
/etc/init.d/vboxadd setup
You can also learn about the usage of Hadoop and about Hadoop architecture in the related posts on this blog.

Sunday, 9 February 2014

Hadoop and Ubuntu - step 4

Hadoop and Ubuntu - step 4 - Install and configure Hadoop is the last step in creating a single-node Hadoop setup.

Step 1 of the setup is available here, and step 2 is available here. There we looked at installing a Linux-based OS (Ubuntu) for Hadoop, as we opted for Linux instead of Windows, and saw the reasons for that preference. We also installed and configured our chosen Java - Oracle Java.
STEP 1 - Choose and configure (Linux) OS of choice on Machine of Choice
STEP 2 - Install Java and configure it on the machine
STEP 3 - Configure SSH and user for SSH on Ubuntu

STEP 4 - Download and Configure Hadoop

Log in as HDUser. Download the Hadoop 2.x tar file from any mirror here.

Uncompress the Hadoop tar.gz file and move it to /usr/local. We will also change the owner. Use the Terminal for all commands.

cd Downloads

sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local

cd /usr/local

sudo mv hadoop-2.2.0 hadoop

sudo chown -R hduser:hadoop hadoop

Update the HDUser's .bashrc file
cd ~

gksudo gedit .bashrc

Update the end of the file with the below text. Use the same jdk folder name as the actual folder - something like "jdk-7-i386" (check in /usr/lib/jvm).

#Hadoop variables

export JAVA_HOME=/usr/lib/jvm/jdk/

export HADOOP_INSTALL=/usr/local/hadoop
#end of update
Save and close
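Only the first two export lines survive in the post above. A typical full set for a Hadoop 2.x single-node setup (an assumption based on common guides, not the post's original list) looks like:

```shell
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
# Put the Hadoop binaries and service scripts on the PATH.
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
# Point the per-component homes at the same install directory.
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
#end of update
```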

Now open the file below to update Java Home (JAVA_HOME):
gksudo gedit /usr/local/hadoop/etc/hadoop/

export JAVA_HOME=/usr/lib/jvm/jdk/

Save and close. Reboot the system and login with HDUser again.

Now, verify the Hadoop installation from the terminal:
hadoop version

This should give something like below

Hadoop 2.2.0
Subversion -r 1529768

Compiled by hortonmu on 2013-10-07T06:28Z

Compiled with protoc 2.5.0

From source with checksum 79e53ce7994d1628b240f09af91e1af4

This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar

If you get this, congratulations!!! Hadoop is now successfully installed. If not, leave me a comment or use the contact page.

Now we configure it by updating its XML files.

Open core-site.xml and add the given text between <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml


Save and Close the file

Open yarn-site.xml and add the given text between <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml



Save and Close the file

Open mapred-site.xml.template and add the given text between <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml.template


Save the file as mapred-site.xml in /usr/local/hadoop/etc/hadoop/ directory and Close the file

Let's now create the NameNode and DataNode directories through the terminal:
cd ~

mkdir -p mydata/hdfs/namenode

mkdir -p mydata/hdfs/datanode
Now, update hdfs-site.xml and add the given text between <configuration> </configuration> tags
gksudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml




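The property snippets themselves did not survive in this post. The values below are the usual single-node Hadoop 2.2 settings from similar guides - an assumption, so verify them against your own setup (the NameNode/DataNode paths match the mydata directories created above). Each group goes between the <configuration> </configuration> tags of the named file:

```xml
<!-- core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
```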
Next, we format HDFS for our first use of Hadoop and then start the Hadoop services
hdfs namenode -format

start-dfs.sh

start-yarn.sh

Verify the running Hadoop daemons with
jps

The below should appear in the output

2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode

This completes the set up steps for Hadoop.

You can also learn about the usage of Hadoop and about Hadoop architecture in the related posts on this blog.

Saturday, 8 February 2014

Hadoop and Ubuntu - step 3

Hadoop and Ubuntu - step 3 - Configure SSH and user for SSH on Ubuntu

Step 1 of the setup is available here, and step 2 is available here. There we looked at installing a Linux-based OS (Ubuntu) for Hadoop, as we opted for Linux instead of Windows, and saw the reasons for that preference. We also installed and configured our chosen Java - Oracle Java.
STEP 1- Choose and configure (Linux) OS of choice on Machine of Choice
STEP 2 - Install Java and configure it on the machine

STEP 3 - Configure SSH and user for SSH on Ubuntu

This step is pretty straightforward. We create a user and a user group. All Hadoop cluster nodes will use the same user name and be part of the same group. Let's call the group hadoop and the user hduser. Then we will create an SSH RSA key pair with no passphrase (for ease of access by Hadoop).

Use Ubuntu Terminal window and below commands.

Create group

sudo addgroup hadoop

Create User and add it to the group
sudo adduser --ingroup hadoop hduser

Login as HdUser and generate SSH key
su - hduser

ssh-keygen -t rsa -P ""

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/
The key fingerprint is:
7b:62:<<more hex codes>>

hduser@ubuntu
The key's randomart image is:
<<some image>>

Store the generated key
cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys

Test SSH
ssh localhost

The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is c7:47:55:<<more hex code>>.

Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS <<info>>

This completes step 3. You can also learn about the usage of Hadoop and about Hadoop architecture in the related posts on this blog.
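For reference, the same sequence can be exercised safely against a scratch directory. The file names cut off above are filled in here with ssh-keygen's standard defaults (id_rsa / id_rsa.pub) - an assumption worth double-checking against your own output:

```shell
# Generate a passphrase-less RSA key into a scratch directory instead of
# ~/.ssh, so this demo does not touch a real configuration.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$KEYDIR/id_rsa" -q

# Authorize the key for password-less login (the real setup appends
# ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys).
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```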

Sunday, 2 February 2014

Hadoop and Ubuntu - step 2 - Install Oracle Java for Hadoop setup.

In step 1 of the setup, available here, we took a look at installing a Linux-based OS (Ubuntu) for Hadoop, as we opted for Linux instead of Windows, and saw the reasons for that preference.
STEP 1- Choose and configure (Linux) OS of choice on Machine of Choice
STEP 2 - Install Java and configure it on the machine
Available choices for Java are OpenJDK or Oracle JAVA. I preferred Oracle Java. Follow the below instructions for Oracle Java configuration on Ubuntu.

Download Oracle Java from its official download page. The version should be compatible with your OS and machine type (32- or 64-bit). It will now be in some folder such as Downloads.

Uncompress it from Terminal window using

tar -xvf jdk-7u2-linux-x64.tar.gz

The uncompressed directory (its name depends on the downloaded version; here jdk1.7.0) should sit under /usr/lib/jvm, so let's move it there. Note that only the JRE is available as software through official Ubuntu channels; we need the latest JRE along with the latest JDK from Oracle, hence the download.
sudo mkdir -p /usr/lib/jvm

sudo mv ./jdk1.7.0_02 /usr/lib/jvm/jdk1.7.0

Set the environment variables for Java. Open /etc/profile and append the below code.




export JAVA_HOME

export JRE_HOME

export PATH
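The three assignment lines went missing above; a typical set matching the /usr/lib/jvm/jdk1.7.0 path used in this post (assumed, so adjust to your own JDK directory) is:

```shell
# Assumed values for the JDK layout used in this post.
JAVA_HOME=/usr/lib/jvm/jdk1.7.0
JRE_HOME=$JAVA_HOME/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

export JAVA_HOME
export JRE_HOME
export PATH
```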

Reboot now. After this, Ubuntu has to be told that the JDK is available, so run the below commands.
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1

sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1

sudo update-alternatives --config java

Do the same for javac with sudo update-alternatives --config javac. That's it, done. You can now check the Java version:

java -version

java version "1.7.0"

Java(TM) SE Runtime Environment (build 1.7.0-b147)

Java HotSpot(TM) Client VM (build 21.0-b17, mixed mode) 

This completes step 2. You can also learn about the usage of Hadoop and about Hadoop architecture in the related posts on this blog.

Hadoop and Ubuntu - step 1

Hadoop and Ubuntu - step 1 for setting up Hadoop using the Ubuntu OS

There are basically two methods of using Hadoop:
1. Configure Hadoop on Windows - this involves use of Hadoop setup, Cygwin tool, Java and Eclipse.
I configured this initially on my laptop; however, when I tried to perform the same configuration on another machine (to be used as another Hadoop node), Cygwin broke down.
As a result, I was not able to complete the whole Hadoop setup.

2. Configure Hadoop on Linux - Because of the above experience, I decided to go with the Linux based OS for Hadoop.

Using a Linux-based OS is the best approach for the below reasons:
1. Hadoop is designed for Linux-based systems (yes, it is)
2. Hadoop requires SSH, which is simple to configure on Linux (Windows requires Cygwin, which basically gives a feel and experience of Linux on Windows)
3. A system is naturally more secure on a secure OS; exhibit A - Linux.
STEP 1 - Choose and configure (Linux) OS of choice on Machine of Choice

I chose Ubuntu - freely and easily available, with good GUI-based support for a heavy Windows user.

I chose to perform the installation on Virtual Box - Open Source Virtualization tool by Oracle.

Download and install VirtualBox on your machine.

Download and install Ubuntu in the Virtual Box as a Guest OS

Get ready for further set up for Hadoop on Ubuntu

This completes step 1.

You can also learn about the usage of Hadoop and about Hadoop architecture in the related posts on this blog.

Saturday, 1 February 2014

Usage of Hadoop

Usage of Hadoop's framework in industry - applicable to any industry.

Hadoop was originally conceived to solve the problem of storing large quantities of data at a very low cost, even for the biggies like Google, Facebook or Yahoo. A low-cost implementation of something new that is planned to be used often is very important.

Analytic applications are the main output of Hadoop, and the application categories can be broadly clubbed into three or four buckets.
1. Refine the data

Irrespective of the vertical/domain, Hadoop is used to refine large quantities of data into more manageable informational data. The resulting data is loaded into existing systems and can be accessed with traditional tools. Note that this new data set is much richer than the one that existed earlier.
2. Analysis of Data

A common data analysis use case is an enterprise starting by capturing data that was previously discarded (exhaust data such as web logs, social media data, etc.). This data can be combined with other data and used to build applications that turn the trends found in it into decisions.
3. Enhancement of applications

Existing applications (web, desktop or mobile) can be further enhanced using the data that Hadoop provides. This can give users a better, customized service so that they come to you instead of a competitor. Simply understanding user patterns can achieve this for companies.

Also read - Hadoop architecture

Hadoop's architecture birdview

Hadoop's architecture in a bird's-eye view, as described by Mike Olson, CEO of Cloudera.

Hadoop is designed to run on a large number of machines that don't share any memory or disks.

That means you can buy a whole bunch of commodity servers, slap them in a rack, and run the Hadoop software on each one.

When you want to load all of your organization's data into Hadoop, what the software does is bust that data into pieces that it then spreads across your different servers. There's no one place where you go to talk to all of your data; Hadoop keeps track of where the data resides.
And because multiple copies are stored, data on a server that goes offline or dies can be automatically replicated from a known good copy.

In a centralized database system, you have got one big disk connected to four or eight or 16 big processors. But that is as much horsepower as you can bring to bear. In a Hadoop cluster, every one of those servers has two or four or eight CPU cores.

You can run your indexing job by sending your code to each of the dozens of servers in your cluster, and each server operates on its own little piece of the data. Results are then delivered back to you in a unified whole. That's MapReduce: you map the operation out to all of those servers and then you reduce the results back into a single result set.
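The same map-shuffle-reduce idea can be felt on a single machine with nothing but shell pipes: here `tr` plays the mapper (emit one word per line), `sort` plays the shuffle (group identical keys together), and `uniq -c` plays the reducer (collapse each group into a count). This is a toy sketch of the idea, not Hadoop itself:

```shell
# A toy "word count" in the MapReduce spirit, over two lines of input.
printf 'hadoop spreads data\nhadoop maps and reduces data\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```

In a real Hadoop cluster the same three stages run in parallel across many machines, with the shuffle moving intermediate data between them.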

Architecturally, the reason you're able to deal with lots of data is because Hadoop spreads it out. And the reason you're able to ask complicated computational questions is because you have got all of these processors, working in parallel, harnessed together.

Sunday, 26 January 2014

Pros and Cons of Gmail displaying images in your emails

The news from Gmail is that Google's Gmail service will finally show images by default in emails. This is done by serving all the images users receive from Gmail's proxy servers, instead of serving them from external host servers.

Usually, external images are not secured and hence are not displayed directly in emails. By hosting them on Gmail's proxy servers, Gmail is making sure the images are virus-free.

In a post on the official Gmail blog, the company said: "Your messages are more safe and secure, your images are checked for known viruses or malware."

The new Gmail functionality will begin rolling out Thursday to the Web; Gmail mobile apps will get an update early next year.
Here is an interesting thing I would like to check on - email marketing teams usually use images to track whether an email was read by the user. Now that the images will be on Gmail's servers and displayed to the user by default, the tracking may be more accurate.
Earlier, I believe, without the click on "Display images below", it would have been difficult for trackers to check whether an email was actually read on Gmail.

Also, you can change the setting to "Display Images Below" option by using General Settings of GMail.

Saturday, 25 January 2014

CLT20 2013 Twitter Analysis of League

CLT20 just concluded with the Mumbai Indians winning the trophy. Congratulations!!! Mumbai take the double of the IPL and Champions League titles in the same year. Chennai Super Kings did it in 2010.
Sachin Tendulkar completed 50000 Runs in Cricket... AWESOME!!!
Here is how the Twitter world was discussing the different topics in Champions Trophy T20 this year (2013).

Click here to find a graphical picture of the tweets during CLT20 2013. Total list of topics are here.
This is our very own Social Media Bigdata project using Twitter Firehose - Live Tweets. There were about 300 to 1000 tweets analysed Live per second during the matches.

Babagyan CLT20 2013 Analysis

This is how it works at the technical level.

The Twitter data is loaded into the analytics server using a Python program. This ocean of tweets is kept in the BabaGyan GyanKund (knowledge base) for further analysis.
Then another program (R with Python) scans the GyanKund tweets, uses its intelligence to find the relevant ones, and sends them to the database and web server.
The database is then accessed by the Analytics page to show the analysis results in text and graphical formats.

Thursday, 23 January 2014

ETL BI BigData Past Future


Many years ago, there were only financial applications, used to perform transactions and log data into a database. Then came applications catering to the needs of industries, and so began big-volume data storage, backups and history in the form of data. This was present even in financial applications, but less visible, because financial firms were not as big as industries.
The applications for industry acquired terms such as ERP.
Now arose the question: why store loads and loads of data? Can it be put to some use? And how? This was the inception of Business Intelligence and the obvious accompanying process of Extraction, Transformation and Loading (ETL).

Over a long time, this has resulted in many technologies and also many jobs in the fields of ETL and BI.

Now, after being a niche field for so many years, the ETL + BI field has entered a transformation with the addition of Big Data complements.
The transformation is the result of many things - the Internet, and the discovery of various data sources in various formats such as machine logs, social website data and network logs. There has also been huge progress in the fields of in-memory database processing and cloud computing, making analysis of unbelievably huge data sets possible.

What was ETL and BI is now ETL + BI + BigData + Data Mining + Data Analytics + so on and on...

Here is a list of technologies - an incomplete list, but a list nevertheless:
1. Technologies such as SAP BW (SAP NetWeaver Business Intelligence), SAP BO (SAP BusinessObjects/Business Intelligence 4.0), Xcelsius and SAP HANA from ERP leader SAP
2. Oracle Reports, Oracle Business Intelligence, Oracle Forms and Oracle Warehouse Builder from database leader Oracle
3. InfoSphere DataStage and Cognos from mighty IBM
4. SQL Server services - SSIS, SSRS, SSAS as part of MSBI from Microsoft
5. Teradata, SAS, Ab Initio, MicroStrategy
6. Latest in the area - Tableau, QlikView, Spotfire (Tibco), Sybase IQ, Siebel
7. Technologies like R, Python and PHP - all are also becoming part of the new form of ETL + BI + BigData

Soon, you will see some more project demonstrations on BigData, Data Mining, Text Mining and many more. Stay tuned.

Tuesday, 21 January 2014

Tool: Daily Expense Tracker in excel - free to use and share

How many times has it happened that you wondered where all your money went? Where did you spend all of it? How frequently do you take out cash and spend it even more frequently and immediately?
All you wanted is to know a trend of your spending so that you can plan and if necessary, control it.

There are many sites and phone applications available for this, but not all are easy to use.
Every application has its own method of usage - some ask you to create and define bank accounts, while others just don't seem secure. All you really want is a place where you can record how much you spend and earn, and keep track.

Here is an option for you in our very own EXCEL spreadsheet by BabaGyan.
All you have to do is update the rows with your account transaction - both credit and debit. Also available is a Cash sheet where you should enter your cash transactions. The ATM withdrawal transaction from Account tab is added into Available Cash in your Cash tab. This way, you don't need to worry about Cash Credits at all.

There is also an optional Forecasting tab. Here you should list all routine transactions like rent payments, fuel expenses, EMI installments, etc. This will help you predict how much you can save by the end of the month. If you need to save more, just spend less - at least try to.

In short - Update the cells in Orange before starting to use the sheet and daily update the cells in Green.

Good luck in using this. Free download and share. You can provide your feedback or queries on our Contact page or comments on this post. Here is Expense tracking Month Excel file to download.

Sunday, 19 January 2014

Ways to create a website

Ways to create a website - Parts of a Website and alternative ways for each part

This is a general question that comes to the mind of anyone who wants to create an online portfolio, anyone running a business and wanting to take it to the web, or indeed any person curious about the Internet.
What are the ways in which websites are made or can be made?

To answer this in the most simple way - many!!! You probably guessed it too. And yes, we can help you with each of these ways of website development.

Coming back - do not worry, I am not going to stop with that answer. I will brief you on some of the most popular and some of the most advanced ways of website creation.

There are basically 3 parts to any website - the domain name, the user interface and the server processes. The domain name is the name of the website, which is purchased from a registrar. The user interface and server processes can be separately programmed and deployed on a website host.

Then comes the user interface, which can be developed using many options too - like HTML, HTML5, HTML with CSS, Flash, HTML with .NET, HTML with Java, JavaScript, etc.

And then come the server processes, which run on something popularly known as a webserver, optionally in combination with a database server. The complexity of the webserver depends on the complexity of the things the website should perform.
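To make "server process" concrete, here is a minimal sketch using only Python's standard library. It is an illustration, not a production setup: a real site would sit behind a proper webserver, but the idea is the same - a program on the host receives a request and returns the user interface (HTML) to the browser.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The "user interface" part: HTML returned to the visitor.
        body = b"<html><body><h1>Hello from the server process</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def run(port=8000):
    # Call run() to serve the page on http://localhost:8000/
    HTTPServer(("", port), HelloHandler).serve_forever()
```

Everything between receiving the request and writing the response - database lookups, payments, logins - is what makes real server processes, and their hosting, complex.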

These complexities provide many alternatives for hosting your website. A host is a computer/server which stores your user interface and server files, executes them for you and eventually makes your website LIVE.

There are many big and small players in hosting and domain registration, and cost and complexity drive the choices. The simplest web hosting can be a blog, which is mostly free; the most complex kind of hosting can be a webserver built on cloud computing technology.

There are many blog platforms and blog technologies to choose from, and web hosting is also provided by many free as well as paid web hosting companies. Amazon Web Services is one example of a cloud computing platform which can be used for website hosting too.

Friday, 17 January 2014

LifeSaver: Business Intelligence Projects - Success and Failures

When we talk about business intelligence, we talk about how a certain project was hugely successful while another never even got on its feet. We analysed the situation and found a few known and unknown combinations of factors.

Commonly, companies and CIO offices complain about many things during and after a BI implementation. We think the top 3 are:

  1. Data - too little or too much, accessible or not-so-accessible
  2. Reports and dashboards that are slow to respond and probably not suited to the need
  3. IT-specific analytics tools which need training at a technical level
Even with these complaints, BI has been a top implementation priority for many years now. Organizations do recognize the value of data and analytics for decisions and outcomes.
So, there is a big need to ensure that a BI initiative does not stall and join the list of failed projects. The question is - how can you ensure this?
The answer is not specific to BI - it is the same as for any other project. For BI, though, it is even more difficult, because the projects are harder to kick off and scale, and there are many examples on the list of failed BI projects. So, ensuring a BI project's success is not just necessary for companies but also of immediate importance.

I have always believed that the business has to drive the initiative for BI projects, as they are the best judge of its usage. IT, needless to say, plays a big part too. So, it ends up being a balance between the two.

We can sum up some major points as necessities for the success of a BI project:
  1. Start from business and keep business in business intelligence
  2. Independence from IT - self service by business a priority
  3. Data as a good foundation
  4. Tools suitable for organization and people
  5. Train the users and prepare them for independence and change
Along with these, some things need to be avoided too. AVOID:
  1. An IT-led initiative - which seems easy but often ends in failure due to lack of business inputs
  2. Too strict a process, or no process at all
  3. Applying governance to everything
  4. Shedding responsibilities when external partners are brought in to help
  5. Too much focus on development using tools
  6. Ignoring user training
Hope this helps you achieve success in your BI implementations.

Tuesday, 14 January 2014

LifeSaver: Differences between UDT and IDT

Differences between UDT and IDT - the universe design tool and the information design tool from SAP BusinessObjects (SAP BO) versions 3.x and 4.x.

The information design tool, or IDT, is a new modeling tool from SAP for the semantic layer that lets you work with metadata from relational and OLAP sources, and create and deploy SAP BusinessObjects universes.

The old Business Objects 3.1 Classic universe features available in Universe Designer have been redeveloped and incorporated in the information design tool.

This post will help you find the options in IDT for the features you must already be very familiar with from Universe Designer, enabling you to migrate from BO XI 3.1 to BO 4.0 with no issues.
Start using IDT based on your expertise with Universe Designer.

Universe design > Quick Design Wizard
No similar functionality

Relational universe creation workflow:
1. Click File > New > Project.
2. Click File > New > Relational Connection.
3. Click File > New > Data Foundation.
4. Click File > New > Business Layer.

OLAP universe creation workflow:
1. Click File > New > Project.
2. Click File > New > OLAP Connection.
3. Click File > New > Business Layer.

File > Import
Click File > Retrieve a Published Universe.
File > Export

Universe publishing workflow:
1. From within Local Projects, right-click the Connection (*.cnx) and
click Publish Connection to a Repository.
2. From within Local Projects, right-click the Business Layer (*.blx) and click Publish > To a Repository.

File > Metadata Exchange
No similar functionality

File > Parameters > Definition > Description
From within a Data Foundation (*.dfx), click Properties.
File > Parameters > Definition > Connection

Relational connection workflow:
1. Click File > New > Relational Connection.
OLAP connection workflow:
1. Click File > New > OLAP Connection.

File > Parameters > Summary
From within a Data Foundation (*.dfx), click Properties > Summary.
From within a Business Layer (*.blx), click Properties > Summary.

File > Parameters > Links
No similar functionality

File > Parameters > Strategies
No similar functionality

File > Parameters > Controls
From within a Business Layer (*.blx), click Properties.

File > Parameters > SQL
From within a Data Foundation (*.dfx), click Properties.
From within a Business Layer (*.blx), click Properties.

File > Parameters > Parameter
From within a Data Foundation (*.dfx), click Properties > Parameters.
From within a Business Layer (*.blx), click Properties > Parameters.

File > Print
From within the Local Projects view, right-click a Business Layer (*.blx) and click Print.
From within the Local Projects view, right-click a Data Foundation (*.dfx) and click Print.
From within a Data Foundation (*.dfx), click Print View to Bitmap.

Edit > Undo Action
Click Edit > Undo Action.

Edit > Find/Replace
Click Window > Find/Replace.
From within a Business Layer (*.blx), click Show/Hide Search Panel.
From within a Data Foundation (*.dfx), click Show/Hide Search Panel.

Edit > Hide Item(s)
From within a Business Layer (*.blx), right-click an object and click Change State.

Edit > Object Properties > Definition
From within a Business Layer (*.blx), double-click an object.

Edit > Object Properties > Properties
From within a Business Layer (*.blx), double-click an object and click Advanced.

Edit > Object Properties > Advanced
From within a Business Layer (*.blx), double-click an object and click Advanced.
From within a Business Layer (*.blx), right-click an object and click Change Access Level.

Edit > Object Properties > Keys
From within a Business Layer (*.blx), double-click an object and click Keys.

Edit > Object Properties > Source Information
From within a Business Layer (*.blx), double-click an object and click Source Information.

Edit > Object Format
From within a Business Layer (*.blx), right-click an object and click Edit Display Format.

Edit > Rename Table
From within a Data Foundation (*.dfx), right-click a table and click Edit.

Edit > Edit Derived Table
From within a Data Foundation (*.dfx), right-click a Derived Table and click Edit.

View > Arrange Tables
From within a Data Foundation (*.dfx), click Auto Arrange Tables.

View > Refresh Structure
From within a Data Foundation (*.dfx), click Detect > Refresh Structure.

View > Table Values
From within a Data Foundation (*.dfx), right-click a table and click Show Table Values.

View > Change Table Display
From within a Data Foundation (*.dfx), right-click a table and click Display.

View > Number of Rows in Table
From within a Data Foundation (*.dfx), click Detect > Row Count.

View > Grid Lines/Page Breaks
No similar functionality

View > List Mode
No similar functionality

Insert > Tables
From within a Data Foundation (*.dfx), click Insert > Tables.

Insert > Stored Procedures
No similar functionality

Insert > Derived Table
From within a Data Foundation (*.dfx), click Insert > Derived Table.

Insert > Alias
From within a Data Foundation (*.dfx), click Insert > Alias.

Insert > Join
From within a Data Foundation (*.dfx), click Insert > Join.

Insert > Context
From within a Data Foundation (*.dfx), click Aliases and Contexts > Insert Context.

Insert > Class/Subclass
From within a Business Layer (*.blx), click Insert Item > Folder.

Insert > Object
From within a Business Layer (*.blx), click Insert Item > Dimension.
From within a Business Layer (*.blx), click Insert Item > Measure. From within a Business Layer (*.blx), right-click a dimension and click New > Attribute.

Insert > Condition
From within a Business Layer (*.blx), click Insert Item > Filter.

Insert > Candidate Objects
No similar functionality

Insert > User Objects
No similar functionality

Insert > Universe
No similar functionality

Tools > Connections
From within a Data Foundation (*.dfx), click Connection.
From within a Data Foundation (*.dfx), click Connection > Change Connection.
From within Repository Resources, click Session > Connections.

Tools > Hierarchies
From within a Business Layer (*.blx), click Navigation Paths.

Tools > List of Values
From within a Business Layer (*.blx), click Parameters and Lists of values.
From within a Data Foundation (*.dfx), click Parameters and Lists of values.

Tools > Aggregate Navigation
From within a Business Layer (*.blx), click Actions > Set Aggregate Navigation.

Tools > List of Aliases
From within a Data Foundation (*.dfx), click Aliases and Contexts.

Tools > List of Derived Tables
No similar functionality

Tools > Query Panel
From within a Business Layer (*.blx), click Queries.

Tools > Automated Detection
From within a Data Foundation (*.dfx), click Detect.

Tools > Check Integrity
From within a Business Layer (*.blx), click Check Integrity.
Click Window > Check Integrity Problems.

Tools > Login As
From within the Repository Resources view, click Insert Session.

Tools > Change Password
No similar functionality

Tools > Manage Security
Click Window > Security Editor.

Tools > Options
Click Window > Preferences.

Window > Arrange/Split
From within a Data Foundation (*.dfx), click Insert > View.

Data Quality in ETL and BI - Reasons Impacts Solutions and Operations

The amount of data that an organization stores and processes has increased manyfold in recent times. This has also exposed the associated problems of poor data quality. If the quality of data is bad, then the information created from that data is not useful. A lot of effort and money is being put in by organizations to improve the quality of their data.
The quality of data refers to the following:
  • Accuracy
  • Consistency
  • Integrity
  • Uniqueness
Reasons for data quality issues:
  • Inaccurate data entry
  • No process or rules in application to validate data entry
  • Lack of Master Data Management (MDM) strategy
Examples of bad data resulting from these:
  1. A phone number with values like 1111111111 or 0000010100
  2. A customer name like "ABC" or "ZZZZ"
  3. Two records in a table like the below:
Timothy, Kenny. 10 East Avenue
Tim, Kenny. 10 East Avenue
Impact of poor data quality:
  • Incomplete and misleading analysis
  • Increase in spending on incorrect data
  • Financial impacts when data is related to accounts and finance
  • Targeted market campaigns impacted adversely
  • Purchasing of data quality tools like First Logic, Trillium, etc.
  • Complicated ETL
  • Additional Cleansing in ETL process results in longer time to complete ETL cycle
Solutions for data quality:
  • Stringent data validations by means of rules in applications at source
  • Avoiding duplicate master entries by use of MDM solutions
  • If above not done, then using ETL process to handle data quality issues
  • Send feedback for bad data quality and correct at source, then reload
  • Maintain audit for data quality issues emerging from source systems
Data Cleansing Operations:
  • Removing invalid characters
remove extra and special characters from addresses, phone numbers, etc
  • Correcting data formats
formats for phone numbers, email addresses, etc
  • Identifying and removing duplicates
  • Building data quality audit and feedback system
record data quality in audit tables
automate process to send information of data quality to source system
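The cleansing operations above can be sketched in plain Python. The validation thresholds and the de-duplication key below are illustrative assumptions; real cleansing rules depend on your source systems and on dedicated tools like the ones mentioned earlier.

```python
import re

def clean_phone(raw):
    """Remove invalid characters, then reject obviously fake values
    such as 1111111111 or 0000010100."""
    digits = re.sub(r"\D", "", raw)          # keep digits only
    if len(digits) != 10 or len(set(digits)) <= 2:
        return None                          # flag for the audit table
    return digits

def dedupe_customers(records):
    """Naive duplicate removal on (last name, address), keeping the
    first spelling of the first name seen (e.g. Timothy vs Tim)."""
    seen = {}
    for first, last, address in records:
        key = (last.lower(), address.lower())
        seen.setdefault(key, (first, last, address))
    return list(seen.values())

print(clean_phone("98765-43210"))  # 9876543210
print(clean_phone("1111111111"))   # None
```

In a real ETL process, values rejected by `clean_phone` would be written to an audit table and fed back to the source system for correction, as described above.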

Monday, 13 January 2014

Problem Solving Methodology - Technique for How to Solve a Problem

Problems are very frequent in nearly everything - from work to personal life. Whether you are a project manager or just a team member, you are surrounded by problems and expected to solve them.
There is always a feeling that "there should be some way to solve each and every problem". Fortunately, there is a generic problem solving methodology - steps to follow and techniques to implement - for tackling any problem. Follow the steps below and you can work through nearly every problem you ever face.

Step 1 - Identifying and diagnosing the problem
The real problem will surface only after the facts have been gathered and analyzed. In the meantime, start with an assumption that can later be confirmed or corrected.
Step 2 - Gather facts, feelings and opinions
What happened? Where? When and how? Who and what is affected? Is it likely to happen again? Does it need to be corrected? Time and expense constraints may require the problem solver to think through what they need and assign priorities to the most critical elements.
Step 3 - Restate the problem
The facts help make this possible, and provide supporting data. The actual problem may or may not be the same as assumed in Step 1.
Step 4 - Identify alternative solutions
Generate ideas. Do not eliminate any possible solutions until several have been discussed.
Step 5 - Evaluate alternatives
Which alternative will provide optimum solutions? What are the risks? Will the solution create new problems?
Step 6 - Implement the decision
Who must be involved? To what extent? How, When and Where?
Who will the decision impact?
What might go wrong?
How will the results be reported and verified?
Step 7 - Evaluate the results
Test the solution against the desired results. Modify the solution if better results are needed.

Sunday, 12 January 2014

What is Mentoring?

What is Mentoring: (Content courtesy - from a session attended in the past)

“If you would thoroughly know anything, teach it to others”
- Tryon Edwards

The best leaders enjoy helping their people learn, grow and succeed in their careers. At times these relationships grow into a mentoring arrangement.
Mentoring goes beyond simply directing and instructing others. Mentors are advisers, teachers, sounding boards, cheerleaders and critics rolled into one.

Through mentoring you can give those who are less experienced an opportunity to improve their understanding of business practices, understand the policies, discuss problems, analyze and learn from the mistakes of others, and celebrate successes. Leaders are expected to share their wisdom with people from other parts of the organization, known as mentees.
Mentees tend to learn more quickly than they would through the normal process of trial and error.
Consider your mentoring commitment carefully – Don’t agree to mentor someone if you don’t have the time and/or interest.

  • Be clear how much time you are able to give to mentoring relationship.
  • Look for informal as well as formal mentoring opportunities
  • Consider whether your company would benefit from a formal relationship or is it done informally
  • It takes time to develop the relationship. Discuss what skills, abilities or knowledge need improvement
  • Decide frequency of meetings
  • Notice of cancellations
  • Confidentiality
  • Topics that would be off limits
  • Establish a pace that is reasonable
  • Don’t try to download your knowledge and experience all at once. Remember, you did not learn everything all at once
  • Keep your mentoring discussions focused on relevant goals and challenges that your protégé is facing; learning is often most effective when delivered just in time
  • Be accessible
  • You may choose weekly, biweekly or monthly meetings
  • Establish parameters for when you are available for consultations by e-mail or voice mail between mentoring sessions
  • Offer your own ideas based on your experience but don’t expect that your mentee would necessarily do things the way you do it
  • Encourage creative individual thinking
  • Be an encouraging confidant
  • Encourage your protégé or mentee to aim at high standards and push him to set up challenging goals. Raise the bar when you feel they are ready
  • Balance praise and constructive criticism
  • Help protégé to do analysis but remind them that mistakes are part of learning
  • Treat your discussions as confidential as possible. Respect the trust of your protégé
  • Acknowledge and celebrate success
  • Don’t give all the answers
  • When your protégé asks for your help with a problem, ask him to suggest a few solutions first. Encourage discussion and exploration of various courses of action, raising any significant concerns or points your protégé has missed
  • Steer your protégé towards a specific direction only if you feel he is about to choose a poor course of action
  • Otherwise, encourage your protégé to make the choices and decisions that he feels are best
  • Know when it is time to let go
  • When your protégé becomes proficient, he may be reluctant to tell you that he has outgrown you, or may simply be unaware that it is time to move on
  • Help your mentee to plan the future before ending your formal mentoring relationship
  • General transferable qualities of leaders across the board

Saturday, 11 January 2014

QlikView advantage over query based BI

Query-based BI tools have been around for decades for decision support. Variations of query-based BI software are on the market - some are flexible, others are high-performance.
But they all share one critical flaw: they are unable to inherently maintain associations among data elements.

Query-based tools divorce data from its context. People making complex business decisions don’t always have full access to their supporting data, even when they have access to BI software.
Some data is available only as isolated and discrete queries, without context between one query and the next. This leaves gaps for people trying to make data-driven business decisions.

With query-based tools, creating associations among all available data elements would require a business analyst or IT professional to cram every associated field into a single query, a nearly impossible task.
The alternative, hard-coding associations among queries into the application layer, is equally daunting.

QlikView is the world’s first associative, in-memory business intelligence platform. QlikView manages associations among data sets at the engine level, not the application level, by storing individual tables in its in-memory associative engine.
Every data point in the analytic dataset is associated with every other data point in the dataset. Datasets can be hundreds of tables with thousands of fields. Unlike traditional query-based BI tools, when the QlikView user selects a data point, no queries are fired.
Instead, all the other fields instantaneously filter and re-aggregate themselves based on the user’s selection. Selections are highlighted in green. Datasets related to the selection are highlighted in white.
Unrelated data is highlighted in gray. This provides a very intuitive, user-friendly way for people to navigate their data on their way to business insight.
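The green/white/gray behaviour can be imitated in a few lines of Python as a conceptual sketch. QlikView's actual engine is proprietary and far more sophisticated; the data and field names here are invented purely for illustration.

```python
# Pick a value in one field; every other field then splits into
# "white" (related to the selection) and "gray" (unrelated) values,
# without any query being fired against a database.
rows = [
    {"region": "Asia",   "product": "Shoes",  "year": 2012},
    {"region": "Asia",   "product": "Shirts", "year": 2013},
    {"region": "Europe", "product": "Shoes",  "year": 2013},
]

def associate(rows, field, value):
    related = [r for r in rows if r[field] == value]   # the selection
    result = {}
    for other in rows[0]:
        if other == field:
            continue
        white = {r[other] for r in related}            # related values
        gray = {r[other] for r in rows} - white        # unrelated values
        result[other] = {"white": white, "gray": gray}
    return result

print(associate(rows, "year", 2012))
```

Selecting year 2012 immediately marks "Asia" and "Shoes" as related and "Europe" and "Shirts" as unrelated - the re-filtering happens over the in-memory data itself, which is the point the paragraph above makes.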

Query-based BI tools separate the application layer from the data layer. This leads to long deployments while expensive developers customize the application layer to manage the specific associations required to answer a particular business question.

When the BI application needs to answer a slightly different business question, the application layer must be altered again, which is time-consuming and expensive.
With QlikView, any and all aggregates are recalculated in real time, regardless of the source fields. All associations are stored generically against the entire dataset, ready to answer any business question as it comes up without requiring any customization. The data from all tables is always available in context and ready to answer the next business question.

Friday, 10 January 2014

Types of data

We talk about data so much, all the time - but do we all know how many types data can be categorized into based on its structure?
I am trying to compile such a categorization of data, around which ETL, BI, Big Data & Data Analytics have evolved.
Structured data
This is the basis of all the database systems which have dominated the ETL and BI industry for so many years. Structured data refers mainly to relational databases, where all the key structures and associations are well defined and all dimensional data is properly associated with facts.
Semi-structured data
Data in the form of Excel sheets, presentations, etc., which can be used as structured data to some extent for analysis, but automation for direct access is not so easy. It basically needs to be somehow turned into structured data first and then analysed.
Syndicated data
Providers like HomeAway and Thomson Reuters, which supply specialized data in their own formats for analysis, are bucketed under the category of syndicated data.
Unstructured data
Logs from systems and devices, and social media data such as Twitter and Facebook. Even though one may feel like turning this data into structured data for analysis, the volume and frequency of unstructured data make converting it nearly impossible in terms of feasibility and impact. So, unstructured data is analysed using different methods and tools.
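As a tiny illustration of why unstructured data needs its own handling, here is a hedged sketch that extracts structured fields from one free-form log line with a regular expression. The log format is an assumption for illustration; real systems emit many different formats at volumes where full conversion to structured data is impractical.

```python
import re

# Hypothetical log line from a server.
line = "2014-01-10 09:31:02 ERROR payment-service timeout after 30s"

pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:]+) "
    r"(?P<level>\w+) (?P<service>[\w-]+) (?P<message>.*)"
)

record = pattern.match(line).groupdict()
print(record["level"], record["service"])  # ERROR payment-service
```

Multiply this by millions of lines per day in dozens of formats, and it becomes clear why unstructured data is analysed in place with specialized tools rather than loaded into relational tables.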

An analysis that will soon be published here uses Python and the R language in combination with PHP, MySQL and HTML5 too.

Thursday, 9 January 2014

QlikView Licensing - an Overview

QlikView is a visualization and dashboarding tool with its own in-memory engine and ecosystem.
There is a lot of confusion about how exactly QlikView licensing works. Organizations take up training with QlikView and only then are told about it. I thought it would be nice to have this information available free of cost on the internet.

There are basically 4 types of QlikView licenses:
QlikView Personal Edition:
The QlikView Personal Edition allows you to create QlikView documents using the available free source connections - free, as in, connectors for sources which cost nothing extra (the SAP connector, for example, needs an additional purchase). Documents created on one Personal Edition can only be worked upon on the same edition/machine; you cannot create a document in one place and edit it in another. It is, by all means, personal.
QlikView Desktop Edition:
The QlikView Desktop Edition is for a small organization which does not want any server but does need multiple people working on the same QlikView document. Sharing of documents is allowed.
QlikView Enterprise Edition:
The QlikView Enterprise Edition is for organizations which need a central place for the development and updating of QlikView documents. Consider this the central repository of all development. Here, all users have access to the same set of data in QlikView documents.
QlikView Publisher:
The QlikView Publisher is an add-on to QlikView Enterprise. It enables end users to access only a certain amount of the data in QlikView documents stored on the Enterprise server. The amount of data is managed by roles and restrictions.

There is also a concept called "license lease", whereby an enterprise can manage which QlikView Desktops can connect to the central server.