Friday, 10 January 2014

Types of data

We talk about data all the time, but do we all know how many types data can be categorized into based on its structure?
Here I have tried to compile such a categorization for the data around which ETL, BI, Big Data and data analytics have evolved.
Structured data
This is the basis of all the database systems that have dominated the ETL and BI market for many years. Structured data refers mainly to relational databases, where all keys and relationships are well defined and all dimensional data is properly associated with facts.
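To make "well defined keys and associations" concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a tiny star schema; the table names `dim_product` and `fact_sales` and the sample rows are hypothetical, chosen only for illustration. Because the relationship between fact and dimension is declared, the database can both join them reliably and reject a fact that points to a non-existent dimension row.

```python
import sqlite3

# Hypothetical star schema: one dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    amount     REAL NOT NULL
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO fact_sales (product_id, amount) VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5)])

def sales_by_product(conn):
    """Aggregate the facts per dimension member via the declared key."""
    return conn.execute("""
        SELECT p.name, SUM(s.amount)
        FROM fact_sales AS s
        JOIN dim_product AS p ON s.product_id = p.product_id
        GROUP BY p.name
        ORDER BY p.name
    """).fetchall()

print(sales_by_product(conn))  # [('Gadget', 7.5), ('Widget', 15.0)]
```

Trying to insert a sale for a product id that does not exist in `dim_product` raises `sqlite3.IntegrityError`, which is exactly the kind of guarantee semi-structured and unstructured sources do not give you.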
Semi-structured data
Data in the form of Excel sheets, presentations, etc., which can be used as structured data for analysis to some extent, but which is not easy to access directly through automation. It usually needs to be converted into structured data first and then analysed.
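As a rough illustration of "turning it into structured data first", the sketch below assumes a spreadsheet has been exported to CSV (reading native Excel files would need a third-party library, so that step is deliberately left out). The sheet layout, including the title line, the human-formatted numbers and the "Total" row, is hypothetical. The code locates the real header, skips rows meant for human readers and converts cells into typed records.

```python
import csv
import io

# Hypothetical CSV export of a hand-maintained spreadsheet: a title line,
# a blank line, the real header, data rows and a "Total" row for humans.
RAW = """Quarterly sales (draft)

Region,Q1,Q2
North,"1,200",950
South,800,
Total,"2,000",950
"""

def to_number(cell):
    """Turn a human-formatted cell like "1,200" or "" into a float or None."""
    cell = cell.strip().replace(",", "")
    return float(cell) if cell else None

def spreadsheet_to_records(text):
    rows = list(csv.reader(io.StringIO(text)))
    # Locate the real header instead of assuming it is the first row.
    header_idx = next(i for i, r in enumerate(rows) if r and r[0] == "Region")
    header = rows[header_idx]
    records = []
    for row in rows[header_idx + 1:]:
        if not row or row[0] == "Total":  # skip blank and summary rows
            continue
        record = {"region": row[0]}
        for col, cell in zip(header[1:], row[1:]):
            record[col] = to_number(cell)
        records.append(record)
    return records

print(spreadsheet_to_records(RAW))
```

Every rule in this sketch (where the header is, which rows to skip, how numbers are written) is specific to one sheet, which is why automating access to this kind of data is harder than querying a relational table.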
Syndicated data
Data from providers such as HomeAway and Thomson Reuters, who supply specialised data in their own formats for analysis, falls under the category of syndicated data.
Unstructured data
Logs from systems and devices, and social media data such as Twitter and Facebook posts. Although one may be tempted to convert this data into structured data for analysis, the volume and frequency of unstructured data make such a conversion nearly impossible in terms of feasibility and impact. Unstructured data is therefore analysed using different methods and tools.
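One common alternative to converting everything up front is to extract only the fields a particular question needs at the time the data is read. The sketch below is one minimal instance of that idea, not the only way unstructured data is analysed. The log format, the regular expression and the function name `error_counts_by_host` are all hypothetical; real logs vary widely by system and device, and lines that do not fit the pattern are simply ignored.

```python
import re
from collections import Counter

# Hypothetical log lines; real formats vary by system and device.
LOGS = [
    "2014-01-10 09:15:02 INFO  web01 GET /index.html 200",
    "2014-01-10 09:15:07 ERROR web02 GET /api/orders 500",
    "kernel: eth0 link up",  # a line that does not fit the pattern at all
    "2014-01-10 09:16:11 WARN  web01 GET /missing 404",
    "2014-01-10 09:16:40 ERROR web01 POST /api/orders 500",
]

# Pull out only the fields we need (date, time, level, host, method, path,
# status); anything that does not match is skipped rather than converted.
PATTERN = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+)\s+(?P<host>\S+) \S+ \S+ (?P<status>\d{3})$"
)

def error_counts_by_host(lines):
    """Count ERROR lines per host, ignoring lines the pattern cannot parse."""
    counts = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m and m.group("level") == "ERROR":
            counts[m.group("host")] += 1
    return counts

print(error_counts_by_host(LOGS))
```

The trade-off is that the structure lives in the query rather than in the storage: a different question (say, counting 404 responses per path) needs a different extraction, but nothing has to be converted in advance.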

The one that will soon be published on uses Python and R in combination with PHP, MySQL and HTML5.