Friday, 27 September 2013

Lifesaver - SSIS All in one

SQL Server Integration Services

SQL Server Integration Services (SSIS) is a component of SQL Server which can be used to perform a wide range of Data Migration and ETL operations. SSIS is a component in MSBI process of SQL Server. This is a platform for Integration and Workflow applications. It is known for a fast and flexible OLTP and OLAP extensions used for data extraction, transformation, and loading (ETL).
The tool may also be used to automate maintenance of SQL Server databases and multidimensional data sets.

Business Intelligence Development Studio (BIDS) for Development of SSIS Projects SQL Server Management Studio (SSMS) to manage the SSIS Packages and Projects

SSIS workflow
Workflow is a set of instructions to specify how to execute tasks and containers within SSIS Packages

Control flow
Control flow contains one or more tasks & containers that execute when the package runs.
Control order or define the conditions for running the next task or container in the package control flow, we use precedence constraints to connect the tasks and containers in a package.
A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow.

Data flow
Data flow contains the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations
Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow.
Separate instance of the data flow engine is opened for each Data Flow task in a package.

Categories in the Data Flow
Data Sources, Transformations, and Data Destinations.

-Data Connection Errors - occur incase the connection manager cannot be initialized with the connection string. This applies to both Data Sources and Data Destinations along with Control Flows that use the Connection Strings.
-Data Transformation Errors - occur while data is being transformed over a Data Pipeline from Source to Destination.
-Expression Evaluation errors - occur if expressions that are evaluated at run time perform invalid

Environment variable
An environment variable configuration sets a package property equal to the value in an environment variable.
For configuring properties that are dependent on the computer that is executing the package.

Transformations available
AGGEGATE - It applies aggregate functions to Record Sets to produce new output records from aggregated values.
AUDIT - Adds Package and Task level Metadata - such as Machine Name, Execution Instance, Package Name, Package ID, etc..
CHARACTER MAP - Performs SQL Server level makes string data changes such as changing data from lower case to upper case.
CONDITIONAL SPLIT – Separates available input into separate output pipelines based on Boolean Expressions configured for each output.
COPY COLUMN - Add a copy of column to the output we can later transform the copy keeping the original for auditing.
DATA CONVERSION - Converts columns data types from one to another type. It stands for Explicit Column Conversion.
DATA MINING QUERY – Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.
DERIVED COLUMN - Create a new (computed) column from given expressions.
EXPORT COLUMN – Used to export a Image specific column from the database to a flat file.
FUZZY GROUPING – Used for data cleansing by finding rows that are likely duplicates.
FUZZY LOOKUP - Used for Pattern Matching and Ranking based on fuzzy logic.
IMPORT COLUMN - Reads image specific column from database onto a flat file.
LOOKUP - Performs the lookup (searching) of a given reference object set against a data source. It is used for exact matches only.
MERGE - Merges two sorted data sets into a single data set into a single data flow.
MERGE JOIN - Merges two data sets into a single dataset using a join junction.
MULTI CAST - Sends a copy of supplied Data Source onto multiple Destinations.
ROW COUNT - Stores the resulting row count from the data flow / transformation into a variable.
ROW SAMPLING - Captures sample data by using a row count of the total rows in dataflow specified by rows or percentage.
UNION ALL - Merge multiple data sets into a single dataset.
PIVOT – Used for Normalization of data sources to reduce analomolies by converting rows into columns
UNPIVOT – Used for demoralizing the data structure by converts columns into rows incase of building Data Warehouses.

Logging features writes log entries when run-time events occur and can also write custom messages. This is not enabled by default.
Integration Services supports a diverse set of log providers, and gives you the ability to create custom log providers.
The Integration Services log providers can write log entries to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML files.
Logs are associated with packages and are configured at the package level.
Each task or container in a package can log information to any package log.
The tasks and containers in a package can be enabled for logging even if the package is not logging.

Deploy packages
BUILDing SSIS Projects generates a Deployment Manifest File.
Run the manifest file and decide whether to deploy this onto File System or onto SQL Server [ msdb].
SQL Server Deployment is very faster and more secure then File System Deployment.
Alternatively, import the package from SSMS from File System or SQ Server.

Variables & variable scope
Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run time.
The scripts in the Script task and the Script component can also use variables.
The precedence constraints that sequence tasks and containers into a workflow can use variables when their constraint definitions include expressions.

Perfmon counterd
SQLServer:SSIS Service
SSIS Package Instances
SQLServer:SSIS Pipeline
BLOB bytes read
BLOB bytes written
BLOB files in use
Buffer memory
Buffers in use
Buffers spooled
Flat buffer memory
Flat buffers in use
Private buffer memory
Private buffers in use
Rows read
Rows written