What Is SSIS


o What is SSIS
o What is Data Integration
o Why SSIS
o How SSIS Works
o Requirements for SQL Server Integration Services
o What is the SSIS Package
o SSIS Tasks
o Example of Data Flow Task

What is SSIS?
o SSIS stands for SQL Server Integration Services.
o It is a component of the Microsoft SQL Server database software that performs a
wide range of integration tasks.
o It is a data warehousing tool used to extract data, transform it (cleaning,
aggregating, merging, and so on), and load it into another database.
o SSIS also includes graphical tools and wizards, along with workflow functions such
as sending email messages, running FTP operations, and connecting to data sources.
o Taken together, these features make SSIS well suited to data migration.

SSIS is mainly used for two purposes:

o Data Integration
SSIS performs data integration by combining data from multiple sources and
providing unified data to users.
o Workflow
A workflow runs specific steps, or follows a particular path, based on a time
period, a parameter passed to the package, or data queried from the database.
Workflows can automate the maintenance of SQL Server databases and update
multidimensional analytical data.

What is Data Integration?


Data Integration is the process of combining data from multiple sources. The data
can be heterogeneous or homogeneous, and it can be structured, semi-structured, or
unstructured. In Data Integration, data from dissimilar sources is combined to form
meaningful data.

The following methods help achieve data integration:

o Data Modelling: You first create a data model and then perform operations on it.
o Data Profiling: Data profiling checks the available data for errors,
inconsistencies, and variations. It helps ensure data quality, where data quality
refers to the accuracy, consistency, and completeness of the data.
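
As a rough illustration of data profiling, the sketch below counts missing values and duplicate records in a small in-memory data set. It is plain Python rather than an SSIS feature, and the `profile` function, the field names, and the sample records are all assumptions made for this example.

```python
from collections import Counter


def profile(rows, required_fields):
    """Return simple data-quality counts for a list of dict records."""
    missing = Counter()
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            # Treat None and blank strings as missing (completeness check).
            if value is None or str(value).strip() == "":
                missing[field] += 1
    # A record counts as a duplicate when all required fields repeat exactly.
    keys = Counter(tuple(row.get(f) for f in required_fields) for row in rows)
    duplicates = sum(count - 1 for count in keys.values() if count > 1)
    return {"rows": len(rows), "missing": dict(missing), "duplicates": duplicates}


# Hypothetical sample data: one blank email and one repeated record.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "a@example.com"},
]
report = profile(customers, ["id", "email"])
print(report)
```

A report like this is the kind of input a cleansing step would use to decide which rows to fix or reject before integration.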

Advantages of Data Integration:

o Reduce data complexity
Data integration manages complexity and streamlines connections, which makes it
easy to deliver the data to any system.
o Data integrity
Data integrity plays a major role in data integration. It deals with cleansing and
validating the data. Everyone wants high-quality, robust data, and data integration
helps achieve it by removing errors, inconsistencies, and duplication.
o Easy data collaboration
Accessibility is part of data collaboration. It means that the data can be easily
transformed, and people can easily integrate the data into projects, share
their results, and keep the data up to date.
o Smarter business decisions
Integrated data also supports smarter decisions. Because information moves through
the company in one consistent form, it is easier to understand and more informative.

Why SSIS?
SSIS is used for the following reasons:
o Data can be loaded in parallel to many varied destinations
SSIS combines data from multiple data sources into a single structure with a
unified view. It is responsible for collecting the data, extracting it from
multiple data sources, and merging it into a single data source.
o Integration with other products
SSIS integrates tightly with other Microsoft products.
o Cheaper than other ETL tools
SSIS is cheaper than most other ETL tools, and it can compete with them on
manageability, business intelligence, and so on.
o Complex error handling within dataflows
SSIS lets you handle complex errors within a dataflow. You can start and
stop the dataflow based on the severity of the error, and you can send an email to
an administrator when an error occurs. Once the error is resolved, you can choose
the path at which the workflow resumes.
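
The severity-based error handling described above can be sketched in plain Python. This is only an analogy for what SSIS does inside a dataflow, not SSIS code: the `RowError` class, the `fatal` flag, and the `notify` callback (standing in for an email to an administrator) are hypothetical names chosen for the example.

```python
class RowError(Exception):
    """An error raised while processing one row; fatal errors stop the flow."""

    def __init__(self, message, fatal=False):
        super().__init__(message)
        self.fatal = fatal


def parse_amount(row):
    if row["amount"] is None:
        # Assumption for this sketch: a missing column is treated as severe.
        raise RowError("source column missing", fatal=True)
    try:
        return float(row["amount"])
    except ValueError:
        raise RowError(f"bad amount {row['amount']!r}") from None


def run_dataflow(rows, notify):
    loaded, rejected = [], []
    for row in rows:
        try:
            loaded.append({**row, "amount": parse_amount(row)})
        except RowError as err:
            notify(f"row {row['id']}: {err}")  # e.g. email an administrator
            if err.fatal:
                break  # severe error: stop the dataflow
            rejected.append(row)  # minor error: redirect the row and continue
    return loaded, rejected


messages = []
loaded, rejected = run_dataflow(
    [
        {"id": 1, "amount": "10.5"},
        {"id": 2, "amount": "abc"},
        {"id": 3, "amount": None},
        {"id": 4, "amount": "7"},
    ],
    notify=messages.append,
)
print(loaded, rejected, messages)
```

In this run the bad value in row 2 is redirected to the rejected list, while the fatal error in row 3 stops the flow before row 4 is read.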

History of SSIS
Before SSIS, SQL Server used Data Transformation Services (DTS), which was part of
SQL Server 7 and 2000.

Version            Detail

SQL Server 2005    The Microsoft team decided to revamp DTS. However, instead of
                   updating DTS, they named the new product Integration
                   Services (SSIS).

SQL Server 2008    Many performance improvements were made to SSIS. New sources
                   were also introduced.

SQL Server 2012    This was the biggest release for SSIS. It introduced the
                   project deployment model, which deploys entire projects and
                   their packages to a server instead of individual packages.

SQL Server 2014    Few changes were made to SSIS in this version. New sources
                   and transformations were added through separate downloads
                   from CodePlex or the SQL Server Feature Pack.

SQL Server 2016    This version lets you deploy individual packages instead of
                   entire projects. It adds new sources, especially cloud and
                   big data sources, and makes a few changes to the catalog.

How SSIS Works

SSIS is a platform for two functions: data integration and workflow. Both data
transformation and workflow creation are carried out using an SSIS package. An SSIS
package consists of three components:

Operational data
An operational data store is a database that integrates data from multiple data
sources so that additional operations can be performed on it. It is where data is
held for current operations before being sent to the data warehouse for storage,
reporting, or archiving.
ETL
o ETL is the most important process in SSIS. ETL stands for Extract, Transform,
and Load, and it moves data into a data warehouse.
o ETL pulls data out of multiple data sources, transforms it into useful data, and
then stores it in a data warehouse. The source data can be in any format, such as
an XML file, a flat file, or a database file.
o It also ensures that the data stored in the data warehouse is relevant, accurate,
high quality, and useful to business users.
o It makes the data easy to access so that the data warehouse can be used
effectively and efficiently.
o It also helps the organization make data-driven decisions by retrieving
structured and unstructured data from multiple data sources.
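
The extract, transform, and load steps above can be sketched as three small functions. This is a minimal Python illustration, not an SSIS package: it assumes an in-memory CSV string as the flat-file source and an in-memory SQLite database as a stand-in for the data warehouse, and the `sales` table and its column names are made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; the second row has no amount.
SOURCE_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,
3,Carol,7.25
"""


def extract(text):
    """Extract: read the flat file into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, trim names, and convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows with no amount
        cleaned.append((int(row["id"]), row["name"].strip(), float(row["amount"])))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
print(total)
```

Keeping the three stages as separate functions mirrors how a data flow keeps sources, transformations, and destinations as separate components.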


Requirements for SQL Server Integration Services


The following are required to install SQL Server Integration Services:

o Install SQL Server
o Install SQL Server Data Tools (SSDT)

Follow these steps to install SQL Server Data Tools:

Step 1: Open https://docs.microsoft.com/en-us/sql/ssdt/previous-releases-of-sql-server-data-tools-ssdt-and-ssdt-bi?view=sql-server-2017 to download SQL Server Data Tools.

Step 2: On the page that opens, select the version of SSDT that you want to install.

Step 3: Once the download completes, run the downloaded file to start the installer.

Step 4: Click the Next button.

Step 5: Select the Visual Studio instance and the tools that you want to install in
Visual Studio 2017.

Step 6: Click the Install button.

SSIS Architecture
The following are the components of the SSIS architecture:

o Control Flow (stores containers and tasks)
o Data Flow (source, destination, transformations)
o Event Handler (sending of messages and emails)
o Package Explorer (offers a single view of everything in the package)
o Parameters (user interaction)

Let's look at each component in detail:

1. Control Flow

Control flow is the brain of an SSIS package. It arranges the order of execution
for all of the package's components. These components are containers and tasks,
which are managed by precedence constraints.

2. Precedence Constraints

Precedence constraints are package components that direct tasks to execute in a
predefined order. They also define the workflow of the entire SSIS package. A
precedence constraint links two tasks and runs the destination task based on the
result of the earlier task, or on business rules defined using special expressions.
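
To make the idea concrete, here is a small Python sketch of tasks linked by constraints on the outcome of an earlier task. It is an analogy, not the SSIS object model: `run_package`, the task names, and the tuple layout of `constraints` are assumptions for this example, and expression-based constraints are left out.

```python
def run_package(tasks, constraints):
    """Run tasks in order, honouring (upstream, downstream, outcome) constraints.

    outcome is "success", "failure", or "completion" (either result).
    A task runs only when every inbound constraint is satisfied.
    """
    results = {}
    for name, action in tasks.items():
        inbound = [(up, want) for up, down, want in constraints if down == name]
        satisfied = all(
            up in results and (want == "completion" or results[up] == want)
            for up, want in inbound
        )
        if not satisfied:
            continue  # the task is skipped, as if its constraint never fired
        try:
            action()
            results[name] = "success"
        except Exception:
            results[name] = "failure"
    return results


def load_file():
    raise IOError("file not found")  # simulate a failing first task


tasks = {
    "load_file": load_file,
    "update_warehouse": lambda: None,
    "send_alert": lambda: None,
    "write_log": lambda: None,
}
constraints = [
    ("load_file", "update_warehouse", "success"),
    ("load_file", "send_alert", "failure"),
    ("load_file", "write_log", "completion"),
]
results = run_package(tasks, constraints)
print(results)
```

Because the first task fails, the warehouse update is skipped, the alert runs, and the log task runs regardless of the outcome.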

3. Task

A task is an individual unit of work. It is similar to a method or function in a
programming language. However, in SSIS you don't write the task as code. Instead,
you drag and drop tasks onto the design surface and configure them.

4. Containers

Containers are units for grouping tasks together into units of work. Apart from
offering visual consistency, they also let you declare variables and event handlers
that are in the scope of that specific container.

The four types of containers in SSIS are:

o Sequence Container
o For Loop Container
o Foreach Loop Container
o Task Host Container

Sequence Container: lets you organize subsidiary tasks by grouping them, and lets
you apply transactions or assign logging to the container.

For Loop Container: provides the same functionality as the Sequence Container,
except that it also lets you run the tasks multiple times. The loop is driven by an
evaluation condition, such as counting from 1 to 100.

Foreach Loop Container: also allows looping. The difference is that instead of
using a condition expression, the loop runs over a set of objects, such as files in
a folder.

Task Host Container: wraps a single task. SSIS creates it implicitly for each task,
so you do not configure it separately.
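
The difference between the two loop containers can be sketched in Python. This is only an analogy: `for_loop` and `foreach_loop` are hypothetical helpers, where `init`, `condition`, and `step` stand in for the initial value, the evaluation condition, and the counter update of a For Loop, and the temporary folder of CSV files is made up for the example.

```python
import tempfile
from pathlib import Path


def for_loop(init, condition, step, body):
    """Repeat body while an evaluation condition on a counter holds."""
    counter = init
    while condition(counter):
        body(counter)
        counter = step(counter)


def foreach_loop(items, body):
    """Repeat body once per object in a collection."""
    for item in items:
        body(item)


# For Loop style: count from 1 to 3.
visited = []
for_loop(1, lambda i: i <= 3, lambda i: i + 1, visited.append)

# Foreach Loop style: visit every CSV file in a folder.
seen = []
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.csv", "b.csv"):
        (Path(folder) / name).write_text("id\n1\n")
    foreach_loop(sorted(Path(folder).glob("*.csv")), lambda p: seen.append(p.name))

print(visited, seen)
```

The first loop needs a counter and a stopping condition, while the second needs only the collection, which is why the Foreach style suits tasks such as processing every file in a folder.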

5. Data Flow

The main use of SSIS is to extract data into the server's memory, transform it,
and write it to another destination. If Control Flow is the brain, Data Flow is the
heart of SSIS.

6. Packages

Another core component of SSIS is the package. It is a collection of tasks that
execute in an orderly fashion. Precedence constraints help manage the order in
which the tasks execute.

A package can be saved to SQL Server, in the msdb database or the package catalog
database. It can also be saved as a .dtsx file, which is a structured file, much as
.rdl files are for Reporting Services.

7. Parameters

Parameters behave much like variables, with a few main exceptions. They can be set
easily from outside the package, and they can be designated as required values that
must be passed in for the package to start.

SSIS Tasks Types


In SSIS, you add tasks to the control flow. Different types of tasks perform
different kinds of work.

Some important SSIS tasks are listed below:

Task Name                            Description

Execute SQL Task                     Executes a SQL statement against a relational
                                     database.

Data Flow Task                       Reads data from one or more sources, transforms
                                     the data while it is in memory, and writes it
                                     out to one or more destinations.

Analysis Services Processing Task    Processes objects of an SSAS cube or a Tabular
                                     model.

Execute Package Task                 Executes other packages from within the same
                                     project.

Execute Process Task                 Runs an external program; you can specify
                                     command-line parameters.

File System Task                     Performs manipulations in the file system, such
                                     as moving, renaming, and deleting files and
                                     creating directories.

FTP Task                             Performs basic FTP operations.

Script Task                          A blank task in which you write .NET code
                                     (VB.NET or C#) in a Visual Studio environment
                                     to perform any work you need.

Send Mail Task                       Sends an email to notify users that the package
                                     has finished or that an error has occurred.

Bulk Insert Task                     Loads data into a table by using the bulk
                                     insert command.

Web Service Task                     Executes a method on a web service.

WMI Event Watcher Task               Lets the SSIS package wait for and respond to
                                     certain WMI events.

XML Task                             Merges, splits, or reformats an XML file.

Other Important ETL Tools

o SAP Data Services
o SAS Data Management
o Oracle Warehouse Builder (OWB)
o Informatica PowerCenter
o IBM InfoSphere Information Server
o Elixir Repertoire for Data ETL
o Sagent Data Flow

Advantages and Disadvantages of using SSIS


SSIS offers the following advantages:

o Broad documentation and support
o Ease and speed of implementation
o Tight integration with SQL Server and Visual Studio
o Standardized data integration
o Real-time, message-based capabilities
o Support for a distribution model
o Helps remove the network as a bottleneck when SSIS inserts data into SQL Server
o Lets you use the SQL Server Destination instead of OLE DB to load data faster

Summary
o SQL Server Integration Services (SSIS) is a component of the Microsoft SQL
Server database software
o SSIS can be used to conduct a wide range of data integration tasks
o SSIS helps you merge data from various data stores
o Important versions of SQL Server Integration Services are 2005, 2008, 2012,
2014, and 2016
o Studio environments, relevant data integration functions, and effective
implementation speed are some important features of SSIS
o Control Flow, Data Flow, Event Handler, Package Explorer, and Parameters
are essential SSIS architecture components
o Execute SQL Task, Data Flow Task, Analysis Services Processing Task,
Execute Package Task, Execute Process Task, File System Task, FTP Task,
Send Mail Task, and Web Service Task are some important SSIS tasks
o Broad documentation and support is one of the advantages of SSIS
o The biggest drawback of SSIS is that it lacks support for alternative data
integration styles
o SAP Data Services, SAS Data Management, Oracle Warehouse Builder
(OWB), Informatica PowerCenter, and IBM InfoSphere Information Server are other
important ETL tools
o SSIS is an in-memory pipeline. Therefore, it's essential to make sure that all
transformations occur in memory
