Showing posts with label Data Flow Task. Show all posts
Showing posts with label Data Flow Task. Show all posts

April 17, 2014

SQL SERVER: SSIS – Merge Join Transformation

Now, let’s have a look at functionality of Merge Join Transformation task in SSIS.

Benefit of using Merge join is, input datasets can be combination of any two datasets from (Excel file, XML file, OLEDB table, Flat file).Output can be result of INNER, LEFT Outer, or FULL Outer Join on both the datasets.

Merge Join Transformation has two inputs and one output. It does not support an error output.

Use of Merge Join Transformation:

Merge Join is a two-step process. First step is to sort both the input datasets(tables) in the same order, and the second step is apply merge join on the common key.Here rows from both the sorted inputs get matched together.

To Understand Merge Join Transformation in better way, lets take an example with various configuration parameters in SSIS.

1. Create sample tables:

Now we will create input tables named “Department” and “Employee” in Test database.

  1: CREATE TABLE Department
  2: (
  3: 	Dept_No INT
  4: 	,Dept_Name VARCHAR(50)
  5: 	,Location VARCHAR(50) 
  6: 	CONSTRAINT PK_DEPT PRIMARY KEY (Dept_No)
  7: )
  8:  
  9: INSERT INTO Department VALUES (10, 'ACCOUNTING', 'Mumbai')
 10: INSERT INTO Department VALUES (20, 'RESEARCH',   'Delhi')
 11: INSERT INTO Department VALUES (30, 'SALES',      'Mexico')
 12: INSERT INTO Department VALUES (40, 'OPERATIONS', 'Sydney')
 13: GO
 14: 
 15: CREATE TABLE Employee
 16: (	
 17: 	Emp_No INT NOT NULL
 18: 	,Emp_Name VARCHAR(100)
 19: 	,Designation VARCHAR(50)
 20: 	,Manager INT
 21: 	,JoinDate DATE DEFAULT GETDATE()
 22: 	,Salary INT
 23:     ,Dept_No INT
 24:     CONSTRAINT PK_Employee PRIMARY KEY (Emp_No)
 25:     ,CONSTRAINT FK_Dept_No FOREIGN KEY (Dept_No) REFERENCES Department(Dept_No)
 26: )
 27: 
 28: INSERT INTO Employee
 29: 		(Emp_No,Emp_Name,Designation,Manager,Salary,Dept_No)
 30:  VALUES
 31:     (101, 'Tejas', 'MANAGER', 104, 4000, 20)
 32:     ,(102, 'Michel', 'ANALYST', 101, 1600,  30)
 33:     ,(103, 'Mark', 'DEVELOPER',102, 1250,  30)
 34:     ,(104, 'James', 'DIRECTOR',106, 2975,  10)
 35:     ,(105, 'Raj', 'ANALYST',7566, 3000,  20)
 36:     ,(106, 'TechnoBrains', 'PRESIDENT', NULL, 5000, 40)
 37: GO

2. Create Data Source Connection:

Select and drag “Data Flow Task”, from “Control Flow Items” to designer surface. Then double click it and Create a New OLEDB connection.

3. Select Input Data Sources:


Select two different Data Sources which you need to perform merge join on as “OLE_SRC_Employee” and “OLE_SRC_Department”. Create a new “OLEDB Connection” to map it to the source datasets.


SQL Yoga - Merge Join Transformation #1


4. OLEDB Source Editor:


Now double click on “OLEDB Source”, it will open “OLEDB Source Editor” in that provide table configuration parameters and columns mapping from “Columns” tab.


5. Data Sorting:


As the Merge Join Transformation accepts the sorted data as input, we will add the sort transformation in the flow. If you know that the data is already sorted then you can set “isSorted” Property as “True” in the “Advanced Editor” for OLEDB Source of the respective dataset. Or else you can use the Sort Transformation task from “Data Flow” Transformation.

Now we need to add two Sort components and join the green arrow pipeline from “Employee” to one of the sort transformation and other pipeline from “Department” to the other Sort Transformation.

SQL Yoga - Merge Join Transformation #2

6. Sort Transformation Editor Source 1:

In order to get sorted data, Double click on the “Sort Transformation” that we have connected to “Employee” Dataset to provide the key on which you want to perform sort so that data gets re-ordered in sorted form based on the keys provided. Provide the Sort type as well as sort order if there are multiple keys on which Sort operation will work.

SQL Yoga - Merge Join Transformation #3

7. Sort Transformation  Editor Source 2:


Now we have “Employee” table data in sorted form, in the same way need to configure the sort transformation for Source 2 “Department”.


For the same double click on the “Sort Transformation” which is connected to “Department” dataset, to provide the Sort key and order in which you want to perform the sort in “Sort Type” property in Editor. Please keep in mind that the Sort type for both the source needs to be of the same type. i.e. any one of ascending or descending order.


SQL Yoga - Merge Join Transformation #4


8. Merge Join Task Component:


Now we will add Merge Join Transformation, so that we can join both the sources together.Drag the pipeline from Employee sort to Merge Join. In “Input Output Selection” popup select Output as “Sort Output” and Input as “Merge Join Left Input”. In Input user has two options as




  1. Merge Join Left Input


  2. Merge Join Right Input

Using this two options user can specify whether the input needs to be considered as left or right side dataset result.


SQL Yoga - Merge Join Transformation #5


Now you need to drag the pipeline from other “Sort transformation” and connect it to “Merge Join Transformation” as second input. While connecting the second input to the Merge Join, it will not ask for the input type as you have already provided it for the first pipeline, so by default it will select the other type of input to the Merge Join. i.e. Left or Right accordingly.


SQL Yoga - Merge Join Transformation #6


9. Merge Join Transformation Editor:


In order to configure merge join double click on the “Merge Join Transformation” to open the Editor.You need to provide the Join Type to specify which type of join operation you want to perform on the selected dataset.


Different Join types are:



  1. Inner Join
  2. Left Outer Join
  3. Full Outer Join

Here we will select the “Inner Join” as Join Type as we need to display data from both the datasets. Select “Dept_No” as Join Key as it is the common field on which we can merge two datasets data.


SQL Yoga - Merge Join Transformation #7


10. Result table creation:


We need to create a table to store the output result into Test database as per the script provided.

  1: CREATE TABLE [Merge_Join_Output] 
  2: (
  3:     [Emp_No] INT,
  4:     [Emp_Name] VARCHAR(100),
  5:     [Designation] VARCHAR(50),
  6:     [Manager] INT,
  7:     [JoinDate] DATE,
  8:     [Salary] INT,
  9:     [Dept_No] INT,
 10:     [Dept_Name] VARCHAR(50),
 11:     [Work_Location] VARCHAR(50)
 12: )
 13: GO

 11. Select “OLEDB Destination Editor” to redirect your output to the “Merge_Join_Output” table as shown. In “Mappings” tab map the output columns accordingly.


SQL Yoga - Merge Join Transformation #8


12. Package Execution:


Execute the package and check for the results in the “Merge_Join_Output” table.SQL Yoga - Merge Join Transformation #9


13.Result in database


After successful execution of the package, we can check the result in “Merge_Join_Output” table.


Query:

  1: -- OLEDB Table 1
  2: SELECT * FROM Employee
  3: 
  4: -- OLEDB Table 2
  5: SELECT * FROM Department
  6: 
  7: -- Output data after Merge Join Operation
  8: SELECT * FROM Merge_Join_Output
  9: GO

SQL Result:


SQL Yoga - Merge Join Transformation #10


In this way we get the Merge Join result by combining both the tables data based on common data, such that it becomes easier to navigate information from the single merged table, instead of referring two different tables and link the related data.


Reference: Tejas Shah (www.SQLYoga.com)

March 31, 2014

SQL SERVER: SSIS - Look Up Transformation Task

Today, I am going to give basic example of Lookup Transformation Task in SSIS.

Lookup transformation performs lookup operation by joining data in input columns with reference table dataset columns.Lookup can be used to access addition information from the reference dataset based on the matching criteria Reference dataset can be OLEDB table, Excel file or cache file, or SQL query result.

Use of Look up Transformation:

In my source system (table), I have all the product with their details. Somehow I have products which belongs to the country which doesn’t exist in my reference (master) table. I assigned a job to rectify those products. I need to design ETL which gives me those records whenever we import products to our target database (table). So here, I am going to use “Lookup no match output” to capture those records by following steps:

Let’s take an example to easily understand how to use Lookup Transformation in SSIS.

1. Create Source Connection:

Select and drag “Data Flow Task”, from “Control Flow Items” to designer surface. Then double click it and Create a New OLEDB connection.

2. Create sample tables

Now we will create tables named ‘LKP_Countries_Source’ and ‘LKP_Countries’ into Test Database from the given script.

  1: CREATE TABLE [LKP_Countries_Source]
  2: (
  3: 	[CountryCode] [int] NULL,
  4: 	[CountryName] [varchar](100) NULL
  5: )
  6: GO 
  7: 
  8: INSERT INTO [LKP_Countries_Source] 
  9: (	[CountryCode]
 10: 	,[CountryName])
 11: VALUES 
 12: 	(91, N'India')
 13: 	,(92, N'Pakistan')
 14: 	,(93, N'Afghanistan')
 15: 	,(94, N'Sri Lanka')
 16: 	,(95, N'Myanmar')
 17: 	,(960, N'Maldives')
 18: 	,(961, N'Lebanon')
 19: 	,(962, N'Jordan')
 20: 	,(963, N'Syrian Arab Republic')
 21: 	,(964, N'Iraq')
 22: 	,(965, N'Kuwait')
 23: 	,(966, N'Saudi Arabia')
 24: 	,(967, N'Yemen')
 25: 	,(968, N'Oman')
 26: 	,(971, N'United Arab Emirates')
 27: 	,(972, N'Israel')
 28: 	,(1, N'USA')
 29: 	,(65, N'Singapore')
 30: GO
  1: CREATE TABLE [LKP_Countries]
  2: (
  3: 	[Country] [varchar](100) NULL
  4: 	,[Code] [int] NULL
  5: )
  6: GO
  7: 
  8: INSERT [LKP_Countries] 
  9: (
 10: 	[Country]
 11: 	,[Code]
 12: ) 
 13: VALUES 
 14: 	(N'INDIA', 91)
 15: 	,(N'SINGAPORE', 65)
 16: 	,(N'USA', 1)
 17: 	,(N'PAKISTAN', 92)
 18: GO
3. Create Lookup connection:

Now you Need to select the proper OLEDB connection in “Connection Manager” tab and the source table for lookup task.

SQLYoga - Lookup Transformation Task #2  
4. Columns Selection from Source Table:

Select the columns to use as output columns.
 
SQLYoga - Lookup Transformation Task #3
5. Lookup Transformation Editor:

Here, I have added “Lookup Data Transformation” Task to designer tab and click on edit to configure the Lookup transformation.
SQLYoga - Lookup Transformation Task #4
6. Handle No Match output:

Now we need to configure various sections in the “Lookup Transformation Editor”.
In General section, select “Redirect rows to no match output” to handle the unmatched data from the lookup task.
 
SQLYoga - Lookup Transformation Task #5

Here we will need to select Cache mode as “Full Cache”. This option is used to improve the performance while handling large scale of data.
Keep connection type as “OLEDB Connection Manager”, as we are using OLEDB source. When you use Cache File as data source then you will need to select “Cache Connection Manager”
Last option provide various ways in which not matched data can be handled.

  • Ignore failure – ignores the failure and executes the next task.
  • Redirect rows to error output – moves the not matched rows to red output to handle them separately.
  • Fail component – throws an exception and stops processing further tasks.
  • Redirect rows to no match output – switches rows to the secondary output, and user can handle it differently to matching data

7. Set Connection Manager for Lookup table:

In Connection section select the reference table with proper connection. This list will get compared with source dataset for matching the data.

SQLYoga - Lookup Transformation Task #6

8. Column mapping for Lookup table:

In Columns section, select the available input columns and map it with the available lookup columns. This will create a join between 2 source datasets.We have used “Full Cache Mode”, so Advanced section will be disabled and in “Error Output” keep fields as it is.SQLYoga - Lookup Transformation Task #7

9. Input Output Selection Setup:

Select new “OLEDB Destination” transformation and drag it to the designer surface. Drag green arrow from “Lookup Transformation” Task to “OLEDB Destination” and provide “Lookup Match Output” as Output and click OK. SQLYoga - Lookup Transformation Task #8

10. Create Output table for Match and Not Matched Data:

Now we will need to create output tables to store the matched as well as not matched result.

  1: CREATE TABLE [LKP_Output_Match] 
  2: (
  3:     [CountryCode] INT,
  4:     [CountryName] VARCHAR(100),
  5:     [Country_Calling_Code] INT
  6: )
  7: GO
  8: 
  9: CREATE TABLE [LKP_Output_NO_Match]
 10: (
 11: 	[CountryCode] INT NULL,
 12: 	[CountryName] VARCHAR(100) NULL,
 13: )
 14: GO
11. Data Mapping for Result table:
Now I need to provide mapping for the Output Table to store the Matched Data.

SQLYoga - Lookup Transformation Task #9
12. Complete Data Flow for Lookup Transformation:

In order to handle the “Not Matched data”, provide the link of Not Matched data to OLEDB Destination table “LKP_Output_No_Match”. It will store the not matched results.

SQLYoga - Lookup Transformation Task #10 
13. Package Execution and Result:

Now let’s execute the Package and check with the output inside the tables we have created to store the result as in
LKP_Output_Match” table for Matched Data and “LKP_Output_NO_Match” table for Not Matched Data.


  1: SELECT * FROM LKP_Output_Match
  2: 
  3: SELECT * FROM LKP_Output_NO_Match
  4: GO
Result :
 
SQLYoga - Lookup Transformation Task #11Lookup Transformation can be used in various ways according to the requirements and can be implemented accordingly. This was just an understanding document for Lookup Transformation.

Reference: Tejas Shah (www.SQLYoga.com)

April 6, 2010

SQL SERVER SSIS: Get File Name with Flat File Source, Data Flow Component

As I explained earlier about For Each Loop Container, which process each file from selected folder.

There is a requirement to save File name along with record, so later on we can identify which record comes from which file.

Let me explain it how to achieve with FileNameColumnName property of Flat File Connection to get it easily.

1. Setup Flat File Connection with CSV file, as mentioned Basic Example of Data Flow Task.
2. Right click on Flat File Source, and click on Show Advanced Editor:
Derived FileName Show Advanced Editor
3. Click on "Component Properties" and go to FileNameColumnName, Custom properties:
  Derived FileName Component Properties



4. Setup FileNameColumnName value with desired column name. Let's say "File Name". Congratulations, This column is added to output list with actual filename of that connection.

image
Let me know your comments/questions.
 

October 24, 2009

SQL SERVER SSIS: Basic Example Of Data Flow Task

Today I am going to give Basic example of Data Flow Task in SSIS, import CSV file to SQL SERVER Database

Many times user asked to import their data from their files. They might have data in Microsoft Access, Microsoft Excel, CSV File, Text Tile etc..

Microsoft SQL SERVER, SSIS, has features to import data from any heterogonous format to SQL and also capabilities to export it to any format. That is the power of SSIS.

1. Drag Control: Drag Data Flow Task from Toolbox, to Design interface


Data Flow Task 1
2. Configure Data Flow Task: Once you have Data Flow Task on Design surface, Double click on it, or Right click on task and click on "Edit", it will redirect you to Data Flow Tab, as shown in Figure 2:

Data Flow Task 2 
As shown in above figure, SQL SERVER SSIS give features to import/export data from/to many formats. You can find the different formats that is supported by SSIS to import data on LEFT PANE.

3. Configure Source Connection: Here, I have selected "Flat File Source", as we need to import data from CSV File. Drag "Flat File Source" to design surface and right click on edit and select "Edit", you will get screen like:

 Data Flow Task 3
As shown in figure, Click on "New", It will open a form which allows us to select file which we want to import to our Database.

First, Enter connection manager name and description.
Then, select a file which you want to import to database.
Then, Select Locale and Code page as per your requirement, For this demo, I don't need to change anything.
Then, we need to select Format of the file. SSIS provide us three formats: Delimited, Fixed Width, Ragged Right. We need to import CSV file (Comma Separated file), so I have selected "Delimited".
Then, We need to specify Text Qualifier, Select based upon your requirements.
Then, we need to specify delimiter for Header row, as we have CSV  file, I have selected CR LF, means new row for header.
Then, SSIS also give us features to skip rows. By Header Rows to skip, it allows us to skip n number of rows.
At Last, If there is Header in column names, we just need to select this checkbox, "Column names in the data row". So SSIS understands to set the first row as Header row.

That's it for configure file, Now we need to specify delimiter for column, which comes on next section called "Columns"

 Data Flow Task 4


As shown in figure, Now we need to specify delimiter for Rows and Columns.
As we are going to import CSV file (comma separated file), Row Delimiter is "{CR}{LF}" and column delimiter is ",".

NOTE:
Also notice that, it display Column Names in Header, as we have checked, "Column Names in the First data row".

This is the data that we are going to import to SQL Database.

You can use Advanced and Preview tab, to configure each column and preview, respectively.

Now once, configuration is done, we need to select OLE DB Destination, as we need to import it to SQL SERVER database.

4. Configure Destination Connection: Here, I have selected "OLE DB Destination", as we need to import data from CSV File to SQL SERVER Database. Drag "OLE DB Destination" to design surface.

Data Flow Task 5 
Now, we need to design data flow. Here we need to import data from Flat File connection, I have dragged "Success" (Green Arrow), link to OLE Db Destination for column Mapping.

Now, we need to configure OLE DB destination, to insert data to SQL Database. Right click on it and click on "EDIT", it will open screen like this:

 Data Flow Task 6
Data Flow Task 7



I have entered SQL server Login information to connect to SQL SERVER where I need to import data from CSV file.

Once, SQL SERVER is configured, we need to select Table Name to which we need to insert data from CSV file. See the following screen, I have selected "Employee" Table: Please find script to create "Employee Table"

CREATE TABLE [dbo].[Employee](
    [EmpID] [int] IDENTITY(1,1) NOT NULL,
    [EmployeeName] [varchar](max) NULL,
    [EmployeeCode] [varchar](max) NULL,
    [EmployeeDesignation] [varchar](max) NULL,
    [CreatedDate] [datetime] DEFAULT GETDATE()
)

Data Flow Task 8

Now we have selected SQL SERVER table to which we need to import data, Now we need to map columns names, so SSIS will insert data accordingly.

Data Flow Task 9

By default, SSIS provides mapping for columns which have same name, In this case Column Mapping is ok, so we don't need to do anything. If you want to change mapping, you can do it by just changing links between Input Columns and Destination Columns.

NOTE: Here Destination table has two more columns, EmpID and CreatedDate, which is not available in source file, CSV file. We can just ignore it, as EmpID is Identity column while CreatedDate is GETDATE(). We have such columns in SQL database to keep track when this record is added.

That's it. We have configured Data Flow Task.

Now just right click on package and "Execute Package". You will find that Employee table will be populated with information from CSV file.

Let me know if you have any questions in designing SSIS Package.