Now, let’s have a look at the functionality of the Merge Join Transformation in SSIS. The benefit of using Merge Join is that its two inputs can be any combination of datasets, for example from an Excel file, an XML file, an OLEDB table, or a flat file. The output can be the result of an INNER, LEFT OUTER, or FULL OUTER join of the two datasets. The Merge Join Transformation has two inputs and one output, and it does not support an error output.

Use of Merge Join Transformation: Merge Join is a two-step process. The first step is to sort both input datasets (tables) on the join key in the same order. The second step is to apply the merge join on that common key, where rows from the two sorted inputs are matched together. To understand the Merge Join Transformation better, let’s walk through an example and its configuration parameters in SSIS.

1. Create sample tables: We will create the input tables “Department” and “Employee” in the Test database.

```sql
CREATE TABLE Department
(
    Dept_No INT NOT NULL
    ,Dept_Name VARCHAR(50)
    ,Location VARCHAR(50)
    ,CONSTRAINT PK_DEPT PRIMARY KEY (Dept_No)
)

INSERT INTO Department VALUES (10, 'ACCOUNTING', 'Mumbai')
INSERT INTO Department VALUES (20, 'RESEARCH', 'Delhi')
INSERT INTO Department VALUES (30, 'SALES', 'Mexico')
INSERT INTO Department VALUES (40, 'OPERATIONS', 'Sydney')
GO

CREATE TABLE Employee
(
    Emp_No INT NOT NULL
    ,Emp_Name VARCHAR(100)
    ,Designation VARCHAR(50)
    ,Manager INT
    ,JoinDate DATE DEFAULT GETDATE()
    ,Salary INT
    ,Dept_No INT
    ,CONSTRAINT PK_Employee PRIMARY KEY (Emp_No)
    ,CONSTRAINT FK_Dept_No FOREIGN KEY (Dept_No) REFERENCES Department(Dept_No)
)

INSERT INTO Employee
    (Emp_No, Emp_Name, Designation, Manager, Salary, Dept_No)
VALUES
    (101, 'Tejas', 'MANAGER', 104, 4000, 20)
    ,(102, 'Michel', 'ANALYST', 101, 1600, 30)
    ,(103, 'Mark', 'DEVELOPER', 102, 1250, 30)
    ,(104, 'James', 'DIRECTOR', 106, 2975, 10)
    ,(105, 'Raj', 'ANALYST', 7566, 3000, 20)
    ,(106, 'TechnoBrains', 'PRESIDENT', NULL, 5000, 40)
GO
```

2. Create Data Source Connection: Drag a “Data Flow Task” from “Control Flow Items” onto the designer surface. Then double-click it and create a new OLEDB connection.

3. Select Input Data Sources: Add the two data sources that you want to merge, named “OLE_SRC_Employee” and “OLE_SRC_Department”, and use an “OLEDB Connection” to map them to the source datasets.

4. OLEDB Source Editor: Double-click each “OLEDB Source” to open the “OLEDB Source Editor”, provide the table configuration parameters, and map the columns on the “Columns” tab.

5. Data Sorting: Because the Merge Join Transformation accepts only sorted data as input, we will add Sort transformations to the flow. If you know that the data is already sorted, you can instead set the “IsSorted” property to “True” in the “Advanced Editor” of the respective OLEDB Source, and set “SortKeyPosition” on its key column. SSIS does not verify this flag, so the source data must really be ordered, for example by an ORDER BY clause in the source query. Otherwise, use the Sort transformation from the “Data Flow Transformations” toolbox. Add two Sort components, connect the green pipeline arrow from “Employee” to one Sort transformation, and connect the pipeline from “Department” to the other.

6. Sort Transformation Editor Source 1: Double-click the “Sort Transformation” connected to the “Employee” dataset and select the key to sort on (here Dept_No, the column we will join on); the rows are re-ordered by that key. Set the Sort Type, and also the Sort Order if the sort uses multiple keys.

7. Sort Transformation Editor Source 2: With the “Employee” data sorted, configure the Sort transformation for Source 2, “Department”, in the same way: double-click the “Sort Transformation” connected to the “Department” dataset and provide the sort key and its “Sort Type”. Keep in mind that both sources must use the same sort type, i.e. both ascending or both descending.

8. Merge Join Task Component: Now add the Merge Join Transformation so that both sources can be joined. Drag the pipeline from the Employee Sort to the Merge Join. In the “Input Output Selection” popup, select “Sort Output” as the Output and “Merge Join Left Input” as the Input. The Input list offers two options: “Merge Join Left Input” and “Merge Join Right Input”.
These two options specify whether the input is treated as the left or the right dataset of the join. Next, drag the pipeline from the other Sort transformation and connect it to the Merge Join Transformation as the second input. This time SSIS does not ask for the input type: because you already assigned one input to the first pipeline, it automatically assigns the remaining one (Left or Right) to the second.
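The two-step process described at the start (sort both inputs on the join key, then walk through them together and match rows) is the classic sort-merge join. Below is a minimal Python sketch of that idea, not of SSIS internals. It assumes rows are dicts, the join key is never NULL, and sorting is ascending; `merge_join` and `is_sorted` are hypothetical helper names, and the sample data is trimmed to the columns the join needs. The `is_sorted` check mirrors what the IsSorted flag promises but SSIS itself does not check.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def is_sorted(rows: List[Row], key: str) -> bool:
    """True if rows are in ascending order of `key` (what IsSorted=True promises)."""
    return all(rows[i][key] <= rows[i + 1][key] for i in range(len(rows) - 1))


def merge_join(left: List[Row], right: List[Row], key: str,
               join_type: str = "inner") -> List[Row]:
    """Sort-merge join of two inputs already sorted ascending on `key`.

    join_type: "inner", "left" (left outer) or "full" (full outer).
    Columns of an unmatched side are filled with None; for column names
    present on both sides, the left input's value wins.
    """
    if join_type not in ("inner", "left", "full"):
        raise ValueError(f"unsupported join type: {join_type}")
    if not (is_sorted(left, key) and is_sorted(right, key)):
        raise ValueError("both inputs must be sorted ascending on the join key")

    columns = list(left[0]) if left else []
    columns += [c for c in (right[0] if right else {}) if c not in columns]

    def combine(l_row: Optional[Row], r_row: Optional[Row]) -> Row:
        row: Row = dict.fromkeys(columns)  # every output column starts as None
        row.update(r_row or {})
        row.update(l_row or {})  # left input wins for shared names such as the key
        return row

    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:  # left row has no partner on the right
            if join_type in ("left", "full"):
                out.append(combine(left[i], None))
            i += 1
        elif lk > rk:  # right row has no partner on the left
            if join_type == "full":
                out.append(combine(None, right[j]))
            j += 1
        else:  # equal keys: pair every row of the left run with every row of the right run
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append(combine(l_row, r_row))
            i, j = i_end, j_end

    # Rows left over on one side have no partner on the other side.
    if join_type in ("left", "full"):
        out.extend(combine(row, None) for row in left[i:])
    if join_type == "full":
        out.extend(combine(None, row) for row in right[j:])
    return out


employees = [
    {"Emp_No": 101, "Emp_Name": "Tejas", "Dept_No": 20},
    {"Emp_No": 102, "Emp_Name": "Michel", "Dept_No": 30},
    {"Emp_No": 103, "Emp_Name": "Mark", "Dept_No": 30},
    {"Emp_No": 104, "Emp_Name": "James", "Dept_No": 10},
    {"Emp_No": 105, "Emp_Name": "Raj", "Dept_No": 20},
    {"Emp_No": 106, "Emp_Name": "TechnoBrains", "Dept_No": 40},
]
departments = [
    {"Dept_No": 10, "Dept_Name": "ACCOUNTING", "Location": "Mumbai"},
    {"Dept_No": 20, "Dept_Name": "RESEARCH", "Location": "Delhi"},
    {"Dept_No": 30, "Dept_Name": "SALES", "Location": "Mexico"},
    {"Dept_No": 40, "Dept_Name": "OPERATIONS", "Location": "Sydney"},
]

# Step 1: sort both inputs on the join key (the role of the two Sort transformations).
left = sorted(employees, key=lambda r: r["Dept_No"])
right = sorted(departments, key=lambda r: r["Dept_No"])

# Step 2: merge them, with Employee as the left input.
result = merge_join(left, right, "Dept_No", "inner")
for row in result:
    print(row["Emp_Name"], row["Dept_No"], row["Dept_Name"])
```

Because each input is read once, in key order, the merge needs no lookup structure; that is why both inputs must be sorted the same way. Several employees sharing one department are handled by pairing whole runs of equal keys.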
9. Merge Join Transformation Editor: To configure the merge join, double-click the “Merge Join Transformation” to open its editor. Set the Join Type to specify which join operation to perform on the selected datasets. The available join types are Inner join, Left outer join, and Full outer join.
Here we select “Inner Join” as the Join Type, because we want only the employees that have a matching department, with columns from both datasets. Select “Dept_No” as the Join Key, since it is the common field on which the two datasets can be merged.

10. Result table creation: Create a table in the Test database to store the output, using the following script:

```sql
CREATE TABLE [Merge_Join_Output]
(
    [Emp_No] INT,
    [Emp_Name] VARCHAR(100),
    [Designation] VARCHAR(50),
    [Manager] INT,
    [JoinDate] DATE,
    [Salary] INT,
    [Dept_No] INT,
    [Dept_Name] VARCHAR(50),
    [Work_Location] VARCHAR(50)
)
GO
```

11. OLEDB Destination Editor: Add an OLEDB Destination and, in the “OLEDB Destination Editor”, redirect the output to the “Merge_Join_Output” table. On the “Mappings” tab, map the output columns to the table columns (Department’s Location column goes to Work_Location).
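For comparison, the data flow configured in steps 9 to 11 (inner join on Dept_No, with Location mapped to Work_Location) produces the same rows as a single set-based INSERT ... SELECT. The sketch below is an illustration under stated assumptions: it uses Python’s built-in sqlite3 module with an in-memory database as a stand-in for the SQL Server Test database, so the T-SQL types are simplified (TEXT instead of VARCHAR, CURRENT_DATE instead of GETDATE()).

```python
import sqlite3

# In-memory SQLite database standing in (by assumption) for the SQL Server Test database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (
    Dept_No   INTEGER PRIMARY KEY,
    Dept_Name TEXT,
    Location  TEXT
);
CREATE TABLE Employee (
    Emp_No      INTEGER PRIMARY KEY,
    Emp_Name    TEXT,
    Designation TEXT,
    Manager     INTEGER,
    JoinDate    TEXT DEFAULT CURRENT_DATE,
    Salary      INTEGER,
    Dept_No     INTEGER REFERENCES Department(Dept_No)
);
CREATE TABLE Merge_Join_Output (
    Emp_No INTEGER, Emp_Name TEXT, Designation TEXT, Manager INTEGER,
    JoinDate TEXT, Salary INTEGER, Dept_No INTEGER,
    Dept_Name TEXT, Work_Location TEXT
);
INSERT INTO Department VALUES
    (10, 'ACCOUNTING', 'Mumbai'), (20, 'RESEARCH', 'Delhi'),
    (30, 'SALES', 'Mexico'), (40, 'OPERATIONS', 'Sydney');
INSERT INTO Employee (Emp_No, Emp_Name, Designation, Manager, Salary, Dept_No) VALUES
    (101, 'Tejas', 'MANAGER', 104, 4000, 20),
    (102, 'Michel', 'ANALYST', 101, 1600, 30),
    (103, 'Mark', 'DEVELOPER', 102, 1250, 30),
    (104, 'James', 'DIRECTOR', 106, 2975, 10),
    (105, 'Raj', 'ANALYST', 7566, 3000, 20),
    (106, 'TechnoBrains', 'PRESIDENT', NULL, 5000, 40);
""")

# Steps 9-11 as one statement: inner join on Dept_No, Location written to Work_Location.
conn.execute("""
INSERT INTO Merge_Join_Output
SELECT e.Emp_No, e.Emp_Name, e.Designation, e.Manager, e.JoinDate,
       e.Salary, e.Dept_No, d.Dept_Name, d.Location
FROM Employee AS e
INNER JOIN Department AS d ON e.Dept_No = d.Dept_No
""")

rows = conn.execute(
    "SELECT Emp_No, Dept_No, Dept_Name, Work_Location "
    "FROM Merge_Join_Output ORDER BY Emp_No"
).fetchall()
for row in rows:
    print(row)
```

When both tables already live in the same database, a set-based join like this is often the simpler option; the Merge Join Transformation earns its keep when the two inputs come from different sources, such as a flat file and an OLEDB table.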
12. Package Execution: Execute the package.

13. Result in database: After the package executes successfully, check the result in the “Merge_Join_Output” table alongside the two input tables:

```sql
-- OLEDB Table 1
SELECT * FROM Employee

-- OLEDB Table 2
SELECT * FROM Department

-- Output data after Merge Join Operation
SELECT * FROM Merge_Join_Output
GO
```
In this way, the Merge Join combines the data of both tables on their common key, so that related information can be read from a single merged table instead of referring to two different tables and linking the related rows by hand.

Reference: Tejas Shah (www.SQLYoga.com)
April 17, 2014
SQL SERVER: SSIS – Merge Join Transformation
Labels: Data Flow Task, Data Transformation, Merge Join, Sort, SQL Server, SQL Yoga, SSIS, Tejas Shah