March 19, 2009

SQL SERVER: Find Duplicate Rows with Row_Number() SQL SERVER 2005


This is very known question to find Duplicate Rows from the Table and Delete Duplicate rows from the table
I came to know about "ROW_NUMBER()" of SQL SERVER 2005.
Lets see how we can use this feature:
There is one Table:
CREATE TABLE FindDuplicates(
EmpID INT IDENTITY(1,1),
EmpName VARCHAR(500)
)

INSERT some data into the table:
INSERT INTO FindDuplicates VALUES('A')
INSERT INTO FindDuplicates VALUES('A')
INSERT INTO FindDuplicates VALUES('B')
INSERT INTO FindDuplicates VALUES('C')
INSERT INTO FindDuplicates VALUES('C')
INSERT INTO FindDuplicates VALUES('D')
INSERT INTO FindDuplicates VALUES('D')

Now we have data like:
Find Duplicate Rows
Now, we need to Find the duplicate Rows from the table. So we need to get "A", "C" and "D".
Lets see the use of Row_Number():
;With CTE AS(
SELECT ROW_NUMBER() OVER(PARTITION BY EmpName ORDER BY EmpName) As RowID,
EmpID,
EmpName
FROM FindDuplicates
)
SELECT *
FROM CTE
WHERE RowID > 1

Output will be like:
Find Duplicate Rows
So we can Delete Duplicate Rows as we have EmpID (Primary Key)
I Used:
PARTITION BY: which will find same rows and assign ID accordingly.
ORDER BY: In which order we want to assign ID. By Default its Ascending

6 comments:

  1. If you have am identity column you don't need anything else:

    CREATE TABLE FindDuplicates(
    EmpID INT IDENTITY(1,1),
    EmpName VARCHAR(500)
    )

    ----INSERT some data into the table:

    INSERT INTO FindDuplicates VALUES('A')
    INSERT INTO FindDuplicates VALUES('A')
    INSERT INTO FindDuplicates VALUES('B')
    INSERT INTO FindDuplicates VALUES('C')
    INSERT INTO FindDuplicates VALUES('C')
    INSERT INTO FindDuplicates VALUES('D')
    INSERT INTO FindDuplicates VALUES('D')
    INSERT INTO FindDuplicates VALUES('D')
    INSERT INTO FindDuplicates VALUES('E')
    INSERT INTO FindDuplicates VALUES('E')
    INSERT INTO FindDuplicates VALUES('E')
    INSERT INTO FindDuplicates VALUES('E')


    select
    max(EmpID)
    from
    FindDuplicates
    group by
    EmpName
    having
    count(*)>1
    while @@rowcount>0
    delete from
    FindDuplicates
    where
    EmpID in
    (
    select
    max(EmpID)
    from
    FindDuplicates
    group by
    EmpName
    having
    count(*)>1
    )

    select * from FindDuplicates

    ReplyDelete
  2. Hi,

    Yes we can also use the way that you suggested to remove duplicate rows on basis of IDENTITY column.

    But when we need to identify the duplicate rows based on another columns too, then it better to use ROW_NUMBER.

    You can also find that syntax simplicity with the use of ROW_NUMBER.

    Tejas

    ReplyDelete
  3. Hi All,

    I want to display all duplicate records in the table. My query has to fetch all the records which are duplicate(First Name or Last Name). Also I want the ability to also pull names where there might be a middle initial placed in the end of the first name field, (i.e., "Maria Z. " vs. "Maria") as well.

    Please guide me to find this.



    Table:

    ID FirstName LastName

    1 Zach H Hoffman

    2 Zach Hoffman

    3 Troy Hoffman

    4 Shawn Livermore

    5 Prem S

    6 Jony Hoffman H

    7 Zach Modan



    I need the query to filter.........

    ID FirstName LastName

    1 Zach H Hoffman

    2 Zach Hoffman

    3 Troy Hoffman

    6 Jony Hoffman H

    7 Zach Modan



    I hope this example will give you clear idea.....




    Thanks in Advance

    Prem

    ReplyDelete
  4. Can We Do The Following With Identity Column:

    Remove Duplicates While Keeping One Row Of Each Duplicate(I Mean, On Encountering Duplicate Record, We Should Keep The One Row Out Of Those Duplicate Records).

    ReplyDelete
  5. Hi Singh,

    Yes, we can do the same with the ROW_NUMBER. What we need to do is just write following query:

    ;With CTE AS(
    SELECT ROW_NUMBER() OVER(PARTITION BY EmpName ORDER BY EmpName) As RowID,
    EmpID,
    EmpName
    FROM FindDuplicates
    )
    DELETE
    FROM CTE
    WHERE RowID > 1

    This above query, will keep row having RowID = 1 and duplicate rows will be removed from the table: "FindDuplicates".

    Thanks,
    Tejas
    SQLYoga.com

    ReplyDelete