Hi All,

Can anyone please point me in the right direction? I have to remove duplicate rows from a flat file source that is imported into SQL Server using SSIS.

The problem is that I have to remove any duplicate rows from the input source, i.e. the flat file, before starting the incremental load.

I googled and found that adding a Sort transformation with the remove-duplicates option would work, but it drastically slows down execution. A few links suggest a Script transformation or derived columns, but I fear that won't work either. I need to keep the execution time low because I have millions of records.

Any directions would be great. Thanks.
Comments
Herman<T>.Instance 29-Jun-12 5:35am    
Please share the table structure and the combination of fields that makes a row unique.
Tamil Selvan K 29-Jun-12 7:24am    
It's a flat file, and I just need to check for duplicate records by EmployeeID, that's it.

What you can do is, after the import, copy the data into another table that has a unique identifier column.
With the statement below you can find the duplicates, and because each row has a unique ID you know exactly which rows to delete.
After that you can load the remaining rows into the table that needs the data.

SQL
-- List the duplicates: every row ranked 2 or higher within its EmployeeID group.
SELECT uniqueID, EmployeeID
FROM
(
    SELECT ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY EmployeeID) AS Rowno,
           uniqueID, EmployeeID
    FROM MyTableWithUniqueIDAdded
) AS t
WHERE t.Rowno > 1
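
If you would rather delete the duplicates in place instead of just listing them, the same ROW_NUMBER() pattern works inside a CTE. This is only a sketch using the same illustrative table and column names as above (MyTableWithUniqueIDAdded, uniqueID, EmployeeID); in T-SQL, deleting from a CTE that reads a single table deletes the underlying rows.

SQL
-- Keep the lowest uniqueID per EmployeeID and delete every other row.
;WITH Dupes AS
(
    SELECT uniqueID,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY uniqueID) AS Rowno
    FROM MyTableWithUniqueIDAdded
)
DELETE FROM Dupes
WHERE Rowno > 1;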


Clear enough?
 
Thanks for the reply, digimanus. I finally used http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/0d730d7c-0f15-4a97-9701-e66590fc2aa4
Since performance was key, I just used a staging table, and used SSIS to remove the duplicate records from the staging table with an Execute SQL Task.
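
For anyone following the same route, the statement run from the Execute SQL Task could look roughly like the sketch below. The staging table and column names (dbo.StagingEmployee, EmployeeID) are placeholders, not taken from the original post.

SQL
-- 1) Execute SQL Task before the Data Flow: start with an empty staging table.
TRUNCATE TABLE dbo.StagingEmployee;

-- 2) Data Flow Task: load the flat file into dbo.StagingEmployee.

-- 3) Execute SQL Task after the Data Flow: keep one row per EmployeeID,
--    then run the incremental load from the de-duplicated staging table.
;WITH Dupes AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY EmployeeID) AS Rowno
    FROM dbo.StagingEmployee
)
DELETE FROM Dupes
WHERE Rowno > 1;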
 