Hi All,

Can anyone please point me in the right direction? I have to remove duplicate rows from a flat file source that is imported into SQL Server using SSIS.

The problem is that I have to remove any duplicate rows from the input source, i.e. the flat file, before starting the incremental load.

I googled and found that adding a Sort transformation with the remove-duplicates option would work, but it drastically slows down execution. A few links suggest a Script transformation or derived columns, but I fear that won't work either. I need to keep the execution time low because I have millions of records.

Any directions would be great. Thanks.
Comments
Herman<T>.Instance 29-Jun-12 5:35am    
Please share the table structure and the combination of fields that makes a row unique.
Tamil Selvan K 29-Jun-12 7:24am    
It's a flat file, and I just need to check for duplicate records by EmployeeID, that's it.

What you can do is, after the import, copy the data into another table that has a unique identifier column.
With the statement below you can find the duplicates, and because each row has a unique ID you know exactly which rows to delete.
After that you can load the remaining rows into the table that needs the data.

SQL
-- List the duplicates: every row ranked 2 or higher within its EmployeeID group.
SELECT uniqueID, EmployeeID
FROM
(
    SELECT ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY EmployeeID) AS Rowno,
           uniqueID, EmployeeID
    FROM MyTableWithUniqueIDAdded
) AS t
WHERE t.Rowno > 1
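
If you would rather delete the duplicates in place instead of just listing them, the same ROW_NUMBER() pattern works inside a CTE. This is only a sketch using the same illustrative table and column names as above (MyTableWithUniqueIDAdded, uniqueID, EmployeeID); in T-SQL, deleting from a CTE that reads a single table deletes the underlying rows.

SQL
-- Keep the lowest uniqueID per EmployeeID and delete every other row.
;WITH Dupes AS
(
    SELECT uniqueID,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY uniqueID) AS Rowno
    FROM MyTableWithUniqueIDAdded
)
DELETE FROM Dupes
WHERE Rowno > 1;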


Clear enough?
 
Thanks for the reply, digimanus. I finally used http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/0d730d7c-0f15-4a97-9701-e66590fc2aa4
Since performance was key, I just used a staging table, and used SSIS to remove the duplicate records from the staging table with an Execute SQL Task.
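
For anyone following the same route, the statement run from the Execute SQL Task could look roughly like the sketch below. The staging table and column names (dbo.StagingEmployee, EmployeeID) are placeholders, not taken from the original post.

SQL
-- 1) Execute SQL Task before the Data Flow: start with an empty staging table.
TRUNCATE TABLE dbo.StagingEmployee;

-- 2) Data Flow Task: load the flat file into dbo.StagingEmployee.

-- 3) Execute SQL Task after the Data Flow: keep one row per EmployeeID,
--    then run the incremental load from the de-duplicated staging table.
;WITH Dupes AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY EmployeeID) AS Rowno
    FROM dbo.StagingEmployee
)
DELETE FROM Dupes
WHERE Rowno > 1;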
 