Quote:
My SQL query is returning around 6 lac records.
You should have a really good reason to fetch that many records; are you building a reporting server?
Quote:
I have to remove rows which Is having repeated value for one of the columns : AccountID.
As Maciej already mentioned, do this operation in the database. The reason is quite obvious:
Quote:
I am iterating through the dataset in C# code using for loop and deleting records by checking value of the column AccountID if it already exists
See for yourself: you ask the server to give you 600,000 (6 lac) records. Each record takes time to read, and then takes time on the network as well (you might be running this locally, but your users will be on a remote connection). Once that is done, you then force your CPU to find the duplicates and remove them. That is the worst way to write a SQL query and then de-duplicate the results. A better way would be to use SELECT DISTINCT, or to group the data together with GROUP BY.
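For example, a minimal sketch in T-SQL, assuming a hypothetical dbo.Accounts table (the table and column names other than AccountID are illustrative only):

```sql
-- Return each AccountID only once
SELECT DISTINCT AccountID
FROM dbo.Accounts;

-- Or aggregate per AccountID, keeping one summary row per account
SELECT AccountID, COUNT(*) AS RowsPerAccount
FROM dbo.Accounts
GROUP BY AccountID;
```

Either way, the filtering happens on the server, and only the de-duplicated rows ever cross the network.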
Quote:
This process is taking very long time. Pls suggest alternate ways to remove the duplicate records based on the value of column: AccountID.
Do you want to permanently remove the records? I doubt it; more likely you just want the data aggregated and de-duplicated. In that case, please use the approach I mentioned above.
If you really do want to remove the data, and never want to allow duplicates again (which is unlikely, but still), then you should make that column the table's primary key (or put a unique constraint on it), so the database rejects duplicates at insert time.
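If a one-time cleanup is needed before adding such a key, even the deletion can be pushed into the server. A sketch in T-SQL, assuming a hypothetical dbo.Accounts table with an Id column, keeping the lowest-Id row per AccountID (both names are assumptions for illustration):

```sql
-- Number the rows within each AccountID group, then
-- delete every row after the first one in each group.
WITH Ranked AS
(
    SELECT ROW_NUMBER() OVER
           (PARTITION BY AccountID ORDER BY Id) AS rn
    FROM dbo.Accounts
)
DELETE FROM Ranked
WHERE rn > 1;
```

In SQL Server, a DELETE against a CTE like this removes the corresponding rows from the underlying table, so no data ever has to travel to the client.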
Do you see the pattern in my answer above? I am asking you to
make the database engine do all of this work for you. The benefit is that the database engine will automatically skip the records it does not need to return, and your CPU only has to render the results—not worry about filtering them.
SELECT Examples (Transact-SQL) | Microsoft Docs[^] (see the DISTINCT examples)
GROUP BY (Transact-SQL) | Microsoft Docs[^]
A quick analysis of the algorithm you copied from here:
c# - Best way to remove duplicate entries from a data table - Stack Overflow[^]
You clearly missed a better answer than this stupid one, posted in the same thread:
c# - Best way to remove duplicate entries from a data table - Stack Overflow[^]
public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
    Hashtable hTable = new Hashtable();
    ArrayList duplicateList = new ArrayList();

    // First pass: remember the first occurrence of each key,
    // and collect every later occurrence as a duplicate.
    foreach (DataRow drow in dTable.Rows)
    {
        if (hTable.Contains(drow[colName]))
            duplicateList.Add(drow);
        else
            hTable.Add(drow[colName], string.Empty);
    }

    // Second pass: remove the collected duplicate rows.
    foreach (DataRow dRow in duplicateList)
        dTable.Rows.Remove(dRow);

    return dTable;
}
I would tell you that this is not a good approach either: since you have access to the database,
write a better query. This approach means your 600,000 records get stored in memory a second time, and then iterated over to check whether each one is a duplicate.
It also means that even if your database table has no duplicates at all, this algorithm still runs over every row—roughly half a million iterations—storing every key in the hash table along the way, and worse.
I can go on, but you get the idea. :-)