How to find duplicates using linq?

Question

Hi,
I have a list of strings. I want to find the indexes of the duplicates and remove them.
How can I do this?

What I have tried:

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
var indexOf = myList.Select((value, index) => new { value, index }).GroupBy(g => g.value).Where(pair => pair.Count() > 1).Select(pair => pair.index);

Posted 23-Aug-21 20:23pm

Alex Dunlop

Updated 24-Aug-21 13:25pm

Richard MacCutchan

Comments

Richard MacCutchan 24-Aug-21 2:49am

And? What happens when you run it?

Alex Dunlop 24-Aug-21 2:56am

The last part is wrong (pair.index).

BillWoodruff 25-Aug-21 2:52am

Question up-voted to show appreciation for this poster using the appropriate folder, and making a good effort towards solving the problem.

Alex, the behavior of GroupBy, and dealing with anonymous types, are advanced topics. The Value of each IGrouping in the IEnumerable produced by GroupBy is an IEnumerable which usually needs to be evaluated (turned into a List) before using it.
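For what it is worth, here is a minimal sketch of how the query from the question could be completed. It assumes "duplicate" means every occurrence after the first one; the names duplicateIndexes and g are only illustrative. Each group is itself a sequence of the (value, index) pairs, so the indexes have to be pulled out of the group's elements rather than from the group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };

// Pair each value with its index, group by value, then keep every
// element of a group except its first occurrence.
List<int> duplicateIndexes = myList
    .Select((value, index) => new { value, index })
    .GroupBy(pair => pair.value)
    .SelectMany(g => g.Skip(1))
    .Select(pair => pair.index)
    .OrderBy(index => index)
    .ToList();

Console.WriteLine(string.Join(", ", duplicateIndexes)); // 3, 6

// Remove from the highest index down so that earlier removals
// do not shift the positions that are still to be removed.
foreach (int index in duplicateIndexes.OrderByDescending(i => i))
{
    myList.RemoveAt(index);
}

Console.WriteLine(string.Join(", ", myList)); // txt1, txt2, txt3, txt4, txt5
```

The original attempt failed because pair.index was read from the IGrouping itself, which only exposes Key and its elements.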

BillWoodruff 25-Aug-21 4:57am

If you want to see how a Zen master solves a problem like this: see Richard Deeming's comment on my post below.

4 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Richard MacCutchan · Accepted Answer · 2021-08-23T20:50:00

Solution 1

Try a Google search for "linq remove duplicates".

[edit]
This will produce a list with the name and index, so you could maybe build on that:

C#

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
var unique = myList.Select((value, index) => new { Value = value, Index = index });
foreach (var x in unique)
{
    Console.Write($"{x.Value}: {x.Index}, ");
}
Console.WriteLine("\n");

[/edit]

Posted 23-Aug-21 20:50pm

Richard MacCutchan

Updated 23-Aug-21 22:00pm

Comments

Alex Dunlop 24-Aug-21 2:58am

Thanks. I know that Distinct() can produce a list of unique values. But I want to find the indexes of the duplicates.

Richard MacCutchan 24-Aug-21 3:18am

There does not seem to be a simple solution. You can use IndexOf, FindIndex, etc., but I am not sure they will do what you want.
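As a rough illustration of the IndexOf idea, here is a small sketch. It assumes a "duplicate" is any occurrence after the first, and duplicateIndexes is just an illustrative name. List<T>.IndexOf returns the position of the first match, so a position whose value is first found somewhere earlier must be a repeat.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };

// IndexOf returns the position of the first match, so any position i where
// IndexOf(myList[i]) differs from i holds a repeat of an earlier value.
List<int> duplicateIndexes = Enumerable.Range(0, myList.Count)
    .Where(i => myList.IndexOf(myList[i]) != i)
    .ToList();

Console.WriteLine(string.Join(", ", duplicateIndexes)); // 3, 6
```

Note that IndexOf scans the list on every call, so this is quadratic in the list length; that is fine for short lists but worth keeping in mind for large ones.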

Richard MacCutchan 24-Aug-21 4:00am

See my update.

Maciej Los 24-Aug-21 4:07am

5ed!

BillWoodruff 24-Aug-21 17:43pm

I don't see the value of this: you create an IEnumerable of anonymous Types that replicates the structure of the List.

But, a for loop index is an easier way to get the index.

The one thing I see in this that could be exploited is the fact that each instance of the anonymous Type is unique.

imho, the OP's real problem is a lack of understanding of GroupBy and anonymous Types, and the fact that string instances with identical content, but different indexes in the list, will all be "equal" if compared. Yes: FindIndex, with the right predicate, can be used in a solution.

Richard MacCutchan 25-Aug-21 3:40am

Hence my comment above the code.

OriginalGriff · Accepted Answer · 2021-08-23T21:17:00

Solution 2

See here: Enumerable.Distinct Method (System.Linq) | Microsoft Docs

C#

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
var unique = myList.Distinct();

Posted 23-Aug-21 21:17pm

OriginalGriff

Maciej Los · Accepted Answer · 2021-08-23T23:03:00

Solution 3

Quote:
I want to find the indexes of the duplicates and remove them

Well, if you want to remove duplicates from the original list, take a look at the code below and read the comments:

C#

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
//get unique values and the index of the first occurrence of each
List<Tuple<string, int>> uniquelist = myList.Distinct()
	.Select(x => new Tuple<string, int>(x, myList.IndexOf(x)))
	.ToList();
//find indexes of duplicates
List<int> indOfDup = Enumerable.Range(0, myList.Count)
	.Where(x => !uniquelist.Any(y => y.Item2==x))
	.ToList();
//remove duplicates from original list
for (int i = indOfDup.Count - 1; i >= 0; i--)
{
	myList.RemoveAt(indOfDup[i]);
}

//done!
//the original list now contains no duplicates ;)

Good luck!

Posted 23-Aug-21 23:03pm

Maciej Los

Updated 23-Aug-21 23:04pm

Comments

Richard MacCutchan 24-Aug-21 6:57am

+5. I had a feeling it could not be done in a single step.

Maciej Los 24-Aug-21 7:03am

Thank you, Richard.

BillWoodruff 24-Aug-21 20:17pm

+5 nice ... even though I think you are doing it the hard way :)

Maciej Los 25-Aug-21 0:19am

Thank you, Bill.

George Swan 25-Aug-21 11:31am

Does this work when there are more than 2 identical matching values?

Maciej Los 25-Aug-21 14:58pm

The easiest way to find out is to try it ;)

George, did I tell you what my last name means? "Łoś" ("Los" without the Polish diacritics) means "moose". Swan and Moose. Funny, I thought. :)

George Swan 25-Aug-21 15:45pm

I have tried it and it did not work for me but I am a bird of very little brain so I could be wrong. When I added another 'txt4' to the end of your list only one duplicate instance was removed and two were left.

Maciej Los 25-Aug-21 23:47pm

Well, I have tested it and it's working fine, unless I missed a case where there is more than one extra instance of a text.
Hey! Do not say such things. You're a very smart person. Please forgive me. It wasn't my intention to hurt you.

George Swan 26-Aug-21 2:27am

Thanks Maciej. On another point, I am a big fan of value tuples. They can be used to simplify your example and avoid the need to 'new up' objects, which may save some time. Best wishes, George.

//use named values
List<(string value, int index)> uniquelist = myList.Distinct()
    //simply declare the tuple rather than instantiate it
    .Select(x => (x, myList.IndexOf(x)))
    .ToList();
List<int> indOfDup = Enumerable.Range(0, myList.Count)
    //reference the index by name rather than 'Item2'
    .Where(x => !uniquelist.Any(y => y.index == x))
    .ToList();

Maciej Los 26-Aug-21 2:57am

Good point! Value tuples are very useful. I prefer to use explicitly declared tuples.

BTW:
I've created a .NET Fiddle with an extra "txt4" at the end to prove that my code is working fine. Please, take a look at: RemoveDuplicatesFromOriginalList | C# Online Compiler | .NET Fiddle

All the best, George!
Cheers!
Maciej

George Swan 26-Aug-21 4:16am

Maciej, you are quite correct, +5. In my code I added 'text4' instead of 'txt4' to the list. Please accept my apologies for a stupid mistake.

Maciej Los 26-Aug-21 5:02am

Thank you, George.
:)

BillWoodruff · Accepted Answer · 2021-08-24T13:25:00

Solution 4

A simpler way to find the duplicate indexes that also handles more than one duplicate entry per value:

C#

List<int> toremovestrs = new List<int>();

for (int i = 0; i < myList.Count; i++)
{
    int first = myList.IndexOf(myList[i]);
    string firststr = myList[first];

    int last = myList.LastIndexOf(myList[i]);

    if (first < last)
    {
        for (int j = first + 1; j <= last; j++)
        {
            if (myList[j] == firststr && ! toremovestrs.Contains(j))
            {
                toremovestrs.Add(j);
            }
        }
    }
}

Test:

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4", "txt2", "txt3", "txt4","txt10", "txt10" };

// result
[0]: 3
[1]: 7
[2]: 8
[3]: 6
[4]: 9
[5]: 11
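For comparison, here is a sketch of a single-pass variant of the same idea. This is not the code above, just an alternative under the same assumption that every occurrence after the first counts as a duplicate; seen and duplicateIndexes are illustrative names. It relies on HashSet<T>.Add returning false when the value is already in the set, and it produces the indexes in ascending order.

```csharp
using System;
using System.Collections.Generic;

var myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4", "txt2", "txt3", "txt4", "txt10", "txt10" };

var seen = new HashSet<string>();
var duplicateIndexes = new List<int>();

for (int i = 0; i < myList.Count; i++)
{
    // Add returns false when the value has been seen before,
    // which marks position i as a duplicate.
    if (!seen.Add(myList[i]))
    {
        duplicateIndexes.Add(i);
    }
}

Console.WriteLine(string.Join(", ", duplicateIndexes)); // 3, 6, 7, 8, 9, 11

// Remove from the end so the remaining indexes stay valid.
for (int i = duplicateIndexes.Count - 1; i >= 0; i--)
{
    myList.RemoveAt(duplicateIndexes[i]);
}

Console.WriteLine(string.Join(", ", myList)); // txt1, txt2, txt3, txt4, txt5, txt10
```

The set of indexes is the same as in the result above; only the order differs, because this version records them as it walks the list.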
