Hi,
I have a list of strings. I want to find index of duplicates and remove them.
How can I do this?

What I have tried:

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
var indexOf = myList.Select((value, index) => new { value, index }).GroupBy(g => g.value).Where(pair => pair.Count() > 1).Select(pair => pair.index);
Updated 24-Aug-21 13:25pm
Comments
Richard MacCutchan 24-Aug-21 2:49am    
And? What happens when you run it?
Alex Dunlop 24-Aug-21 2:56am    
The last part is wrong (pair.index).
BillWoodruff 25-Aug-21 2:52am    
Question up-voted to show appreciation for this poster using the appropriate folder, and making a good effort towards solving the problem.

Alex, the behavior of GroupBy, and dealing with anonymous types, are advanced topics. Each IGrouping in the IEnumerable produced by GroupBy has a Key and is itself an IEnumerable of the grouped elements, so it has no index property of its own: you have to enumerate the group to reach each element's index.
BillWoodruff 25-Aug-21 4:57am    
If you want to see how a Zen master solves a problem like this, see Richard Deeming's comment on my post below.
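One possible way to repair the query from the question, following the point about GroupBy above: since each group is a sequence of value/index pairs, the indexes can be pulled out with SelectMany. This is only a sketch; the name duplicateIndexes is made up for illustration, and it assumes that "duplicate" means every occurrence after the first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };

// Pair each value with its index, group by value, keep only groups with repeats.
// Each group is a sequence of { value, index } items, so skip the first
// occurrence and project the remaining items to their indexes.
List<int> duplicateIndexes = myList
    .Select((value, index) => new { value, index })
    .GroupBy(item => item.value)
    .Where(g => g.Count() > 1)
    .SelectMany(g => g.Skip(1).Select(item => item.index))
    .ToList();

Console.WriteLine(string.Join(", ", duplicateIndexes)); // prints "3, 6"
```

GroupBy keeps the elements of each group in their original order, which is why Skip(1) drops the first occurrence and not an arbitrary one.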

Try a Google search for "linq remove duplicates".

[edit]
This will produce a list with the name and index, so you could maybe build on that:
C#
List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
var unique = myList.Select((value, index) => new { Value = value, Index = index });
foreach (var x in unique)
{
    Console.Write($"{x.Value}: {x.Index}, ");
}
Console.WriteLine("\n");


[/edit]
 
Comments
Alex Dunlop 24-Aug-21 2:58am    
Thanks. I know that Distinct() can produce a list of unique values. But I want to find the indexes of the duplicates.
Richard MacCutchan 24-Aug-21 3:18am    
There does not seem to be a simple solution. You can use IndexOf, FindIndex etc., but I am not sure they will do what you want.
Richard MacCutchan 24-Aug-21 4:00am    
See my update.
Maciej Los 24-Aug-21 4:07am    
5ed!
BillWoodruff 24-Aug-21 17:43pm    
I don't see the value of this: you create an IEnumerable of anonymous Types that replicates the structure of the List.

But, a for loop index is an easier way to get the index.

The one thing I see in this that could be exploited is the fact that each instance of the anonymous Type is unique.

imho, the OP's real problem is a lack of understanding of GroupBy and anonymous Types, and the fact that string instances with identical content, but different indexes in the list, will all be "equal" if compared. Yes: FindIndex, with the right predicate, can be used in a solution.
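The for-loop idea from the comment above might look roughly like the sketch below. It uses a HashSet<string> to remember values already seen instead of FindIndex; that swap, and the names seen and duplicateIndexes, are assumptions made for illustration and not code from the thread.

```csharp
using System;
using System.Collections.Generic;

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };

var seen = new HashSet<string>();      // values encountered so far
var duplicateIndexes = new List<int>(); // indexes of second and later occurrences

for (int i = 0; i < myList.Count; i++)
{
    // HashSet<T>.Add returns false when the value is already in the set,
    // which means index i holds a repeat of an earlier element.
    if (!seen.Add(myList[i]))
    {
        duplicateIndexes.Add(i);
    }
}

Console.WriteLine(string.Join(", ", duplicateIndexes)); // prints "3, 6"
```

Because strings with identical content compare as equal, the set treats "txt1" at index 0 and "txt1" at index 3 as the same value, which is exactly the behavior the comment describes.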
See here: Enumerable.Distinct Method (System.Linq) | Microsoft Docs
C#
List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
var unique = myList.Distinct();
 
Quote:
I want to find index of duplicates and remove them


Well, if you want to remove duplicates from the original list, take a look at the code below and read the comments:

C#
List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
//get unique values and the index of each value's first occurrence
List<Tuple<string, int>> uniquelist = myList.Distinct()
	.Select(x => new Tuple<string, int>(x, myList.IndexOf(x)))
	.ToList();
//find indexes of duplicates
List<int> indOfDup = Enumerable.Range(0, myList.Count)
	.Where(x => !uniquelist.Any(y => y.Item2==x))
	.ToList();
//remove duplicates from original list
for(int i = indOfDup.Count()-1; i>=0; i--)
{
	myList.RemoveAt(indOfDup[i]);
}

//done!
//the original list now contains no duplicates ;)


Good luck!
 
Comments
Richard MacCutchan 24-Aug-21 6:57am    
+5. I had a feeling it could not be done in a single step.
Maciej Los 24-Aug-21 7:03am    
Thank you, Richard.
BillWoodruff 24-Aug-21 20:17pm    
+5 nice ... even though I think you are doing it the hard way :)
Maciej Los 25-Aug-21 0:19am    
Thank you, Bill.
George Swan 25-Aug-21 11:31am    
Does this work when there are more than 2 identical matching values?
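A quick way to check this is to run the same three steps on a list where one value occurs four times. The sketch below reuses the code from the answer; the test data is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// "txt1" occurs four times: at indexes 0, 2, 3 and 5.
List<string> myList = new List<string> { "txt1", "txt2", "txt1", "txt1", "txt3", "txt1" };

// Unique values paired with the index of their first occurrence.
List<Tuple<string, int>> uniquelist = myList.Distinct()
    .Select(x => new Tuple<string, int>(x, myList.IndexOf(x)))
    .ToList();

// Every index that is not a first occurrence is a duplicate.
List<int> indOfDup = Enumerable.Range(0, myList.Count)
    .Where(x => !uniquelist.Any(y => y.Item2 == x))
    .ToList();

// Remove from the end so that earlier indexes stay valid.
for (int i = indOfDup.Count - 1; i >= 0; i--)
{
    myList.RemoveAt(indOfDup[i]);
}

Console.WriteLine(string.Join(", ", myList)); // prints "txt1, txt2, txt3"
```

Because the duplicate indexes are defined as "everything except the first occurrences", the number of repeats per value does not matter.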
A simpler way to find the duplicate indexes that also handles values with more than one duplicate entry:
C#
List<int> toremovestrs = new List<int>();

for (int i = 0; i < myList.Count; i++)
{
    int first = myList.IndexOf(myList[i]);
    string firststr = myList[first];

    int last = myList.LastIndexOf(myList[i]);

    if (first < last)
    {
        for (int j = first + 1; j <= last; j++)
        {
            if (myList[j] == firststr && ! toremovestrs.Contains(j))
            {
                toremovestrs.Add(j);
            }
        }
    }
}
Test:
List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4", "txt2", "txt3", "txt4","txt10", "txt10" };

// result
[0]: 3
[1]: 7
[2]: 8
[3]: 6
[4]: 9
[5]: 11
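Note that the indexes in the result are not in ascending order, so they cannot be passed to RemoveAt as they are: each removal shifts the elements after it. One way to finish the job, sketched here under the assumption that toremovestrs holds the result shown above, is to remove from the highest index down.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4", "txt2", "txt3", "txt4", "txt10", "txt10" };

// The duplicate indexes from the result above, in the order they were found.
List<int> toremovestrs = new List<int> { 3, 7, 8, 6, 9, 11 };

// Removing the highest index first keeps the remaining indexes valid.
foreach (int index in toremovestrs.OrderByDescending(i => i))
{
    myList.RemoveAt(index);
}

Console.WriteLine(string.Join(", ", myList)); // prints "txt1, txt2, txt3, txt4, txt5, txt10"
```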
 
 
Comments
Maciej Los 25-Aug-21 0:20am    
5ed!
Richard Deeming 25-Aug-21 4:14am    
I can see some room for improvement there! :)

Eg:
for (int i = myList.Count - 1; i > 0 /* No need to check the first element */; i--)
{
    int index = myList.IndexOf(myList[i]);
    if (index < i) // This is not the first occurrence of the element
    {
        myList.RemoveAt(i);
    }
}
BillWoodruff 25-Aug-21 4:55am    
Wonderful ! I hear the sound of one hand clapping :)
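If only the removal matters and the indexes themselves are not needed, a single pass with List<T>.RemoveAll and a HashSet<string> is another option. This is a sketch of a different technique, not the code from the comment above, and the variable name seen is made up for illustration. It avoids calling IndexOf for every element, which may matter for long lists.

```csharp
using System;
using System.Collections.Generic;

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };

var seen = new HashSet<string>();

// RemoveAll tests the elements in list order. HashSet<T>.Add returns false
// for a value that is already present, so only repeats are removed and the
// first occurrence of each value stays in place.
int removed = myList.RemoveAll(s => !seen.Add(s));

Console.WriteLine(removed);                   // prints "2"
Console.WriteLine(string.Join(", ", myList)); // prints "txt1, txt2, txt3, txt4, txt5"
```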

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


