How to remove duplicates in List of Class with Linq

Time:01-15

I have a list of objects of this class:

class GroupAssets
    {
        public string Name { get; set; }
        public List<string> Assets { get; set; }
    }

        List<GroupAssets> GroupList2 = new List<GroupAssets>{
                new GroupAssets { Name="Group1", Assets = new List<string>{ "A","B","C","D" }},
                new GroupAssets { Name="Group2", Assets = new List<string>{ "A","B","E","F" }},
                new GroupAssets { Name="Group3", Assets = new List<string>{ "A","B","H","G" }},
                new GroupAssets { Name="Group4", Assets = new List<string>{ "A","I","C","J" }}
        };

I would like to remove the duplicates and have this result :

Group1 => D
Group2 => E,F
Group3 => H,G
Group4 => I,J
Duplicate => A,B,C

Thank you for your help

CodePudding user response:

    List<GroupAssets> GroupList = new List<GroupAssets>{
        new GroupAssets { Name="Group1", Assets = new List<string>{ "A","B","C","D" }},
        new GroupAssets { Name="Group2", Assets = new List<string>{ "A","B","E","F" }},
        new GroupAssets { Name="Group3", Assets = new List<string>{ "A","B","H","G" }},
        new GroupAssets { Name="Group4", Assets = new List<string>{ "A","I","C","J" }}
    };

    // Count how many times each asset occurs across all groups
    var assetList = new Dictionary<string,int>();
    foreach (var g in GroupList.Select(x=> x.Assets)) {
        g.ForEach(x=> {
            if (!assetList.ContainsKey(x)) assetList.Add(x,1);
            else assetList[x]++;
        });
    }

    // Assets that occur more than once are the duplicates
    var nonUnique = assetList.Where(x=> x.Value > 1).Select(x=> x.Key).ToList();
    nonUnique.ForEach(x=> { Console.WriteLine(x); });

This is an alternative solution in case you also want to know how many times each duplicate occurs.

CodePudding user response:

You can use this approach:

HashSet<string> allUnique = new HashSet<string>();
HashSet<string> duplicates = new HashSet<string>();
foreach (string s in GroupList.SelectMany(ga => ga.Assets.SelectMany(a => a.Split(','))))
{
    if (!allUnique.Add(s)) duplicates.Add(s);
}

foreach (GroupAssets ga in GroupList)
{
    for (int i = 0; i < ga.Assets.Count; i++)
    {
        ga.Assets[i] = string.Join(",", ga.Assets[i].Split(',').Except(duplicates));
    }
}

But why do you store a single string containing multiple comma-separated values in a list? It sounds like you should store them in the Assets list as separate items.
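
If the data really does arrive as comma-separated strings, one way to normalize it up front is sketched below. The helper name AssetNormalizer.SplitAssets is hypothetical, and trimming whitespace and dropping empty entries are assumptions about the input rather than anything the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AssetNormalizer // hypothetical helper, not part of the question
{
    // Turns entries such as "A,B,C,D" into separate items: "A", "B", "C", "D".
    public static List<string> SplitAssets(IEnumerable<string> assets)
    {
        return assets
            .SelectMany(a => a.Split(','))   // one entry may hold several values
            .Select(a => a.Trim())           // assumption: surrounding spaces are noise
            .Where(a => a.Length > 0)        // assumption: empty entries are dropped
            .ToList();
    }
}
```

With that in place, something like ga.Assets = AssetNormalizer.SplitAssets(ga.Assets) for each group would leave every asset as its own list item, so the later steps no longer need Split and Join.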

CodePudding user response:

Assuming that you have

List<GroupAssets> GroupList = new List<GroupAssets>{
  new GroupAssets { Name="Group1", Assets = new List<string>{ "A" ,"B", "C", "D" }},
  new GroupAssets { Name="Group2", Assets = new List<string>{ "A" ,"B", "E", "F" }},
  new GroupAssets { Name="Group3", Assets = new List<string>{ "A" ,"B", "H", "G" }},
  new GroupAssets { Name="Group4", Assets = new List<string>{ "A" ,"I", "C", "J" }},
};

Note that each Assets list has 4 items (not 1). You can then use the following.

Code:

HashSet<string> duplicates = new HashSet<string>();
HashSet<string> all = new HashSet<string>();

foreach (var item in GroupList)
  foreach (var asset in item.Assets) 
    if (!all.Add(asset))     // duplicate if all contains the asset
      duplicates.Add(asset);

// removing duplicates from each Asset
foreach (var item in GroupList)
  item.Assets.RemoveAll(asset => duplicates.Contains(asset));

Let's have a look:

string report = string.Join(Environment.NewLine, GroupList
  .Select(item => $"{item.Name} => {string.Join(", ", item.Assets)}"));

Console.WriteLine(report);

Console.WriteLine($"Duplicate => {string.Join(", ", duplicates)}");

Outcome:

Group1 => D
Group2 => E, F
Group3 => H, G
Group4 => I, J
Duplicate => A, B, C

If, however, each Assets list contains a single comma-separated string, you should add Split and Join:

HashSet<string> duplicates = new HashSet<string>();
HashSet<string> all = new HashSet<string>();

foreach (var item in GroupList)
  foreach (var asset in item.Assets.SelectMany(list => list.Split(','))) 
    if (!all.Add(asset)) 
      duplicates.Add(asset);

foreach (var item in GroupList) {
  item.Assets = item
    .Assets
    .Select(asset => asset.Split(',').Where(c => !duplicates.Contains(c)))
    .Where(asset => asset.Any())
    .Select(asset => string.Join(",", asset))
    .ToList();
}

CodePudding user response:

I assume you made a mistake and that the property GroupAssets.Assets contains a list of assets (new List<string>() {"A", "B"}) rather than a list holding a single comma-separated string (new List<string>() {"A,B"}).

First you have to figure out which assets are duplicates. You can group the items by asset name ("A" to "J") and count how many times each key occurs across all lists. The code is adapted from another Stack Overflow question, with an added SelectMany because we want to flatten the many lists into one.

    var assetCount = GroupList
        .SelectMany(x => x.Assets)
        .GroupBy(x => x)
        .Select(s => new { Asset = s.Key, Count = s.Count() });

Then we build a list of duplicates and a list of groups containing only the unique assets:

    var duplicates = assetCount.Where(x => x.Count > 1).Select(x => x.Asset).ToList();
    
    var uniqueAssetsGroupList = GroupList
        .Select(x => new GroupAssets() { Name = x.Name, Assets = x.Assets.Except(duplicates).ToList() });
    
    foreach (var group in uniqueAssetsGroupList)
        Console.WriteLine(string.Format("{0} => {1}", group.Name, string.Join(",", group.Assets)));

    Console.WriteLine("Duplicate => {0}", string.Join(",", duplicates));
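
For reference, the two steps above can be combined into one self-contained helper. This is only a sketch: the class name AssetDeduplicator and the method Split are hypothetical, and the GroupAssets class is repeated so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class GroupAssets
{
    public string Name { get; set; }
    public List<string> Assets { get; set; }
}

public static class AssetDeduplicator // hypothetical helper name
{
    // Returns the assets that occur more than once across all groups,
    // plus copies of the groups with those assets removed.
    public static (List<string> Duplicates, List<GroupAssets> Groups) Split(
        IEnumerable<GroupAssets> groups)
    {
        var source = groups.ToList();

        // Flatten all asset lists, group equal values, keep keys seen more than once.
        var duplicates = source
            .SelectMany(g => g.Assets)
            .GroupBy(a => a)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();

        // Build new groups instead of mutating the input lists.
        var cleaned = source
            .Select(g => new GroupAssets
            {
                Name = g.Name,
                Assets = g.Assets.Except(duplicates).ToList()
            })
            .ToList();

        return (duplicates, cleaned);
    }
}
```

Calling AssetDeduplicator.Split(GroupList) on the four sample groups (named Group1 to Group4) gives the duplicates A, B, C and the groups D / E,F / H,G / I,J. Note that Except has set semantics, so a value repeated inside a single list is also collapsed to one occurrence, and an asset repeated within one group counts as a duplicate here.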