Top three documents per group using LINQ

Keywords: documents, three, LINQ, using | Updated: 2023-09-27 17:57:47

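What I'm trying to get is the top three documents for each AccessGroup. By top three I mean the documents with the highest count. My current solution returns: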

DocumentId  AccessGroupId  Count
2           1              5
1           1              3
3           1              2
5           1              2
4           1              1
6           1              1
8           1              1
10          1              1
...         2              ...

What I'm aiming for is:

DocumentId  AccessGroupId  Count
2           1              5
1           1              3
3           1              2
...         2              ...

I've put together a runnable LINQPad program: GitHub Gist

void Main()
{
    var sampleData = new List<Foo>();
    sampleData.Add(new Foo { DocumentId = 1, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 1, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 1, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 5, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 6, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 5, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 8, AccessGroupId = 1 });
    sampleData.Add(new Foo { DocumentId = 10, AccessGroupId = 1 }); 
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 2, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 3, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 2 });
    sampleData.Add(new Foo { DocumentId = 4, AccessGroupId = 2 });      
    var x = (from entry in sampleData
     group entry by new { entry.DocumentId, entry.AccessGroupId } into g
     orderby  g.Key.AccessGroupId, g.Count() descending
     select new { DocumentId = g.Key.DocumentId, AccessGroupId = g.Key.AccessGroupId, Count = g.Count() }
    );
    Console.WriteLine(x);
}
public class Foo {
    public int DocumentId { get; set; }
    public int AccessGroupId { get; set; }
}

Thanks for your help!


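You can group by AccessGroupId first, then group by DocumentId within each access group, order by count, and take the first three. After that, use SelectMany to flatten the top three documents of every access group into a single sequence.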

var x = sampleData
    .GroupBy(entry => entry.AccessGroupId)
    .Select(accessGroup => new
    {
        AccessGroupId = accessGroup.Key,
        // Count documents within this access group and keep the three most frequent.
        TopThreeDocs = accessGroup.GroupBy(entry => entry.DocumentId)
                                  .OrderByDescending(subg => subg.Count())
                                  .Take(3)
    })
    // Flatten to one row per (access group, document).
    .SelectMany(ag => ag.TopThreeDocs.Select(doc => new
    {
        ag.AccessGroupId,
        DocumentId = doc.Key,
        Count = doc.Count()
    }));
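
To try the approach above outside LINQPad, here is a minimal console sketch. The `TopDocuments.PerGroup` helper, the `n` parameter, and the shortened sample data are assumptions made for illustration and are not part of the original post; only the grouping logic follows the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public int DocumentId { get; set; }
    public int AccessGroupId { get; set; }
}

public static class TopDocuments
{
    // Hypothetical helper (not from the original post): returns the n most
    // frequent documents of each access group, groups in ascending id order.
    public static List<(int DocumentId, int AccessGroupId, int Count)> PerGroup(
        IEnumerable<Foo> entries, int n)
    {
        return entries
            .GroupBy(e => e.AccessGroupId)
            .OrderBy(ag => ag.Key)
            .SelectMany(ag => ag
                .GroupBy(e => e.DocumentId)
                .OrderByDescending(doc => doc.Count())
                .Take(n)
                .Select(doc => (DocumentId: doc.Key, AccessGroupId: ag.Key, Count: doc.Count())))
            .ToList();
    }

    public static void Main()
    {
        // (DocumentId, AccessGroupId, occurrences): made-up sample values.
        var counts = new[]
        {
            (2, 1, 5), (1, 1, 3), (3, 1, 2), (4, 1, 1),
            (3, 2, 8), (4, 2, 5), (2, 2, 3)
        };

        var sampleData = new List<Foo>();
        foreach (var (documentId, accessGroupId, times) in counts)
            for (var i = 0; i < times; i++)
                sampleData.Add(new Foo { DocumentId = documentId, AccessGroupId = accessGroupId });

        // Prints one "DocumentId AccessGroupId Count" row per line:
        // 2 1 5
        // 1 1 3
        // 3 1 2
        // 3 2 8
        // 4 2 5
        // 2 2 3
        foreach (var row in PerGroup(sampleData, 3))
            Console.WriteLine($"{row.DocumentId} {row.AccessGroupId} {row.Count}");
    }
}
```

With LINQ to Objects, OrderByDescending is a stable sort, so documents with equal counts keep the order in which they first appeared in the input.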

Change the LINQ query as follows.

var x = (from entry in sampleData
                 group entry by new { entry.DocumentId, entry.AccessGroupId } into g
                 orderby g.Count() descending
                 select new { DocumentId = g.Key.DocumentId, AccessGroupId = g.Key.AccessGroupId, Count = g.Count() }
        ).Take(3);

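Take(3) picks out the top three; hope this helps. Note that it is applied to the whole query result, so it returns three rows in total rather than three per AccessGroupId.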

Using LINQ

If you are dealing with a particularly large data set, you may prefer to do this outside of LINQ.

    sampleData
            .GroupBy(a=>new{a.AccessGroupId,a.DocumentId})
            .Select(a=>new{ Count=a.Count(),a.Key.AccessGroupId,a.Key.DocumentId })
            .OrderByDescending(a=>a.Count)
            .GroupBy(a=>a.AccessGroupId)
            .Select(a=>new{ AccessGroupId = a.Key, Values = a.Take(3)});

There is a working fiddle if you want to take a look.

Using a dictionary

Storing the counts in a Dictionary<int,Dictionary<int,int>> would most likely be more efficient.

    var cache = new Dictionary<int,Dictionary<int,int>>();
    foreach(var item in sampleData)
    {
        if(!cache.ContainsKey(item.AccessGroupId))
        {
            cache[item.AccessGroupId] = new Dictionary<int,int>();
        }
        if(!cache[item.AccessGroupId].ContainsKey(item.DocumentId))
        {
            cache[item.AccessGroupId][item.DocumentId]=0;
        }
        cache[item.AccessGroupId][item.DocumentId]++;
    }

    var results = cache
                  .Select(a=>new{ 
                            AccessGroupId = a.Key, 
                            Values = a.Value.OrderByDescending(b=>b.Value)
                                    .Select(b=>new{ DocumentId = b.Key, Count = b.Value })
                                    .Take(3)
                  });

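It is less readable, but it should be cheaper than using GroupBy, unless you are using LINQ to Something (that is, a query provider rather than in-memory collections). There is a fiddle for this version too if you want to check.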
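
As a variation on the dictionary approach above, the following sketch replaces the ContainsKey checks with TryGetValue, which avoids a second lookup per key. The `CountingTopDocuments.PerGroup` helper, its `n` parameter, and the sample values are assumptions for illustration, not code from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public int DocumentId { get; set; }
    public int AccessGroupId { get; set; }
}

public static class CountingTopDocuments
{
    // Hypothetical helper: counts occurrences in a single pass, then sorts
    // only the per-group counts. Key = AccessGroupId, value = up to n
    // (DocumentId, Count) pairs ordered by descending count.
    public static Dictionary<int, List<KeyValuePair<int, int>>> PerGroup(
        IEnumerable<Foo> entries, int n)
    {
        var cache = new Dictionary<int, Dictionary<int, int>>();
        foreach (var item in entries)
        {
            if (!cache.TryGetValue(item.AccessGroupId, out var counts))
            {
                counts = new Dictionary<int, int>();
                cache[item.AccessGroupId] = counts;
            }

            // current is 0 when the document has not been seen yet.
            counts.TryGetValue(item.DocumentId, out var current);
            counts[item.DocumentId] = current + 1;
        }

        return cache.ToDictionary(
            g => g.Key,
            g => g.Value.OrderByDescending(d => d.Value).Take(n).ToList());
    }

    public static void Main()
    {
        // (DocumentId, AccessGroupId, occurrences): made-up sample values.
        var counts = new[]
        {
            (2, 1, 5), (1, 1, 3), (3, 1, 2), (4, 1, 1),
            (3, 2, 8), (4, 2, 5), (2, 2, 3)
        };

        var sampleData = new List<Foo>();
        foreach (var (documentId, accessGroupId, times) in counts)
            for (var i = 0; i < times; i++)
                sampleData.Add(new Foo { DocumentId = documentId, AccessGroupId = accessGroupId });

        // Prints "DocumentId AccessGroupId Count" rows, groups in ascending id order.
        foreach (var pair in PerGroup(sampleData, 3).OrderBy(p => p.Key))
            foreach (var doc in pair.Value)
                Console.WriteLine($"{doc.Key} {pair.Key} {doc.Value}");
    }
}
```

Dictionary does not guarantee an enumeration order, so documents with equal counts may come out in any order here; add a secondary sort key if ties need to be deterministic.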

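I found this to be the simplest way:

First get the count for each (AccessGroupId, DocumentId) pair, then order by that count, group by AccessGroupId, take the first three of each group, and use SelectMany to flatten the result into a single list: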

var results = (from entry in sampleData
                group entry by new { entry.AccessGroupId, entry.DocumentId } into g
                select new
                {
                    AccessGroupId = g.Key.AccessGroupId,
                    DocumentId = g.Key.DocumentId,
                    Count = g.Count()
                }).OrderByDescending(x => x.Count)
                .GroupBy(x => x.AccessGroupId)
                .SelectMany(x => x.Take(3));