Group by field in DocumentDB

Keywords: field, DocumentDB | Updated: 2023-09-27 18:19:24

Is it possible to somehow group by a field in a stored procedure in DocumentDB?

Suppose I have the following collection:

[
    {
        name: "Item A",
        priority: 1
    },
    {
        name: "Item B",
        priority: 2
    },
    {
        name: "Item C",
        priority: 2
    },
    {
        name: "Item D",
        priority: 1
    }
]

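I would like to get all the items in the highest-priority group (priority 2 in this case). I do not know in advance what the highest priority value is. The expected result would be, for example: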

[
    {
        name: "Item B",
        priority: 2
    },
    {
        name: "Item C",
        priority: 2
    }
]

In rough LINQ, it would look something like this:

var highestPriority = 
    collection
        .GroupBy(x => x.Priority)
        .OrderByDescending(x => x.Key)
        .First();


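DocumentDB currently does not support GROUP BY or any other aggregation. It is the second most requested feature and is listed as "Under Review" on the DocumentDB UserVoice.

In the meantime, documentdb-lumenize is an aggregation library for DocumentDB written as a stored procedure. You load cube.string as a stored procedure and then call it with an aggregation configuration. It is a bit of overkill for this example, but it is perfectly capable of doing what you are asking here. If you pass this into the stored procedure: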

{cubeConfig: {groupBy: "name", field: "priority", f: "max"}}

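That should do what you want.

Note that Lumenize can do a lot more than that, from simple group-bys with other functions (sum, count, min, max, median, p75, etc.) to pivot tables and all the way up to complex n-dimensional hypercubes with multiple metrics per cell.

I have never tried loading cube.string from .NET because we use node.js, but it is shipped as a string rather than as JavaScript, so you should be able to load it and send it easily.

Alternatively, you could write a stored procedure to do this simple aggregation yourself.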
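As a rough illustration of that last option, here is a minimal sketch of such a stored procedure. It assumes the documents carry a numeric `priority` field as in the question; the name `highestPriorityGroup` and the query text are hypothetical choices, not something from the original answer. It pages through the collection with the server-side `queryDocuments` call and keeps only the items that share the highest priority seen so far.

```javascript
// Hypothetical stored procedure: returns every document in the
// highest-priority group. Assumes each document has a numeric "priority".
function highestPriorityGroup() {
  var collection = getContext().getCollection();
  var response = getContext().getResponse();
  var state = { priority: null, items: [] };

  // Fold one page of documents into the running "best group".
  function merge(docs) {
    for (var i = 0; i < docs.length; i++) {
      var doc = docs[i];
      if (state.priority === null || doc.priority > state.priority) {
        state.priority = doc.priority; // new maximum: start a fresh group
        state.items = [doc];
      } else if (doc.priority === state.priority) {
        state.items.push(doc);         // same maximum: extend the group
      }
    }
  }

  // Read one page, then follow the continuation token if there is one.
  function readPage(continuation) {
    var accepted = collection.queryDocuments(
      collection.getSelfLink(),
      "SELECT c.name, c.priority FROM c",
      { continuation: continuation },
      function (err, docs, options) {
        if (err) throw err;
        merge(docs);
        if (options.continuation) {
          readPage(options.continuation);
        } else {
          response.setBody(state.items);
        }
      }
    );
    if (!accepted) {
      throw new Error("Query not accepted; the procedure is out of time or resources.");
    }
  }

  readPage(undefined);
}
```

Stored procedures run with bounded execution time, so this sketch is only reasonable for collections that can be scanned in a single invocation; a larger collection would need to hand the continuation token back to the client and resume.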

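GroupBy is still not supported in DocumentDB. The best approaches are the one described above (using a stored procedure) or using the Spark connector, as described in the UserVoice item. However, if the set you are grouping is relatively small, there is another solution:

Fetch all the results from the collection without grouping, and perform the grouping in memory.

So instead of: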

var highestPriority = 
    collection
        .GroupBy(x => x.Priority)
        .OrderByDescending(x => x.Key)
        .First();

you use:

var highestPriority = 
    collection
        .Where(<filter to reduce set>)
        .AsEnumerable()
        .GroupBy(x => x.Priority)
        .OrderByDescending(x => x.Key)
        .First();

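.AsEnumerable() fetches the results from DocumentDB, and the GroupBy is then performed in memory. Note, however, that this is not the best solution and should only be used when you are sure the result set is small.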
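The same client-side idea can be sketched for a node.js client. This is only an illustration under the assumption that the (already filtered) documents have been fetched into a plain array; the function names `groupByPriority` and `highestPriorityItems` are hypothetical, and the array below is the sample data from the question.

```javascript
// Group already-fetched documents by their "priority" field, in memory.
function groupByPriority(docs) {
  var groups = new Map();
  docs.forEach(function (doc) {
    if (!groups.has(doc.priority)) {
      groups.set(doc.priority, []);
    }
    groups.get(doc.priority).push(doc);
  });
  return groups;
}

// Return the items of the group with the largest priority key,
// or an empty array when there are no documents at all.
function highestPriorityItems(docs) {
  var groups = groupByPriority(docs);
  if (groups.size === 0) {
    return [];
  }
  var top = Math.max.apply(null, Array.from(groups.keys()));
  return groups.get(top);
}

// Sample data from the question.
var docs = [
  { name: "Item A", priority: 1 },
  { name: "Item B", priority: 2 },
  { name: "Item C", priority: 2 },
  { name: "Item D", priority: 1 }
];
var highest = highestPriorityItems(docs); // the "Item B" and "Item C" documents
```

As with the LINQ version, everything is pulled to the client before grouping, so this only makes sense when the fetched set is known to be small.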