如何为二维LINQ操作创建一个高效的数据结构
本文关键字:一个 高效 数据结构 创建 LINQ 二维 操作 | 更新日期: 2023-09-27 18:16:27
问题:我有两种对象,我们称它们为Building
和Improvement
。大约有30个Improvement
实例,而可能有1-1000个Building
实例。对于Building
和Improvement
的每个组合,我必须执行一些繁重的计算,并将结果存储在Result
对象中。
Building
s和Improvement
s都可以用整数ID表示。
然后我需要能够:
-
有效地访问给定(编辑:见下面的评论)Building
和Improvement
的Result
- 对给定
Building
的所有Improvement
s执行Result
s的聚合,如。sum()和。average () - 对于给定的
Improvement
,对所有
Building
s的Result
s执行相同的聚合这将发生在web服务器后端,所以内存可能是一个问题,但速度是最重要的。
至今感想:
- 使用
Dictionary<Tuple<int, int>, Result>
,<BuildingID, ImprovementID>
作为key。这应该给我快速插入和单次查找,但我担心.Where()
和.Sum()
的性能。 - 使用二维数组,
BuildingID
s为一维,ImprovementID
s为一维,Result
为值。此外,构建两个Dictionary<int, int>
,将BuildingID
s和ImprovementID
s映射到它们各自的数组行/列索引。这可能意味着最多1000+Dictionary
s,这会是个问题吗? - 使用
List<Tuple<int, int, Result>>
。我认为这可能是效率最低的,有O(n)个插入,尽管我可能错了。
我是否错过了一个明显更好的选择?
编辑:事实证明,这只是我感兴趣的聚合值(每Building
和每Improvement
);
一般来说,Dictionary是查找效率最高的。当通过键访问时,查找效率和操作效率都是常数O(1)。这将有助于访问第一个点。
在第二个和第三个中,你需要遍历所有的项O(n),所以没有办法加速它,除非你想以指定的顺序O(n*n)遍历它们——然后你可以使用SortedDictionray O(n),但你会损害查找和操作效率(O(log n))。
所以我会选择你发布的第一个解决方案
您可以使用"字典的字典"来保存Result数据,例如:
// Building ID ↓ ↓ Improvement ID
var data = new Dictionary<int, Dictionary<int, Result>>();
这将使您快速找到特定建筑的改进。
然而,找到包含特定改进的建筑物将需要在所有建筑物上进行迭代。下面是一些示例代码:
using System;
using System.Linq;
using System.Collections.Generic;
namespace Demo
{
sealed class Result
{
public double Data;
}
sealed class Building
{
public int Id;
public int Value;
}
sealed class Improvement
{
public int Id;
public int Value;
}
class Program
{
void run()
{
// Building ID ↓ ↓ Improvement ID
var data = new Dictionary<int, Dictionary<int, Result>>();
for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
{
var improvements = new Dictionary<int, Result>();
for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
improvements.Add(improvementKey, new Result{ Data = buildingKey + improvementKey/1000.0 });
data.Add(buildingKey, improvements);
}
// Aggregate data for all improvements for building with ID == 1500:
int buildingId = 1500;
var sum = data[buildingId].Sum(result => result.Value.Data);
Console.WriteLine(sum);
// Aggregate data for all buildings with a given improvement.
int improvementId = 5010;
sum = data.Sum(improvements =>
{
Result result;
return improvements.Value.TryGetValue(improvementId, out result) ? result.Data : 0.0;
});
Console.WriteLine(sum);
}
static void Main()
{
new Program().run();
}
}
}
为了加速第二次聚合(对于给定ID的所有改进的数据求和),我们可以使用第二个字典:
// Improvment ID ↓ ↓ Building ID
var byImprovementId = new Dictionary<int, Dictionary<int, Result>>();
您将有一个额外的字典要维护,但它不是太复杂。使用一些像这样的嵌套字典可能会占用太多的内存,但这是值得考虑的。
如下面的注释所述,最好为id和字典本身定义类型。把它们放在一起就得到:
using System;
using System.Linq;
using System.Collections.Generic;
namespace Demo
{
sealed class Result
{
public double Data;
}
sealed class BuildingId
{
public BuildingId(int id)
{
Id = id;
}
public readonly int Id;
public override int GetHashCode()
{
return Id.GetHashCode();
}
public override bool Equals(object obj)
{
var other = obj as BuildingId;
if (other == null)
return false;
return this.Id == other.Id;
}
}
sealed class ImprovementId
{
public ImprovementId(int id)
{
Id = id;
}
public readonly int Id;
public override int GetHashCode()
{
return Id.GetHashCode();
}
public override bool Equals(object obj)
{
var other = obj as ImprovementId;
if (other == null)
return false;
return this.Id == other.Id;
}
}
sealed class Building
{
public BuildingId Id;
public int Value;
}
sealed class Improvement
{
public ImprovementId Id;
public int Value;
}
sealed class BuildingResults : Dictionary<BuildingId, Result>{}
sealed class ImprovementResults: Dictionary<ImprovementId, Result>{}
sealed class BuildingsById: Dictionary<BuildingId, ImprovementResults>{}
sealed class ImprovementsById: Dictionary<ImprovementId, BuildingResults>{}
class Program
{
void run()
{
var byBuildingId = CreateTestBuildingsById(); // Create some test data.
var byImprovementId = CreateImprovementsById(byBuildingId); // Create the alternative lookup dictionaries.
// Aggregate data for all improvements for building with ID == 1500:
BuildingId buildingId = new BuildingId(1500);
var sum = byBuildingId[buildingId].Sum(result => result.Value.Data);
Console.WriteLine(sum);
// Aggregate data for all buildings with a given improvement.
ImprovementId improvementId = new ImprovementId(5010);
sum = byBuildingId.Sum(improvements =>
{
Result result;
return improvements.Value.TryGetValue(improvementId, out result) ? result.Data : 0.0;
});
Console.WriteLine(sum);
// Aggregate data for all buildings with a given improvement using byImprovementId.
// This will be much faster than the above Linq.
sum = byImprovementId[improvementId].Sum(result => result.Value.Data);
Console.WriteLine(sum);
}
static BuildingsById CreateTestBuildingsById()
{
var byBuildingId = new BuildingsById();
for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
{
var improvements = new ImprovementResults();
for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
{
improvements.Add
(
new ImprovementId(improvementKey),
new Result
{
Data = buildingKey + improvementKey/1000.0
}
);
}
byBuildingId.Add(new BuildingId(buildingKey), improvements);
}
return byBuildingId;
}
static ImprovementsById CreateImprovementsById(BuildingsById byBuildingId)
{
var byImprovementId = new ImprovementsById();
foreach (var improvements in byBuildingId)
{
foreach (var improvement in improvements.Value)
{
if (!byImprovementId.ContainsKey(improvement.Key))
byImprovementId[improvement.Key] = new BuildingResults();
byImprovementId[improvement.Key].Add(improvements.Key, improvement.Value);
}
}
return byImprovementId;
}
static void Main()
{
new Program().run();
}
}
}
最后,这里是一个修改后的版本,它确定为特定改进的所有构建/改进组合的所有实例聚合数据所需的时间,并比较元组的字典和字典的字典的结果。
在任何调试器之外运行RELEASE构建的结果:
Dictionary of dictionaries took 00:00:00.2967741
Dictionary of tuples took 00:00:07.8164672
使用字典的字典要快得多,但这只有在你打算做很多这样的聚合时才重要。
using System;
using System.Diagnostics;
using System.Linq;
using System.Collections.Generic;
namespace Demo
{
sealed class Result
{
public double Data;
}
sealed class BuildingId
{
public BuildingId(int id)
{
Id = id;
}
public readonly int Id;
public override int GetHashCode()
{
return Id.GetHashCode();
}
public override bool Equals(object obj)
{
var other = obj as BuildingId;
if (other == null)
return false;
return this.Id == other.Id;
}
}
sealed class ImprovementId
{
public ImprovementId(int id)
{
Id = id;
}
public readonly int Id;
public override int GetHashCode()
{
return Id.GetHashCode();
}
public override bool Equals(object obj)
{
var other = obj as ImprovementId;
if (other == null)
return false;
return this.Id == other.Id;
}
}
sealed class Building
{
public BuildingId Id;
public int Value;
}
sealed class Improvement
{
public ImprovementId Id;
public int Value;
}
sealed class BuildingResults : Dictionary<BuildingId, Result>{}
sealed class ImprovementResults: Dictionary<ImprovementId, Result>{}
sealed class BuildingsById: Dictionary<BuildingId, ImprovementResults>{}
sealed class ImprovementsById: Dictionary<ImprovementId, BuildingResults>{}
class Program
{
void run()
{
var byBuildingId = CreateTestBuildingsById(); // Create some test data.
var byImprovementId = CreateImprovementsById(byBuildingId); // Create the alternative lookup dictionaries.
var testTuples = CreateTestTuples();
ImprovementId improvementId = new ImprovementId(5010);
int count = 10000;
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < count; ++i)
byImprovementId[improvementId].Sum(result => result.Value.Data);
Console.WriteLine("Dictionary of dictionaries took " + sw.Elapsed);
sw.Restart();
for (int i = 0; i < count; ++i)
testTuples.Where(result => result.Key.Item2.Equals(improvementId)).Sum(item => item.Value.Data);
Console.WriteLine("Dictionary of tuples took " + sw.Elapsed);
}
static Dictionary<Tuple<BuildingId, ImprovementId>, Result> CreateTestTuples()
{
var result = new Dictionary<Tuple<BuildingId, ImprovementId>, Result>();
for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
result.Add(
new Tuple<BuildingId, ImprovementId>(new BuildingId(buildingKey), new ImprovementId(improvementKey)),
new Result
{
Data = buildingKey + improvementKey/1000.0
});
return result;
}
static BuildingsById CreateTestBuildingsById()
{
var byBuildingId = new BuildingsById();
for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
{
var improvements = new ImprovementResults();
for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
{
improvements.Add
(
new ImprovementId(improvementKey),
new Result
{
Data = buildingKey + improvementKey/1000.0
}
);
}
byBuildingId.Add(new BuildingId(buildingKey), improvements);
}
return byBuildingId;
}
static ImprovementsById CreateImprovementsById(BuildingsById byBuildingId)
{
var byImprovementId = new ImprovementsById();
foreach (var improvements in byBuildingId)
{
foreach (var improvement in improvements.Value)
{
if (!byImprovementId.ContainsKey(improvement.Key))
byImprovementId[improvement.Key] = new BuildingResults();
byImprovementId[improvement.Key].Add(improvements.Key, improvement.Value);
}
}
return byImprovementId;
}
static void Main()
{
new Program().run();
}
}
}
感谢您的回答,测试代码确实提供了信息:)
我的解决方案是放弃LINQ,并在繁重的计算后直接手动执行聚合,因为无论如何我必须迭代构建和改进的每个组合。
此外,我必须使用对象本身作为键,以便在对象被持久化到实体框架之前执行计算(即它们的id都是0)。
代码:public class Building {
public int ID { get; set; }
...
}
public class Improvement {
public int ID { get; set; }
...
}
public class Result {
public decimal Foo { get; set; }
public long Bar { get; set; }
...
public void Add(Result result) {
Foo += result.Foo;
Bar += result.Bar;
...
}
}
public class Calculator {
public Dictionary<Building, Result> ResultsByBuilding;
public Dictionary<Improvement, Result> ResultsByImprovement;
public void CalculateAndAggregate(IEnumerable<Building> buildings, IEnumerable<Improvement> improvements) {
ResultsByBuilding = new Dictionary<Building, Result>();
ResultsByImprovement = new Dictionary<Improvement, Result>();
for (building in buildings) {
for (improvement in improvements) {
Result result = DoHeavyCalculation(building, improvement);
if (ResultsByBuilding.ContainsKey(building)) {
ResultsByBuilding[building].Add(result);
} else {
ResultsByBuilding[building] = result;
}
if (ResultsByImprovement.ContainsKey(improvement)) {
ResultsByImprovement[improvement].Add(result);
} else {
ResultsByImprovement[improvement] = result;
}
}
}
}
}
public static void Main() {
var calculator = new Calculator();
IList<Building> buildings = GetBuildingsFromRepository();
IList<Improvement> improvements = GetImprovementsFromRepository();
calculator.CalculateAndAggregate(buildings, improvements);
DoStuffWithResults(calculator);
}
我这样做是因为我确切地知道我想要的聚合;如果我需要一个更动态的方法,我可能会选择像@MatthewWatson’s Dictionary of Dictionaries这样的东西。