如何为二维LINQ操作创建一个高效的数据结构

本文关键字:一个 高效 数据结构 创建 LINQ 二维 操作 | 更新日期: 2023-09-27 18:16:27

问题:我有两种对象,我们称它们为BuildingImprovement。大约有30个Improvement实例,而可能有1-1000个Building实例。对于BuildingImprovement的每个组合,我必须执行一些繁重的计算,并将结果存储在Result对象中。

Building s和Improvement s都可以用整数ID表示。

然后我需要能够:

  • 有效地访问给定BuildingImprovementResult (编辑:见下面的评论)
  • 对给定Building的所有Improvement s执行Result s的聚合,如。sum()和。average ()
  • 对于给定的Improvement
  • ,对所有Building s的Result s执行相同的聚合

这将发生在web服务器后端,所以内存可能是一个问题,但速度是最重要的。

至今感想:

  1. 使用Dictionary<Tuple<int, int>, Result>, <BuildingID, ImprovementID>作为key。这应该给我快速插入和单次查找,但我担心.Where().Sum()的性能。
  2. 使用二维数组,BuildingID s为一维,ImprovementID s为一维,Result为值。此外,构建两个Dictionary<int, int>,将BuildingID s和ImprovementID s映射到它们各自的数组行/列索引。这可能意味着最多1000+ Dictionary s,这会是个问题吗?
  3. 使用List<Tuple<int, int, Result>>。我认为这可能是效率最低的,有O(n)个插入,尽管我可能错了。

我是否错过了一个明显更好的选择?

编辑:事实证明,这只是我感兴趣的聚合值(每Building和每Improvement);

如何为二维LINQ操作创建一个高效的数据结构

一般来说,Dictionary是查找效率最高的。当通过键访问时,查找效率和操作效率都是常数O(1)。这将有助于访问第一个点。

在第二个和第三个中,你需要遍历所有的项O(n),所以没有办法加速它,除非你想以指定的顺序O(n*n)遍历它们——然后你可以使用SortedDictionray O(n),但你会损害查找和操作效率(O(log n))。

所以我会选择你发布的第一个解决方案

您可以使用"字典的字典"来保存Result数据,例如:

//             Building ID ↓               ↓ Improvement ID
var data = new Dictionary<int, Dictionary<int, Result>>();

这将使您快速找到特定建筑的改进。

然而,找到包含特定改进的建筑物将需要在所有建筑物上进行迭代。下面是一些示例代码:

using System;
using System.Linq;
using System.Collections.Generic;
namespace Demo
{
    sealed class Result
    {
        public double Data;
    }
    sealed class Building
    {
        public int Id;
        public int Value;
    }
    sealed class Improvement
    {
        public int Id;
        public int Value;
    }
    class Program
    {
        void run()
        {
            //             Building ID ↓               ↓ Improvement ID
            var data = new Dictionary<int, Dictionary<int, Result>>();
            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
            {
                var improvements = new Dictionary<int, Result>();
                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                    improvements.Add(improvementKey, new Result{ Data = buildingKey + improvementKey/1000.0 });
                data.Add(buildingKey, improvements);
            }
            // Aggregate data for all improvements for building with ID == 1500:
            int buildingId = 1500;
            var sum = data[buildingId].Sum(result => result.Value.Data);
            Console.WriteLine(sum);
            // Aggregate data for all buildings with a given improvement.
            int improvementId = 5010;
            sum = data.Sum(improvements =>
            {
                Result result;
                return improvements.Value.TryGetValue(improvementId, out result) ? result.Data : 0.0;
            });
            Console.WriteLine(sum);
        }
        static void Main()
        {
            new Program().run();
        }
    }
}

为了加速第二次聚合(对于给定ID的所有改进的数据求和),我们可以使用第二个字典:

//                      Improvment ID ↓               ↓ Building ID
var byImprovementId = new Dictionary<int, Dictionary<int, Result>>();

您将有一个额外的字典要维护,但它不是太复杂。使用一些像这样的嵌套字典可能会占用太多的内存,但这是值得考虑的。

如下面的注释所述,最好为id和字典本身定义类型。把它们放在一起就得到:

using System;
using System.Linq;
using System.Collections.Generic;
namespace Demo
{
    sealed class Result
    {
        public double Data;
    }
    sealed class BuildingId
    {
        public BuildingId(int id)
        {
            Id = id;
        }
        public readonly int Id;
        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }
        public override bool Equals(object obj)
        {
            var other = obj as BuildingId;
            if (other == null)
                return false;
            return this.Id == other.Id;
        }
    }
    sealed class ImprovementId
    {
        public ImprovementId(int id)
        {
            Id = id;
        }
        public readonly int Id;
        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }
        public override bool Equals(object obj)
        {
            var other = obj as ImprovementId;
            if (other == null)
                return false;
            return this.Id == other.Id;
        }
    }
    sealed class Building
    {
        public BuildingId Id;
        public int Value;
    }
    sealed class Improvement
    {
        public ImprovementId Id;
        public int Value;
    }
    sealed class BuildingResults : Dictionary<BuildingId, Result>{}
    sealed class ImprovementResults: Dictionary<ImprovementId, Result>{}
    sealed class BuildingsById: Dictionary<BuildingId, ImprovementResults>{}
    sealed class ImprovementsById: Dictionary<ImprovementId, BuildingResults>{}
    class Program
    {
        void run()
        {
            var byBuildingId    = CreateTestBuildingsById();            // Create some test data.
            var byImprovementId = CreateImprovementsById(byBuildingId); // Create the alternative lookup dictionaries.
            // Aggregate data for all improvements for building with ID == 1500:
            BuildingId buildingId = new BuildingId(1500);
            var sum = byBuildingId[buildingId].Sum(result => result.Value.Data);
            Console.WriteLine(sum);
            // Aggregate data for all buildings with a given improvement.
            ImprovementId improvementId = new ImprovementId(5010);
            sum = byBuildingId.Sum(improvements =>
            {
                Result result;
                return improvements.Value.TryGetValue(improvementId, out result) ? result.Data : 0.0;
            });
            Console.WriteLine(sum);
            // Aggregate data for all buildings with a given improvement using byImprovementId.
            // This will be much faster than the above Linq.
            sum = byImprovementId[improvementId].Sum(result => result.Value.Data);
            Console.WriteLine(sum);
        }
        static BuildingsById CreateTestBuildingsById()
        {
            var byBuildingId = new BuildingsById();
            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
            {
                var improvements = new ImprovementResults();
                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                {
                    improvements.Add
                    (
                        new ImprovementId(improvementKey),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        }
                    );
                }
                byBuildingId.Add(new BuildingId(buildingKey), improvements);
            }
            return byBuildingId;
        }
        static ImprovementsById CreateImprovementsById(BuildingsById byBuildingId)
        {
            var byImprovementId = new ImprovementsById();
            foreach (var improvements in byBuildingId)
            {
                foreach (var improvement in improvements.Value)
                {
                    if (!byImprovementId.ContainsKey(improvement.Key))
                        byImprovementId[improvement.Key] = new BuildingResults();
                    byImprovementId[improvement.Key].Add(improvements.Key, improvement.Value);
                }
            }
            return byImprovementId;
        }
        static void Main()
        {
            new Program().run();
        }
    }
}

最后,这里是一个修改后的版本,它确定为特定改进的所有构建/改进组合的所有实例聚合数据所需的时间,并比较元组的字典和字典的字典的结果。

在任何调试器之外运行RELEASE构建的结果:

Dictionary of dictionaries took 00:00:00.2967741
Dictionary of tuples took 00:00:07.8164672

使用字典的字典要快得多,但这只有在你打算做很多这样的聚合时才重要。

using System;
using System.Diagnostics;
using System.Linq;
using System.Collections.Generic;
namespace Demo
{
    sealed class Result
    {
        public double Data;
    }
    sealed class BuildingId
    {
        public BuildingId(int id)
        {
            Id = id;
        }
        public readonly int Id;
        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }
        public override bool Equals(object obj)
        {
            var other = obj as BuildingId;
            if (other == null)
                return false;
            return this.Id == other.Id;
        }
    }
    sealed class ImprovementId
    {
        public ImprovementId(int id)
        {
            Id = id;
        }
        public readonly int Id;
        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }
        public override bool Equals(object obj)
        {
            var other = obj as ImprovementId;
            if (other == null)
                return false;
            return this.Id == other.Id;
        }
    }
    sealed class Building
    {
        public BuildingId Id;
        public int Value;
    }
    sealed class Improvement
    {
        public ImprovementId Id;
        public int Value;
    }
    sealed class BuildingResults : Dictionary<BuildingId, Result>{}
    sealed class ImprovementResults: Dictionary<ImprovementId, Result>{}
    sealed class BuildingsById: Dictionary<BuildingId, ImprovementResults>{}
    sealed class ImprovementsById: Dictionary<ImprovementId, BuildingResults>{}
    class Program
    {
        void run()
        {
            var byBuildingId    = CreateTestBuildingsById();            // Create some test data.
            var byImprovementId = CreateImprovementsById(byBuildingId); // Create the alternative lookup dictionaries.
            var testTuples      = CreateTestTuples();
            ImprovementId improvementId = new ImprovementId(5010);
            int count = 10000;
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < count; ++i)
                byImprovementId[improvementId].Sum(result => result.Value.Data);
            Console.WriteLine("Dictionary of dictionaries took " + sw.Elapsed);
            sw.Restart();
            for (int i = 0; i < count; ++i)
                testTuples.Where(result => result.Key.Item2.Equals(improvementId)).Sum(item => item.Value.Data);
            Console.WriteLine("Dictionary of tuples took " + sw.Elapsed);
        }
        static Dictionary<Tuple<BuildingId, ImprovementId>, Result> CreateTestTuples()
        {
            var result = new Dictionary<Tuple<BuildingId, ImprovementId>, Result>();
            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                    result.Add(
                        new Tuple<BuildingId, ImprovementId>(new BuildingId(buildingKey), new ImprovementId(improvementKey)),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        });
            return result;
        }
        static BuildingsById CreateTestBuildingsById()
        {
            var byBuildingId = new BuildingsById();
            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
            {
                var improvements = new ImprovementResults();
                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                {
                    improvements.Add
                    (
                        new ImprovementId(improvementKey),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        }
                    );
                }
                byBuildingId.Add(new BuildingId(buildingKey), improvements);
            }
            return byBuildingId;
        }
        static ImprovementsById CreateImprovementsById(BuildingsById byBuildingId)
        {
            var byImprovementId = new ImprovementsById();
            foreach (var improvements in byBuildingId)
            {
                foreach (var improvement in improvements.Value)
                {
                    if (!byImprovementId.ContainsKey(improvement.Key))
                        byImprovementId[improvement.Key] = new BuildingResults();
                    byImprovementId[improvement.Key].Add(improvements.Key, improvement.Value);
                }
            }
            return byImprovementId;
        }
        static void Main()
        {
            new Program().run();
        }
    }
}

感谢您的回答,测试代码确实提供了信息:)

我的解决方案是放弃LINQ,并在繁重的计算后直接手动执行聚合,因为无论如何我必须迭代构建和改进的每个组合。

此外,我必须使用对象本身作为键,以便在对象被持久化到实体框架之前执行计算(即它们的id都是0)。

代码:

public class Building {
    public int ID { get; set; }
    ...
}
public class Improvement {
    public int ID { get; set; }
    ...
}
public class Result {
    public decimal Foo { get; set; }
    public long Bar { get; set; }
    ...
    public void Add(Result result) {
        Foo += result.Foo;
        Bar += result.Bar;
        ...
    }   
}
public class Calculator {
    public Dictionary<Building, Result> ResultsByBuilding;
    public Dictionary<Improvement, Result> ResultsByImprovement;
    public void CalculateAndAggregate(IEnumerable<Building> buildings, IEnumerable<Improvement> improvements) {
        ResultsByBuilding = new Dictionary<Building, Result>();
        ResultsByImprovement = new Dictionary<Improvement, Result>();
        for (building in buildings) {
            for (improvement in improvements) {
                Result result = DoHeavyCalculation(building, improvement);
                if (ResultsByBuilding.ContainsKey(building)) {
                    ResultsByBuilding[building].Add(result);
                } else {
                    ResultsByBuilding[building] = result;
                }
                if (ResultsByImprovement.ContainsKey(improvement)) {
                    ResultsByImprovement[improvement].Add(result);
                } else {
                    ResultsByImprovement[improvement] = result;
                }
            }
        }
    }
}
public static void Main() {
    var calculator = new Calculator();
    IList<Building> buildings = GetBuildingsFromRepository();
    IList<Improvement> improvements = GetImprovementsFromRepository();
    calculator.CalculateAndAggregate(buildings, improvements);
    DoStuffWithResults(calculator);
}

我这样做是因为我确切地知道我想要的聚合;如果我需要一个更动态的方法,我可能会选择像@MatthewWatson’s Dictionary of Dictionaries这样的东西。