Best performance for finding values in a DataTable: for loop, LINQ, or something else?

Updated: 2023-09-27 17:55:08

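I load a large txt file into a DataTable in my C# program.

I need to search this DataTable for multiple values.

Currently I use a simple for loop, and it takes a very long time! I really need to save time.

Is there a better way to do this? Using LINQ? Or some other approach?

Here is a basic example of my code: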

foreach (DataRow row in DataTables[0].Rows) 
{
    for (int i = 0; i <= DataTables[1].Rows.Count - 1; i++)
    {
        if (DataTables[1].Rows[i]["PRODUCT_CODE"].ToString().Trim() == row["PRODUCT_CODE"].ToString().Trim())
        {
            // Do Some Stuff
            // When the value is found, don't break out of the for loop: keep going, because the same "PRODUCT_CODE" can occur several times.
        } 
    }
}

Best performance for finding values in a DataTable: for loop, LINQ, or something else?

HashSet<string> dt0 = new HashSet<string>();
foreach (DataRow row in DataTables[0].Rows) 
    dt0.Add(row["PRODUCT_CODE"].ToString().Trim());
for (int i = 0; i <= DataTables[1].Rows.Count - 1; i++)
{
    if (dt0.Contains(DataTables[1].Rows[i]["PRODUCT_CODE"].ToString().Trim()))
    {
        // Do Some Stuff
        // When the value is found, don't break out of the for loop: keep going, because the same "PRODUCT_CODE" can occur several times.
    } 
}

That just took you from O(n·m) to O(n+m), where n and m are the row counts of the two tables.

If you need the whole row, use a Dictionary instead of a HashSet:

Dictionary<String, DataRow> dt0 = new Dictionary<String, DataRow>();
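Since the question says the same PRODUCT_CODE occurs several times, a plain Dictionary keyed by code would throw on the second Add for a repeated key. A minimal sketch of one way around that, assuming you want every matching pair of rows: index one table with ToLookup, then probe the index while scanning the other. The names ProductCodeMatcher and MatchRows are made up for illustration and are not from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ProductCodeMatcher
{
    // Hypothetical helper: returns every (row from first, row from second) pair
    // whose trimmed PRODUCT_CODE values are equal.
    public static List<Tuple<DataRow, DataRow>> MatchRows(DataTable first, DataTable second)
    {
        // Build the index once. A lookup keeps all rows per key,
        // so repeated product codes are not a problem.
        ILookup<string, DataRow> byCode = first.Rows
            .Cast<DataRow>()
            .ToLookup(r => r["PRODUCT_CODE"].ToString().Trim());

        var matches = new List<Tuple<DataRow, DataRow>>();
        foreach (DataRow row in second.Rows)
        {
            string code = row["PRODUCT_CODE"].ToString().Trim();
            // Indexing a lookup with a missing key yields an empty sequence.
            foreach (DataRow hit in byCode[code])
            {
                matches.Add(Tuple.Create(hit, row));
            }
        }
        return matches;
    }
}
```

Calling ProductCodeMatcher.MatchRows(DataTables[0], DataTables[1]) would give you the pairs to process, without the nested scan.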

You should also size the HashSet/Dictionary generously, that is, give it a large enough initial capacity.

I would give you more, but you arrogantly asked me whether I think this would be faster.

Why are you using a DataTable in the first place?

A short example that uses more than one core:

Parallel.ForEach(dt.AsEnumerable(), row =>
{
    if (row["value1"].ToString() == "test")
    {
        Console.WriteLine(row["value1"]);
    }
});
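Console.WriteLine can be called from several threads at once, but most real loop bodies cannot, for example appending to a List. A hedged sketch, assuming you want to collect the matching rows rather than print them: write into a ConcurrentBag from inside the parallel loop. The names ParallelScan and FindRows are hypothetical, and the table is only read here, never modified, while the loop runs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelScan
{
    // Hypothetical helper: collects every row whose value in `column`
    // equals `wanted`, checking rows on multiple cores.
    public static ConcurrentBag<DataRow> FindRows(DataTable table, string column, string wanted)
    {
        // ConcurrentBag allows concurrent Add calls without extra locking.
        var found = new ConcurrentBag<DataRow>();

        Parallel.ForEach(table.Rows.Cast<DataRow>(), row =>
        {
            if (row[column].ToString() == wanted)
            {
                found.Add(row);
            }
        });

        // The order of rows in the bag is not the table order.
        return found;
    }
}
```

Whether this beats a single-threaded loop depends on how expensive the per-row work is; for a cheap string comparison the overhead of parallelism can outweigh the gain, so measure before committing to it.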

Another solution

Comparing keys is very fast:

Dictionary<string, Product> file1 = new Dictionary<string, Product>();
Dictionary<string, Product> file2 = new Dictionary<string, Product>();
//Add ProductCode in key
var product = new Product();
product.Code = "EAN1202";
product.Manufacturer = "Company";
product.Name = "Test";
product.Price = 12.05;
file1.Add(product.Code, product);
//One thread
foreach (var item in file1)
{
   if (file2.ContainsKey(item.Key))
   {
      // Do Some Stuff
   }
}
//Multi thread
Parallel.ForEach(file1, item =>
{
   if (file2.ContainsKey(item.Key))
   {
      // Do Some Stuff
   }
});

The Product class:

public class Product
{
    public string Code;
    public string Manufacturer;
    public string Name;
    public double Price;
}

This could probably be improved if we knew what you are doing inside the loop, but this should work:

// AsEnumerable() is an extension method on DataTable (System.Data.DataSetExtensions), not on Rows
var dt1=DataTables[0].AsEnumerable();
var dt2=DataTables[1].AsEnumerable();
var results=dt1.Join(
  dt2,
  d1=>d1.Field<string>("PRODUCT_CODE").Trim(),
  d2=>d2.Field<string>("PRODUCT_CODE").Trim(),
  (d1,d2)=>new {d1,d2});
foreach(var row in results)
{
  // Do stuff with row.d1/row.d2
}

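If, for example, the DataTables were created from a SQL source, it would be better to use a join in the query instead, which lets the SQL server perform the join rather than doing it on the client. In addition, dropping the DataTables in favour of POCO classes would improve your performance, and you would no longer need to cast the product codes from object during the join.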
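A minimal sketch of the POCO variant, assuming each line of the text file has already been parsed into a typed object. ProductRecord, PocoJoin and JoinByCode are hypothetical names chosen for this example, and the codes are assumed to be trimmed once at parse time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical POCO: one parsed line of the input file.
public sealed class ProductRecord
{
    public string Code { get; set; }
    public string Name { get; set; }
}

public static class PocoJoin
{
    // Pairs up records from both files that share the same Code.
    // Enumerable.Join indexes one side by key internally, so there is no nested scan.
    public static List<Tuple<ProductRecord, ProductRecord>> JoinByCode(
        IEnumerable<ProductRecord> left,
        IEnumerable<ProductRecord> right)
    {
        return left
            .Join(right, l => l.Code, r => r.Code, (l, r) => Tuple.Create(l, r))
            .ToList();
    }
}
```

Because Code is a typed string property, the key selectors need no ToString() or Trim() call per comparison, and repeated codes simply produce one pair per matching combination.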