从多个连接表的子集加载数据集的最有效方法

本文关键字：数据集有效方法加载子集连接 | 更新日期: 2023-09-27 18:17:11

我有一个很大的库存系统，我不得不重写其中的一部分I/O部分。在它的核心，有一个产品表和一组相关的表。我需要尽可能高效地阅读其中的部分内容。我在c#中构造了这个查询:

select *                                    -- includes productid
into #tt
from products where productClass = 547      -- possibly more conditions
select * from #tt;
select * from productHistory where productid in (select productid from #tt);
select * from productSuppliers where productid in (select productid from #tt);
select * from productSafetyInfo where productid in (select productid from #tt);
select * from productMiscInfo where productid in (select productid from #tt);
drop table #tt;

这个查询给了我所需要的结果:5个结果集，每个结果集有0个、一个或多个记录(如果第一个返回0行，其他的当然也会返回0行)。然后程序将这些结果集塞入合适的DataSet。(然后将其传递给只需要这些记录的构造函数。)这个查询(有不同的条件)以批次运行。

我的问题是，是否有更有效的方法来检索这些数据?

将其作为单个连接重新工作将无法工作，因为每个子节点可能返回可变数量的行。

从多个连接表的子集加载数据集的最有效方法

如果您有产品索引。

    select * from products where productClass = 547   -- includes productid    
    select productHistory.* 
    from productHistory 
    join products 
      on products.productid = productHistory.productid 
      and products,productClass = 547;
    ...

如果productID是一个聚集索引，那么使用

可能会获得更好的持久性。

    CREATE TABLE #Temp (productid INT PRIMARY KEY CLUSTERED);
    insert into #temp
    select productid  from products where productClass = 547 
    order by productid;
    go
    select productHistory.* 
    from productHistory 
    join #Temp 
      on #Temp.productid = productHistory.productid;

在聚集索引上的连接似乎提供了最好的性能。想想看——SQL可以匹配第一个，并且知道它可以忘记其余的，然后移动到第二个，知道它可以向前移动(而不是回到顶部)。使用where (select…)SQL不能利用顺序。需要连接的表越多，使用#temp的理由就越多，因为创建#temp大约需要1/2秒的时间。如果你要使用#temp，你最好将它设置为一个结构化的temp

请确保在索引上连接JOIN表。否则，你将以表扫描和索引扫描告终，你的代码将非常缓慢，特别是在连接大型表时。

最佳实践是优化SQL查询以避免表扫描。

如果还没有，我强烈建议将其作为一个存储过程。

此外，我怀疑(但不能在没有测试的情况下证明)，如果对每个子表执行产品表上的连接，而不是复制到本地表，您将获得更好的性能。

最后，除非你能把这些数据组合起来，否则我认为没有比这更有效的方法了

如果不了解模式并对数据和表大小有更多的了解，就很难对查询端提出明确的改进建议。

但是，由于您使用的是批处理命令来返回多个结果集，因此您可以使用SqlDataAdapter来完成这部分工作，而不是"将结果塞进适当的数据集":

SqlDataAdapter adapter = new SqlDataAdapter(cmd);
DataSet results = new DataSet();
adapter.Fill(results);

之后，第一个结果集将在results中。表[0]，第二个结果。表[1]等。