How to perform LINQ GroupBy, Select, and Take without a performance hit
Keywords: GroupBy, LINQ, Select, Take, performance | Updated: 2023-09-27 17:50:46
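Background
I have a SQL data set that is called as a view through LINQ to Entities. Its purpose is to provide outstanding account balances for a credit report: 30 days outstanding, 60 days outstanding, and so on.
A sample table is too hard to format here on StackOverflow, but here is the SQL SELECT statement, which should give you an idea of the underlying data structure: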
SELECT TOP 1000 [TransactionId]
,[IndustrySector]
,[DataContributorId]
,[ExperienceMonth]
,[ExperienceMonthText]
,[Balance]
,[ARCurrent]
,[AR1to30PD]
,[AR31to60PD]
,[AR61to90PD]
,[Ar91PlusPD]
,[WeightedDTP]
FROM [BCC].[dbo].[vwTransactionExperienceDetail]
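Now, when I call this view via LINQ, the end goal is to construct an object that is returned to the requesting client as JSON. The resulting object needs to be a hierarchy grouped by Industry, then by Contributors (of the reported data), and finally by individual Reports. To that end, the following LINQ query works well and is fairly fast: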
/// <summary>
/// Gets the 25 month experience detail report with summed parameters (balance, DTP, etc).
/// </summary>
/// <param name="id">The transaction id.</param>
/// <returns>List<ExperienceDetail></returns>
public static List<ExperienceDetail> Get25MonthExperienceDetail_Sum(int id)
{
    var db = new BCCEntities();
    return
        db.vwTransactionExperienceDetails.Where(te => te.TransactionId == id)
            .GroupBy(g => g.IndustrySector)
            .Select(i => new ExperienceDetail
            {
                Industry = i.Key,
                NumberOfContributors = i.GroupBy(c => c.DataContributorId).Count(),
                Balance = i.Sum(s => s.Balance),
                OneToThirty = i.Sum(s => s.ARCurrent),
                ThirtyOneToSixty = i.Sum(s => s.AR1to30PD),
                SixtyOneToNinety = i.Sum(s => s.AR31to60PD),
                NinetyOneToOneTwenty = i.Sum(s => s.AR61to90PD),
                OneTwentyOnePlus = i.Sum(s => s.Ar91PlusPD),
                DTP = (i.Sum(s => s.Balance) != 0) ? i.Sum(s => s.WeightedDTP) / i.Sum(s => s.Balance) : i.Sum(s => s.WeightedDTP),
                Contributions = i.GroupBy(dc => dc.DataContributorId).Select(c => new Contribution
                {
                    Balance = c.Sum(s => s.Balance),
                    OneToThirty = c.Sum(s => s.ARCurrent),
                    ThirtyOneToSixty = c.Sum(s => s.AR1to30PD),
                    SixtyOneToNinety = c.Sum(s => s.AR31to60PD),
                    NinetyOneToOneTwenty = c.Sum(s => s.AR61to90PD),
                    OneTwentyOnePlus = c.Sum(s => s.Ar91PlusPD),
                    DTP = (c.Sum(s => s.Balance) != 0) ? c.Sum(s => s.WeightedDTP) / c.Sum(s => s.Balance) : c.Sum(s => s.WeightedDTP),
                    ContributorId = c.Key,
                    Reports = c.Select(r => new Report
                    {
                        DTP = (r.Balance != 0) ? r.WeightedDTP / r.Balance : r.WeightedDTP,
                        ReportDate = r.ExperienceMonth,
                        Balance = r.Balance,
                        OneToThirty = r.ARCurrent,
                        ThirtyOneToSixty = r.AR1to30PD,
                        SixtyOneToNinety = r.AR31to60PD,
                        NinetyOneToOneTwenty = r.AR61to90PD,
                        OneTwentyOnePlus = r.Ar91PlusPD,
                        ContributorId = r.DataContributorId,
                        Industry = i.Key
                    })
                })
            }).ToList();
}
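The problem
I need to create an additional service that provides the same data, but only for the most recently reported month for each contributor (DataContributorId). The following LINQ query works for this, but it is very slow, taking almost a full minute to return results: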
/// <summary>
/// Gets an experience detail report with summed parameters (balance, DTP, etc) for the most recent month.
/// </summary>
/// <param name="id">The transaction id.</param>
/// <returns>List<ExperienceDetail></returns>
public static List<ExperienceDetail> Get25MonthExperienceDetail_MostRecentMonth(int id)
{
    var db = new BCCEntities();
    db.CommandTimeout = 100000;
    return
        db.vwTransactionExperienceDetails.Where(te => te.TransactionId == id)
            .OrderByDescending(o => o.ExperienceMonth)
            .GroupBy(g => g.IndustrySector)
            .Select(i => new ExperienceDetail
            {
                Industry = i.Key,
                NumberOfContributors = i.GroupBy(c => c.DataContributorId).Count(),
                Balance = i.GroupBy(dc => dc.DataContributorId).Sum(x => x.Select(z => z.Balance).FirstOrDefault()),
                OneToThirty = i.Sum(s => s.ARCurrent),
                ThirtyOneToSixty = i.Sum(s => s.AR1to30PD),
                SixtyOneToNinety = i.Sum(s => s.AR31to60PD),
                NinetyOneToOneTwenty = i.Sum(s => s.AR61to90PD),
                OneTwentyOnePlus = i.Sum(s => s.Ar91PlusPD),
                DTP = (i.Sum(s => s.Balance) != 0) ? i.Sum(s => s.WeightedDTP) / i.Sum(s => s.Balance) : i.Sum(s => s.WeightedDTP),
                Contributions = i.GroupBy(dc => dc.DataContributorId).Select(c => new Contribution
                {
                    Balance = c.Take(1).Sum(s => s.Balance),
                    OneToThirty = c.Take(1).Sum(s => s.ARCurrent),
                    ThirtyOneToSixty = c.Take(1).Sum(s => s.AR1to30PD),
                    SixtyOneToNinety = c.Take(1).Sum(s => s.AR31to60PD),
                    NinetyOneToOneTwenty = c.Take(1).Sum(s => s.AR61to90PD),
                    OneTwentyOnePlus = c.Take(1).Sum(s => s.Ar91PlusPD),
                    DTP = (c.Take(1).Sum(s => s.Balance) != 0) ? c.Take(1).Sum(s => s.WeightedDTP) / c.Take(1).Sum(s => s.Balance) : c.Take(1).Sum(s => s.WeightedDTP),
                    ContributorId = c.Key,
                    Reports = c.Select(r => new Report
                    {
                        DTP = (r.Balance != 0) ? r.WeightedDTP / r.Balance : r.WeightedDTP,
                        ReportDate = r.ExperienceMonth,
                        Balance = r.Balance,
                        OneToThirty = r.ARCurrent,
                        ThirtyOneToSixty = r.AR1to30PD,
                        SixtyOneToNinety = r.AR31to60PD,
                        NinetyOneToOneTwenty = r.AR61to90PD,
                        OneTwentyOnePlus = r.Ar91PlusPD,
                        ContributorId = r.DataContributorId,
                        Industry = i.Key
                    }).Take(1)
                })
            }).ToList();
}
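Question
How can I query this "most recently reported month" result set without the performance hit? Over the past few hours I have tried to isolate the part of the query that takes the most time, but I can't seem to find it. Admittedly, I don't know how to profile complex LINQ queries effectively, and I am open to comments on that.
The ultimate question: is there an alternative to this LINQ query that produces the same result set without such a severe performance penalty?
Thanks in advance.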
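Assuming the data set is reasonably small, I would just pull all the months, call ToList(), and then filter down to the most recent month in memory. LINQ can do some very strange things when a query gets complex.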
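The in-memory approach described above could look roughly like the following. This is a minimal sketch, not the asker's actual code: ExperienceRow is a hypothetical stand-in for the view's entity type, and the property types (DateTime for ExperienceMonth, decimal for Balance) are assumptions, since the question does not show them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row type standing in for the vwTransactionExperienceDetail entity.
// Property types are assumed; only a subset of the view's columns is shown.
public class ExperienceRow
{
    public string IndustrySector { get; set; }
    public int DataContributorId { get; set; }
    public DateTime ExperienceMonth { get; set; }
    public decimal Balance { get; set; }
}

public static class MostRecentMonth
{
    // Runs entirely in memory (LINQ to Objects): for each industry/contributor pair,
    // keeps only the row with the latest ExperienceMonth.
    public static List<ExperienceRow> LatestPerContributor(IEnumerable<ExperienceRow> rows)
    {
        return rows
            .GroupBy(r => new { r.IndustrySector, r.DataContributorId })
            .Select(g => g.OrderByDescending(r => r.ExperienceMonth).First())
            .ToList();
    }
}
```

In the question's setting, the input would presumably come from materializing the filtered view first (the Where on TransactionId followed by ToList()), so that the database only runs a simple filtered SELECT. The industry and contributor sums can then be computed over the reduced list with ordinary LINQ to Objects.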
In the second query, you added:
Balance = i.GroupBy(dc => dc.DataContributorId).Sum(x => x.Select(z => z.Balance).FirstOrDefault()),
and
.OrderByDescending(o => o.ExperienceMonth)
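Try removing the GroupBy and OrderBy to check whether they are causing the performance problem. If they are, check whether those columns are indexed, and try adding indexes if none exist.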
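One way to compare query variants while removing clauses is a small timing helper like the one below. This is only a sketch: QueryTimer and Time are hypothetical names, and a single run gives a rough number rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;

public static class QueryTimer
{
    // Runs the supplied delegate once, prints the elapsed time, and returns the delegate's result.
    public static T Time<T>(string label, Func<T> run)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = run();
        stopwatch.Stop();
        Console.WriteLine("{0}: {1} ms", label, stopwatch.ElapsedMilliseconds);
        return result;
    }
}
```

Because LINQ queries are deferred, the delegate has to materialize the results for the timing to mean anything, for example QueryTimer.Time("without OrderBy", () => query.ToList()), where query is whichever variant is being tested.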
Also check SQL Profiler (on SQL Server 2005 or earlier) or SQL Extended Events (on SQL Server 2008 or later).
You can also use the LINQPad tool to inspect the SQL DML that the query generates.
There are other ways to get the data from the database:
- Create a view in the database and read from it through LINQ
- Write an Entity SQL query