How to perform a LINQ GroupBy, Select, and Take without a performance hit

Updated: 2023-09-27 17:50:46

Background

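I have a SQL data set that is exposed as a view and called through LINQ to Entities. Its purpose is to provide outstanding account balances for a credit report, broken down into 30 days outstanding, 60 days outstanding, and so on.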

A sample table is too hard to format on StackOverflow, but here is the SQL SELECT statement, which should give you an idea of the raw data structure:

SELECT TOP 1000 [TransactionId]
      ,[IndustrySector]
      ,[DataContributorId]
      ,[ExperienceMonth]
      ,[ExperienceMonthText]
      ,[Balance]
      ,[ARCurrent]
      ,[AR1to30PD]
      ,[AR31to60PD]
      ,[AR61to90PD]
      ,[Ar91PlusPD]
      ,[WeightedDTP]
  FROM [BCC].[dbo].[vwTransactionExperienceDetail] 

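Now, when I call this view through LINQ, the end goal is to build an object that is returned to the requesting client as JSON. The resulting object needs to be a hierarchy grouped by Industry, then by Contributors (who reported the data), and finally by individual Reports. To do that, the following LINQ query works well and is reasonably fast: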

        /// <summary>
        /// Gets the 25 month experience detail report with summed parameters (balance, DTP, etc).
        /// </summary>
        /// <param name="id">The transaction id.</param>
        /// <returns>List&lt;ExperienceDetail&gt;</returns>
        public static List<ExperienceDetail> Get25MonthExperienceDetail_Sum(int id)
        {
            var db = new BCCEntities();
            return
                db.vwTransactionExperienceDetails.Where(te => te.TransactionId == id)
                  .GroupBy(g => g.IndustrySector)
                  .Select(i => new ExperienceDetail
                      {
                          Industry = i.Key,
                          NumberOfContributors = i.GroupBy(c => c.DataContributorId).Count(),
                          Balance = i.Sum(s => s.Balance),
                          OneToThirty = i.Sum(s => s.ARCurrent),
                          ThirtyOneToSixty = i.Sum(s => s.AR1to30PD),
                          SixtyOneToNinety = i.Sum(s => s.AR31to60PD),
                          NinetyOneToOneTwenty = i.Sum(s => s.AR61to90PD),
                          OneTwentyOnePlus = i.Sum(s => s.Ar91PlusPD),
                          DTP = (i.Sum(s => s.Balance) != 0) ? i.Sum(s => s.WeightedDTP) / i.Sum(s => s.Balance) : i.Sum(s => s.WeightedDTP),
                          Contributions = i.GroupBy(dc => dc.DataContributorId).Select(c => new Contribution
                              {
                                  Balance = c.Sum(s => s.Balance),
                                  OneToThirty = c.Sum(s => s.ARCurrent),
                                  ThirtyOneToSixty = c.Sum(s => s.AR1to30PD),
                                  SixtyOneToNinety = c.Sum(s => s.AR31to60PD),
                                  NinetyOneToOneTwenty = c.Sum(s => s.AR61to90PD),
                                  OneTwentyOnePlus = c.Sum(s => s.Ar91PlusPD),
                                  DTP = (c.Sum(s => s.Balance) != 0) ? c.Sum(s => s.WeightedDTP) / c.Sum(s => s.Balance) : c.Sum(s => s.WeightedDTP),
                                  ContributorId = c.Key,
                                  Reports = c.Select(r => new Report
                                  {
                                      DTP = (r.Balance != 0) ? r.WeightedDTP/r.Balance : r.WeightedDTP,
                                      ReportDate = r.ExperienceMonth,
                                      Balance = r.Balance,
                                      OneToThirty = r.ARCurrent,
                                      ThirtyOneToSixty = r.AR1to30PD,
                                      SixtyOneToNinety = r.AR31to60PD,
                                      NinetyOneToOneTwenty = r.AR61to90PD,
                                      OneTwentyOnePlus = r.Ar91PlusPD,
                                      ContributorId = r.DataContributorId,
                                      Industry = i.Key
                                  })
                              })
                      }).ToList();
        }

The Problem

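I need to create an additional service that provides the same data, but only for the most recently reported month of each contributor (DataContributorId). The following LINQ query works for this, but it is very slow: it takes almost a full minute to return results.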

        /// <summary>
        /// Gets an experience detail report with summed parameters (balance, DTP, etc) for the most recent month.
        /// </summary>
        /// <param name="id">The transaction id.</param>
        /// <returns>List&lt;ExperienceDetail&gt;</returns>
        public static List<ExperienceDetail> Get25MonthExperienceDetail_MostRecentMonth(int id)
        {
            var db = new BCCEntities();
            db.CommandTimeout = 100000;
            return
                db.vwTransactionExperienceDetails.Where(te => te.TransactionId == id)
                  .OrderByDescending(o => o.ExperienceMonth)
                  .GroupBy(g => g.IndustrySector)
                  .Select(i => new ExperienceDetail
                  {
                      Industry = i.Key,
                      NumberOfContributors = i.GroupBy(c => c.DataContributorId).Count(),
                      Balance = i.GroupBy(dc => dc.DataContributorId).Sum(x => x.Select(z => z.Balance).FirstOrDefault()),
                      OneToThirty = i.Sum(s => s.ARCurrent),
                      ThirtyOneToSixty = i.Sum(s => s.AR1to30PD),
                      SixtyOneToNinety = i.Sum(s => s.AR31to60PD),
                      NinetyOneToOneTwenty = i.Sum(s => s.AR61to90PD),
                      OneTwentyOnePlus = i.Sum(s => s.Ar91PlusPD),
                      DTP = (i.Sum(s => s.Balance) != 0) ? i.Sum(s => s.WeightedDTP) / i.Sum(s => s.Balance) : i.Sum(s => s.WeightedDTP),
                      Contributions = i.GroupBy(dc => dc.DataContributorId).Select(c => new Contribution
                      {
                          Balance = c.Take(1).Sum(s => s.Balance),
                          OneToThirty = c.Take(1).Sum(s => s.ARCurrent),
                          ThirtyOneToSixty = c.Take(1).Sum(s => s.AR1to30PD),
                          SixtyOneToNinety = c.Take(1).Sum(s => s.AR31to60PD),
                          NinetyOneToOneTwenty = c.Take(1).Sum(s => s.AR61to90PD),
                          OneTwentyOnePlus = c.Take(1).Sum(s => s.Ar91PlusPD),
                          DTP = (c.Take(1).Sum(s => s.Balance) != 0) ? c.Take(1).Sum(s => s.WeightedDTP) / c.Take(1).Sum(s => s.Balance) : c.Take(1).Sum(s => s.WeightedDTP),
                          ContributorId = c.Key,
                          Reports = c.Select(r => new Report
                          {
                              DTP = (r.Balance != 0) ? r.WeightedDTP / r.Balance : r.WeightedDTP,
                              ReportDate = r.ExperienceMonth,
                              Balance = r.Balance,
                              OneToThirty = r.ARCurrent,
                              ThirtyOneToSixty = r.AR1to30PD,
                              SixtyOneToNinety = r.AR31to60PD,
                              NinetyOneToOneTwenty = r.AR61to90PD,
                              OneTwentyOnePlus = r.Ar91PlusPD,
                              ContributorId = r.DataContributorId,
                              Industry = i.Key
                          }).Take(1)
                      })
                  }).ToList();
        }

The Question

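How can I query this "most recently reported month" result set without the performance hit? Over the past few hours I have tried to isolate the part of the query that takes the most time, but I cannot seem to find it. Admittedly, I do not know how to profile performance problems in complex LINQ queries effectively, and I am open to suggestions.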

The ultimate question is: is there an alternative to this LINQ query that produces the same result set without such a severe performance penalty?

Thanks in advance.

Answers

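Assuming the data set is fairly small, I would simply pull all months, call ToList(), and then filter down to the most recent month in memory. LINQ can do some very strange things once a query gets complex.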
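The in-memory filtering suggested above could look roughly like the following. This is only a sketch under stated assumptions: `Row` is a hypothetical stand-in for one row of the view with just a few of its columns, `ExperienceMonth` is assumed to be a `DateTime`, and `LatestPerContributor` is an illustrative helper name, not something from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of vwTransactionExperienceDetail;
// only the columns needed for the illustration are included.
public sealed class Row
{
    public string IndustrySector { get; set; }
    public int DataContributorId { get; set; }
    public DateTime ExperienceMonth { get; set; } // assumed to be a DateTime
    public decimal Balance { get; set; }
}

public static class MostRecentMonthSketch
{
    // Keeps only the newest row for each (industry, contributor) pair.
    // This runs in memory (LINQ to Objects), so the ordering inside each
    // group is applied exactly where it is written.
    public static List<Row> LatestPerContributor(IEnumerable<Row> rows)
    {
        return rows
            .GroupBy(r => new { r.IndustrySector, r.DataContributorId })
            .Select(g => g.OrderByDescending(r => r.ExperienceMonth).First())
            .ToList();
    }

    public static void Main()
    {
        // In the real service this list would presumably come from
        // db.vwTransactionExperienceDetails.Where(te => te.TransactionId == id).ToList()
        var rows = new List<Row>
        {
            new Row { IndustrySector = "Retail", DataContributorId = 1, ExperienceMonth = new DateTime(2013, 1, 1), Balance = 100m },
            new Row { IndustrySector = "Retail", DataContributorId = 1, ExperienceMonth = new DateTime(2013, 2, 1), Balance = 150m },
            new Row { IndustrySector = "Retail", DataContributorId = 2, ExperienceMonth = new DateTime(2013, 1, 1), Balance = 40m }
        };

        // Re-group the filtered rows by industry to build the summary level.
        foreach (var industry in LatestPerContributor(rows).GroupBy(r => r.IndustrySector))
        {
            Console.WriteLine("{0}: {1} contributor(s), balance {2}",
                industry.Key, industry.Count(), industry.Sum(r => r.Balance));
        }
    }
}
```

The trade-off is that every month for the transaction is transferred from the database before filtering, which is why this only makes sense when the data set is small.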

In the second query, you added:

Balance = i.GroupBy(dc => dc.DataContributorId).Sum(x => x.Select(z => z.Balance).FirstOrDefault()),

.OrderByDescending(o => o.ExperienceMonth)

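Try removing the GroupBy and the OrderBy to check whether they cause the performance problem. If they do, check whether those columns are indexed, and try adding indexes if they are not.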
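One way to compare query variants, such as with and without the ordering, is to time each one with a `Stopwatch`. The sketch below is an assumption-laden illustration: `QueryTiming` and `Measure` are hypothetical names, and an in-memory sequence stands in for the Entity Framework query, which would be passed in as a lambda ending in `ToList()`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class QueryTiming
{
    // Runs the supplied query once, forces it to materialize,
    // and returns how long that took. rowCount reports the result size.
    public static TimeSpan Measure<T>(Func<IEnumerable<T>> query, out int rowCount)
    {
        var watch = Stopwatch.StartNew();
        rowCount = query().ToList().Count; // ToList() forces execution
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        // In-memory stand-in data; with Entity Framework the lambdas would
        // wrap the real query variants instead.
        var data = Enumerable.Range(0, 100000).ToList();

        int count;
        var withOrdering = Measure(() => data.OrderByDescending(x => x).GroupBy(x => x % 10), out count);
        Console.WriteLine("with OrderByDescending: {0} ms, {1} groups", withOrdering.TotalMilliseconds, count);

        var withoutOrdering = Measure(() => data.GroupBy(x => x % 10), out count);
        Console.WriteLine("without OrderByDescending: {0} ms, {1} groups", withoutOrdering.TotalMilliseconds, count);
    }
}
```

A single run is noisy, so repeating each measurement a few times and discarding the first (warm-up) run gives a fairer comparison.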

Also check SQL Profiler (for SQL Server 2005 or earlier) or SQL Extended Events (for SQL Server 2008 or later).

You can also use the LINQPad tool to inspect the SQL DML that the query generates.

There are other ways to get the data from the database:

  1. Create a view in the database and read from it with LINQ
  2. Write an Entity SQL query