针对慢速分组的 LINQ 查询优化

本文关键字:LINQ 查询优化 | 更新日期: 2023-09-27 18:30:26

我有一个 LINQ 查询,它首先通过实体框架代码从 SQL 数据库获取数据。这有效,但它工作非常非常慢。

这是原始查询:

      var tmpResult = from mdv in allMetaDataValues
                  where mdv.Metadata.InputType == MetadataInputType.String && mdv.Metadata.ShowInFilter && !mdv.Metadata.IsHidden && !string.IsNullOrEmpty(mdv.ValueString)
                  group mdv by new
                  {
                    mdv.ValueString,
                    mdv.Metadata
                  } into g
                  let first = g.FirstOrDefault()
                  select new
                  {
                    MetadataTitle = g.Key.Metadata.Title,
                    MetadataID = g.Key.Metadata.ID,
                    CollectionColor = g.Key.Metadata.Collection.Color,
                    CollectionID = g.Key.Metadata.Collection.ID,
                    MetadataValueCount = 0,
                    MetadataValueTitle = g.Key.ValueString,
                    MetadataValueID = first.ID
                  };

这是从原始查询生成的 SQL:

{SELECT 
0 AS [C1], 
[Project4].[Title] AS [Title], 
[Project4].[ID] AS [ID], 
[Extent9].[Color] AS [Color], 
[Project4].[Collection_ID] AS [Collection_ID], 
[Project4].[ValueString] AS [ValueString], 
[Project4].[C1] AS [C2]
FROM   (SELECT 
    [Project2].[ValueString] AS [ValueString], 
    [Project2].[ID] AS [ID], 
    [Project2].[Title] AS [Title], 
    [Project2].[Collection_ID] AS [Collection_ID], 
    (SELECT TOP (1) 
        [Filter4].[ID1] AS [ID]
        FROM ( SELECT [Extent6].[ID] AS [ID1], [Extent6].[ValueString] AS [ValueString], [Extent7].[Collection_ID] AS [Collection_ID1], [Extent8].[ID] AS [ID2], [Extent8].[InputType] AS [InputType], [Extent8].[ShowInFilter] AS [ShowInFilter], [Extent8].[IsHidden] AS [IsHidden1]
            FROM   [dbo].[MetadataValue] AS [Extent6]
            LEFT OUTER JOIN [dbo].[Media] AS [Extent7] ON [Extent6].[Media_ID] = [Extent7].[ID]
            INNER JOIN [dbo].[Metadata] AS [Extent8] ON [Extent6].[Metadata_ID] = [Extent8].[ID]
            WHERE ( NOT (([Extent6].[ValueString] IS NULL) OR (( CAST(LEN([Extent6].[ValueString]) AS int)) = 0))) AND ([Extent7].[IsHidden] <> cast(1 as bit))
        )  AS [Filter4]
        WHERE (2 =  CAST( [Filter4].[InputType] AS int)) AND ([Filter4].[ShowInFilter] = 1) AND ([Filter4].[IsHidden1] <> cast(1 as bit)) AND ([Filter4].[Collection_ID1] = @p__linq__0) AND (([Project2].[ValueString] = [Filter4].[ValueString]) OR (([Project2].[ValueString] IS NULL) AND ([Filter4].[ValueString] IS NULL))) AND (([Project2].[ID] = [Filter4].[ID2]) OR (1 = 0))) AS [C1]
    FROM ( SELECT 
        [Distinct1].[ValueString] AS [ValueString], 
        [Distinct1].[ID] AS [ID], 
        [Distinct1].[Title] AS [Title], 
        [Distinct1].[Collection_ID] AS [Collection_ID]
        FROM ( SELECT DISTINCT 
            [Filter2].[ValueString] AS [ValueString], 
            [Filter2].[ID3] AS [ID], 
            [Filter2].[InputType1] AS [InputType], 
            [Filter2].[Title1] AS [Title], 
            [Filter2].[ShowInFilter1] AS [ShowInFilter], 
            [Filter2].[IsHidden2] AS [IsHidden], 
            [Filter2].[Collection_ID2] AS [Collection_ID]
            FROM ( SELECT [Filter1].[ValueString], [Filter1].[Collection_ID3], [Filter1].[IsHidden3], [Filter1].[ID3], [Filter1].[InputType1], [Filter1].[Title1], [Filter1].[ShowInFilter1], [Filter1].[IsHidden2], [Filter1].[Collection_ID2]
                FROM ( SELECT [Extent1].[ValueString] AS [ValueString], [Extent2].[Collection_ID] AS [Collection_ID3], [Extent4].[IsHidden] AS [IsHidden3], [Extent5].[ID] AS [ID3], [Extent5].[InputType] AS [InputType1], [Extent5].[Title] AS [Title1], [Extent5].[ShowInFilter] AS [ShowInFilter1], [Extent5].[IsHidden] AS [IsHidden2], [Extent5].[Collection_ID] AS [Collection_ID2]
                    FROM     [dbo].[MetadataValue] AS [Extent1]
                    LEFT OUTER JOIN [dbo].[Media] AS [Extent2] ON [Extent1].[Media_ID] = [Extent2].[ID]
                    INNER JOIN [dbo].[Metadata] AS [Extent3] ON [Extent1].[Metadata_ID] = [Extent3].[ID]
                    LEFT OUTER JOIN [dbo].[Metadata] AS [Extent4] ON [Extent1].[Metadata_ID] = [Extent4].[ID]
                    LEFT OUTER JOIN [dbo].[Metadata] AS [Extent5] ON [Extent1].[Metadata_ID] = [Extent5].[ID]
                    WHERE ( NOT (([Extent1].[ValueString] IS NULL) OR (( CAST(LEN([Extent1].[ValueString]) AS int)) = 0))) AND ([Extent2].[IsHidden] <> cast(1 as bit)) AND (2 =  CAST( [Extent3].[InputType] AS int)) AND ([Extent3].[ShowInFilter] = 1)
                )  AS [Filter1]
                WHERE [Filter1].[IsHidden3] <> cast(1 as bit)
            )  AS [Filter2]
            WHERE [Filter2].[Collection_ID3] = @p__linq__0
        )  AS [Distinct1]
    )  AS [Project2] ) AS [Project4]
LEFT OUTER JOIN [dbo].[Collection] AS [Extent9] ON [Project4].[Collection_ID] = [Extent9].[ID]}

如果我们删除"let first = g.FirstOrDefault()"并将"MetadataValueID = first.ID"更改为"MetadataValueID = 0",以便我们只有一个固定的ID = 0用于测试目的,那么数据加载速度非常快,生成的查询本身的大小是原始查询的一半所以这部分似乎使查询非常慢:

let first = g.FirstOrDefault()
...
  MetadataValueID = first.ID
};

如何重写?如果我尝试重写代码,它仍然很慢:

MetadataValueID = g.Select(x => x.ID).FirstOrDefault()

let first = g.Select(x => x.ID).FirstOrDefault()
...
  MetadataValueID = first
};

有什么建议吗?

针对慢速分组的 LINQ 查询优化

使用 EF,我一直觉得它在有效地翻译 g.Key.Metadata.Collection 之类的东西方面存在问题,所以我尝试更明确地连接并仅包含结果所需的字段。您可以使用include而不是使用存储库模式加入。

然后,您的查询将如下所示:

   from mdv in allMetaDataValues.Include("Metadata").Include("Metadata.Collection")
   where mdv.Metadata.InputType == MetadataInputType.String && 
         mdv.Metadata.ShowInFilter && 
         !mdv.Metadata.IsHidden && 
         !string.IsNullOrEmpty(mdv.ValueString)
   group mdv by new
   {
     MetadataID = mdv.Metadata.ID,
     CollectionID = mdv.Metadata.Collection.ID,
     mdv.Metadata.Title,
     mdv.Metadata.Collection.Color,
     mdv.ValueString
   } into g
   let first = g.FirstOrDefault().ID
   select new
   {
     MetadataTitle = g.Key.Title,
     MetadataID = g.Key.MetadataID,
     CollectionColor = g.Key.Color,
     CollectionID = g.Key.CollectionID,
     MetadataValueCount = 0,
     MetadataValueTitle = g.Key.ValueString,
     MetadataValueID = first
   }

玩linq的好工具是LinqPad。

问题还在于:

  let first = g.FirstOrDefault().ID

不能轻易翻译成SQL看到这个答案。但这种重写至少简化了它的基础查询。我仍然不清楚,为什么你需要一个集合中的第一个 ID 而不使用 orderby .

可以像这样重写:

let first =  (from f in allMetaDataValues
              where f.Metadata.ID == g.Key.MetadataID && 
                    f.ValuesString == g.Key.ValuesString select f.ID)
             .FirstOrDefault()

这样,您就不会让 EF 为您编写查询,并且可以准确指定如何执行选择。为了加快查询速度,您还可以考虑根据生成的查询向数据库添加索引 - 即使用此let first查询的 where 子句中使用的两个列的索引。

请尝试以下解决方案。
FirstOrDefault()替换为 .Take(1)FirstOrDefault()不是懒惰加载。

var tmpResult = from mdv in allMetaDataValues
                  where mdv.Metadata.InputType == MetadataInputType.String && mdv.Metadata.ShowInFilter && !mdv.Metadata.IsHidden && !string.IsNullOrEmpty(mdv.ValueString)
                  group mdv by new
                  {
                    mdv.ValueString,
                    mdv.Metadata
                  } into g
                  let first = g.Take(1)
                  select new
                  {
                    MetadataTitle = g.Key.Metadata.Title,
                    MetadataID = g.Key.Metadata.ID,
                    CollectionColor = g.Key.Metadata.Collection.Color,
                    CollectionID = g.Key.Metadata.Collection.ID,
                    MetadataValueCount = 0,
                    MetadataValueTitle = g.Key.ValueString,
                    MetadataValueID = first.ID
                  };