搜索引擎太慢
本文关键字:搜索引擎 | 更新日期: 2023-09-27 18:24:00
在我的数据库中有一个表Question with Content and Answer。我实现了一个使用这个的搜索引擎:
private Expression<Func<QuestionModel, bool>> CreatePredicate(string query, bool negate = false)
{
query = query.ToUpper();
if (IsATag(ref query))
{
return question => negate == question.Tags.Any(tag => tag.NamesCollection
.Any(translation =>
translation.Language == currentLanguage &&
translation.Translation.ToUpper().Equals(query)));
}
else if(IsSpecial(ref query))
{
return question => true;
}
else
{
return question => negate == question.Content.ToUpper().Contains(query) ||
question.Answer.ToUpper().Contains(query) ||
question.Specialization.NamesCollection.Any(trans =>
trans.Language == currentLanguage &&
trans.Translation.ToUpper().Contains(query));
}
}
事实上,它支持多个查询,但PrebicateBuilder 可以很好地处理这一问题
我创建了一些性能测试,但它对10K记录的运行速度不是很快(如果我使用5个查询,则可达30秒)。有没有一种智能的方法来增强过程?老实说,我从未见过网站运行如此缓慢,所以一定有什么。
这是我的查询生成的sql:
SELECT TOP (100)
[top].[Id] AS [Id],
[top].[Content] AS [Content],
[top].[Answer] AS [Answer],
[top].[Assessment] AS [Assessment],
[top].[State] AS [State],
[top].[AuthorId] AS [AuthorId],
[top].[AttachmentFileName] AS [AttachmentFileName],
[top].[SubmissionDate] AS [SubmissionDate],
[top].[Specialization_Id] AS [Specialization_Id],
[top].[Doctor_Id] AS [Doctor_Id]
FROM ( SELECT [Project16].[Id] AS [Id], [Project16].[Content] AS [Content], [Project16].[Answer] AS [Answer], [Project16].[Assessment] AS [Assessment], [Project16].[State] AS [State], [Project16].[AuthorId] AS [AuthorId], [Project16].[AttachmentFileName] AS [AttachmentFileName], [Project16].[SubmissionDate] AS [SubmissionDate], [Project16].[Specialization_Id] AS [Specialization_Id], [Project16].[Doctor_Id] AS [Doctor_Id]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Content] AS [Content],
[Extent1].[Answer] AS [Answer],
[Extent1].[Assessment] AS [Assessment],
[Extent1].[State] AS [State],
[Extent1].[AuthorId] AS [AuthorId],
[Extent1].[AttachmentFileName] AS [AttachmentFileName],
[Extent1].[SubmissionDate] AS [SubmissionDate],
[Extent1].[Specialization_Id] AS [Specialization_Id],
[Extent1].[Doctor_Id] AS [Doctor_Id]
FROM [QuestionModels] AS [Extent1]
WHERE ((@p__linq__0 = (CASE WHEN ((CHARINDEX(@p__linq__1, UPPER([Extent1].[Content]))) > 0) THEN cast(1 as bit) WHEN ( NOT ((CHARINDEX(@p__linq__1, UPPER([Extent1].[Content]))) > 0)) THEN cast(0 as bit) END)) OR ((CHARINDEX(@p__linq__2, UPPER([Extent1].[Answer]))) > 0) OR ( EXISTS (SELECT
1 AS [C1]
FROM [TranslationModels] AS [Extent2]
WHERE ([Extent2].[SpecializationModel_Id] IS NOT NULL) AND ([Extent1].[Specialization_Id] = [Extent2].[SpecializationModel_Id]) AND ([Extent2].[Language] = @p__linq__3) AND ((CHARINDEX(@p__linq__4, UPPER([Extent2].[Translation]))) > 0)
)) OR (@p__linq__5 = (CASE WHEN ((CHARINDEX(@p__linq__6, UPPER([Extent1].[Content]))) > 0) THEN cast(1 as bit) WHEN ( NOT ((CHARINDEX(@p__linq__6, UPPER([Extent1].[Content]))) > 0)) THEN cast(0 as bit) END)) OR ((CHARINDEX(@p__linq__7, UPPER([Extent1].[Answer]))) > 0) OR ( EXISTS (SELECT
1 AS [C1]
FROM [TranslationModels] AS [Extent3]
WHERE ([Extent3].[SpecializationModel_Id] IS NOT NULL) AND ([Extent1].[Specialization_Id] = [Extent3].[SpecializationModel_Id]) AND ([Extent3].[Language] = @p__linq__8) AND ((CHARINDEX(@p__linq__9, UPPER([Extent3].[Translation]))) > 0)
)) OR (@p__linq__10 = (CASE WHEN ((CHARINDEX(@p__linq__11, UPPER([Extent1].[Content]))) > 0) THEN cast(1 as bit) WHEN ( NOT ((CHARINDEX(@p__linq__11, UPPER([Extent1].[Content]))) > 0)) THEN cast(0 as bit) END)) OR ((CHARINDEX(@p__linq__12, UPPER([Extent1].[Answer]))) > 0) OR ( EXISTS (SELECT
1 AS [C1]
FROM [TranslationModels] AS [Extent4]
WHERE ([Extent4].[SpecializationModel_Id] IS NOT NULL) AND ([Extent1].[Specialization_Id] = [Extent4].[SpecializationModel_Id]) AND ([Extent4].[Language] = @p__linq__13) AND ((CHARINDEX(@p__linq__14, UPPER([Extent4].[Translation]))) > 0)
)) OR (@p__linq__15 = (CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM ( SELECT
[Extent5].[TagModel_Id] AS [TagModel_Id]
FROM [TagModelQuestionModels] AS [Extent5]
WHERE [Extent1].[Id] = [Extent5].[QuestionModel_Id]
) AS [Project4]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [TranslationModels] AS [Extent6]
WHERE ([Project4].[TagModel_Id] = [Extent6].[TagModel_Id]) AND ([Extent6].[Language] = @p__linq__16) AND ((UPPER([Extent6].[Translation])) = @p__linq__17)
)
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
1 AS [C1]
FROM ( SELECT
[Extent7].[TagModel_Id] AS [TagModel_Id]
FROM [TagModelQuestionModels] AS [Extent7]
WHERE [Extent1].[Id] = [Extent7].[QuestionModel_Id]
) AS [Project7]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [TranslationModels] AS [Extent8]
WHERE ([Project7].[TagModel_Id] = [Extent8].[TagModel_Id]) AND ([Extent8].[Language] = @p__linq__16) AND ((UPPER([Extent8].[Translation])) = @p__linq__17)
)
)) THEN cast(0 as bit) END))) AND (@p__linq__18 = (CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM ( SELECT
[Extent9].[TagModel_Id] AS [TagModel_Id]
FROM [TagModelQuestionModels] AS [Extent9]
WHERE [Extent1].[Id] = [Extent9].[QuestionModel_Id]
) AS [Project10]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [TranslationModels] AS [Extent10]
WHERE ([Project10].[TagModel_Id] = [Extent10].[TagModel_Id]) AND ([Extent10].[Language] = @p__linq__19) AND ((UPPER([Extent10].[Translation])) = @p__linq__20)
)
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
1 AS [C1]
FROM ( SELECT
[Extent11].[TagModel_Id] AS [TagModel_Id]
FROM [TagModelQuestionModels] AS [Extent11]
WHERE [Extent1].[Id] = [Extent11].[QuestionModel_Id]
) AS [Project13]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [TranslationModels] AS [Extent12]
WHERE ([Project13].[TagModel_Id] = [Extent12].[TagModel_Id]) AND ([Extent12].[Language] = @p__linq__19) AND ((UPPER([Extent12].[Translation])) = @p__linq__20)
)
)) THEN cast(0 as bit) END))
) AS [Project16]
ORDER BY [Project16].[SubmissionDate] ASC
OFFSET 1000 ROWS
) AS [top]
这些数据库关系似乎不会对性能产生太大影响。如果我使用一个不涉及它们的查询,那么速度几乎是一样的。
表的完全扫描和每行嵌套两次的子查询将很慢。一种更快的方法是排列数据库,使查询只需要进行一次索引查找或索引范围扫描。
许多想要构建搜索功能的人使用某种搜索索引。一些关系数据库支持有限形式的全文搜索,这可能足以满足您的情况。
许多人更喜欢使用专门的搜索索引。Lucene是一个流行的引擎,Solr和Elasticsearch通过网络公开了Lucene。