Why does this Lucene.Net query fail

Updated: 2023-09-27 17:57:41

I am trying to convert my search functionality to allow fuzzy searches involving multiple words. My existing search code looks like this:

        // Split the search into separate queries per word, and combine them into one major query
        var finalQuery = new BooleanQuery();
        string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string term in terms)
        {
            // Setup the fields to search
            string[] searchfields = new string[] 
            {
                // Various strings denoting the document fields available
            };
            var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, searchfields, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
            finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);
        }
        // Perform the search
        var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
        var searcher = new IndexSearcher(directory, true);
        var hits = searcher.Search(finalQuery, MAX_RESULTS);

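This works correctly: if I have an entity whose name field is "My name is Andrew" and I search for "Andrew name", Lucene finds the right document. Now I want to enable fuzzy searching, so that "Anderw Name" is also found. I changed my method to use the following code: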

        const int MAX_RESULTS = 10000;
        const float MIN_SIMILARITY = 0.5f;
        const int PREFIX_LENGTH = 3;
        if (string.IsNullOrWhiteSpace(searchString))
            throw new ArgumentException("Provided search string is empty");
        // Split the search into separate queries per word, and combine them into one major query
        var finalQuery = new BooleanQuery();
        string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string term in terms)
        {
            // Setup the fields to search
            string[] searchfields = new string[] 
            {
                // Strings denoting document field names here
            };
            // Create a subquery where the term must match at least one of the fields
            var subquery = new BooleanQuery();
            foreach (string field in searchfields)
            {
                var queryTerm = new Term(field, term);
                var fuzzyQuery = new FuzzyQuery(queryTerm, MIN_SIMILARITY, PREFIX_LENGTH);
                subquery.Add(fuzzyQuery, BooleanClause.Occur.SHOULD);
            }
            // Add the subquery to the final query; every subquery must match (in at least one field)
            finalQuery.Add(subquery, BooleanClause.Occur.MUST);
        }
        // Perform the search
        var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
        var searcher = new IndexSearcher(directory, true);
        var hits = searcher.Search(finalQuery, MAX_RESULTS);

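Unfortunately, with this code, if I submit the search query "Andrew Name" (same as before), I get zero results back.

The core idea is that every term must be found in at least one document field, but each term can reside in a different field. Does anyone have any idea why my rewritten query fails?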
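The intended matching rule can be modeled in plain C# without Lucene. This is only a sketch of the logic described above, not of how Lucene evaluates a `BooleanQuery`; the `MatchModel` class, its `Matches` method, and the per-field token arrays are hypothetical names introduced for illustration. It uses exact, case-sensitive comparison and leaves fuzzy matching out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchModel
{
    // Plain C# model of the intended rule, not Lucene code:
    // every search term (the MUST clauses) has to appear in
    // at least one field (the SHOULD clauses inside each subquery).
    public static bool Matches(
        IDictionary<string, string[]> fieldTokens,
        IEnumerable<string> terms)
    {
        return terms.All(term =>
            fieldTokens.Values.Any(tokens => tokens.Contains(term)));
    }
}
```

Because the comparison here is exact, a term only matches when it is spelled and cased the same way as a stored token, which is worth keeping in mind when comparing raw user input against indexed terms.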


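Final edit: OK, it turns out I was overcomplicating this a lot, and there was no need to move away from my first approach. After reverting to the first code snippet, I enabled fuzzy searching by changing

        finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);

to

        finalQuery.Add(parser.Parse(term.Replace("~", "") + "~"), BooleanClause.Occur.MUST);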
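The string manipulation in that change can be pulled into a small helper. The sketch below is one possible shape, assuming you also want to pass an explicit minimum similarity using the query parser's `term~0.5` syntax; `FuzzyQueryText` and `ToFuzzyTerm` are hypothetical names, and the helper only builds the text that would then be handed to `parser.Parse`.

```csharp
using System;
using System.Globalization;

public static class FuzzyQueryText
{
    // Hypothetical helper: builds query-parser text such as "anderw~0.5".
    // Any "~" already typed by the user is stripped first, as in the edit above.
    public static string ToFuzzyTerm(string term, float minSimilarity)
    {
        if (minSimilarity < 0f || minSimilarity >= 1f)
            throw new ArgumentOutOfRangeException("minSimilarity");

        string bare = term.Replace("~", "");
        // Invariant culture keeps the decimal separator a "." on every machine.
        return bare + "~" + minSimilarity.ToString("0.0#", CultureInfo.InvariantCulture);
    }
}
```

It would be used as `parser.Parse(FuzzyQueryText.ToFuzzyTerm(term, 0.5f))`. Other characters with special meaning to the query parser (for example `:` or parentheses) are not handled here and would still be interpreted by the parser.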

Why does this Lucene.Net query fail

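Your code works for me if I rewrite searchString to lowercase. I'm assuming that you're using the StandardAnalyzer when indexing, which generates lowercase terms.

You need to either 1) pass your tokens through the same analyzer (to get identical processing), 2) apply the same logic as the analyzer, or 3) use an analyzer that matches the processing you do (WhitespaceAnalyzer).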
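Option 2 can be sketched as a small normalization step applied before the terms are built. This is a minimal approximation, assuming lowercasing is the only difference that matters for your data; StandardAnalyzer does more than lowercase (it also drops English stop words, for example), so passing the text through the analyzer itself is the more faithful route. `SearchTermNormalizer` and `NormalizeTerms` are hypothetical names.

```csharp
using System;

public static class SearchTermNormalizer
{
    // Hypothetical helper: lowercases and splits the raw search string so the
    // resulting terms resemble the lowercase terms StandardAnalyzer indexed.
    public static string[] NormalizeTerms(string searchString)
    {
        if (string.IsNullOrWhiteSpace(searchString))
            throw new ArgumentException("Provided search string is empty");

        return searchString
            .ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

In the question's loop this would replace the direct `searchString.Split(...)` call, so that each `new Term(field, term)` is created from an already lowercased term.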

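You want this line:

        var queryTerm = new Term(term);

to look like this:

        var queryTerm = new Term(field, term);

Right now you're searching the field term (which probably doesn't exist) for the empty string (which will never be found).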