How to store a boost factor for a field in a Lucene index

Keywords: field, store, lucene, index | Last updated: 2023-09-27 18:07:38

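I am using Lucene for search in an address book product. I want to boost the search results based on some specific criteria. (For example, a match in the location field should be more relevant than a match in the entity name.) These criteria are fixed for my case.

I tried to store the boost factor for a field by calling the SetBoost() method at index time. Even so, the scores of the results are not what I want: every field seems to be treated as if it had the same boost value.

Can anyone tell me what I am doing wrong?

The code I use to build the index: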

Directory objIndexDirectory =
  FSDirectory.Open(new System.IO.DirectoryInfo(<PathOfIndexFolder>));
StandardAnalyzer objAnalyzer =
  new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
IndexWriter objWriter = new IndexWriter(
  objIndexDirectory, objAnalyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
Document objDocument = new Document();
Field objName =
  new Field("Name", "John Doe", Field.Store.YES, Field.Index.ANALYZED);
Field objLocation =
  new Field("Location", "NY", Field.Store.YES, Field.Index.NOT_ANALYZED);
objLocation.SetBoost(2f);
objDocument.Add(objName);
objDocument.Add(objLocation);
objWriter.AddDocument(objDocument);

What I want to achieve is this. Suppose the index contains three entries:

  1. John Doe, NY
  2. John Foo, New Jersey
  3. XYZ, NY

In this case, if the search query is "John NY", the results should be ranked by relevance like this:

  1. John Doe, NY
  2. XYZ, NY
  3. John Foo, New Jersey

I'm not sure what you think is wrong with your approach, but here is the code I used to test it:

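using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

class Program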
{
    static void Main(string[] args)
    {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer());
        AddDocument(writer, "John Doe", "NY");
        AddDocument(writer, "John Foo", "New Jersey");
        AddDocument(writer, "XYZ", "NY");
        writer.Commit();
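        // Name is ANALYZED (StandardAnalyzer lower-cases it), so the term is "john";
        // Location is NOT_ANALYZED, so the term must match the indexed value "NY" exactly.
        BooleanQuery query = new BooleanQuery();
        query.Add(new TermQuery(new Term("Name", "john")), BooleanClause.Occur.SHOULD);
        query.Add(new TermQuery(new Term("Location", "NY")), BooleanClause.Occur.SHOULD);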
        IndexReader reader = writer.GetReader();
        IndexSearcher searcher = new IndexSearcher(reader);
        var hits = searcher.Search(query, null, 10);
        // Loop over the hits actually returned; totalHits can exceed the 10 requested.
        for (int i = 0; i < hits.scoreDocs.Length; i++)
        {
            Document doc = searcher.Doc(hits.scoreDocs[i].doc);
            var explain = searcher.Explain(query, hits.scoreDocs[i].doc);
            Console.WriteLine("{0} - {1} - {2}", hits.scoreDocs[i].score, doc.ToString(), explain.ToString());
        }
    }
    private static void AddDocument(IndexWriter writer, string name, string address)
    {
        Document objDocument = new Document();
        Field objName = new Field("Name", name, Field.Store.YES, Field.Index.ANALYZED);
        Field objLocation = new Field("Location", address, Field.Store.YES, Field.Index.NOT_ANALYZED);
        objLocation.SetBoost(2f);
        objDocument.Add(objName);
        objDocument.Add(objLocation);
        writer.AddDocument(objDocument);
    }
}

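This code does return the results in the order you want. In fact, it returns them in that same order even if you leave the boost out. I'm not an expert on Lucene scoring, but I believe this is because "NY" is an exact match for "XYZ, NY", whereas the "John" query is only a partial match. You can read the details in the output printed by Explain.

Have you tried MultiFieldQueryParser?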
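
Since the weighting is fixed, it may also be worth trying the boost at query time instead of index time. The following is only a minimal sketch, not something taken from the test above. It assumes the same Lucene.Net 2.9-style API, where Query exposes a SetBoost(float) method (other releases may expose the boost differently). The class name BoostedQueryExample and the method BuildQuery are hypothetical names chosen for illustration.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper for illustration only; not part of the original answer.
static class BoostedQueryExample
{
    // Builds "Name:<nameTerm> OR Location:<locationTerm>" and weights the
    // Location clause by locationBoost at query time.
    public static BooleanQuery BuildQuery(string nameTerm, string locationTerm, float locationBoost)
    {
        TermQuery nameQuery = new TermQuery(new Term("Name", nameTerm));
        TermQuery locationQuery = new TermQuery(new Term("Location", locationTerm));

        // Assumption: Lucene.Net 2.9-style API with Query.SetBoost(float).
        locationQuery.SetBoost(locationBoost);

        BooleanQuery query = new BooleanQuery();
        query.Add(nameQuery, BooleanClause.Occur.SHOULD);
        query.Add(locationQuery, BooleanClause.Occur.SHOULD);
        return query;
    }
}
```

To try it, replace the hand-built query in Main with BoostedQueryExample.BuildQuery("john", "NY", 2f) and compare the Explain output with and without the boost. The terms still have to match how each field was indexed: lower-cased for the analyzed Name field, exact for the non-analyzed Location field.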