How to store a field's boost factor in a Lucene index
Keywords: field, store, lucene, index | Updated: 2023-09-27 18:07:38
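I'm using Lucene to search for products in an address book. I want to boost search results according to some specific criteria (for example, a match in the Location field should count for more than a match in the entity's Name field). These criteria are fixed in my case.
I tried to store a boost factor for the field by calling SetBoost() at index time, but the resulting scores are still not what I want: every field seems to be treated as having the same boost value.
Can anyone tell me what I'm doing wrong? Here is the code I use to build the index: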
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

Directory objIndexDirectory =
    FSDirectory.Open(new System.IO.DirectoryInfo(<PathOfIndexFolder>));
StandardAnalyzer objAnalyzer =
    new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
IndexWriter objWriter = new IndexWriter(
    objIndexDirectory, objAnalyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

Document objDocument = new Document();
Field objName =
    new Field("Name", "John Doe", Field.Store.YES, Field.Index.ANALYZED);
Field objLocation =
    new Field("Location", "NY", Field.Store.YES, Field.Index.NOT_ANALYZED);
// Index-time boost: Lucene multiplies it into this field's stored norm
objLocation.SetBoost(2f);
objDocument.Add(objName);
objDocument.Add(objLocation);
objWriter.AddDocument(objDocument);
objWriter.Close(); // commit the document and release the index write lock
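For context on what SetBoost does at index time: with the default similarity in Lucene 2.9/3.x, a field boost is not stored on its own. It is multiplied by the document boost and the length norm 1/sqrt(number of terms), and that product is packed into a single byte per field per document. The sketch below is a hypothetical C# port of that one-byte encoding, modeled on Lucene's SmallFloat.floatToByte315/byte315ToFloat and assuming a document boost of 1. The class and method names are illustrative only, not Lucene.Net API.

```csharp
using System;

// Hypothetical helper (not part of Lucene.Net): mirrors the one-byte norm
// encoding of classic Lucene (SmallFloat.floatToByte315 / byte315ToFloat).
// Uses BitConverter.SingleToInt32Bits, available in .NET Core 2.0+.
public static class NormEncodingSketch
{
    // Offset that shifts the float's exponent range into one byte.
    private const int ZeroPoint = (63 - 15) << 3; // 384

    public static byte FloatToByte315(float f)
    {
        int bits = BitConverter.SingleToInt32Bits(f);
        // Keep only the sign, the exponent and the top mantissa bits.
        int smallFloat = bits >> (24 - 3);
        if (smallFloat <= ZeroPoint)
            return bits <= 0 ? (byte)0 : (byte)1; // zero/negative -> 0, underflow -> smallest non-zero
        if (smallFloat >= ZeroPoint + 0x100)
            return 255;                           // overflow -> largest encodable value
        return (byte)(smallFloat - ZeroPoint);
    }

    public static float Byte315ToFloat(byte b)
    {
        if (b == 0) return 0f;
        int bits = (b << (24 - 3)) + ((63 - 15) << 24);
        return BitConverter.Int32BitsToSingle(bits);
    }

    // Norm as the default similarity would compute it (document boost assumed 1):
    // field boost * 1/sqrt(numTerms), then round-tripped through the single byte.
    public static float StoredNorm(float fieldBoost, int numTerms)
    {
        float norm = fieldBoost * (float)(1.0 / Math.Sqrt(numTerms));
        return Byte315ToFloat(FloatToByte315(norm));
    }
}
```

With this model, StoredNorm(2f, 1) for the one-term, NOT_ANALYZED Location field comes back as exactly 2.0, so a boost of 2 does survive indexing. The two-term Name field "John Doe" gets StoredNorm(1f, 2) = 0.625 (from about 0.707). A small boost such as StoredNorm(1.1f, 1) rounds back to 1.0, the same as no boost at all.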
What I want to achieve: suppose the index contains three entries:
- John Doe, NY
- John Foo, New Jersey
- XYZ, NY
If the search query is "John NY", the results should be ranked by relevance like this:
- John Doe, NY
- XYZ, NY
- John Foo, New Jersey
I'm not sure what you think is wrong with your approach, but here is the code I used to test it:
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

class Program
{
    static void Main(string[] args)
    {
        // In-memory index holding the three example entries
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer());
        AddDocument(writer, "John Doe", "NY");
        AddDocument(writer, "John Foo", "New Jersey");
        AddDocument(writer, "XYZ", "NY");
        writer.Commit();

        // "John NY": lowercase "john" for the analyzed Name field,
        // "NY" verbatim for the NOT_ANALYZED Location field
        BooleanQuery query = new BooleanQuery();
        query.Add(new TermQuery(new Term("Name", "john")), BooleanClause.Occur.SHOULD);
        query.Add(new TermQuery(new Term("Location", "NY")), BooleanClause.Occur.SHOULD);

        IndexReader reader = writer.GetReader();
        IndexSearcher searcher = new IndexSearcher(reader);
        var hits = searcher.Search(query, null, 10);
        // Loop over the returned ScoreDocs; totalHits may exceed the 10 requested
        for (int i = 0; i < hits.scoreDocs.Length; i++)
        {
            Document doc = searcher.Doc(hits.scoreDocs[i].doc);
            var explain = searcher.Explain(query, hits.scoreDocs[i].doc);
            Console.WriteLine("{0} - {1} - {2}", hits.scoreDocs[i].score, doc.ToString(), explain.ToString());
        }
    }

    private static void AddDocument(IndexWriter writer, string name, string address)
    {
        Document objDocument = new Document();
        Field objName = new Field("Name", name, Field.Store.YES, Field.Index.ANALYZED);
        Field objLocation = new Field("Location", address, Field.Store.YES, Field.Index.NOT_ANALYZED);
        objLocation.SetBoost(2f);
        objDocument.Add(objName);
        objDocument.Add(objLocation);
        writer.AddDocument(objDocument);
    }
}
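This code does return the results in the order you want. In fact, it returns them in that order even if you remove the boost. I'm no expert on Lucene scoring, but I believe this is because "NY" matches the entire Location field of "XYZ, NY", whereas "John" matches only one of the two terms in the Name field, so it is only a partial match. You can see the details in the Explanation printed for each hit (from searcher.Explain).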
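To make that intuition concrete, here is a simplified, hypothetical model of classic Lucene scoring (DefaultSimilarity in Lucene 2.9/3.x) for the two-clause query in the test code. Each matching term contributes tf · idf² · query boost · queryNorm · field norm, and the sum is multiplied by coord (matching clauses / total clauses). It assumes the stored norms are 0.625 for the two-term Name fields and 1.0 times the index-time boost for the one-term Location fields. The class and parameter names are illustrative, and the model is meant to mirror the factors Explain prints, not to replace them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of classic Lucene scoring for
// the query  Name:john OR Location:NY  over the three example entries.
public static class ClassicScoringSketch
{
    // idf(t) = 1 + ln(numDocs / (docFreq + 1))
    static double Idf(int numDocs, int docFreq) =>
        1.0 + Math.Log((double)numDocs / (docFreq + 1));

    // locationIndexBoost: what Field.SetBoost sets on "Location" at index time.
    // locationQueryBoost: a boost on the Location TermQuery at search time.
    public static List<(string Doc, double Score)> Rank(float locationIndexBoost, float locationQueryBoost)
    {
        // (entry, Name contains "john", Location equals "NY")
        var docs = new[]
        {
            ("John Doe, NY", true, true),
            ("John Foo, New Jersey", true, false),
            ("XYZ, NY", false, true),
        };
        const int numDocs = 3;
        double idfJohn = Idf(numDocs, 2); // "john" occurs in two Name fields
        double idfNy = Idf(numDocs, 2);   // "NY" occurs in two Location fields

        // Assumed stored norms: 1/sqrt(2) for the two-term Name fields is stored as 0.625;
        // the one-term Location norm is 1.0 * index-time boost (exact for boosts like 1 or 2).
        double nameNorm = 0.625;
        double locationNorm = 1.0 * locationIndexBoost;

        // queryNorm = 1 / sqrt(sum over clauses of (idf * query boost)^2)
        double sumSq = Math.Pow(idfJohn, 2) + Math.Pow(idfNy * locationQueryBoost, 2);
        double queryNorm = 1.0 / Math.Sqrt(sumSq);

        var results = new List<(string Doc, double Score)>();
        foreach (var (entry, matchesJohn, matchesNy) in docs)
        {
            double sum = 0;
            int overlap = 0;
            // tf = sqrt(1) = 1 because each term occurs once in its field
            if (matchesJohn) { sum += idfJohn * idfJohn * queryNorm * nameNorm; overlap++; }
            if (matchesNy) { sum += idfNy * idfNy * locationQueryBoost * queryNorm * locationNorm; overlap++; }
            if (overlap == 0) continue;
            double coord = overlap / 2.0; // two SHOULD clauses in the query
            results.Add((entry, coord * sum));
        }
        return results.OrderByDescending(r => r.Score).ToList();
    }
}
```

In this model, Rank(2f, 1f) puts "John Doe, NY" first (both clauses match, so coord is 1), then "XYZ, NY", then "John Foo, New Jersey". Rank(1f, 1f) yields the same order, because the unboosted one-term Location norm (1.0) still beats the two-term Name norm (0.625). Rank(1f, 2f) models boosting the Location clause at query time (for example via Query.SetBoost) instead of at index time.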
Have you tried MultiFieldQueryParser?