Why does a boosted Lucene query score lower than the same query without a boost?

Keywords: query, lucene, boost, why | Updated: 2023-09-27 18:14:59

I am testing the boost operator in Lucene and ran into some strange behavior:

  1. query1 = "red fox"
  2. query2 = "red^1.2 fox"

When I run both queries against the text:

"The fantastic red fox"

query2 scores lower than query1, although I expected query2 to win.

Here are the explain outputs for both queries.

Explain for query1:

{0,4339554 = (MATCH) sum of:
  0,2169777 = (MATCH) weight(content:fox in 0), product of:
    0,7071068 = queryWeight(content:fox), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,304384 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:fox in 0), product of:
      1 = tf(termFreq(content:fox)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
  0,2169777 = (MATCH) weight(content:red in 0), product of:
    0,7071068 = queryWeight(content:red), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,304384 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:red in 0), product of:
      1 = tf(termFreq(content:red)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
}

Explain for query2:

{0,4313012 = (MATCH) sum of:
  0,2396118 = (MATCH) weight(content:fox^1.25 in 0), product of:
    0,7808688 = queryWeight(content:fox^1.25), product of:
      1,25 = boost
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,035813 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:fox in 0), product of:
      1 = tf(termFreq(content:fox)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
  0,1916894 = (MATCH) weight(content:red in 0), product of:
    0,6246951 = queryWeight(content:red), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,035813 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:red in 0), product of:
      1 = tf(termFreq(content:red)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
}

Why does the boosted query score lower than the normal query?


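This is caused by the queryNorm. That factor in the scoring algorithm is an attempt to make scores roughly comparable from one query to the next.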

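It is calculated as:

$$\text{queryNorm} = \frac{1}{\sqrt{\text{sumOfSquaredWeights}}}$$

where:

$$\text{sumOfSquaredWeights} = \text{queryBoost}^2 \cdot \sum_{t \in q} \bigl(\text{idf}(t) \cdot \text{termBoost}(t)\bigr)^2$$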
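As a rough check, the formula can be evaluated with the numbers from the explain outputs. This is only a sketch: the helper name `query_norm` is hypothetical, the idf value is copied from the explain output, and the boost of 1.25 on `fox` is taken from the query2 explain rather than from the query string. A query-level boost of 1 is assumed.

```python
import math

IDF = 0.3068528  # idf(docFreq=1, maxDocs=1), copied from the explain output


def query_norm(term_boosts, idf=IDF, query_boost=1.0):
    """Return 1 / sqrt(sumOfSquaredWeights) for terms that share one idf."""
    sum_of_squared_weights = query_boost ** 2 * sum(
        (idf * boost) ** 2 for boost in term_boosts
    )
    return 1.0 / math.sqrt(sum_of_squared_weights)


norm_plain = query_norm([1.0, 1.0])     # query1: neither term boosted
norm_boosted = query_norm([1.25, 1.0])  # query2: fox^1.25, red unboosted

print(f"query1 queryNorm ~ {norm_plain:.4f}")
print(f"query2 queryNorm ~ {norm_boosted:.4f}")
```

The results should land close to the 2,304384 and 2,035813 reported in the explain outputs: a larger boost raises sumOfSquaredWeights, which lowers queryNorm for every term in the query.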

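If you take that factor out of the explain output by dividing the final score by the queryNorm, you will see that the second query does score higher: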

  • query1 -> .4339554/2.304384 = 0.1883

  • query2 -> .4313012/2.035813 = 0.2119
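The same division can be cross-checked against a score rebuilt from the individual factors. The sketch below assumes the classic TF-IDF structure visible in the explain output (boost · idf · tf · idf · fieldNorm per term, with tf and fieldNorm both equal to 1 here); the function name `raw_score` is hypothetical.

```python
IDF = 0.3068528  # idf(docFreq=1, maxDocs=1), copied from the explain output

# (final score, queryNorm) pairs copied from the two explain outputs
explains = {
    "query1": (0.4339554, 2.304384),
    "query2": (0.4313012, 2.035813),
}
# Term boosts as they appear in the explain outputs (fox, red)
boosts = {"query1": [1.0, 1.0], "query2": [1.25, 1.0]}


def raw_score(term_boosts, idf=IDF, tf=1.0, field_norm=1.0):
    """Sum of boost * idf * (tf * idf * fieldNorm), leaving queryNorm out."""
    return sum(boost * idf * tf * idf * field_norm for boost in term_boosts)


# Dividing the final score by queryNorm should agree with the rebuilt score
divided = {name: score / norm for name, (score, norm) in explains.items()}
rebuilt = {name: raw_score(b) for name, b in boosts.items()}

for name in explains:
    print(name, round(divided[name], 4), round(rebuilt[name], 4))
```

Both routes give roughly 0.1883 for query1 and 0.2119 for query2, so without queryNorm the boosted query comes out ahead.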

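The more important point: you should not read much into comparing scores from one query to the next. A score is only meaningful relative to the other scores produced by the same query. You can see in the explain output that the boosted term carries a larger relative share of the score, which is the real purpose of any boost.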
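To make the "relative weight" point concrete, the per-term contributions from the two explain outputs can be turned into shares of the total score. This is a small illustrative sketch; the `shares` helper is hypothetical and the numbers are copied from the explain outputs.

```python
# Per-term contributions copied from the explain outputs
contributions = {
    "query1": {"fox": 0.2169777, "red": 0.2169777},
    "query2": {"fox": 0.2396118, "red": 0.1916894},
}


def shares(terms):
    """Return each term's fraction of the summed score."""
    total = sum(terms.values())
    return {term: value / total for term, value in terms.items()}


term_shares = {name: shares(terms) for name, terms in contributions.items()}

for name, result in term_shares.items():
    print(name, {term: round(share, 3) for term, share in result.items()})
```

In query1 each term supplies half of the score, while in query2 the boosted term `fox` supplies about 0.556 of it (1.25 / 2.25), even though the normalized total is slightly lower.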