Updating data in Lucene.NET

Keywords: data, update, NET, Lucene | Updated: 2023-09-27 18:24:51

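I started using Lucene a few days ago, and while debugging my solution I ran into a problem with it. To isolate the problem I created a new test project and tried different approaches, but after two days of searching for a fix I gave up.

My problem:

I created a custom class and a plain array of instances of that class. I build a Document object for each one and index it through IndexWriter. Everything works, and searching works well. But when I try to update a document with IndexWriter.UpdateDocument, telling it to update the document whose id is "5", it creates a new document with id 5 instead. In the end I have two documents with id=5: the old one and the new one. If I pass "true" to the IndexWriter constructor, the same update code keeps only the updated documents and deletes everything that was indexed before. I cannot re-index the whole database every time, because it is large (about 600 internet resources in my constructor); I only need to replace the data that changed and keep the rest of the existing index. Does anyone know what I am doing wrong?

P.S. Sorry for my English.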

class mydoc
{
    public string id;
    public string name;
    public string content;
    public mydoc(string ID, string Name, string Content)
    {
        id = ID;
        name = Name;
        this.content = Content;
    }
}
class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Create data array...");
        mydoc[] docs = new mydoc[11];
        docs[0] = new mydoc("0", "Name0", "tet 5");
        docs[1] = new mydoc("1", "Name1", "aaaa text");
        docs[2] = new mydoc("2", "Name2", "and me test ");
        docs[3] = new mydoc("3", "Name3", "I am new tes 3");
        docs[4] = new mydoc("4", "Name4", "I am new tes 4");
        docs[5] = new mydoc("5", "Name5", "I am new test 5");
        docs[6] = new mydoc("6", "Name6", "I am new text 6");
        docs[7] = new mydoc("7", "Name7", "I am new text 7");
        docs[8] = new mydoc("8", "Name8", "I am new text 8");
        docs[9] = new mydoc("9", "Name9", "I am new text 9");
        docs[10] = new mydoc("10", "Name10", "I am new test 10");
        Console.WriteLine("index processing...");
        var dir = new DirectoryInfo("tmp");
        FSDirectory fsdir = FSDirectory.Open(dir);
        Analyzer analyzer = new StandardAnalyzer(Net.Util.Version.LUCENE_29);
        IndexWriter writer = new IndexWriter(fsdir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        for (int i = 0; i < docs.Length; i++)
        {
            writer.AddDocument(Convert(docs[i]));
        }
        writer.Optimize(true);
        writer.Close(true);
        Console.WriteLine("index done !");
        IndexReader reader = IndexReader.Open(fsdir, true);
        for (int i = 0; i < reader.MaxDoc;i++)
        {
            Document doc = reader.Document(i);
            Console.WriteLine("id = \"{0}\", Name = \"{1}\", Context = \"{2}\"", doc.Get("ID"), doc.Get("Name"), doc.Get("Content"));
        }
        reader.Close();
        // Update custom base
        IndexWriter updater = new IndexWriter(fsdir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        updater.UpdateDocument(new Term("0"), Convert(new mydoc("0", "New name 0", "prosto obitr test")), new StandardAnalyzer(Net.Util.Version.LUCENE_30));
        updater.UpdateDocument(new Term("1"), Convert(new mydoc("1", "New name 1", "prosto obitr test")),new StandardAnalyzer(Net.Util.Version.LUCENE_30));
        updater.UpdateDocument(new Term("2"), Convert(new mydoc("2", "New name 2", "prosto obitr test")), new StandardAnalyzer(Net.Util.Version.LUCENE_30));
        updater.UpdateDocument(new Term("3"), Convert(new mydoc("3", "New name 3", "prosto obitr test")), new StandardAnalyzer(Net.Util.Version.LUCENE_30));
        updater.UpdateDocument(new Term("4"), Convert(new mydoc("4", "New name 4", "prosto obitr test")), new StandardAnalyzer(Net.Util.Version.LUCENE_30));
        updater.UpdateDocument(new Term("5"), Convert(new mydoc("5", "New name 5", "prosto obitr test")), new StandardAnalyzer(Net.Util.Version.LUCENE_30));
        updater.UpdateDocument(new Term("6"), Convert(new mydoc("6", "New name 6", "prosto obitr test")), new StandardAnalyzer(Net.Util.Version.LUCENE_30));
        updater.UpdateDocument(new Term("7"), Convert(new mydoc("7", "New name 7", "prosto obitr test")), new StandardAnalyzer(Net.Util.Version.LUCENE_30));
        updater.UpdateDocument(new Term("8"), Convert(new mydoc("8", "New name 8", "prosto obitr test")), new StandardAnalyzer(Net.Util.Version.LUCENE_30));
        updater.UpdateDocument(new Term("9"), Convert(new mydoc("9", "New name 9", "prosto obitr test")), new StandardAnalyzer(Net.Util.Version.LUCENE_30));
        updater.Optimize();
        updater.Close(true);
        reader = IndexReader.Open(fsdir, true);
        Console.WriteLine("New updated data:");
        for (int i = 0; i < reader.MaxDoc; i++)
        {
            Document doc = reader.Document(i);
            Console.WriteLine("id = \"{0}\", Name = \"{1}\", Context = \"{2}\"", doc.Get("ID"), doc.Get("Name"), doc.Get("Content"));
        }
        Console.ReadKey();

        Console.WriteLine("search processing...");
        string query = "test";
        fsdir = FSDirectory.Open(dir);
        IndexSearcher searcher = new IndexSearcher(fsdir, true);
        Console.WriteLine("Searching phrase \"{0}\"", query);
        List<KeyValuePair<int, int>> results = find(query, searcher);
        searcher.Close();
        fsdir.Close();
        Console.WriteLine("Results:");
        for (int i = 0; i < results.Count; i++)
        {
            try
            {
                // Display the ID of the matching document
                Console.WriteLine(results[i].Value);
            }
            catch (Exception ex)
            {
                continue;
            }
        }
        Console.WriteLine("\n\rDone !");
        Console.ReadKey();
    }
    static List<KeyValuePair<int,int>> find(string query, IndexSearcher searcher)
    {
        var parser = new MultiFieldQueryParser(Net.Util.Version.LUCENE_30, new[] { "Name", "Content" }, new SimpleAnalyzer());
        var score = searcher.Search(parser.Parse(query), 99).ScoreDocs;
        var docIDs = score.Select(x => new KeyValuePair<int, int>
            (
                x.Doc, int.Parse(searcher.Doc(x.Doc).Get("ID"))
            )
            ).ToList();
        return docIDs;
    }

    static Document Convert(mydoc doc)
    {
        var document = new Document();
        document.Add(new Field("ID", doc.id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        document.Add(new Field("Name", doc.name, Field.Store.YES, Field.Index.ANALYZED));
        document.Add(new Field("Content", doc.content, Field.Store.YES, Field.Index.ANALYZED));
        return document;
    }
}

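In this case docs[10] is simply dropped from the index. If, in the line

IndexWriter updater = new IndexWriter(fsdir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);   

I replace "true" with "false", it creates new documents instead of replacing the old ones.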

Calling updater.Commit() does not help either.
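
For reference, the effect of the third constructor argument (create) described above can be sketched as follows. This is only a minimal sketch, assuming Lucene.Net 3.0.3 (where IndexWriter and IndexReader implement IDisposable) and an in-memory RAMDirectory instead of the "tmp" folder; CreateFlagDemo, MakeDoc and CountDocs are hypothetical names used for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

class CreateFlagDemo
{
    // Hypothetical helper: a document with a single exact-match ID field.
    static Document MakeDoc(string id)
    {
        var doc = new Document();
        doc.Add(new Field("ID", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    // Hypothetical helper: number of live (non-deleted) documents in the index.
    static int CountDocs(Lucene.Net.Store.Directory dir)
    {
        using (var reader = IndexReader.Open(dir, true))
        {
            return reader.NumDocs();
        }
    }

    static void Main()
    {
        var dir = new RAMDirectory(); // assumption: in-memory index for the demo
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // create = true: start a brand-new index, discarding whatever was there.
        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.AddDocument(MakeDoc("0"));
            writer.AddDocument(MakeDoc("1"));
        }

        // create = false: open the existing index; the two documents above are kept.
        using (var writer = new IndexWriter(dir, analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.AddDocument(MakeDoc("2"));
        }
        Console.WriteLine("After reopening with create = false: {0} docs", CountDocs(dir));

        // create = true again: the existing documents are thrown away.
        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
        }
        Console.WriteLine("After reopening with create = true: {0} docs", CountDocs(dir));
    }
}
```

So an incremental update needs create = false; the duplicates seen in that mode have a different cause, namely the Term passed to UpdateDocument.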


Problem solved. My mistake was that I did not understand the Term argument of

IndexWriter.UpdateDocument(Term term, Document doc);

correctly. The Term has to be created like this (in my case):

updater.UpdateDocument(new Term("ID", "5"), Convert(new mydoc("5", "New name 5", "simple new test text")), new StandardAnalyzer(Net.Util.Version.LUCENE_30));

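Here the first argument of the Term constructor, "ID", is the name of my unique field (indexed as NOT_ANALYZED, so its value is kept as a single unanalyzed token), and "5" is the text of the "ID" field in the old document that is already in the index.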
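
To make the fix concrete, here is a minimal sketch of the whole update cycle. It assumes Lucene.Net 3.0.3 and uses an in-memory RAMDirectory and a shortened document (only "ID" and "Name"); UpdateByTermDemo and MakeDoc are hypothetical names. As far as I understand it, the single-argument new Term("5") builds a term whose field name is "5" with empty text, so it matches nothing, nothing gets deleted, and UpdateDocument ends up just adding a document.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

class UpdateByTermDemo
{
    // Hypothetical helper mirroring Convert() from the question, minus "Content".
    static Document MakeDoc(string id, string name)
    {
        var doc = new Document();
        // NOT_ANALYZED keeps the ID as one exact token, so Term("ID", id) can match it.
        doc.Add(new Field("ID", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("Name", name, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }

    static void Main()
    {
        var dir = new RAMDirectory(); // assumption: in-memory index for the demo
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // Build the initial index (create = true).
        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            for (int i = 0; i < 3; i++)
            {
                writer.AddDocument(MakeDoc(i.ToString(), "Name" + i));
            }
        }

        // Reopen the existing index (create = false) and replace one document.
        using (var updater = new IndexWriter(dir, analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // Deletes every document whose "ID" field equals "1", then adds the new one.
            updater.UpdateDocument(new Term("ID", "1"), MakeDoc("1", "New name 1"));
        }

        // List the live documents; slots of deleted documents are skipped.
        using (var reader = IndexReader.Open(dir, true))
        {
            for (int i = 0; i < reader.MaxDoc; i++)
            {
                if (reader.IsDeleted(i)) continue;
                Document doc = reader.Document(i);
                Console.WriteLine("ID = {0}, Name = {1}", doc.Get("ID"), doc.Get("Name"));
            }
        }
    }
}
```

The IsDeleted check matters when the index has not been optimized: the replaced document can still occupy a slot below MaxDoc until its segment is merged.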