字符串线性搜索c#

本文关键字:线性搜索 字符串 | 更新日期: 2023-09-27 18:00:48

我目前正在做一个小的C#练习,处理在文本文件中搜索相关术语/单词的问题,该程序将写出文本文件中包括搜索单词的所有句子。例如,我输入单词:"example",程序要做的是遍历文本文件中的所有句子,并提取出其中包含单词"example(示例("的句子。

The text file is structured as so: <sentenceDesignator> <text>
sentence 1: bla bla bla bla example of a sentence  //each line contains a sentence
sentence 2: this is not a good example of grammar
sentence 3: bla is not a real word, use better terms

我想做的是使用线性搜索来遍历文本文件中的所有行,并写出包含搜索字符串项的所有句子。

到目前为止我的代码:

        String filename = @"sentences.txt";
        if (!File.Exists(filename))
        {
            // Since we just created the file, this shouldn't happen.
            Console.WriteLine("{0} not found", filename);
            return;
        }
        else
        {
            Console.WriteLine("Successfully found {0}.", filename);
        }
        //making a listof type "Sentence" to hold all the sentences
        List<Sentence> sentences = new List<Sentence>();
        //the next lines of code...
        StreamReader reader = File.OpenText(filename);
        //first, write out all of the sentences in the text file
        //read a line(sentence) from a line in the text file
        string line = reader.ReadLine();
        while (line != null)
        {
            Sentence s = new Sentence();
            //we need something to split data...
            string[] lineArray = line.Split(':');
            s.sentenceDesignator = lineArray[0];
            s.Text = lineArray[1];
            Console.Write("'n{0}", line);
            line = reader.ReadLine();
        }
        //so far, we can write out all of the sentences in the text file. 
        Console.Write("'n'nOK!, search a term to diplay all their occurences: ");
        string searchTerm = Console.ReadLine();
       if(!line.Contains(searchterm))
       {
          Console.Write("'nThat term does not exist in any sentence.");
       }
       else
        {
            foreach (Sentence ss in sentences)
            {
                if (ss.sentenceDesignator.Contains(queryName))
                {
                    //I need help here
                }
            }
        }

字符串线性搜索c#

如果构建文件的索引,然后搜索索引,速度会快得多,就像线性搜索一样,每个搜索操作都是O(n),而索引搜索是O(n)用于构建索引,而O(log n)near-O(1)用于查找(取决于如何构建索引(。成本是增加了索引的内存消耗,但我会这样做:

private Dictionary<String,List<Int32>> _index = new Dictionary<String,List<Int32>>();
/// <summary>Populates an index of words in a text file. Takes O(n) where n is the size of the input text file.</summary>
public void BuildIndex(String fileName) {
    using(Stream inputTextFile = OpenFile(...)) {
        int currentPosition = 0;
        foreach(String word in GetWords(inputTextFile)) {
            word = word.ToUpperInvariant();
            if( !_index.ContainsKey( word ) ) _index.Add( word, new List<Int32>() );
            _index[word].Add( currentPosition );
            currentPosition = inputTextFile.Position;
        }
    }
}
/// <summary>Searches the text file (via its index) if the specified string (in its entirety) exists in the document. If so, it returns the position in the document where the string starts. Otherwise it returns -1. Lookup time is O(1) on the size of the input text file, and O(n) for the length of the query string.</summary>
public Int32 SearchIndex(String query) {
    String[] terms = query.Split(' ');
    Int32 startingPosition = -1;
    Int32 currentPosition = -1;
    Boolean first = true;
    foreach(String term in terms) {
        term = term.ToUpperInvariant();
        if( first ) {
            if( !_index.Contains( term ) ) return -1;
            startingPosition = _index[term][0];
        } else {
            if( !ContainsTerm( term, ++currentPosition ) ) return -1;
        }
        first = false;
    }
    return startingPosition;
}
/// <summary>Indicates if the specified term exists at the specified position.</summary>
private Boolean ContainsTerm(String term, Int32 expectedPosition) {
    if( !_index.ContainsKey(term) ) return false;
    List<Int32> positions = _index[term];
    foreach(Int32 pos in positions) {
        if( pos == expectedPosition ) return true;
    }
    return false;
}

OpenFileGetWords的实现应该是琐碎的。请注意,GetWords使用yield return在文件中构建由空格分隔的单词组成的IEnumerable<String>,并处理您的自定义文件格式。

我对最后一个if/else有点困惑。您似乎只是在将文件的最后一行与搜索项进行比较。另外,"queryName"来自哪里?你想打印出整个句子("一个句子的例子"(还是只打印出"句子1"?此外,如果您检查sentenceDesignator是否包含queryName,我想您应该检查实际的Text是否包含搜索项。

也许这会对你有所帮助:

var lines = File.ReadAllLines(fileName);    
var sentences = new List<Sentence>(lines.Count());
foreach (var line in lines)
{
    var lineArray = line.Split(':');
    sentences.Add(new Sentence { sentenceDesignator = lineArray[0], Text = lineArray[1]});
}
foreach (var sentence in sentences)
{
    if (sentence.Text.Contains(searchTerm))
    {
        Console.WriteLine(sentence.sentenceDesignator);
        //Console.WriteLine(sentence.Text);
    }
}