Removing stop words from a text file

Keywords: remove, text, file | Updated: 2023-09-27 18:03:51

I want to remove stop words from my text file, and I wrote the following code to do it:

 TextWriter tw = new StreamWriter("D:\\output.txt");
 private void button1_Click(object sender, EventArgs e)
        {
            StreamReader reader = new StreamReader("D:\\input1.txt");
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(' ');
                string[] stopWord = new string[] { "is", "are", "am","could","will" };
                foreach (string word in stopWord)
                {
                    line = line.Replace(word, "");
                    tw.Write("+"+line);
                }
                tw.Write("\r\n");
            } 

But no result shows up in the output file; it is still empty.

A regular expression may be a good fit for this job:

        // \b matches a word boundary, so only whole words are removed.
        Regex replacer = new Regex(@"\b(?:is|are|am|could|will)\b");
        using (TextWriter writer = new StreamWriter("C:\\output.txt"))
        {
            using (StreamReader reader = new StreamReader("C:\\input.txt"))
            {
                while (!reader.EndOfStream)
                {
                    string line = reader.ReadLine();
                    // Replace returns a new string, so assign the result back.
                    line = replacer.Replace(line, "");
                    writer.WriteLine(line);
                }
            }
            writer.Flush();
        }

This approach only blanks out whole words; a stop word that is part of another word is left untouched.
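As a rough sketch of the whole-word idea, the pattern can also be built from the stop-word array in the question, and the blanks left behind by removed words can be collapsed afterwards. The class and method names here are made up for illustration, and matching case-insensitively is an assumption that the question does not state.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StopWordRegexSketch
{
    // Same stop words as in the question.
    private static readonly string[] StopWords = { "is", "are", "am", "could", "will" };

    // \b keeps each match to a whole word; Regex.Escape guards against
    // stop words that contain regex metacharacters.
    // Assumption: matching ignores case, so "Is" is removed as well.
    private static readonly Regex StopWordPattern = new Regex(
        @"\b(?:" + string.Join("|", StopWords.Select(Regex.Escape)) + @")\b",
        RegexOptions.IgnoreCase);

    // Runs of whitespace left behind by removed words.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Hypothetical helper: strips stop words from one line of text.
    public static string RemoveStopWords(string line)
    {
        string stripped = StopWordPattern.Replace(line, "");
        return Whitespace.Replace(stripped, " ").Trim();
    }

    public static void Main()
    {
        // "this" and "island" both contain "is" but are kept.
        Console.WriteLine(RemoveStopWords("this island is where we will camp"));
        // prints "this island where we camp"
    }
}
```

Collapsing whitespace also normalizes tabs and double spaces that were in the input to begin with, which may or may not be wanted.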

Good luck.

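The following works as expected for me. However, it is not a good approach, because it removes stop words even when they are part of a larger word. It also does not clean up the extra spaces left behind by the removed words.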

string[] stopWord = new string[] { "is", "are", "am","could","will" };
TextWriter writer = new StreamWriter("C:\\output.txt");
StreamReader reader = new StreamReader("C:\\input.txt");
string line;
while ((line = reader.ReadLine()) != null)
{
    foreach (string word in stopWord)
    {
        line = line.Replace(word, "");
    }
    writer.WriteLine(line);
}
reader.Close();
writer.Close();

Also, I would suggest using using statements when creating the streams, to make sure the files are closed promptly.
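One way to avoid both problems mentioned above (partial-word matches and leftover spaces) is to compare whole tokens instead of calling Replace on the line. The following is only a sketch under some assumptions: FilterFile is a hypothetical helper, words are assumed to be separated by single spaces, and the comparison ignores case. Both streams are opened in using statements, as suggested.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StopWordFileFilter
{
    // Same stop words as in the question; case-insensitive lookup is an assumption.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "is", "are", "am", "could", "will" },
        StringComparer.OrdinalIgnoreCase);

    // Hypothetical helper: copies inputPath to outputPath without stop words.
    public static void FilterFile(string inputPath, string outputPath)
    {
        using (StreamReader reader = new StreamReader(inputPath))
        using (StreamWriter writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Only tokens that equal a stop word are dropped, so "island"
                // survives even though it contains "is".
                IEnumerable<string> kept = line
                    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                    .Where(token => !StopWords.Contains(token));

                // Re-joining with one space leaves no double blanks behind.
                writer.WriteLine(string.Join(" ", kept));
            }
        } // Both files are flushed and closed here.
    }

    public static void Main()
    {
        // Temp files stand in for the real input and output paths.
        string input = Path.GetTempFileName();
        string output = Path.GetTempFileName();
        File.WriteAllText(input, "this island is where we will camp" + Environment.NewLine);

        FilterFile(input, output);
        Console.Write(File.ReadAllText(output));
    }
}
```

A token with punctuation attached, such as "is," at the end of a clause, does not equal any stop word and is kept, so text with punctuation would need extra handling.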

You should wrap your IO objects in using statements so that they are disposed of properly.

using (TextWriter tw = new StreamWriter("D:\\output.txt"))
{
    using (StreamReader reader = new StreamReader("D:\\input1.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split(' ');
            string[] stopWord = new string[] { "is", "are", "am","could","will" };
            foreach (string word in stopWord)
            {
                line = line.Replace(word, "");
                tw.Write("+"+line);
            }
        }
    }
}

Try wrapping the StreamWriter and StreamReader in a using() {} clause:

using (TextWriter tw = new StreamWriter(@"D:\output.txt"))
{
  ...
}

You may also want to call tw.Flush() at the end.
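The reason flushing matters is that a StreamWriter keeps written text in an internal buffer and only hands it to the underlying stream when the buffer fills up or when the writer is flushed or disposed. A writer that is never flushed or closed, like the tw field in the question, can therefore plausibly leave the file empty. The small sketch below illustrates the effect with a MemoryStream in place of a file; the class and method names are invented for the example.

```csharp
using System;
using System.IO;

public static class FlushSketch
{
    // Returns how many bytes have reached the underlying stream
    // before and after Flush() is called on the writer.
    public static long[] BytesBeforeAndAfterFlush()
    {
        using (MemoryStream stream = new MemoryStream())
        using (StreamWriter writer = new StreamWriter(stream))
        {
            writer.Write("hello");
            long before = stream.Length; // text is still in the writer's buffer

            writer.Flush();              // pushes the buffered text to the stream
            long after = stream.Length;

            return new[] { before, after };
        }
    }

    public static void Main()
    {
        long[] counts = BytesBeforeAndAfterFlush();
        Console.WriteLine("before flush: " + counts[0] + ", after flush: " + counts[1]);
    }
}
```

Disposing the writer, which is what a using statement does at the closing brace, flushes it as well, so an explicit Flush() inside a using block is harmless but not strictly needed.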