Removing stop words from a text file
Keywords: remove, text, file | Updated: 2023-09-27 18:03:51
I want to remove stop words from my text file, and I wrote the following code to do it:
TextWriter tw = new StreamWriter("D:\\output.txt");
private void button1_Click(object sender, EventArgs e)
{
    StreamReader reader = new StreamReader("D:\\input1.txt");
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string[] parts = line.Split(' ');
        string[] stopWord = new string[] { "is", "are", "am", "could", "will" };
        foreach (string word in stopWord)
        {
            line = line.Replace(word, "");
            tw.Write("+" + line);
        }
        tw.Write("\r\n");
    }
}
But nothing shows up in the output file; it stays empty.
A regular expression may be a good fit for this job:
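// Requires: using System.IO; using System.Text.RegularExpressions;
// The verbatim string (@) keeps \b as a regex word boundary rather than a backspace character.
Regex replacer = new Regex(@"\b(?:is|are|am|could|will)\b");
using (TextWriter writer = new StreamWriter("C:\\output.txt"))
{
    using (StreamReader reader = new StreamReader("C:\\input.txt"))
    {
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            // Replace returns a new string; assign it back so the change is kept.
            line = replacer.Replace(line, "");
            writer.WriteLine(line);
        }
    }
    writer.Flush();
}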
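This approach replaces each stop word with an empty string, and it leaves a stop word alone when it is only part of a larger word.
Good luck.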
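To make the difference concrete, here is a minimal sketch comparing plain string replacement with a word-boundary regex on one sample line. The helper names RemoveWithReplace and RemoveWithRegex and the sample sentence are made up for illustration, and the snippet assumes a compiler that supports C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Plain string replacement: removes the letters wherever they occur,
// including inside longer words.
static string RemoveWithReplace(string line, string[] words)
{
    foreach (string word in words)
    {
        line = line.Replace(word, "");
    }
    return line;
}

// Word-boundary regex: removes only whole-word matches.
static string RemoveWithRegex(string line, string[] words)
{
    // Escape each word so it is matched literally, then wrap the alternation in \b ... \b.
    string[] escaped = Array.ConvertAll(words, w => Regex.Escape(w));
    string pattern = @"\b(?:" + string.Join("|", escaped) + @")\b";
    return Regex.Replace(line, pattern, "");
}

string[] stopWords = { "is", "are", "am", "could", "will" };
string sample = "this is a willow";

Console.WriteLine(RemoveWithReplace(sample, stopWords)); // prints "th  a ow"
Console.WriteLine(RemoveWithRegex(sample, stopWords));   // prints "this  a willow"
```

Both results still contain a double space where "is" used to be, so the leftover whitespace has to be handled separately if it matters.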
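The following works as expected for me. However, it is not a good approach, because it removes stop words even when they are part of a larger word. It also does not clean up the extra spaces left between the removed words.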
string[] stopWord = new string[] { "is", "are", "am", "could", "will" };
TextWriter writer = new StreamWriter("C:\\output.txt");
StreamReader reader = new StreamReader("C:\\input.txt");
string line;
while ((line = reader.ReadLine()) != null)
{
    foreach (string word in stopWord)
    {
        line = line.Replace(word, "");
    }
    writer.WriteLine(line);
}
reader.Close();
writer.Close();
Also, I would suggest using `using` statements when creating your streams, to make sure the files are closed promptly.
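One possible way around both limitations mentioned above (partial-word matches and leftover spaces) is to split each line into tokens, drop the tokens that are stop words, and join the rest with single spaces. The sketch below is only an illustration: StripStopWords and FilterLines are hypothetical helpers, the case-insensitive comparison is an assumption that the original code does not make, and StringReader/StringWriter stand in for the file streams. It assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Drops whole-word stop words from one line and rejoins the rest with single spaces.
static string StripStopWords(string text, HashSet<string> stops)
{
    var kept = new List<string>();
    foreach (string token in text.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
    {
        if (!stops.Contains(token))
        {
            kept.Add(token);
        }
    }
    return string.Join(" ", kept);
}

// Copies every line from input to output with stop words removed.
static void FilterLines(TextReader input, TextWriter output, HashSet<string> stops)
{
    string current;
    while ((current = input.ReadLine()) != null)
    {
        output.WriteLine(StripStopWords(current, stops));
    }
}

// Assumption: stop words are compared without regard to case.
var stopSet = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    { "is", "are", "am", "could", "will" };

// The using statements dispose (and therefore flush) both objects on exit.
using (var source = new StringReader("this is  a test\nwe are here"))
using (var sink = new StringWriter())
{
    FilterLines(source, sink, stopSet);
    Console.Write(sink.ToString()); // prints "this a test" and "we here" on separate lines
}
```

With real files, the StringReader and StringWriter would be replaced by a StreamReader and StreamWriter inside the same using statements. A token with punctuation attached, such as "is,", is not treated as a stop word by this simple split.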
You should wrap your IO objects in using statements so that they are disposed of correctly.
using (TextWriter tw = new StreamWriter("D:\\output.txt"))
{
    using (StreamReader reader = new StreamReader("D:\\input1.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Loop body kept as in the question.
            string[] parts = line.Split(' ');
            string[] stopWord = new string[] { "is", "are", "am", "could", "will" };
            foreach (string word in stopWord)
            {
                line = line.Replace(word, "");
                tw.Write("+" + line);
            }
        }
    }
}
Try wrapping the StreamWriter and StreamReader in a using() {} clause.
using (TextWriter tw = new StreamWriter(@"D:\output.txt"))
{
    ...
}
You may also want to call tw.Flush() at the end.
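The reason Flush() and using matter here is that a StreamWriter keeps written text in an internal buffer and only hands it to the underlying stream when the buffer fills up or when the writer is flushed, closed, or disposed. The sketch below shows this with a MemoryStream in place of the output file; the variable names are illustrative and it assumes C# 9 top-level statements.

```csharp
using System;
using System.IO;

var target = new MemoryStream();
var pending = new StreamWriter(target);
pending.Write("hello");

// The text is still sitting in the writer's buffer, so the stream is empty.
long beforeFlush = target.Length;

// Flush pushes the buffered text into the underlying stream.
pending.Flush();
long afterFlush = target.Length;

Console.WriteLine(beforeFlush); // prints 0
Console.WriteLine(afterFlush);  // prints 5

// Dispose also flushes; a using statement calls it automatically.
pending.Dispose();
```

A writer that is created once and never flushed, closed, or disposed can therefore leave the file empty even though Write was called.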