Writing a file line by line in C# with StreamReader/StreamWriter is very slow

Updated: 2023-09-27 18:00:28

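I wrote a WinForms application that reads each line of a text file, performs a search and replace on the line using a RegEx, and then writes it out to a new file. I chose the "line by line" approach because some of the files are too large to load into memory.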

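I am using a BackgroundWorker object so that the UI can be updated with the progress of the job. Below is the code (parts omitted for brevity) that handles reading the lines of the file and writing them out.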

public void bgWorker_DoWork(object sender, DoWorkEventArgs e)
{
    // Details of obtaining file paths omitted for brevity
    int totalLineCount = File.ReadLines(inputFilePath).Count();
    using (StreamReader sr = new StreamReader(inputFilePath))
    {
      int currentLine = 0;
      String line;
      while ((line = sr.ReadLine()) != null)
      {
        currentLine++;
        // Match and replace contents of the line
        // omitted for brevity
        if (currentLine % 100 == 0)
        {
          int percentComplete = (currentLine * 100 / totalLineCount);
          bgWorker.ReportProgress(percentComplete);
        }
        using (FileStream fs = new FileStream(outputFilePath, FileMode.Append, FileAccess.Write))
        using (StreamWriter sw = new StreamWriter(fs))
        {
          sw.WriteLine(line);
        }
      }
    }
}

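Some of the files I am processing are very large (8 GB, 132 million lines). The process takes a very long time: a 2 GB file takes about 9 hours to complete, which works out to roughly 58 KB/s. Is this to be expected, or should the process be faster?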


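Do not close and reopen the output file on every loop iteration; open the writer once, outside the loop. This should improve performance because the writer no longer has to seek to the end of the file on every iteration.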

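Also, File.ReadLines(inputFilePath).Count(); causes you to read the input file twice, which could account for a large chunk of the time. Instead of calculating the percentage from the line count, calculate it from the stream position.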

public void bgWorker_DoWork(object sender, DoWorkEventArgs e) 
{ 
    // Details of obtaining file paths omitted for brevity
    // This constructor (append: true) can be used instead of a FileStream opened with FileMode.Append; it does the same thing.
    using (StreamWriter sw = new StreamWriter(outputFilePath, true))
    using (StreamReader sr = new StreamReader(inputFilePath))
    {
      int lastPercentage = 0;
      String line;
      while ((line = sr.ReadLine()) != null)
      {
        // Match and replace contents of the line
        // omitted for brevity
        // Position and Length are longs, not ints, so we need to cast at the end.
        int currentPercentage = (int)(sr.BaseStream.Position * 100L / sr.BaseStream.Length);
        if (lastPercentage != currentPercentage)
        {
          bgWorker.ReportProgress(currentPercentage);
          lastPercentage = currentPercentage;
        }
        sw.WriteLine(line);
      }
    }
}

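Beyond that, you will need to show what "Match and replace contents of the line omitted for brevity" actually does, because I suspect that is where your slowness comes from. Run a profiler on the code, see where it spends the most time, and focus your efforts there.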
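If a full profiler is not at hand, a rough first measurement can be taken with System.Diagnostics.Stopwatch. The following is only a sketch: the class name LineTiming, the pattern \d+ and the replacement "#" are hypothetical stand-ins, since the question does not show the real match-and-replace step. It reports how much of the total loop time is spent in the regex call.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

public static class LineTiming
{
    // Hypothetical pattern; the real one is not shown in the question.
    private static readonly Regex Pattern = new Regex(@"\d+");

    // Processes every line from reader, writes the result to writer, and
    // returns the time spent in the regex step and in the loop as a whole.
    public static (TimeSpan Transform, TimeSpan Total) Measure(TextReader reader, TextWriter writer)
    {
        var transform = new Stopwatch();
        var total = Stopwatch.StartNew();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            transform.Start();                           // time only the replace step
            string changed = Pattern.Replace(line, "#"); // hypothetical replacement
            transform.Stop();
            writer.WriteLine(changed);
        }
        total.Stop();
        return (transform.Elapsed, total.Elapsed);
    }
}
```

If Transform accounts for most of Total, the regex is the place to optimize; if it does not, the time is going into I/O instead. A real profiler gives a far more detailed picture than this.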

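Follow this process:

Instantiate the reader and the writer
Loop over the lines, performing the next two steps
Inside the loop, change the line
Inside the loop, write the changed line
Dispose of the reader and the writer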

This should be much faster than instantiating the writer on every iteration of the line loop.
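The steps above can be sketched as a small self-contained helper. The class name LineRewriter and the transform delegate are assumptions made for illustration; the delegate stands in for the omitted match-and-replace step. Unlike the code in the question, this sketch overwrites the output file rather than appending to it.

```csharp
using System;
using System.IO;

public static class LineRewriter
{
    // Copies inputPath to outputPath one line at a time, applying transform
    // to each line. Returns the number of lines written.
    public static long Rewrite(string inputPath, string outputPath, Func<string, string> transform)
    {
        long count = 0;
        // Step 1: instantiate the reader and the writer once, outside the loop.
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath, false)) // false = overwrite (an assumption)
        {
            string line;
            // Step 2: loop over the lines.
            while ((line = reader.ReadLine()) != null)
            {
                // Steps 3 and 4: change the line, then write the changed line.
                writer.WriteLine(transform(line));
                count++;
            }
        } // Step 5: both using blocks dispose the reader and the writer here.
        return count;
    }
}
```

A call might look like LineRewriter.Rewrite(inputFilePath, outputFilePath, l => regex.Replace(l, replacement)), where regex and replacement are whatever the real job uses.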

I was going to add a code example shortly, but it looks like someone beat me to it; see @Scott Chamberlain's answer.

Remove the File.ReadLines(inputFilePath).Count() call at the top, since it reads the entire file just to get the number of lines.