C# Reading Stream

Keywords: Stream Reading | Updated: 2023-09-27 18:27:35

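I want to read a .txt file in C#, but not all of its lines at once. For example, consider a 500-line text file. I want a function that runs 25 times and reads 20 lines each time. The first call reads lines 1 to 20, the second call reads lines 21 to 40, and so on.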

The simple code below does this in C++, but I don't know how to do it in C#:

string readLines(ifstream& in)
{
     string totalLine = "", line = "";
     // The stream parameter is named "in" so the loop counter does not shadow it
     for(int i = 0; i < 20; i++){
          getline(in, line);
          totalLine += line;
     }
     return totalLine;
}
int main()
{
     // ...
     ifstream in;
     in.open(filename.c_str());
     while(true){
         string next20 = readLines(in);
         // do something with 20 lines.
     }
     // ...
}

There are various options here, but one simple approach is:

using (var reader = File.OpenText("file.txt"))
{
    for (int i = 0; i < 25; i++)
    {
        HandleLines(reader);
    }
}
...
private void HandleLines(TextReader reader)
{
    for (int i = 0; i < 20; i++)
    {
        string line = reader.ReadLine();
        if (line != null) // Handle the file ending early
        {
            // Process the line
        }
    }
}
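
If the number of lines isn't known in advance, a variant of the approach above can return each batch and let the caller stop when the input runs out. This is only a minimal sketch: `ReadNextLines` and `BatchReaderDemo` are hypothetical names, not part of the original answer, and a `StringReader` stands in for `File.OpenText(...)` so the example runs without a file on disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BatchReaderDemo
{
    // Reads up to maxLines lines from the reader's current position.
    // Returns fewer lines (possibly none) once the end of the input is reached.
    public static List<string> ReadNextLines(TextReader reader, int maxLines)
    {
        var lines = new List<string>(maxLines);
        string line;
        while (lines.Count < maxLines && (line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }

    public static void Main()
    {
        // 45 lines of sample text; with a real file, use File.OpenText(path) instead.
        string text = string.Join("\n", Enumerable.Range(1, 45).Select(i => "line " + i));
        using (TextReader reader = new StringReader(text))
        {
            List<string> batch;
            // An empty batch means the end of the input was reached.
            while ((batch = ReadNextLines(reader, 20)).Count > 0)
            {
                Console.WriteLine("Read " + batch.Count + " lines");
            }
        }
    }
}
```

Because the reader keeps its position between calls, each call continues where the previous one stopped, and the last batch is simply shorter when the line count is not a multiple of 20.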

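If you want to call ReadLine() as few times as possible and keep memory usage to a minimum, you can first index the lines in the file:

  1. Parse the file once, indexing the position of each line in the FileStream
  2. Call ReadLine() only at the positions you need

For example: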

// Parse the file
var indexes = new List<long>();
using (var fs = File.OpenRead("text.txt"))
{
    indexes.Add(fs.Position);
    int chr;
    while ((chr = fs.ReadByte()) != -1)
    {
        if (chr == '\n')
        {
            indexes.Add(fs.Position);
        }
    }
}
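int minLine = 21; // 1-based, inclusive
int maxLine = 40; // 1-based, inclusive
// Read the lines
using (var fs = File.OpenRead("text.txt"))
{
    // indexes[0] is the start of line 1, so line N starts at indexes[N - 1]
    fs.Position = indexes[minLine - 1];
    // Create the reader once: disposing it inside the loop would also close fs
    using (var sr = new StreamReader(fs))
    {
        for (int i = minLine; i <= maxLine; i++)
        {
            Console.WriteLine(sr.ReadLine());
        }
    }
}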

Cheers!

You can write a Batch() method like this:

public static IEnumerable<string> Batch(IEnumerable<string> input, int batchSize)
{
    int n = 0;
    var block = new StringBuilder();
    foreach (var line in input)
    {
        block.AppendLine(line);
        if (++n != batchSize)
            continue;
        yield return block.ToString();
        block.Clear();
        n = 0;
    }
    if (n != 0)
        yield return block.ToString();
}

Call it like this:

string filename = "<Your filename goes here>";
var batches = Batch(File.ReadLines(filename), 20);
foreach (var block in batches)
{
    Console.Write(block);
    Console.WriteLine("------------------------");
}

Oops. GroupBy doesn't evaluate lazily, so this will greedily consume the whole file:

var twentyLineGroups = 
    File.ReadLines(somePath)
        .Select((line, index) => new {line, index})
        .GroupBy(x => x.index / 20)
        .Select(g => g.Select(x => x.line));
foreach(IEnumerable<string> twentyLineGroup in twentyLineGroups)
{
    foreach(string line in twentyLineGroup)
    {
        //tada!
    }
}
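
On .NET 6 or later, `Enumerable.Chunk` produces the same 20-line groups while streaming: it holds only one group in memory at a time instead of the whole file. The following is a minimal sketch under that version assumption. `ChunkDemo` is a hypothetical name, and the temporary file exists only so the example runs on its own.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ChunkDemo
{
    public static void Main()
    {
        // Create a 45-line temporary file so the sketch is self-contained.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, Enumerable.Range(1, 45).Select(i => "line " + i));

        try
        {
            // File.ReadLines streams the file lazily; Chunk(20) yields string[]
            // arrays of up to 20 lines, and the last array may be shorter.
            foreach (string[] chunk in File.ReadLines(path).Chunk(20))
            {
                Console.WriteLine(chunk.Length + " lines, starting with " + chunk[0]);
            }
        }
        finally
        {
            // Remove the temporary file once enumeration has finished.
            File.Delete(path);
        }
    }
}
```

With the 45-line sample file this yields three chunks of 20, 20 and 5 lines.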