Custom functionality similar to SqlDataReader.Read()

Keywords: custom functionality, SqlDataReader, Read, similar to | Updated: 2023-09-27 18:37:22

I want to create functionality similar to SqlDataReader.Read().

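I am reading a flat file (.txt/.csv) and returning it as a DataTable to the class that handles my business logic. That class iterates over the rows of the DataTable, transforms the data, and writes it to a structured database. I use this structure for several import sources.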

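However, it is very, very slow on large files. Processing 30 MB of data takes me 2 hours, and I would like to get that down to 30 minutes. One step in that direction is to stop reading the whole file into a DataTable and instead process it line by line, which also keeps memory from filling up.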

Something like this would be ideal (pseudocode):

FlatFileReader ffr = new FlatFileReader(); //Set FlatFileParameters
while(ffr.ReadRow(out DataTable parsedFlatFileRow))
{
     //...Business Logic for handling the parsedFlatFileRow
}

How can I implement a method that works like .ReadRow(out DataTable parsedFlatFileRow)?

Is this the right direction?

foreach(obj in ff.lazyreading()){
    //Business Logic
} 
...
class FlatFileWrapper{
    public IEnumerable<obj> lazyreading(){
        while(FileReader.ReadLine()){ 
            yield return parsedFileLine; 
        }
    } 
}


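As Tim already mentioned, File.ReadLines is what you need:

"When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned."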

You can create a parser that uses that method, like this:

using System.Collections.Generic;
using System.IO;

// The object you want to create from each line of the file.
public class Foo
{
    // add properties here....
}
// The parser's only responsibility is to create the objects.
public class FooParser
{
    public IEnumerable<Foo> ParseFile(string filename)
    {
        if(!File.Exists(filename))
            throw new FileNotFoundException("Could not find file to parse", filename);
        foreach(string line in File.ReadLines(filename))
        {
            Foo foo = CreateFoo(line);
            yield return foo;
        }
    }
    private Foo CreateFoo(string line)
    {
        // parse line/create instance of Foo here
        return new Foo {
            // ......
        };
    }
}

Usage:

var parser = new FooParser();
foreach (Foo foo in parser.ParseFile(filename))
{
     //...Business Logic for handling the parsedFlatFileRow
}
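
One detail worth knowing about the parser above: because ParseFile contains yield return, it is an iterator method, so none of its body runs until the result is enumerated. That includes the File.Exists check, which means a missing file is only reported at the first iteration of the foreach, not at the call. If you would rather fail at call time, a common approach is to split the method into a non-iterator wrapper that validates and a private iterator that does the lazy work. The following is only a sketch: EagerLineParser, ParseFileCore, and the delimiter parameter are hypothetical names, and it yields string[] instead of Foo to stay self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical variant of the parser that validates its input eagerly.
public static class EagerLineParser
{
    // Not an iterator method: these checks run as soon as ParseFile is called.
    public static IEnumerable<string[]> ParseFile(string filename, char delimiter)
    {
        if (filename == null)
            throw new ArgumentNullException(nameof(filename));
        if (!File.Exists(filename))
            throw new FileNotFoundException("Could not find file to parse", filename);

        return ParseFileCore(filename, delimiter);
    }

    // Iterator method: the body runs lazily, one line per MoveNext call.
    private static IEnumerable<string[]> ParseFileCore(string filename, char delimiter)
    {
        foreach (string line in File.ReadLines(filename))
        {
            // Naive split; quoted fields containing the delimiter are not handled.
            yield return line.Split(delimiter);
        }
    }
}
```

The file could still disappear between the check and the first read, so the enumeration itself can throw as well; the wrapper only moves the common failure to the call site.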

You can use File.ReadLines, which works similarly to a StreamReader:

foreach(string line in File.ReadLines(path))
{
     //...Business Logic for handling the parsedFlatFileRow
}
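
If you specifically want the Read()-style loop from the question rather than a foreach, the same idea can be wrapped around a TextReader. This is a minimal sketch under a few assumptions: the FlatFileReader name and ReadRow shape are taken from the question's pseudocode, but the out parameter is a string[] of fields rather than a DataTable, the delimiter parameter is invented for illustration, and the split is naive (no support for quoted fields).

```csharp
using System;
using System.IO;

// Hypothetical reader modeled on the question's pseudocode.
public sealed class FlatFileReader : IDisposable
{
    private readonly TextReader _reader;
    private readonly char _delimiter;

    public FlatFileReader(TextReader reader, char delimiter = ';')
    {
        _reader = reader ?? throw new ArgumentNullException(nameof(reader));
        _delimiter = delimiter;
    }

    // Returns true and fills 'fields' while rows remain; returns false at end of input.
    public bool ReadRow(out string[] fields)
    {
        string line = _reader.ReadLine();
        if (line == null)
        {
            fields = Array.Empty<string>();
            return false;
        }

        // Naive split; quoted fields containing the delimiter are not handled.
        fields = line.Split(_delimiter);
        return true;
    }

    public void Dispose() => _reader.Dispose();
}
```

With this in place, the loop from the question becomes `using (var ffr = new FlatFileReader(new StreamReader(path), ','))` followed by `while (ffr.ReadRow(out string[] row)) { ... }`. Only one line is held in memory at a time, which is the same property File.ReadLines gives you.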