C#性能-用AppendAllText分块写入文件

本文关键字:文件 AppendAllText 性能 | 更新日期: 2023-09-27 18:25:47

是否有更优雅/更快的方法来编写下面的代码?目前大约需要45秒。

query.sql有200000行长,每行都有sql:

SELECT N'+dave' AS [AccountName], N'20005' AS [EmployeeID], N'-6' AS [PlatformID] UNION ALL

我发现,通过分成1000个块,事情比等到最后并使用WriteAllText(大约需要20分钟才能运行)要快得多

static void Main(string[] args)
{
    var s = new Stopwatch();
    s.Start();
    string textToWrite = "";
    string[] lines = File.ReadAllLines(@"e:'temp'query.sql");
    int i = 0;
    foreach (var line in lines)
    {
        var bits = line.Split('''');
        var value1 = bits[1];
        var value2 = bits[3];
        var value3 = bits[5];
        var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
                    "'" + value1 + "', "
                    + value2 + ", "
                    + value3 + ")";
        textToWrite += message + Environment.NewLine;
        if (i % 1000 == 0)
        {
            Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
            File.AppendAllText(@"e:'temp'query2.sql", textToWrite);
            textToWrite = "";
        }
        i++;
    }
    //File.WriteAllText(@"e:'temp'query2.sql", textToWrite);
    File.AppendAllText(@"e:'temp'query2.sql", textToWrite);
    s.Stop();
    TimeSpan ts = s.Elapsed;
    Console.WriteLine("Timespan: {0}m", ts.TotalMinutes);
    Console.WriteLine("Total records: " + i);
    Console.ReadLine();
}

编辑:StringBuilder解决方案(1000ms):

static void Main2(string[] args)
{
    var s = new Stopwatch();
    s.Start();
    var textToWrite = new StringBuilder();
    string[] lines = File.ReadAllLines(@"e:'temp'query.sql");
    int i = 0;
    foreach (var line in lines)
    {
        var bits = line.Split('''');
        var value1 = bits[1];
        var value2 = bits[3];
        var value3 = bits[5];
        var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
                    "'" + value1 + "', "
                    + value2 + ", "
                    + value3 + ")"
                    + Environment.NewLine;
        textToWrite.Append(message);
        // Buffering
        if (i % 1000 == 0)
        {
            Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
            File.AppendAllText(@"e:'temp'query2.sql", textToWrite.ToString());
            textToWrite = new StringBuilder();
        }
        i++;
    }
    File.AppendAllText(@"e:'temp'query2.sql", textToWrite.ToString());
    s.Stop();
    TimeSpan ts = s.Elapsed;
    Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
    Console.WriteLine("Total records: " + i);
    Console.ReadLine();
}

编辑:StreamWriter解决方案(450毫秒)

static void Main(string[] args)
    {
        var s = new Stopwatch();
        s.Start();
        string[] lines = File.ReadAllLines(@"e:'temp'query.sql");
        int i = 0;
        using (StreamWriter writer = File.AppendText(@"e:'temp'query2.sql"))
        {
            foreach (var line in lines)
            {
                var bits = line.Split('''');
                var value1 = bits[1];
                var value2 = bits[3];
                var value3 = bits[5];
                writer.WriteLine("INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'{0}', {1}, {2})",
                    value1, value2, value3);
                i++;
            }
        }
        s.Stop();
        TimeSpan ts = s.Elapsed;
        Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
        Console.WriteLine("Total records: " + i);
        Console.ReadLine();
    }

C#性能-用AppendAllText分块写入文件

正如其他人所指出的,使用StringBuilder。因此,在您的情况下,声明:

StringBuilder textToWrite = new StringBuilder();

然后:

textToWrite.AppendLine(message);
if (i % 1000 == 0)
{
    Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
    File.AppendAllText(@"e:'temp'query2.sql", textToWrite.ToString());
    textToWrite = new StringBuilder();
}

尽管您可能最好完全放弃缓冲:

using (StreamWriter writer = File.AppendText(filename))
{
    // initialization stuff here
    foreach (var line in lines)
    {
        var bits = line.Split('''');
        var value1 = bits[1];
        var value2 = bits[3];
        var value3 = bits[5];
        var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping]                     ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
                "'" + value1 + "', "
                + value2 + ", "
                + value3 + ")";
         writer.WriteLine(message); // write the line
    }
}

一个好的开始是使用.net中内置的StringBuilder类。这将避免大量的字符串分配和复制。

请参阅MSDN文档,了解其工作原理:http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx

另请参阅这篇Stackoverflow文章了解更多信息:连接字符串的最有效方法?

示例:

StringBuilder a = new StringBuilder();
a.Append("some text");
a.Append("more text");
string result = a.ToString();

哪个版本的sql server?做到这一点的最佳方法不是使用一个巨大的sql脚本,而是使用表值参数或使用sql服务器大容量复制支持。

最好的方法可能是同时打开两个文件,边读边写每一行,然后关闭文件。

然而,您最有可能遇到的最大问题是字符串串联。.NET中的字符串是不可变的,因此每次串联都会导致分配一个新的副本,这需要时间和内存(尽管GC最终会返回后者)。

如果将textToWrite替换为StringBuilder,并且最后只执行一个ToString(),您将看到许多更好的性能。

或者,老实说,您可能会对整件事做一个regex替换,然后就可以完成了,尽管我认为您必须像已经做的那样,首先将整个文件读取到内存中。

MemoryMappedFiles是高效的,因此它们可能值得研究。

string[] lines = File.ReadAllLines(@"e:'temp'query.sql");
using (var mmf = MemoryMappedFile.CreateFromFile(@"e:'temp'query2.sql", FileMode.Create, "txt", new FileInfo(@"e:'temp'query.sql")Length))
{       
    StringBuilder sb = new StringBuilder();
    using (MemoryMappedViewStream mmvs = mmf.CreateViewStream())
    {
       StreamWriter writer = new StreamWriter(mmvs);
       for (int i = 0; i < lines.Length; i++)
       {
          var bits = lines[i].Split('''');
          var value1 = bits[1];
          var value2 = bits[3];
          var value3 = bits[5];
          sb.AppendFormat("INSERT [PreStaging].[Import_AccountEmployeeMapping]
                          ([AccountName], [EmployeeID], [PlatformID])
                         VALUES (N'{0}', {1}, {2})", value1, value2, value3);

          writer.WriteLine(message.ToString()); 
      }
   }
}

您可能会发现,首先构建整个文本,然后将整个内容写入MemoryMappedFiled的性能更好,因为对ToString的调用更少。