C#性能-用AppendAllText分块写入文件
本文关键字:文件 AppendAllText 性能 | 更新日期: 2023-09-27 18:25:47
是否有更优雅/更快的方法来编写下面的代码?目前大约需要45秒。
query.sql有200000行长,每行都有sql:
SELECT N'+dave' AS [AccountName], N'20005' AS [EmployeeID], N'-6' AS [PlatformID] UNION ALL
我发现,通过分成1000个块,事情比等到最后并使用WriteAllText(大约需要20分钟才能运行)要快得多
static void Main(string[] args)
{
var s = new Stopwatch();
s.Start();
string textToWrite = "";
string[] lines = File.ReadAllLines(@"e:'temp'query.sql");
int i = 0;
foreach (var line in lines)
{
var bits = line.Split('''');
var value1 = bits[1];
var value2 = bits[3];
var value3 = bits[5];
var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
"'" + value1 + "', "
+ value2 + ", "
+ value3 + ")";
textToWrite += message + Environment.NewLine;
if (i % 1000 == 0)
{
Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
File.AppendAllText(@"e:'temp'query2.sql", textToWrite);
textToWrite = "";
}
i++;
}
//File.WriteAllText(@"e:'temp'query2.sql", textToWrite);
File.AppendAllText(@"e:'temp'query2.sql", textToWrite);
s.Stop();
TimeSpan ts = s.Elapsed;
Console.WriteLine("Timespan: {0}m", ts.TotalMinutes);
Console.WriteLine("Total records: " + i);
Console.ReadLine();
}
编辑:StringBuilder解决方案(1000ms):
static void Main2(string[] args)
{
var s = new Stopwatch();
s.Start();
var textToWrite = new StringBuilder();
string[] lines = File.ReadAllLines(@"e:'temp'query.sql");
int i = 0;
foreach (var line in lines)
{
var bits = line.Split('''');
var value1 = bits[1];
var value2 = bits[3];
var value3 = bits[5];
var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
"'" + value1 + "', "
+ value2 + ", "
+ value3 + ")"
+ Environment.NewLine;
textToWrite.Append(message);
// Buffering
if (i % 1000 == 0)
{
Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
File.AppendAllText(@"e:'temp'query2.sql", textToWrite.ToString());
textToWrite = new StringBuilder();
}
i++;
}
File.AppendAllText(@"e:'temp'query2.sql", textToWrite.ToString());
s.Stop();
TimeSpan ts = s.Elapsed;
Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
Console.WriteLine("Total records: " + i);
Console.ReadLine();
}
编辑:StreamWriter解决方案(450毫秒)
static void Main(string[] args)
{
var s = new Stopwatch();
s.Start();
string[] lines = File.ReadAllLines(@"e:'temp'query.sql");
int i = 0;
using (StreamWriter writer = File.AppendText(@"e:'temp'query2.sql"))
{
foreach (var line in lines)
{
var bits = line.Split('''');
var value1 = bits[1];
var value2 = bits[3];
var value3 = bits[5];
writer.WriteLine("INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'{0}', {1}, {2})",
value1, value2, value3);
i++;
}
}
s.Stop();
TimeSpan ts = s.Elapsed;
Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
Console.WriteLine("Total records: " + i);
Console.ReadLine();
}
正如其他人所指出的,使用StringBuilder
。因此,在您的情况下,声明:
StringBuilder textToWrite = new StringBuilder();
然后:
textToWrite.AppendLine(message);
if (i % 1000 == 0)
{
Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
File.AppendAllText(@"e:'temp'query2.sql", textToWrite.ToString());
textToWrite = new StringBuilder();
}
尽管您可能最好完全放弃缓冲:
using (StreamWriter writer = File.AppendText(filename))
{
// initialization stuff here
foreach (var line in lines)
{
var bits = line.Split('''');
var value1 = bits[1];
var value2 = bits[3];
var value3 = bits[5];
var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
"'" + value1 + "', "
+ value2 + ", "
+ value3 + ")";
writer.WriteLine(message); // write the line
}
}
一个好的开始是使用.net中内置的StringBuilder类。这将避免大量的字符串分配和复制。
请参阅MSDN文档,了解其工作原理:http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx
另请参阅这篇Stackoverflow文章了解更多信息:连接字符串的最有效方法?
示例:
StringBuilder a = new StringBuilder();
a.Append("some text");
a.Append("more text");
string result = a.ToString();
哪个版本的sql server?做到这一点的最佳方法不是使用一个巨大的sql脚本,而是使用表值参数或使用sql服务器大容量复制支持。
最好的方法可能是同时打开两个文件,边读边写每一行,然后关闭文件。
然而,您最有可能遇到的最大问题是字符串串联。.NET中的字符串是不可变的,因此每次串联都会导致分配一个新的副本,这需要时间和内存(尽管GC最终会返回后者)。
如果将textToWrite
替换为StringBuilder
,并且最后只执行一个ToString()
,您将看到许多更好的性能。
或者,老实说,您可能会对整件事做一个regex替换,然后就可以完成了,尽管我认为您必须像已经做的那样,首先将整个文件读取到内存中。
MemoryMappedFiles是高效的,因此它们可能值得研究。
string[] lines = File.ReadAllLines(@"e:'temp'query.sql");
using (var mmf = MemoryMappedFile.CreateFromFile(@"e:'temp'query2.sql", FileMode.Create, "txt", new FileInfo(@"e:'temp'query.sql")Length))
{
StringBuilder sb = new StringBuilder();
using (MemoryMappedViewStream mmvs = mmf.CreateViewStream())
{
StreamWriter writer = new StreamWriter(mmvs);
for (int i = 0; i < lines.Length; i++)
{
var bits = lines[i].Split('''');
var value1 = bits[1];
var value2 = bits[3];
var value3 = bits[5];
sb.AppendFormat("INSERT [PreStaging].[Import_AccountEmployeeMapping]
([AccountName], [EmployeeID], [PlatformID])
VALUES (N'{0}', {1}, {2})", value1, value2, value3);
writer.WriteLine(message.ToString());
}
}
}
您可能会发现,首先构建整个文本,然后将整个内容写入MemoryMappedFiled的性能更好,因为对ToString
的调用更少。