Merging the line contents of text files in C#

Keywords: file, text, merge | Updated: 2023-09-27 18:20:06

I have two huge text files in the following format.

File 1:

ID1,20
ID2,20
ID3,30

File 2:

ID3,75
ID1,84
ID2,70

Both files contain more than 200,000 lines. I need to read both files and create a third file in this format:

File 3:

ID1,20,84
ID2,20,70
ID3,30,75

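An ID can be any string entered by the user. The third file should be created by matching the ID on each line of file 1 with the ID on a line of file 2. I have written some code, but it takes a long time to generate file 3. The task at hand involves parallelization, so I would like the code to save me as much time as possible. Please suggest a faster, more efficient way to handle this.

(This is the code I am using.)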

public void positionCure(string afile, string bfile, string dfile)
{
    string alphaFilePath = afile;
    List<string> alphaFileContent = new List<string>();
    using (FileStream fs = new FileStream(alphaFilePath, FileMode.Open))
    using (StreamReader rdr = new StreamReader(fs))
    {
        while (!rdr.EndOfStream)
        {
            alphaFileContent.Add(rdr.ReadLine());
        }
    }
    string betaFilePath = bfile;
    StringBuilder sb = new StringBuilder();

    using (FileStream fs = new FileStream(betaFilePath, FileMode.Open))
    using (StreamReader rdr = new StreamReader(fs))
    {
        while (!rdr.EndOfStream)
        {
            string[] betaFileLine = rdr.ReadLine().Split(Convert.ToChar(","));
            foreach (string alphaline in alphaFileContent)
            {
                string[] alphaFileLine = alphaline.Split(Convert.ToChar(","));
                if (alphaFileLine[0].Equals(betaFileLine[0].ToString()))
                {
                    sb.AppendLine(String.Format("{0}, {1}, {2}", betaFileLine[0], betaFileLine[1], alphaline.Substring(alphaline.IndexOf(Convert.ToChar(",")) + 1)));
                }
            }
        }
    }
    using (FileStream fs = new FileStream(dfile, FileMode.Create))
    using (StreamWriter writer = new StreamWriter(fs))
    {
        writer.Write(sb.ToString());
    }
}


I would do something like this:

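using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

string[] files = new string[] { @"c:\temp\file1.txt", @"c:\temp\file2.txt" };
// Maps each ID to the set of values seen for it across all files.
// Note: the inner dictionary acts as a set, so a value that repeats
// for the same ID is stored only once.
var hash = new Dictionary<string, Dictionary<string, bool>>();
foreach (var file in files)
{
    string[] fileContents = File.ReadAllLines(file);
    foreach (string line in fileContents)
    {
        string[] a = line.Split(',');
        if (!hash.ContainsKey(a[0]))
            hash[a[0]] = new Dictionary<string, bool>();
        hash[a[0]][a[1]] = true;
    }
}
// Print each ID followed by its collected values.
foreach (var key in hash.Keys)
    Console.WriteLine(key + "," + string.Join(",", hash[key].Keys.ToArray()));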

I suggest using a Dictionary:

using System.Collections.Generic;
using System.IO;

var combined = new Dictionary<string, string>();
// Loop through each of the rows in the first file, then the second file.
// The file names below are placeholders; substitute your own paths.
foreach (var path in new[] { "file1.txt", "file2.txt" })
{
    foreach (var line in File.ReadLines(path))
    {
        // Split the line into the two variables.
        string[] parts = line.Split(',');
        string id = parts[0];
        string number = parts[1];
        if (combined.ContainsKey(id))
        {
            combined[id] = combined[id] + "," + number; // or do something else, this is just an example
            // Changing it from a Dictionary<string, string> to a Dictionary<string, List<string>>
            // might be more performant, especially if you have a bunch of files you want to
            // do this to, but it also might not be necessary.
        }
        else
        {
            combined[id] = number;
        }
    }
}
// You can then run through the dictionary and write it out.
using (var file = new StreamWriter("file3.txt")) // placeholder output path
{
    foreach (var pair in combined)
    {
        file.Write(pair.Key);
        file.Write(",");
        file.WriteLine(pair.Value);
    }
}
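
The comment in the code above mentions switching to a Dictionary<string, List<string>>. A minimal sketch of that variant follows, assuming every line has the form `ID,value` and that the ID itself contains no comma. The names `LineMerger`, `Merge` and `Format` are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LineMerger
{
    // Groups the values of every "ID,value" line by ID,
    // keeping values in the order in which they are read.
    public static Dictionary<string, List<string>> Merge(params IEnumerable<string>[] sources)
    {
        var combined = new Dictionary<string, List<string>>();
        foreach (var lines in sources)
        {
            foreach (var line in lines)
            {
                // Split on the first comma only (assumption: IDs contain no comma).
                int comma = line.IndexOf(',');
                if (comma < 0) continue; // skip blank or malformed lines
                string id = line.Substring(0, comma);
                string value = line.Substring(comma + 1);
                if (!combined.TryGetValue(id, out var values))
                {
                    values = new List<string>();
                    combined[id] = values;
                }
                values.Add(value);
            }
        }
        return combined;
    }

    // Renders each entry as "ID,value1,value2,...".
    public static IEnumerable<string> Format(Dictionary<string, List<string>> combined)
    {
        foreach (var pair in combined)
            yield return pair.Key + "," + string.Join(",", pair.Value);
    }
}
```

With placeholder paths, it could be called as `File.WriteAllLines("file3.txt", LineMerger.Format(LineMerger.Merge(File.ReadLines("file1.txt"), File.ReadLines("file2.txt"))));`. Appending to a list avoids rebuilding a string on every match, which may matter more as the number of input files grows.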

There are several good solutions here. Here is another simple example:

Load the contents into a dictionary:

// Reads a file of "ID,value" lines into a dictionary keyed by ID.
// Note: Dictionary.Add throws an ArgumentException if the same ID appears twice.
private Dictionary<string, string> LoadFile(string path)
{
    string line;
    Dictionary<string, string> vals = new Dictionary<string, string>();
    using (StreamReader file = new StreamReader(path))
    {
        while ((line = file.ReadLine()) != null)
        {
            string[] parts = line.Split(',');
            vals.Add(parts[0], parts[1]);
        }
    }
    return vals;
}

Then, in your program, load each file and merge them:

Dictionary<string, string> fileAValues = LoadFile(@"C:\Temp\FileA.txt");
Dictionary<string, string> fileBValues = LoadFile(@"C:\Temp\FileB.txt");
using (StreamWriter sr = new StreamWriter(@"C:\Temp\FileC.txt"))
{
    // Write one line for every ID that appears in both files.
    foreach (string key in fileAValues.Keys)
    {
        if (fileBValues.ContainsKey(key))
        {
            string combined = key + "," +
                String.Join(",", fileAValues[key], fileBValues[key]);
            sr.WriteLine(combined);
        }
    }
}
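
Only one of the two files actually has to be looked up by key, so a variation on this answer is to index one file and stream the other. The sketch below indexes file 2 and streams file 1, so the output follows the order of file 1. It assumes `ID,value` lines with no comma inside the ID; `StreamingMerge` and `Join` are hypothetical names.

```csharp
using System.Collections.Generic;

public static class StreamingMerge
{
    // Joins the "ID,value" lines of file 1 with those of file 2 on ID.
    // Only file 2 is held in memory; file 1 is consumed one line at a time.
    public static IEnumerable<string> Join(IEnumerable<string> file1Lines, IEnumerable<string> file2Lines)
    {
        var index = new Dictionary<string, string>();
        foreach (var line in file2Lines)
        {
            int comma = line.IndexOf(',');
            if (comma < 0) continue; // skip blank or malformed lines
            // The indexer overwrites, so the last value wins for a repeated ID.
            index[line.Substring(0, comma)] = line.Substring(comma + 1);
        }

        foreach (var line in file1Lines)
        {
            int comma = line.IndexOf(',');
            if (comma < 0) continue;
            // Emit "ID,value1,value2" only when the ID also exists in file 2.
            if (index.TryGetValue(line.Substring(0, comma), out var other))
                yield return line + "," + other;
        }
    }
}
```

Paired with `File.ReadLines` for the inputs and `File.WriteAllLines` for the output, this keeps roughly half as much data in memory as loading both files into dictionaries.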

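In case anyone is interested in a VB.NET version (I'm always too slow for C#), here it is for completeness. I'm also using the dictionary approach.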

Dim dic1 As New Dictionary(Of String, List(Of String))
Dim file1 = System.IO.File.ReadAllLines("C:\Temp\File1.txt")
For Each l In file1
    Dim cols = l.Split(","c)
    If cols.Any Then
        Dim key = cols(0)
        If Not dic1.ContainsKey(key) Then
            Dim values = (From col In cols Skip (1)).ToList
            dic1.Add(key, values)
        End If
    End If
Next
Dim file2 = System.IO.File.ReadAllLines("C:\Temp\File2.txt")
For Each l In file2
    Dim cols = l.Split(","c)
    If cols.Any Then
        Dim key = cols(0)
        If dic1.ContainsKey(key) Then
            ' append '
            Dim values = (From col In cols Skip (1)).ToList
            dic1(key).AddRange(values)
        Else
            Dim values = (From col In cols Skip (1)).ToList
            dic1.Add(key, values)
        End If
    End If
Next
Using writer = New System.IO.StreamWriter("C:\Temp\File3.txt")
    For Each entry In dic1
        writer.WriteLine(String.Format("{0},{1}", entry.Key, String.Join(",", entry.Value)))
    Next
End Using

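By structuring this as a LINQ query, you can take advantage of the AsParallel method to execute it on multiple threads. Given how much data you have, this can considerably improve the performance of your algorithm.

First, we need to get more organized. We can model the values you are working with: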

public class InputLine
{
    public string Id { get; set; }
    public string Value { get; set; }
}
public class OutputLine
{
    public string Id { get; set; }
    public string Value1 { get; set; }
    public string Value2 { get; set; }
}

We can also define producers and consumers of these values:

public class InputFile
{
    private readonly string _path;
    public InputFile(string path)
    {
        _path = path;
    }
    public IEnumerable<InputLine> GetLines()
    {
        return
            from line in File.ReadAllLines(_path)
            let parts = line.Split(',')
            select new InputLine { Id = parts[0], Value = parts[1] };
    }
}
public class OutputFile
{
    private readonly string _path;
    public OutputFile(string path)
    {
        _path = path;
    }
    public void WriteLines(IEnumerable<OutputLine> lines)
    {
        File.WriteAllLines(_path, lines.Select(line => String.Join(",", line.Id, line.Value1, line.Value2)));
    }
}

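Now we have the ingredients to write a query that ties everything together. This query has two key aspects:

  1. It executes in parallel by using the .AsParallel() extension method
  2. It correlates the keys of the two input files by using the join operator

All we need are the two input files and the output file: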

public void WriteResults(InputFile file1, InputFile file2, OutputFile resultFile)
{
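    // Both sides of the join must be parallel queries: the PLINQ Join overload
    // that takes a plain IEnumerable as the inner source throws NotSupportedException.
    var resultLines =
        from file1Line in file1.GetLines().AsParallel()
        join file2Line in file2.GetLines().AsParallel() on file1Line.Id equals file2Line.Id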
        select new OutputLine
        {
            Id = file1Line.Id,
            Value1 = file1Line.Value,
            Value2 = file2Line.Value
        };
    resultFile.WriteLines(resultLines);
}

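Behind the scenes, the join operator uses an approach similar to a dictionary, and you also benefit from having the correlation done on multiple threads.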
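
To make the "similar to a dictionary" remark concrete, here is a rough hand-written hash join: build a lookup over one input, then probe it once per element of the other. It illustrates the idea only and is not the actual LINQ or PLINQ implementation; `HashJoinSketch` and its `Join` method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashJoinSketch
{
    // Sequential hash join: one pass to build, one pass to probe.
    public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
        IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKey,
        Func<TInner, TKey> innerKey,
        Func<TOuter, TInner, TResult> result)
    {
        // Build phase: hash every inner element by its key.
        ILookup<TKey, TInner> lookup = inner.ToLookup(innerKey);

        // Probe phase: one hash lookup per outer element.
        foreach (var o in outer)
        {
            // The lookup yields an empty sequence when the key is absent.
            foreach (var i in lookup[outerKey(o)])
                yield return result(o, i);
        }
    }
}
```

The cost grows with the sum of the two input sizes rather than their product, which is the main difference from the nested loops in the question.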