Comparing two CSV files in C# to get the differences

Keywords: compare, CSV, two files | Updated: 2023-09-27 18:22:07

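I have to read a CSV file several times a day. It is about 300 MB, and every time I have to read through the whole thing, compare it with the existing data in the database, add new data, hide old data, and update existing data. A large share of the data is never touched at all.

I have access to all the files, both old and new. I would like to compare the new file with the previous one and update only what has changed in the file. I'm not sure how to go about it, and I'm using C# for all my work. The most troublesome part is probably that a row in the previous feed may sit at a different position in the second feed even though it has not been updated at all. I'd like to avoid that problem as well, if possible.

Any ideas would help.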

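Use one of the existing CSV parsers
Parse each row into a mapped class object
Override Equals and GetHashCode on that class
Keep a List<T> or HashSet<T> in memory, initialized empty at the start
As you read each row from the CSV file, check whether it already exists in the in-memory collection (List, HashSet)
If the object does not exist in the in-memory collection, add it to the collection and insert it into the database
If the object does exist in the in-memory collection, ignore it (the check relies on your Equals and GetHashCode implementation, so it is as simple as if(inMemoryCollection.Contains(currentRowObject)))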
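
The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the FeedRow type, its three columns (id, name, status), and the FeedFilter helper are hypothetical names, and the naive Split(',') stands in for a real CSV parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type; the real columns depend on your CSV.
public sealed class FeedRow : IEquatable<FeedRow>
{
    public long Id { get; }
    public string Name { get; }
    public string Status { get; }

    public FeedRow(long id, string name, string status)
    {
        Id = id;
        Name = name;
        Status = status;
    }

    // Naive parse: assumes "id,name,status" with no quoted commas.
    public static FeedRow Parse(string line)
    {
        string[] f = line.Split(',');
        return new FeedRow(long.Parse(f[0]), f[1], f[2]);
    }

    // Two rows are equal when every column matches.
    public bool Equals(FeedRow other) =>
        !(other is null) && Id == other.Id && Name == other.Name && Status == other.Status;

    public override bool Equals(object obj) => Equals(obj as FeedRow);

    // Must agree with Equals: equal rows produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Id.GetHashCode();
            hash = hash * 31 + (Name?.GetHashCode() ?? 0);
            hash = hash * 31 + (Status?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

public static class FeedFilter
{
    // Returns the rows not seen before and remembers them in 'seen'.
    public static List<FeedRow> TakeUnseen(IEnumerable<string> lines, HashSet<FeedRow> seen)
    {
        var fresh = new List<FeedRow>();
        foreach (string line in lines)
        {
            FeedRow row = FeedRow.Parse(line);
            // HashSet.Add returns false when an equal row is already present.
            if (seen.Add(row))
                fresh.Add(row); // this is where the database insert would go
        }
        return fresh;
    }
}
```

Keeping the same HashSet<FeedRow> alive between runs means a row that merely moved to a different position in the next file is skipped. A HashSet is preferable to a List here because Contains on a List scans every element.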

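I assume you have a Windows service that periodically reads the CSV file from a file location. You can repeat the process above each time a new CSV file is read. That way you maintain an in-memory collection of the objects already inserted and ignore them regardless of where they appear in the CSV file.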

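If your data has a defined primary key, you can use a Dictionary<TKey, TValue> with the unique field as the key. Looking rows up by key makes the comparison faster, and you can skip implementing Equals and GetHashCode.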
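
As a rough sketch of the keyed approach, the code below indexes each file by its key and sorts the new file's rows into added, changed, and removed. It assumes the first column is the unique key and that fields contain no quoted commas; FeedDiff and KeyedDiff are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result container for the comparison.
public sealed class FeedDiff
{
    public List<string> Added { get; } = new List<string>();
    public List<string> Changed { get; } = new List<string>();
    public List<string> Removed { get; } = new List<string>();
}

public static class KeyedDiff
{
    // Index raw lines by their first column (assumed to be the unique key).
    public static Dictionary<string, string> Index(IEnumerable<string> lines)
    {
        var byKey = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (string line in lines)
        {
            int comma = line.IndexOf(',');
            string key = comma < 0 ? line : line.Substring(0, comma);
            byKey[key] = line; // the last occurrence wins if a key repeats
        }
        return byKey;
    }

    public static FeedDiff Compare(IEnumerable<string> oldLines, IEnumerable<string> newLines)
    {
        Dictionary<string, string> oldByKey = Index(oldLines);
        var diff = new FeedDiff();
        foreach (KeyValuePair<string, string> pair in Index(newLines))
        {
            if (!oldByKey.TryGetValue(pair.Key, out string oldLine))
                diff.Added.Add(pair.Value);          // key not in the old file
            else if (!string.Equals(oldLine, pair.Value, StringComparison.Ordinal))
                diff.Changed.Add(pair.Value);        // same key, different content
            oldByKey.Remove(pair.Key);               // whatever remains was dropped
        }
        diff.Removed.AddRange(oldByKey.Values);
        return diff;
    }
}
```

Because rows are matched by key rather than by position, a row that only moved within the file lands in none of the three lists. Both files are held in memory at once, which may or may not be acceptable for a 300 MB feed.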

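As a backup to this process, the DB write routine or stored procedure should be defined so that it first checks whether the record already exists in the table. If it does, update the row; otherwise insert a new record. That is an UPSERT.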

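Keep in mind that if you do end up maintaining an in-memory collection, you should clear it periodically, or you may run into an out-of-memory exception.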

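Just curious, why do you need to compare the old file with the new one? Isn't the data from the old file already in SQL Server? (When you say database, you mean SQL Server, right? I'm assuming SQL Server because you are using C# .NET.)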

My approach is simple:

Load the new CSV file into a staging table

Use a stored procedure to insert, update, and set records inactive

// Requires: using System.Collections.Generic; using System.Data;
//           using System.Data.SqlClient; using System.IO;
// connectionString is assumed to be defined elsewhere in the class.
public static void ProcessCSV(FileInfo file)
{
    // Open one connection for the whole file instead of one per line
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        foreach (string line in ReturnLines(file))
        {
            //break the line up and parse the values into parameters
            //naive split: assumes no header row, no quoted commas,
            //and the column order id, something_else
            string[] fields = line.Split(',');
            using (SqlCommand command = conn.CreateCommand())
            {
                command.CommandType = CommandType.StoredProcedure;
                command.CommandText = "[dbo].sp_InsertToStaging";
                command.Parameters.Add("@id", SqlDbType.BigInt).Value = long.Parse(fields[0]);
                command.Parameters.Add("@SomethingElse", SqlDbType.VarChar).Value = fields[1];
                //execute
                try
                {
                    command.ExecuteNonQuery();
                }
                catch (SqlException)
                {
                    //log here, then rethrow or skip the row; do not swallow silently
                    throw;
                }
            }
        }
    }
}
public static IEnumerable<string> ReturnLines(FileInfo file)
{
    using (FileStream stream = File.Open(file.FullName, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (StreamReader reader = new StreamReader(stream))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

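Now you can write the stored procedure to insert, update, and set records inactive based on Id. A record counts as changed if Field_x (main_table) != Field_x (staging_table) for a given Id, and so on.

Here is how to detect changes and updates between the main table and the staging table.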

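/* SECTION: SET INACTIVE */
UPDATE main_table
SET IsActiveTag = 0
WHERE IsActiveTag = 1
    AND unique_identifier IN
    (
        SELECT a.unique_identifier
        FROM main_table AS a INNER JOIN staging_table AS b
        --inner join because you only want existing records
        ON a.unique_identifier = b.unique_identifier
        --compare only the current (active) version of each record
        WHERE a.IsActiveTag = 1
            --detect any updates (note: <> never matches when either side is NULL)
            AND (a.field1 <> b.field1
                OR a.field2 <> b.field2
                OR a.field3 <> b.field3)
            --etc
    );
/* SECTION: INSERT UPDATED AND NEW */
--explicit column list; assumes the staging table has no IsActiveTag column
INSERT INTO main_table (unique_identifier, field1, field2, field3, IsActiveTag)
SELECT b.unique_identifier, b.field1, b.field2, b.field3, 1
FROM staging_table AS b
LEFT JOIN
    (SELECT unique_identifier
    FROM main_table
    --only get active records
    WHERE IsActiveTag = 1) AS a
ON b.unique_identifier = a.unique_identifier
--select only staging records that have no active match in the main table
WHERE a.unique_identifier IS NULL;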

How big is the CSV file? If it is small, try the following:

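// Requires: using System; using System.Collections.Generic; using System.IO;
// pathOfFileA, pathOfFileB and newfilepath are assumed to be defined elsewhere.
string[] File1Lines = File.ReadAllLines(pathOfFileA);
string[] File2Lines = File.ReadAllLines(pathOfFileB);
List<string> NewLines = new List<string>();
// Walk the new file line by line. This comparison is positional,
// so it only works when rows keep the same order in both files.
for (int lineNo = 0; lineNo < File2Lines.Length; lineNo++)
{
    // Lines past the end of the old file are treated as missing
    string oldLine = lineNo < File1Lines.Length ? File1Lines[lineNo] : null;
    string newLine = File2Lines[lineNo];
    if (!String.IsNullOrEmpty(oldLine) && !String.IsNullOrEmpty(newLine))
    {
        // Both present: keep the new line only if it differs
        if (!String.Equals(oldLine, newLine, StringComparison.Ordinal))
            NewLines.Add(newLine);
    }
    else if (!String.IsNullOrEmpty(oldLine))
    {
        // The line is empty in the new file: nothing to add
    }
    else
    {
        // No old line at this position: the new line is an addition
        NewLines.Add(newLine);
    }
}
if (NewLines.Count > 0)
{
    File.WriteAllLines(newfilepath, NewLines);
}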
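
The loop above compares lines by position, so a row that merely moved shows up as a difference. One possible variation, which swaps in a set lookup instead of the positional check, is sketched below. It is only an illustration: LineSetDiff and its method names are hypothetical, and it treats each raw line as an opaque string, so any change in whitespace or quoting counts as a change.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineSetDiff
{
    // Returns the non-empty lines of the new file that do not appear
    // anywhere in the old file, regardless of position.
    public static List<string> NewOrChangedLines(IEnumerable<string> oldLines, IEnumerable<string> newLines)
    {
        var old = new HashSet<string>(oldLines, StringComparer.Ordinal);
        var result = new List<string>();
        foreach (string line in newLines)
        {
            if (line.Length != 0 && !old.Contains(line))
                result.Add(line);
        }
        return result;
    }

    // File-based wrapper: File.ReadLines streams the new file lazily,
    // so only the old file's lines are held in memory as a set.
    public static void WriteDiff(string oldPath, string newPath, string outPath)
    {
        List<string> lines = NewOrChangedLines(File.ReadLines(oldPath), File.ReadLines(newPath));
        if (lines.Count > 0)
            File.WriteAllLines(outPath, lines);
    }
}
```

This does not report rows that were removed from the new file; running the same method with the arguments swapped would list those.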