使用Filehelpers将大型CSV文件导入SQL2014表

本文关键字:导入 SQL2014 文件 CSV Filehelpers 大型 使用 | 更新日期: 2023-09-27 18:03:20

CSV文件以逗号分隔,包含内嵌分隔符和引号。有些字段有开头和结尾引号,有些没有。

第一个记录被完美地处理了,但第二个记录就不行了。正如您所看到的,该字段似乎有一个前置引号,但实际上它是嵌入的。字段5没有尾引号。导入的结果在字段5和6中放置空白,并将字段5的数据(粗体)压入字段7,这在稍后的过程中会导致最大字段长度冲突。

在Filehelpers中是否有一个属性设置,我可以用它来处理包含下面粗体字母字段的记录,以便记录正确导入每个字段?CSV文件是从外部源接收的,因此我无法控制提要。

AT2M-2471-3,,"1178",AccuTemp,48",实心砧板(必须与AT2A-2630-3或AT2A-2630-22一起订购),,ea,"10.00","0.00000","207.00","93.41","0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00","ATCUT","1",每个,"砧板,设备安装",AccuTemp,","假",,85,","0","baab3369-bcad-453e-9867-921e4af1203c",",AccuTemp,,","e0fb1dbf -c00d-dd11-a23a-00304834a8c9","bcd6e7a0-be0d-dd11-a23a-00304834a8c9","bcd6e7a0-be0d-dd11-a23a-00304834a8c9"

AT2M - 2877 - 1,"1178",AccuTemp, "U"通道连接两个29"深度扒炉,, ea、"4.00"、"0.00000"、"104.00"、"46.93"、"0.00"、"0.00"、"0.00"、"0.00",AT2M, ", " 1 ",,, AccuTemp,"、"False",85年,","0","f7d56cb1-b2ab-40c7-b7e5-55ee1b4d1023 AccuTemp, ", ", " e3fb1dfb - c00d dd11 a23a - 00304834 a8c9"、"bcd6e7a0 - be0d dd11 a23a - 00304834 a8c9"

下面是SQL表结构,没有索引:

    CREATE TABLE [dbo].[rawdata](
        [Model Number] [varchar](50) NULL,
        [User Stock Model Number] [varchar](50) NULL,
        [Vendor Number] [varchar](50) NULL,
        [Vendor Name] [varchar](50) NULL,
        [Specification] [varchar](max) NULL,
        [Vendor Pack] [varchar](50) NULL,
        [Selling Unit] [varchar](50) NULL,
        [Weight] [varchar](50) NULL,
        [Cube] [varchar](50) NULL,
        [List Price] [varchar](50) NULL,
        [Net Price] [varchar](50) NULL,
        [Height] [varchar](50) NULL,
        [Width] [varchar](50) NULL,
        [Depth] [varchar](50) NULL,
        [Deal Net] [varchar](50) NULL,
        [Picture Name] [varchar](150) NULL,
        [Blank Column] [varchar](50) NULL,
        [Vendor to Stock] [varchar](50) NULL,
        [Priced By] [varchar](50) NULL,
        [Category] [varchar](75) NULL,
        [Vendor Nickname] [varchar](50) NULL,
        [User Vendor Name] [varchar](50) NULL,
        [Configurable?] [varchar](50) NULL,
        [Category Values] [varchar](max) NULL,
        [Freight Class] [varchar](50) NULL,
        [Vendor FOB] [varchar](50) NULL,
        [Ship from Zip] [varchar](50) NULL,
        [Model Apply] [varchar](50) NULL,
        [Picture Link] [varchar](50) NULL,
        [Category Code] [varchar](50) NULL,
        [Vendor Short Name] [varchar](50) NULL,
        [Cutsheet Name] [varchar](150) NULL,
        [Cutsheet Link] [varchar](50) NULL,
        [Product ID] [varchar](50) NULL,
        [Vendor ID] [varchar](50) NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

我用下面的属性为表创建了一个类:[DelimitedRecord (", "))[IgnoreFirst (1)]

class rawdata
{
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Model_Number;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string User_Stock_Model_Number;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_Number;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_Name;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Specification;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_Pack;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Selling_Unit;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Weight;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Cube;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string List_Price;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Net_Price;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Height;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Width;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Depth;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Deal_Net;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Picture_Name;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Blank_Column;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_to_Stock;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Priced_By;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Category;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_Nickname;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string User_Vendor_Name;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Configurable;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Category_Values;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Freight_Class;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_FOB;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Ship_from_Zip;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Model_Apply;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Picture_Link;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Category_Code;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_Short_Name;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Cutsheet_Name;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Cutsheet_Link;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Product_ID;
    [FieldQuoted('"', QuoteMode.OptionalForBoth)] // Optional quoted when read or write
    public string Vendor_ID;  

}

下面是c#代码:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Threading.Tasks;
    using System.Data;
    using System.Data.SqlClient;
    using System.Windows.Forms;
    using FileHelpers;
    namespace XYZ
    {
        class Class1
        {
            static void Main(string[] args)
            {
                SqlConnection conn1 = new SqlConnection();
                DataTable temp_rawdata_table = new DataTable();
                conn1.ConnectionString = "Data Source=ABC;Initial Catalog=XYZ;Integrated Security=True";
                System.Diagnostics.Stopwatch elapsed = new System.Diagnostics.Stopwatch();
                elapsed.Start(); Int64 rows = 0;
                // ================ Begin BulkCopy ========================
                using (SqlBulkCopy bulkcopy = new SqlBulkCopy(conn1.ConnectionString,
                    System.Data.SqlClient.SqlBulkCopyOptions.TableLock)
                    {
                        DestinationTableName = "rawdata",
                        BulkCopyTimeout = 0,
                        BatchSize = 100000
                    })
                {
                    temp_rawdata_table = new XYZDataSet.rawdataDataTable();
                    // using the ASYNC engine allows for processing record by record
                    FileHelperAsyncEngine engine = new FileHelperAsyncEngine(typeof(rawdata));
                    engine.BeginReadFile("C:''rawdata.csv");
                    int batchsize = 0;
                    Console.WriteLine("Copying data to table.");
                    // The Async engines are IEnumerable
                    foreach (rawdata aqtext in engine)
                    {
                        //create a new update row for aq360productsraw table
                        DataRow rawdata_update_row = temp_rawdata_table.NewRow();
                        rawdata_update_row["Model Number"] = aqtext.Model_Number.Trim();
                        rawdata_update_row["User Stock Model Number"] = aqtext.User_Stock_Model_Number.Trim();
                        rawdata_update_row["Vendor Number"] = aqtext.Vendor_Number.Trim();
                        rawdata_update_row["Vendor Name"] = aqtext.Vendor_Name.Trim();
                        rawdata_update_row["Specification"] = aqtext.Specification.Trim();
                        rawdata_update_row["Vendor Pack"] = aqtext.Vendor_Pack.Trim();
                        rawdata_update_row["Selling Unit"] = aqtext.Selling_Unit.Trim();
                        rawdata_update_row["Weight"] = aqtext.Weight.Trim();
                        rawdata_update_row["Cube"] = aqtext.Cube.Trim();
                        rawdata_update_row["List Price"] = aqtext.List_Price.Trim();
                        rawdata_update_row["Net Price"] = aqtext.Net_Price.Trim();
                        rawdata_update_row["Height"] = aqtext.Height.Trim();
                        rawdata_update_row["Width"] = aqtext.Width.Trim();
                        rawdata_update_row["Depth"] = aqtext.Depth.Trim();
                        rawdata_update_row["Deal Net"] = aqtext.Deal_Net.Trim();
                        rawdata_update_row["Picture Name"] = aqtext.Picture_Name.Trim();
                        rawdata_update_row["Blank Column"] = aqtext.Blank_Column.Trim();
                        rawdata_update_row["Vendor to Stock"] = aqtext.Vendor_to_Stock.Trim();
                        rawdata_update_row["Priced By"] = aqtext.Priced_By.Trim();
                        rawdata_update_row["Category"] = aqtext.Category.Trim();
                        rawdata_update_row["Vendor Nickname"] = aqtext.Vendor_Nickname.Trim();
                        rawdata_update_row["User Vendor Name"] = aqtext.User_Vendor_Name.Trim();
                        rawdata_update_row["Configurable?"] = aqtext.Configurable.Trim();
                        rawdata_update_row["Category Values"] = aqtext.Category_Values.Trim();
                        rawdata_update_row["Freight Class"] = aqtext.Freight_Class.Trim();
                        rawdata_update_row["Vendor FOB"] = aqtext.Vendor_FOB.Trim();
                        rawdata_update_row["Ship from Zip"] = aqtext.Ship_from_Zip.Trim();
                        rawdata_update_row["Model Apply"] = aqtext.Model_Apply.Trim();
                        rawdata_update_row["Picture Link"] = aqtext.Picture_Link.Trim();
                        rawdata_update_row["Category Code"] = aqtext.Category_Code.Trim();
                        rawdata_update_row["Vendor Short Name"] = aqtext.Vendor_Short_Name.Trim();
                        rawdata_update_row["Cutsheet Name"] = aqtext.Cutsheet_Name.Trim();
                        rawdata_update_row["Cutsheet Link"] = aqtext.Cutsheet_Link.Trim();
                        rawdata_update_row["Product ID"] = aqtext.Product_ID.Trim();
                        rawdata_update_row["Vendor ID"] = aqtext.Vendor_ID.Trim();

                        temp_rawdata_table.Rows.Add(rawdata_update_row);
                        batchsize += 1;
                        if (batchsize == 100000)
                        {
                            bulkcopy.WriteToServer(temp_rawdata_table);
                            temp_rawdata_table.Rows.Clear();
                            batchsize = 0;
                            Console.WriteLine("Flushing 100,000 rows");
                        }
                        rows += 1;
                        Console.WriteLine(rows.ToString() + "    " + aqtext.Model_Number.Trim() + Environment.NewLine);
                    }

                    bulkcopy.WriteToServer(temp_rawdata_table);
                    temp_rawdata_table.Rows.Clear();
                    engine.Close();
                }
                elapsed.Stop();
                Console.WriteLine((rows + " records imported in " +  elapsed.Elapsed.TotalSeconds + " seconds."));
            }
        }
    }

使用Filehelpers将大型CSV文件导入SQL2014表

这个问题,正如@MarcosMeli也提到的,是这是一个无效的CSV文件。而且不仅仅是这一个领域。即使是你认为行得通的那排也不是真的行得通。无论谁创建了这个CSV文件,似乎都在哪些字段应该是文本限定的方面做了反向操作。"引用"),哪些不需要它。它们有文本限定的数字字段和非文本限定的文本字段。

第1行工作的原因是文本限定查看字段的第一个和最后一个字符。在第1行中,转义引号(即双引号)不是第一个字符,因此我怀疑它被导入为重复的双引号。然而,在第2行中,该字段的开头文本被引用,因此第一个字符是引号,然后通过重复双引号进行转义。它做得非常草率,即使现在让FileHelpers使用它,也不能给它继续正常工作的信心,特别是如果非文本限定的文本字段中曾经有逗号。在这种情况下,它会再次引起意想不到的变化。我知道您说CSV文件来自外部源,您无法控制它,但您确实需要尝试修复它,因为它是完全错误的。这是一个错误,在任何系统产生它,它需要修复。

目前您可以将所有文本字段设置为非文本限定的。但是,您可能需要添加一个步骤,将所有双引号替换为单个双引号。


除了数据格式问题,没有从FileHelpers拿走任何东西,因为它看起来像一个有趣和有用的库,我要说的是,你不需要 FileHelpers来读取文本文件,逐行(最小的内存占用)并将其批处理到SQL Server中。事实上,你可以做所有这些加上:
  • 跳过使用单独的staging表(即[rawdata])的步骤,而是直接将行发送到同步存储过程
  • 在应用层进行基本的数据类型验证,并发送强类型的数据行(而不是在所有VARCHAR/NVARCHAR字段中传递)。

如何?通过使用表值参数,通过使用IEnumerable<SqlDataRecord>方法传递它(而不是 DataTable方法)。我在这里的几个答案中详细介绍了该技术:

  • 如何在最短的时间内插入1000万条记录?
  • 通过Dictionary<字符串,int>到存储过程T-SQL
  • 如何将一个有1亿个数据的大表拆分为多个表?

问题是CSV无效,只有当字段被引用时才能转义引号

的值:

,""U"" channel for connecting two 29"" A Depth griddles,
要正确解析

,必须是

,"""U"" channel for connecting two 29"" A Depth griddles",

如何从规格字段中删除fieldquotes ?

  public string Specification;