CSV parser to parse double quotes via OLEDB

Keywords: OLEDB, CSV | Updated: 2023-09-27 18:11:53

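How can I use OLEDB to parse and import a CSV file in which every cell is wrapped in double quotes, because some rows contain commas? I can't change the format, since it comes from a vendor.

I am trying the following, but it fails with an IO error: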

public DataTable ConvertToDataTable(string fileToImport, string fileDestination)
{
    string fullImportPath = fileDestination + @"\" + fileToImport;
    OleDbDataAdapter dAdapter = null;
    DataTable dTable = null;
    try
    {
        if (!File.Exists(fullImportPath))
            return null;
        string full = Path.GetFullPath(fullImportPath);
        string file = Path.GetFileName(full);
        string dir = Path.GetDirectoryName(full);

        //create the "database" connection string
        string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
          + "Data Source=\"" + dir + "\\\";"
          + "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
        //create the database query
        string query = "SELECT * FROM " + file;
        //create a DataTable to hold the query results
        dTable = new DataTable();
        //create an OleDbDataAdapter to execute the query
        dAdapter = new OleDbDataAdapter(query, connString);

        //fill the DataTable
        dAdapter.Fill(dTable);
    }
    catch (Exception ex)
    {
        throw new Exception(CLASS_NAME + ".ConvertToDataTable: Caught Exception: " + ex);
    }
    finally
    {
        if (dAdapter != null)
            dAdapter.Dispose();
    }
    return dTable;
}

It works fine when I use a normal CSV. Do I need to change something in the connString?


Use a dedicated CSV parser.

There are many out there. A popular one is FileHelpers, and there is also one tucked away in the Microsoft.VisualBasic.FileIO namespace: TextFieldParser.
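
As a rough illustration of the TextFieldParser suggestion, here is a minimal sketch that reads quoted, comma-delimited text into rows. The class and method names (`QuotedCsvReader`, `ReadRows`) are made up for this example, and it assumes the project references the Microsoft.VisualBasic assembly.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvReader
{
    // Hypothetical helper (not from the thread): reads every row of
    // comma-delimited text, treating a double-quoted field as one value.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Commas inside "..." stay part of the field; the quotes are stripped.
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For a file on disk, something like `QuotedCsvReader.ReadRows(new StreamReader(path))` should work; the rows could then be copied into a DataTable if that is what the rest of the code expects.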

Take a look at FileHelpers.

You can use this code (MS Office required):

  private void ConvertCSVtoExcel(string filePath = @"E:\nucc_taxonomy_140.csv", string tableName = "TempTaxonomyCodes")
    {
        string tempPath = System.IO.Path.GetDirectoryName(filePath);
        string strConn = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + tempPath + @"\;Extensions=asc,csv,tab,txt";
        OdbcConnection conn = new OdbcConnection(strConn);
        OdbcDataAdapter da = new OdbcDataAdapter("Select * from " + System.IO.Path.GetFileName(filePath), conn);
        DataTable dt = new DataTable();
        da.Fill(dt);
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(ConfigurationSettings.AppSettings["dbConnectionString"]))
        {
            bulkCopy.DestinationTableName = tableName;
            bulkCopy.BatchSize = 50;
            bulkCopy.WriteToServer(dt);
        }
    }

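There is a lot to consider when working with CSV files. However you pull the records out of the file, you should know how the parsing is being handled. Some classes get you part of the way, but most do not handle the nuances that Excel does with embedded commas, quotes, and line breaks. On the other hand, loading Excel or the MS classes seems like a lot of overhead if you just want to parse a CSV-like txt file.

One thing you could consider is doing the parsing with your own Regex, which also makes your code a bit more platform independent in case you need to port it to another server or application at some point. The advantage of regex is that it is available in nearly every language. That said, there are some good regex patterns that handle the CSV puzzle. Here is my attempt, which covers embedded commas, quotes, and line breaks. Regex code/pattern and explanation: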

http://www.kimgentes.com/worshiptech-web-tools-page/2008/10/14/regex-pattern-for-parsing-csv-files-with-embedded-commas-dou.html
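
The cases listed in that answer (embedded commas, quotes, and line breaks) can also be handled without a regex by walking the record one character at a time. The following is only a sketch: `CsvLineSplitter` and `SplitRecord` are hypothetical names, and it assumes the common convention that a literal quote inside a quoted field is written as two quotes.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLineSplitter
{
    // Hypothetical helper (not from the thread): splits one CSV record into fields.
    public static List<string> SplitRecord(string record, char separator = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < record.Length; i++)
        {
            char c = record[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    // Assumption: a doubled quote inside a quoted field is a literal quote.
                    if (i + 1 < record.Length && record[i + 1] == '"')
                    {
                        current.Append('"');
                        i++;
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    current.Append(c); // separators and line breaks are kept as data
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == separator)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString()); // last field has no trailing separator
        return fields;
    }
}
```

Because line breaks inside quotes are kept as data, the caller has to pass in a complete record, which may span more than one physical line of the file.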

Try the code from my answer:

Reading CSV files in C#

 private static string[] Mubashir_CSVParser(string s)
        {
            // extract the fields: split on commas that are followed by an even
            // number of quotes, i.e. commas that sit outside a quoted field
            Regex RegexCSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
            String[] Fields = RegexCSVParser.Split(s);
            // clean up the fields (remove " and leading spaces)
            for (int i = 0; i < Fields.Length; i++)
            {
                Fields[i] = Fields[i].TrimStart(' ', '"');
                Fields[i] = Fields[i].TrimEnd('"');// this line removes the quotes
                //Fields[i] = Fields[i].Trim();
            }
            return Fields;
        }

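Just in case anyone has a similar problem, I wanted to post the code I used. I ended up using TextFieldParser to read the file and parse out the columns, but I use recursion and substrings to do the rest of the work.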

 /// <summary>
        /// Parses each string passed as a "row".
        /// This routine accounts for both double quotes
        /// as well as commas currently, but can be added to
        /// </summary>
        /// <param name="row"> string or row to be parsed</param>
        /// <returns></returns>
        private List<String> ParseRowToList(String row)
        {
            List<String> returnValue = new List<String>();
            if (row[0] == '\"')
            {// Quoted String
                if (row.IndexOf("\",") > -1)
                {// There are more columns
                    returnValue = ParseRowToList(row.Substring(row.IndexOf("\",") + 2));
                    returnValue.Insert(0, row.Substring(1, row.IndexOf("\",") - 1));
                }
                else
                {// This is the last column
                    returnValue.Add(row.Substring(1, row.Length - 2));
                }
            }
            else
            {// Unquoted String
                if (row.IndexOf(",") > -1)
                {// There are more columns
                    returnValue = ParseRowToList(row.Substring(row.IndexOf(",") + 1));
                    returnValue.Insert(0, row.Substring(0, row.IndexOf(",")));
                }
                else
                {// This is the last column
                    returnValue.Add(row.Substring(0, row.Length));
                }
            }
            return returnValue;
        }
Then the code for the TextFieldParser is:
 // string pathFile = @"C:\TestFTP\TestCatalog.txt";
            string pathFile = @"C:\TestFTP\SomeFile.csv";
            List<String> stringList = new List<String>();
            TextFieldParser fieldParser = null;
            DataTable dtable = new DataTable();
            /* Set up TextFieldParser
                *  use the correct delimiter provided
                *  and path */
            fieldParser = new TextFieldParser(pathFile);
            /* Set that there are quotes in the file for fields and or column names */
            fieldParser.HasFieldsEnclosedInQuotes = true;
            /* delimiter by default to be used first */
            fieldParser.SetDelimiters(new string[] { "," });
            // Build Full table to be imported
            dtable = BuildDataTable(fieldParser, dtable);
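
The answer calls `BuildDataTable` but does not show it. Purely as an assumption about what such a helper might look like, here is a sketch that lets TextFieldParser split each row with `ReadFields` instead of the recursive `ParseRowToList` above. The class name `DataTableBuilder` and the generated column names are invented for this example, and the original implementation may well differ.

```csharp
using System.Data;
using Microsoft.VisualBasic.FileIO;

public static class DataTableBuilder
{
    // Assumed stand-in for the BuildDataTable helper that the answer calls
    // but does not show; this is a guess, not the author's code.
    public static DataTable BuildDataTable(TextFieldParser parser, DataTable table)
    {
        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields();
            // Grow the column set to fit the widest row seen so far.
            while (table.Columns.Count < fields.Length)
            {
                table.Columns.Add("Column" + (table.Columns.Count + 1), typeof(string));
            }
            DataRow row = table.NewRow();
            for (int i = 0; i < fields.Length; i++)
            {
                row[i] = fields[i];
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

Every column is typed as string here; a row with fewer fields than the table has columns leaves the remaining cells as DBNull.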

This is what I use in a project; it parses a single line of data.

    private string[] csvParser(string csv, char separator = ',')
    {
        List<string> parsed = new List<string>();
        string[] temp = csv.Split(separator);
        int counter = 0;
        while (counter < temp.Length)
        {
            string data = temp[counter].Trim();
            // A piece that opens with a quote but does not close it was split
            // on an embedded separator, so keep appending the following pieces
            // until the closing quote is found or the input runs out.
            bool closed = data.Length > 1 && data.EndsWith("\"");
            if (data.StartsWith("\"") && !closed)
            {
                while (counter + 1 < temp.Length)
                {
                    counter++;
                    data += separator.ToString() + temp[counter];
                    if (temp[counter].Trim().EndsWith("\""))
                        break;
                }
            }
            parsed.Add(data); // surrounding quotes are kept in the value
            counter++;
        }
        return parsed.ToArray();
    }
http://zamirsblog.blogspot.com/2013/09/c-csv-parser-csvparser.html