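CSV parser: handling double quotes with OLEDB
Keywords: OLEDB CSV | Updated: 2023-09-27 18:11:53
How can I use OLEDB to parse and import a CSV file in which every cell is enclosed in double quotes, because some of the fields contain commas? I can't change the format, since the file comes from a vendor.
I'm trying the following, but it fails with an IO error: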
public DataTable ConvertToDataTable(string fileToImport, string fileDestination)
{
    // Path.Combine inserts the "\" separator instead of building it by hand
    string fullImportPath = Path.Combine(fileDestination, fileToImport);
    DataTable dTable = null;
    try
    {
        if (!File.Exists(fullImportPath))
            return null;
        string full = Path.GetFullPath(fullImportPath);
        string file = Path.GetFileName(full);
        string dir = Path.GetDirectoryName(full);
        // Create the "database" connection string: the folder is the database,
        // and each text file in it is a table
        string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
            + "Data Source=\"" + dir + "\\\";"
            + "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
        // Create the database query
        string query = "SELECT * FROM " + file;
        // Create a DataTable to hold the query results
        dTable = new DataTable();
        // Create an OleDbDataAdapter to execute the query and fill the DataTable;
        // "using" disposes it just as the original finally block did
        using (OleDbDataAdapter dAdapter = new OleDbDataAdapter(query, connString))
        {
            dAdapter.Fill(dTable);
        }
    }
    catch (Exception ex)
    {
        // CLASS_NAME is a constant defined elsewhere in the class;
        // passing ex as the inner exception keeps the original IO error and stack trace
        throw new Exception(CLASS_NAME + ".ConvertToDataTable: Caught Exception", ex);
    }
    return dTable;
}
It works fine when I use a normal CSV. Do I need to change something in connString?
Use a dedicated CSV parser.
There are plenty out there. A popular one is FileHelpers, though there is also one hidden in the Microsoft.VisualBasic.FileIO namespace: TextFieldParser.
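As a minimal sketch of the TextFieldParser route, reading a file in which every cell is quoted could look like the following. The `QuotedCsvReader.ReadAll` name is made up for illustration; `TextFieldType`, `SetDelimiters`, `HasFieldsEnclosedInQuotes`, `EndOfData` and `ReadFields` are real members of `Microsoft.VisualBasic.FileIO.TextFieldParser`. It assumes the project can reference Microsoft.VisualBasic (on .NET Core 3.0 and later it ships with the shared framework).

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvReader
{
    // Hypothetical helper: reads every record from a CSV source whose fields
    // may be wrapped in double quotes and may contain commas.
    public static List<string[]> ReadAll(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat "..." as a single field, so commas inside quotes stay part of the data.
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For a file on disk, pass a `StreamReader` instead of a `StringReader`, or use the `TextFieldParser(string path)` constructor directly.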
Take a look at FileHelpers.
You can use this code (MS Office required):
// Requires: System.Configuration, System.Data, System.Data.Odbc, System.Data.SqlClient
private void ConvertCSVtoExcel(string filePath = @"E:\nucc_taxonomy_140.csv", string tableName = "TempTaxonomyCodes")
{
    string tempPath = System.IO.Path.GetDirectoryName(filePath);
    // The ODBC text driver treats the folder as the database and each file as a table
    string strConn = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + tempPath + @"\;Extensions=asc,csv,tab,txt";
    DataTable dt = new DataTable();
    using (OdbcConnection conn = new OdbcConnection(strConn))
    using (OdbcDataAdapter da = new OdbcDataAdapter("Select * from " + System.IO.Path.GetFileName(filePath), conn))
    {
        da.Fill(dt);
    }
    // Bulk-insert the rows into SQL Server (despite the method name, no Excel file is produced);
    // ConfigurationManager replaces the obsolete ConfigurationSettings
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(ConfigurationManager.AppSettings["dbConnectionString"]))
    {
        bulkCopy.DestinationTableName = tableName;
        bulkCopy.BatchSize = 50;
        bulkCopy.WriteToServer(dt);
    }
}
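There are a lot of issues to consider when working with CSV files. However you pull the data out of the file, you need to know how the parsing should be handled. Some classes will get you part of the way, but most of them cannot handle the nuances of Excel-style CSV: embedded commas, quotes and line breaks. On the other hand, if you only want to parse a .txt file laid out like a CSV, loading Excel or the Microsoft classes seems like a lot of overhead.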
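To make the line-break issue concrete, here is a hedged sketch of the first step any hand-rolled parser needs: splitting raw text into records while ignoring line breaks that fall inside double quotes. The `CsvRecordSplitter` name is hypothetical, and the code is deliberately minimal: it leaves quotes in place for a row-level parser to handle and drops blank lines.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvRecordSplitter
{
    // Splits raw CSV text into records. A line break ends a record only when it
    // is outside double quotes; fields are not split here.
    public static List<string> SplitRecords(string text)
    {
        var records = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '"')
            {
                // An escaped "" toggles the state twice, leaving it unchanged.
                inQuotes = !inQuotes;
                current.Append(c);
            }
            else if ((c == '\n' || c == '\r') && !inQuotes)
            {
                // Treat \r\n as a single break.
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
                // Skip empty lines.
                if (current.Length > 0) records.Add(current.ToString());
                current.Clear();
            }
            else
            {
                // Ordinary character, or a line break inside a quoted field.
                current.Append(c);
            }
        }
        if (current.Length > 0) records.Add(current.ToString());
        return records;
    }
}
```

Each returned record can then go to a row-level parser, which no longer has to worry about quoted line breaks.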
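One thing you could consider is doing the parsing yourself with a regex. That also makes your code more platform-independent in case you ever need to port it to another server or application, since almost every language has regex support. There are some good regex patterns out there for the CSV problem; below is my attempt, which covers embedded commas, quotes and line breaks. The regex code/pattern and an explanation:
http://www.kimgentes.com/worshiptech-web-tools-page/2008/10/14/regex-pattern-for-parsing-csv-files-with-embedded-commas-dou.html
Try the code from my answer to "Reading a CSV file in C#":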
// Requires: using System.Text.RegularExpressions;
private static string[] Mubashir_CSVParser(string s)
{
    // Extract the fields: split on commas followed by an even number of quotes,
    // i.e. commas that are outside quoted fields
    Regex RegexCSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
    String[] Fields = RegexCSVParser.Split(s);
    // Clean up the fields (remove quotes and leading spaces)
    for (int i = 0; i < Fields.Length; i++)
    {
        Fields[i] = Fields[i].TrimStart(' ', '"');
        Fields[i] = Fields[i].TrimEnd('"'); // this line removes the quotes
        //Fields[i] = Fields[i].Trim();
    }
    return Fields;
}
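The cleanup loop above trims every leading and trailing quote character, so a field such as `"d ""x"""` loses its escaped quotes. Below is a sketch of a variant that reuses the same lookahead pattern but strips exactly one pair of enclosing quotes and then turns `""` back into `"`. The `RegexCsvSplitter` name is hypothetical, and like the original it works on one line at a time.

```csharp
using System.Text.RegularExpressions;

public static class RegexCsvSplitter
{
    // Matches a comma only when the rest of the line contains an even number
    // of quote characters, i.e. when the comma sits outside a quoted field.
    private static readonly Regex OutsideQuotesComma =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    public static string[] Split(string line)
    {
        string[] fields = OutsideQuotesComma.Split(line);
        for (int i = 0; i < fields.Length; i++)
        {
            string f = fields[i].Trim();
            // Remove one pair of enclosing quotes, then unescape "" -> ".
            if (f.Length >= 2 && f[0] == '"' && f[f.Length - 1] == '"')
            {
                f = f.Substring(1, f.Length - 2).Replace("\"\"", "\"");
            }
            fields[i] = f;
        }
        return fields;
    }
}
```

Making the `Regex` a static readonly field avoids rebuilding the pattern for every line.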
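Just in case someone has a similar problem, I wanted to post the code I used. I ended up using TextFieldParser to read the file and parse the columns, but I used recursion and substrings for the rest.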
/// <summary>
/// Parses each string passed as a "row".
/// This routine currently accounts for both double quotes
/// and commas, but can be extended.
/// </summary>
/// <param name="row">string or row to be parsed</param>
/// <returns>the row's fields, in order</returns>
private List<String> ParseRowToList(String row)
{
    List<String> returnValue = new List<String>();
    if (row.Length == 0)
    {// Empty trailing column (the row ended with a comma)
        returnValue.Add(String.Empty);
    }
    else if (row[0] == '"')
    {// Quoted string: it ends at the next "," sequence
        if (row.IndexOf("\",") > -1)
        {// There are more columns
            returnValue = ParseRowToList(row.Substring(row.IndexOf("\",") + 2));
            returnValue.Insert(0, row.Substring(1, row.IndexOf("\",") - 1));
        }
        else
        {// This is the last column: drop the enclosing quotes
            returnValue.Add(row.Substring(1, row.Length - 2));
        }
    }
    else
    {// Unquoted string
        if (row.IndexOf(",") > -1)
        {// There are more columns
            returnValue = ParseRowToList(row.Substring(row.IndexOf(",") + 1));
            returnValue.Insert(0, row.Substring(0, row.IndexOf(",")));
        }
        else
        {// This is the last column
            returnValue.Add(row);
        }
    }
    return returnValue;
}
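The recursive version above cuts a quoted field at the first `",` it finds and does not unescape doubled quotes (`""`). As an alternative technique, not the poster's, a single left-to-right state machine handles both without recursion. The `CsvRowParser.ParseRow` name is hypothetical; this is a sketch for one record, not a full RFC 4180 parser.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvRowParser
{
    // Splits one CSV record into fields in a single pass.
    // Quoted fields may contain commas; "" inside quotes becomes ".
    public static List<string> ParseRow(string row)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < row.Length; i++)
        {
            char c = row[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < row.Length && row[i + 1] == '"')
                    {
                        current.Append('"'); // escaped quote
                        i++;
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // any character, including ','
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());      // last field (possibly empty)
        return fields;
    }
}
```

Because each character is visited once, the cost is linear in the row length, whereas the recursive version creates a new substring for every column.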
Then the TextFieldParser code is:
// string pathFile = @"C:\TestFTP\TestCatalog.txt";
string pathFile = @"C:\TestFTP\SomeFile.csv";
DataTable dtable = new DataTable();
/* Set up the TextFieldParser
 * with the file path and the
 * delimiter used in the file */
using (TextFieldParser fieldParser = new TextFieldParser(pathFile))
{
    /* Fields and/or column names in the file are enclosed in quotes */
    fieldParser.HasFieldsEnclosedInQuotes = true;
    /* Delimiter to use */
    fieldParser.SetDelimiters(new string[] { "," });
    // Build the full table to be imported (BuildDataTable is the poster's own helper)
    dtable = BuildDataTable(fieldParser, dtable);
}
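The snippet calls a `BuildDataTable` helper that the post does not show. One possible shape for it is sketched below. It is purely hypothetical, not the poster's code: the `TextFieldParserLoader` class, the string-typed columns and the generated names `Column1`, `Column2`, ... are all assumptions.

```csharp
using System.Data;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserLoader
{
    // Hypothetical stand-in for BuildDataTable: reads all remaining records from
    // an already configured parser into the table, adding string columns
    // (Column1, Column2, ...) whenever a wider record appears.
    public static DataTable BuildDataTable(TextFieldParser parser, DataTable table)
    {
        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields();
            while (table.Columns.Count < fields.Length)
            {
                table.Columns.Add("Column" + (table.Columns.Count + 1), typeof(string));
            }
            DataRow row = table.NewRow();
            for (int i = 0; i < fields.Length; i++)
            {
                row[i] = fields[i];
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

If the first record holds column names, read it once with `ReadFields()` before the loop and use those names instead of the generated ones.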
This is what I used in a project to parse a single line of data.
private string[] csvParser(string csv, char separator = ',')
{
    List<string> parsed = new List<string>();
    string[] temp = csv.Split(separator);
    int counter = 0;
    while (counter < temp.Length)
    {
        string data = temp[counter].Trim();
        // A token that opens a quote without closing it was cut at an embedded
        // separator: glue the following tokens back on until the quote closes.
        // Note that the returned fields keep their enclosing quotes.
        if (data.StartsWith("\"") && !(data.Length > 1 && data.EndsWith("\"")))
        {
            bool isLast = false;
            while (!isLast && counter + 1 < temp.Length)
            {
                counter++;
                data += separator.ToString() + temp[counter];
                isLast = temp[counter].Trim().EndsWith("\"");
            }
        }
        parsed.Add(data);
        counter++;
    }
    return parsed.ToArray();
}
http://zamirsblog.blogspot.com/2013/09/c-csv-parser-csvparser.html