Processing a text file with inconsistent fields
Keywords: text, file, inconsistent, fields, processing | Updated: 2023-09-27 18:13:39
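A vendor supplies a delimited text file, but the file can be, and most likely will be, customized for each customer. So if the specification defines 100 fields, I may receive only 10 of them.
My concern is the overhead of each loop. In total I use one while loop and two for loops, and that is only for the header file; the detail file will need at least as many.
My code is as follows: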
using (StreamReader sr = new StreamReader(flName))
{
    // Process first line to get field names
    flHeader = sr.ReadLine().Split(charDelimiters);
    // Check first field to determine header or detail file
    if (flHeader[0].ToUpper() == "ORDERID")
    {
        header = true;
    }
    else if (flHeader[0].ToUpper() == "ORDERITEMID")
    {
        detail = true;
    }
}
// Use TextFieldParser to read and parse files
using (TextFieldParser parser = new TextFieldParser(flName))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(delimiters);
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields();
        // Send read line to header or detail processor
        // (the comparison with flHeader[0] skips the header row itself)
        if (header == true)
        {
            if (flHeader[0] != fields[0])
            {
                ProcessHeader(fields);
            }
        }
        if (detail == true)
        {
            if (flHeader[0] != fields[0])
            {
                ProcessDetail(fields);
            }
        }
    }
}
// Header processor snippet
// Declare header class
Data.BLL.OrderExportHeader_BLL OrderHeaderBLL = new Data.BLL.OrderExportHeader_BLL();
foreach (string field in fields)
{
    int fldCnt = fields.Count();
    // Loop through each field, then use the switch to determine which field is to be filled in
    for (int flds = 0; flds < fldCnt; flds++)
    {
        string strField = field.Trim();
        switch (flHeader[flds].ToUpper())
        {
            case "ORDERID":
                OrderHeaderBLL.OrderID = strField;
                break;
        }
    }
}
Header file:
OrderID ManufacturerID CustomerID SalesRepID PONumber OrderDate CustomerName CustomerNumber RepNumber Discount Terms ShipVia Notes ShipToCompanyName ShipToContactName ShipToContactPhone ShipToFax ShipToContactEmail ShipToAddress1 ShipToAddress2 ShipToCity ShipToState ShipToZip ShipToCountry ShipDate BillingAddress1 BillingAddress2 BillingCity BillingState BillingZip BillingCountry FreightTerm PriceLevel OrderType OrderStatus IsPlaced ContactName ContactPhone ContactEmail ContactFax Exported ExportDate Source ContainerName ContainerCubes Origin MarketName FOB SubTotal OrderTotal TaxRate TaxTotal ShippingTotal IsDeleted IsContainer OrderGUID CancelDate DoNotShipBefore WrittenByName WrittenForName WrittenForRepNumber CatalogCode CatalogName ShipToCode
491975 18 0 2621 1234 7/17/2014 RepZio 2499174 0 Test 561-351-7416 max@repzio.com 465 Ocean Ridge Way Juno Beach FL 33408 7/18/2014 465 Ocean Ridge Way Juno Beach FL 33408 USA 0 ShopZio True Max Fraser 561-351-7416 max@repzio.com False ShopZio 0.00 ShopZio 1500.0000 1500.0000 0.000 0.0000 0.0000 False False 63960a7b-86b7-47a2-ad11-9763a6b52fd0 7/31/2014 7/18/2014
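Your sample data is the key. The sample as posted is ambiguous, but I think it fits the description below.
You receive 10 fields out of a possible 100.
When parsing each line, you only need to split it into 10 fields. Your data looks space-delimited, but then you have a problem: a field may contain embedded whitespace. Perhaps your data is actually tab-delimited, in which case you are fine.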
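To illustrate the point about embedded whitespace, here is a minimal sketch that splits on tabs only. It assumes the vendor file really is tab-delimited, which the question does not confirm; `DelimitedLine` and `SplitTabs` are hypothetical names used only for this example.

```csharp
using System;

public static class DelimitedLine
{
    // Split on the tab character only, so a value such as "Juno Beach"
    // stays in a single field. Empty entries are kept on purpose:
    // a blank field still occupies its column position.
    public static string[] SplitTabs(string line)
    {
        return line.Split('\t');
    }
}
```

Splitting the same line on spaces would break "465 Ocean Ridge Way" into four pieces and shift every later column, which is why the delimiter needs to be settled before anything else.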
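For simplicity, I will assume your 100 field names are "fld0", "fld1", ..., "fld99".
Now suppose the file you receive contains this header:
fld10, fld50, fld0, fld20, fld80, fld70, fld0, fld90, fld50, fld60
and a data row that looks like:
Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliet
After splitting the row, split[0] = "Alpha", split[1] = "Bravo", and so on.
You parse the header once and find that the indexes into the 100-field master list are 10, 50, 0, and so on. You use those index values to build a lookupFld array, such that lookupFld[0] = 10, lookupFld[1] = 50, etc.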
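Building the lookup array might look like the sketch below. The class and method names (`FieldLookup`, `IndexMasterList`, `Build`) are hypothetical, and mapping an unrecognized column name to -1 is an assumption of this example rather than anything the vendor specification states.

```csharp
using System;
using System.Collections.Generic;

public static class FieldLookup
{
    // Map each master field name to its position in the master list.
    // This is built once and reused for every file.
    public static Dictionary<string, int> IndexMasterList(string[] masterList)
    {
        var index = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < masterList.Length; i++)
        {
            index[masterList[i]] = i;
        }
        return index;
    }

    // For each column of the received header, record its master-list index.
    // Assumption: unknown column names map to -1 (you may prefer to throw).
    public static int[] Build(string[] header, Dictionary<string, int> masterIndex)
    {
        var lookupFld = new int[header.Length];
        for (int i = 0; i < header.Length; i++)
        {
            int pos;
            lookupFld[i] = masterIndex.TryGetValue(header[i].Trim(), out pos) ? pos : -1;
        }
        return lookupFld;
    }
}
```

The dictionary makes each header column an average constant-time lookup, so resolving the header costs one pass over the columns actually received, not one pass over all 100 possible names per column.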
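Now, as you process each line, you split it into 10 fields and can immediately index into the master field list to find the field each value belongs to.
MasterList[0] = "fld0", MasterList[1] = "fld1", ..., MasterList[99] = "fld99"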
for (int ii = 0; ii < lookupFld.Length; ++ii)
{
    // MasterList[lookupFld[ii]] is represented by split[ii]
    // when ii = 0
    // lookupFld[0] is 10
    // so MasterList[10] /* fld10 */ is represented by split[0] /* Alpha */
}
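The loop above can be fleshed out as the following sketch, which places each value of a split row into a master-sized array. `RowMapper` and `MapRow` are hypothetical names, and storing values in a plain string array (rather than on a class such as OrderExportHeader_BLL) is a simplification made for this example.

```csharp
using System;

public static class RowMapper
{
    // Place each value of a split data row into the slot of a master-sized
    // array given by lookupFld. Slots for fields the file does not contain
    // stay null. Entries of -1 (unknown columns) are skipped.
    public static string[] MapRow(string[] split, int[] lookupFld, int masterCount)
    {
        var record = new string[masterCount];
        // Guard against rows that are shorter or longer than the header.
        int n = Math.Min(split.Length, lookupFld.Length);
        for (int ii = 0; ii < n; ii++)
        {
            int target = lookupFld[ii];
            if (target >= 0 && target < masterCount)
            {
                record[target] = split[ii].Trim();
            }
        }
        return record;
    }
}
```

Each row then costs a single pass over the fields it actually contains, with no nested loop and no per-field switch on the column name. Note that if a column name repeats in the header, as fld0 and fld50 do in the example header, the later value overwrites the earlier one in this sketch.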