Processing a text file with inconsistent fields

Keywords: text, file, inconsistent, fields, processing | Updated: 2023-09-27 18:13:39

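A vendor supplies a delimited text file, but the file can be, and most likely is, customized for each customer. So if the specification defines 100 fields, I may receive only 10 of them.

My concern is the overhead of each loop. In total I am using one while loop and two for loops just for the header file, and the detail file will need at least as many.

My current solution is as follows: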

        using (StreamReader sr = new StreamReader(flName))
        {
            //Process first line to get field names
            flHeader = sr.ReadLine().Split(charDelimiters);
            //Check first field to determine header or detail file
            if (flHeader[0].ToUpper() == "ORDERID")
            {
                header = true;
            } else if (flHeader[0].ToUpper() == "ORDERITEMID"){
                detail = true;
            }
        }
        //Use TextFieldParser to read and parse files
        using (TextFieldParser parser = new TextFieldParser(flName))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiters);
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                //Send read line to header or detail processor
                if (header == true)
                {
                    if (flHeader[0] != fields[0])
                    {
                        ProcessHeader(fields);
                    }
                }
                if (detail == true)
                {
                    if (flHeader[0] != fields[0])
                    {
                        ProcessDetail(fields);
                    }
                }
            }
        }

//Header processor code segment

        //Declare header class
        Data.BLL.OrderExportHeader_BLL OrderHeaderBLL = new Data.BLL.OrderExportHeader_BLL();
        foreach (string field in fields)
        {
            int fldCnt = fields.Count();
            //Loop through each field then use the switch to determine which field is to be filled in
            for (int flds = 0; flds < fldCnt; flds++ )
            {
                string strField = field.Trim();
                switch (flHeader[flds].ToUpper())
                {
                    case "ORDERID":
                        OrderHeaderBLL.OrderID = strField;
                        break;
                 }
             }
          }

//Header file
OrderID ManufacturerID  CustomerID  SalesRepID  PONumber    OrderDate   CustomerName    CustomerNumber  RepNumber   Discount    Terms   ShipVia Notes   ShipToCompanyName   ShipToContactName   ShipToContactPhone  ShipToFax   ShipToContactEmail  ShipToAddress1  ShipToAddress2  ShipToCity  ShipToState ShipToZip   ShipToCountry   ShipDate    BillingAddress1 BillingAddress2 BillingCity BillingState    BillingZip  BillingCountry  FreightTerm PriceLevel  OrderType   OrderStatus IsPlaced    ContactName ContactPhone    ContactEmail    ContactFax  Exported    ExportDate  Source  ContainerName   ContainerCubes  Origin  MarketName  FOB SubTotal    OrderTotal  TaxRate TaxTotal    ShippingTotal   IsDeleted   IsContainer OrderGUID   CancelDate  DoNotShipBefore WrittenByName   WrittenForName  WrittenForRepNumber CatalogCode CatalogName ShipToCode
491975  18  0   2621    1234    7/17/2014   RepZio  2499174     0           Test            561-351-7416        max@repzio.com  465 Ocean Ridge Way     Juno Beach  FL  33408       7/18/2014   465 Ocean Ridge Way     Juno Beach  FL  33408   USA     0       ShopZio True    Max Fraser  561-351-7416    max@repzio.com      False       ShopZio     0.00        ShopZio     1500.0000   1500.0000   0.000   0.0000  0.0000  False   False   63960a7b-86b7-47a2-ad11-9763a6b52fd0    7/31/2014   7/18/2014                       

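Your sample data is the key here. The sample is currently ambiguous, but I think it fits the description below.

Picking 10 fields out of 100.

When parsing each line, you only need to split it into 10 fields. Your data looks whitespace-delimited, which is a problem because a field can contain embedded whitespace. Perhaps your data is actually tab-delimited, in which case you are fine.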
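As a minimal sketch of the tab-delimited case, assuming the file really does use tab characters between fields (the sample above does not show this reliably), splitting on the tab alone keeps embedded spaces inside a value. The names `TabSplitDemo` and `SplitTabbed` are hypothetical and used only for illustration.

```csharp
using System;

public static class TabSplitDemo
{
    // Splits one line on tab characters only, so spaces inside a value
    // (for example "465 Ocean Ridge Way") stay within a single field.
    // Empty fields are kept as empty strings, so column positions stay aligned.
    public static string[] SplitTabbed(string line)
    {
        return line.Split('\t');
    }
}
```

For example, `TabSplitDemo.SplitTabbed("491975\t\tRepZio\t465 Ocean Ridge Way")` yields four fields: the second is an empty string and the fourth is the full address including its spaces.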

For simplicity, I will assume your 100 field names are 'fld0', 'fld1', ..., 'fld99'.

Now suppose the file you receive contains this header

fld10, fld50, fld0, fld20, fld80, fld70, fld0, fld90, fld50, fld60

and a data line that looks like

Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliet

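so that split[0] = "Alpha", split[1] = "Bravo", and so on.

You parse the header and find that the indices into the master list of 100 fields are 10, 50, 0, and so on. So you build a lookupFld array from those index values: lookupFld[0] = 10, lookupFld[1] = 50, and so on.

Now, as you process each line, you split it into 10 fields and can immediately index into the master field list to find the corresponding field.

MasterList[0] = "fld0", MasterList[1] = "fld1", ..., MasterList[99] = "fld99"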

for (int ii = 0; ii < lookupFld.Length; ++ii)
{
    // MasterList[lookupFld[ii]] is represented by split[ii]
    // when ii = 0
    // lookupFld[0] is 10
    // so MasterList[10] /* fld10 */ is represented by split[0] /* Alpha */
}
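
The lookup idea above could be sketched in C# roughly as follows. This is only an illustration under stated assumptions: `FieldLookup`, `BuildLookup`, and `MapRow` are hypothetical names, header matching is assumed to be case-insensitive, and a header column that is not in the master list is marked with -1 and skipped.

```csharp
using System;
using System.Collections.Generic;

public static class FieldLookup
{
    // Builds lookupFld once per file: for each column in the received header,
    // the index of that field name in the master list (-1 if unknown).
    public static int[] BuildLookup(string[] masterList, string[] header)
    {
        var indexByName = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < masterList.Length; i++)
        {
            indexByName[masterList[i]] = i;
        }

        var lookupFld = new int[header.Length];
        for (int col = 0; col < header.Length; col++)
        {
            int masterIndex;
            lookupFld[col] = indexByName.TryGetValue(header[col].Trim(), out masterIndex)
                ? masterIndex
                : -1;
        }
        return lookupFld;
    }

    // Places each value of a split data line into its master-list slot.
    // Slots for fields that were not supplied remain null.
    public static string[] MapRow(int[] lookupFld, string[] split, int masterCount)
    {
        var row = new string[masterCount];
        int n = Math.Min(lookupFld.Length, split.Length);
        for (int ii = 0; ii < n; ii++)
        {
            if (lookupFld[ii] >= 0)
            {
                row[lookupFld[ii]] = split[ii].Trim();
            }
        }
        return row;
    }
}
```

With a header of fld10, fld50, fld0 and a data line of Alpha, Bravo, Charlie, `MapRow` puts "Alpha" in slot 10, "Bravo" in slot 50, and "Charlie" in slot 0. Because the lookup is built once from the header, each data line then needs a single pass over its fields, with no per-field switch on the header name.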