使用FileHelpers.动态的,读取一个固定宽度的文件并上传到SQL

本文关键字:文件 固定宽度 SQL 一个 动态 FileHelpers 读取 使用 | 更新日期: 2023-09-27 18:03:02

好吧,我会尽我所能解释清楚的。我编写了一个应用程序,该应用程序使用SQL表来定义固定宽度的数据源结构(例如,标头、起始索引、字段长度等)。当我的应用程序运行时,它查询这个表并创建一个DataTable对象(称为finalDT),其中dataccolumn对象持有ColumnName = header。然后向这个表追加一组存在于我们使用的每个数据源中的DataColumn对象(我倾向于称之为派生列)。我还创建了一个主键字段,它是一个自动递增的整数。最初,我滚动了自己的解决方案来读取固定宽度的文件,但我试图将其转换为使用FileHelper。主要是,我希望合并它,这样我就可以访问FileHelper可以解析的其他文件类型(CSV, Excel等)。

现在,我的问题。使用FileHelper。动态的,我能够使用以下方法创建FileHelperEngine对象:
private static FileHelperEngine GetFixedWidthFileClass(bool ignore)
{
    singletonArguments sArgs = singletonArguments.sArgs;
    singletonSQL sSQL = singletonSQL.sSQL;
    List<string> remove = new List<string>();
    FixedLengthClassBuilder flcb = new FixedLengthClassBuilder(sSQL.FixedDataDefinition.DataTableName);
    flcb.IgnoreFirstLines = 1;
    flcb.IgnoreLastLines = 1;
    flcb.IgnoreEmptyLines = true;
    foreach (var dcs in sSQL.FixedDataDefinition.Columns)
    {
        flcb.AddField(dcs.header, Convert.ToInt32(dcs.length), "String");
        if (ignore && dcs.ignore)
        {
            flcb.LastField.FieldValueDiscarded = true; //If we want to ignore a column, this is how to do it.  Would like to incorporate this.
            flcb.LastField.Visibility = NetVisibility.Protected;
        }
        else
        {
            flcb.LastField.TrimMode = TrimMode.Both;
            flcb.LastField.FieldNullValue = string.Empty;
        }
    }
    return new FileHelperEngine(flcb.CreateRecordClass());
}

sSQL.FixedDataDefinition.Columns是我如何存储固定宽度数据源文件的字段定义。然后我生成一个DataTable:

DataTable dt = engine.ReadFileAsDT(file);

其中file是固定宽度文件的完整路径,engine是我保留上面所示的GetFixedWidthFileClass()方法结果的地方。现在我有一个没有主键和派生列的DataTable。此外,dt中的所有字段都标记为ReadOnly = true。这就是事情变得一团糟的地方。

我需要将dt填充到finalDT中,并且dt没有任何主键信息。如果可以发生,那么我可以使用finalDT将我的数据上传到我的SQL表中。如果这不能发生,那么我需要一种方法finalDT没有主键,但仍然上传到我的SQL表。SqlBulkCopy会允许吗?还有别的办法吗?

在这一点上,我愿意从头开始,只要我可以使用FileHelper解析固定宽度的文件和结果存储到我的SQL表,我只是没有看到路径在那里。

使用FileHelpers.动态的,读取一个固定宽度的文件并上传到SQL

我明白了。它并不漂亮,但它是这样工作的。基本上,我如何设置我的代码在我原来的帖子仍然适用,因为我在GetFixedWidthFileClass()方法没有改变任何东西。然后,我不得不添加两个方法来正确设置finalDT:

/// <summary>
///     For a given a datasource file, add all rows to the DataSet and collect Hexdump data 
/// </summary>
/// <param name="ds">
///     The <see cref="System.Data.DataSet" /> to add to 
/// </param>
/// <param name="file">
///     The datasource file to process 
/// </param>
internal static void GenerateDatasource(ref DataSet ds, ref FileHelperEngine engine, DataSourceColumnSpecs mktgidSpecs, string file)
{
    // Some singleton class instances to hold program data I will need.
    singletonSQL sSQL = singletonSQL.sSQL;
    singletonArguments sArgs = singletonArguments.sArgs;
    try
    {
        // Load a DataTable with contents of datasource file.
        DataTable dt = engine.ReadFileAsDT(file);
        // Clean up the DataTable by removing columns that should be ignored.
        DataTableCleanUp(ref dt, ref engine);
        // ReadFileAsDT() makes all of the columns ReadOnly. Fix that.
        foreach (DataColumn column in dt.Columns)
            column.ReadOnly = false;
        // Okay, now get a Primary Key and add in the derived columns.
        GenerateDatasourceSchema(ref dt);
        // Parse all of the rows and columns to do data clean up and assign some custom
        // values. Add custom values for jobID and serial columns to each row in the DataTable.
        for (int row = 0; row < dt.Rows.Count; row++)
        {
            string version = string.Empty; // The file version
            bool found = false; // Used to get out of foreach loops once the required condition is found.
            // Iterate all configured jobs and add the jobID and serial number to each row
            // based upon match.
            foreach (JobSetupDetails job in sSQL.VznJobDescriptions.JobDetails)
            {
                // Version must match id in order to update the row. Break out once we find
                // the match to save time.
                version = dt.Rows[row][dt.Columns[mktgidSpecs.header]].ToString().Trim().Split(new char[] { '_' })[0];
                foreach (string id in job.ids)
                {
                    if (version.Equals(id))
                    {
                        dt.Rows[row][dt.Columns["jobid"]] = job.jobID;
                        lock (locklist)
                            dt.Rows[row][dt.Columns["serial"]] = job.serial++;
                        found = true;
                        break;
                    }
                }
                if (found)
                    break;
            }
            // Parse all columns to do data clean up.
            for (int column = 0; column < dt.Columns.Count; column++)
            {
                // This tab character keeps showing up in the data. It should not be there,
                // but customer won't fix it, so we have to.
                if (dt.Rows[row][column].GetType() == typeof(string))
                    dt.Rows[row][column] = dt.Rows[row][column].ToString().Replace(''t', ' ');
            }
        }
        dt.AcceptChanges();
        // DataTable is cleaned up and modified. Time to push it into the DataSet.
        lock (locklist)
        {
            // If dt is writing back to the DataSet for the first time, Rows.Count will be
            // zero. Since the DataTable in the DataSet does not have the table schema and
            // since dt.Copy() is not an option (ds is referenced, so Copy() won't work), Use
            // Merge() and use the option MissingSchemaAction.Add to create the schema.
            if (ds.Tables[sSQL.FixedDataDefinition.DataTableName].Rows.Count == 0)
                ds.Tables[sSQL.FixedDataDefinition.DataTableName].Merge(dt, false, MissingSchemaAction.Add);
            else
            {
                // If this is not the first write to the DataSet, remove the PrimaryKey
                // column to avoid duplicate key values. Use ImportRow() rather then .Merge()
                // since, for whatever reason, Merge() is overwriting ds each time it is
                // called and ImportRow() is actually appending the row. Ugly, but can't
                // figure out another way to make this work.
                dt.PrimaryKey = null;
                dt.Columns.Remove(dt.Columns[0]);
                foreach (DataRow dr in dt.Rows)
                    ds.Tables[sSQL.FixedDataDefinition.DataTableName].ImportRow(dr);
            }
            // Accept all the changes made to the DataSet.
            ds.Tables[sSQL.FixedDataDefinition.DataTableName].AcceptChanges();
        }
        // Clean up memory.
        dt.Clear();
        // Log my progress.
        log.GenerateLog("0038", log.Info
                        , engine.TotalRecords.ToString() + " DataRows successfully added for file:'r'n't"
                        + file + "'r'nto DataTable "
                        + sSQL.FixedDataDefinition.DataTableName);
    }
    catch (Exception e)
    {
        // Something bad happened here.
        log.GenerateLog("0038", log.Error, "Failed to add DataRows to DataTable "
                        + sSQL.FixedDataDefinition.DataTableName
                        + " for file'r'n't"
                        + file, e);
    }
    finally
    {
        // Successful or not, get rid of the datasource file to prevent other issues.
        File.Delete(file);
    }
}

这个方法:

/// <summary>
///     Deletes columns that are not needed from a given <see cref="System.Data.DataTable" /> reference.
/// </summary>
/// <param name="dt">
///     The <see cref="System.Data.DataTable" /> to delete columns from. 
/// </param>
/// <param name="engine">
///     The <see cref="FileHelperEngine" /> object containing data field usability information. 
/// </param>
private static void DataTableCleanUp(ref DataTable dt, ref FileHelperEngine engine)
{
    // Tracks DataColumns I need to remove from my temp DataTable, dt.
    List<DataColumn> removeColumns = new List<DataColumn>();
    // If a field is Discarded, then the data was not imported because we don't need this
    // column. In that case, mark the column for deletion by adding it to removeColumns.
    for (int i = 0; i < engine.Options.Fields.Count; i++)
        if (engine.Options.Fields[i].Discarded)
            removeColumns.Add(dt.Columns[i]);
    // Reverse the List so changes to dt don't generate schema errors.
    removeColumns.Reverse();
    // Do the deletion.
    foreach (DataColumn column in removeColumns)
        dt.Columns.Remove(column);
    // Clean up memory.
    removeColumns.Clear();
}

基本上,由于ds (finalDT所在的数据集)在GenerateDatasource方法中被引用,我不能使用dt.Copy()将数据推入其中。我必须使用Merge()来做到这一点。然后,在我想使用Merge()的地方,我不得不使用foreach循环和ImportRow(),因为Merge()覆盖了finalDT

我必须修复的其他问题是:

  1. 当我使用ImportRow()时,我还需要从dt中删除PrimaryKey,否则我会得到关于重复密钥的错误。
  2. FileHelperEngineFileHelpers.Dynamic.FixedLengthClassBuilder有跳过我想忽略的列的问题。它要么根本不承认它们,从而杀死我的列偏移量,随后,在数据源文件中读取数据的准确性(使用FieldHidden选项),要么读取它们并以任何方式创建列,但不加载数据(使用FieldValueDiscardedVisibility.Private.Protected选项)。这对我来说意味着我必须在调用engine.ReadFileAsDT(file)之后迭代dt,并删除标记为Discarded的列。
  3. 由于FileHelper对我的PrimaryKey列或在处理过程中添加到所有数据源的其他派生列一无所知,因此我必须将dt传递给方法(GenerateDatasourceSchema())以对其进行排序。该方法基本上只是添加这些列,并确保PrimaryKey是第一列。

其余的代码是修复我需要做的列和行。在某些情况下,我为每行设置一列的值,在其他情况下,我正在清理原始数据中的错误(因为它来自我的客户)。

这不是很好,我希望以后能找到更好的方法。如果有人知道我是怎么做到的,我很乐意听听。