使用SAX方法从行中获取所有单元格

本文关键字:获取 单元格 SAX 方法 使用 | 更新日期: 2023-09-27 18:11:01

我刚刚使用Open XML SDK的DOM方法读取了一个大的xlsx文件。它工作得很好;然而,这样做需要很长时间。所以我想用SAX方法做同样的事情。然而,我在这个问题上没有任何进展。我在DOM方法中所做的是,对于工作簿中的每个工作表,我获得工作表的名称。然后假设第一行包含所有的列名。接下来,我创建一个动态类,其中包含第一行中列出的所有属性。在那之后,我读剩下的行。对于每一行,我使用动态创建的自定义类创建一个新对象。然后遍历行中的每个单元格,用得到的值填充对象。

下面是我用来使用DOM方法完成刚才描述的任务的代码。

public static List<Object> ConvertExcelArchiveToListObjects(string filePath)
    {
        ... 
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
        {
            WorkbookPart wbPart = spreadsheetDocument.WorkbookPart;
            Sheets theSheets = wbPart.Workbook.Sheets;
            SharedStringTablePart sstPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
            ...
            var sheets = wbPart.Workbook.Sheets.Cast<Sheet>().ToList();
            foreach (WorksheetPart worksheetpart in wbPart.WorksheetParts)
            {
                Worksheet worksheet = worksheetpart.Worksheet;
                string partRelationshipId = wbPart.GetIdOfPart(worksheetpart);
                var correspondingSheet = sheets.FirstOrDefault(
                    s => s.Id.HasValue && s.Id.Value == partRelationshipId);
                Debug.Assert(correspondingSheet != null);
                // Grab the sheet name
                string sheetName = correspondingSheet.GetAttribute("name", "").Value;
                ...
                dynamic expandoObjectClass = new ExpandoObject();
                List<Object> listObjectsCustomClasses  = new List<Object>();
                foreach (var dataRow in rowContent)
                {
                    Type generatedType = typeBuilder.CreateType();
                    object generatedObject = Activator.CreateInstance(generatedType);
                    PropertyInfo[] properties = generatedType.GetProperties();
                    int propertiesCounter = 0;
                    // Loop over the values that we will assign to the properties
                    var rowCells = dataRow.Descendants<Cell>();
                    var value = string.Empty;
                    foreach (var rowCell in rowCells)
                    {
                        if (rowCell.DataType != null
                            && rowCell.DataType.HasValue
                            && rowCell.DataType == CellValues.SharedString
                            && int.Parse(rowCell.CellValue.InnerText) < ssTable.ChildElements.Count)
                        {
                            value = ssTable.ChildElements[int.Parse(rowCell.CellValue.InnerText)].InnerText ?? string.Empty;
                        }
                        else
                        {
                            if (rowCell.CellValue != null && rowCell.CellValue.InnerText != null)
                            {
                                value = rowCell.CellValue.InnerText;
                            }
                            else
                            {
                                value = string.Empty;
                            }
                        }
                        properties[propertiesCounter].SetValue(generatedObject, value, null);
                        propertiesCounter++;
                    }
                    listObjectsCustomClasses.Add(generatedObject);
                }
                listObjects.Add(listObjectsCustomClasses);
            }
        }
        DateTime end = DateTime.UtcNow;
        Console.WriteLine("Measured time: " + (end - begin).TotalMinutes + " minutes.");
        return listObjects;
    }

但是,每当我读取大的xlsx文件(大小大于30 MB)时,上述方法都要花费大量时间来执行。我已经编写了这段代码,至少可以获得行,而不必深入挖掘每行中的单元格。

public static List<Object> ConvertExcelArchiveToListObjectsSAXApproach(string filePath)
    {
        DateTime begin = DateTime.UtcNow;
        List<Object> listObjects = new List<Object>();
        using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
        {
            WorkbookPart wbPart = spreadsheetDocument.WorkbookPart;
            Sheets theSheets = wbPart.Workbook.Sheets;
            SharedStringTablePart sstPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
            SharedStringTable ssTable = null;
            if (sstPart != null)
                ssTable = sstPart.SharedStringTable;
            // Get the CellFormats for cells without defined data types
            WorkbookStylesPart workbookStylesPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<WorkbookStylesPart>().First();
            CellFormats cellFormats = (CellFormats)workbookStylesPart.Stylesheet.CellFormats;
            var sheets = wbPart.Workbook.Sheets.Cast<Sheet>().ToList();
            foreach (WorksheetPart worksheetpart in wbPart.WorksheetParts)
            {
                //Worksheet worksheet = worksheetpart.Worksheet;
                OpenXmlPartReader reader = new OpenXmlPartReader(worksheetpart);
                bool firstRow = false;
                while (reader.Read())
                {
                    if (reader.ElementType == typeof(Row))
                    {
                       ...
                    }
                    if (reader.ElementType != typeof(Worksheet)) // Dont' want to skip the contents of the worksheet
                        reader.Skip(); // Skip contents of any node before finding the first row.
                }    DateTime end = DateTime.UtcNow;
        Console.WriteLine("Measured time: " + (end - begin).TotalMinutes + " minutes.");
        return listObjects;
    }
但是,我在 中设置的断点
if (reader.ElementType == typeof(Row))
                    {
                       ...
                    } 

甚至没有被击中。你知道我错过了什么吗?谢谢你!

使用SAX方法从行中获取所有单元格

您看过线程中使用OpenXmlReader的代码吗?代码正在执行您想要执行的操作