使用SAX方法从行中获取所有单元格
本文关键字:获取 单元格 SAX 方法 使用 | 更新日期: 2023-09-27 18:11:01
我刚刚使用Open XML SDK的DOM方法读取了一个大的xlsx文件。它工作得很好;然而,这样做需要很长时间。所以我想用SAX方法做同样的事情。然而,我在这个问题上没有任何进展。我在DOM方法中所做的是,对于工作簿中的每个工作表,我获得工作表的名称。然后假设第一行包含所有的列名。接下来,我创建一个动态类,其中包含第一行中列出的所有属性。在那之后,我读剩下的行。对于每一行,我使用动态创建的自定义类创建一个新对象。然后遍历行中的每个单元格,用得到的值填充对象。
下面是我用来使用DOM方法完成刚才描述的任务的代码。
public static List<Object> ConvertExcelArchiveToListObjects(string filePath)
{
...
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
{
WorkbookPart wbPart = spreadsheetDocument.WorkbookPart;
Sheets theSheets = wbPart.Workbook.Sheets;
SharedStringTablePart sstPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
...
var sheets = wbPart.Workbook.Sheets.Cast<Sheet>().ToList();
foreach (WorksheetPart worksheetpart in wbPart.WorksheetParts)
{
Worksheet worksheet = worksheetpart.Worksheet;
string partRelationshipId = wbPart.GetIdOfPart(worksheetpart);
var correspondingSheet = sheets.FirstOrDefault(
s => s.Id.HasValue && s.Id.Value == partRelationshipId);
Debug.Assert(correspondingSheet != null);
// Grab the sheet name
string sheetName = correspondingSheet.GetAttribute("name", "").Value;
...
dynamic expandoObjectClass = new ExpandoObject();
List<Object> listObjectsCustomClasses = new List<Object>();
foreach (var dataRow in rowContent)
{
Type generatedType = typeBuilder.CreateType();
object generatedObject = Activator.CreateInstance(generatedType);
PropertyInfo[] properties = generatedType.GetProperties();
int propertiesCounter = 0;
// Loop over the values that we will assign to the properties
var rowCells = dataRow.Descendants<Cell>();
var value = string.Empty;
foreach (var rowCell in rowCells)
{
if (rowCell.DataType != null
&& rowCell.DataType.HasValue
&& rowCell.DataType == CellValues.SharedString
&& int.Parse(rowCell.CellValue.InnerText) < ssTable.ChildElements.Count)
{
value = ssTable.ChildElements[int.Parse(rowCell.CellValue.InnerText)].InnerText ?? string.Empty;
}
else
{
if (rowCell.CellValue != null && rowCell.CellValue.InnerText != null)
{
value = rowCell.CellValue.InnerText;
}
else
{
value = string.Empty;
}
}
properties[propertiesCounter].SetValue(generatedObject, value, null);
propertiesCounter++;
}
listObjectsCustomClasses.Add(generatedObject);
}
listObjects.Add(listObjectsCustomClasses);
}
}
DateTime end = DateTime.UtcNow;
Console.WriteLine("Measured time: " + (end - begin).TotalMinutes + " minutes.");
return listObjects;
}
但是,每当我读取大的xlsx文件(大小大于30 MB)时,上述方法都要花费大量时间来执行。我已经编写了这段代码,至少可以获得行,而不必深入挖掘每行中的单元格。
public static List<Object> ConvertExcelArchiveToListObjectsSAXApproach(string filePath)
{
DateTime begin = DateTime.UtcNow;
List<Object> listObjects = new List<Object>();
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
{
WorkbookPart wbPart = spreadsheetDocument.WorkbookPart;
Sheets theSheets = wbPart.Workbook.Sheets;
SharedStringTablePart sstPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
SharedStringTable ssTable = null;
if (sstPart != null)
ssTable = sstPart.SharedStringTable;
// Get the CellFormats for cells without defined data types
WorkbookStylesPart workbookStylesPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<WorkbookStylesPart>().First();
CellFormats cellFormats = (CellFormats)workbookStylesPart.Stylesheet.CellFormats;
var sheets = wbPart.Workbook.Sheets.Cast<Sheet>().ToList();
foreach (WorksheetPart worksheetpart in wbPart.WorksheetParts)
{
//Worksheet worksheet = worksheetpart.Worksheet;
OpenXmlPartReader reader = new OpenXmlPartReader(worksheetpart);
bool firstRow = false;
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
...
}
if (reader.ElementType != typeof(Worksheet)) // Dont' want to skip the contents of the worksheet
reader.Skip(); // Skip contents of any node before finding the first row.
} DateTime end = DateTime.UtcNow;
Console.WriteLine("Measured time: " + (end - begin).TotalMinutes + " minutes.");
return listObjects;
}
但是,我在
中设置的断点if (reader.ElementType == typeof(Row))
{
...
}
甚至没有被击中。你知道我错过了什么吗?谢谢你!
您看过线程中使用OpenXmlReader的代码吗?代码正在执行您想要执行的操作