iTextSharp HTMLWorker.ParseToList() throws NullReferenceExce

Keywords: throws NullReferenceExce HTMLWorker ParseToList iTextSharp | Updated: 2023-09-27 18:18:18

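I was using iTextSharp v4 to merge a large batch of HTML files. It worked well until I had to upgrade to iTextSharp v5.

The problem occurs when I pass a stream reader (which reads the contents of an HTML file) to the ParseToList method of the HTMLWorker object: it throws a null reference exception. While debugging, I can access the streamReader and confirm that the correct file contents are being read.

The code is as follows: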

List<IElement> objects;
try
{
    objects = HTMLWorker.ParseToList(new StringReader(htmlString), null);
}
catch (Exception e)
{
    htmlString = "<html><head></head><body><br/><br/><h2 style='color:#FF0000'>ERROR READING FILE!</h2><h3>File Excluded From Stitched Document!</h3><br/><br/><p>There was an error while trying to read the following file:</p><p><span style='color:#FF0000'>" + fileName + "</span></p></body></html>";
    objects = HTMLWorker.ParseToList(new StringReader(htmlString), null);
}

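In the catch block you can see that I then use almost identical code to add text to the PDF stating that there was a problem. That code runs fine. This of course makes me think the problem lies in the content of the original HTML string, so here is the content of the string exactly as it is immediately before being passed to the parser: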

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <meta http-equiv="Pragma" content="no-cache" />
    <meta http-equiv="cache-control" content="no-cache" />
</head>
<body style="font-family: Arial, Helvetica, sans-serif; font-size: 1em; margin: 0;
    padding: 0;">
    <div style="font-size: 1em; line-height: 1.25em; width: 190mm;">
        <h1 style="font-size: 1.5em; font-weight: bold; margin: 0 0 1.5em 0; text-align: center;">
            Advice Item 1</h1>
        <table border="0" style="width: 190mm; border-collapse: collapse; margin: 0 0 1.5em 0;
            width: 100%;">
            <tbody>
                <tr>
                    <td style="width: 35mm; height: 1px; line-height: 1px; font-size: 1px;">
                        &nbsp;
                    </td>
                    <td>
                    </td>
                    <td style="width: 30mm; height: 1px; line-height: 1px; font-size: 1px;">
                        &nbsp;
                    </td>
                    <td>
                    </td>
                </tr>
                <tr>
                    <td colspan="4" style="font-weight: bold;">
                        <span id="litPatchedToCC" style="text-align: right; font-weight: bold;"></span>
                    </td>
                </tr>
                <tr>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        By:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                        ABC
                    </td>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        From:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                        CC
                    </td>
                </tr>
                <tr>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Date:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                        29/03/2011 13:35
                    </td>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        To:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                        Member Practice
                    </td>
                </tr>
                <tr>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Folder:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                        A15-123456
                    </td>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Individual:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                        Miss A B Test
                    </td>
                </tr>
                <tr>
                    <td colspan="2">
                        <hr width="100%" />
                    </td>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Of:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                        Lorem &amp; Ipsum
                    </td>
                </tr>
                <tr>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Species:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                        Bovine
                    </td>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Position:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                        Member
                    </td>
                </tr>
                <tr>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Item Type:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                    </td>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Tel:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                        0123 01234
                    </td>
                </tr>
                <tr>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                    </td>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Other Nos:
                    </th>
                    <td style="font-weight: bold; padding: 2px 5px;">
                    </td>
                </tr>
                <tr>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Reason For Call:
                    </th>
                    <td colspan="3" style="font-weight: bold; padding: 2px 5px;">
                        Some Reason
                    </td>
                </tr>
                <tr>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                        Subject:
                    </th>
                    <td colspan="3" style="font-weight: bold; padding: 2px 5px;">
                        Some problem.
                    </td>
                </tr>
                <tr>
                    <th scope="row" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                    </th>
                    <td>
                    </td>
                    <th scope="row" colspan="2" style="text-align: right; font-weight: normal; padding: 2px 5px;">
                    </th>
                    <td colspan="2">
                    </td>
                </tr>
                <tr>
                    <td style="font-size: 1.5em; font-weight: bold; text-align: center;" colspan="4">
                        Internal
                    </td>
                </tr>
                <tr>
                    <td colspan="4" style="text-align: center; padding: 2px 5px;">
                        <hr width="100%" />
                    </td>
                </tr>
            </tbody>
        </table>
        <div style="padding: 2px 5px;">
            <p>
                Here we start the discussion.</p>
            <br />
            <p>
                Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
            <br />
            <p>
                Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
        </div>
    </div>
</body>
</html>

Thanks for your help. hofnarwillie


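It looks like HTMLWorker is choking on the two <hr width="100%" /> tags. Since you said you are upgrading to v5.XX, it may also be worth starting to use XMLWorker to parse HTML; the development team recommends this. (The latest HTMLWorker source code even includes a short note pointing to it.)

I tested it with the HTML posted above and it works, and it is not hard to implement :)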

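// Namespaces used: System.IO, iTextSharp.text, iTextSharp.text.pdf, iTextSharp.tool.xml
using (Document document = new Document()) {
  PdfWriter writer = PdfWriter.GetInstance(document, Response.OutputStream);
  document.Open();
  using (StringReader sr = new StringReader(htmlString)) {
    // XMLWorkerHelper parses the XHTML and adds the resulting content to the document.
    // Exceptions are left to propagate; the original catch block only rethrew them.
    XMLWorkerHelper.GetInstance().ParseXHtml(
      writer, document, sr
    );
  }
}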

This was tested in a web environment, so replace Response.OutputStream with the Stream of your choice.
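
As an illustration of swapping out the output stream, here is a minimal sketch that renders the PDF into memory instead of the HTTP response. The class name HtmlToPdf and the method name Render are hypothetical and not part of the original answer; the sketch assumes the iTextSharp 5.x core library and the separate XMLWorker assembly are both referenced.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class HtmlToPdf
{
    // Hypothetical helper (not from the original answer): converts an XHTML
    // string to PDF bytes held in memory rather than writing to Response.OutputStream.
    public static byte[] Render(string htmlString)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (Document document = new Document())
            {
                PdfWriter writer = PdfWriter.GetInstance(document, output);
                document.Open();
                using (StringReader sr = new StringReader(htmlString))
                {
                    // Same call as in the answer above, only the target stream differs.
                    XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, sr);
                }
            } // Disposing the Document closes it, which finalizes the PDF.

            // MemoryStream.ToArray() still works after the stream has been closed.
            return output.ToArray();
        }
    }
}
```

The returned byte array can then be saved with File.WriteAllBytes or passed on to whatever merges the individual documents. A FileStream could be substituted for the MemoryStream in the same position if the result should go straight to disk.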