HtmlAgilityPack fails to fetch a locally hosted page

Keywords: HtmlAgilityPack, fails to fetch | Updated: 2023-09-27 18:28:26

I tried to run the following code:

    public void Init(Url rootUrl)
    {
        var web = new HtmlWeb();
        this.doc = web.Load(rootUrl.Value);
    }

with the following argument:

{<System.Security.Policy.Url version="1">
<Url>http://localhost:85/HCM/HCM.html</Url>
</System.Security.Policy.Url>
}

and got the following exception: Object reference not set to an instance of an object.

Stack trace:

   at HtmlAgilityPack.HtmlDocument.ReadDocumentEncoding(HtmlNode node) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1916
   at HtmlAgilityPack.HtmlDocument.PushNodeEnd(Int32 index, Boolean close) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1805
   at HtmlAgilityPack.HtmlDocument.Parse() in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1492
   at HtmlAgilityPack.HtmlDocument.Load(TextReader reader) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 769
   at HtmlAgilityPack.HtmlDocument.Load(Stream stream, Boolean detectEncodingFromByteOrderMarks) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 597
   at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1515
   at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563
   at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1152
   at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107
   at Conduit.CPServices.Logic.HtmlContentMonitor.HtmlAgilityPackHtmlProvider.Init(Url rootUrl) in D:\Conduit\RnD\Server\Services\CP\CPServices\Logic\HtmlContentMonitor\Conduit.CPServices.Logic.HtmlContentMonitor\HtmlAgilityPackHtmlProvider.cs:line 22
   at Conduit.CPServices.Logic.HtmlContentMonitor.HtmlContentManager.FetchRootAndExternlContentAsByteArray(Url rootUrl) in D:\Conduit\RnD\Server\Services\CP\CPServices\Logic\HtmlContentMonitor\Conduit.CPServices.Logic.HtmlContentMonitor\HtmlContentManager.cs:line 112


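This is a bug in HtmlAgilityPack. It can be triggered, for example, when the document encoding declared in a <META> tag is invalid (e.g. <META http-equiv="Content-Type" content="text/html; charset=8859-9">). As Simon Mourier has said, the bug was introduced in 1.4.0.0.

See the answer to a similar error: internally, HAP tries to resolve an encoding for this string with something like Encoding.GetEncoding("8859-9"), which throws an error.

To avoid this, set the encoding manually, for example: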
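
To see the failing lookup in isolation, here is a minimal sketch that uses only System.Text. The EncodingProbe class and its TryGetEncoding helper are hypothetical names made up for this illustration; they are not part of HtmlAgilityPack. Note that on .NET Core and later, "iso-8859-9" may also be unavailable unless the code-pages encoding provider has been registered, so the result for that name depends on your runtime.

```csharp
using System;
using System.Text;

public static class EncodingProbe
{
    // Hypothetical helper: returns null instead of throwing
    // when the runtime does not recognize the encoding name.
    public static Encoding TryGetEncoding(string name)
    {
        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws ArgumentException for unknown names.
            return null;
        }
    }

    public static void Main()
    {
        // "8859-9" is the malformed charset from the <META> tag above;
        // "iso-8859-9" is the properly formed name for the same charset.
        foreach (var name in new[] { "8859-9", "iso-8859-9", "utf-8" })
        {
            Encoding enc = TryGetEncoding(name);
            Console.WriteLine(name + " -> " + (enc == null ? "not recognized" : enc.WebName));
        }
    }
}
```

If the charset string from the page comes back as "not recognized" here, that is a hint the same lookup is what goes wrong inside the parser.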

web.Load(rootUrl.Value, Encoding.GetEncoding("iso-8859-9"));
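
If your HtmlAgilityPack version does not accept an encoding in that call, another way to keep the encoding under your own control is to download the bytes yourself and hand them to HtmlDocument. This is only a sketch under assumptions, not the fix described above: ManualEncodingLoader is a hypothetical helper, and it assumes that turning off HtmlDocument.OptionReadEncoding makes the parser skip the <META> charset lookup that appears at the top of the stack trace. I have not checked this against 1.4.0.0 specifically.

```csharp
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ManualEncodingLoader
{
    public static HtmlDocument Load(string url, Encoding encoding)
    {
        byte[] raw;
        using (var client = new WebClient())
        {
            // Fetch the raw bytes so that no encoding is guessed for us.
            raw = client.DownloadData(url);
        }

        var doc = new HtmlDocument();
        // Assumption: with this option off, the parser does not try to
        // resolve the charset declared in the page's <META> tag.
        doc.OptionReadEncoding = false;

        using (var stream = new MemoryStream(raw))
        {
            // Decode the bytes with the encoding the caller chose.
            doc.Load(stream, encoding);
        }
        return doc;
    }
}
```

In Init this would be called as this.doc = ManualEncodingLoader.Load(rootUrl.Value, Encoding.GetEncoding("iso-8859-9"));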

This is most likely a bug in HtmlAgilityPack, probably triggered by the HTML in the document.

Can you post the HTML that HtmlAgilityPack is parsing?