HTML nodes are not output in the correct order
Updated: 2023-09-27 18:16:40
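I have been playing with HTML Agility Pack (HtmlAgilityPack), to no avail: the structure of the HTML does not come out correctly.
This is the HTML I am trying to read (simplified):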
<table>...</table>
As you can see, it is missing <html><head></head><body></body></html>.
This is my code so far:
using System.Linq;       // Where, ToList
using System.Web;        // HttpUtility
using HtmlAgilityPack;   // HtmlDocument, HtmlNode

HtmlDocument html = new HtmlDocument();
html.LoadHtml(HttpUtility.HtmlDecode(str_html));

//check if <html> exists. If not create <html>
var htmlNode = html.DocumentNode.SelectSingleNode("//html");
if (htmlNode == null)
{
    htmlNode = html.CreateElement("html");
    var htmlCollection = html.DocumentNode.ChildNodes;
    htmlNode.AppendChildren(htmlCollection);
    html.DocumentNode.RemoveAllChildren();
    html.DocumentNode.PrependChild(htmlNode);
}

//check if <head> exists, if not create <head>
HtmlNode head = html.DocumentNode.SelectSingleNode("//head");
HtmlNode cssLink = html.DocumentNode.SelectSingleNode("//link[contains(@href, '/assets/global/css/reset.css')]");
if (head != null)
{
    //if <link> does not exist, create <link> to reset.css
    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}
else
{
    //no <head> yet: create one and add it to the document
    head = html.CreateElement("head");
    html.DocumentNode.PrependChild(head);
    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}

//check if <body> exists, if yes, add style='margin:0; padding:0'
HtmlNode htmlBody = html.DocumentNode.SelectSingleNode("//body");
if (htmlBody != null)
    htmlBody.SetAttributeValue("style", "margin: 0; padding: 0;");

//remove <script> and <iframe> elements
html.DocumentNode.Descendants()
    .Where(n => n.Name == "script" || n.Name == "iframe")
    .ToList()
    .ForEach(n => n.Remove());

str_html = html.DocumentNode.OuterHtml;
The output is:
<head><link rel="stylesheet" href="/assets/global/css/reset.css"></head><html><table>...</table></html>
Why does the HEAD appear in front of <html>? I also tried .AppendChild, but that produced the following:
<html><table>some stuff</table></html><head><link rel="stylesheet" href="/assets/global/css/reset.css"></head>
I need the code to produce this:
<html><head>some stuff</head><body></body></html>
Any help is appreciated.
Thanks.
You can try adding <head> as a child of <html> rather than of the document node, for example (irrelevant code removed for clarity):
var str_html = "<table>...</table>";
.....
if (head != null)
{
    .....
}
else
{
    head = html.CreateElement("head");
    var htmlCollection = html.DocumentNode.ChildNodes;
    htmlNode.PrependChild(head); //I only added this line to your existing code
    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}
The output is now in the correct order:
<html><head><link rel="stylesheet" href="/assets/global/css/reset.css"></head><table>...</table></html>
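The output above still has no <body>, while the question asked for <html><head>...</head><body>...</body></html>. The following is a minimal sketch of one way to build the whole skeleton in that order, assuming the HtmlAgilityPack NuGet package is referenced. The class HtmlSkeleton, the method EnsureSkeleton and its parameters are hypothetical names used only for illustration, and the stylesheet path is passed in as a plain string instead of being built with Url.Content. The key point is the same as in the answer: <head> is prepended to the <html> node, not to html.DocumentNode.

```csharp
// Sketch only: assumes the HtmlAgilityPack NuGet package is referenced.
using System.Linq;
using HtmlAgilityPack;

public static class HtmlSkeleton
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the markup with
    // <html><head>...</head><body>...</body></html> in that order, a <link> to
    // cssHref inside <head>, and margin/padding reset on <body>.
    public static string EnsureSkeleton(string markup, string cssHref)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(markup);
        HtmlNode root = doc.DocumentNode;

        // 1. <html>: if missing, create it and move every top-level node into it.
        HtmlNode htmlNode = root.SelectSingleNode("//html");
        if (htmlNode == null)
        {
            htmlNode = doc.CreateElement("html");
            foreach (HtmlNode child in root.ChildNodes.ToList())
            {
                child.Remove();              // detach from the document node first
                htmlNode.AppendChild(child);
            }
            root.AppendChild(htmlNode);
        }

        // 2. <body>: if missing, move everything under <html> except <head> into it.
        HtmlNode body = htmlNode.SelectSingleNode("body");
        if (body == null)
        {
            body = doc.CreateElement("body");
            foreach (HtmlNode child in htmlNode.ChildNodes.Where(n => n.Name != "head").ToList())
            {
                child.Remove();
                body.AppendChild(child);
            }
            htmlNode.AppendChild(body);
        }

        // 3. <head>: if missing, make it the first child of <html>. Prepending it
        //    to the document node instead puts it in front of <html> as a sibling.
        HtmlNode head = htmlNode.SelectSingleNode("head");
        if (head == null)
        {
            head = doc.CreateElement("head");
            htmlNode.PrependChild(head);
        }

        // 4. Add the stylesheet <link> only if <head> does not already contain it.
        bool hasLink = head.ChildNodes.Any(n =>
            n.Name == "link" && n.GetAttributeValue("href", "") == cssHref);
        if (!hasLink)
        {
            HtmlNode cssLink = doc.CreateElement("link");
            cssLink.SetAttributeValue("rel", "stylesheet");
            cssLink.SetAttributeValue("href", cssHref);
            head.AppendChild(cssLink);
        }

        // 5. The same body style the question sets.
        body.SetAttributeValue("style", "margin: 0; padding: 0;");

        return root.OuterHtml;
    }
}
```

Usage: HtmlSkeleton.EnsureSkeleton("<table>...</table>", "/assets/global/css/reset.css") should place the table inside <body>, after a <head> that holds the reset.css link. The sketch moves nodes with Remove() followed by AppendChild() rather than AppendChildren() plus RemoveAllChildren(), so each node is detached from its old parent before it is attached to the new one.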