HtmlNode is not rendered in the correct order

Keywords: order, display, Node, Html | Updated: 2023-09-27 18:16:40

I've been experimenting with HtmlAgilityPack, but to no avail: the HTML structure does not come out correctly.

This is the HTML I'm trying to read (simplified):

<table>...</table>

As you can see, it is missing <html><head></head><body></body></html>.

This is my code so far:

HtmlDocument html = new HtmlDocument();
html.LoadHtml(HttpUtility.HtmlDecode(str_html));
//check if <html> exists. If not create <html>
var htmlNode = html.DocumentNode.SelectSingleNode("//html");
if (htmlNode == null)
{
    htmlNode = html.CreateElement("html");
    var htmlCollection = html.DocumentNode.ChildNodes;
    htmlNode.AppendChildren(htmlCollection);
    html.DocumentNode.RemoveAllChildren();
    html.DocumentNode.PrependChild(htmlNode);
}
//check if <head> exists, if not create <head>
HtmlNode head = html.DocumentNode.SelectSingleNode("//head");
HtmlNode cssLink = html.DocumentNode.SelectSingleNode("//link[contains(@href, '/assets/global/css/reset.css')]");
if (head != null)
{
    //if <link> does not exist, create <link> to reset.css
    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}
else
{
    //
    var htmlNode2 = html.DocumentNode.SelectSingleNode("//html");
    head = html.CreateElement("head");
    var htmlCollection = html.DocumentNode.ChildNodes;
    html.DocumentNode.InnerHtml(head);
    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}
//check if <body> exists, if yes, add style='margin:0; padding:0'
HtmlNode htmlBody = html.DocumentNode.SelectSingleNode("//body");
if (htmlBody != null)
    htmlBody.SetAttributeValue("style", "margin: 0; padding: 0;");
//remove <script> and <iframe> references
html.DocumentNode.Descendants()
                .Where(n => n.Name == "script" || n.Name == "iframe")
                .ToList()
                .ForEach(n => n.Remove());
str_html = html.DocumentNode.OuterHtml;

The output is as follows:

<head><link rel="stylesheet" href="/assets/global/css/reset.css"></head><html><table>...</table></html>

Why does <head> appear before <html>? I also tried AppendChild, but that produced the following:

<html><table>asome stuff </table></html><head></html><link rel="stylesheet" href="/assets/global/css/reset.css">

I need the output to look like this:

<html><head>some stuff</head><body></body></html>

Any help is appreciated.

Thanks.

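The new <head> ends up outside <html> because it is added to html.DocumentNode, the document root, which makes it a sibling of <html> rather than a child. Try adding <head> as a child of <html> instead, for example (unrelated code removed for clarity):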

var str_html = "<table>...</table>";
.....
if (head != null)
{
    .....
}
else
{
    head = html.CreateElement("head");
    var htmlCollection = html.DocumentNode.ChildNodes;
    htmlNode.PrependChild(head); //I only added this line to your existing code
    if (cssLink == null)
    {
        cssLink = html.CreateElement("link");
        cssLink.SetAttributeValue("rel", "stylesheet");
        cssLink.SetAttributeValue("href", Url.Content("/assets/global/css/reset.css"));
        head.AppendChild(cssLink);
    }
}

The output is now in the correct order:

<html><head><link rel="stylesheet" href="/assets/global/css/reset.css"></head><table>...</table></html>
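If you also want the <body> wrapper from the desired output, the same idea can be taken one step further. Below is a minimal sketch, not a definitive implementation: the class name HtmlSkeleton, the method name EnsureSkeleton, and the cssHref parameter are hypothetical names chosen for illustration, and moving the leftover content into a new <body> is an assumption based on the structure the question asks for. It uses only HtmlAgilityPack calls that already appear in the question, plus HtmlNode.Element to look up a direct child by name.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class HtmlSkeleton // hypothetical helper, not part of HtmlAgilityPack
{
    // Wraps a fragment such as "<table>...</table>" so that the result has
    // <html> as the root, <head> as its first child, and <body> after it.
    public static string EnsureSkeleton(string fragment, string cssHref)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment);

        // 1. Ensure <html>: move every top-level node under a new <html>.
        var htmlNode = doc.DocumentNode.SelectSingleNode("//html");
        if (htmlNode == null)
        {
            // Snapshot the children first, because the collection is modified below.
            var topLevel = doc.DocumentNode.ChildNodes.ToList();
            doc.DocumentNode.RemoveAllChildren();

            htmlNode = doc.CreateElement("html");
            foreach (var node in topLevel)
                htmlNode.AppendChild(node);
            doc.DocumentNode.AppendChild(htmlNode);
        }

        // 2. Ensure <head> as the FIRST child of <html>, not of the document root.
        var head = htmlNode.Element("head");
        if (head == null)
        {
            head = doc.CreateElement("head");
            htmlNode.PrependChild(head);
        }

        // 3. Ensure <body>: everything in <html> except <head> moves into it.
        //    (Assumption: the loose content belongs inside <body>.)
        var body = htmlNode.Element("body");
        if (body == null)
        {
            var loose = htmlNode.ChildNodes.Where(n => n != head).ToList();
            body = doc.CreateElement("body");
            foreach (var node in loose)
            {
                node.Remove();
                body.AppendChild(node);
            }
            htmlNode.AppendChild(body);
        }

        // 4. Add the stylesheet link only if <head> does not already contain it.
        bool hasLink = head.Elements("link")
                           .Any(l => l.GetAttributeValue("href", "") == cssHref);
        if (!hasLink)
        {
            var cssLink = doc.CreateElement("link");
            cssLink.SetAttributeValue("rel", "stylesheet");
            cssLink.SetAttributeValue("href", cssHref);
            head.AppendChild(cssLink);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

A call such as HtmlSkeleton.EnsureSkeleton("<table>...</table>", "/assets/global/css/reset.css") would stand in for the html/head handling in the question. The key design point is the same as in the one-line fix above: every new element is attached to htmlNode (or to head or body), never to html.DocumentNode, once <html> exists.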