HtmlAgilityPack: parsing attributes

Keywords: attributes, HtmlAgilityPack | Updated: 2023-09-27 18:02:22

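I'm trying to parse HTML, but I don't know how to apply conditions (for example, "the class name must be X"). I know there are many questions about the Agility Pack, but I couldn't find anything that helped.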

<div class="main-class">
<a href="LINK">
<img src="IMAGELINK" alt="SOMETEXT" class="image-class">
</a>
</div>
<p> bla bla </p>
<div class="main-class">
<a href="LINK">
<img src="IMAGELINK" alt="SOMETEXT" class="image-class">
</a>
</div>
<div class="main-class">
<a href="LINK">
<img src="IMAGELINK" alt="SOMETEXT" class="image-class">
</a>
<p> asd sadh awww </p>
</div>

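For every div with the class name "main-class" I want to get the href, src and alt values. Here is my code, but it only prints the "p" elements, because that is the only thing I know how to do.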

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(dataString);
foreach (HtmlNode nodeItem in doc.DocumentNode.Descendants("p").ToArray())
{
    Debug.WriteLine(nodeItem.InnerText);
}

I'm working on a WP (Windows Phone) app, where "SelectNodes" is not supported.


Use the traditional, non-XPath approach.

Note: null checks have been omitted.

using System;
using System.Linq;
using HtmlAgilityPack;
// Sample markup from the question
string dataString =
    "<div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\" class=\"image-class\"></a></div>" +
    "<p> bla bla </p>" +
    "<div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\" class=\"image-class\"></a></div>" +
    "<div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\" class=\"image-class\"></a><p> asd sadh awww </p></div>";
var doc = new HtmlDocument();
doc.LoadHtml(dataString);
var elements = doc.DocumentNode.Descendants("div").Where(o => o.GetAttributeValue("class", "") == "main-class");
foreach (var nodeItem in elements)
{
    var aTag = nodeItem.Descendants("a").First();
    var aTagHrefValue = aTag.Attributes["href"];
    var imgTag = nodeItem.Descendants("img").First();
    var imgTagSrcValue = imgTag.Attributes["src"];
    var imgTagAltValue = imgTag.Attributes["alt"];
    Console.WriteLine("a href value: {0}", aTagHrefValue.Value);
    Console.WriteLine("img src value: {0}", imgTagSrcValue.Value);
    Console.WriteLine("img alt value: {0}", imgTagAltValue.Value);
    Console.WriteLine();
}
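Because the null checks are omitted, the loop above throws if a "main-class" div has no `<a>` or `<img>` (`First()` raises an InvalidOperationException) or if an attribute is missing (`Attributes["href"]` returns null, so `.Value` raises a NullReferenceException). Below is a minimal null-safe sketch of the same loop. It assumes the HtmlAgilityPack NuGet package and C# 7 or later (value tuples, `?.`); the `MainClassParser` class and its `Parse` method are hypothetical names used only for illustration, and missing values are reported as empty strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class MainClassParser
{
    // Collects href/src/alt from every <div class="main-class">.
    // A missing <a>, <img> or attribute yields "" instead of throwing.
    public static List<(string Href, string Src, string Alt)> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Href, string Src, string Alt)>();
        var divs = doc.DocumentNode.Descendants("div")
            .Where(d => d.GetAttributeValue("class", "") == "main-class");

        foreach (var div in divs)
        {
            // FirstOrDefault returns null instead of throwing when nothing matches.
            HtmlNode a = div.Descendants("a").FirstOrDefault();
            HtmlNode img = div.Descendants("img").FirstOrDefault();

            // GetAttributeValue returns the default ("") when the attribute is absent;
            // "?? \"\"" covers the case where the element itself is missing.
            results.Add((
                a?.GetAttributeValue("href", "") ?? "",
                img?.GetAttributeValue("src", "") ?? "",
                img?.GetAttributeValue("alt", "") ?? ""));
        }
        return results;
    }

    public static void Main()
    {
        // The second div has no <a> or <img>; the unchecked loop would throw on it.
        const string html =
            "<div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\" class=\"image-class\"></a></div>" +
            "<p> bla bla </p>" +
            "<div class=\"main-class\"><p> asd sadh awww </p></div>";

        foreach (var (href, src, alt) in Parse(html))
        {
            Console.WriteLine("a href value: {0}", href);
            Console.WriteLine("img src value: {0}", src);
            Console.WriteLine("img alt value: {0}", alt);
            Console.WriteLine();
        }
    }
}
```

Returning a list instead of printing inside the loop keeps the parsing testable and lets the caller decide whether to use `Console.WriteLine` or `Debug.WriteLine`.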

@Orel Eraki - Thanks. I worked it out myself 3 minutes ago, but I'll use your solution because it needs only one foreach loop. Anyway, here is my solution:

foreach (HtmlNode nodeItem in doc.DocumentNode.Descendants("div")
             .Where(p => p.GetAttributeValue("class", "def").Equals("main-class")))
{
    foreach (HtmlNode nodeAItem in nodeItem.Descendants("a"))
    {
        Debug.WriteLine(nodeAItem.GetAttributeValue("href", "def"));
        foreach (HtmlNode nodeIMAGEitem in nodeAItem.Descendants("img"))
        {
            Debug.WriteLine(nodeIMAGEitem.GetAttributeValue("src", "def"));
            Debug.WriteLine(nodeIMAGEitem.GetAttributeValue("alt", "def"));
        }
    }
}

You can use LINQ:

var attrs = doc.DocumentNode
               .Descendants("div")
               .Where(d => d.Attributes != null &&
                           d.Attributes.Contains("class") &&
                           d.Attributes["class"].Value.Contains("main-class"))
               .Select(d => new
               {
                   anchor = d.SelectSingleNode("a"),
                   img = d.SelectSingleNode("a") != null 
                                                 ? d.SelectSingleNode("a").SelectSingleNode("img") 
                                                 : null 
               })
               .Select(d => new
               {
                   href = d.anchor != null 
                                   ? d.anchor.GetAttributeValue("href", string.Empty) 
                                   : string.Empty,
                   imgsrc = d.img != null 
                                  ? d.img.GetAttributeValue("src", string.Empty) 
                                  : string.Empty,
                   imgalt = d.img != null 
                                  ? d.img.GetAttributeValue("alt", string.Empty) 
                                  : string.Empty
               })
               .ToList();
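The query above calls `SelectSingleNode`, which takes an XPath expression, so it is likely unavailable on the Windows Phone build where the question says `SelectNodes` is missing (an assumption; check the HtmlAgilityPack version you target). Its `Contains("main-class")` test is also a substring match, so a class such as `main-class-2` would match as well. The sketch below performs the same projection using only `Descendants` and `GetAttributeValue`, and matches the class as a whole whitespace-separated token. It assumes the HtmlAgilityPack NuGet package and C# 7 or later; `MainClassLinq`, `Extract` and `HasClassToken` are hypothetical names.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class MainClassLinq
{
    // True when the class attribute contains `name` as a whole token:
    // "main-class featured" matches, "main-class-2" does not.
    static bool HasClassToken(HtmlNode node, string name) =>
        node.GetAttributeValue("class", "")
            .Split(new[] { ' ', '\t', '\r', '\n', '\f' }, StringSplitOptions.RemoveEmptyEntries)
            .Contains(name);

    public static (string Href, string ImgSrc, string ImgAlt)[] Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div")
            .Where(d => HasClassToken(d, "main-class"))
            // Descendants walks the tree without XPath (any depth, not only direct children).
            .Select(d => d.Descendants("a").FirstOrDefault())
            .Select(a => new { Anchor = a, Img = a?.Descendants("img").FirstOrDefault() })
            // Missing elements or attributes become empty strings, as in the original query.
            .Select(x => (
                x.Anchor?.GetAttributeValue("href", "") ?? "",
                x.Img?.GetAttributeValue("src", "") ?? "",
                x.Img?.GetAttributeValue("alt", "") ?? ""))
            .ToArray();
    }

    public static void Main()
    {
        const string html =
            "<div class=\"main-class featured\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\"></a></div>" +
            "<div class=\"main-class-2\"><a href=\"IGNORED\"></a></div>";

        foreach (var item in Extract(html))
        {
            Console.WriteLine("{0} | {1} | {2}", item.Href, item.ImgSrc, item.ImgAlt);
        }
    }
}
```

One behavioral difference: `SelectSingleNode("a")` only looks at direct children of the div, whereas `Descendants("a")` also finds an `<a>` nested deeper. For the markup in the question both give the same result.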