HtmlAgilityPack issue with HTML

Updated: 2023-09-27 18:20:38

I want to get the title, the image src, and some other details from the following HTML, but I am running into problems:

<div class="thumb-container">
    <a class="featured" title="Spectacularly " href="http://www.site.com"></a>          
    <div rel="0" id="property_image_1181140" class="thumb">
        <a title="*Want this title*" href="*http://www.wanttogetthislink.com*">
           <img style="width: 190px; height: 127px; left: -11px; top: 0px;" alt="Spectacularly upgraded 5 bed Family Villa For Sale" src="http://c1369013.r13.cf3.rackcdn.com/1181140-1-mini.jpg">
        </a>
    </div>
<div class="description-listing">
   <div class="heading">
      <div class="type">
         <label>*5,900* sq.ft.,</label>
         <span>*Villa*</span>
         <p class="bedroom"><em>*5*</em></p>
         <p class="bathroom"><em>*6*</em></p>
      </div>
      <p class="amount">
         <label>AED</label>
         <strong>*5,120,000*</strong>
      </p>
   </div>

Here is my code:

 var allCarResults = rootNode.SelectNodes("//div[normalize-space(@class)='general-listing']");
 foreach (var carResult in allCarResults)
 {
     var dataNode = carResult.SelectSingleNode(".//div[@class='thumb']");
     var carNameNode = dataNode.SelectSingleNode(".//a");
 }

I want to extract every value that is wrapped in asterisks (*...*) in the HTML above.

I don't know how to do this.


The principle is basically the same for every value: write one XPath per item, and evaluate it relative to a common anchor node:

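// doc is assumed to be an HtmlDocument that has already been loaded.
// SelectSingleNode returns null when nothing matches, so check for null in real code.
// The outer container is the common anchor for all relative XPath queries.
HtmlNode thumbContainer = doc.DocumentNode.SelectSingleNode("//div[@class='thumb-container']");
// The link inside the thumb div carries both the title and the href.
HtmlNode link = thumbContainer.SelectSingleNode("./div[@class='thumb']/a");
string linkTitle = link.Attributes["title"].Value;
string linkHref = link.Attributes["href"].Value;
// The area label sits under description-listing/heading/type.
HtmlNode label = thumbContainer.SelectSingleNode("./div[@class='description-listing']/div[@class='heading']/div[@class='type']/label");
string labelText = label.InnerText;
// ... Similar for other items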
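
For the remaining items, a minimal self-contained sketch might look like the following. It assumes a trimmed copy of the question's markup with the missing closing tags added and placeholder values; the `Text` helper and the example.com URLs are hypothetical and only there for illustration.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Trimmed copy of the markup from the question; closing tags added,
        // title and URLs replaced by placeholders (assumptions).
        const string html = @"
<div class=""thumb-container"">
  <div rel=""0"" id=""property_image_1181140"" class=""thumb"">
    <a title=""Some title"" href=""http://www.example.com"">
      <img alt=""Some alt text"" src=""http://www.example.com/1-mini.jpg"">
    </a>
  </div>
  <div class=""description-listing"">
    <div class=""heading"">
      <div class=""type"">
        <label>5,900 sq.ft.,</label>
        <span>Villa</span>
        <p class=""bedroom""><em>5</em></p>
        <p class=""bathroom""><em>6</em></p>
      </div>
      <p class=""amount"">
        <label>AED</label>
        <strong>5,120,000</strong>
      </p>
    </div>
  </div>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Common anchor: every other query is relative to this node.
        HtmlNode anchor = doc.DocumentNode.SelectSingleNode("//div[@class='thumb-container']");
        if (anchor == null)
        {
            Console.WriteLine("No thumb-container found.");
            return;
        }

        // Attribute values: GetAttributeValue falls back to the default
        // instead of throwing when the attribute is missing.
        HtmlNode img = anchor.SelectSingleNode("./div[@class='thumb']/a/img");
        string imgSrc = img == null ? string.Empty : img.GetAttributeValue("src", string.Empty);

        // Text values, one relative XPath per item.
        string propertyType = Text(anchor, ".//div[@class='type']/span");
        string bedrooms = Text(anchor, ".//p[@class='bedroom']/em");
        string bathrooms = Text(anchor, ".//p[@class='bathroom']/em");
        string currency = Text(anchor, ".//p[@class='amount']/label");
        string price = Text(anchor, ".//p[@class='amount']/strong");

        Console.WriteLine("Image:     " + imgSrc);
        Console.WriteLine("Type:      " + propertyType);
        Console.WriteLine("Bedrooms:  " + bedrooms);
        Console.WriteLine("Bathrooms: " + bathrooms);
        Console.WriteLine("Price:     " + currency + " " + price);
    }

    // Hypothetical helper: trimmed, entity-decoded inner text of the first
    // match, or an empty string when the XPath matches nothing.
    static string Text(HtmlNode root, string xpath)
    {
        HtmlNode node = root.SelectSingleNode(xpath);
        return node == null ? string.Empty : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

Routing every lookup through one null-safe helper keeps a single missing element from throwing a NullReferenceException partway through a listing. If the page contains several listings, the same queries can be run inside a loop over `SelectNodes("//div[@class='thumb-container']")`.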

Alternatively, you can walk through each HtmlNode and its children, and match every node against a list of the items you are looking for.
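
The traversal approach could be sketched roughly as below. This is only one way to do it: the `wanted` table (class name to field name) and the whitespace-collapsing step are assumptions made for illustration, and the markup is a shortened stand-in for the question's HTML.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Shortened stand-in for the markup in the question (assumption).
        const string html = @"
<div class=""thumb-container"">
  <div class=""description-listing"">
    <div class=""heading"">
      <div class=""type"">
        <p class=""bedroom""><em>5</em></p>
        <p class=""bathroom""><em>6</em></p>
      </div>
      <p class=""amount""><label>AED</label> <strong>5,120,000</strong></p>
    </div>
  </div>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Hypothetical lookup table: class attribute value -> output field name.
        var wanted = new Dictionary<string, string>
        {
            { "bedroom", "Bedrooms" },
            { "bathroom", "Bathrooms" },
            { "amount", "Price" },
        };

        var found = new Dictionary<string, string>();

        // Descendants() enumerates every node below the document root,
        // including text and comment nodes, so filter to elements first.
        foreach (HtmlNode node in doc.DocumentNode.Descendants())
        {
            if (node.NodeType != HtmlNodeType.Element)
            {
                continue;
            }

            string cssClass = node.GetAttributeValue("class", string.Empty).Trim();
            if (wanted.TryGetValue(cssClass, out string field))
            {
                // Collapse runs of whitespace left over from the markup layout.
                string text = HtmlEntity.DeEntitize(node.InnerText);
                string[] parts = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
                found[field] = string.Join(" ", parts);
            }
        }

        foreach (KeyValuePair<string, string> pair in found)
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

Compared with one XPath per item, this visits the tree only once and keeps the list of targets in data rather than in query strings, at the cost of matching on the exact class attribute value only.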