Can I use HtmlAgilityPack to split an HTML document on a certain tag?

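For example, I have a bunch of <tr> tags that I want to collect. I need to split each of these tags out into a separate element to make parsing easier.

Is this possible?

Example markup: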

<tr class="first-in-year">
  <td class="year">2011</td>
  <td class="img"><a href="/battlefield-3/61-27006/"><img src=
  "http://media.giantbomb.com/uploads/6/63038/1700748-bf3_thumb.jpg" alt=""></a></td>
  <td class="title">
    <a href="/battlefield-3/61-27006/">Battlefield 3</a>
    <p class="deck">Battlefield 3 is DICE's next installment in the franchise and
    will be on PC, PS3 and Xbox 360. The game will feature jets, prone, a
    single-player and co-op campaign, and 64-player multiplayer (on PC). It's due out
    in Fall of 2011.</p>
  </td>
  <td class="date">Expected: Q4 2011</td>
  <td><a href="/pc/60-94/" class="PC">PC</a>, <a href="/xbox-360/60-20/" class=
  "X360">X360</a>, <a href="/playstation-3/60-35/" class="PS3">PS3</a></td>
</tr>
<tr>
  <td class="year"></td>
  <td class="img"><a href="/forza-motorsport-4/61-33400/"><img src=
  "http://media.giantbomb.com/uploads/0/1992/1654849-forza4_thumb.jpg" alt=
  ""></a></td>
  <td class="title">
    <a href="/forza-motorsport-4/61-33400/">Forza Motorsport 4</a>
    <p class="deck">The next installment of Turn 10's racing franchise slated for
    release in Fall 2011. It is set to feature 16 player online races, dynamic race
    conditions, cars from over 80 manufacturers, and compatibility with Kinect, both
    on and off the racetrack.</p>
  </td>
  <td class="date">Expected: Oct 2011</td>
  <td><a href="/xbox-360/60-20/" class="X360">X360</a></td>
</tr>
<tr>
  <td class="year"></td>
  <td class="img"><a href="/max-payne-3/61-23398/"><img src=
  "http://media.giantbomb.com/uploads/0/1400/938434-custom_1237811317319_mp3_poster_thumb.jpg"
  alt=""></a></td>
  <td class="title">
    <a href="/max-payne-3/61-23398/">Max Payne 3</a>
    <p class="deck">The long awaited third instalment in Remedy's beloved series, in
    which an aging Max Payne faces one final chance to redeem himself.</p>
  </td>
  <td class="date">Expected: 2011</td>
  <td><a href="/pc/60-94/" class="PC">PC</a>, <a href="/playstation-3/60-35/" class=
  "PS3">PS3</a>, <a href="/xbox-360/60-20/" class="X360">X360</a></td>
</tr>

So for this example, I would end up with three elements. :)

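If you mean splitting it into multiple HTML documents on a tag, you can't do that. What you can do is select the individual TD elements and parse each one separately.

The XPath selector //td selects all of those elements, and you can then pass them into your parsing method.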

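// LoadHtmlHowever() is a placeholder: load the document whichever way suits you.
HtmlAgilityPack.HtmlDocument doc = LoadHtmlHowever();
// Keep the result so the matched nodes can be handed to the parsing method.
HtmlAgilityPack.HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//td");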
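
Since the question asks for one element per <tr>, the same approach can be applied to rows instead of cells. The following is only a minimal sketch, not a definitive implementation: it assumes the HtmlAgilityPack package is referenced, uses a trimmed-down stand-in for the markup in the question (wrapped in a <table>, which is my assumption about the surrounding page), and relies on the class names "title" and "date" from that markup. The null check is there because SelectNodes, as far as I know, returns null rather than an empty collection when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Shortened stand-in for the rows shown in the question.
        const string html = @"<table>
<tr class=""first-in-year"">
  <td class=""year"">2011</td>
  <td class=""title""><a href=""/battlefield-3/61-27006/"">Battlefield 3</a></td>
  <td class=""date"">Expected: Q4 2011</td>
</tr>
<tr>
  <td class=""year""></td>
  <td class=""title""><a href=""/forza-motorsport-4/61-33400/"">Forza Motorsport 4</a></td>
  <td class=""date"">Expected: Oct 2011</td>
</tr>
<tr>
  <td class=""year""></td>
  <td class=""title""><a href=""/max-payne-3/61-23398/"">Max Payne 3</a></td>
  <td class=""date"">Expected: 2011</td>
</tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // One HtmlNode per <tr>; each node can be parsed on its own.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            Console.WriteLine("No rows found.");
            return;
        }

        foreach (HtmlNode row in rows)
        {
            // The leading "." keeps the XPath relative to this row;
            // a bare "//td" would search the whole document again.
            HtmlNode title = row.SelectSingleNode(".//td[@class='title']/a");
            HtmlNode date = row.SelectSingleNode(".//td[@class='date']");

            Console.WriteLine($"{title?.InnerText.Trim()} | {date?.InnerText.Trim()}");
        }
    }
}
```

Each row stays part of the original document, so nothing is actually split; the row node simply serves as the starting point for relative queries. If a truly separate fragment is needed, row.OuterHtml gives the row's markup as a string that could be loaded into a new HtmlDocument.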