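XPath - Selecting the first group of siblings between two nodes

Keywords: XPath, select, group of nodes, between two nodes | Updated: 2023-09-27 18:35:45

I've run into a small problem while querying some HTML files with XPath in C#.

OK, first, here is a sample HTML: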

<table id="theTable">
    <tbody>
        <tr class="theClass">A</tr>
        <tr class="theClass">B</tr>
        <tr>1</tr>
        <tr>2</tr>
        <tr>3</tr>
        <tr>4</tr>
        <tr>5</tr>
        <tr class="theClass">C</tr>
        <tr class="theClass">D</tr>
        <tr>6</tr>
        <tr>7</tr>
        <tr>8</tr>
        <tr>9</tr>
        <tr>10</tr>
        <tr>11</tr>
        <tr>12</tr>
        <tr>13</tr>
        <tr>14</tr>
        <tr>15</tr>
        <tr class="theClass">E</tr>
        <tr class="theClass">F</tr>
        <tr>16</tr>
        <tr>17</tr>
        <tr>18</tr>
        <tr>19</tr>
        <tr>20</tr>
        <tr>21</tr>
        <tr>22</tr>
    </tbody>
</table>

Now, what I'm trying to do is get only the elements between the B and C nodes (1, 2, 3, 4, 5).

Here is what I have tried so far:

using System;
using System.Xml.XPath;
namespace Test
{
    class Test
    {
        static void Main(string[] args)
        {
            XPathDocument doc = new XPathDocument("Test.xml");
            XPathNavigator nav = doc.CreateNavigator();
            Console.WriteLine(nav.Select("//table[@id='theTable']/tbody/tr[preceding-sibling::tr[@class='theClass'] and following-sibling::tr[@class='theClass']]").Count);
            Console.WriteLine(nav.Select("//table[@id='theTable']/tbody/tr[preceding-sibling::tr[@class='theClass'][2] and following-sibling::tr[@class='theClass'][4]]").Count);
            Console.ReadKey(true);
        }
    }
}

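Run against the HTML above, this code outputs 19 and 5. So only the second XPath expression works, but only because it looks for elements that have at least two class=theClass elements before them and at least four after them.

This is where my problem begins. I want to write an expression that returns only the first group of elements following the <tr class="theClass"></tr> tags, no matter how many more groups follow it.

If I run my code on this HTML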

<table id="theTable">
    <tbody>
        <tr class="theClass">A</tr>
        <tr class="theClass">B</tr>
        <tr>1</tr>
        <tr>2</tr>
        <tr>3</tr>
        <tr>4</tr>
        <tr>5</tr>
        <tr>6</tr>
    </tbody>
</table>

it outputs 0 and 0.

So that's no good.

Does anyone have any ideas?

Thanks!

"Now, what I'm trying to do is get only the elements between the B and C nodes"

Use this single XPath expression:

   /*/*/tr[.='B']
           /following-sibling::*
             [count(.|/*/*/tr[. ='C']/preceding-sibling::*)
             =
              count(/*/*/tr[. ='C']/preceding-sibling::*)
             ]

Here is an XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:template match="/">
  <xsl:copy-of select=
  "/*/*/tr[.='B']
           /following-sibling::*
             [count(.|/*/*/tr[. ='C']/preceding-sibling::*)
             =
              count(/*/*/tr[. ='C']/preceding-sibling::*)
             ]
  "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the first provided XML document:

<table id="theTable">
    <tbody>
        <tr class="theClass">A</tr>
        <tr class="theClass">B</tr>
        <tr>1</tr>
        <tr>2</tr>
        <tr>3</tr>
        <tr>4</tr>
        <tr>5</tr>
        <tr class="theClass">C</tr>
        <tr class="theClass">D</tr>
        <tr>6</tr>
        <tr>7</tr>
        <tr>8</tr>
        <tr>9</tr>
        <tr>10</tr>
        <tr>11</tr>
        <tr>12</tr>
        <tr>13</tr>
        <tr>14</tr>
        <tr>15</tr>
        <tr class="theClass">E</tr>
        <tr class="theClass">F</tr>
        <tr>16</tr>
        <tr>17</tr>
        <tr>18</tr>
        <tr>19</tr>
        <tr>20</tr>
        <tr>21</tr>
        <tr>22</tr>
    </tbody>
</table>

the XPath expression is evaluated and the selected nodes are copied to the output:

<tr>1</tr>
<tr>2</tr>
<tr>3</tr>
<tr>4</tr>
<tr>5</tr>

Explanation:

Here we simply use the Kayessian formula for node-set intersection:

$ns1[count(.|$ns2) = count($ns2)]

where we substitute $ns1 with:

 /*/*/tr[.='B']
               /following-sibling::*

and we substitute $ns2 with:

/*/*/tr[. ='C']/preceding-sibling::*
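
Since the question is about C#, here is a minimal sketch of how the same intersection expression could be evaluated with XPathNavigator. The class name BetweenMarkers, the helper SelectValues, and the shortened inline document (used in place of Test.xml) are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

static class BetweenMarkers
{
    // Shortened version of the first sample document (assumption: inlined instead of Test.xml).
    public const string Xml = @"<table id='theTable'><tbody>
        <tr class='theClass'>A</tr><tr class='theClass'>B</tr>
        <tr>1</tr><tr>2</tr><tr>3</tr><tr>4</tr><tr>5</tr>
        <tr class='theClass'>C</tr><tr class='theClass'>D</tr>
        <tr>6</tr><tr>7</tr>
    </tbody></table>";

    // $ns1 = siblings after B, $ns2 = siblings before C; a node is kept only if it is in both sets.
    public const string Between =
        "/*/*/tr[.='B']/following-sibling::*" +
        "[count(. | /*/*/tr[.='C']/preceding-sibling::*)" +
        " = count(/*/*/tr[.='C']/preceding-sibling::*)]";

    // Hypothetical helper: evaluates an XPath expression and returns the text of each selected node.
    public static List<string> SelectValues(string xml, string xpath)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        var values = new List<string>();
        foreach (XPathNavigator node in nav.Select(xpath))
            values.Add(node.Value); // read the text while the iterator is positioned on the node
        return values;
    }

    static void Main()
    {
        // Prints 1,2,3,4,5 for the shortened document above.
        Console.WriteLine(string.Join(",", SelectValues(Xml, Between)));
    }
}
```

Adding a node from $ns2 to $ns2 leaves the count unchanged, while adding any other node increases it by one, which is why the count comparison acts as a membership test.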

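The second question:

"This is where my problem begins. I want to write an expression that returns only the first group of elements following the <tr class="theClass"></tr> tags, no matter how many more groups follow it."

Again, there is a single XPath expression that selects these elements: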

   /*/*/tr[@class='theClass'
         and
           following-sibling::*[1][self::tr[not(@*)] ]
           ][1]
             /following-sibling::tr
               [not(@*)
              and
                count(preceding-sibling::tr
                       [@class='theClass'
                      and
                        following-sibling::*[1][self::tr[not(@*)] ]
                       ]
                     )
                = 1
               ]

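Explanation:

This selects all following-sibling tr elements (those that satisfy the conditions below) of the first */*/tr element whose class attribute has the string value "theClass" and whose first following sibling element is a tr with no attributes.

The selected tr elements must also satisfy two conditions: 1) they have no attributes; 2) they have exactly one preceding sibling tr element whose class attribute has the string value "theClass" and which is itself immediately followed by a tr with no attributes.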
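
To try this from C#, the expression can be passed to XPathNavigator.Select unchanged. The sketch below is only an illustration: the class name FirstGroup, the helper SelectValues, and the two shortened inline documents (one with several groups, one with a single group) are assumptions, not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

static class FirstGroup
{
    // The expression from the answer, joined into one string.
    public const string FirstGroupXPath =
        "/*/*/tr[@class='theClass' and following-sibling::*[1][self::tr[not(@*)]]][1]" +
        "/following-sibling::tr[not(@*) and " +
        "count(preceding-sibling::tr[@class='theClass' and " +
        "following-sibling::*[1][self::tr[not(@*)]]]) = 1]";

    // Shortened stand-ins for the two sample documents (assumption: fewer rows than the originals).
    public const string SeveralGroups = @"<table id='theTable'><tbody>
        <tr class='theClass'>A</tr><tr class='theClass'>B</tr><tr>1</tr><tr>2</tr>
        <tr class='theClass'>C</tr><tr class='theClass'>D</tr><tr>3</tr><tr>4</tr>
    </tbody></table>";
    public const string SingleGroup = @"<table id='theTable'><tbody>
        <tr class='theClass'>A</tr><tr class='theClass'>B</tr><tr>1</tr><tr>2</tr><tr>3</tr>
    </tbody></table>";

    // Hypothetical helper: evaluates an XPath expression and returns the text of each selected node.
    public static List<string> SelectValues(string xml, string xpath)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        var values = new List<string>();
        foreach (XPathNavigator node in nav.Select(xpath))
            values.Add(node.Value);
        return values;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", SelectValues(SeveralGroups, FirstGroupXPath))); // prints 1,2
        Console.WriteLine(string.Join(",", SelectValues(SingleGroup, FirstGroupXPath)));   // prints 1,2,3
    }
}
```

In the first document, rows 3 and 4 are rejected because two boundary rows (B and D) precede them, so the count is 2 rather than 1.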

Here is an XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:template match="/">
  <xsl:copy-of select=
  "/*/*/tr[@class='theClass'
         and
           following-sibling::*[1][self::tr[not(@*)] ]
           ][1]
             /following-sibling::tr
               [not(@*)
              and
                count(preceding-sibling::tr
                       [@class='theClass'
                      and
                        following-sibling::*[1][self::tr[not(@*)] ]
                       ]
                     )
                = 1
               ]
  "/>
 </xsl:template>
</xsl:stylesheet>

When applied to the second provided XML document:

<table id="theTable">
    <tbody>
        <tr class="theClass">A</tr>
        <tr class="theClass">B</tr>
        <tr>1</tr>
        <tr>2</tr>
        <tr>3</tr>
        <tr>4</tr>
        <tr>5</tr>
        <tr>6</tr>
    </tbody>
</table>

the wanted, correctly selected elements are again output:

<tr>1</tr>
<tr>2</tr>
<tr>3</tr>
<tr>4</tr>
<tr>5</tr>
<tr>6</tr>

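If you don't have to use XPath, some LINQ may be easier to get right and more readable.

In your case, a combination of SkipWhile, Skip and TakeWhile along these lines could work (it needs using System.Linq, and it assumes the marker rows can be recognized by their text, B and C):

nav.Select("//table[@id='theTable']/tbody/tr") // all tr rows, in document order
   .Cast<XPathNavigator>()                      // XPathNodeIterator is a non-generic IEnumerable
   .SkipWhile(tr => tr.Value != "B")            // drop everything before the first marker
   .Skip(1)                                     // drop the first marker itself
   .TakeWhile(tr => tr.Value != "C");           // keep rows until the second marker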
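
The same idea can be adapted to the second question (the first group after the class="theClass" rows, whatever the rows contain). This is a minimal sketch under stated assumptions: the class name LinqFirstGroup, the method FirstGroupValues, and the inline sample document are hypothetical, and rows are classified only by their class attribute.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.XPath;

static class LinqFirstGroup
{
    // Shortened sample document (assumption: fewer rows than the original).
    public const string Xml = @"<table id='theTable'><tbody>
        <tr class='theClass'>A</tr><tr class='theClass'>B</tr><tr>1</tr><tr>2</tr>
        <tr class='theClass'>C</tr><tr class='theClass'>D</tr><tr>3</tr><tr>4</tr>
    </tbody></table>";

    // GetAttribute returns an empty string when the attribute is absent.
    static bool IsMarker(XPathNavigator tr)
    {
        return tr.GetAttribute("class", "") == "theClass";
    }

    public static List<string> FirstGroupValues(string xml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        return nav.Select("//table[@id='theTable']/tbody/tr")
                  .Cast<XPathNavigator>()
                  .SkipWhile(tr => !IsMarker(tr)) // skip any rows before the first marker row
                  .SkipWhile(IsMarker)            // skip the leading run of marker rows
                  .TakeWhile(tr => !IsMarker(tr)) // keep rows until the next marker row
                  .Select(tr => tr.Value)         // read the text while still positioned on the row
                  .ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", FirstGroupValues(Xml))); // prints 1,2
    }
}
```

The values are projected inside the lazy pipeline, so no navigator has to be kept around after the iterator has moved on to the next row.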