C# error: Unrecognized grouping construct

Keywords: unrecognized, error | Last updated: 2023-09-27 18:36:46

I need some help. Why am I getting an unhandled ArgumentException? The error message says "Unrecognized grouping construct". Is my pattern wrong?

    WebClient client = new WebClient();
    string contents = client.DownloadString("http://site.com");
    string pattern = @"<td>\s*(?<no>\d+)\.\s*</td>\s*<td>\s*
                        <a class=""LN"" href=""[^""]*+"" 
                        onclick=""[^""]*+"">\s*+<b>(?<name>[^<]*+)
                        </b>\s*+</a>.*\s*</td>\s*+ 
                        <td align=""center"">[^<]*+</td>
                        \s*+<td>\s*+(?<locations>(?:<a href=""[^""]*+"">[^<]*+</a><br />\s*+)++)</td>";
    foreach (Match match in Regex.Matches(contents, pattern, RegexOptions.IgnoreCase))
    {
        string no = match.Groups["no"].Value;
        string name = match.Groups["name"].Value;
        string locations = match.Groups["locations"].Value;
        Console.WriteLine(no + " " + name + " " + locations);
    }

There is no such thing as ?P<name> in C#/.NET. The equivalent syntax is simply ?<name>.

The P named-group syntax comes from PCRE/Python (Perl allows it as an extension).
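
To make the difference concrete, here is a minimal sketch, written as top-level statements (C# 9 or later). The sample input string and the variable names are made up for illustration. It shows the .NET spelling working and the Python/PCRE spelling being rejected when the Regex is constructed.

```csharp
using System;
using System.Text.RegularExpressions;

// .NET spelling of a named group: (?<name>...)
Match m = Regex.Match("<td> 12. </td>", @"<td>\s*(?<no>\d+)\.\s*</td>");
string no = m.Groups["no"].Value;
Console.WriteLine(no);

// The Python/PCRE spelling (?P<name>...) is not accepted by .NET:
// the pattern fails to parse and the constructor throws ArgumentException.
bool pythonSyntaxRejected = false;
try
{
    _ = new Regex(@"(?P<no>\d+)");
}
catch (ArgumentException ex)
{
    pythonSyntaxRejected = true;
    Console.WriteLine(ex.Message);
}
```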

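You will also need to remove all the nested quantifiers (that is, change *+ to * and ++ to +); .NET does not support possessive quantifiers. If you want exactly the same behavior, you can replace X*+ with the atomic group (?>X*), and likewise X++ with (?>X+).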
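
A small sketch of that substitution, again as top-level statements (C# 9 or later). The test strings and variable names are invented for the example. It checks that .NET rejects the possessive form, that the atomic-group form parses and matches, and that an atomic group really does refuse to backtrack.

```csharp
using System;
using System.Text.RegularExpressions;

// Possessive quantifiers such as *+ are rejected by the .NET parser.
bool possessiveRejected = false;
try
{
    _ = new Regex(@"href=""[^""]*+""");
}
catch (ArgumentException)
{
    possessiveRejected = true;
}

// The atomic group (?>...) gives the same no-backtracking behavior.
Match link = Regex.Match(@"<a href=""/shops/42"">", @"href=""(?<url>(?>[^""]*))""");
string url = link.Groups["url"].Value;
Console.WriteLine(url);

// Difference from a plain greedy quantifier:
// a* can give back one 'a' so the trailing a matches; (?>a*) cannot.
bool greedyMatches = Regex.IsMatch("aaa", @"^a*a$");
bool atomicMatches = Regex.IsMatch("aaa", @"^(?>a*)a$");
Console.WriteLine(greedyMatches + " " + atomicMatches);
```

For patterns like [^"]* followed by a quote, the plain and atomic versions match the same text. The atomic form only saves pointless backtracking when the overall match fails.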

Here is your regex, modified. I have also tried to comment it, but I can't guarantee that I did so without breaking it.

new Regex(
@"<td>                   # a td element
    \s*(?<no>\d+)\.\s*   # containing a number captured as 'no'
  </td>\s*
  <td>\s*                # followed by another td, containing
                         # an <a class=LN href=... onclick=...> exactly;
                         # [ ] is a literal space, since bare spaces are ignored here
      <a[ ]class=""LN""[ ]href=""(?>[^""]*)""[ ]onclick=""(?>[^""]*)"">
         (?>\s*)                   # which contains
         <b>(?<name>(?>[^<]*))</b> # some text in bold captured as 'name'
         (?>\s*)
      </a>
      .*                 # then any other characters (. does not cross newlines here)
      \s*
  </td>                  # the end of a td, followed by whitespace
  (?>\s*)
  <td[ ]align=""center"">  # then a <td align=center> containing no other elements
    (?>[^<]*)
  </td>
  (?>\s*)
  <td>                   # lastly
    (?>\s*)
    (?<locations>        # a series of <a href=...>...</a><br />
        (?>(?:           # captured as 'locations'
            <a[ ]href=""(?>[^""]*)"">(?>[^<]*)</a>
            <br[ ]/>
            (?>\s*)
            )
        +))              # (containing at least one of these)
  </td>", RegexOptions.IgnorePatternWhitespace|RegexOptions.IgnoreCase)
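
One thing to watch with RegexOptions.IgnorePatternWhitespace: unescaped spaces in the pattern are dropped, so a literal space inside a tag has to be written explicitly, for example as [ ] or \s+. The sketch below (top-level statements, C# 9 or later; the sample HTML string and variable names are invented) shows the effect on a single tag.

```csharp
using System;
using System.Text.RegularExpressions;

string html = @"<td align=""center"">x</td>";
RegexOptions opts = RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase;

// The bare space is ignored, so this pattern really means <tdalign="center">.
bool bareSpace = Regex.IsMatch(html, @"<td align=""center"">", opts);

// Whitespace inside a character class is kept, so [ ] matches one literal space.
bool bracketedSpace = Regex.IsMatch(html, @"<td[ ]align=""center"">", opts);

// \s+ also works and tolerates tabs, newlines or repeated spaces in the markup.
bool whitespaceClass = Regex.IsMatch(html, @"<td\s+align=""center"">", opts);

Console.WriteLine(bareSpace + " " + bracketedSpace + " " + whitespaceClass);
```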

But you really should be using something like the HTML Agility Pack.