C# error: Unrecognized grouping construct
Keywords: recognize, error | Last updated: 2023-09-27 18:36:46
I need help. Why am I getting an unhandled ArgumentException? The error says "Unrecognized grouping construct". Is my pattern wrong?
WebClient client = new WebClient();
string contents = client.DownloadString("http://site.com");
string pattern = @"<td>\s*(?<no>\d+)\.\s*</td>\s*<td>\s*
<a class=""LN"" href=""[^""]*+""
onclick=""[^""]*+"">\s*+<b>(?<name>[^<]*+)
</b>\s*+</a>.*\s*</td>\s*+
<td align=""center"">[^<]*+</td>
\s*+<td>\s*+(?<locations>(?:<a href=""[^""]*+"">[^<]*+</a><br />\s*+)++)</td>";
foreach (Match match in Regex.Matches(contents, pattern, RegexOptions.IgnoreCase))
{
    string no = match.Groups["no"].Value;
    string name = match.Groups["name"].Value;
    string locations = match.Groups["locations"].Value;
    Console.WriteLine(no + " " + name + " " + locations);
}
There is no such thing as ?P<name> in C#/.NET. The equivalent syntax is just ?<name>. The P named-group syntax comes from PCRE/Python (Perl allows it as an extension).
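You will also need to remove all the nested quantifiers, i.e. change *+ to * and ++ to +. These are possessive quantifiers, which .NET does not support and reports as nested quantifiers. If you want exactly the same behavior, you can switch X*+ to the atomic group (?>X*), and likewise X++ to (?>X+).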
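As a minimal sketch of the two substitutions described above, the following uses a made-up HTML fragment (not the asker's page) and a hypothetical helper named Extract. It shows the .NET named-group syntax and atomic groups standing in for possessive quantifiers, and it confirms that the ?P<name> form is rejected with an ArgumentException.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: returns "<no> <name>" for the first match, or null.
    public static string Extract(string html)
    {
        // (?<no>...) is the .NET named-group syntax (no 'P').
        // (?>\s*) and (?>[^<]*) are atomic groups, replacing \s*+ and [^<]*+.
        string pattern =
            @"<td>(?>\s*)(?<no>\d+)\.(?>\s*)</td>(?>\s*)<td><b>(?<name>(?>[^<]*))</b>";

        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["no"].Value + " " + m.Groups["name"].Value : null;
    }

    // Returns true if the .NET regex engine refuses the pattern at construction time.
    public static bool IsRejected(string pattern)
    {
        try
        {
            new Regex(pattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Sample input invented for illustration only.
        string html = "<td> 12. </td><td><b>Alice</b></td>";
        Console.WriteLine(Extract(html));

        // The Python/PCRE-style named group is not valid in .NET.
        Console.WriteLine(IsRejected(@"(?P<no>\d+)"));
    }
}
```

Atomic groups give up their backtracking positions once matched, which is the same guarantee a possessive quantifier provides, so the rewritten pattern should match the same inputs as the original was intended to.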
Here is your regex, modified. I have also tried to comment it, but I can't guarantee that I didn't break it in the process.
Regex regex = new Regex(
    @"<td>                          # a td element
      \s*(?<no>\d+)\.\s*            # containing a number captured as 'no'
      </td>\s*
      <td>\s*                       # followed by another td, containing
                                    # an <a class=... href=... onclick=...> exactly
                                    # (literal spaces are written as '\ ' because
                                    # IgnorePatternWhitespace drops unescaped ones)
      <a\ class=""LN""\ href=""(?>[^""]*)""\ onclick=""(?>[^""]*)"">
      (?>\s*)                       # which contains
      <b>(?<name>(?>[^<]*))</b>     # some text in bold captured as 'name'
      (?>\s*)
      </a>
      .*                            # and, later on (same line only: Singleline is not set)
      \s*
      </td>                         # the end of a td, followed by whitespace
      (?>\s*)
      <td\ align=""center"">        # after a <td align=center> containing no other elements
      (?>[^<]*)
      </td>
      (?>\s*)
      <td>                          # lastly
      (?>\s*)
      (?<locations>                 # a series of <a href=...>...</a><br/>
        (?>(?:                      # captured as 'locations'
          <a\ href=""(?>[^""]*)"">(?>[^<]*)</a>
          <br\ />
          (?>\s*)
        )
      +))                           # (containing at least one of these)
      </td>",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
But you really should use something like the HTML Agility Pack.
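For comparison, here is a rough sketch of the same extraction with the HTML Agility Pack. It assumes the HtmlAgilityPack NuGet package is referenced, and the sample row and XPath expressions are guesses based on the markup the regex targets, not the real page.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class AgilityPackDemo
{
    public static void Main()
    {
        // Hypothetical row shaped like the markup the regex above expects.
        string html =
            "<table><tr>" +
            "<td> 1. </td>" +
            "<td><a class=\"LN\" href=\"/x\" onclick=\"f()\"><b>Alice</b></a></td>" +
            "<td align=\"center\">n/a</td>" +
            "<td><a href=\"/a\">Paris</a><br /><a href=\"/b\">Rome</a><br /></td>" +
            "</tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//tr[td/a[@class='LN']]");
        if (rows == null) return;

        foreach (HtmlNode row in rows)
        {
            // Assumed column layout: number, name link, centered cell, location links.
            string no = row.SelectSingleNode("td[1]")?.InnerText.Trim().TrimEnd('.');
            string name = row.SelectSingleNode("td/a[@class='LN']/b")?.InnerText.Trim();
            var links = row.SelectNodes("td[4]/a");
            string locations = links == null
                ? ""
                : string.Join(", ", links.Select(a => a.InnerText.Trim()));

            Console.WriteLine(no + " " + name + " " + locations);
        }
    }
}
```

Because the parser builds a document tree, attribute order, extra whitespace, and line breaks in the markup no longer matter, which is where the regex version is most fragile.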