Extracting parts of a URL into named groups with a regular expression

Keywords: extract, regular expression, URL | Updated: 2023-09-27 18:15:53

I'm trying to extract parts of a URL into named groups using .NET Regex.

Examples:

/find/products/
/find/products/test/
/find/products/test/with/
/find/products/test/with/lids/
/find/products/test/page/3/
/find/products/test/with/lids/page/3/

The regex should produce:

Query: Test
Subset: Lids
Page: 3

or null, depending on the URL. I want named groups so that I can extract the values dynamically later.

My attempt:

^/find/products/(?<Query>\w*)?
(?<SubsQuery>/with/(?<Subset>\w*)?/)?
(?<PageQuery>/page/(?<Page>\d)?/)?
$

Results on the examples:

/find/products/ (matches)
/find/products/test/ (doesn't match)
/find/products/test/with/ (doesn't match)
/find/products/test/with/lids/ (matches)
/find/products/test/page/3/ (matches)
/find/products/test/with/lids/page/3/ (doesn't match)
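To reproduce the table above, this sketch runs the asker's pattern (with `\w` and `\d` restored) against the six sample URLs. It is a minimal C# top-level program; the variable names are illustrative, and `RegexOptions.IgnorePatternWhitespace` is assumed because the pattern is written over several lines.

```csharp
using System;
using System.Text.RegularExpressions;

// The asker's pattern. IgnorePatternWhitespace makes the regex engine
// ignore the line breaks and indentation inside the verbatim string.
var attempt = new Regex(@"^/find/products/(?<Query>\w*)?
    (?<SubsQuery>/with/(?<Subset>\w*)?/)?
    (?<PageQuery>/page/(?<Page>\d)?/)?
    $", RegexOptions.IgnorePatternWhitespace);

string[] urls =
{
    "/find/products/",
    "/find/products/test/",
    "/find/products/test/with/",
    "/find/products/test/with/lids/",
    "/find/products/test/page/3/",
    "/find/products/test/with/lids/page/3/",
};

// Record whether each sample URL matches the whole pattern.
bool[] results = Array.ConvertAll(urls, u => attempt.IsMatch(u));

for (int i = 0; i < urls.Length; i++)
    Console.WriteLine($"{urls[i]} {(results[i] ? "matches" : "doesn't match")}");
```

In every failing case a `/` is either required twice (as the end of one part and the start of the next) or left over with no group to consume it.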

So I'm obviously missing an optional ?:() somewhere, but I can't see where. I think I've had too much regex for one day :)

Any help would be greatly appreciated.


Try this:

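using System.Text.RegularExpressions;

string str = "/find/products/test/with/lids/page/3/"; // sample input
// IgnorePatternWhitespace lets the pattern span several lines.
Match result = Regex.Match(str, @"^/find/products/(?<Query>\w*)?/?
    (?<SubsQuery>with/(?<Subset>\w*))?/?
    (?<PageQuery>page/(?<Page>\d)?/)?
    $",
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);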

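The problem is the trailing slash in, for example, "/find/products/test/": your pattern only accepts that slash as the first character of the next group, so when that group is absent, nothing matches it.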
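The sketch below checks the first answer's pattern on two of the samples and adds the "or null" part of the question. One subtlety: on a URL like `/find/products/test/with/`, `Subset` succeeds with an empty string, so the hypothetical helper `ValueOrNull` (an illustration, not code from the answer) reports both a non-participating group and an empty capture as `null`.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// The first answer's pattern.
var pattern = new Regex(@"^/find/products/(?<Query>\w*)?/?
    (?<SubsQuery>with/(?<Subset>\w*))?/?
    (?<PageQuery>page/(?<Page>\d)?/)?
    $", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

// Hypothetical helper: a group that did not take part in the match,
// or that captured an empty string, is reported as null.
static string? ValueOrNull(Match m, string name)
{
    Group g = m.Groups[name];
    return g.Success && g.Value.Length > 0 ? g.Value : null;
}

Match full = pattern.Match("/find/products/test/with/lids/page/3/");
Match partial = pattern.Match("/find/products/test/");

Console.WriteLine($"{ValueOrNull(full, "Query")} {ValueOrNull(full, "Subset")} {ValueOrNull(full, "Page")}");
Console.WriteLine(ValueOrNull(partial, "Subset") ?? "null");
```

Note that `\d` matches a single digit, so a URL such as `/page/12/` will not match this pattern as written.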

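Your problem is that your regex has too many slashes (/): one at the end of a part, and another at the start of the next part. The simplest fix is to put the slash only at the end of each part (the pattern is split over several lines, so use RegexOptions.IgnorePatternWhitespace or join the lines into one):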

^/find/products/(?<Query>\w*/)?
(?<SubsQuery>with/(?<Subset>\w*/)?)?
(?<PageQuery>page/(?<Page>\d/)?)?
$
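As a quick check of this slash-at-the-end pattern, the sketch below runs it on the longest sample and prints the captured `Query`. The `string.TrimEnd('/')` call is shown only as a simple alternative way to drop the slash; it is not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Each part owns its trailing slash, so no slash is required twice.
var slashAtEnd = new Regex(@"^/find/products/(?<Query>\w*/)?
    (?<SubsQuery>with/(?<Subset>\w*/)?)?
    (?<PageQuery>page/(?<Page>\d/)?)?
    $", RegexOptions.IgnorePatternWhitespace);

Match m = slashAtEnd.Match("/find/products/test/with/lids/page/3/");

// The captured values still carry their slash.
string query = m.Groups["Query"].Value;
Console.WriteLine(query);              // test/
Console.WriteLine(query.TrimEnd('/')); // test
```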

Of course, this puts the slashes inside the named groups. To keep them out, wrap each part in an extra unnamed group:

^/find/products/((?<Query>\w*)/)?
(?<SubsQuery>with/((?<Subset>\w*)/)?)?
(?<PageQuery>page/((?<Page>\d)/)?)?
$
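One way to meet the "extract it dynamically later" requirement with this final pattern is sketched below. It relies on .NET's `Regex.GetGroupNames()` to enumerate the groups and on `RegexOptions.ExplicitCapture`, so the extra unnamed groups are not captured and do not appear as `"1"`, `"2"`, ... in the name list. Two assumptions go beyond the answer: `\d` is widened to `\d+` so that multi-digit pages match, and `Extract` is a hypothetical helper name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Final pattern from the answer, with \d widened to \d+ (assumption).
// ExplicitCapture: only the named groups are captured.
var route = new Regex(@"^/find/products/((?<Query>\w*)/)?
    (?<SubsQuery>with/((?<Subset>\w*)/)?)?
    (?<PageQuery>page/((?<Page>\d+)/)?)?
    $",
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

// Hypothetical helper: map every named group to its value, or to null if
// that group did not match. Returns null when the URL does not match at all.
Dictionary<string, string?>? Extract(string url)
{
    Match m = route.Match(url);
    if (!m.Success) return null;
    var values = new Dictionary<string, string?>();
    foreach (string name in route.GetGroupNames())
    {
        if (name == "0") continue; // group 0 is the whole match
        Group g = m.Groups[name];
        values[name] = g.Success ? g.Value : null;
    }
    return values;
}

var parts = Extract("/find/products/test/with/lids/page/12/");
if (parts != null)
{
    foreach (KeyValuePair<string, string?> kv in parts)
        Console.WriteLine($"{kv.Key}: {kv.Value ?? "null"}");
}
```

Because each wrapping group makes its named group optional as a whole, a missing part comes back as `null` without any extra empty-string check.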