Regex: problem with the greater-than sign in C#


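I am trying to extract one specific tag from HTML. (I have read on this site that you should not try to parse HTML with regular expressions, but I only need one specific tag, and it follows a very specific format.)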

This is the regular expression (tested in Expresso), which works perfectly there:

(?<ExternalSource2>\<eds2[\s.]+url\=\"?(?<Url>[\w\./:\?=&\+%\d_-]+)\"?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)

The problem appears when I try to use it in C#:

        Regex re = new Regex(@"(?<ExternalSource2>\<eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)");
        string Input = @"width: 662px; height: 60px; vertical-align: middle""><eds2 url=""http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147"">PlaceLogo</eds2></td></tr></tbody></table><table style=""width: 662px; border-collapse: collapse""><tbod";
        foreach (Match m in re.Matches(Input)) {
            HttpContext.Current.Response.Write(string.Format("Match : {0}<br/>", m));
            short i = 0;
            foreach (Group g in m.Groups) {
                HttpContext.Current.Response.Write(string.Format("Group {0} = {1}<br/>", i++, g.Value));
            }
            HttpContext.Current.Response.Write("<br/><br/>");
        }

This produces the following result:

Match : PlaceLogo
Group 0 = PlaceLogo
Group 1 = PlaceLogo
Group 2 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147
Group 3 = PlaceLogo

This is not at all what I expected.

However, with the following pattern the result is closer to what I expect (but still not quite right):

    Regex re = new Regex(@"eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>");

Result:

Match : eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo
Group 0 = eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo
Group 1 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147
Group 2 = PlaceLogo

The expected output is:

Match : <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 0 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 1 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 2 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147
Group 3 = PlaceLogo

Thanks for your help.


I can't reproduce your problem with your sample code. It produces the following output:

Match : <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 0 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 1 = <eds2 url="http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147">PlaceLogo</eds2>
Group 2 = http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147
Group 3 = PlaceLogo

Please clarify your question.

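Update:
I think your problem is this: you write the match result directly to the response stream without escaping it. That means the browser interprets it as HTML rather than as the text you want. It treats <eds2 ...> as an unknown tag and renders only its inner text, which is why you see just "PlaceLogo".
You should change your code to: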

Regex re = new Regex(@"(?<ExternalSource2>\<eds2[\s.]+url\=\""?(?<Url>[\w\./:\?=&\+%\d_-]+)\""?[\s.]*\>(?<Text>[\s.]*[\w\s\d]*)\</eds2\>)");
string Input = @"width: 662px; height: 60px; vertical-align: middle""><eds2 url=""http://www.someurl.co.uk/_modules/system/Newsletter.aspx?Username=TBO&Password=N5TBO2&TagID=PlaceLogo&TownID=147"">PlaceLogo</eds2></td></tr></tbody></table><table style=""width: 662px; border-collapse: collapse""><tbod";
foreach (Match m in re.Matches(Input))
{
    // HtmlEncode expects a string, so pass m.Value rather than the Match object itself.
    HttpContext.Current.Response.Write(string.Format("Match : {0}<br/>",
                                                     Server.HtmlEncode(m.Value)));
    short i = 0;
    foreach (Group g in m.Groups)
    {
        HttpContext.Current.Response
                           .Write(string.Format("Group {0} = {1}<br/>", i++, 
                                                Server.HtmlEncode(g.Value)));
    }
    HttpContext.Current.Response.Write("<br/><br/>");
}
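
To see the effect outside a web page, here is a minimal console sketch. It rests on several assumptions that are not part of the original post: the class name EncodeDemo, a simplified pattern, a shortened sample URL, and System.Net.WebUtility.HtmlEncode standing in for Server.HtmlEncode (which needs a running ASP.NET context).

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class EncodeDemo
{
    // Simplified, hypothetical pattern for this sketch only (not the pattern from the question).
    private static readonly Regex TagPattern =
        new Regex(@"<eds2\s+url=""(?<Url>[^""]+)""\s*>(?<Text>[^<]*)</eds2>");

    // Returns the raw text of the first match, or an empty string if nothing matches.
    public static string FirstMatch(string input)
    {
        Match m = TagPattern.Match(input);
        return m.Success ? m.Value : string.Empty;
    }

    // Returns the same text with HTML-significant characters replaced by entities.
    public static string EncodedMatch(string input)
    {
        return WebUtility.HtmlEncode(FirstMatch(input));
    }

    public static void Main()
    {
        // Shortened version of the sample input; the query string is trimmed for readability.
        string input = @"<td><eds2 url=""http://www.someurl.co.uk/_modules/system/Newsletter.aspx?TagID=PlaceLogo&TownID=147"">PlaceLogo</eds2></td>";

        Console.WriteLine(FirstMatch(input));   // raw markup, as the regex matched it
        Console.WriteLine(EncodedMatch(input)); // entity-encoded, safe to write into a page
    }
}
```

The first line printed is the raw tag. Written into an HTML page, a browser would hide that tag and show only PlaceLogo. The second line starts with &lt;eds2, which a browser displays as the literal text <eds2, so the whole match becomes visible.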