多行正则表达式确实选择了太多或太少的东西

本文关键字:太多 正则表达式 选择 | 更新日期: 2023-09-27 18:33:40

我正在尝试编写一个正则表达式来"解析"某种日志文件。结构如下所示:

2013-09-05 00:01:14.5726 WEB Info [n/a: UPN Claim] New instance of service created.
2013-09-05 00:01:14.6038 WEB Info [n/a: UPN Claim] 
---------------- [ Ping received ] -------------
CurrentPrincipel has Claims:
Claim Type: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name with Value: GROUP'User
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/primarysid with Value: S-1-5-21-36134387-561137642-176895030-23737
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/primarygroupsid with Value: S-1-5-21-36134387-561137642-174895030-513
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-513
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-1-0
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-32-545
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-2
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-11
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-15
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-326895030-16415
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-1732895030-31127
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176235030-12815
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-12145
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176895430-31228
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-16100
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-64-10
2013-09-05 00:01:15.0406 WEB Info [n/a: UPN Claim] New instance of service created.
2013-09-05 00:01:15.0718 WEB Info [n/a: UPN Claim] GetLangugesChanges invoked for rowVersion 1 and 8 clientChanges.

我喜欢用正则表达式实现的是获得 4 个日志条目匹配项(每个条目 = 1 个匹配项)。我尝试了以下正则表达式,但没有前导时间戳就无法获取行:

(^'d{4}-'d{2}-'d{2}(.|^|$)*(?=>^'d{4})*)

正则表达式的调用方式如下:

string input = File.ReadAllText(@"log.txt");
MatchCollection matches = Regex.Matches(input, @"(^'d{4}-'d{2}-'d{2}(.|^|$)*(?=>^'d{4})*)", RegexOptions.Multiline);
foreach (Match match in matches)
{
    Console.WriteLine("{0}", match.Value);
    Console.WriteLine("------");
}

给出的输出是:

2013-09-05 00:01:14.5726 WEB Info [n/a: UPN Claim] New instance of service created. ------
2013-09-05 00:01:14.6038 WEB Info [n/a: UPN Claim]  ------
2013-09-05 00:01:15.0406 WEB Info [n/a: UPN Claim] New instance of service created. ------
2013-09-05 00:01:15.0718 WEB Info [n/a: UPN Claim] GetLangugesChanges invoked for rowVersion 1 and 8 clientChanges.
------

预期的输出将是:

2013-09-05 00:01:14.5726 WEB Info [n/a: UPN Claim] New instance of service created.
------
2013-09-05 00:01:14.6038 WEB Info [n/a: UPN Claim] 
---------------- [ Ping received ] -------------
CurrentPrincipel has Claims:
Claim Type: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name with Value: GROUP'User
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/primarysid with Value: S-1-5-21-36134387-561137642-176895030-23737
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/primarygroupsid with Value: S-1-5-21-36134387-561137642-174895030-513
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-513
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-1-0
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-32-545
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-2
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-11
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-15
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-326895030-16415
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-1732895030-31127
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176235030-12815
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-12145
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176895430-31228
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-16100
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-64-10
------
2013-09-05 00:01:15.0406 WEB Info [n/a: UPN Claim] New instance of service created.
------
2013-09-05 00:01:15.0718 WEB Info [n/a: UPN Claim] GetLangugesChanges invoked for rowVersion 1 and 8 clientChanges.
------

我做错了什么?我感谢任何帮助!

多行正则表达式确实选择了太多或太少的东西

您可以尝试匹配两个日期(或文件末尾)之间的所有内容,如下所示:

^'d{4}-'d{2}-'d{2}.*?(?=^'d{4}-'d{2}-'d{2}|(?!.))

将其与RegexOptions.Multiline | RegexOptions.Singleline一起使用。

另一种选择是使用正则表达式拆分字符串:

var regex = new Regex(@"^'d{4}-'d{2}-'d{2}", RegexOptions.Multiline);
string[] result = regex.Split(input);

如果我正确理解了您的问题,这应该会有所帮助。

(?<='d{4}-'d{2}-'d{2} 'd{2}:'d{2}:'d{2}.'d{4} )(.*)

例:

 2013-09-05 00:01:14.6038 VSWEB04 Info [n/a: UPN Claim] 

匹配:

VSWEB04 Info [n/a: UPN Claim]