Parsing a log file using C# and regular expressions

Keywords: log, file, regular expression, using | Updated: 2023-09-27 18:22:14

I have a large log file that looks like the three-line sample below.

\LogFiles\W3SVC1\u_ex12.log:32:2015-01-04 07:11:22 &actor=%7B%22name%22%3A%5B%22Smith%2C%20Steve%22%5D%2C%22mbox%22%3A%5B%22mailto%3ASmith.Steve%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2015-06-08 02:04:13 &actor=%7B%22name%22%3A%5B%22Brown%2C%20Bob%22%5D%2C%22mbox%22%3A%5B%22mailto%3ABrown.Bob%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&

I need to extract the date, name, and mailto fields buried in each line of the log file.

I tried an online regex generator, but this is as far as I got before it became unwieldy.

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
  class Program
  {
    static void Main(string[] args)
    {
      // Test string (verbatim literal, so the backslashes need no escaping)
      string txt = @"\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&";

      string re1 = ".*?"; // Non-greedy match on filler
      string re2 = "((?:(?:[1]{1}\\d{1}\\d{1}\\d{1})|(?:[2]{1}\\d{3}))[-:\\/.](?:[0]?[1-9]|[1][012])[-:\\/.](?:(?:[0-2]?\\d{1})|(?:[3][01]{1})))(?![\\d])"; // YYYYMMDD 1

      Regex r = new Regex(re1 + re2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
      Match m = r.Match(txt);
      if (m.Success)
      {
        // Group 1 holds the date, e.g. 2014-08-02
        string yyyymmdd1 = m.Groups[1].ToString();
        Console.Write("(" + yyyymmdd1 + ")" + "\n");
      }
      Console.ReadLine();
    }
  }
}

Is there a way to do this in C#, with or without regex?

Thanks!


Assuming you go with a regular expression and every line follows this general form, something like this should work:

(?m)^\S+:(?<Date>\d+-\d+-\d+)\s(?:(?!&actor=).)+&actor=(?:%[0-9a-fA-F]{2})*name(?:%[0-9a-fA-F]{2})*(?<LastName>(?:(?!%[0-9a-fA-F]{2}|mbox).)+)(?:%[0-9a-fA-F]{2})+(?<FirstName>(?:(?!%[0-9a-fA-F]{2}|mbox).)*)(?:%[0-9a-fA-F]{2})*mbox(?:%[0-9a-fA-F]{2})+mailto(?:%[0-9a-fA-F]{2})+(?<MailUser>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+(?<MailDomain>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+&

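It uses the multiline modifier (?m), written inline as a modifier group at the start of the regex, so that ^ matches at the beginning of every line rather than only at the start of the input.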
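To illustrate what the (?m) modifier changes, here is a minimal sketch. The class name MultilineDemo, the helper CountMatches, and the shortened three-line input are made up for this illustration; only the Regex calls come from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    // Counts how many times the pattern matches anywhere in the input.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // Three shortened stand-ins for log lines, separated by newlines.
        string text = "a:2015-01-04 x\nb:2015-06-08 y\nc:2014-08-02 z";
        string datePattern = @"^\S+:\d+-\d+-\d+";

        // Without (?m), ^ only matches at the very start of the input.
        Console.WriteLine(CountMatches(datePattern, text));          // 1

        // With (?m), ^ also matches right after each newline.
        Console.WriteLine(CountMatches("(?m)" + datePattern, text)); // 3
    }
}
```

Passing RegexOptions.Multiline to the Regex constructor should have the same effect as the inline (?m) group.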

Formatted:

 (?m)
 ^ 
 \S+ 
 :
 (?<Date>                            #_(1 start)         
      \d+ 
      -
      \d+ 
      -
      \d+ 
 )                                   #_(1 end)         
 \s 
 (?:
      (?! &actor= )
      . 
 )+
 &actor=
 (?: % [0-9a-fA-F]{2} )*
 name
 (?: % [0-9a-fA-F]{2} )*
 (?<LastName>                        #_(2 start)         
      (?:
           (?! % [0-9a-fA-F]{2} | mbox )
           . 
      )+
 )                                   #_(2 end)         

 (?: % [0-9a-fA-F]{2} )+
 (?<FirstName>                       #_(3 start)         
      (?:
           (?! % [0-9a-fA-F]{2} | mbox )
           . 
      )*
 )                                   #_(3 end)         
 (?: % [0-9a-fA-F]{2} )*
 mbox
 (?: % [0-9a-fA-F]{2} )+
 mailto

 (?: % [0-9a-fA-F]{2} )+

 (?<MailUser>                        #_(4 start)         
      (?:
           (?! % [0-9a-fA-F]{2} )
           . 
      )+
 )                                   #_(4 end)         
 (?: % [0-9a-fA-F]{2} )+

 (?<MailDomain>                      #_(5 start)         
      (?:
           (?! % [0-9a-fA-F]{2} )
           . 
      )+
 )                                   #_(5 end)         
 (?: % [0-9a-fA-F]{2} )+
 &
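The formatted layout can also be used directly in code. The sketch below assumes the RegexOptions.IgnorePatternWhitespace option, which makes the engine skip unescaped whitespace in the pattern and treat # as the start of a comment. The class name FreeSpacingDemo and the method Describe are hypothetical; the pattern is a lightly condensed copy of the one above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FreeSpacingDemo
{
    // The answer's pattern, kept in a readable multi-line layout.
    const string Pattern = @"
        ^ \S+ :
        (?<Date> \d+ - \d+ - \d+ )
        \s
        (?: (?! &actor= ) . )+
        &actor=
        (?: % [0-9a-fA-F]{2} )* name (?: % [0-9a-fA-F]{2} )*
        (?<LastName>   (?: (?! % [0-9a-fA-F]{2} | mbox ) . )+ )   # text up to the next %XX escape
        (?: % [0-9a-fA-F]{2} )+
        (?<FirstName>  (?: (?! % [0-9a-fA-F]{2} | mbox ) . )* )
        (?: % [0-9a-fA-F]{2} )* mbox (?: % [0-9a-fA-F]{2} )+ mailto
        (?: % [0-9a-fA-F]{2} )+
        (?<MailUser>   (?: (?! % [0-9a-fA-F]{2} ) . )+ )
        (?: % [0-9a-fA-F]{2} )+
        (?<MailDomain> (?: (?! % [0-9a-fA-F]{2} ) . )+ )
        (?: % [0-9a-fA-F]{2} )+
        &";

    static readonly Regex Rx = new Regex(
        Pattern, RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

    // Returns "date  last, first    user@domain", or null if the line does not match.
    public static string Describe(string line)
    {
        Match m = Rx.Match(line);
        if (!m.Success) return null;
        return string.Format("{0}  {1}, {2}    {3}@{4}",
            m.Groups["Date"].Value, m.Groups["LastName"].Value, m.Groups["FirstName"].Value,
            m.Groups["MailUser"].Value, m.Groups["MailDomain"].Value);
    }

    public static void Main()
    {
        string line = @"\LogFiles\W3SVC1\u_ex12.log:32:2015-01-04 07:11:22 &actor=%7B%22name%22%3A%5B%22Smith%2C%20Steve%22%5D%2C%22mbox%22%3A%5B%22mailto%3ASmith.Steve%40xyz.com%22%5D%7D&";
        Console.WriteLine(Describe(line)); // 2015-01-04  Smith, Steve    Smith.Steve@xyz.com
    }
}
```

Keeping the pattern in this form makes each capture group easier to adjust than in the single-line version, at the cost of having to escape any literal space or # that the pattern needs.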

Output:

 **  Grp 1 [Date]       -  ( pos 31 , len 10 ) 
2015-01-04  
 **  Grp 2 [LastName]   -  ( pos 80 , len 5 ) 
Smith  
 **  Grp 3 [FirstName]  -  ( pos 91 , len 5 ) 
Steve  
 **  Grp 4 [MailUser]   -  ( pos 133 , len 11 ) 
Smith.Steve  
 **  Grp 5 [MailDomain] -  ( pos 147 , len 7 ) 
xyz.com  
---------------------
 **  Grp 1 [Date]       -  ( pos 197 , len 10 ) 
2015-06-08  
 **  Grp 2 [LastName]   -  ( pos 246 , len 5 ) 
Brown  
 **  Grp 3 [FirstName]  -  ( pos 257 , len 3 ) 
Bob  
 **  Grp 4 [MailUser]   -  ( pos 297 , len 9 ) 
Brown.Bob  
 **  Grp 5 [MailDomain] -  ( pos 309 , len 7 ) 
xyz.com  
----------------------
 **  Grp 1 [Date]       -  ( pos 359 , len 10 ) 
2014-08-02  
 **  Grp 2 [LastName]   -  ( pos 408 , len 8 ) 
Franklin  
 **  Grp 3 [FirstName]  -  ( pos 422 , len 7 ) 
Francis  
 **  Grp 4 [MailUser]   -  ( pos 466 , len 16 ) 
Franklin.Francis  
 **  Grp 5 [MailDomain] -  ( pos 485 , len 7 ) 
xyz.com  

Also, with a slight modification, you can collect them all into CaptureCollection lists
in a single match.

C#

string log =
@"
\LogFiles\W3SVC1\u_ex12.log:32:2015-01-04 07:11:22 &actor=%7B%22name%22%3A%5B%22Smith%2C%20Steve%22%5D%2C%22mbox%22%3A%5B%22mailto%3ASmith.Steve%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2015-06-08 02:04:13 &actor=%7B%22name%22%3A%5B%22Brown%2C%20Bob%22%5D%2C%22mbox%22%3A%5B%22mailto%3ABrown.Bob%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&
sfgbadfbdfbadfbdab
junk .........
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Smith%2C%20Joe%22%5D%2C%22mbox%22%3A%5B%22mailto%3ASmith.Joe%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Doe%2C%20Jane%22%5D%2C%22mbox%22%3A%5B%22mailto%3ADoe.Jane%40xyz.com%22%5D%7D&
";
Regex RxLog = new Regex(@"(?m)(?:^\S+:(?<Date>\d+-\d+-\d+)\s(?:(?!&actor=).)+&actor=(?:%[0-9a-fA-F]{2})*name(?:%[0-9a-fA-F]{2})*(?<LastName>(?:(?!%[0-9a-fA-F]{2}|mbox).)+)(?:%[0-9a-fA-F]{2})+(?<FirstName>(?:(?!%[0-9a-fA-F]{2}|mbox).)*)(?:%[0-9a-fA-F]{2})*mbox(?:%[0-9a-fA-F]{2})+mailto(?:%[0-9a-fA-F]{2})+(?<MailUser>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+(?<MailDomain>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+&\s*|(?:.*\s))+");
Match logMatch = RxLog.Match(log);
if (logMatch.Success)
{
    CaptureCollection ccDate = logMatch.Groups["Date"].Captures;
    CaptureCollection ccLname = logMatch.Groups["LastName"].Captures;
    CaptureCollection ccFname = logMatch.Groups["FirstName"].Captures;
    CaptureCollection ccUser = logMatch.Groups["MailUser"].Captures;
    CaptureCollection ccDomain = logMatch.Groups["MailDomain"].Captures;
    for (int i = 0; i < ccDate.Count; i++)
        Console.WriteLine("{0}  {1}, {2}    {3}@{4}", ccDate[i].Value, ccLname[i].Value, ccFname[i].Value, ccUser[i].Value, ccDomain[i].Value );
}

Output:

2015-01-04  Smith, Steve    Smith.Steve@xyz.com
2015-06-08  Brown, Bob    Brown.Bob@xyz.com
2014-08-02  Franklin, Francis    Franklin.Francis@xyz.com
2014-08-02  Smith, Joe    Smith.Joe@xyz.com
2014-08-02  Doe, Jane    Doe.Jane@xyz.com

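What you can do instead is split the line into parts, URL-decode the query-string part, get the actor parameter, deserialize it into an Actor, and use its properties. A quick example: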

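// Requires: using System; using System.Web; using Newtonsoft.Json;
string txt = @"\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&";
// parts[0] = path and date, parts[1] = time, parts[2] = the "&actor=...&" query string
var parts = txt.Split(' ');
var urlParams = HttpUtility.UrlDecode(parts[2]);
// The actor parameter holds a small JSON object with "name" and "mbox" arrays
string actorJson = HttpUtility.ParseQueryString(urlParams).Get("actor");
Actor actor = JsonConvert.DeserializeObject<Actor>(actorJson);
Console.WriteLine(actor.Name + " " + actor.EmailAddress);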

You need to add references to System.Web and Json.NET for this to work, and of course a definition for the Actor class, such as:

namespace MyNamespace
{
    public class Actor
    {
        public string[] name { get; set; }
        public string[] mbox { get; set; }
        public string Name { get { return name[0]; } }
        public string EmailAddress { get { return mbox[0].Replace("mailto:", ""); } }
    }
}

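Now you just need to use the File class to read all the lines and loop over each of them, putting all the deserialized actors into a List or similar collection.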
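A rough sketch of that loop follows. It makes several assumptions that are not in the answer: it swaps Json.NET for System.Text.Json so that no extra package is needed on .NET Core 3.0 or later, it locates the query string with IndexOf instead of Split(' '), and the names ActorRecord, LogReader, ParseLine, and ReadActors are invented for the example. The log path is taken from the command line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Web;

// Hypothetical stand-in for the Actor class; property names match the JSON keys exactly.
public class ActorRecord
{
    public string[] name { get; set; }
    public string[] mbox { get; set; }
}

public static class LogReader
{
    // Parses one log line; returns null when the line has no usable actor parameter.
    public static ActorRecord ParseLine(string line)
    {
        int start = line.IndexOf("&actor=", StringComparison.Ordinal);
        if (start < 0) return null;

        // ParseQueryString URL-decodes the parameter values itself.
        string actorJson = HttpUtility.ParseQueryString(line.Substring(start)).Get("actor");
        if (string.IsNullOrEmpty(actorJson)) return null;

        try { return JsonSerializer.Deserialize<ActorRecord>(actorJson); }
        catch (JsonException) { return null; } // skip lines whose payload is not valid JSON
    }

    // Streams the file line by line and collects every actor that could be parsed.
    public static List<ActorRecord> ReadActors(string path)
    {
        var actors = new List<ActorRecord>();
        foreach (string line in File.ReadLines(path))
        {
            ActorRecord actor = ParseLine(line);
            if (actor != null) actors.Add(actor);
        }
        return actors;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0) { Console.WriteLine("usage: LogReader <logfile>"); return; }
        foreach (ActorRecord a in ReadActors(args[0]))
        {
            if (a.name == null || a.mbox == null || a.name.Length == 0 || a.mbox.Length == 0) continue;
            Console.WriteLine(a.name[0] + " " + a.mbox[0].Replace("mailto:", ""));
        }
    }
}
```

File.ReadLines reads lazily, which may suit a large log better than File.ReadAllLines, since the whole file never has to sit in memory at once.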