Extract strings from an array of strings given conditions, e.g. into a Class's Properties

Keywords: string, Class, Properties, condition, array, extract | Updated: 2023-09-27 18:01:18

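Use case: I import a text file and need to read content that consists of 4 lines. From those 4 lines I parse out strings that are not predefined but dynamic.

Here is an example I gathered from @zx81:

Input: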

on Apr 28, 2014 at 22:00
an Employee John Doe accessed
server - TPCX123
AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec

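So, given the 4 lines above, I am considering either keeping their carriage returns (i.e. 4 lines) or joining them into a single string (i.e. one line). I will then extract the values and hold them in memory through a class's properties, e.g. ReportDate, ReportTime, EmployeeName, ServerName, AccessType, ReasonType, ProgramId, Start, Length.

Desired output: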

ReportDate = Apr 28, 2014
ReportTime = 22:00
EmployeeName = John Doe
ServerName = TPCX123
AccessType = AccessType2
ReasonType = ReasonType1
ProgramId = Px2x3x
Start = No22
Length = 0.0 sec

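This is what I want: every item found on the RHS of the equals sign, i.e. certain strings assigned to specific properties of an in-memory object, which ultimately correspond to the columns of a database table. In the example above, the EmployeeName value always sits in the same position (between specific strings), so its value, e.g. "John Doe", can be parsed out. Of course, the values will differ for every file I import; that is the dynamic part.

Hope that helps, thanks.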


Given your data, code like this will output what you want:

Output:

ReportDate = Apr 28, 2014
ReportTime = 22:00
EmployeeName = John Doe
ServerName = TPCX123
AccessType = AccessType2
ReasonType = ReasonType1
ProgramId = Px2x3x
Start = No22
Length = 0.0 sec

Code:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Verbatim string: the continuation lines start at column 0 on purpose,
        // so no indentation ends up inside the sample text.
        string s1 = @"on Apr 28, 2014 at 22:00
an Employee John Doe accessed
server - TPCX123
AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";
        try
        {
            // (?s) lets the dot match line breaks, so one pattern spans all four lines.
            var myRegex = new Regex(@"(?s)^on\s+([\w, ]+?) at (\d{2}:\d{2}).*?Employee ([\w ]+) accessed.*?server - (\w+).*?(\w+) was (\w+) - program: (\w+), start: (\w+), (\d+\.\d+ \w+)");
            // Run the match once and read every group from the same result.
            Match m = myRegex.Match(s1);
            if (m.Success)
            {
                Console.WriteLine("ReportDate = " + m.Groups[1].Value);
                Console.WriteLine("ReportTime = " + m.Groups[2].Value);
                Console.WriteLine("EmployeeName = " + m.Groups[3].Value);
                Console.WriteLine("ServerName = " + m.Groups[4].Value);
                Console.WriteLine("AccessType = " + m.Groups[5].Value);
                Console.WriteLine("ReasonType = " + m.Groups[6].Value);
                Console.WriteLine("ProgramId = " + m.Groups[7].Value);
                Console.WriteLine("Start = " + m.Groups[8].Value);
                Console.WriteLine("Length = " + m.Groups[9].Value);
            }
            else
            {
                Console.WriteLine("The input did not match the expected layout.");
            }
        }
        catch (ArgumentException ex)
        {
            // The pattern has a syntax error
            Console.WriteLine(ex.Message);
        }
        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program
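
Since the question asks for the values to end up in a class's properties rather than on the console, here is a minimal sketch of that step. The names AccessReport, AccessReportParser and TryParse are hypothetical (they are not from the question or the code above); only the property names come from the question. It uses the same pattern, with named groups in place of numbered ones so that each capture lines up with its property.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class; the property names are the ones listed in the question.
public class AccessReport
{
    public string ReportDate { get; set; } = "";
    public string ReportTime { get; set; } = "";
    public string EmployeeName { get; set; } = "";
    public string ServerName { get; set; } = "";
    public string AccessType { get; set; } = "";
    public string ReasonType { get; set; } = "";
    public string ProgramId { get; set; } = "";
    public string Start { get; set; } = "";
    public string Length { get; set; } = "";
}

// Hypothetical helper that wraps the regex from the answer.
public static class AccessReportParser
{
    // Same pattern as above, but every capture has a name instead of a number.
    private static readonly Regex Pattern = new Regex(
        @"(?s)^on\s+(?<date>[\w, ]+?) at (?<time>\d{2}:\d{2}).*?Employee (?<name>[\w ]+) accessed" +
        @".*?server - (?<server>\w+).*?(?<access>\w+) was (?<reason>\w+)" +
        @" - program: (?<prog>\w+), start: (?<start>\w+), (?<length>\d+\.\d+ \w+)");

    // Returns false (and an empty report) when the text does not have the expected layout.
    public static bool TryParse(string text, out AccessReport report)
    {
        report = new AccessReport();
        Match m = Pattern.Match(text);
        if (!m.Success)
        {
            return false;
        }

        report.ReportDate = m.Groups["date"].Value;
        report.ReportTime = m.Groups["time"].Value;
        report.EmployeeName = m.Groups["name"].Value;
        report.ServerName = m.Groups["server"].Value;
        report.AccessType = m.Groups["access"].Value;
        report.ReasonType = m.Groups["reason"].Value;
        report.ProgramId = m.Groups["prog"].Value;
        report.Start = m.Groups["start"].Value;
        report.Length = m.Groups["length"].Value;
        return true;
    }
}
```

Calling AccessReportParser.TryParse(s1, out var report) with the sample text should return true, with report.EmployeeName set to John Doe and report.ServerName set to TPCX123. A false return is a cheap way to flag imported files that do not follow the four-line layout.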

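Tweaking it

To tweak it, however, you will have to brush up on your regex.

First, below is a token-by-token explanation of the regex in the code. After that, I suggest you visit the FAQ, RexEgg, and the other sites mentioned in the FAQ.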

@"
(?                 # Use these options for the whole regular expression
   s                  # Dot matches line breaks
)
^                  # Assert position at the beginning of the string
on                 # Match the character string “on” literally (case sensitive)
\s                 # Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line)
   +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                  # Match the regex below and capture its match into backreference number 1
   [\w,\ ]            # Match a single character present in the list below
                         # A “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
                         # A single character from the list “, ”
      +?                 # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
)
\ at\              # Match the character string “ at ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 2
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      {2}                # Exactly 2 times
   :                  # Match the character “:” literally
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      {2}                # Exactly 2 times
)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Employee\          # Match the character string “Employee ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 3
   [\w\ ]             # Match a single character present in the list below
                         # A “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
                         # The literal character “ ”
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ accessed         # Match the character string “ accessed” literally (case sensitive)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
server\ -\         # Match the character string “server - ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 4
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
(                  # Match the regex below and capture its match into backreference number 5
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ was\             # Match the character string “ was ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 6
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ -\ program:\     # Match the character string “ - program: ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 7
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,\ start:\         # Match the character string “, start: ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 8
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,\                 # Match the character string “, ” literally
(                  # Match the regex below and capture its match into backreference number 9
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \.                 # Match the character “.” literally
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \                  # Match the character “ ” literally
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
"
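
If you expect to tweak the pattern often, it may help to keep a commented layout like the one above in the code itself. The sketch below is one way to do that, assuming .NET's RegexOptions.IgnorePatternWhitespace (whitespace in the pattern is ignored and # starts a comment) together with RegexOptions.Singleline in place of (?s). The names FreeSpacingDemo and Extract are hypothetical. Literal spaces are written as [ ] here instead of an escaped space, so that an editor stripping trailing whitespace cannot change the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: the answer's pattern, laid out in free-spacing mode.
public static class FreeSpacingDemo
{
    public static readonly Regex Pattern = new Regex(@"
        ^on\s+
        ([\w, ]+?)              # 1: report date (whitespace inside [...] stays literal)
        [ ]at[ ]
        (\d{2}:\d{2})           # 2: report time
        .*? Employee[ ]
        ([\w ]+)                # 3: employee name
        [ ]accessed
        .*? server[ ]-[ ]
        (\w+)                   # 4: server name
        .*?
        (\w+)                   # 5: access type
        [ ]was[ ]
        (\w+)                   # 6: reason type
        [ ]-[ ]program:[ ]
        (\w+)                   # 7: program id
        ,[ ]start:[ ]
        (\w+)                   # 8: start
        ,[ ]
        (\d+\.\d+[ ]\w+)        # 9: length
        ",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Returns the nine captured values in order, or an empty array when nothing matches.
    public static string[] Extract(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success)
        {
            return new string[0];
        }

        var values = new string[9];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = m.Groups[i + 1].Value; // group 0 is the whole match, so skip it
        }
        return values;
    }
}
```

With the sample input, Extract should return the same nine values the original code prints, starting with Apr 28, 2014 and ending with 0.0 sec. Changing one field then means editing one commented line of the pattern.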