Match lines before a blank line

Updated: 2023-09-27 18:13:47

The input is as follows:

0
00:00:00,000 --> 00:00:00,000
Hello world!

1
00:00:00,000 --> 00:00:00,000
Hello world!
This is my new world.

2
00:00:00,000 --> 00:00:00,000
Hello guys!

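Using a clean and fast regular expression, I want to split it into:

Match 1: `0`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world!`

Match 1: `1`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world! This is my new world.`

Match 1: `2`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello guys!`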

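I am matching with `(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n\r].+`, but the problem is that it does not match two or more lines of text (Match 3 of the second group in the example above), because `.` stops at the first line break.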

Note: if you know a way to do this without regex that is more readable and performs better, please feel free to share it.

Thanks,
Alireza


Well, here is a non-regex approach:

public IEnumerable<List<string>> ReadSeparatedLines(string file)
{
    List<string> lines = new List<string>();
    foreach (var line in File.ReadLines(file))
    {
        if (line == "")
        {
            // Only take action if we've actually got something to return. This
            // handles files starting with blank lines, and also files with
            // multiple consecutive blank lines.
            if (lines.Count > 0)
            {
                yield return lines;
                lines = new List<string>();
            }
        }
        else
        {
            lines.Add(line);
        }
    }
    // Check whether we had any trailing lines to return
    if (lines.Count > 0)
    {
        yield return lines;
    }
}

Personally I find this easier to understand than a regular expression, but of course your taste may differ.
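To show how the grouped lines might be turned into the three pieces the question asks for, here is a minimal sketch. It assumes every block has the index on its first line and the timing on its second, with any remaining lines being the text. The names `SubtitleBlocks`, `SplitOnBlankLines` and `ToEntry` are made up for this illustration, and the grouping is done over an in-memory sequence rather than a file so the sample runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubtitleBlocks
{
    // Same grouping idea as ReadSeparatedLines, but over any sequence of
    // lines so the sketch can run without a file on disk.
    public static IEnumerable<List<string>> SplitOnBlankLines(IEnumerable<string> source)
    {
        var lines = new List<string>();
        foreach (var line in source)
        {
            if (line.Length == 0)
            {
                // Skip leading and repeated blank lines.
                if (lines.Count > 0)
                {
                    yield return lines;
                    lines = new List<string>();
                }
            }
            else
            {
                lines.Add(line);
            }
        }
        if (lines.Count > 0)
        {
            yield return lines;
        }
    }

    // Assumption: line 0 is the index, line 1 the timing, the rest is text.
    public static (string Index, string Timing, string Text) ToEntry(List<string> block)
    {
        if (block.Count < 2)
        {
            throw new FormatException("Expected at least an index line and a timing line.");
        }
        return (block[0], block[1], string.Join("\n", block.Skip(2)));
    }

    public static void Main()
    {
        string[] input =
        {
            "0", "00:00:00,000 --> 00:00:00,000", "Hello world!", "",
            "1", "00:00:00,000 --> 00:00:00,000", "Hello world!", "This is my new world.", "",
            "2", "00:00:00,000 --> 00:00:00,000", "Hello guys!"
        };

        foreach (var block in SplitOnBlankLines(input))
        {
            var entry = ToEntry(block);
            Console.WriteLine("{0} | {1} | {2}",
                entry.Index, entry.Timing, entry.Text.Replace("\n", " / "));
        }
    }
}
```

Keeping the text lines joined with `\n` preserves the original line breaks; whether to join them with a space instead depends on how the text is displayed later.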

You can use the following regex:

/(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)(.*?)(?=\n\n|$)/sg

Demo
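As a rough illustration of how this pattern could be used from C#, the sketch below assumes the input uses `\n` line endings (with `\r\n` the `\n\n` lookahead would not see the blank line). `RegexOptions.Singleline` plays the role of the `/s` flag, and `Regex.Matches` returns all matches, so `/g` needs no counterpart. The class name `LookaheadSplit` and the method `Parse` are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadSplit
{
    // Pattern from the answer above; Singleline lets '.' cross line breaks.
    private static readonly Regex Entry = new Regex(
        @"(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)(.*?)(?=\n\n|$)",
        RegexOptions.Singleline);

    public const string Sample =
        "0\n00:00:00,000 --> 00:00:00,000\nHello world!\n\n" +
        "1\n00:00:00,000 --> 00:00:00,000\nHello world!\nThis is my new world.\n\n" +
        "2\n00:00:00,000 --> 00:00:00,000\nHello guys!";

    public static List<(string Index, string Timing, string Text)> Parse(string input)
    {
        var result = new List<(string Index, string Timing, string Text)>();
        foreach (Match m in Entry.Matches(input))
        {
            // Group 3 begins with the newline that follows the timing line,
            // because the pattern has no separator between groups 2 and 3.
            result.Add((m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value.Trim()));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var entry in Parse(Sample))
        {
            Console.WriteLine("{0} | {1} | {2}",
                entry.Index, entry.Timing, entry.Text.Replace("\n", " / "));
        }
    }
}
```

The `Trim()` call is a choice made for this sketch; adding `[\n\r]` between the second and third groups of the pattern would be another way to drop the leading newline.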

Here it is:

(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n](.+(?:[\n]*[^\d|^\n]+)*)

Result

Match 1

  1. [0-1] 0

  2. [2-31] 00:00:00,000 --> 00:00:00,000

  3. [32-44] Hello world!

Match 2

  1. [46-47] 1

  2. [48-77] 00:00:00,000 --> 00:00:00,000

  3. [78-112] Hello world! This is my new world.

Match 3

  1. [114-115] 2

  2. [116-145] 00:00:00,000 --> 00:00:00,000

  3. [146-157] Hello guys!

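Try it on regex101.com

Edit

I also updated the regex to cope with digits inside the text, so it now matches multiple lines that contain numbers, as required. It is also a bit shorter now:

(\d+)[\n](.*?)\n((?s).*?)(?=\n\n\d|\Z)

This regex matches the following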

0
00:00:00,000 --> 00:00:00,000
Hello world!

1
00:00:00,000 --> 00:00:00,000
Hello world!
This is my new world.

2
00:00:00,000 --> 00:00:00,000
Hello guys!
This line contains 123457!
This is third line!
And more lines!

as

Match 1

  1. [0-1] 0

  2. [2-31] 00:00:00,000 --> 00:00:00,000

  3. [32-44] Hello world!

Match 2

  1. [46-47] 1

  2. [48-77] 00:00:00,000 --> 00:00:00,000

  3. [78-112] Hello world! This is my new world.

Match 3

  1. [114-115] 2

  2. [116-145] 00:00:00,000 --> 00:00:00,000

  3. [146-220] Hello guys! This line contains 123457! This is third line! And more lines!

Try it on regex101.com
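For anyone who wants to try the shorter pattern outside regex101, here is a hedged C# sketch. It assumes `\n` line endings and that entries are separated by exactly one blank line followed by a numeric index, which is what the `\n\n\d` lookahead relies on. The inline `(?s)` applies only inside the third group, so no `RegexOptions` are passed. `BlankLineDigitSplit` and `Parse` are names chosen for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BlankLineDigitSplit
{
    // Pattern from the edit above. Group 2 stays on one line; group 3 may
    // span lines and stops before "blank line + digit" or at the end of input.
    private static readonly Regex Entry = new Regex(
        @"(\d+)[\n](.*?)\n((?s).*?)(?=\n\n\d|\Z)");

    public const string Sample =
        "0\n00:00:00,000 --> 00:00:00,000\nHello world!\n\n" +
        "1\n00:00:00,000 --> 00:00:00,000\nHello world!\nThis is my new world.\n\n" +
        "2\n00:00:00,000 --> 00:00:00,000\nHello guys!\nThis line contains 123457!\n" +
        "This is third line!\nAnd more lines!";

    public static List<(string Index, string Timing, string Text)> Parse(string input)
    {
        var result = new List<(string Index, string Timing, string Text)>();
        foreach (Match m in Entry.Matches(input))
        {
            result.Add((m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var entry in Parse(Sample))
        {
            Console.WriteLine("{0} | {1} | {2}",
                entry.Index, entry.Timing, entry.Text.Replace("\n", " / "));
        }
    }
}
```

One limitation worth keeping in mind: a text line that starts with a digit and follows a blank line inside an entry would be treated as the start of a new entry by the lookahead.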
