Match lines before a blank line
Updated: 2023-09-27 18:13:47
The input is as follows:

    0
    00:00:00,000 --> 00:00:00,000
    Hello world!

    1
    00:00:00,000 --> 00:00:00,000
    Hello world!
    This is my new world.

    2
    00:00:00,000 --> 00:00:00,000
    Hello guys!
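Using a clear and fast regular expression, I want to split it into:

Match 1: `0`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world!`

Match 1: `1`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world! This is my new world.`

Match 1: `2`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello guys!`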
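I am using

    (\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n\r].+

to match, but the problem is that it does not match text that spans two or more lines (Match 3 of the second group in the example above). By default `.` does not match a newline, so `.+` stops at the end of the first text line.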
Note: if you know a way to do this without Regex that is readable and performs better, please feel free to suggest it.

Thanks,
Alireza
Well, here is a non-regex approach:
    // Requires: using System.Collections.Generic; using System.IO;
    public IEnumerable<List<string>> ReadSeparatedLines(string file)
    {
        List<string> lines = new List<string>();
        foreach (var line in File.ReadLines(file))
        {
            if (line == "")
            {
                // Only take action if we've actually got something to return. This
                // handles files starting with blank lines, and also files with
                // multiple consecutive blank lines.
                if (lines.Count > 0)
                {
                    yield return lines;
                    lines = new List<string>();
                }
            }
            else
            {
                lines.Add(line);
            }
        }
        // Check whether we had any trailing lines to return
        if (lines.Count > 0)
        {
            yield return lines;
        }
    }
Personally I find this easier to understand than a regular expression, but of course your taste may differ.
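To get from the grouped lines to the three pieces the question asks for, each group could be unpacked into an index, a timing line, and the remaining text. The following is a minimal sketch under some assumptions: `SubtitleBlocks`, `SplitOnBlankLines`, and `ToEntry` are hypothetical names, the grouping is repeated here over an in-memory sequence of lines so the sketch can be run without a file, and each group is assumed to start with an index line followed by a timing line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubtitleBlocks
{
    // Hypothetical helper: the same blank-line grouping idea, but over any
    // sequence of lines instead of a file path.
    public static IEnumerable<List<string>> SplitOnBlankLines(IEnumerable<string> lines)
    {
        var block = new List<string>();
        foreach (var line in lines)
        {
            if (line.Length == 0)
            {
                // Skip leading and repeated blank lines.
                if (block.Count > 0)
                {
                    yield return block;
                    block = new List<string>();
                }
            }
            else
            {
                block.Add(line);
            }
        }
        // Return any trailing group that was not followed by a blank line.
        if (block.Count > 0)
        {
            yield return block;
        }
    }

    // Hypothetical helper: assumes line 0 is the index and line 1 the timing;
    // every remaining line is joined back into the text.
    public static (string Index, string Timing, string Text) ToEntry(List<string> block)
    {
        if (block.Count < 2)
        {
            throw new FormatException("Expected an index line and a timing line.");
        }
        return (block[0], block[1], string.Join("\n", block.Skip(2)));
    }
}
```

Joining the text lines with `"\n"` keeps multi-line text together as one value, which is the case the original regex missed.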
You can use the following regex:

    /(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)(.*?)(?=\n\n|$)/sg

Demo
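The `/…/sg` notation is not C# syntax, so here is a hedged sketch of how this pattern might be used from .NET. It assumes the `s` flag corresponds to `RegexOptions.Singleline` and that `g` corresponds to iterating over `Regex.Matches`; `SubtitleRegex` and `Parse` are hypothetical names, and the input is assumed to use `\n` line endings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SubtitleRegex
{
    // Singleline lets '.' match '\n', so group 3 can span several lines.
    private static readonly Regex EntryPattern = new Regex(
        @"(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)(.*?)(?=\n\n|$)",
        RegexOptions.Singleline);

    public static List<(string Index, string Timing, string Text)> Parse(string input)
    {
        var result = new List<(string Index, string Timing, string Text)>();
        foreach (Match m in EntryPattern.Matches(input))
        {
            // Group 3 starts with the newline that follows the timing line,
            // so it is trimmed here.
            result.Add((m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value.Trim()));
        }
        return result;
    }
}
```

Because group 3 begins right after the timing line, it captures the line break as well; trimming is one way to deal with that, and adding `[\n\r]` before the group would be another.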
Here is one:

    (\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n](.+(?:[\n]*[^\d|^\n]+)*)
Result

Match 1
[0-1]
0
[2-31]
00:00:00,000 --> 00:00:00,000
[32-44]
Hello world!
Match 2
[46-47]
1
[48-77]
00:00:00,000 --> 00:00:00,000
[78-112]
Hello world! This is my new world.
Match 3
[114-115]
2
[116-145]
00:00:00,000 --> 00:00:00,000
[146-157]
Hello guys!
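Try it on regex101.com

Edit

I also updated the regex to cope with digits inside the text, so it now matches multiple text lines that contain numbers, as required. It is also a bit shorter now:

    (\d+)[\n](.*?)\n((?s).*?)(?=\n\n\d|\Z)

This regex matches the following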
    0
    00:00:00,000 --> 00:00:00,000
    Hello world!

    1
    00:00:00,000 --> 00:00:00,000
    Hello world!
    This is my new world.

    2
    00:00:00,000 --> 00:00:00,000
    Hello guys!
    This line contains 123457!
    This is third line!
    And more lines!
as

Match 1
[0-1]
0
[2-31]
00:00:00,000 --> 00:00:00,000
[32-44]
Hello world!
Match 2
[46-47]
1
[48-77]
00:00:00,000 --> 00:00:00,000
[78-112]
Hello world! This is my new world.
Match 3
[114-115]
2
[116-145]
00:00:00,000 --> 00:00:00,000
[146-220]
Hello guys! This line contains 123457! This is third line! And more lines!
Try it on regex101.com
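For readers working in C#, the shorter pattern from the edit could be tried roughly as follows. This is only a sketch: `SubtitleRegexV2` and `Parse` are hypothetical names, the input is assumed to use `\n` line endings with a blank line between entries, and the answer's results come from regex101, so behaviour in .NET is an assumption based on its support for the inline `(?s)` option and the `\Z` anchor.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SubtitleRegexV2
{
    // No RegexOptions here: '.' in group 2 stops at the end of the timing
    // line, while the inline (?s) makes '.' match newlines inside group 3 only.
    private static readonly Regex EntryPattern = new Regex(
        @"(\d+)[\n](.*?)\n((?s).*?)(?=\n\n\d|\Z)");

    // Returns one array per entry: { index, timing, text }.
    public static string[][] Parse(string input)
    {
        return EntryPattern.Matches(input)
            .Cast<Match>()
            .Select(m => new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value })
            .ToArray();
    }
}
```

The lookahead `\n\n\d` ends an entry only at a blank line that is followed by a digit, which is why a text line such as `This line contains 123457!` stays inside group 3 instead of starting a new match.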