Functional, Regex, Recursion: rebuilding a specific string from this butchered string
Keywords: string, butchered, rebuild, recursion, Regex, functional | Updated: 2023-09-27 18:19:56
In:
53_2_b
50
48_1_b_i
50A_3_b
48_1_b_iv
Out:
53(2)(b)
50
48(1)(b)(i)
50A(3)(b)
48(1)(b)(iv)
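(These are section references in legislation that have been converted to NCNames. I want to convert them back.)
Is there some embarrassingly small amount of code that can do this, and that would teach me a lot?
This is what I have so far: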
readonly Func<char, bool> _isNotUnderscore = c => c != '_';

string ConvertFragmentToSecRef(string frag)
{
    // One pattern per nesting level: section number, subsection, paragraph, roman numeral
    var p0 = new Regex(@"^[0-9]+[A-Z]*");
    var p1 = new Regex(@"[0-9]+");
    var p2 = new Regex(@"[\w]+");
    var p3 = new Regex(@"(i|v|x)+");
    var regexes = new[] {p0, p1, p2, p3};
    var sb = new StringBuilder();
    Recurse(frag, 0, ref regexes, ref sb);
    return sb.ToString();
}

void Recurse(string left, int level, ref Regex[] regexes, ref StringBuilder sb)
{
    if (level < 4)
    {
        // head is everything up to the first underscore; tail is the rest
        var head = String.Concat(left.TakeWhile(_isNotUnderscore));
        var tail = String.Concat(left.Skip(head.Count())).TrimStart('_');
        if (regexes[level].IsMatch(head))
        {
            sb.Append(level == 0 ? head : "(" + head + ")");
            Recurse(tail, level + 1, ref regexes, ref sb);
        }
    }
}
You don't need recursion, just a lookahead assertion:
resultString = Regex.Replace(subjectString,
    @"_            # match _
    ([^_\r\n]*)    # match whatever follows except _ or newlines
    (?=[_\r]|$)    # assert that a _ or end-of-line follows",
    "($1)", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
This works on multi-line input strings. Of course, if you have each line in a separate string, it's even easier:
resultString = Regex.Replace(subjectString,
    @"_          # match _
    ([^_]*)      # match whatever follows except _
    (?=_|$)      # assert that a _ or end-of-string follows",
    "($1)", RegexOptions.IgnorePatternWhitespace);
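To check the single-string version against the inputs from the question, here is a minimal, self-contained sketch. The class name `SecRefDemo` is made up for illustration, and `ConvertFragmentToSecRef` simply reuses the method name from the question; the pattern is the same as above, written on one line without the comments.

```csharp
using System;
using System.Text.RegularExpressions;

static class SecRefDemo
{
    // Hypothetical wrapper (name borrowed from the question) around the
    // single-string pattern: each "_xyz" segment becomes "(xyz)".
    public static string ConvertFragmentToSecRef(string frag)
    {
        return Regex.Replace(frag, @"_([^_]*)(?=_|$)", "($1)");
    }

    static void Main()
    {
        // Sample inputs taken from the question
        string[] fragments = { "53_2_b", "50", "48_1_b_i", "50A_3_b", "48_1_b_iv" };
        foreach (var frag in fragments)
        {
            Console.WriteLine(frag + " -> " + ConvertFragmentToSecRef(frag));
        }
        // Prints:
        // 53_2_b -> 53(2)(b)
        // 50 -> 50
        // 48_1_b_i -> 48(1)(b)(i)
        // 50A_3_b -> 50A(3)(b)
        // 48_1_b_iv -> 48(1)(b)(iv)
    }
}
```

Note that this version does not validate each level the way the recursive attempt does (digits, then letters, then roman numerals); it only rewrites the underscores, so it assumes the fragments are already well-formed.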