Func, Regex, recursion: rebuilding specific strings from these butchered strings

Keywords: string, butchered, rebuild, recursion, Regex, Func | Updated: 2023-09-27 18:19:56

In:

53_2_b
50
48_1_b_i
50A_3_b
48_1_b_iv

Out:

53(2)(b)
50
48(1)(b)(i)
50A(3)(b)
48(1)(b)(iv)

(These are section references in legislation that have been converted to NCNames. I want to un-convert them.)

Is there some embarrassingly small amount of code that does this, and that would teach me a lot?

This is what I have so far:

readonly Func<char, bool> _isNotUnderscore = c => c != '_';
string ConvertFragmentToSecRef(string frag)
{           
    var p0 = new Regex(@"^[0-9]+[A-Z]*");
    var p1 = new Regex(@"[0-9]+");
    var p2 = new Regex(@"[\w]+");
    var p3 = new Regex(@"(i|v|x)+");
    var regexes = new[] {p0, p1, p2, p3};
    var sb = new StringBuilder();
    Recurse(frag,0,ref regexes,ref sb);
    return sb.ToString();
}
void Recurse(string left,int level, ref Regex[] regexes,ref StringBuilder sb)
{
    if (level < 4)
    {
        var head = String.Concat(left.TakeWhile(_isNotUnderscore));
        var tail = String.Concat(left.Skip(head.Count())).TrimStart('_');
        if (regexes[level].IsMatch(head))
        {
            sb.Append(level == 0 ? head : "(" + head + ")");
            Recurse(tail, level + 1, ref regexes, ref sb);
        }
    }
}


You don't need recursion, just a lookahead assertion:

resultString = Regex.Replace(subjectString, 
    @"_          # match _
    ([^_\r\n]*)  # match whatever follows except _ or newlines
    (?=[_\r]|$)  # assert that a _ or end-of-line follows", 
    "($1)", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

This works on a multiline input string. Of course, if you have each line in a separate string, it's even easier:

resultString = Regex.Replace(subjectString, 
    @"_      # match _
    ([^_]*)  # match whatever follows except _
    (?=_|$)  # assert that a _ or end-of-string follows", 
    "($1)", RegexOptions.IgnorePatternWhitespace);
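
To check the single-string version against the sample input from the question, here is a minimal, self-contained sketch. The class name SectionRefs and the Main harness are made up for illustration; ConvertFragmentToSecRef borrows the method name from the question, and the pattern is the one above written on a single line without comments.

```csharp
using System;
using System.Text.RegularExpressions;

static class SectionRefs
{
    // Same pattern as the single-string version above:
    // an underscore, then everything up to the next underscore or the end of the string.
    static readonly Regex Part = new Regex(@"_([^_]*)(?=_|$)");

    // Hypothetical wrapper, named after the method in the question.
    public static string ConvertFragmentToSecRef(string frag)
    {
        // Each "_xyz" piece becomes "(xyz)"; text before the first underscore is left alone.
        return Part.Replace(frag, "($1)");
    }

    static void Main()
    {
        var samples = new[] { "53_2_b", "50", "48_1_b_i", "50A_3_b", "48_1_b_iv" };
        foreach (var frag in samples)
        {
            Console.WriteLine(ConvertFragmentToSecRef(frag));
        }
    }
}
```

Run on the five sample fragments, this should print the same five lines as the Out list in the question. A fragment with no underscore, such as 50, is returned unchanged because the pattern never matches.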