Sprache parser and character escaping

Keywords: Sprache, escaping | Updated: 2023-09-27 18:14:51

I have not found an example of how to handle character escaping. I did find this code sample:

static void Main(string[] args)
{
    string text = "'test \\\' text'";
    var result = Grammar.QuotedText.End().Parse(text);
}
public static class Grammar
{
    private static readonly Parser<char> QuoteEscape = Parse.Char('\\');
    private static Parser<T> Escaped<T>(Parser<T> following)
    {
        return from escape in QuoteEscape
               from f in following
               select f;
    }
    private static readonly Parser<char> QuotedTextDelimiter = Parse.Char('\'');
    private static readonly Parser<char> QuotedContent =
        Parse.AnyChar.Except(QuotedTextDelimiter).Or(Escaped(QuotedTextDelimiter));
    public static Parser<string> QuotedText = (
        from lquot in QuotedTextDelimiter
        from content in QuotedContent.Many().Text()
        from rquot in QuotedTextDelimiter
        select content
        ).Token();
}

It parses the text successfully when the text contains no escapes, but it fails on text that contains an escaped character.


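I had a similar problem when parsing strings that use " as the delimiter and \ as the escape character. I wrote a simple parser for it (probably not the most elegant solution), and it seems to work well.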

You should be able to adapt it, since the only difference seems to be the delimiter.

var escapedDelimiter = Parse.String("\\\"").Text().Named("Escaped delimiter");
var singleEscape = Parse.String("\\").Text().Named("Single escape character");
var doubleEscape = Parse.String("\\\\").Text().Named("Escaped escape character");
var delimiter = Parse.Char('"').Named("Delimiter");
var simpleLiteral = Parse.AnyChar
    .Except(singleEscape)
    .Except(delimiter)
    .Many()
    .Text()
    .Named("Literal without escape/delimiter character");
var stringLiteral = (from start in delimiter
                     from v in escapedDelimiter.Or(doubleEscape).Or(singleEscape).Or(simpleLiteral).Many()
                     from end in delimiter
                     select string.Concat(start) + string.Concat(v) + string.Concat(end));

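The key part is from v in .... It first tries the escaped delimiter, then the doubled escape character, then a single escape character, and only then tries to parse a "simpleLiteral" that contains no escape or delimiter characters. Changing this order leads to parse errors (for example, if you try to parse a single escape before the escaped delimiter, the latter is never found; the same holds for the double escape versus the single escape). This step repeats until an unescaped delimiter appears (from v in ... does not consume an unescaped delimiter, but from end in delimiter of course does).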
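The ordering rule above can be sketched against the single-quote grammar from the question. This is only a minimal sketch, not the code from either answer: it assumes \ is the escape character and ' is the delimiter, and the names QuotedGrammar, EscapedChar and PlainChar are made up for illustration. The escaped alternative is tried first, and plain content excludes both the escape character and the delimiter. Unlike stringLiteral above, which returns the raw text including delimiters and escape characters, this sketch returns the unescaped content.

```csharp
using System;
using Sprache;

public static class QuotedGrammar   // hypothetical name, for illustration only
{
    private static readonly Parser<char> Escape = Parse.Char('\\');
    private static readonly Parser<char> Delimiter = Parse.Char('\'');

    // The escape character followed by any character yields that character, unescaped.
    private static readonly Parser<char> EscapedChar =
        from esc in Escape
        from c in Parse.AnyChar
        select c;

    // Ordinary content: anything that is neither the escape character nor the delimiter.
    private static readonly Parser<char> PlainChar =
        Parse.AnyChar.Except(Escape.Or(Delimiter));

    public static readonly Parser<string> QuotedText = (
        from open in Delimiter
        // The escaped alternative goes first, so a backslash is never taken as plain content.
        from content in EscapedChar.Or(PlainChar).Many().Text()
        from close in Delimiter
        select content
    ).Token();
}

public static class Program
{
    public static void Main()
    {
        // The C# literal below is the input: 'test \' text'
        var result = QuotedGrammar.QuotedText.End().Parse("'test \\' text'");
        Console.WriteLine(result); // prints: test ' text
    }
}
```

Because PlainChar rejects the escape character, a backslash can only be consumed by EscapedChar, so the quote after it is not mistaken for the closing delimiter.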

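I needed to parse string literals that can be delimited by either single or double quotes, and that also support escaping inside those literals.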

A method that builds the string literal parser:

private readonly StringBuilder _reusableStringBuilder = new StringBuilder();
private Parser<string> BuildStringLiteralParser(char delimiterChar)
{
    var escapeChar = '\\';
    var delimiter = Sprache.Parse.Char(delimiterChar);
    var escape = Sprache.Parse.Char(escapeChar);
    var escapedDelimiter = Sprache.Parse.String($"{escapeChar}{delimiterChar}");
    var splitByEscape = Sprache.Parse.AnyChar
        .Except(delimiter.Or(escape))
        .Many()
        .Text()
        .DelimitedBy(escapedDelimiter);
    string BuildStr(IEnumerable<IEnumerable<string>> splittedByEscape)
    {
        _reusableStringBuilder.Clear();
        var i = 0;
        foreach (var splittedByEscapedDelimiter in splittedByEscape)
        {
            if (i > 0)
            {
                _reusableStringBuilder.Append(escapeChar);
            }
            var j = 0;
            foreach (var str in splittedByEscapedDelimiter)
            {
                if (j > 0)
                {
                    _reusableStringBuilder.Append(delimiterChar);
                }
                _reusableStringBuilder.Append(str);
                j++;
            }
            i++;
        }
        return _reusableStringBuilder.ToString();
    }
    return (from ln in delimiter
            from splittedByEscape in splitByEscape.DelimitedBy(escape)
            from rn in delimiter
            select BuildStr(splittedByEscape)).Named("string");
}

Usage:

var stringParser = BuildStringLiteralParser('\"').Or(BuildStringLiteralParser('\''));
var str1 = stringParser.Parse("\"'Hello' \\\"John\\\"\"");
Console.WriteLine(str1);
var str2 = stringParser.Parse("\'\\'Hello\\' \"John\"\'");
Console.WriteLine(str2);

Output:

'Hello' "John"
'Hello' "John"

See the working demo: https://dotnetfiddle.net/8wFNbj