Parser generators: how to use GPLEX and GPPG together

Keywords: GPLEX, GPPG | Updated: 2023-09-27 18:35:45

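After browsing posts about good C# parser generators, I stumbled across GPLEX and GPPG. I would like to use GPLEX to generate tokens for GPPG to parse and build a tree (similar to the lex/yacc relationship). However, I can't find an example of how the two interact. With lex/yacc, lex returns tokens that are defined by yacc, and it can store values in yylval. How is this done in GPLEX/GPPG? It is missing from their documentation.

Here is the lex code I would like to convert to GPLEX: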

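%{
 #include <stdio.h>
 #include <string.h>    /* strdup */
 #include "y.tab.h"     /* token codes generated by yacc -d */
%}
%%
[Oo][Rr]                return OR;
[Aa][Nn][Dd]            return AND;
[Nn][Oo][Tt]            return NOT;
[A-Za-z][A-Za-z0-9_]*   { yylval = strdup(yytext); /* copy: yytext is reused; assumes YYSTYPE is char* */ return ID; }
%%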

Thanks! Andrew


First: add a reference to "QUT.ShiftReduceParser.dll" in your project. It is provided in the GPLEX download package.

Sample code for the main program:

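using System;
using System.IO;        // FileStream, FileMode
// using ....;          (any further namespaces your project needs)
using QUT.Gppg;
using Scanner;
using Parser;
namespace NCParser
{
class Program
{
    static void Main(string[] args)
    {
        string pathTXT = @"C:\temp\testFile.txt";
        using (FileStream file = new FileStream(pathTXT, FileMode.Open))
        {
            // "Scanner" and "Parser" are both namespace names and class names,
            // so the class names have to be qualified here.
            Scanner.Scanner scanner = new Scanner.Scanner();
            scanner.SetSource(file, 0);
            Parser.Parser parser = new Parser.Parser(scanner);
            bool ok = parser.Parse();   // pulls tokens via yylex() and applies the grammar
            Console.WriteLine(ok ? "Parsed successfully" : "Syntax error");
        }
    }
}
}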

Sample GPLEX input file:

%using Parser;           //include the namespace of the generated Parser-class
%namespace Scanner       //names the Namespace of the generated Scanner-class
%visibility public       //visibility of the types "Tokens","ScanBase","Scanner"
%scannertype Scanner     //names the Scannerclass to "Scanner"
%scanbasetype ScanBase   //names the Scanbaseclass to "ScanBase"
%tokentype Tokens        //names the Tokenenumeration to "Tokens"
%option codePage:65001 out:Scanner.cs /*see the documentation of GPLEX for further Options you can use */
%{ //user-specified code will be copied in the Output-file
%}
OR [Oo][Rr]
AND [Aa][Nn][Dd]
Identifier [A-Za-z][A-Za-z0-9_]*
%% //Rules Section
%{ //user-code that will be executed before getting the next token
%}
{OR}           {return (int)Tokens.kwOR;}
{AND}          {return (int)Tokens.kwAND;}
{Identifier}   {yylval = yytext; return (int)Tokens.ID;}
%% //User-code Section
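
To make the scanner/parser contract concrete, here is a hand-written C# sketch of what the generated scanner does at run time. It is not GPLEX output: MiniScanner and the Tokens values below are hypothetical stand-ins (in a real project GPPG generates Tokens and GPLEX generates the scanner). The idea is the same as in lex/yacc: yylex() returns an int token code from the enum shared with the parser, and the matched text of an identifier is handed over separately in yylval.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the Tokens enum that GPPG generates from the
// %token declarations. The numbering is illustrative only; GPPG assigns
// its own values.
public enum Tokens { EOF, kwAND, kwOR, kwNOT, ID }

// Hand-written analogue of a GPLEX-generated scanner (not GPLEX output):
// yylex() returns a token code and the semantic value travels in yylval.
public class MiniScanner
{
    // Same pattern as the {Identifier} definition; \G anchors the match at 'pos'.
    private static readonly Regex Identifier = new Regex(@"\G[A-Za-z][A-Za-z0-9_]*");

    public string yylval;            // value of the most recent ID token
    private readonly string src;
    private int pos;

    public MiniScanner(string source) { src = source; }

    public int yylex()
    {
        while (pos < src.Length && char.IsWhiteSpace(src[pos])) pos++;   // skip blanks
        if (pos >= src.Length) return (int)Tokens.EOF;

        Match m = Identifier.Match(src, pos);
        if (!m.Success)
            throw new FormatException($"Unexpected character '{src[pos]}' at {pos}");
        pos += m.Length;

        // Keywords are case-insensitive, like [Oo][Rr]. Because the whole word is
        // matched first, "Order" stays an ID (lex's longest-match behaviour).
        switch (m.Value.ToUpperInvariant())
        {
            case "OR":  return (int)Tokens.kwOR;
            case "AND": return (int)Tokens.kwAND;
            case "NOT": return (int)Tokens.kwNOT;
            default:
                yylval = m.Value;    // like: yylval = yytext;
                return (int)Tokens.ID;
        }
    }
}
```

Calling yylex() repeatedly on "a AND Order or x_1" yields ID (yylval "a"), kwAND, ID ("Order"), kwOR, ID ("x_1") and finally EOF.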

Sample GPPG input file:

%using Scanner      //include the Namespace of the scanner-class
%output=Parser.cs   //names the output-file
%namespace Parser  //names the namespace of the Parser-class
%parsertype Parser      //names the Parserclass to "Parser"
%scanbasetype ScanBase  //names the ScanBaseclass to "ScanBase"
%tokentype Tokens       //names the Tokensenumeration to "Tokens"
%YYSTYPE string         //semantic value type; the scanner assigns yylval = yytext
%token kwAND "AND", kwOR "OR" //the received Tokens from GPLEX
%token ID
%% //Grammar Rules Section
program  : /* nothing */
         | Statements
         ;
Statements : EXPR "AND" EXPR
           | EXPR "OR" EXPR
           ;
EXPR : ID
     ;
%% //User-code Section
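// Don't forget to declare the Parser constructor.
// The namespace and the class are both called "Scanner", so the type is fully qualified.
public Parser(global::Scanner.Scanner scnr) : base(scnr) { }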
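
GPPG compiles the grammar above into LALR(1) tables; at run time the generated parser calls the scanner's yylex() for each token and takes the semantic value from yylval when it shifts an ID. As a rough illustration of what the parser does with those tokens, the sketch below uses a different technique, hand-written recursive descent, simply because it fits in a few lines. It accepts the same language and builds the kind of tree the question asks for. MiniParser, Token, Node and the token numbering are hypothetical names, not GPPG types.

```csharp
using System;
using System.Collections.Generic;

// Illustrative token codes; a real project uses the Tokens enum GPPG generates.
public enum Tokens { EOF, kwAND, kwOR, ID }

// What the parser receives per token: the code yylex() returned plus yylval.
public readonly struct Token
{
    public readonly Tokens Kind;
    public readonly string Value;
    public Token(Tokens kind, string value = null) { Kind = kind; Value = value; }
}

// Minimal syntax tree: identifier leaves and binary AND/OR nodes.
public abstract class Node { }
public sealed class Id : Node
{
    public string Name;
    public override string ToString() => Name;
}
public sealed class BinOp : Node
{
    public string Op;
    public Node Left, Right;
    public override string ToString() => $"{Op}({Left}, {Right})";
}

// Recursive-descent parser for the same grammar:
//   program    : /* nothing */ | Statements ;
//   Statements : EXPR "AND" EXPR | EXPR "OR" EXPR ;
//   EXPR       : ID ;
public class MiniParser
{
    private readonly IList<Token> tokens;
    private int pos;

    public MiniParser(IList<Token> tokens)
    {
        if (tokens.Count == 0 || tokens[tokens.Count - 1].Kind != Tokens.EOF)
            throw new ArgumentException("Token list must end with EOF.");
        this.tokens = tokens;
    }

    // Returns null for the empty program, otherwise the root of the tree.
    public Node ParseProgram()
    {
        if (tokens[pos].Kind == Tokens.EOF) return null;
        Node left = ParseExpr();
        Token op = Next();
        if (op.Kind != Tokens.kwAND && op.Kind != Tokens.kwOR)
            throw new FormatException($"Expected AND or OR, got {op.Kind}");
        Node right = ParseExpr();
        if (tokens[pos].Kind != Tokens.EOF)
            throw new FormatException($"Unexpected {tokens[pos].Kind} after statement");
        return new BinOp { Op = op.Kind == Tokens.kwAND ? "AND" : "OR", Left = left, Right = right };
    }

    private Node ParseExpr()
    {
        Token t = Next();
        if (t.Kind != Tokens.ID)
            throw new FormatException($"Expected ID, got {t.Kind}");
        return new Id { Name = t.Value };
    }

    // Consumes one token; EOF is never consumed, so pos stays in range.
    private Token Next()
    {
        Token t = tokens[pos];
        if (t.Kind != Tokens.EOF) pos++;
        return t;
    }
}
```

For the tokens ID("a"), kwOR, ID("b"), EOF, ParseProgram() returns a tree that prints as OR(a, b); a lone EOF gives null (the empty program), and ID("a") followed by EOF raises a FormatException.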


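I had a similar problem: given the apparent lack of documentation, I didn't know how to use GPLEX's output with GPPG. I think the problem stems from the fact that the GPLEX distribution includes gppg.exe as well as gplex.exe, but only the GPLEX documentation.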

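If you go to the GPPG home page and download that distribution, you get the GPPG documentation, which describes the requirements for the input file, how to write the grammar, and so on. Oh, and you get both binaries again: gppg.exe and gplex.exe.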

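It would seem simpler to put everything in one package. That would definitely clear up some confusion, especially for people who may be new to lexical analysis (tokenizing) and parsing (and may not yet be 100% clear on the difference between the two).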

So anyway, for those who may be doing this for the first time:

GPLEX http://gplex.codeplex.com - for tokenizing / scanning / lexical analysis (three names for the same thing)

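GPPG http://gppg.codeplex.com/ - takes the tokenizer's output as its input and parses it. A parser works from a grammar and can do things a simple tokenizer cannot, for example check whether sets of parentheses are matched.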
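
The parentheses example can be made concrete. Balanced parentheses cannot be described by a regular expression, so a pure regex-based tokenizer can only emit the individual '(' and ')' tokens; checking that they nest correctly needs a grammar. The C# sketch below is a minimal recursive-descent recognizer for the illustrative grammar S : /* nothing */ | '(' S ')' S. The class Parens and its methods are hypothetical and unrelated to GPPG's API.

```csharp
using System;
using System.Linq;

// Recursive-descent recognizer for the grammar
//   S : /* nothing */ | '(' S ')' S ;
// A pure regular-expression tokenizer can emit '(' and ')' tokens, but it
// cannot check arbitrarily deep nesting; a grammar-driven parser can.
public static class Parens
{
    public static bool IsBalanced(string input)
    {
        // "Tokenize": keep only the parenthesis tokens, skip everything else.
        string toks = new string(input.Where(c => c == '(' || c == ')').ToArray());
        int pos = 0;
        // Balanced only if S matched and every token was consumed
        // (a stray ')' stops ParseS early and leaves tokens over).
        return ParseS(toks, ref pos) && pos == toks.Length;
    }

    private static bool ParseS(string t, ref int pos)
    {
        // The loop handles the trailing S of '(' S ')' S without extra recursion.
        while (pos < t.Length && t[pos] == '(')
        {
            pos++;                                               // consume '('
            if (!ParseS(t, ref pos)) return false;               // inner S
            if (pos >= t.Length || t[pos] != ')') return false;  // missing ')'
            pos++;                                               // consume ')'
        }
        return true;                                             // S : /* nothing */
    }
}
```

With this sketch, IsBalanced("(a(b)c)") and IsBalanced("") return true, while "(()" and "())(" return false.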

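Some time ago I had the same need to use GPLEX and GPPG together, and to make the job easier I created a NuGet package for using GPPG and GPLEX together in Visual Studio.
The package can be installed in a C# project based on the .NET Framework, and it adds some commands to the Package Manager Console in Visual Studio. These commands help you configure your C# project so that GPPG and GPLEX are integrated into the build. Essentially, you edit the YACC and LEX files in your project as source code, and the parser and scanner are generated when the project is built. In addition, the cmdlets add to the project the files needed to customize the parser and scanner.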

You can find it here: https://www.nuget.org/packages/YaccLexTools/

Here is a link to a blog post explaining how to use it: http://ecianciotta-en.abriom.com/2013/08/yacclex-tools-v02.html

Have you considered using Roslyn? (This is not a proper answer, but I don't have enough reputation to post it as a comment.)

Ironic, because when I jumped into parsers in C# I started with exactly these two tools (about a year ago). The lexer had small bugs (easy to fix):

http://gplex.codeplex.com/workitem/11308

but the parser had a more serious one:

http://gppg.codeplex.com/workitem/11344

The lexer bug should be fixed by now (the release is dated June 2013), but the parser probably still has this bug (May 2012).

So I wrote my own suite :-) https://sourceforge.net/projects/naivelangtools/, and I have been using and developing it ever since.

Your example, translated to NLT:

/[Oo][Rr]/                -> OR;
/[Aa][Nn][Dd]/            -> AND;
/[Nn][Oo][Tt]/            -> NOT;
// by default text is returned as value
/[A-Za-z][A-Za-z0-9_]*/   -> ID;

The whole suite is similar to lex/yacc, but where possible it does not rely on side effects (so you return the appropriate values).
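
To illustrate that design choice, here is a small C# sketch of a rule-driven lexer that returns each token's kind together with its value instead of assigning a shared yylval field. It is not NaiveLanguageTools code and does not use its API; RuleLexer and Lexeme are hypothetical names. The rules mirror the NLT example above, with the matched text returned as the value, and ties are resolved the way lex resolves them (longest match, then earliest rule).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Side-effect-free alternative to the yylval convention: each match
// produces a (Kind, Value) pair that is returned, not stored in a field.
public readonly struct Lexeme
{
    public readonly string Kind, Value;
    public Lexeme(string kind, string value) { Kind = kind; Value = value; }
    public override string ToString() => $"{Kind}:{Value}";
}

public static class RuleLexer
{
    // Rules in priority order; \G anchors each pattern at the current position.
    private static readonly (Regex Pattern, string Kind)[] Rules =
    {
        (new Regex(@"\G[Oo][Rr]"), "OR"),
        (new Regex(@"\G[Aa][Nn][Dd]"), "AND"),
        (new Regex(@"\G[Nn][Oo][Tt]"), "NOT"),
        (new Regex(@"\G[A-Za-z][A-Za-z0-9_]*"), "ID"),
    };

    public static IEnumerable<Lexeme> Tokenize(string input)
    {
        int pos = 0;
        while (pos < input.Length)
        {
            if (char.IsWhiteSpace(input[pos])) { pos++; continue; }

            // Longest match wins; on a tie the earlier rule wins (as in lex).
            Match best = null;
            string kind = null;
            foreach (var (pattern, k) in Rules)
            {
                Match m = pattern.Match(input, pos);
                if (m.Success && (best == null || m.Length > best.Length)) { best = m; kind = k; }
            }
            if (best == null)
                throw new FormatException($"Unexpected character '{input[pos]}' at {pos}");

            // The matched text is returned as the value; nothing global is mutated.
            yield return new Lexeme(kind, best.Value);
            pos += best.Length;
        }
    }
}
```

Tokenizing "a and Notes OR b" with this sketch yields ID:a AND:and ID:Notes OR:OR ID:b. Because each Lexeme carries its own value, a parser consuming the stream does not depend on reading a shared field at the right moment.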