ANTLR4 parser in C# seems correct but does not work


grammar CSVParser;
//
table   :   line+;
line    :   NAME
',' PEAK ',' 
STARTYEAR ',' 
ENDYEAR ',' LENGTH NEWLINE
;
NEWLINE :   '\r'? '\n'
        ;   
NAME    :   ('"'(~'"')*'"') 
;
PEAK    :   ([0-9]+);
STARTYEAR   :   ([0-9]+);
ENDYEAR :   ([0-9]+);
LENGTH  :   [0-9]+;

As you can see, I want to parse a CSV table like the following:

"ANNUAL REVIEW OF IMMUNOLOGY, VOL 31",0,0,1,1
"",0,0,1,1
"CA-A CANCER JOURNAL FOR CLINICIANS",1,1,2,1
"NATURE CHEMISTRY",1,1,3,2
"NATURE PHOTONICS",1,1,3,2
"ANNUAL REVIEW OF IMMUNOLOGY, VOL 30",1,1,2,1
"PHYSICS TODAY",2,1,3,2
"NATURE BIOTECHNOLOGY",2,2,4,2
"CHEMICAL SOCIETY REVIEWS",2,1,3,2
"NATURE REVIEWS GENETICS",2,2,3,1

But the parser reports errors:

line 1:40 mismatched input '0' expecting STARTYEAR
line 2:5 mismatched input '0' expecting STARTYEAR
line 3:39 mismatched input '1' expecting STARTYEAR
line 4:21 mismatched input '1' expecting STARTYEAR
line 5:21 mismatched input '1' expecting STARTYEAR
line 6:40 mismatched input '1' expecting STARTYEAR
line 7:18 mismatched input '1' expecting STARTYEAR
line 8:25 mismatched input '2' expecting STARTYEAR
line 9:29 mismatched input '1' expecting STARTYEAR
line 10:28 mismatched input '2' expecting STARTYEAR
line 11:31 mismatched input '2' expecting STARTYEAR
line 12:42 mismatched input '2' expecting STARTYEAR
line 13:19 mismatched input '1' expecting STARTYEAR
line 14:40 mismatched input '2' expecting STARTYEAR
line 15:34 mismatched input '2' expecting STARTYEAR
line 16:29 mismatched input '2' expecting STARTYEAR
line 17:40 mismatched input '2' expecting STARTYEAR
line 18:40 mismatched input '2' expecting STARTYEAR
line 19:43 mismatched input '2' expecting STARTYEAR
line 20:40 mismatched input '3' expecting STARTYEAR

What is wrong?

Stack Overflow asks me to add more details, but I think the code above is enough, since the CSV file is shown here.


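Many of your lexer rules match exactly the same input. PEAK, STARTYEAR, ENDYEAR and LENGTH are all defined as [0-9]+, and when several lexer rules match the same text, ANTLR picks the one defined first. So whenever [0-9]+ matches, only a PEAK token is created, and never STARTYEAR, ENDYEAR or LENGTH. The lexer does not create tokens based on what the parser "needs" at that point.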
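To make that tie-breaking concrete, here is a minimal Python sketch. It is not ANTLR itself, only a simulation under the assumption that the lexer takes the longest match and, on a tie, the rule listed first. The names `RULES` and `tokenize` are hypothetical and simply mirror the grammar in the question.

```python
import re

# Token rules in the same order as the grammar in the question.
# PEAK, STARTYEAR, ENDYEAR and LENGTH all use the identical pattern.
RULES = [
    ("COMMA", r","),
    ("NEWLINE", r"\r?\n"),
    ("NAME", r'"[^"]*"'),
    ("PEAK", r"[0-9]+"),
    ("STARTYEAR", r"[0-9]+"),
    ("ENDYEAR", r"[0-9]+"),
    ("LENGTH", r"[0-9]+"),
]


def tokenize(text):
    """Longest match wins; on a tie, the rule defined first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


token_types = [name for name, _ in tokenize('"NATURE CHEMISTRY",1,1,3,2\n')]
print(token_types)
```

In this simulation every number comes out as PEAK, so a parser that expects STARTYEAR after the second comma cannot succeed. That is consistent with the `mismatched input ... expecting STARTYEAR` errors in the question.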

Do this instead: use a single NUMBER token and let parser rules give each number its meaning:

grammar CSVParser;
table     :   line+ EOF;
line      :   NAME ',' peak ',' startyear ',' endyear ',' length NEWLINE;
peak      :   NUMBER;    
startyear :   NUMBER;
endyear   :   NUMBER;    
length    :   NUMBER;
NEWLINE   :   '\r'? '\n';
NAME      :   '"' (~'"')* '"';
NUMBER    :   [0-9]+;
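
The idea behind this grammar can be sketched in plain Python: the lexer produces only a generic NUMBER token, and the position inside the line rule decides whether a number is the peak, start year, end year or length. This is a hand-written illustration, not the code ANTLR generates, and `TOKEN`, `tokenize`, `parse_line` and `parse_table` are hypothetical names.

```python
import re

# One generic NUMBER token instead of four identical lexer rules.
TOKEN = re.compile(
    r'(?P<NAME>"[^"]*")|(?P<NUMBER>[0-9]+)|(?P<COMMA>,)|(?P<NEWLINE>\r?\n)'
)

# line : NAME ',' peak ',' startyear ',' endyear ',' length NEWLINE
EXPECTED = ["NAME", "COMMA", "NUMBER", "COMMA", "NUMBER",
            "COMMA", "NUMBER", "COMMA", "NUMBER", "NEWLINE"]


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_line(tokens, i):
    """Parse one line starting at token index i; return (row, next index)."""
    chunk = tokens[i:i + len(EXPECTED)]
    if [kind for kind, _ in chunk] != EXPECTED:
        raise ValueError(f"malformed line starting at token {i}")
    values = [text for kind, text in chunk if kind in ("NAME", "NUMBER")]
    # The meaning of each NUMBER comes from its position in the rule.
    name, peak, startyear, endyear, length = values
    row = {
        "name": name.strip('"'),
        "peak": int(peak),
        "startyear": int(startyear),
        "endyear": int(endyear),
        "length": int(length),
    }
    return row, i + len(EXPECTED)


def parse_table(text):
    """table : line+ EOF"""
    tokens, i, rows = tokenize(text), 0, []
    while i < len(tokens):
        row, i = parse_line(tokens, i)
        rows.append(row)
    if not rows:
        raise ValueError("expected at least one line")
    return rows


rows = parse_table('"NATURE CHEMISTRY",1,1,3,2\n"PHYSICS TODAY",2,1,3,2\n')
print(rows[0])
```

With the real ANTLR grammar the same information is available through the `peak`, `startyear`, `endyear` and `length` parser rules, each of which wraps a single NUMBER token.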