[antlr-interest] Semantic Predicates in a Lexer
Jim Idle
jimi at temporal-wave.com
Fri Mar 20 09:15:48 PDT 2009
Paul Bouché (NSN) wrote:
> Hi,
>
>
>
Firstly, do not forget that you cannot set such a flag from the parser,
because the lexer runs first and creates all the tokens.
> Here is a lexer excerpt:
> NUMBER : DIGIT_+;
> SIMPLENAME: {noColonInNames}?=> LETTER_+;
> COLON: {noColonInNames}?=> COLON_;
> NAME: {!noColonInNames}?=> (LETTER_ | COLON_)+;
> fragment DIGIT_: '0'..'9';
> fragment LETTER_: 'a'..'z' | 'A'..'Z';
>
Assuming that you can configure these flags in the lexer context and are
not expecting them to be respected by the lexer if the parser sets them,
then you should be able to do this:
grammar ttt;
@lexer::members
{
boolean noColonInNames = false;
}
test
: (SIMPLENAME | COLON | NAME)* EOF ;
fragment LETTER_
: ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z')*
;
fragment COLON
: ':'
;
fragment SIMPLENAME
:
;
NAME
: {!noColonInNames}?=> (LETTER_ | COLON)+ { noColonInNames = true; }
| LETTER_+ { $type = SIMPLENAME; }
| COLON { $type = COLON; }
;
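To make the "lexer runs first" point concrete, here is a rough sketch in plain Java. It is not ANTLR-generated code: the class FlagLexerSketch and its tokenize method are made up for illustration, and it only mimics the flag-gated choice between NAME and SIMPLENAME/COLON (it does not flip the flag inside the rule as the grammar above does). It assumes anything that is not a letter or a colon is simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lexer; this is NOT ANTLR-generated code.
final class FlagLexerSketch {
    // Plays the role of the noColonInNames lexer member.
    boolean noColonInNames = false;

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Tokenizes the whole input in one pass, before any "parser" sees a token.
    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // The flag is read once per pass, so later changes cannot affect this pass.
        boolean colonIsNameChar = !noColonInNames;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isLetter(c) || (c == ':' && colonIsNameChar)) {
                int start = i;
                while (i < input.length()
                        && (isLetter(input.charAt(i))
                            || (input.charAt(i) == ':' && colonIsNameChar))) {
                    i++;
                }
                String kind = colonIsNameChar ? "NAME" : "SIMPLENAME";
                tokens.add(kind + "(" + input.substring(start, i) + ")");
            } else if (c == ':') {
                tokens.add("COLON(:)");
                i++;
            } else {
                i++; // anything else is skipped in this sketch
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        FlagLexerSketch lexer = new FlagLexerSketch();
        List<String> tokens = lexer.tokenize("ab:cd ef");
        lexer.noColonInNames = true; // a "parser" flipping the flag now is too late
        System.out.println(tokens);
        // Only a fresh lexer pass sees the new flag value.
        System.out.println(lexer.tokenize("ab:cd ef"));
    }
}
```

The first list keeps the colon inside the name even though the flag was flipped afterwards; only the second pass splits the input into simple names and colons.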
However, I suspect that you will find it much easier to use predicates
in the parser, even if it is only the first one you come across that
should be NAME COLON NAME:
grammar ttt;
@members
{
boolean noColonInNames = false;
}
test
: names* EOF ;
names
: {!noColonInNames}?=> name { System.out.println("Var is '" +
$name.text + "'"); }
| {noColonInNames}?=> NAME (COLON NAME)*
;
name
: ((NAME | COLON)=>(NAME | COLON))+
;
fragment LETTER_
: ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z')*
;
COLON
: ':'
;
NAME
: LETTER_+
;
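The idea of deciding in the parser can be sketched in plain Java as well. Again this is only an illustration, not ANTLR output: the class ParserSideNames and its names method are hypothetical, tokens are represented as bare strings (":" for COLON, anything else for NAME), and there is no error handling for malformed sequences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of deciding in the "parser"; NOT ANTLR-generated code.
final class ParserSideNames {
    // Tokens are plain strings here: ":" is a COLON, anything else is a NAME.
    static List<String> names(List<String> tokens, boolean noColonInNames) {
        List<String> result = new ArrayList<>();
        if (noColonInNames) {
            // Colons are separators: keep only the NAME tokens.
            for (String t : tokens) {
                if (!t.equals(":")) {
                    result.add(t);
                }
            }
        } else if (!tokens.isEmpty()) {
            // Colons belong to the name: glue the whole run back together.
            result.add(String.join("", tokens));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", ":", "b");
        System.out.println(names(tokens, false));
        System.out.println(names(tokens, true));
    }
}
```

The lexer side stays trivial (it always emits NAME and COLON), and the same token list can be read either way depending on the flag.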
In general, try to solve as little as possible in the lexer (just get
it to return a consistent token stream), then solve as little as
possible in the parser, then solve everything else in the tree parser.
This will maximize the chances of producing the most relevant error
messages for your users. Of course for "as little as possible" you
should infer the suffix "... but no less than that" ;-)
Jim