[antlr-interest] Semantic Predicates in a Lexer

Fri Mar 20 09:15:48 PDT 2009

Paul Bouché (NSN) wrote:
> Hi,
>
>
>   
Firstly, do not forget that you cannot set such a flag from the parser 
and expect the lexer to see it, as the lexer runs first and creates all 
the tokens before the parser looks at any of them.
> Here is a lexer excerpt:
> NUMBER : DIGIT_+;
> SIMPLENAME: {noColonInNames}?=> LETTER_+;
> COLON: {noColonInNames}?=> COLON_;
> NAME: {!noColonInNames}?=> (LETTER_ | COLON_)+;
> fragment DIGIT_: '0'..'9';
> fragment LETTER_: 'a'..'z' | 'A'..'Z';
>   
Assuming that you can configure these flags in the lexer context, and 
are not expecting the lexer to respect them when the parser sets them, 
then you should be able to do this:

grammar ttt;

@lexer::members
{
    boolean noColonInNames = false;
}

test
    : (SIMPLENAME | COLON | NAME)* EOF ;

fragment LETTER_
    :    ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z')*
    ;

fragment COLON
    :     ':'
    ;

fragment SIMPLENAME
    :   
    ;

NAME
    : {!noColonInNames}?=> (LETTER_ | COLON)+ { noColonInNames = true; }
    |  LETTER_+ { $type = SIMPLENAME; }
    | COLON { $type = COLON; }
    ;
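
To make the intent of that NAME rule concrete, here is a rough 
hand-written Java sketch of the decisions it makes. It is not ANTLR 
runtime code: the class FlagLexerSketch, its tokenize method and the 
string form of the tokens are made up for illustration, skipping blanks 
is my assumption (the grammar has no whitespace rule), and I am assuming 
the action on the first alternative is meant to set noColonInNames.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for the generated lexer; not ANTLR runtime code.
class FlagLexerSketch {
    // Plays the role of the field declared in @lexer::members.
    boolean noColonInNames = false;

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (c == ' ') {
                // Assumption: blanks separate tokens and are skipped.
                i++;
            } else if (!noColonInNames && (isLetter(c) || c == ':')) {
                // Alternative 1: predicate is open, letters and colons form one NAME.
                while (i < input.length()
                        && (isLetter(input.charAt(i)) || input.charAt(i) == ':')) {
                    i++;
                }
                tokens.add("NAME(" + input.substring(start, i) + ")");
                noColonInNames = true; // mirrors the action on alternative 1
            } else if (isLetter(c)) {
                // Alternative 2: letters only, retyped as SIMPLENAME.
                while (i < input.length() && isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add("SIMPLENAME(" + input.substring(start, i) + ")");
            } else if (c == ':') {
                // Alternative 3: a lone colon, retyped as COLON.
                i++;
                tokens.add("COLON");
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        FlagLexerSketch lexer = new FlagLexerSketch();
        // prints [NAME(ab:cd), SIMPLENAME(ef), COLON, SIMPLENAME(gh)]
        System.out.println(lexer.tokenize("ab:cd ef:gh"));
    }
}
```

The point to notice is that the flag is read and written entirely 
inside the lexer, so only the first name keeps its colon and everything 
after it is split.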

However, I suspect that you will find it much easier to use predicates 
in the parser, even if it is only the first one you come across that 
should be NAME COLON NAME:

grammar ttt;

@parser::members
{
    boolean noColonInNames = false;
}

test
    : names* EOF ;

names
    : {!noColonInNames}?=> name { System.out.println("Var is '" + $name.text + "'"); }
    | {noColonInNames}?=> NAME (COLON NAME)*
    ;

name
    : ((NAME | COLON)=>(NAME | COLON))+
    ;

fragment LETTER_
    :    ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z')*
    ;

COLON
    :     ':'
    ;

NAME
    : LETTER_+
    ;
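
If it helps to see what the two gated alternatives of the names rule 
amount to, here is a rough hand-written Java sketch. Again this is not 
ANTLR output: PredicateParserSketch and its methods are hypothetical, 
and it assumes the lexer hands the parser separate NAME and COLON 
tokens, with ":" standing in for COLON and any other string for NAME.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for the generated parser; not ANTLR runtime code.
class PredicateParserSketch {
    // The flag tested by the gated predicates in the 'names' rule.
    boolean noColonInNames = false;

    // ":" stands for a COLON token, any other string for a NAME token.
    private final List<String> tokens;
    private int pos = 0;

    PredicateParserSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private boolean atColon() {
        return pos < tokens.size() && tokens.get(pos).equals(":");
    }

    private boolean atName() {
        return pos < tokens.size() && !tokens.get(pos).equals(":");
    }

    // Mirrors one invocation of 'names' and returns the names it recognised.
    public List<String> names() {
        List<String> parts = new ArrayList<>();
        if (!noColonInNames) {
            // name : (NAME | COLON)+   colons are part of a single name
            if (pos >= tokens.size()) {
                throw new IllegalStateException("expected NAME or COLON");
            }
            StringBuilder text = new StringBuilder();
            while (pos < tokens.size()) {
                text.append(tokens.get(pos++));
            }
            parts.add(text.toString());
        } else {
            // NAME (COLON NAME)*       colons separate the names
            if (!atName()) {
                throw new IllegalStateException("expected NAME at token " + pos);
            }
            parts.add(tokens.get(pos++));
            while (atColon()) {
                pos++; // consume COLON
                if (!atName()) {
                    throw new IllegalStateException("expected NAME at token " + pos);
                }
                parts.add(tokens.get(pos++));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("ab", ":", "cd");

        PredicateParserSketch joined = new PredicateParserSketch(input);
        System.out.println(joined.names()); // prints [ab:cd]

        PredicateParserSketch split = new PredicateParserSketch(input);
        split.noColonInNames = true;
        System.out.println(split.names()); // prints [ab, cd]
    }
}
```

The token stream is identical in both runs; only the parser's 
interpretation changes, which is why the flag can safely live on the 
parser side here.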

In general, try to solve as little as possible in the lexer (just get 
it to return a consistent token stream), then solve as little as 
possible in the parser, then solve everything else in the tree parser. 
This will maximize the chances of producing the most relevant error 
messages for your users. Of course for "as little as possible" you 
should infer the suffix "... but no less than that" ;-)

Jim