gnupic: Re: special numbers

Previous by date:	10 Sep 2004 19:33:23 +0100 Re: special numbers, Scott Dattalo
Next by date:	10 Sep 2004 19:33:23 +0100 Re: special numbers, Craig Franklin
Previous in thread:	10 Sep 2004 19:33:23 +0100 Re: special numbers, Scott Dattalo
Next in thread:	10 Sep 2004 19:33:23 +0100 Re: special numbers, Craig Franklin

Subject: Re: special numbers
From: pico ####@####.####
Date: 10 Sep 2004 19:33:23 +0100
Message-Id: <4141F374.9070908@sourceforge.net>

> 
> In general, you want to perform lexical analysis in the lexer and parsing 
> in the parser. Anytime you lex in the parser or parse in the lexer then 
> this means either the grammar is ambiguous in some way (and consequently 
> difficult for an LALR parser) or the application programmer is having 
> trouble disambiguating the grammar. I'm by no means an expert in bison and 
> flex, but I've struggled with it enough to know that when I try to parse 
> in the lexer that it's a good hint that I'm probably doing something 
> wrong! (Lexing in the parser is a little more forgiving...)
I know; the 16cr84 syntax exists only for compatibility reasons.
The correct way is to specify p16c84 or pic16c84. But the previous ugly,
context-free parsing in the lexer caused some side effects, parsing
numbers as identifiers in certain situations, so I simply tried to
correct it. I am not introducing any new parsing.

> 
> The suggestion I made to Craig is that the lexer only needs to convert the 
> input character stream into a token stream. Depending on the token stream, 
> the parser may decide to tell the lexer to switch the way tokens are 
> converted. To the case in point, if the parser knows that a processor 
> identifier is to be expected next, it will then instruct the lexer to look 
> for a processor identifier. 
The ambiguous thing is that the current gpasm architecture does not use
the two-tiered lex/yacc scheme but a three-tiered lex/yacc/lookup-table (C)
parsing scheme.
It is possible to decide that some commands are parsed at stage 2 (yacc).
My feeling is that this can bring more side effects, especially if further
code cleanup is done.
For example, the original MPASM uses cpp as a first pass.
This ensures that you can use #defines to change include files.
The current implementation does not distinguish #include and #define
from include and set. If we mix these stages, there is a high chance of
trouble later if something must be parsed in the correct/different way.

> In your particular example, the 'set'
> directive expansion will be ignored while the lexer is in the 'processor 
> identifier' mode. Now, there are only two instances where processor 
> identifiers are expected. One is with the PROCESSOR directive and the 
> other is with the list directive. Each of these should have a uniquely 
> distinguishable parser rule that recognizes when the situation is about to 
> happen. When the rule triggers, the parser can tell the lexer to switch 
> states. When the processor identifier has been fully identified then 
> either the lexer or parser can switch back to the previous state.
You suggest either analysing the identifier with a tiny C state machine
on top of the bison parser, parsing the directives as yacc grammar and
patching them later to be identifiers, or parsing it directly in yacc.
It seems to me that all three of these possibilities mean more convoluted
code than what was proposed. The current code works only if you use
0x100000 or D'100000', or 1000*100, but not 1000000. The radix does not
matter.
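
The first option above, a small C state machine steering the lexer, might
look roughly like the sketch below. It is only an illustration: the token
kinds, the mode names and label_words() are hypothetical and are not taken
from the gpasm sources. It works on words that are already split, compares
the directive in lower case only, and handles only the PROCESSOR directive
("list p=..." would need its own trigger).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds and lexer modes; not gpasm's real names. */
enum token { TOK_NUMBER, TOK_IDENTIFIER, TOK_PROCESSOR_NAME };
enum lex_mode { MODE_NORMAL, MODE_PROCESSOR_NAME };

/* Label each word, carrying the lexer mode from one word to the next. */
static void label_words(const char *const *words, size_t n, enum token *out)
{
    enum lex_mode mode = MODE_NORMAL;

    for (size_t i = 0; i < n; i++) {
        if (mode == MODE_PROCESSOR_NAME) {
            out[i] = TOK_PROCESSOR_NAME;  /* "16c84" stays one token  */
            mode = MODE_NORMAL;           /* switch back afterwards   */
            continue;
        }
        /* Normal rule: a leading digit means a number. */
        out[i] = isdigit((unsigned char)words[i][0]) ? TOK_NUMBER
                                                     : TOK_IDENTIFIER;
        if (strcmp(words[i], "processor") == 0)
            mode = MODE_PROCESSOR_NAME;   /* operand expected next    */
    }
}
```

With the words "processor", "16c84", "16c84", the second word is labelled
as a processor name and the third falls back to the normal number rule,
because the mode is reset after one operand.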

It's possible that I am wrong about the internal parsing in gpasm.
I had some parsing errors that resulted in the wrong radix or that
generated an error like this: illegal character 'c' in numeric value.
I added something like the following in order to resolve the error:

[0-9][a-z0-9]+   { char *p; yylval=strtol(yytext,&p,radix);
                   yyless(p-yytext); /* keep the converted prefix, rescan the rest */ }
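
To make the arithmetic in that rule concrete, here is a small stand-alone
C sketch of the same idea outside flex. The helper name numeric_prefix()
is made up for this example; it only wraps the standard strtol() call and
returns the count of characters consumed, which is the value a flex action
would hand to yyless().

```c
#include <stdlib.h>

/* Convert the longest numeric prefix of `text` in the given radix.
   Stores the value in *value and returns the number of characters
   consumed, i.e. the count a flex action would pass to yyless(). */
static int numeric_prefix(const char *text, int radix, long *value)
{
    char *end;

    *value = strtol(text, &end, radix);  /* end points past the last digit used */
    return (int)(end - text);
}
```

For "16c84" in radix 10 this yields the value 16 with 2 characters
consumed, leaving "c84" to be rescanned; in radix 16 all 5 characters are
consumed. If the first digit is not valid in the radix (for example "9"
in radix 8) the count is 0, and a real action would have to handle that
case itself, since yyless(0) pushes the whole match back and the same rule
would match again.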

> 
> Is that explanation a little more clear?
> 
I fully understand your point of view.
I don't know how I can do it by modifying the yacc parser without
introducing more problems.
For example, with bison this input
   processor ; missing identifier
can be parsed wrongly because the newline information is missing
in bison. It's not a problem in this case, but a non-programmer user may
try to fix the problem one or more lines below the real typing error.
C programmers are familiar with this type of problem; asm programmers are not.
  Another thing is that bison caches one yylex result if a rule like
    | IDENTIFIER { /* the next yylex is always issued and cached before
                      executing this code */ }
is used.
With an intermediate (mid-rule) action such as:
    | PROCESSOR { *lexstate=2; } processor_option
the yylex is not cached.
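
Why the timing matters can be shown with a toy model in plain C. This is
not bison code and not taken from gpasm; toy_lex(), operand_token() and
the token names are invented for the illustration. It only models the
order of two events: the lexer being called for the next token, and the
action that changes the lexer state.

```c
#include <ctype.h>

/* Hypothetical token kinds; the names are for illustration only. */
enum token { TOK_NUMBER, TOK_IDENTIFIER, TOK_PROCESSOR_NAME };

static int processor_mode;   /* lexer state, set by the "parser" action */

/* Toy yylex: classifies one word using the mode in force at call time. */
static enum token toy_lex(const char *word)
{
    if (processor_mode)
        return TOK_PROCESSOR_NAME;
    return isdigit((unsigned char)word[0]) ? TOK_NUMBER : TOK_IDENTIFIER;
}

/* Returns how the operand of "processor <operand>" gets tokenized.
   lookahead_first != 0 models a parser that has already called the lexer
   for the next token before the action runs; 0 models an action that
   runs before the next token is requested. */
static enum token operand_token(const char *operand, int lookahead_first)
{
    enum token tok;

    processor_mode = 0;
    if (lookahead_first) {
        tok = toy_lex(operand);   /* token fetched and cached ...       */
        processor_mode = 1;       /* ... so the mode switch is too late */
    } else {
        processor_mode = 1;       /* action runs first                  */
        tok = toy_lex(operand);   /* token fetched under the new mode   */
    }
    processor_mode = 0;           /* switch back after the operand      */
    return tok;
}
```

In this model operand_token("16c84", 1) comes back as a number, because
the token was classified before the state changed, while
operand_token("16c84", 0) comes back as a processor name.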

In my eyes, fixing this bug, or more explicitly this compatibility
parsing, in bison requires a considerable rework of the parsing
algorithm. It's possible that Craig has this type of solution.

The patch works now; my initial mistake was not considering
list p=xxx .
Note that the lexer makes a special case when it is in a list command,
in order to ignore the current lexer radix.
Cleaning all of this up means a non-trivial rework of the lexer/parser.


> Scott
>
