The dispatch using $this->{'reduceRule' . $rule}() is very expensive
because it involves
* One allocation when converting $rule to a string
* Another allocation when concatenating the two strings
* Lowercasing during the method call
* Various uncached method checks, e.g. visibility.
Using an array of closures for semantic action dispatch is
significantly more efficient.
We're accessing $this->stackPos a lot, and property accesses are
much more expensive than local variable acesses. Its cheaper to
use a local var and pass it as an argument to semantic actions.
The parameter case is a bit weird, because the subnode is called
"name" here, rather than "var". Nothing we can do about that in
this version though.
The two parser options might be merged. I've kept it separate,
because I think this variable representation should become the
default (or even only representation) in the future, while I'm
less sure about the Identifier thing.
In this mode non-namespaced names that are currently represented
using strings will be represented using Identifier nodes instead.
Identifier nodes have a string $name subnode and coerce to string.
This allows preserving attributes and in particular location
information on identifiers.
Add ErrorHandler interface, as well as ErrorHandler\Throwing
and ErrorHandler\Collecting. The error handler is passed to
Parser::parse(). This supersedes the throwOnError option.
NameResolver now accepts an ErrorHandler in the ctor.
Nearly all special errors are now handled gracefully, i.e. the
parser will be able to continue after encountering them. In some
cases the associated error range has been improved using the new
end attribute stack.
To achieve this the error handling code has been moved out of the
node constructors and into special methods in the parser.
Lexer::startLexing() no longer throws, instead errors can be fetched
using Lexer::getErrors().
Lexer errors now also contain full line and position information.
It's likely that an error after -> will trigger another one due to
missing semicolon without shifting a single token. We prevent an
immediate failure in this case by manually setting errorState to 2,
which will suppress the duplicate error message, but allow error
recovery.
Adding only a single recovery rule for now.
The API is now:
* throwOnError parser option must be disabled.
* List of Errors is available through $parser->getErrors(). This
method is available either way.
* If no recovery is possible $parser->parse() will return null.
(Obviously only if throwOnError is disabled).
* Don't assign to attribute stack on reduce - why was that there
in the first place?
* Assign attributes to the position in the stack where the first
token of the production is, instead of one position earlier.
* Add a comment to clarify why we also assign attributes on read,
instead of just on shift.
Minor performance improvement for parsing, also allows to access
attributes with higher granulity in the parser, though this is not
currently done.
* #n can now be used to access the stack position of a token. $n
is the same as $this->semStack[#n]. (Post-translate $n will
actually be the stack position.)
* $attributeStack is now $this->startAttributeStack and
$endAttributes is now $this->endAttributes.
* Attributes for a node are now computed inside the individual
reduction methods, instead of being passed as a parameter.
Accessible through the attributes() macro.
The lexer can now optionally add startFilePos and endFilePos
attributes, which are offsets in to the lexed code string.
The end offset currently points one past the last character of
the token - this is pending further discussion.
The attributes are not added by default and have to be enabled
using the new 'usedAttributes' lexer option:
$lexer = new Lexer([
'usedAttributes' => [
'comments', 'startLine', 'endLine',
'startFilePos', 'endFilePos'
]
]);
And improve the code a tad bit in general.
I left YY2TBLSTATES and YYNLSTATES around, because I don't fully
understand their role in the action double indexing.