Empty productions are supposed to be assigned the start attributes
of the lookahead token. Currently, this happens by assigning above
the current stack position when the token it read.
This fails in a situation where we first reduce an empty production
higher up in the stack, and then again reduce an empty production
lower in the stack, without consuming the lookahead token in the
meantime.
Fix this by moving the assignment into the reduction phase. We
also need to do this for error productions, which are effectively
empty.
Previously zero-length nop nodes used the lookahead start attributes
and current end attributes. This choice ends up being somewhat weird,
because the end attributes will be the at the last non-whitespace,
non-comment token, which might be quite far back. More problematically,
we may not have encountered any non-discarded token if we're at the
start of the file, in which case we will have no end attributes to
assign.
Change things to use a canonical "zero-length" node representation,
where the end position (token & file) will be exactly one before the
start position.
Fixes#589.
If indentation is invalid, we strip on a best-effort basis.
The error position information is not great, but I don't want to
introduce sub-token error positioning at this point in time.
Move doc string parsing logic from rebuildParsers.php and
String_::parseDocString() into ParserAbstract. This stuff is
going to get complicated now.
For now only implement the validation of the indentation on the
end label.
The parser will now always generate Identifier nodes (for
non-namespaced identifiers). This obsoletes the useIdentifierNodes
parser option.
Node constructors still accepts strings and will implicitly create
an Identifier wrapper. Identifier implement __toString(), so that
outside of strict-mode many things continue to work without changes.
The dispatch using $this->{'reduceRule' . $rule}() is very expensive
because it involves
* One allocation when converting $rule to a string
* Another allocation when concatenating the two strings
* Lowercasing during the method call
* Various uncached method checks, e.g. visibility.
Using an array of closures for semantic action dispatch is
significantly more efficient.
We're accessing $this->stackPos a lot, and property accesses are
much more expensive than local variable acesses. Its cheaper to
use a local var and pass it as an argument to semantic actions.
The parameter case is a bit weird, because the subnode is called
"name" here, rather than "var". Nothing we can do about that in
this version though.
The two parser options might be merged. I've kept it separate,
because I think this variable representation should become the
default (or even only representation) in the future, while I'm
less sure about the Identifier thing.
In this mode non-namespaced names that are currently represented
using strings will be represented using Identifier nodes instead.
Identifier nodes have a string $name subnode and coerce to string.
This allows preserving attributes and in particular location
information on identifiers.