With the introduction of intersection types, PHP now lexes the
token '&' either as T_AMPERSAND_(NOT_)FOLLOWED_BY_VAR_OR_VARARG.
This completely breaks parsing of any code containing '&'.
Fix this by canonicalizing to the new token format (unconditionally,
independent of emulation) and adjusting the parser to use the two
new tokens.
This doesn't add actual support for intersection types yet.
Empty productions are supposed to be assigned the start attributes
of the lookahead token. Currently, this happens by assigning above
the current stack position when the token it read.
This fails in a situation where we first reduce an empty production
higher up in the stack, and then again reduce an empty production
lower in the stack, without consuming the lookahead token in the
meantime.
Fix this by moving the assignment into the reduction phase. We
also need to do this for error productions, which are effectively
empty.
Adds support for PHP 8 attributes, represented using `AttrGroup` nodes
containing `Attribute` nodes. The `attrGroup` subnode is added to all
nodes that can have attributes.
This is still missing FPPP support.
Co-authored-by: Nikita Popov <nikita.ppv@gmail.com>
The token stream should cover all characters in the original code,
insert a dummy token for missing illegal characters. We should
really be doing this in token_get_all() as well.
The UseUse::$alias node can now be null if an alias is not
explicitly given. As such "use Foo\Bar" and "use Foo\Bar as Bar"
are now represented differently.
The UseUse->getAlias() method replicates the previous semantics,
by returning "Bar" in both cases.
The parser will now always generate Identifier nodes (for
non-namespaced identifiers). This obsoletes the useIdentifierNodes
parser option.
Node constructors still accepts strings and will implicitly create
an Identifier wrapper. Identifier implement __toString(), so that
outside of strict-mode many things continue to work without changes.
Lexer::startLexing() no longer throws, instead errors can be fetched
using Lexer::getErrors().
Lexer errors now also contain full line and position information.
It's likely that an error after -> will trigger another one due to
missing semicolon without shifting a single token. We prevent an
immediate failure in this case by manually setting errorState to 2,
which will suppress the duplicate error message, but allow error
recovery.
A Nop statement will be inserted into statement lists if there are
any trailing comments in the list (which would otherwise not be
associated with any node).
The pretty printer output currently still contains a superfluous
newline.