Nearly all special errors are now handled gracefully, i.e. the
parser will be able to continue after encountering them. In some
cases the associated error range has been improved using the new
end attribute stack.
To achieve this the error handling code has been moved out of the
node constructors and into special methods in the parser.
It's likely that an error after -> will trigger another one due to
a missing semicolon without shifting a single token. We prevent an
immediate failure in this case by manually setting errorState to 2,
which will suppress the duplicate error message, but allow error
recovery.
Expr\List will now contain ArrayItems instead of plain variables.
I'm reusing ArrayItem, because code handling list() must also handle
arrays, and this allows both to go through the same code path.
This also renames Expr\List->vars to ->items.
TODO: Should Expr\List be dropped in favor of Expr\Array with an
extra flag?
Scalar\String_ and Scalar\Encapsed now have an additional "kind"
attribute, which may be one of:
* String_::KIND_SINGLE_QUOTED
* String_::KIND_DOUBLE_QUOTED
* String_::KIND_NOWDOC
* String_::KIND_HEREDOC
Additionally, if the string kind is one of the latter two, an
attribute "docLabel" is provided, which contains the doc string
label (STR in <<<STR) that was originally used.
The pretty printer will try to take the original kind of the string,
as well as the used doc string label into account.
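
A minimal sketch of how a printer could take "kind" and "docLabel" into account. The KIND_* constants below are stand-ins with assumed values, printString() is a hypothetical helper rather than the pretty printer's real method, and the 'EOT' fallback label is an assumption too:

```php
<?php

// Hypothetical stand-ins for the String_::KIND_* constants; the numeric
// values are assumptions made for this sketch.
const KIND_SINGLE_QUOTED = 1;
const KIND_DOUBLE_QUOTED = 2;
const KIND_HEREDOC = 3;
const KIND_NOWDOC = 4;

/**
 * Hypothetical helper: print a string value using the delimiters that
 * match its original kind.
 */
function printString(string $value, array $attributes): string
{
    // Fall back to single quotes and an assumed 'EOT' label when the
    // attributes are missing.
    $kind  = $attributes['kind'] ?? KIND_SINGLE_QUOTED;
    $label = $attributes['docLabel'] ?? 'EOT';

    switch ($kind) {
        case KIND_NOWDOC:
            return "<<<'$label'\n$value\n$label";
        case KIND_HEREDOC:
            // Escaping of "$" and backslashes is omitted in this sketch.
            return "<<<$label\n$value\n$label";
        case KIND_DOUBLE_QUOTED:
            // Escapes only double quotes, backslashes and "$".
            return '"' . addcslashes($value, "\"\\\$") . '"';
        default:
            // Escapes only single quotes and backslashes.
            return "'" . addcslashes($value, "'\\") . "'";
    }
}
```

For instance, printString('hi', ['kind' => KIND_NOWDOC, 'docLabel' => 'STR']) reuses the STR label instead of inventing a new one.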
An attribute is recorded to distinguish array() and [] syntax. The
pretty printer respects this attribute. The shortArraySyntax pretty
printer option acts as a default in case the attribute is not specified.
A Nop statement will be inserted into statement lists if there are
any trailing comments in the list (which would otherwise not be
associated with any node).
The pretty printer output currently still contains a superfluous
newline.
Magic constant names have been added after the PHP 7 release.
We do not support and likely will not support __halt_compiler here
due to lexer limitations.
As these are shared between Php5 and Php7 parsers they should be
in some common place, otherwise we'd have to always reference either
one or the other.
Adding only a single recovery rule for now.
The API is now:
* throwOnError parser option must be disabled.
* List of Errors is available through $parser->getErrors(). This
method is available either way.
* If no recovery is possible $parser->parse() will return null.
(Obviously only if throwOnError is disabled).
* Don't assign to attribute stack on reduce - why was that there
in the first place?
* Assign attributes to the position in the stack where the first
token of the production is, instead of one position earlier.
* Add a comment to clarify why we also assign attributes on read,
instead of just on shift.
Minor performance improvement for parsing. This also allows accessing
attributes with higher granularity in the parser, though this is not
currently done.
* #n can now be used to access the stack position of a token. $n
is the same as $this->semStack[#n]. (Post-translate $n will
actually be the stack position.)
* $attributeStack is now $this->startAttributeStack and
$endAttributes is now $this->endAttributes.
* Attributes for a node are now computed inside the individual
reduction methods, instead of being passed as a parameter.
Accessible through the attributes() macro.
This adds an additional "returnType" subnode to Stmt\Function_,
Stmt\ClassMethod and Expr\Closure, as well as the corresponding
support in the name resolver and pretty printer.
And improve the code a tad bit in general.
I left YY2TBLSTATES and YYNLSTATES around, because I don't fully
understand their role in the action double indexing.
The end attributes previously were always assigned from the last read token,
which does not necessarily correspond to the last token in the reduced rule.
In particular this occurs if the parser read a new token and based on that
lookahead decided to reduce a rule. The behavior was only correct if the
newly read token was first shifted and then the rule was reduced.
This is fixed by buffering the endAttributes of the new token in a temporary
variable and only assigning them once the token is shifted.
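
The difference can be sketched with a toy loop. This is not the real parser: parseWords(), the token layout and the line numbers are all made up for illustration. It only shows why assigning end attributes on read gives the wrong end line when a reduce is triggered by the lookahead:

```php
<?php

// Toy token stream: [type, value, endLine]. All values are made up.
$tokens = [
    ['WORD', 'foo', 1],
    ['WORD', 'bar', 2],
    ['SEMI', ';',   5],
];

/**
 * Hypothetical toy "parser": groups leading WORD tokens into one node and
 * reduces as soon as the lookahead token is not a WORD.
 */
function parseWords(array $tokens, bool $bufferEndAttributes): array
{
    $endAttributes = []; // end attributes used when the node is reduced
    $words = [];

    foreach ($tokens as [$type, $value, $endLine]) {
        // Reading the lookahead token yields its end attributes.
        $readEndAttributes = ['endLine' => $endLine];
        if (!$bufferEndAttributes) {
            // Old behavior: overwrite immediately on read.
            $endAttributes = $readEndAttributes;
        }

        if ($type !== 'WORD') {
            // Reduce based on the lookahead, which is not part of the node.
            return ['words' => $words] + $endAttributes;
        }

        // Shift: only now does the token belong to the node.
        $words[] = $value;
        if ($bufferEndAttributes) {
            // Fixed behavior: assign the buffered attributes on shift.
            $endAttributes = $readEndAttributes;
        }
    }

    return ['words' => $words] + $endAttributes;
}

$buggy = parseWords($tokens, false); // end line taken from the ';' token
$fixed = parseWords($tokens, true);  // end line taken from 'bar'
```

With the sample tokens, the unbuffered variant reports the end line of the ';' lookahead, while the buffered variant reports the end line of the last word that was actually shifted.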
Directly creating the node isn't necessary anymore, the token only needs
to be parsed. This makes it consistent with the other scalar parsing
methods and removes the need to pass $arguments around.
* nested list()s will now create nested List nodes (instead of just
nested arrays)
* yield $k => $v was parsed with key and value swapped. This is now fixed
* the pretty printer now works with the newly added language constructs
Example: foreach ($coords as list($x, $y)) { ... }
This change slightly breaks backwards compatibility, as it changes the
node structure for the previously existing `list(...) = $foo` assignments.
Those no longer have a dedicated `AssignList` node; instead they are
parsed as a normal `Assign` node with a `List` as `var`. Similarly the
use in `foreach` will generate a `List` for `valueVar`.
The new dereferencing syntaxes (new Foo)->bar and (new Foo)['bar'] were
causing a shift/reduce conflict with the '(' expr ')' rule. When
(new Foo) was encountered (without dereference operators following) the
parser thus threw a parse error.
The fix simply adds a special '(' new_expr ')' rule to expr. This does not
remove the shift/reduce conflict itself, but makes it irrelevant.
This fixes issue #20.
Now two arrays are fetched from the lexer: $startAttributes and
$endAttributes. When constructing the attributes for a node, the
$startAttributes from the first token of the node and the $endAttributes
of the last token of the node are merged.
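
A minimal sketch of that merge, assuming the attributes are plain arrays keyed by names such as startLine and endLine. The helper name mergeAttributes() and the sample values are hypothetical:

```php
<?php

/**
 * Hypothetical helper: combine the start attributes of a node's first
 * token with the end attributes of its last token.
 */
function mergeAttributes(array $startAttributes, array $endAttributes): array
{
    // Array union keeps every key from $startAttributes and adds the
    // keys that only exist in $endAttributes.
    return $startAttributes + $endAttributes;
}

// Sample per-token attributes (made-up values for illustration).
$firstToken = ['startLine' => 3];
$lastToken  = ['endLine' => 7];

$attributes = mergeAttributes($firstToken, $lastToken);
```

Array union is used here because the two arrays are expected to carry different keys; if they ever shared a key, the start value would win.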
Now the end line is saved in the endLine attribute.
The yacc parser skeleton with all those odd $yy short names is quite
non-obvious. This commit starts to refactor it a bit, to use more
obvious names and logic.
Now the lexer is injected only once when creating the parser. Instead of

    $parser = new PHPParser_Parser;
    $parser->parse(new PHPParser_Lexer($code));
    $parser->parse(new PHPParser_Lexer($code2));

you write:

    $parser = new PHPParser_Parser(new PHPParser_Lexer);
    $parser->parse($code);
    $parser->parse($code2);
The parser didn't account for the additional newline after the content of doc strings, which is left there by the tokenizer for some reason. Additionally, escape sequences were incorrectly parsed in nowdoc strings.
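
Roughly, the intended handling looks like the sketch below. parseDocStringContent() is a hypothetical helper, not the parser's actual method, and it only handles three escape sequences (\\, \n and \t) to keep the example short:

```php
<?php

/**
 * Hypothetical helper: turn the raw content of a doc string into its value.
 * $isNowdoc tells whether the string was opened with <<<'LABEL'.
 */
function parseDocStringContent(string $raw, bool $isNowdoc): string
{
    // Drop the single trailing newline that precedes the closing label.
    $content = preg_replace('/(\r\n|\n|\r)\z/', '', $raw);

    if ($isNowdoc) {
        // Nowdoc content is taken literally: no escape sequences.
        return $content;
    }

    // Heredoc: resolve a few escape sequences (sketch only, not the full set).
    // strtr() with an array does not rescan replaced text, so "\\n" in the
    // source stays a backslash followed by "n".
    return strtr($content, ['\\\\' => '\\', '\\n' => "\n", '\\t' => "\t"]);
}
```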
Additionally, this contains some minor changes to the grammar: some _list nonterminals were refactored to have the possible single elements in a separate rule and only assemble those single elements. (This reduces duplication and gives better assignment of line number context.)
(new A)->b(), (new A)->b, (new A)[0]. The feature is not implemented in a fully compliant way (it is implemented as a `variable`, not as an `expr_without_variable`); awaiting input on that on internals@.
Node_Const is shared between Node_Stmt_ClassConst and Node_Stmt_Const. Maybe one could generalize it to a Node_NameToValue to share it with Node_Stmt_Declare too.