Update docs to mention emulative lexer

nikic 2012-02-21 19:02:04 +01:00
parent 10ba9f8dda
commit b0883f2bb8
2 changed files with 15 additions and 25 deletions

@@ -26,31 +26,13 @@ programmatic PHP code analysis are incidentally PHP developers, not C developer
What can it parse?
------------------
The parser uses a PHP 5.4 compliant grammar, but lexing is done using the `token_get_all` tokenization
facility provided by PHP itself. This means that you will be able to parse pretty much any PHP code you
want, but there are some limitations to keep in mind:
The parser uses a PHP 5.4 compliant grammar, which is backwards compatible with at least PHP 5.3 and PHP
5.2 (and maybe older).
* The PHP 5.4 grammar is implemented in such a way that it is backwards compatible. So parsing PHP 5.3
and PHP 5.2 is also possible (and maybe older versions). On the other hand this means that the parser
will let some code through, which would be invalid in the newest version (for example call time pass
by reference will *not* throw an error even though PHP 5.4 doesn't allow it anymore). This shouldn't
normally be a problem and if it is strictly required it can be easily implemented in a NodeVisitor.
* Even though the parser supports PHP 5.4 it depends on the internal tokenizer, which only supports
the PHP version it runs on. So you will be able to parse PHP 5.4 if you are running PHP 5.4. But you
wouldn't be able to parse PHP 5.4 code (which uses one of the new features) on PHP 5.3. The support
matrix looks roughly like this:
                | parsing PHP 5.4 | parsing PHP 5.3 | parsing PHP 5.2
---------------------------------------------------------------------
running PHP 5.4 | yes             | yes             | yes
running PHP 5.3 | no              | yes             | yes
running PHP 5.2 | no              | no              | yes
* The parser inherits all bugs of the `token_get_all` function. There are only two which I
currently know of, namely lexing of `b"$var"` literals and nested HEREDOC strings. The former
bug is circumvented by the `PHPParser_Lexer` wrapper which the parser uses, but the latter remains
(though I seriously doubt it will ever occur in practical use.)
As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP
version it runs on), additionally a wrapper for emulating new tokens from 5.3 and 5.4 is provided. This
allows parsing PHP 5.4 source code while running on PHP 5.2, for example. This emulation is very hacky and not
yet perfect, but it should work well on any sane code.
What output does it produce?
----------------------------

@@ -45,6 +45,12 @@ try {
The `parse` method will return an array of statement nodes (`$stmts`).
### Emulative lexer
Instead of `PHPParser_Lexer` one can also use `PHPParser_Lexer_Emulative`. This class will emulate tokens
of newer PHP versions and as such allow parsing PHP 5.4 on PHP 5.2, for example. So if you want to parse
PHP code of newer versions than the one you are running, you should use the emulative lexer.
Node tree
---------
@@ -288,7 +294,9 @@ foreach (new RecursiveIteratorIterator(
$code = file_get_contents($file);
// parse
$stmts = $parser->parse(new PHPParser_Lexer($code));
// use the emulative lexer here, as we are running PHP 5.2 but want to
// parse PHP 5.3
$stmts = $parser->parse(new PHPParser_Lexer_Emulative($code));
// traverse
$stmts = $traverser->traverse($stmts);