mirror of
https://github.com/danog/PHP-Parser.git
synced 2024-11-26 20:04:48 +01:00
Small docs touchups and typo fixes
This commit is contained in:
parent
7a3789f1a9
commit
e65fd664d1
@ -1,16 +1,16 @@
|
||||
Introduction
|
||||
============
|
||||
|
||||
This project is a PHP 5.5 (and older) parser **written in PHP itself**.
|
||||
This project is a PHP 5.2 to PHP 5.6 parser **written in PHP itself**.
|
||||
|
||||
What is this for?
|
||||
-----------------
|
||||
|
||||
A parser is useful for [static analysis][0] and manipulation of code and basically any other
|
||||
A parser is useful for [static analysis][0], manipulation of code and basically any other
|
||||
application dealing with code programmatically. A parser constructs an [Abstract Syntax Tree][1]
|
||||
(AST) of the code and thus allows dealing with it in an abstract and robust way.
|
||||
|
||||
There are other ways of dealing with source code. One that PHP supports natively is using the
|
||||
There are other ways of processing source code. One that PHP supports natively is using the
|
||||
token stream generated by [`token_get_all`][2]. The token stream is much more low level than
|
||||
the AST and thus has different applications: It allows to also analyze the exact formatting of
|
||||
a file. On the other hand the token stream is much harder to deal with for more complex analysis.
|
||||
@ -26,13 +26,13 @@ programmatic PHP code analysis are incidentally PHP developers, not C developers
|
||||
What can it parse?
|
||||
------------------
|
||||
|
||||
The parser uses a PHP 5.5 compliant grammar, which is backwards compatible with at least PHP 5.4, PHP 5.3
|
||||
and PHP 5.2 (and maybe older).
|
||||
The parser uses a PHP 5.6 compliant grammar, which is backwards compatible with all PHP version from PHP 5.2
|
||||
upwards (and maybe older).
|
||||
|
||||
As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP
|
||||
version it runs on), additionally a wrapper for emulating new tokens from 5.3, 5.4 and 5.5 is provided. This
|
||||
allows to parse PHP 5.5 source code running on PHP 5.2, for example. This emulation is very hacky and not
|
||||
yet perfect, but it should work well on any sane code.
|
||||
version it runs on), additionally a wrapper for emulating new tokens from 5.3, 5.4, 5.5 and 5.6 is provided.
|
||||
his allows to parse PHP 5.6 source code running on PHP 5.3, for example. This emulation is very hacky and not
|
||||
perfect, but it should work well on any sane code.
|
||||
|
||||
What output does it produce?
|
||||
----------------------------
|
||||
@ -56,7 +56,7 @@ array(
|
||||
)
|
||||
```
|
||||
|
||||
This matches the semantics the program had: An echo statement, which takes two strings as expressions,
|
||||
This matches the structure of the code: An echo statement, which takes two strings as expressions,
|
||||
with the values `Hi` and `World!`.
|
||||
|
||||
You can also see that the AST does not contain any whitespace information (but most comments are saved).
|
||||
|
@ -3,11 +3,6 @@ Installation
|
||||
|
||||
There are multiple ways to include the PHP parser into your project:
|
||||
|
||||
Installing from the Zip- or Tarball
|
||||
-----------------------------------
|
||||
|
||||
Download the latest version from [the download page][2], unpack it and move the files somewhere into your project.
|
||||
|
||||
Installing via Composer
|
||||
-----------------------
|
||||
|
||||
@ -34,6 +29,10 @@ Run the following command to install the parser into the `vendor/PHP-Parser` fol
|
||||
|
||||
git submodule add git://github.com/nikic/PHP-Parser.git vendor/PHP-Parser
|
||||
|
||||
Installing from the Zip- or Tarball
|
||||
-----------------------------------
|
||||
|
||||
Download the latest version from [the download page][2], unpack it and move the files somewhere into your project.
|
||||
|
||||
|
||||
[1]: http://getcomposer.org/composer.phar
|
||||
|
@ -26,31 +26,38 @@ This ensures that there will be no errors when traversing highly nested node tre
|
||||
Parsing
|
||||
-------
|
||||
|
||||
In order to parse some source code you first have to create a `PhpParser\Parser` object (which
|
||||
needs to be passed a `PhpParser\Lexer` instance) and then pass the code (including `<?php` opening
|
||||
tags) to the `parse` method. If a syntax error is encountered `PhpParser\Error` is thrown, so this
|
||||
exception should be `catch`ed.
|
||||
In order to parse some source code you first have to create a `PhpParser\Parser` object, which
|
||||
needs to be passed a `PhpParser\Lexer` instance:
|
||||
|
||||
```php
|
||||
<?php
|
||||
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer);
|
||||
// or
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer\Emulative);
|
||||
```
|
||||
|
||||
Use of the emulative lexer is required if you want to parse PHP code from newer versions than the one
|
||||
you're running on. For example it will allow you to parse PHP 5.6 code while running on PHP 5.3.
|
||||
|
||||
Subsequently you can pass PHP code (including the opening `<?php` tag) to the `parse` method in order to
|
||||
create a syntax tree. If a syntax error is encountered, an `PhpParser\Error` exception will be thrown:
|
||||
|
||||
```php
|
||||
<?php
|
||||
$code = '<?php // some code';
|
||||
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer);
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer\Emulative);
|
||||
|
||||
try {
|
||||
$stmts = $parser->parse($code);
|
||||
// $stmts is an array of statement nodes
|
||||
} catch (PhpParser\Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
```
|
||||
|
||||
The `parse` method will return an array of statement nodes (`$stmts`).
|
||||
|
||||
### Emulative lexer
|
||||
|
||||
Instead of `PhpParser\Lexer` one can also use `PhpParser\Lexer\Emulative`. This class will emulate tokens
|
||||
of newer PHP versions and as such allow parsing PHP 5.5 on PHP 5.2, for example. So if you want to parse
|
||||
PHP code of newer versions than the one you are running, you should use the emulative lexer.
|
||||
A parser instance can be reused to parse multiple files.
|
||||
|
||||
Node tree
|
||||
---------
|
||||
@ -104,7 +111,7 @@ with a PHP keyword.
|
||||
|
||||
Every node has a (possibly zero) number of subnodes. You can access subnodes by writing
|
||||
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode `exprs`. So in order to access it
|
||||
in the above example you would write `$stmts[0]->exprs`. If you wanted to access name of the function
|
||||
in the above example you would write `$stmts[0]->exprs`. If you wanted to access the name of the function
|
||||
call, you would write `$stmts[0]->exprs[1]->name`.
|
||||
|
||||
All nodes also define a `getType()` method that returns the node type. The type is the class name
|
||||
@ -131,7 +138,7 @@ namely `PhpParser\PrettyPrinter\Standard`.
|
||||
<?php
|
||||
$code = "<?php echo 'Hi ', hi\\getTarget();";
|
||||
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer);
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer);
|
||||
$prettyPrinter = new PhpParser\PrettyPrinter\Standard;
|
||||
|
||||
try {
|
||||
@ -143,10 +150,10 @@ try {
|
||||
->exprs // sub expressions
|
||||
[0] // the first of them (the string node)
|
||||
->value // it's value, i.e. 'Hi '
|
||||
= 'Hallo '; // change to 'Hallo '
|
||||
= 'Hello '; // change to 'Hello '
|
||||
|
||||
// pretty print
|
||||
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
|
||||
$code = $prettyPrinter->prettyPrint($stmts);
|
||||
|
||||
echo $code;
|
||||
} catch (PhpParser\Error $e) {
|
||||
@ -156,7 +163,7 @@ try {
|
||||
|
||||
The above code will output:
|
||||
|
||||
<?php echo 'Hallo ', hi\getTarget();
|
||||
<?php echo 'Hello ', hi\getTarget();
|
||||
|
||||
As you can see the source code was first parsed using `PhpParser\Parser->parse()`, then changed and then
|
||||
again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.
|
||||
@ -164,8 +171,8 @@ again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.
|
||||
The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a
|
||||
single expression using `prettyPrintExpr()`.
|
||||
|
||||
The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag and handle
|
||||
inline HTML as the first/last sentence more gracefully.
|
||||
The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
|
||||
and handle inline HTML as the first/last statement more gracefully.
|
||||
|
||||
Node traversation
|
||||
-----------------
|
||||
@ -180,9 +187,8 @@ structure of a program using this `PhpParser\NodeTraverser` looks like this:
|
||||
|
||||
```php
|
||||
<?php
|
||||
$code = "<?php // some code";
|
||||
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer);
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer\Emulative);
|
||||
$traverser = new PhpParser\NodeTraverser;
|
||||
$prettyPrinter = new PhpParser\PrettyPrinter\Standard;
|
||||
|
||||
@ -190,6 +196,8 @@ $prettyPrinter = new PhpParser\PrettyPrinter\Standard;
|
||||
$traverser->addVisitor(new MyNodeVisitor);
|
||||
|
||||
try {
|
||||
$code = file_get_contents($fileName);
|
||||
|
||||
// parse
|
||||
$stmts = $parser->parse($code);
|
||||
|
||||
@ -197,7 +205,7 @@ try {
|
||||
$stmts = $traverser->traverse($stmts);
|
||||
|
||||
// pretty print
|
||||
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
|
||||
$code = $prettyPrinter->prettyPrintFile($stmts);
|
||||
|
||||
echo $code;
|
||||
} catch (PhpParser\Error $e) {
|
||||
@ -205,14 +213,16 @@ try {
|
||||
}
|
||||
```
|
||||
|
||||
A same node visitor for this code might look like this:
|
||||
The corresponding node visitor might look like this:
|
||||
|
||||
```php
|
||||
<?php
|
||||
use PhpParser\Node;
|
||||
|
||||
class MyNodeVisitor extends PhpParser\NodeVisitorAbstract
|
||||
{
|
||||
public function leaveNode(PhpParser\Node $node) {
|
||||
if ($node instanceof PhpParser\Node\Scalar\String) {
|
||||
public function leaveNode(Node $node) {
|
||||
if ($node instanceof Node\Scalar\String) {
|
||||
$node->value = 'foo';
|
||||
}
|
||||
}
|
||||
@ -221,7 +231,7 @@ class MyNodeVisitor extends PhpParser\NodeVisitorAbstract
|
||||
|
||||
The above node visitor would change all string literals in the program to `'foo'`.
|
||||
|
||||
All visitors must implement the `PhpParser\NodeVisitor` interface, which defined the following four
|
||||
All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
|
||||
methods:
|
||||
|
||||
public function beforeTraverse(array $nodes);
|
||||
@ -240,11 +250,12 @@ The `enterNode` and `leaveNode` methods are called on every node, the former whe
|
||||
i.e. before its subnodes are traversed, the latter when it is left.
|
||||
|
||||
All four methods can either return the changed node or not return at all (i.e. `null`) in which
|
||||
case the current node is not changed. The `leaveNode` method can furthermore return two special
|
||||
values: If `false` is returned the current node will be removed from the parent array. If an `array`
|
||||
is returned the current node will be merged into the parent array at the offset of the current node.
|
||||
I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will be
|
||||
`array(A, X, Y, Z, C)`.
|
||||
case the current node is not changed. The `leaveNode` method can additionally return two special
|
||||
values:
|
||||
|
||||
If `false` is returned the current node will be removed from the parent array. If an array is returned
|
||||
it will be merged into the parent array at the offset of the current node. I.e. if in `array(A, B, C)`
|
||||
the node `B` should be replaced with `array(X, Y, Z)` the result will be `array(A, X, Y, Z, C)`.
|
||||
|
||||
Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
|
||||
class, which will define empty default implementations for all the above methods.
|
||||
@ -283,10 +294,9 @@ We start off with the following base code:
|
||||
|
||||
```php
|
||||
<?php
|
||||
const IN_DIR = '/some/path';
|
||||
const OUT_DIR = '/some/other/path';
|
||||
$inDir = '/some/path';
|
||||
$outDir = '/some/other/path';
|
||||
|
||||
// use the emulative lexer here, as we are running PHP 5.2 but want to parse PHP 5.3
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer\Emulative);
|
||||
$traverser = new PhpParser\NodeTraverser;
|
||||
$prettyPrinter = new PhpParser\PrettyPrinter\Standard;
|
||||
@ -295,7 +305,7 @@ $traverser->addVisitor(new PhpParser\NodeVisitor\NameResolver); // we will need
|
||||
$traverser->addVisitor(new NodeVisitor\NamespaceConverter); // our own node visitor
|
||||
|
||||
// iterate over all .php files in the directory
|
||||
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator(IN_DIR));
|
||||
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($inDir));
|
||||
$files = new RegexIterator($files, '/\.php$/');
|
||||
|
||||
foreach ($files as $file) {
|
||||
@ -310,11 +320,11 @@ foreach ($files as $file) {
|
||||
$stmts = $traverser->traverse($stmts);
|
||||
|
||||
// pretty print
|
||||
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
|
||||
$code = $prettyPrinter->prettyPrintFile($stmts);
|
||||
|
||||
// write the converted file to the target directory
|
||||
file_put_contents(
|
||||
substr_replace($file->getPathname(), OUT_DIR, 0, strlen(IN_DIR)),
|
||||
substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
|
||||
$code
|
||||
);
|
||||
} catch (PhpParser\Error $e) {
|
||||
@ -323,7 +333,7 @@ foreach ($files as $file) {
|
||||
}
|
||||
```
|
||||
|
||||
Now lets start with the main code, the `NodeVisitor_NamespaceConverter`. One thing it needs to do
|
||||
Now lets start with the main code, the `NodeVisitor\NamespaceConverter`. One thing it needs to do
|
||||
is convert `A\\B` style names to `A_B` style ones.
|
||||
|
||||
```php
|
||||
@ -340,14 +350,14 @@ class NodeVisitor_NamespaceConverter extends PhpParser\NodeVisitorAbstract
|
||||
```
|
||||
|
||||
The above code profits from the fact that the `NameResolver` already resolved all names as far as
|
||||
possible, so we don't need to do that. All the need to create a string with the name parts separated
|
||||
possible, so we don't need to do that. We only need to create a string with the name parts separated
|
||||
by underscores instead of backslashes. This is what `$node->toString('_')` does. (If you want to
|
||||
create a name with backslashes either write `$node->toString()` or `(string) $node`.) Then we create
|
||||
a new name from the string and return it. Returning a new node replaces the old node.
|
||||
|
||||
Another thing we need to do is change the class/function/const declarations. Currently they contain
|
||||
only the shortname (i.e. the last part of the name), but they need to contain the complete class
|
||||
name:
|
||||
only the shortname (i.e. the last part of the name), but they need to contain the complete name inclduing
|
||||
the namespace prefix:
|
||||
|
||||
```php
|
||||
<?php
|
||||
|
@ -1,7 +1,7 @@
|
||||
Other node tree representations
|
||||
===============================
|
||||
|
||||
It is possible to convert the AST in several textual representations, which serve different uses.
|
||||
It is possible to convert the AST into several textual representations, which serve different uses.
|
||||
|
||||
Simple serialization
|
||||
--------------------
|
||||
@ -13,33 +13,34 @@ but PHP, but it is compact and generates fast. The main application thus is in c
|
||||
Human readable dumping
|
||||
----------------------
|
||||
|
||||
Furthermore it is possible to dump nodes into a human readable form using the `dump` method of
|
||||
Furthermore it is possible to dump nodes into a human readable format using the `dump` method of
|
||||
`PhpParser\NodeDumper`. This can be used for debugging.
|
||||
|
||||
```php
|
||||
<?php
|
||||
$code = <<<'CODE'
|
||||
<?php
|
||||
function printLine($msg) {
|
||||
echo $msg, "\n";
|
||||
}
|
||||
|
||||
printLine('Hallo World!!!');
|
||||
function printLine($msg) {
|
||||
echo $msg, "\n";
|
||||
}
|
||||
|
||||
printLine('Hello World!!!');
|
||||
CODE;
|
||||
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer);
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer);
|
||||
$nodeDumper = new PhpParser\NodeDumper;
|
||||
|
||||
try {
|
||||
$stmts = $parser->parse($code);
|
||||
|
||||
echo '<pre>' . htmlspecialchars($nodeDumper->dump($stmts)) . '</pre>';
|
||||
echo $nodeDumper->dump($stmts), "\n";
|
||||
} catch (PhpParser\Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
```
|
||||
|
||||
The above output will have an output looking roughly like this:
|
||||
The above script will have an output looking roughly like this:
|
||||
|
||||
```
|
||||
array(
|
||||
@ -77,7 +78,7 @@ array(
|
||||
args: array(
|
||||
0: Arg(
|
||||
value: Scalar_String(
|
||||
value: Hallo World!!!
|
||||
value: Hello World!!!
|
||||
)
|
||||
byRef: false
|
||||
)
|
||||
@ -97,20 +98,21 @@ interfacing with other languages and applications or for doing transformation us
|
||||
<?php
|
||||
$code = <<<'CODE'
|
||||
<?php
|
||||
function printLine($msg) {
|
||||
echo $msg, "\n";
|
||||
}
|
||||
|
||||
printLine('Hallo World!!!');
|
||||
function printLine($msg) {
|
||||
echo $msg, "\n";
|
||||
}
|
||||
|
||||
printLine('Hello World!!!');
|
||||
CODE;
|
||||
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer);
|
||||
$parser = new PhpParser\Parser(new PhpParser\Lexer);
|
||||
$serializer = new PhpParser\Serializer\XML;
|
||||
|
||||
try {
|
||||
$stmts = $parser->parse($code);
|
||||
|
||||
echo '<pre>' . htmlspecialchars($serializer->serialize($stmts)) . '</pre>';
|
||||
echo $serializer->serialize($stmts);
|
||||
} catch (PhpParser\Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
@ -185,7 +187,7 @@ Produces:
|
||||
<subNode:value>
|
||||
<node:Scalar_String line="6">
|
||||
<subNode:value>
|
||||
<scalar:string>Hallo World!!!</scalar:string>
|
||||
<scalar:string>Hello World!!!</scalar:string>
|
||||
</subNode:value>
|
||||
</node:Scalar_String>
|
||||
</subNode:value>
|
||||
|
@ -42,7 +42,7 @@ getNextToken
|
||||
------------
|
||||
|
||||
`getNextToken` returns the ID of the next token and sets some additional information in the three variables which it
|
||||
accepts by-ref. If no more tokens are available it has to return `0`, which is the ID of the `EOF` token.
|
||||
accepts by-ref. If no more tokens are available it must return `0`, which is the ID of the `EOF` token.
|
||||
|
||||
The first by-ref variable `$value` should contain the textual content of the token. It is what will be available as `$1`
|
||||
etc in the parser.
|
||||
|
Loading…
Reference in New Issue
Block a user