More docs

nikic 2011-11-12 19:28:53 +01:00
parent 3b02facf0c
commit 6289ccfa78
4 changed files with 390 additions and 348 deletions

201
README.md

@ -1,206 +1,13 @@
PHP Parser
==========
This is a PHP parser written in PHP. Its purpose is to simplify static code analysis and
This is a PHP 5.4 (and older) parser written in PHP. Its purpose is to simplify static code analysis and
manipulation.
Documentation can be found in the [`doc/`][1] directory.
***Note: This project is experimental. There are no known bugs in the parser itself, but the API is
subject to change.***
Components
==========
This package currently bundles several components:
* The `Parser` itself
* A `NodeDumper` to dump the nodes to a human readable string representation
* A `NodeTraverser` to traverse and modify the node tree
* A `PrettyPrinter` to translate the node tree back to PHP
Autoloader
----------
In order to automatically include required files `PHPParser_Autoloader` can be used:
require_once 'path/to/PHP-Parser/lib/PHPParser/Autoloader.php';
PHPParser_Autoloader::register();
Parser and Parser_Debug
-----------------------
Parsing is performed using `PHPParser_Parser->parse()`. This method accepts a `PHPParser_Lexer`
as the only parameter and returns an array of statement nodes. If an error occurs it throws a
PHPParser_Error.
$code = '<?php // some code';
try {
$parser = new PHPParser_Parser;
$stmts = $parser->parse(new PHPParser_Lexer($code));
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
The `PHPParser_Parser_Debug` class also parses PHP code, but outputs a debug trace while doing so.
Node Tree
---------
The output of the parser is an array of statement nodes. All nodes implement the `PHPParser_Node`
interface (and extend `PHPParser_NodeAbstract`). Furthermore nodes are divided into three categories:
* `PHPParser_Node_Stmt`: A statement
* `PHPParser_Node_Expr`: An expression
* `PHPParser_Node_Scalar`: A scalar (a string, a number, and so on)
`PHPParser_Node_Scalar` inherits from `PHPParser_Node_Expr`.
Each node may have subnodes. For example `PHPParser_Node_Expr_Plus` has two subnodes, namely `left`
and `right`, which represent the left hand side and right hand side expressions of the plus operation.
Subnodes are accessed as normal properties:
$node->left
The subnodes which a certain node can have are documented as `@property` doccomments in the
respective files.
Additionally all nodes have two methods, `getLine()` and `getDocComment()`.
`getLine()` returns the line a node started in.
`getDocComment()` returns the doccomment before the node or `null` if there was none.
NodeDumper
----------
Nodes can be dumped into a string representation using the `PHPParser_NodeDumper->dump()` method:
$code = <<<'CODE'
<?php
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hallo World!!!');
CODE;
try {
$parser = new PHPParser_Parser;
$stmts = $parser->parse(new PHPParser_Lexer($code));
$nodeDumper = new PHPParser_NodeDumper;
echo '<pre>' . htmlspecialchars($nodeDumper->dump($stmts)) . '</pre>';
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
This script will have an output similar to the following:
array(
0: Stmt_Func(
byRef: false
name: printLine
params: array(
0: Stmt_FuncParam(
type: null
name: msg
byRef: false
default: null
)
)
stmts: array(
0: Stmt_Echo(
exprs: array(
0: Variable(
name: msg
)
1: Scalar_String(
value:
)
)
)
)
)
1: Expr_FuncCall(
func: Name(
parts: array(
0: printLine
)
)
args: array(
0: Arg(
value: Scalar_String(
value: Hallo World!!!
)
byRef: false
)
)
)
)
NodeTraverser
-------------
The node traverser allows traversing the node tree using a visitor class. A visitor class must
implement the `NodeVisitor` interface, which defines the following four methods:
public function beforeTraverse(array $nodes);
public function enterNode(PHPParser_Node $node);
public function leaveNode(PHPParser_Node $node);
public function afterTraverse(array $nodes);
The `beforeTraverse` method is called once before the traversal begins and is passed the nodes the
traverser was called with. This method can be used for resetting values before traversal or
preparing the tree for traversal.
The `afterTraverse` method is similar to the `beforeTraverse` method, with the only difference that
it is called once after the traversal.
The `enterNode` and `leaveNode` methods are called on every node, the former when it is entered,
i.e. before its subnodes are traversed, the latter when it is left.
All four methods can either return the changed node or not return at all (or return `null`) in which
case the current node is not changed. The `leaveNode` method can furthermore return two special
values: If `false` is returned the current node will be removed from the parent array. If an `array`
is returned the current node will be merged into the parent array at the offset of the current node.
I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will be
`array(A, X, Y, Z, C)`.
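The removal and merge rules can be illustrated without the parser itself. The following is a minimal sketch that uses plain strings in place of nodes. `applyLeaveResult()` is a hypothetical helper written for this illustration only; it is not part of the library, and the real traverser may be implemented differently.

```php
<?php
// Hypothetical helper: applies a leaveNode()-style return value to the
// parent array at the given offset. Plain strings stand in for nodes.
function applyLeaveResult(array $parent, $offset, $result) {
    if ($result === false) {
        // false: remove the current node from the parent array
        array_splice($parent, $offset, 1);
    } elseif (is_array($result)) {
        // array: merge its elements in at the offset of the current node
        array_splice($parent, $offset, 1, $result);
    } elseif ($result !== null) {
        // any other non-null value: replace the current node
        $parent[$offset] = $result;
    }
    // null: the node is left unchanged
    return $parent;
}

$merged  = applyLeaveResult(array('A', 'B', 'C'), 1, array('X', 'Y', 'Z'));
$removed = applyLeaveResult(array('A', 'B', 'C'), 1, false);

echo implode(', ', $merged), "\n";  // A, X, Y, Z, C
echo implode(', ', $removed), "\n"; // A, C
```

Because the parent array is re-indexed after a merge or removal, the offsets of the following siblings shift accordingly.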
The above described visitors are registered in the `NodeTraverser` class:
$visitor = new MyVisitor;
$traverser = new PHPParser_NodeTraverser;
$traverser->addVisitor($visitor);
$stmts = $parser->parse($lexer);
$stmts = $traverser->traverse($stmts);
With `MyVisitor` being something like this:
class MyVisitor extends PHPParser_NodeVisitorAbstract
{
public function enterNode(PHPParser_Node $node) {
// ...
}
}
As you can see above you don't need to define all four methods if you extend
`PHPParser_NodeVisitorAbstract` instead of directly implementing the interface.
PrettyPrinter
-------------
The pretty printer compiles nodes back to PHP code. "Pretty printing" here is just the formal
name of the process and does not mean that the output is in any way pretty.
$prettyPrinter = new PHPParser_PrettyPrinter_Zend;
echo '<pre>' . htmlspecialchars($prettyPrinter->prettyPrint($stmts)) . '</pre>';
For the code mentioned in the above section this should create the output:
function printLine($msg)
{
echo $msg, "\n";
}
printLine('Hallo World!!!');
You can also pretty print only a single expression using the `prettyPrintExpr()` method.
[1]: https://github.com/nikic/PHP-Parser/tree/master/doc


@ -59,18 +59,20 @@ The parser produces an [Abstract Syntax Tree][1] (AST) also known as a node tree
can best be seen in an example. The program `<?php echo 'Hi', 'World';` will give you a node tree
roughly looking like this:
array(
0: Stmt_Echo(
exprs: array(
0: Scalar_String(
value: Hi
)
1: Scalar_String(
value: World
)
```
array(
0: Stmt_Echo(
exprs: array(
0: Scalar_String(
value: Hi
)
1: Scalar_String(
value: World
)
)
)
)
```
This matches the semantics the program had: An echo statement, which takes two strings as expressions,
with the values `Hi` and `World`.


@ -9,12 +9,18 @@ Bootstrapping
The library needs to register a class autoloader; you can do this either by including the
`bootstrap.php` file:
require 'path/to/PHP-Parser/lib/bootstrap.php';
```php
<?php
require 'path/to/PHP-Parser/lib/bootstrap.php';
```
Or by manually registering the loader:
require 'path/to/PHP-Parser/lib/PHPParser/Autoloader.php';
PHPParser_Autoloader::register();
```php
<?php
require 'path/to/PHP-Parser/lib/PHPParser/Autoloader.php';
PHPParser_Autoloader::register();
```
Parsing
-------
@ -24,15 +30,18 @@ expects a `PHPParser_Lexer` instance which itself again expects a PHP source cod
`<?php` opening tags). If a syntax error is encountered `PHPParser_Error` is thrown, so
this exception should be caught.
$code = '<?php // some code';
```php
<?php
$code = '<?php // some code';
$parser = new PHPParser_Parser;
$parser = new PHPParser_Parser;
try {
$stmts = $parser->parse(new PHPParser_Lexer($code));
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
try {
$stmts = $parser->parse(new PHPParser_Lexer($code));
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The `parse` method will return an array of statement nodes (`$stmts`).
@ -42,25 +51,27 @@ Node tree
If you use the above code with `$code = "<?php echo 'Hi ', hi\\getTarget();"` the parser will
generate a node tree looking like this:
array(
0: Stmt_Echo(
exprs: array(
0: Scalar_String(
value: Hi
```
array(
0: Stmt_Echo(
exprs: array(
0: Scalar_String(
value: Hi
)
1: Expr_FuncCall(
name: Name(
parts: array(
0: hi
1: getTarget
)
)
1: Expr_FuncCall(
name: Name(
parts: array(
0: hi
1: getTarget
)
)
args: array(
)
args: array(
)
)
)
)
)
```
Thus `$stmts` will contain an array with only one node, with this node being an instance of
`PHPParser_Node_Stmt_Echo`.
@ -99,29 +110,32 @@ information the formatting is done using a specified scheme. Currently there is
namely `PHPParser_PrettyPrinter_Zend` (the name "Zend" might be misleading. It does not strictly adhere
to the Zend Coding Standard.)
$code = "<?php echo 'Hi ', hi\\getTarget();";
```php
<?php
$code = "<?php echo 'Hi ', hi\\getTarget();";
$parser = new PHPParser_Parser;
$prettyPrinter = new PHPParser_PrettyPrinter_Zend;
$parser = new PHPParser_Parser;
$prettyPrinter = new PHPParser_PrettyPrinter_Zend;
try {
// parse
$stmts = $parser->parse(new PHPParser_Lexer($code));
try {
// parse
$stmts = $parser->parse(new PHPParser_Lexer($code));
// change
$stmts[0] // the echo statement
->exprs // sub expressions
[0] // the first of them (the string node)
->value // its value, i.e. 'Hi '
= 'Hallo '; // change to 'Hallo '
// change
$stmts[0] // the echo statement
->exprs // sub expressions
[0] // the first of them (the string node)
->value // its value, i.e. 'Hi '
= 'Hallo '; // change to 'Hallo '
// pretty print
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
// pretty print
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
echo $code;
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
echo $code;
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The above code will output:
@ -144,40 +158,46 @@ going to look like.
For this purpose the parser provides a component for traversing and visiting the node tree. The basic
structure of a program using this `PHPParser_NodeTraverser` looks like this:
$code = "<?php // some code";
```php
<?php
$code = "<?php // some code";
$parser = new PHPParser_Parser;
$traverser = new PHPParser_NodeTraverser;
$prettyPrinter = new PHPParser_PrettyPrinter_Zend;
$parser = new PHPParser_Parser;
$traverser = new PHPParser_NodeTraverser;
$prettyPrinter = new PHPParser_PrettyPrinter_Zend;
// add your visitor
$traverser->addVisitor(new MyNodeVisitor);
// add your visitor
$traverser->addVisitor(new MyNodeVisitor);
try {
// parse
$stmts = $parser->parse(new PHPParser_Lexer($code));
try {
// parse
$stmts = $parser->parse(new PHPParser_Lexer($code));
// traverse
$stmts = $traverser->traverse($stmts);
// traverse
$stmts = $traverser->traverse($stmts);
// pretty print
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
// pretty print
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
echo $code;
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
echo $code;
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
A sample node visitor for this code might look like this:
class MyNodeVisitor extends PHPParser_NodeVisitorAbstract
{
public function leaveNode(PHPParser_Node $node) {
if ($node instanceof PHPParser_Node_Scalar_String) {
$node->value = 'foo';
}
```php
<?php
class MyNodeVisitor extends PHPParser_NodeVisitorAbstract
{
public function leaveNode(PHPParser_Node $node) {
if ($node instanceof PHPParser_Node_Scalar_String) {
$node->value = 'foo';
}
}
}
```
The above node visitor would change all string literals in the program to `'foo'`.
@ -235,66 +255,72 @@ Example: Converting namespaced code to pseudo namespaces
--------------------------------------------------------
A small example to understand the concept: We want to convert namespaced code to pseudo namespaces
so it works on 5.2, i.e. name like `A\\B` should be converted to `A_B`. Note that such conversions
so it works on 5.2, i.e. names like `A\\B` should be converted to `A_B`. Note that such conversions
are fairly complicated if you take PHP's dynamic features into account, so our conversion will
assume that no dynamic features are used.
We start off with the following base code:
const IN_DIR = '/some/path';
const OUT_DIR = '/some/other/path';
```php
<?php
const IN_DIR = '/some/path';
const OUT_DIR = '/some/other/path';
$parser = new PHPParser_Parser;
$traverser = new PHPParser_NodeTraverser;
$prettyPrinter = new PHPParser_PrettyPrinter_Zend;
$parser = new PHPParser_Parser;
$traverser = new PHPParser_NodeTraverser;
$prettyPrinter = new PHPParser_PrettyPrinter_Zend;
$traverser->addVisitor(new PHPParser_NodeVisitor_NameResolver); // we will need resolved names
$traverser->addVisitor(new NodeVisitor_NamespaceConverter); // our own node visitor
$traverser->addVisitor(new PHPParser_NodeVisitor_NameResolver); // we will need resolved names
$traverser->addVisitor(new NodeVisitor_NamespaceConverter); // our own node visitor
// iterate over all files in the directory
foreach (new RecursiveIteratorIterator(
new RecursiveDirectoryIterator(IN_DIR),
RecursiveIteratorIterator::LEAVES_ONLY)
as $file) {
// only convert .php files
if (!preg_match('~\.php$~', $file)) {
continue;
}
try {
// read the file that should be converted
$code = file_get_contents($file);
// parse
$stmts = $parser->parse(new PHPParser_Lexer($code));
// traverse
$stmts = $traverser->traverse($stmts);
// pretty print
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
// write the converted file to the target directory
file_put_contents(
substr_replace($file->getPathname(), OUT_DIR, 0, strlen(IN_DIR)),
$code
);
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
// iterate over all files in the directory
foreach (new RecursiveIteratorIterator(
new RecursiveDirectoryIterator(IN_DIR),
RecursiveIteratorIterator::LEAVES_ONLY)
as $file) {
// only convert .php files
if (!preg_match('~\.php$~', $file)) {
continue;
}
try {
// read the file that should be converted
$code = file_get_contents($file);
// parse
$stmts = $parser->parse(new PHPParser_Lexer($code));
// traverse
$stmts = $traverser->traverse($stmts);
// pretty print
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
// write the converted file to the target directory
file_put_contents(
substr_replace($file->getPathname(), OUT_DIR, 0, strlen(IN_DIR)),
$code
);
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
}
```
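The target path in the loop above is computed by swapping the `IN_DIR` prefix for `OUT_DIR`. A minimal sketch of that step in isolation follows. `mapToOutDir()` is a hypothetical helper introduced here for illustration, and it assumes every path handed to it really starts with the input directory.

```php
<?php
// Hypothetical helper: maps a path below $inDir to the same relative
// location below $outDir, mirroring the substr_replace() call above.
function mapToOutDir($path, $inDir, $outDir) {
    return substr_replace($path, $outDir, 0, strlen($inDir));
}

$target = mapToOutDir('/some/path/lib/Foo/Bar.php', '/some/path', '/some/other/path');
echo $target, "\n"; // /some/other/path/lib/Foo/Bar.php

// file_put_contents() does not create missing directories, so a script
// like the one above may need to create them before writing:
//
//     if (!is_dir(dirname($target))) {
//         mkdir(dirname($target), 0777, true);
//     }
```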
Now let's start with the main code, the `NodeVisitor_NamespaceConverter`. One thing it needs to do
is convert `A\\B` style names to `A_B` style ones.
class NodeVisitor_NamespaceConverter extends PHPParser_NodeVisitorAbstract
{
public function leaveNode(PHPParser_Node $node) {
if ($node instanceof PHPParser_Node_Name) {
return new PHPParser_Node_Name($node->toString('_'));
}
```php
<?php
class NodeVisitor_NamespaceConverter extends PHPParser_NodeVisitorAbstract
{
public function leaveNode(PHPParser_Node $node) {
if ($node instanceof PHPParser_Node_Name) {
return new PHPParser_Node_Name($node->toString('_'));
}
}
}
```
The above code profits from the fact that the `NameResolver` already resolved all names as far as
possible, so we don't need to do that. All we need to do is create a string with the name parts separated
@ -306,48 +332,54 @@ Another thing we need to do is change the class/function/const declarations. Cur
only the shortname (i.e. the last part of the name), but they need to contain the complete class
name:
class NodeVisitor_NamespaceConverter extends PHPParser_NodeVisitorAbstract
{
public function leaveNode(PHPParser_Node $node) {
if ($node instanceof PHPParser_Node_Name) {
return new PHPParser_Node_Name($node->toString('_'));
} elseif ($node instanceof PHPParser_Node_Stmt_Class
|| $node instanceof PHPParser_Node_Stmt_Interface
|| $node instanceof PHPParser_Node_Stmt_Function) {
$node->name = $node->namespacedName->toString('_');
} elseif ($node instanceof PHPParser_Node_Stmt_Const) {
foreach ($node->consts as $const) {
$const->name = $const->namespacedName->toString('_');
}
```php
<?php
class NodeVisitor_NamespaceConverter extends PHPParser_NodeVisitorAbstract
{
public function leaveNode(PHPParser_Node $node) {
if ($node instanceof PHPParser_Node_Name) {
return new PHPParser_Node_Name($node->toString('_'));
} elseif ($node instanceof PHPParser_Node_Stmt_Class
|| $node instanceof PHPParser_Node_Stmt_Interface
|| $node instanceof PHPParser_Node_Stmt_Function) {
$node->name = $node->namespacedName->toString('_');
} elseif ($node instanceof PHPParser_Node_Stmt_Const) {
foreach ($node->consts as $const) {
$const->name = $const->namespacedName->toString('_');
}
}
}
}
```
There is not much more to it than converting the namespaced name to string with `_` as separator.
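Joining the parts with `_` can be pictured with plain arrays. This is only a sketch: `joinNameParts()` is a hypothetical stand-in for what `toString('_')` is used for above, assuming a name is stored as an array of its parts, as the `parts` subnode in the node dumps suggests.

```php
<?php
// Hypothetical stand-in: a name is modelled as an array of its parts,
// as in the `parts` subnode of the dumps shown earlier.
function joinNameParts(array $parts, $separator) {
    return implode($separator, $parts);
}

echo joinNameParts(array('A', 'B'), '_'), "\n";           // A_B
echo joinNameParts(array('hi', 'getTarget'), '\\'), "\n"; // hi\getTarget
```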
The last thing we need to do is remove the `namespace` and `use` statements:
class NodeVisitor_NamespaceConverter extends PHPParser_NodeVisitorAbstract
{
public function leaveNode(PHPParser_Node $node) {
if ($node instanceof PHPParser_Node_Name) {
return new PHPParser_Node_Name($node->toString('_'));
} elseif ($node instanceof PHPParser_Node_Stmt_Class
|| $node instanceof PHPParser_Node_Stmt_Interface
|| $node instanceof PHPParser_Node_Stmt_Function) {
$node->name = $node->namespacedName->toString('_');
} elseif ($node instanceof PHPParser_Node_Stmt_Const) {
foreach ($node->consts as $const) {
$const->name = $const->namespacedName->toString('_');
}
} elseif ($node instanceof PHPParser_Node_Stmt_Namespace) {
// returning an array merges it into the parent array
return $node->stmts;
} elseif ($node instanceof PHPParser_Node_Stmt_Use) {
// returning false removes the node altogether
return false;
```php
<?php
class NodeVisitor_NamespaceConverter extends PHPParser_NodeVisitorAbstract
{
public function leaveNode(PHPParser_Node $node) {
if ($node instanceof PHPParser_Node_Name) {
return new PHPParser_Node_Name($node->toString('_'));
} elseif ($node instanceof PHPParser_Node_Stmt_Class
|| $node instanceof PHPParser_Node_Stmt_Interface
|| $node instanceof PHPParser_Node_Stmt_Function) {
$node->name = $node->namespacedName->toString('_');
} elseif ($node instanceof PHPParser_Node_Stmt_Const) {
foreach ($node->consts as $const) {
$const->name = $const->namespacedName->toString('_');
}
} elseif ($node instanceof PHPParser_Node_Stmt_Namespace) {
// returning an array merges it into the parent array
return $node->stmts;
} elseif ($node instanceof PHPParser_Node_Stmt_Use) {
// returning false removes the node altogether
return false;
}
}
}
```
That's all.


@ -0,0 +1,201 @@
Other node tree representations
===============================
It is possible to convert the AST into several textual representations, which serve different uses.
Simple serialization
--------------------
It is possible to serialize the node tree using `serialize()` and also unserialize it using
`unserialize()`. The output is not human readable and not easily processable from anything
but PHP, but it is compact and fast to generate. The main application is thus caching.
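A caching layer built on this might look roughly like the following. It is a sketch under stated assumptions, not part of the library: `parseStandIn()` is a hypothetical placeholder that returns a nested array instead of a real node tree so that the example runs on its own, and `getTreeCached()` and the `.ast` file suffix are likewise made up for illustration.

```php
<?php
// Hypothetical stand-in for the parser: returns a small nested array in
// place of a real node tree, so the sketch runs without the library.
function parseStandIn($code) {
    return array(array('type' => 'Stmt_Echo', 'source' => $code));
}

// Returns the cached tree for $code if present, otherwise "parses" the
// code and stores the serialized result under a hash of the source.
function getTreeCached($code, $cacheDir) {
    $cacheFile = $cacheDir . '/' . md5($code) . '.ast';
    if (is_file($cacheFile)) {
        return unserialize(file_get_contents($cacheFile));
    }
    $tree = parseStandIn($code);
    file_put_contents($cacheFile, serialize($tree));
    return $tree;
}

$code   = "<?php echo 'Hi', 'World';";
$first  = getTreeCached($code, sys_get_temp_dir()); // stored if not yet cached
$second = getTreeCached($code, sys_get_temp_dir()); // read from the cache
var_dump($first == $second); // bool(true)
```

Keying the cache on a hash of the source means a changed file is simply parsed again. Since `unserialize()` should only be given trusted data, the cache directory should not be writable by others.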
Human readable dumping
----------------------
Furthermore it is possible to dump nodes into a human readable form using the `dump` method of
`PHPParser_NodeDumper`. This can be used for debugging.
```php
<?php
$code = <<<'CODE'
<?php
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hallo World!!!');
CODE;
$parser = new PHPParser_Parser;
$nodeDumper = new PHPParser_NodeDumper;
try {
$stmts = $parser->parse(new PHPParser_Lexer($code));
echo '<pre>' . htmlspecialchars($nodeDumper->dump($stmts)) . '</pre>';
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The above script will produce output looking roughly like this:
```
array(
0: Stmt_Function(
byRef: false
params: array(
0: Param(
name: msg
default: null
type: null
byRef: false
)
)
stmts: array(
0: Stmt_Echo(
exprs: array(
0: Expr_Variable(
name: msg
)
1: Scalar_String(
value:
)
)
)
)
name: printLine
)
1: Expr_FuncCall(
name: Name(
parts: array(
0: printLine
)
)
args: array(
0: Arg(
value: Scalar_String(
value: Hallo World!!!
)
byRef: false
)
)
)
)
```
Serialization to XML
--------------------
It is also possible to serialize the node tree to XML using `PHPParser_Serializer_XML->serialize()`
and to unserialize it using `PHPParser_Unserializer_XML->unserialize()`. This is useful for
interfacing with other languages and applications or for doing transformation using XSLT.
```php
<?php
$code = <<<'CODE'
<?php
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hallo World!!!');
CODE;
$parser = new PHPParser_Parser;
$serializer = new PHPParser_Serializer_XML;
try {
$stmts = $parser->parse(new PHPParser_Lexer($code));
echo '<pre>' . htmlspecialchars($serializer->serialize($stmts)) . '</pre>';
} catch (PHPParser_Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
Produces:
```
<?xml version="1.0" encoding="UTF-8"?>
<AST xmlns:node="http://nikic.github.com/PHPParser/XML/node" xmlns:subNode="http://nikic.github.com/PHPParser/XML/subNode" xmlns:scalar="http://nikic.github.com/PHPParser/XML/scalar">
<scalar:array>
<node:Stmt_Function line="2">
<subNode:byRef>
<scalar:false/>
</subNode:byRef>
<subNode:params>
<scalar:array>
<node:Param line="2">
<subNode:name>
<scalar:string>msg</scalar:string>
</subNode:name>
<subNode:default>
<scalar:null/>
</subNode:default>
<subNode:type>
<scalar:null/>
</subNode:type>
<subNode:byRef>
<scalar:false/>
</subNode:byRef>
</node:Param>
</scalar:array>
</subNode:params>
<subNode:stmts>
<scalar:array>
<node:Stmt_Echo line="3">
<subNode:exprs>
<scalar:array>
<node:Expr_Variable line="3">
<subNode:name>
<scalar:string>msg</scalar:string>
</subNode:name>
</node:Expr_Variable>
<node:Scalar_String line="3">
<subNode:value>
<scalar:string>
</scalar:string>
</subNode:value>
</node:Scalar_String>
</scalar:array>
</subNode:exprs>
</node:Stmt_Echo>
</scalar:array>
</subNode:stmts>
<subNode:name>
<scalar:string>printLine</scalar:string>
</subNode:name>
</node:Stmt_Function>
<node:Expr_FuncCall line="6">
<subNode:name>
<node:Name line="6">
<subNode:parts>
<scalar:array>
<scalar:string>printLine</scalar:string>
</scalar:array>
</subNode:parts>
</node:Name>
</subNode:name>
<subNode:args>
<scalar:array>
<node:Arg line="6">
<subNode:value>
<node:Scalar_String line="6">
<subNode:value>
<scalar:string>Hallo World!!!</scalar:string>
</subNode:value>
</node:Scalar_String>
</subNode:value>
<subNode:byRef>
<scalar:false/>
</subNode:byRef>
</node:Arg>
</scalar:array>
</subNode:args>
</node:Expr_FuncCall>
</scalar:array>
</AST>
```
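Once the tree is available as XML, it can be queried with ordinary XML tooling. The following sketch uses PHP's DOM extension and XPath to list all string literals. The embedded document is a shortened, hand-written sample in the format shown above, not real serializer output, and the variable names are chosen for illustration.

```php
<?php
// Shortened sample in the format shown above (one string node only).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<AST xmlns:node="http://nikic.github.com/PHPParser/XML/node" xmlns:subNode="http://nikic.github.com/PHPParser/XML/subNode" xmlns:scalar="http://nikic.github.com/PHPParser/XML/scalar">
 <scalar:array>
  <node:Scalar_String line="6">
   <subNode:value>
    <scalar:string>Hallo World!!!</scalar:string>
   </subNode:value>
  </node:Scalar_String>
 </scalar:array>
</AST>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);

// The prefixes must be registered before they can be used in queries.
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('node', 'http://nikic.github.com/PHPParser/XML/node');
$xpath->registerNamespace('subNode', 'http://nikic.github.com/PHPParser/XML/subNode');
$xpath->registerNamespace('scalar', 'http://nikic.github.com/PHPParser/XML/scalar');

// Collect the line and value of every string literal in the tree.
$strings = array();
foreach ($xpath->query('//node:Scalar_String') as $stringNode) {
    $value = $xpath->evaluate('string(subNode:value/scalar:string)', $stringNode);
    $strings[] = array($stringNode->getAttribute('line'), $value);
}

foreach ($strings as $entry) {
    echo 'line ', $entry[0], ': ', $entry[1], "\n"; // line 6: Hallo World!!!
}
```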