From e65fd664d133fd16fc7d8420b1d435ec7089dda1 Mon Sep 17 00:00:00 2001 From: nikic Date: Fri, 12 Sep 2014 00:20:22 +0200 Subject: [PATCH] Small docs touchups and typo fixes --- doc/0_Introduction.markdown | 18 ++-- doc/1_Installation.markdown | 9 +- doc/2_Usage_of_basic_components.markdown | 92 ++++++++++--------- ...3_Other_node_tree_representations.markdown | 36 ++++---- doc/component/Lexer.markdown | 2 +- 5 files changed, 84 insertions(+), 73 deletions(-) diff --git a/doc/0_Introduction.markdown b/doc/0_Introduction.markdown index d4b0b7b..325ca8e 100644 --- a/doc/0_Introduction.markdown +++ b/doc/0_Introduction.markdown @@ -1,16 +1,16 @@ Introduction ============ -This project is a PHP 5.5 (and older) parser **written in PHP itself**. +This project is a PHP 5.2 to PHP 5.6 parser **written in PHP itself**. What is this for? ----------------- -A parser is useful for [static analysis][0] and manipulation of code and basically any other +A parser is useful for [static analysis][0], manipulation of code and basically any other application dealing with code programmatically. A parser constructs an [Abstract Syntax Tree][1] (AST) of the code and thus allows dealing with it in an abstract and robust way. -There are other ways of dealing with source code. One that PHP supports natively is using the +There are other ways of processing source code. One that PHP supports natively is using the token stream generated by [`token_get_all`][2]. The token stream is much more low level than the AST and thus has different applications: It allows to also analyze the exact formatting of a file. On the other hand the token stream is much harder to deal with for more complex analysis. @@ -26,13 +26,13 @@ programmatic PHP code analysis are incidentally PHP developers, not C developers What can it parse? ------------------ -The parser uses a PHP 5.5 compliant grammar, which is backwards compatible with at least PHP 5.4, PHP 5.3 -and PHP 5.2 (and maybe older). 
+The parser uses a PHP 5.6 compliant grammar, which is backwards compatible with all PHP versions from PHP 5.2 +upwards (and maybe older). As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP -version it runs on), additionally a wrapper for emulating new tokens from 5.3, 5.4 and 5.5 is provided. This -allows to parse PHP 5.5 source code running on PHP 5.2, for example. This emulation is very hacky and not -yet perfect, but it should work well on any sane code. +version it runs on), additionally a wrapper for emulating new tokens from 5.3, 5.4, 5.5 and 5.6 is provided. +This allows to parse PHP 5.6 source code running on PHP 5.3, for example. This emulation is very hacky and not +perfect, but it should work well on any sane code. What output does it produce? ---------------------------- @@ -56,7 +56,7 @@ array( ) ``` -This matches the semantics the program had: An echo statement, which takes two strings as expressions, +This matches the structure of the code: An echo statement, which takes two strings as expressions, with the values `Hi` and `World!`. You can also see that the AST does not contain any whitespace information (but most comments are saved). diff --git a/doc/1_Installation.markdown b/doc/1_Installation.markdown index f82159f..8ffdf1d 100644 --- a/doc/1_Installation.markdown +++ b/doc/1_Installation.markdown @@ -3,11 +3,6 @@ Installation There are multiple ways to include the PHP parser into your project: -Installing from the Zip- or Tarball ------------------------------------ - -Download the latest version from [the download page][2], unpack it and move the files somewhere into your project.
- Installing via Composer ----------------------- @@ -34,6 +29,10 @@ Run the following command to install the parser into the `vendor/PHP-Parser` fol git submodule add git://github.com/nikic/PHP-Parser.git vendor/PHP-Parser +Installing from the Zip- or Tarball +----------------------------------- + +Download the latest version from [the download page][2], unpack it and move the files somewhere into your project. [1]: http://getcomposer.org/composer.phar diff --git a/doc/2_Usage_of_basic_components.markdown b/doc/2_Usage_of_basic_components.markdown index 67e12d8..b9ba3ed 100644 --- a/doc/2_Usage_of_basic_components.markdown +++ b/doc/2_Usage_of_basic_components.markdown @@ -26,31 +26,38 @@ This ensures that there will be no errors when traversing highly nested node tre Parsing ------- -In order to parse some source code you first have to create a `PhpParser\Parser` object (which -needs to be passed a `PhpParser\Lexer` instance) and then pass the code (including `parse($code); + // $stmts is an array of statement nodes } catch (PhpParser\Error $e) { echo 'Parse Error: ', $e->getMessage(); } ``` -The `parse` method will return an array of statement nodes (`$stmts`). - -### Emulative lexer - -Instead of `PhpParser\Lexer` one can also use `PhpParser\Lexer\Emulative`. This class will emulate tokens -of newer PHP versions and as such allow parsing PHP 5.5 on PHP 5.2, for example. So if you want to parse -PHP code of newer versions than the one you are running, you should use the emulative lexer. +A parser instance can be reused to parse multiple files. Node tree --------- @@ -104,7 +111,7 @@ with a PHP keyword. Every node has a (possibly zero) number of subnodes. You can access subnodes by writing `$node->subNodeName`. The `Stmt\Echo_` node has only one subnode `exprs`. So in order to access it -in the above example you would write `$stmts[0]->exprs`. If you wanted to access name of the function +in the above example you would write `$stmts[0]->exprs`. 
If you wanted to access the name of the function call, you would write `$stmts[0]->exprs[1]->name`. All nodes also define a `getType()` method that returns the node type. The type is the class name @@ -131,7 +138,7 @@ namely `PhpParser\PrettyPrinter\Standard`. exprs // sub expressions [0] // the first of them (the string node) ->value // it's value, i.e. 'Hi ' - = 'Hallo '; // change to 'Hallo ' + = 'Hello '; // change to 'Hello ' // pretty print - $code = 'prettyPrint($stmts); + $code = $prettyPrinter->prettyPrint($stmts); echo $code; } catch (PhpParser\Error $e) { @@ -156,7 +163,7 @@ try { The above code will output: - parse()`, then changed and then again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`. @@ -164,8 +171,8 @@ again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`. The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a single expression using `prettyPrintExpr()`. -The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `addVisitor(new MyNodeVisitor); try { + $code = file_get_contents($fileName); + // parse $stmts = $parser->parse($code); @@ -197,7 +205,7 @@ try { $stmts = $traverser->traverse($stmts); // pretty print - $code = 'prettyPrint($stmts); + $code = $prettyPrinter->prettyPrintFile($stmts); echo $code; } catch (PhpParser\Error $e) { @@ -205,14 +213,16 @@ try { } ``` -A same node visitor for this code might look like this: +The corresponding node visitor might look like this: ```php value = 'foo'; } } @@ -221,7 +231,7 @@ class MyNodeVisitor extends PhpParser\NodeVisitorAbstract The above node visitor would change all string literals in the program to `'foo'`. 
-All visitors must implement the `PhpParser\NodeVisitor` interface, which defined the following four +All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four methods: public function beforeTraverse(array $nodes); @@ -240,11 +250,12 @@ The `enterNode` and `leaveNode` methods are called on every node, the former whe i.e. before its subnodes are traversed, the latter when it is left. All four methods can either return the changed node or not return at all (i.e. `null`) in which -case the current node is not changed. The `leaveNode` method can furthermore return two special -values: If `false` is returned the current node will be removed from the parent array. If an `array` -is returned the current node will be merged into the parent array at the offset of the current node. -I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will be -`array(A, X, Y, Z, C)`. +case the current node is not changed. The `leaveNode` method can additionally return two special +values: + +If `false` is returned the current node will be removed from the parent array. If an array is returned +it will be merged into the parent array at the offset of the current node. I.e. if in `array(A, B, C)` +the node `B` should be replaced with `array(X, Y, Z)` the result will be `array(A, X, Y, Z, C)`. Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract` class, which will define empty default implementations for all the above methods. 
@@ -283,10 +294,9 @@ We start off with the following base code: ```php addVisitor(new PhpParser\NodeVisitor\NameResolver); // we will need $traverser->addVisitor(new NodeVisitor\NamespaceConverter); // our own node visitor // iterate over all .php files in the directory -$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator(IN_DIR)); +$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($inDir)); $files = new RegexIterator($files, '/\.php$/'); foreach ($files as $file) { @@ -310,11 +320,11 @@ foreach ($files as $file) { $stmts = $traverser->traverse($stmts); // pretty print - $code = 'prettyPrint($stmts); + $code = $prettyPrinter->prettyPrintFile($stmts); // write the converted file to the target directory file_put_contents( - substr_replace($file->getPathname(), OUT_DIR, 0, strlen(IN_DIR)), + substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)), $code ); } catch (PhpParser\Error $e) { @@ -323,7 +333,7 @@ foreach ($files as $file) { } ``` -Now lets start with the main code, the `NodeVisitor_NamespaceConverter`. One thing it needs to do +Now let's start with the main code, the `NodeVisitor\NamespaceConverter`. One thing it needs to do is convert `A\\B` style names to `A_B` style ones. ```php @@ -340,14 +350,14 @@ class NodeVisitor_NamespaceConverter extends PhpParser\NodeVisitorAbstract ``` The above code profits from the fact that the `NameResolver` already resolved all names as far as -possible, so we don't need to do that. All the need to create a string with the name parts separated +possible, so we don't need to do that. We only need to create a string with the name parts separated by underscores instead of backslashes. This is what `$node->toString('_')` does. (If you want to create a name with backslashes either write `$node->toString()` or `(string) $node`.) Then we create a new name from the string and return it. Returning a new node replaces the old node.
Another thing we need to do is change the class/function/const declarations. Currently they contain -only the shortname (i.e. the last part of the name), but they need to contain the complete class -name: +only the shortname (i.e. the last part of the name), but they need to contain the complete name including +the namespace prefix: ```php parse($code); - echo '
<pre>' . htmlspecialchars($nodeDumper->dump($stmts)) . '</pre>
'; + echo $nodeDumper->dump($stmts), "\n"; } catch (PhpParser\Error $e) { echo 'Parse Error: ', $e->getMessage(); } ``` -The above output will have an output looking roughly like this: +The above script will have an output looking roughly like this: ``` array( @@ -77,7 +78,7 @@ array( args: array( 0: Arg( value: Scalar_String( - value: Hallo World!!! + value: Hello World!!! ) byRef: false ) @@ -97,20 +98,21 @@ interfacing with other languages and applications or for doing transformation us parse($code); - echo '
<pre>' . htmlspecialchars($serializer->serialize($stmts)) . '</pre>
'; + echo $serializer->serialize($stmts); } catch (PhpParser\Error $e) { echo 'Parse Error: ', $e->getMessage(); } @@ -185,7 +187,7 @@ Produces: - Hallo World!!! + Hello World!!! diff --git a/doc/component/Lexer.markdown b/doc/component/Lexer.markdown index 0f10afa..314cb41 100644 --- a/doc/component/Lexer.markdown +++ b/doc/component/Lexer.markdown @@ -42,7 +42,7 @@ getNextToken ------------ `getNextToken` returns the ID of the next token and sets some additional information in the three variables which it -accepts by-ref. If no more tokens are available it has to return `0`, which is the ID of the `EOF` token. +accepts by-ref. If no more tokens are available it must return `0`, which is the ID of the `EOF` token. The first by-ref variable `$value` should contain the textual content of the token. It is what will be available as `$1` etc in the parser.