Introduction
============

This project is a PHP 5.2 to PHP 7.2 parser **written in PHP itself**.

What is this for?
-----------------

A parser is useful for [static analysis][0], code manipulation and basically any other
application that deals with code programmatically. A parser constructs an [Abstract Syntax Tree][1]
(AST) of the code and thus allows dealing with it in an abstract and robust way.

There are other ways of processing source code. One that PHP supports natively is using the
token stream generated by [`token_get_all`][2]. The token stream is much more low level than
the AST and thus has different applications: it also allows you to analyze the exact formatting of
a file. On the other hand, the token stream is much harder to deal with for more complex analysis.
For example, an AST abstracts away the fact that, in PHP, variables can be written as `$foo`, but also
as `$$bar`, `${'foobar'}` or even `${!${''}=barfoo()}`. You don't have to worry about recognizing
all the different syntaxes from a stream of tokens.
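
To make the difference concrete, the following sketch lexes three of the variable spellings
mentioned above with `token_get_all` and lists the resulting token names. It assumes PHP 7.0 or
newer (for the type declarations), and the helper `tokenNames()` is a hypothetical name introduced
only for this illustration; it is not part of this package.

```php
<?php

/**
 * Return the token names (or the literal character for single-character
 * tokens) of the given source code. Hypothetical helper for illustration.
 */
function tokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return $names;
}

$simple   = tokenNames('<?php $foo;');
$variable = tokenNames('<?php $$bar;');
$braces   = tokenNames('<?php ${\'foobar\'};');

echo implode(' ', $simple), "\n";   // T_OPEN_TAG T_VARIABLE ;
echo implode(' ', $variable), "\n"; // T_OPEN_TAG $ T_VARIABLE ;
echo implode(' ', $braces), "\n";   // T_OPEN_TAG $ { T_CONSTANT_ENCAPSED_STRING } ;
```

All three snippets are variable accesses, yet each yields a different token sequence. Code that
works on tokens has to recognize every such shape itself, which is the work an AST takes off your
hands.
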

Another question is: Why would I want to have a PHP parser *written in PHP*? Well, PHP might not be
a language especially suited for fast parsing, but processing the AST is much easier in PHP than it
would be in other, faster languages like C. Furthermore, the people most likely to want to do
programmatic PHP code analysis are PHP developers, not C developers.

What can it parse?
------------------

The parser supports parsing PHP 5.2-7.2.

As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP
version it runs on), a wrapper for emulating tokens from newer versions is additionally provided.
This makes it possible to parse PHP 7.2 source code while running on PHP 5.5, for example. This
emulation is somewhat hacky and not perfect, but it should work well on any sane code.
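
The version dependence of `token_get_all` can be observed directly. The sketch below uses the `fn`
keyword, which was introduced in PHP 7.4 and therefore lies outside the version range discussed
here; it is chosen purely as an illustration, not taken from this package. It only shows the
problem that token emulation has to solve, not how the bundled wrapper solves it.

```php
<?php

// Lex an arrow function; index 0 is the opening tag, index 1 is `fn`.
$tokens = token_get_all('<?php fn($x) => $x;');
$name = token_name($tokens[1][0]);

// PHP 7.4+ knows the keyword (T_FN); older versions see a plain identifier.
$expected = PHP_VERSION_ID >= 70400 ? 'T_FN' : 'T_STRING';

echo $name, "\n";
var_dump($name === $expected);
```

The same source text thus lexes differently depending on the PHP version running the script. An
emulation layer has to close this kind of gap by producing the newer tokens itself.
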

What output does it produce?
----------------------------

The parser produces an [Abstract Syntax Tree][1] (AST), also known as a node tree. What this looks
like is best seen in an example. The program `<?php echo 'Hi', 'World';` will give you a node tree
roughly looking like this:

```
array(
    0: Stmt_Echo(
        exprs: array(
            0: Scalar_String(
                value: Hi
            )
            1: Scalar_String(
                value: World
            )
        )
    )
)
```

This matches the structure of the code: an echo statement, which takes two strings as expressions,
with the values `Hi` and `World`.

You can also see that the AST does not contain any whitespace information (but most comments are
saved), so it cannot be used for formatting analysis.

What else can it do?
--------------------

Apart from the parser itself, this package also bundles support for some other, related features:

* Support for pretty printing, which is the act of converting an AST into PHP code. Please note
  that "pretty printing" does not imply that the output is especially pretty. That's just what it's
  called ;)
* Support for serializing and unserializing the node tree to JSON
* Support for dumping the node tree in a human readable form (see the section above for an
  example of what the output looks like)
* Infrastructure for traversing and changing the AST (node traverser and node visitors)
* A node visitor for resolving namespaced names

[0]: http://en.wikipedia.org/wiki/Static_program_analysis
[1]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
[2]: http://php.net/token_get_all