Performance
===========
Parsing is a computationally expensive task, to which the PHP language is not very well suited.
Nonetheless, there are a few things you can do to improve the performance of this library, which are
described below.

Xdebug
------
Running PHP with Xdebug adds a lot of overhead, especially for code that performs many method calls.
Just by loading Xdebug (without enabling profiling or other more intrusive Xdebug features), you
can expect that code using PHP-Parser will be approximately *five times slower*.

As such, you should make sure that Xdebug is not loaded when using this library. Note that setting
the `xdebug.default_enable=0` ini option does *not* disable Xdebug. The *only* way to disable
Xdebug is to not load the extension in the first place.

If you are building a command-line utility for use by developers (who often have Xdebug enabled),
you may want to consider automatically restarting PHP with Xdebug unloaded. See the Composer
[XdebugHandler](https://github.com/composer/composer/blob/master/src/Composer/XdebugHandler.php)
for an implementation of such functionality.

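
As a lighter-weight complement to a full restart, a tool can at least detect the extension and tell
the user why it is slow. The following is a minimal sketch; `xdebugWarning()` is a hypothetical
helper name, not part of PHP-Parser or Composer, and the check relies only on PHP's built-in
`extension_loaded()`:

```php
<?php

/**
 * Returns a warning message if the Xdebug extension is loaded, or null otherwise.
 * Hypothetical helper for illustration; not part of PHP-Parser.
 */
function xdebugWarning(): ?string
{
    // extension_loaded() reports whether the extension is loaded at all,
    // independent of which Xdebug features are switched on.
    if (!extension_loaded('xdebug')) {
        return null;
    }

    return 'Xdebug is loaded; expect parsing to be considerably slower.';
}

$warning = xdebugWarning();
if ($warning !== null) {
    // STDERR is available when running on the command line.
    fwrite(STDERR, $warning . PHP_EOL);
}
```
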
If you do run with Xdebug, you may need to increase the `xdebug.max_nesting_level` option to a
higher level, such as 3000. While the parser itself is recursion-free, most other code working on
the AST uses recursion and will generate an error if the value of this option is too low.

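
A minimal sketch of raising the limit from within a script, assuming the installed Xdebug version
allows this option to be changed at runtime. `raiseXdebugNestingLevel()` is a hypothetical helper
name, and 3000 is simply the example value mentioned above:

```php
<?php

/**
 * Raises xdebug.max_nesting_level if Xdebug is loaded.
 * Hypothetical helper; returns true only if the setting was changed.
 */
function raiseXdebugNestingLevel(int $level = 3000): bool
{
    // Without Xdebug the option does not exist, so there is nothing to do.
    if (!extension_loaded('xdebug')) {
        return false;
    }

    // ini_set() returns false on failure and the old value on success.
    return ini_set('xdebug.max_nesting_level', (string) $level) !== false;
}

raiseXdebugNestingLevel();
```
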
Assertions
----------
Assertions should be disabled in a production context by setting `zend.assertions=-1` (or
`zend.assertions=0` if set at runtime). The library currently doesn't make heavy use of assertions,
but they are used in an increasing number of places.

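
The runtime variant can be sketched as follows. `disableAssertionsAtRuntime()` is a hypothetical
helper name. The sketch only switches from `1` to `0`, because PHP does not allow moving to or from
`-1` once the process has started:

```php
<?php

/**
 * Turns assertions off at runtime if they are currently on.
 * Hypothetical helper; returns the resulting zend.assertions value.
 */
function disableAssertionsAtRuntime(): int
{
    // 1 = assertions run, 0 = compiled but skipped, -1 = not compiled at all.
    // -1 can only be configured before startup (for example in php.ini).
    if ((int) ini_get('zend.assertions') === 1) {
        ini_set('zend.assertions', '0');
    }

    return (int) ini_get('zend.assertions');
}
```
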
Object reuse
------------
Many objects in this project are designed for reuse. For example, one `Parser` object can be used to
parse multiple files.

When possible, objects should be reused rather than being newly instantiated for every use. Some
objects have expensive initialization procedures, which will be unnecessarily repeated if the object
is not reused. (Currently, the two objects with particularly expensive setup are lexers and pretty
printers, though the details might change between versions of this library.)

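
A minimal sketch of the reuse pattern: a single parser is created once by the caller and passed in,
rather than being constructed per file. `parseAll()` is a hypothetical helper name; the only library
API it assumes is the `parse()` method of the `PhpParser\Parser` interface:

```php
<?php

use PhpParser\Parser;

/**
 * Parses several source strings with one shared parser instance.
 * Hypothetical helper; $sources maps a name (e.g. a file path) to PHP code.
 *
 * @param array<string, string> $sources
 * @return array<string, mixed> result of Parser::parse() per name
 */
function parseAll(Parser $parser, array $sources): array
{
    $asts = [];
    foreach ($sources as $name => $code) {
        // The same $parser is reused, so its setup cost is paid only once.
        $asts[$name] = $parser->parse($code);
    }

    return $asts;
}
```

How the parser itself is obtained (for example through `ParserFactory`) depends on the library
version in use, so that step is deliberately left to the caller.
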
Garbage collection
------------------
A limitation in PHP's cyclic garbage collector may lead to major performance degradation when the
active working set exceeds 10000 objects (or arrays). Especially when parsing very large files, this
limit is significantly exceeded and PHP will spend the majority of its time performing unnecessary
garbage collection attempts.

Without GC, parsing time is roughly linear in the input size. With GC, this degenerates to quadratic
runtime for large files. While the specifics may differ, as a rough guideline you may expect a 2.5x
GC overhead for 500KB files and a 5x overhead for 1MB files.

Because this is a limitation in PHP's implementation, there is no easy way to work around it. If
possible, you should avoid parsing very large files, as they will impact overall execution time
disproportionately (and are usually generated anyway).

Of course, you can also try to (temporarily) disable GC. By design, the AST generated by PHP-Parser
is cycle-free, so the AST itself will never cause leaks with GC disabled. However, other code
(including, for example, the parser object itself) may hold cycles, so disabling GC should be
approached with care.
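
One way to keep the disabled window small is to wrap only the expensive call and restore the
previous state afterwards. The following is a minimal sketch using PHP's built-in `gc_*` functions;
`withoutGc()` is a hypothetical helper name, not part of PHP-Parser:

```php
<?php

/**
 * Runs $task with the cyclic garbage collector disabled and restores
 * the previous GC state afterwards. Hypothetical helper.
 */
function withoutGc(callable $task)
{
    $wasEnabled = gc_enabled();
    gc_disable();

    try {
        return $task();
    } finally {
        // Only re-enable GC if it was enabled before this call.
        if ($wasEnabled) {
            gc_enable();
            // Collect any cycles that accumulated while GC was off.
            gc_collect_cycles();
        }
    }
}
```

With an existing `$parser` and source string `$code`, a call could look like
`withoutGc(function () use ($parser, $code) { return $parser->parse($code); })`.
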