Lexer component documentation ============================= The lexer is responsible for providing tokens to the parser. The project comes with two lexers: `PhpParser\Lexer` and `PhpParser\Lexer\Emulative`. The latter is an extension of the former, which adds the ability to emulate tokens of newer PHP versions and thus allows parsing of new code on older versions. A lexer has to define the following public interface: void startLexing(string $code); string handleHaltCompiler(); int getNextToken(string &$value = null, array &$startAttributes = null, array &$endAttributes = null); The `startLexing()` method is invoked with the source code that is to be lexed (including the opening tag) whenever the `parse()` method of the parser is called. It can be used to reset state or preprocess the source code or tokens. The `handleHaltCompiler()` method is called whenever a `T_HALT_COMPILER` token is encountered. It has to return the remaining string after the construct (not including `();`). The `getNextToken()` method returns the ID of the next token (as defined by the `Parser::T_*` constants). If no more tokens are available it must return `0`, which is the ID of the `EOF` token. Furthermore the string content of the token should be written into the by-reference `$value` parameter (which will then be available as `$n` in the parser). Attribute handling ------------------ The other two by-ref variables `$startAttributes` and `$endAttributes` define which attributes will eventually be assigned to the generated nodes: The parser will take the `$startAttributes` from the first token which is part of the node and the `$endAttributes` from the last token that is part of the node. E.g. if the tokens `T_FUNCTION T_STRING ... '{' ... '}'` constitute a node, then the `$startAttributes` from the `T_FUNCTION` token will be taken and the `$endAttributes` from the `'}'` token. By default the lexer creates the attributes `startLine`, `comments` (both part of `$startAttributes`) and `endLine` (part of `$endAttributes`). If you don't want all these attributes to be added (to reduce memory usage of the AST) you can simply remove them by overriding the method: ```php pos; return $tokenId; } public function getTokens() { return $this->tokens; } } ``` This information can now be used to examine the exact formatting used for a node. For example the AST does not distinguish whether a property was declared using `public` or using `var`, but you can retrieve this information based on the token offset: ```php function isDeclaredUsingVar(array $tokens, PhpParser\Node\Stmt\Property $prop) { $i = $prop->getAttribute('startOffset'); return $tokens[$i][0] === T_VAR; } ``` In order to make use of this function, you will have to provide the tokens from the lexer to your node visitor using code similar to the following: ```php class MyNodeVisitor extends PhpParser\NodeVisitorAbstract { private $tokens; public function setTokens(array $tokens) { $this->tokens = $tokens; } public function leaveNode(PhpParser\Node $node) { if ($node instanceof PhpParser\Node\Stmt\Property) { var_dump(isDeclaredUsingVar($this->tokens, $node)); } } } $lexer = new TokenOffsetLexer(); $parser = new PhpParser\Parser($lexer); $visitor = new MyNodeVisitor(); $traverser = new PhpParser\NodeTraverser(); $traverser->addVisitor($visitor); try { $stmts = $parser->parse($code); $visitor->setTokens($lexer->getTokens()); $stmts = $traverser->traverse($stmts); } catch (PhpParser\Error $e) { echo 'Parse Error: ', $e->getMessage(); } ``` The same approach can also be used to perform specific modifications in the code, without changing the formatting in other places (which is the case when using the pretty printer).