php-parser/lib/PhpParser/Node/Scalar/String_.php

<?php

namespace PhpParser\Node\Scalar;

use PhpParser\Error;
use PhpParser\Node\Scalar;

class String_ extends Scalar
{
    /* For use in "kind" attribute */
    const KIND_SINGLE_QUOTED = 1;
    const KIND_DOUBLE_QUOTED = 2;
    const KIND_HEREDOC = 3;
    const KIND_NOWDOC = 4;

    /** @var string String value */
    public $value;

    protected static $replacements = array(
        '\\' => '\\',
        '$'  =>  '$',
        'n'  => "\n",
        'r'  => "\r",
        't'  => "\t",
        'f'  => "\f",
        'v'  => "\v",
        'e'  => "\x1B",
    );

    /**
     * Constructs a string scalar node.
     *
     * @param string $value      Value of the string
     * @param array  $attributes Additional attributes
     */
    public function __construct($value, array $attributes = array()) {
        parent::__construct($attributes);
        $this->value = $value;
    }

    /**
     * Gets the names of the sub nodes.
     *
     * @return array Names of sub nodes
     */
    public function getSubNodeNames() {
        return array('value');
    }

    /**
     * @internal
     *
     * Parses a string token.
     *
     * @param string $str String token content
     * @param bool $parseUnicodeEscape Whether to parse PHP 7 \u escapes
     *
     * @return string The parsed string
     */
    public static function parse($str, $parseUnicodeEscape = true) {
        $bLength = 0;
        if ('b' === $str[0] || 'B' === $str[0]) {
            $bLength = 1;
        }

        if ('\'' === $str[$bLength]) {
            return str_replace(
                array('\\\\', '\\\''),
                array(  '\\',   '\''),
                substr($str, $bLength + 1, -1)
            );
        } else {
            return self::parseEscapeSequences(
                substr($str, $bLength + 1, -1), '"', $parseUnicodeEscape
            );
        }
    }

    /**
     * @internal
     *
     * Parses escape sequences in strings (all string types apart from single quoted).
     *
     * @param string      $str                String without quotes
     * @param null|string $quote              Quote character whose escaped form (\") is unescaped, or null to skip this step
     * @param bool        $parseUnicodeEscape Whether to parse PHP 7 \u escapes
     *
     * @return string String with escape sequences parsed
     */
    public static function parseEscapeSequences($str, $quote, $parseUnicodeEscape = true) {
        if (null !== $quote) {
            $str = str_replace('\\' . $quote, $quote, $str);
        }

        $extra = '';
        if ($parseUnicodeEscape) {
            $extra = '|u\{([0-9a-fA-F]+)\}';
        }

        return preg_replace_callback(
            '~\\\\([\\\\$nrtfve]|[xX][0-9a-fA-F]{1,2}|[0-7]{1,3}' . $extra . ')~',
            function($matches) {
                $str = $matches[1];

                if (isset(self::$replacements[$str])) {
                    return self::$replacements[$str];
                } elseif ('x' === $str[0] || 'X' === $str[0]) {
                    // Drop the leading x/X so hexdec() only receives hex digits
                    return chr(hexdec(substr($str, 1)));
                } elseif ('u' === $str[0]) {
                    return self::codePointToUtf8(hexdec($matches[2]));
                } else {
                    return chr(octdec($str));
                }
            },
            $str
        );
    }

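    /**
     * Converts a Unicode code point to its UTF-8 byte sequence.
     *
     * @param int $num Code point
     *
     * @return string UTF-8 representation of the code point
     *
     * @throws Error If the code point does not fit into four UTF-8 bytes
     */
    private static function codePointToUtf8($num) {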
        if ($num <= 0x7F) {
            return chr($num);
        }
        if ($num <= 0x7FF) {
            return chr(($num>>6) + 0xC0) . chr(($num&0x3F) + 0x80);
        }
        if ($num <= 0xFFFF) {
            return chr(($num>>12) + 0xE0) . chr((($num>>6)&0x3F) + 0x80) . chr(($num&0x3F) + 0x80);
        }
        if ($num <= 0x1FFFFF) {
            return chr(($num>>18) + 0xF0) . chr((($num>>12)&0x3F) + 0x80)
                 . chr((($num>>6)&0x3F) + 0x80) . chr(($num&0x3F) + 0x80);
        }
        throw new Error('Invalid UTF-8 codepoint escape sequence: Codepoint too large');
    }

    /**
     * @internal
     *
     * Parses a constant doc string.
     *
     * @param string $startToken Doc string start token content (<<<SMTHG)
     * @param string $str        String token content
     * @param bool $parseUnicodeEscape Whether to parse PHP 7 \u escapes
     *
     * @return string Parsed string
     */
    public static function parseDocString($startToken, $str, $parseUnicodeEscape = true) {
        // strip last newline (thanks tokenizer for sticking it into the string!)
        $str = preg_replace('~(\r\n|\n|\r)\z~', '', $str);

        // nowdoc string
        if (false !== strpos($startToken, '\'')) {
            return $str;
        }

        return self::parseEscapeSequences($str, null, $parseUnicodeEscape);
    }
}
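
The escape handling above can be tried outside the library with a small standalone sketch. The function names `codePointToUtf8Sketch` and `unescapeSketch` are hypothetical and not part of PhpParser; they only mirror the logic of `codePointToUtf8()` and `parseEscapeSequences()` under the assumption that unicode escapes are enabled and no quote character needs unescaping.

```php
<?php
// Illustrative sketch only: these helpers are hypothetical stand-ins,
// not the library's API.

// Encodes a code point as UTF-8 using the same bit layout as the class above.
function codePointToUtf8Sketch(int $num): string {
    if ($num <= 0x7F) {
        return chr($num);
    }
    if ($num <= 0x7FF) {
        return chr(($num >> 6) + 0xC0) . chr(($num & 0x3F) + 0x80);
    }
    if ($num <= 0xFFFF) {
        return chr(($num >> 12) + 0xE0) . chr((($num >> 6) & 0x3F) + 0x80)
             . chr(($num & 0x3F) + 0x80);
    }
    if ($num <= 0x1FFFFF) {
        return chr(($num >> 18) + 0xF0) . chr((($num >> 12) & 0x3F) + 0x80)
             . chr((($num >> 6) & 0x3F) + 0x80) . chr(($num & 0x3F) + 0x80);
    }
    throw new InvalidArgumentException('Codepoint too large');
}

// Replaces simple, hex, octal and \u{...} escapes in one regex pass.
function unescapeSketch(string $str): string {
    $simple = [
        '\\' => '\\', '$' => '$', 'n' => "\n", 'r' => "\r",
        't' => "\t", 'f' => "\f", 'v' => "\v", 'e' => "\x1B",
    ];

    return preg_replace_callback(
        '~\\\\([\\\\$nrtfve]|[xX][0-9a-fA-F]{1,2}|[0-7]{1,3}|u\{([0-9a-fA-F]+)\})~',
        function (array $m) use ($simple) {
            $seq = $m[1];
            if (isset($simple[$seq])) {
                return $simple[$seq];
            }
            if ($seq[0] === 'x' || $seq[0] === 'X') {
                return chr(hexdec(substr($seq, 1)));
            }
            if ($seq[0] === 'u') {
                return codePointToUtf8Sketch(hexdec($m[2]));
            }
            return chr(octdec($seq)); // octal escape
        },
        $str
    );
}

// Single-quoted literals keep the backslashes, so the sketch sees raw escapes.
echo bin2hex(unescapeSketch('\x41\101\u{e9}')), "\n"; // 4141c3a9
echo bin2hex(unescapeSketch('\u{20AC}')), "\n";       // e282ac
```

The sample inputs are written as single-quoted PHP literals so that PHP itself does not interpret the escapes before the sketch does.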