The old regex missed a lot of HTML entities, like long references
(from 6-character entities like ≈ to the somewhat rarer
∳) as well as numeric references
(decimal e.g. Ӓ or hex e.g. 𓫶). This fixes that.
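
As a rough illustration of the kind of pattern this implies (a sketch only,
not the regex from the actual change; the names entityRe and entityLen are
made up for this example), a matcher that accepts named references of any
length as well as decimal and hexadecimal numeric references could look like
this:

```go
package main

import "regexp"

// entityRe is an illustrative pattern, NOT the one used in the real change.
// It accepts, anchored at the start of the input:
//   - named references of any length (a letter followed by letters/digits),
//   - decimal numeric references such as "&#1234;",
//   - hexadecimal numeric references such as "&#x1F600;".
var entityRe = regexp.MustCompile(`^&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);`)

// entityLen (hypothetical helper) returns the length in bytes of the
// entity at the start of data, or 0 if data does not start with one.
func entityLen(data []byte) int {
	// Find returns nil when there is no match, and len(nil) is 0.
	return len(entityRe.Find(data))
}
```

The sketch only checks the shape of a reference; it does not verify that a
named reference actually exists in the HTML entity table.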
* Move handler call inside the inner loop's 'if handler != nil' clause
* Move appender of possible tail bytes outside of loop
* Get rid of outer loop
* Rename i -> beg
Again, this does not seem to gain much performance, but makes the code
significantly more readable.
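
The message does not name the function being restructured, so the following
is only a generic sketch of the loop shape the bullets describe: a single
loop, a 'beg' index marking the start of the unprocessed run, the handler
call nested inside the 'if handler != nil' clause, and the tail appended once
after the loop. The handler type and the split function are hypothetical.

```go
package main

// handler is a hypothetical callback type. It returns the number of bytes
// it consumed starting at offset, or 0 if it declines to handle them.
type handler func(data []byte, offset int) int

// split walks data in a single loop. Plain runs and handled spans are
// appended to the result in order of appearance.
func split(data []byte, handlers *[256]handler) [][]byte {
	var out [][]byte
	beg, end := 0, 0
	for end < len(data) {
		if h := handlers[data[end]]; h != nil {
			// The handler is only called when one is registered.
			if consumed := h(data, end); consumed > 0 {
				if end > beg {
					out = append(out, data[beg:end]) // plain run before the span
				}
				out = append(out, data[end:end+consumed])
				beg = end + consumed
				end = beg
				continue
			}
		}
		end++
	}
	// Possible tail bytes are appended once, outside of the loop.
	if beg < len(data) {
		out = append(out, data[beg:])
	}
	return out
}
```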
Rearrange inline parser a little bit to check fewer conditionals for
every byte.
* Add early check for len(data) == 0
* Move 'for i < len(data)' check inside the (rarer) positive clause of
trigger result handling
* A check for newline turned out to be redundant
* Look up p.inlineCallback only once
All that does not gain much performance in itself, but doesn't hurt and
makes the code structure simpler, which will hopefully allow further
streamlining.
Some renderers might not care to have an explicit list of footnotes at
the end of the document; instead, they're interested in the content of
the footnote at the location of a referrer. Make their lives easier by
providing such a link.
* Only split when inline callbacks consume some bytes
The former hacks around maybeLineBreak and Smartypants are no longer
needed.
The algorithm has been streamlined: shorter, simpler, faster.
The 'currBlock' field of the parser is gone.
* Remove spurious logs
* Unpublish and rename LinkType constants
The constants are only used in the parsing phase, they are not recorded
in the AST directly, so make them private. Improve their names along the
way. Fix tagLength to return two values instead of taking an output
parameter.
* autoLinkType -> autolinkType
And remove unnecessary comment.
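
To illustrate the tagLength change described above, here is a simplified
sketch of a function that returns both the link kind and the length instead
of writing the kind through an output parameter. The constant names and the
classification rules are assumptions made for this example; only the names
tagLength and autolinkType come from the message.

```go
package main

import "bytes"

// autolinkType is private, as it is only needed while parsing.
type autolinkType int

// The constant names below are hypothetical.
const (
	notAutolink autolinkType = iota
	normalAutolink
	emailAutolink
)

// tagLength reports what kind of autolink, if any, the "<...>" span at the
// start of data is, together with the span's length in bytes. The length is
// 0 when data does not start with a closed "<...>" span.
// The classification here is deliberately naive: "://" means a normal
// autolink, an '@' after at least one byte means an email autolink.
func tagLength(data []byte) (autolinkType, int) {
	if len(data) < 3 || data[0] != '<' {
		return notAutolink, 0
	}
	end := bytes.IndexByte(data, '>')
	if end < 0 {
		return notAutolink, 0
	}
	inner := data[1:end]
	switch {
	case bytes.Contains(inner, []byte("://")):
		return normalAutolink, end + 1
	case bytes.IndexByte(inner, '@') > 0:
		return emailAutolink, end + 1
	default:
		return notAutolink, end + 1
	}
}
```

A caller can then write 'kind, n := tagLength(data)' rather than declaring a
variable first and passing its address.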
Change the way maybeLineBreak gets called to avoid breaking up stretches
of unprocessed characters that smartypants expects.
This inline processing is getting a bit out of hand, something needs to
be done about it.
Autolink detection used to be triggered by a colon, and the preceding
protocol name used to be rewound. Now instead of doing that, trigger
autolink processing on [hmfHMF] and see if it looks like a link.
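
A minimal sketch of that idea follows. The prefix list and both function
names are assumptions for illustration; the message only says that the
trigger bytes are [hmfHMF] and that the parser then checks whether the text
looks like a link.

```go
package main

import "bytes"

// autolinkPrefixes is an assumed list of protocols; the real list may differ.
var autolinkPrefixes = [][]byte{
	[]byte("http://"),
	[]byte("https://"),
	[]byte("ftp://"),
	[]byte("mailto:"),
}

// isAutolinkTrigger reports whether c is one of the bytes [hmfHMF].
func isAutolinkTrigger(c byte) bool {
	switch c {
	case 'h', 'm', 'f', 'H', 'M', 'F':
		return true
	}
	return false
}

// looksLikeLink reports whether data[offset:] starts with one of the known
// protocol prefixes (compared case-insensitively) followed by at least one
// more byte. Nothing before offset needs to be rewound.
func looksLikeLink(data []byte, offset int) bool {
	if offset < 0 || offset >= len(data) || !isAutolinkTrigger(data[offset]) {
		return false
	}
	rest := data[offset:]
	for _, p := range autolinkPrefixes {
		if len(rest) > len(p) && bytes.EqualFold(rest[:len(p)], p) {
			return true
		}
	}
	return false
}
```

Because the check starts at the first byte of the protocol name, text that
has already been scanned stays untouched when the check fails.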
Replace output truncation with appropriate inline callbacks. lineBreak()
is now only responsible for handling HardLineBreak. BackslashLineBreak
is handled in escape() and trailing whitespace is considered in
maybeLineBreak().
Link parser used to truncate in two cases: when parsing image links and
inline footnotes. In order to avoid this truncation, introduce a
separate callback for each of these cases and avoid writing extra
characters instead of truncating them after the fact.
Link parser interpreted the sequence "![^foo]" as an image, but if
footnote extension is enabled, it's quite clear that it should be
interpreted as a footnote following something with an exclamation point
at the end.
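
The decision can be sketched as follows. The function name and the boolean
flag are hypothetical and only stand in for however the parser checks the
footnote extension.

```go
package main

// isImageStart (hypothetical helper) reports whether the bytes at offset
// should be parsed as the start of an image. With footnotes enabled, "![^"
// is not an image: the '!' is ordinary text and "[^" starts a footnote.
func isImageStart(data []byte, offset int, footnotes bool) bool {
	if offset < 0 || offset+1 >= len(data) {
		return false
	}
	if data[offset] != '!' || data[offset+1] != '[' {
		return false
	}
	if footnotes && offset+2 < len(data) && data[offset+2] == '^' {
		return false
	}
	return true
}
```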
Closes #194.
When parsing a deferred footnote, we already know it's a footnote from
the '[^' part, so we can use that to hit a proper switch branch
(default) a bit later on.
Closes #164.
Start searching for emphasis character at 0th index instead of 1st.
Fixes a corner case with doubly emphasised code span followed by
another code span on the same line.
Changes interpretation of improperly nested emphasis, hence the change
in TestEmphasisMix().
Closes #156.
The second footnote was treated as if the pair of them were a reference
style link, without checking if the second bit is another footnote.
Fixes issue 158.