This changes HTML renderer not to always add a newline character after
<img> tags. This is desirable because <img> tags can be inlined, and
sometimes you want to avoid whitespace on left and right sides. Previous
behavior of always adding a newline would unavoidably create whitespace
after <img> tag.
Update all tests to match new behavior. There are few changes, and
they're completely isolated to inline image tests.
Fixes#169.
The second footnote was treated as if the pair of them were a reference
style link, without checking if the second bit is another footnote.
Fixes issue 158.
If a user provides a ReferenceOverride function, then reference ids
will be passed to the given ReferenceOverride function first, before
consulting the generated reference table.
The goal here is to enable programmable support for
"WikiWords"-style identifiers or other application-specific
user-generated keywords.
Example, writing documentation:
The [Frobnosticator][] is a very important class in our codebase.
While it is used to frobnosticate widgets in general, it can also
be passed to the [WeeDoodler][] to interesting effect.
This might be solveable with the HTML Renderer relative prefix, but
I didn't see a good way of making a short link to 'Frobnosticator'
relatively without having to write it twice. Maybe
'<Frobnosticator>' should work? Should Autolinks work for relative
links?
In addition, I wanted a little more richness. I plan to support
Godoc links by prefixing references with a '!', like so:
Check out the [Frobnosticator][] helper function
[!util.Frobnosticate()][]
The first link links to the Frobnosticator architectural overview
documentation, whereas the second links to Godoc.
Better advice on how to implement this sort of think with
Blackfriday is highly desired.
The flag `HTML_SMARTYPANTS_ANGLED_QUOTES` combined with `HTML_USE_SMARTYPANTS` configures rendering of double quotes as angled left and right quotes (« »).
The SmartyPants documentation mentions a special syntax for these, `<<>>`, a syntax neither pretty nor user friendly.
Typical use cases would be either or, or combined, but never in the same document. As an example would be a person from Norway; he has a blog in both English and Norwegian (his native tounge); he would then configure Blackfriday to use angled quotes for the Norwegian section, but keep them as reqular double quotes for the English.
If the flag `HTML_SMARTYPANTS_ANGLED_QUOTES` is not provided, everything works as before this commit.
Add tests to make sure we don't break relative URLs again.
Extracted common html flags and common extensions for easy access from
tests.
Closes issue #104, which was fixed as a side effect of cf6bfc9.
Certain tags like <script> but also <title> and others switch an HTML5 parser
into raw mode, which causes the rest of the HTML string to be always parsed as
text, including any elements or entities that we do want to support (e.g. <p>).
As we're going to escape any of the raw text elements anyway (it's e.g. script,
style, title, xmp, noframes, and a couple of others) we can just switch of raw
text parsing by disabling it after each starting tag.
The sanitization code does not retain any particular escaped entities - it
parses the HTML and thus loses the information on what entities were in the
original. The result is correct UTF-8 HTML though.
Use an HTML5 compliant parser that interprets HTML as a browser would to parse
the Markdown result and then sanitize based on the result.
Escape unrecognized and disallowed HTML in the result.
Currently works with a hard coded whitelist of safe HTML tags and attributes.
If autolink encounters a link which already has an escaped html entity,
it would escape the ampersand again, producing things like these:
& --> &amp;
" --> &quot;
This commit solves that by first looking for all entity-looking things
in the link and copying those ranges verbatim, only considering the rest
of the string for escaping.
Doesn't seem to have considerable performance impact.
The mailto: links are processed the old way.
When the source Markdown contains an anchor tag with URL as link text
(i.e. <a href=...>http://foo.bar</a>), autolink converts that link text
into another anchor tag, which is nonsense. Detect this situation with
regexp and early exit autolink processing.
This drops the naive approach at <script> tag stripping and resorts to
full sanitization of html. The general idea (and the regexps) is grabbed
from Stack Exchange's PageDown JavaScript Markdown processor[1]. Like in
PageDown, it's implemented as a separate pass over resulting html.
Includes a metric ton (but not all) of test cases from here[2]. Several
are commented out since they don't pass yet.
Stronger (but still incomplete) fix for #11.
[1] http://code.google.com/p/pagedown/wiki/PageDown
[2] https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet