Move these two flags from HTML renderer's flags to extensions. Implement
both since they were not yet implemented in the AST rewrite. Add tests.
Note: the expected test strings differ very slightly from v1. The HTML
produced by v2 has a few extra newlines compared to the old one, but
it's now uniform with other sections of the generated document. If the
newline placement gets cleaned up in the future, this will get fixed
automatically, since the renderer is agnostic about the TOC list.
Separate Smartypants somewhat from the HTML renderer. Move its flags
from HtmlFlags to Extensions (probably should be moved to its own set of
flags, but not now). With that done, do a separate walk of the tree and
either run Smartypants processor if it's enabled, or simply escape text
nodes.
A default HTML renderer for a single node is now easily accessible.
Makes it easy to fall back to the default behavior when writing custom
HTML renderers.
This is the new renderer that walks AST and renders everything to a
buffer. Completely covers all the functionality of the previous renderer
and will likely replace it.
Remove the 'out' parameter. Also, instead of returning and passing the
position of TOC, use CopyWrites to capture contents of the header and
pass that captured buffer instead.
Add a structure to collect output in a buffer (replaces what used to be
the 'out' parameter all over the place).
Notable things about this struct are the captureBuff and copyBuff
buffers. They're intended to redirect all the output (captureBuff) or
make a copy of all the output (copyBuff) while they're set to non-nil.
Here's an example of their intended use:
    // what used to be a temp buffer as an 'out' parameter
    //     var cellWork bytes.Buffer
    //     p.inline(&cellWork, data[cellStart:cellEnd])
    // can now be captured like this:
    cellWork := p.r.CaptureWrites(func() {
        p.inline(data[cellStart:cellEnd])
    })
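A minimal sketch of how such a capturing writer could behave, for illustration only. The type name `writer` and its exact semantics are assumptions (in particular, that capturing suppresses normal output while copying does not); only `captureBuff`, `copyBuff`, `CaptureWrites`, and `CopyWrites` are names taken from the description above.

```go
package main

import (
	"bytes"
	"fmt"
)

// writer is a hypothetical stand-in for the output-collecting struct.
type writer struct {
	out         bytes.Buffer  // regular destination for rendered output
	captureBuff *bytes.Buffer // while non-nil, output is redirected here
	copyBuff    *bytes.Buffer // while non-nil, output is also copied here
}

// WriteString sends s to the capture buffer if one is set; otherwise it
// writes to the regular output and mirrors it to the copy buffer.
// (Assumed behavior, not confirmed by the commit message.)
func (w *writer) WriteString(s string) {
	if w.captureBuff != nil {
		w.captureBuff.WriteString(s)
		return
	}
	w.out.WriteString(s)
	if w.copyBuff != nil {
		w.copyBuff.WriteString(s)
	}
}

// CaptureWrites runs f and returns everything it wrote, without letting
// any of it reach the regular output.
func (w *writer) CaptureWrites(f func()) []byte {
	saved := w.captureBuff
	var tmp bytes.Buffer
	w.captureBuff = &tmp
	f()
	w.captureBuff = saved
	return tmp.Bytes()
}

// CopyWrites runs f and returns a copy of everything it wrote; the
// output still goes to its usual destination.
func (w *writer) CopyWrites(f func()) []byte {
	saved := w.copyBuff
	var tmp bytes.Buffer
	w.copyBuff = &tmp
	f()
	w.copyBuff = saved
	return tmp.Bytes()
}

func main() {
	w := &writer{}
	cell := w.CaptureWrites(func() { w.WriteString("cell text") })
	head := w.CopyWrites(func() { w.WriteString("<h1>Title</h1>") })
	fmt.Printf("captured=%q copied=%q out=%q\n", cell, head, w.out.String())
}
```

Saving and restoring the previous buffer pointer lets calls nest in this sketch; whether the real struct supports nesting is not stated above.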
The callbacks used to return bools, but none of the actual
implementations return false, always true. So in order to make further
refactorings simpler, make the interface reflect the inner workings: no
more return values, no more conditionals.
This changes HTML renderer not to always add a newline character after
<img> tags. This is desirable because <img> tags can be inlined, and
sometimes you want to avoid whitespace on left and right sides. Previous
behavior of always adding a newline would unavoidably create whitespace
after <img> tag.
Update all tests to match new behavior. There are few changes, and
they're completely isolated to inline image tests.
Fixes #169.
Apply gofmt on html.go.
Apply goimports-compatible formatting on block.go (space between standard and third party imports).
Move Travis build status image to a more pleasing, common location.
Remove "Markdown pretty-printer output engine" from TODO steps; this is already done in markdownfmt.
Remove unneeded trailing whitespace in README.
This is specifically driven by the Hugo use case, where multiple documents
are often rendered into the same final HTML page.
When a header ID is written to the output HTML format (either through
`HTML_TOC`, `EXTENSION_HEADER_IDS`, or `EXTENSION_AUTO_HEADER_IDS`), it
is possible that multiple documents will have identical header IDs. To
permit validation to pass, it is useful to have a per-document prefix or
suffix (in our case, an MD5 of the content filename, and we will be
using it as a suffix).
That is, two documents (`A` and `B`) that have the same header ID (`#
Reason {#reason}`), will end up having an actual header ID of the form
`#reason-DOCID` (e.g., `#reason-A`, `#reason-B`) with these HTML
parameters.
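The effect can be sketched as follows. This is an illustration only: `idParams`, `apply`, and `docSuffix` are hypothetical names, not the renderer's actual parameters, and the MD5-of-filename suffix is just the use case described above.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// idParams is a hypothetical holder for a per-document header ID
// prefix and suffix.
type idParams struct {
	Prefix string
	Suffix string
}

// apply wraps a header ID with the configured prefix and suffix.
func (p idParams) apply(id string) string {
	return p.Prefix + id + p.Suffix
}

// docSuffix derives a per-document suffix from the hex MD5 of the
// content filename (hypothetical helper).
func docSuffix(filename string) string {
	return fmt.Sprintf("-%x", md5.Sum([]byte(filename)))
}

func main() {
	a := idParams{Suffix: "-A"}
	b := idParams{Suffix: "-B"}
	// The same source ID no longer collides across documents.
	fmt.Println(a.apply("reason"), b.apply("reason"))

	hashed := idParams{Suffix: docSuffix("content/post/a.md")}
	fmt.Println(hashed.apply("reason"))
}
```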
This is built on top of #126 (more intelligent collision detection for
`EXTENSION_AUTO_HEADER_IDS`).
> This is a rework of an earlier version of this code.
The automatic header ID generation code submitted in #125 has a subtle
bug where it will use the same ID for multiple headers with identical
text. In the case below, all the headers are rendered as `<h1
id="header">Header</h1>`.
```markdown
# Header
# Header
# Header
# Header
```
This change is a simple but robust approach that uses an incrementing
counter and pre-checking to prevent header collision. (The above would
be rendered as `header`, `header-1`, `header-2`, and `header-3`.) In
more complex cases, it will append a new counter suffix (`-1`), like so:
```markdown
# Header
# Header 1
# Header
# Header
```
This will generate `header`, `header-1`, `header-1-1`, and `header-1-2`.
This code has two additional changes over the prior version:
1. Rather than reimplementing @shurcooL’s anchor sanitization code, I
have imported it from
`github.com/shurcooL/go/github_flavored_markdown/sanitized_anchor_name`.
2. The markdown block parser is now only interested in *generating* a
sanitized anchor name, not with ensuring its uniqueness. That code
has been moved to the HTML renderer. This means that if the HTML
renderer is modified to identify all unique headers prior to
rendering, the hackish nature of the collision detection can be
eliminated.
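The counter-and-pre-check scheme described above can be sketched as follows. This is one way to get the IDs listed in the two examples, not necessarily the code in this change; `idTracker` and `unique` are hypothetical names, and the input is assumed to be an already sanitized anchor name.

```go
package main

import "fmt"

// idTracker remembers every ID handed out so far, together with the
// last counter value used to disambiguate it.
type idTracker struct {
	seen map[string]int
}

func newIDTracker() *idTracker {
	return &idTracker{seen: make(map[string]int)}
}

// unique returns id unchanged if it is unused. Otherwise it tries
// id-<counter+1>; if that candidate is itself taken, it appends "-1"
// and repeats the check against the longer ID.
func (t *idTracker) unique(id string) string {
	for {
		count, found := t.seen[id]
		if !found {
			break
		}
		candidate := fmt.Sprintf("%s-%d", id, count+1)
		if _, taken := t.seen[candidate]; !taken {
			t.seen[id] = count + 1
			id = candidate
		} else {
			id += "-1"
		}
	}
	t.seen[id] = 0
	return id
}

func main() {
	simple := newIDTracker()
	for i := 0; i < 4; i++ {
		fmt.Println(simple.unique("header"))
	}

	mixed := newIDTracker()
	for _, id := range []string{"header", "header-1", "header", "header"} {
		fmt.Println(mixed.unique(id))
	}
}
```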
The flag `HTML_SMARTYPANTS_ANGLED_QUOTES` combined with `HTML_USE_SMARTYPANTS` configures rendering of double quotes as angled left and right quotes (« »).
The SmartyPants documentation mentions a special syntax for these, `<<>>`, a syntax neither pretty nor user friendly.
Typically a site would use one style or the other, or both, but never both in the same document. For example, a person from Norway keeps a blog in both English and Norwegian (his native tongue); he would configure Blackfriday to use angled quotes for the Norwegian section but keep regular double quotes for the English one.
If the flag `HTML_SMARTYPANTS_ANGLED_QUOTES` is not provided, everything works as before this commit.
For code blocks that contain a certain language of code, the recommended
attribute structure is <pre><code class="language-foo">. This also
corresponds to the behavior expected by various JS syntax highlighters.
The GitHub code block implementation was obsolete, and identical to the
normal implementation except for its attribute structure, so it was
removed.
Closes #108.
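For illustration, a renderer producing the `language-` attribute structure might look like the sketch below. `codeBlockHTML` is a hypothetical helper, not the renderer's actual method, and the trailing newline is an assumption.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// codeBlockHTML wraps code in <pre><code>, adding class="language-<lang>"
// when a language is given. Both the language and the code are
// HTML-escaped.
func codeBlockHTML(lang, code string) string {
	var sb strings.Builder
	sb.WriteString("<pre><code")
	if lang != "" {
		sb.WriteString(` class="language-` + html.EscapeString(lang) + `"`)
	}
	sb.WriteString(">")
	sb.WriteString(html.EscapeString(code))
	sb.WriteString("</code></pre>\n")
	return sb.String()
}

func main() {
	fmt.Print(codeBlockHTML("go", "if a < b {}"))
	fmt.Print(codeBlockHTML("", "plain text"))
}
```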
Use an HTML5 compliant parser that interprets HTML as a browser would to parse
the Markdown result and then sanitize based on the result.
Escape unrecognized and disallowed HTML in the result.
Currently works with a hard coded whitelist of safe HTML tags and attributes.
Fixing:
55cd82008e
That commit introduced an HTML tag whitelist which does not include any table tags (<td>, <tr>, <thead>, ...). As a result, even tables that the Markdown parser itself generated are removed.
If autolink encounters a link which already has an escaped html entity,
it would escape the ampersand again, producing things like these:
&amp; --> &amp;amp;
&quot; --> &amp;quot;
This commit solves that by first looking for all entity-looking things
in the link and copying those ranges verbatim, only considering the rest
of the string for escaping.
Doesn't seem to have considerable performance impact.
The mailto: links are processed the old way.
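A minimal sketch of the "copy entities verbatim, escape the rest" idea. The regular expression and the `escapeLink` helper are assumptions made for illustration; the pattern actually used to recognize entity-looking ranges may differ.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// entityRe matches things that look like HTML entities: named
// (&amp;), decimal (&#38;), or hexadecimal (&#x26;). Assumed pattern.
var entityRe = regexp.MustCompile(`&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);`)

// escapeLink HTML-escapes link but copies entity-looking ranges
// verbatim, so an already escaped "&amp;" is not escaped twice.
func escapeLink(link string) string {
	var sb strings.Builder
	last := 0
	for _, loc := range entityRe.FindAllStringIndex(link, -1) {
		sb.WriteString(html.EscapeString(link[last:loc[0]])) // text before the entity
		sb.WriteString(link[loc[0]:loc[1]])                  // the entity itself, untouched
		last = loc[1]
	}
	sb.WriteString(html.EscapeString(link[last:])) // remaining tail
	return sb.String()
}

func main() {
	fmt.Println(escapeLink("http://example.com/?a=1&amp;b=2&c=3"))
}
```

In this sketch a bare `&` is still escaped, while an existing `&amp;` passes through unchanged.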
This gives a ~10% slowdown of a full test run, which is tolerable.
Switch statement is still slightly slower (~5%). Using map turned out to
be unacceptably slow (~3x slowdown).
This drops the naive approach at <script> tag stripping and resorts to
full sanitization of html. The general idea (and the regexps) is grabbed
from Stack Exchange's PageDown JavaScript Markdown processor[1]. Like in
PageDown, it's implemented as a separate pass over resulting html.
Includes a metric ton (but not all) of test cases from here[2]. Several
are commented out since they don't pass yet.
Stronger (but still incomplete) fix for #11.
[1] http://code.google.com/p/pagedown/wiki/PageDown
[2] https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet