Move these two flags from the HTML renderer's flags to extensions.
Implement both, since they were not yet implemented in the AST rewrite.
Add tests.
Note: the expected test strings differ very slightly from v1. The HTML
produced by v2 has a few extra newlines compared to the old one, but
it's now uniform with other sections of the generated document. If the
newline placement gets cleaned up in the future, this will get fixed
automatically, since the renderer is agnostic about the TOC list.
Separate Smartypants somewhat from the HTML renderer. Move its flags
from HtmlFlags to Extensions (probably should be moved to its own set of
flags, but not now). With that done, do a separate walk of the tree and
either run the Smartypants processor if it's enabled, or simply escape
text nodes.
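A minimal sketch of that post-processing walk, assuming "bytes" is
imported; Node, TextNode, and the smartypants stand-in are hypothetical
names, not the actual ones:

    // Node is a stand-in for the real AST node type.
    type Node struct {
        Type     NodeType
        Literal  []byte // raw text, for text nodes
        Children []*Node
    }

    type NodeType int

    const (
        DocumentNode NodeType = iota
        TextNode
    )

    func walk(n *Node, fn func(*Node)) {
        fn(n)
        for _, c := range n.Children {
            walk(c, fn)
        }
    }

    // postProcess visits every text node exactly once: either the
    // Smartypants processor handles it (doing its own escaping), or the
    // text is simply escaped.
    func postProcess(root *Node, smartypants func([]byte) []byte) {
        walk(root, func(n *Node) {
            if n.Type != TextNode {
                return
            }
            if smartypants != nil {
                n.Literal = smartypants(n.Literal)
            } else {
                n.Literal = escapeText(n.Literal)
            }
        })
    }

    // escapeText escapes '&' first so it doesn't re-escape the entities
    // produced for '<' and '>'.
    func escapeText(text []byte) []byte {
        text = bytes.Replace(text, []byte("&"), []byte("&amp;"), -1)
        text = bytes.Replace(text, []byte("<"), []byte("&lt;"), -1)
        return bytes.Replace(text, []byte(">"), []byte("&gt;"), -1)
    }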
Build a partial tree by adding block nodes. The block nodes will then be
traversed and inline markdown parsed inside each of them. Tests are
broken at this point until the full tree is constructed.
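A rough sketch of the two-phase flow, reusing the hypothetical Node and
walk helpers from the sketch above; parseBlocks and parseInline stand in
for the real block and inline parsers:

    // parse builds the block-level tree first, then fills in inline
    // children for every leaf block on a second traversal.
    func parse(input []byte) *Node {
        root := parseBlocks(input) // paragraphs, headings, lists, ...
        walk(root, func(n *Node) {
            // Leaf blocks keep their raw text until this second phase.
            if len(n.Children) == 0 && len(n.Literal) > 0 {
                n.Children = parseInline(n.Literal)
                n.Literal = nil
            }
        })
        return root
    }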
Remove the 'out' parameter. Also, instead of returning and passing
around the position of the TOC, use CopyWrites to capture the contents
of the header and pass that captured buffer instead.
Add a structure to collect output in a buffer (replaces what used to be
the 'out' parameter all over the place).
Notable in this struct are the captureBuff and copyBuff buffers: while
set to non-nil, they redirect all of the output (captureBuff) or keep a
copy of all of the output (copyBuff).
Here's an example of their intended use:

    // What used to be a temp buffer passed as an 'out' parameter:
    //
    //     var cellWork bytes.Buffer
    //     p.inline(&cellWork, data[cellStart:cellEnd])
    //
    // can now be captured like this:
    cellWork := p.r.CaptureWrites(func() {
        p.inline(data[cellStart:cellEnd])
    })
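A condensed sketch of the struct described above, assuming "bytes" is
imported; beyond captureBuff and copyBuff, the field names and method
bodies are assumptions about the shape of the real code:

    type outputBuffer struct {
        main        bytes.Buffer  // the regular document output
        captureBuff *bytes.Buffer // non-nil: receives ALL writes instead
        copyBuff    *bytes.Buffer // non-nil: receives a copy of all writes
    }

    func (o *outputBuffer) Write(p []byte) (int, error) {
        if o.copyBuff != nil {
            o.copyBuff.Write(p)
        }
        if o.captureBuff != nil {
            return o.captureBuff.Write(p) // redirected, skips main output
        }
        return o.main.Write(p)
    }

    // CaptureWrites runs f and returns everything it wrote, letting none
    // of it reach the main output.
    func (o *outputBuffer) CaptureWrites(f func()) []byte {
        var buff bytes.Buffer
        prev := o.captureBuff
        o.captureBuff = &buff
        f()
        o.captureBuff = prev
        return buff.Bytes()
    }

    // CopyWrites runs f, letting its output through while keeping a copy.
    func (o *outputBuffer) CopyWrites(f func()) []byte {
        var buff bytes.Buffer
        prev := o.copyBuff
        o.copyBuff = &buff
        f()
        o.copyBuff = prev
        return buff.Bytes()
    }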
Change the way maybeLineBreak gets called to avoid breaking up stretches
of unprocessed characters that Smartypants expects.
This inline processing is getting a bit out of hand, something needs to
be done about it.
Autolink detection used to be triggered by a colon, with the preceding
protocol name rewound after the fact. Instead of doing that, trigger
autolink processing on [hmfHMF] and check whether what follows looks
like a link.
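The gist of the new trigger, assuming "bytes" is imported; autoLink
stands in for the real autolink parser and the scheme list is an
assumption:

    // The trigger set is [hmfHMF]: the first letters of the recognized
    // schemes. Only commit to autolink parsing if a whole scheme follows.
    var schemes = [][]byte{
        []byte("http://"),
        []byte("https://"),
        []byte("ftp://"),
        []byte("mailto:"),
    }

    func maybeAutoLink(data []byte, offset int) int {
        for _, s := range schemes {
            if len(data)-offset >= len(s) &&
                bytes.EqualFold(data[offset:offset+len(s)], s) {
                return autoLink(data, offset) // looks like a link: parse it
            }
        }
        return 0 // not a link; leave it as ordinary text
    }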
Replace output truncation with appropriate inline callbacks. lineBreak()
is now only responsible for handling HardLineBreak. BackslashLineBreak
is handled in escape() and trailing whitespace is considered in
maybeLineBreak().
The link parser used to truncate in two cases: when parsing image links
and inline footnotes. In order to avoid this truncation, introduce a
separate callback for each of these cases and avoid writing the extra
characters in the first place instead of truncating them after the fact.
The callbacks used to return bools, but none of the actual
implementations ever returns false; they always return true. So, to make
further refactoring simpler, make the interface reflect the inner
workings: no more return values, no more conditionals.
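In other words, roughly this kind of change (the callback shown is
illustrative, not the exact interface):

    // Before: the bool forced a conditional at every call site, even
    // though every implementation returned true:
    //
    //     if r.Paragraph(text) {
    //         ...
    //     }
    //
    // After: callbacks return nothing, and call sites just call.
    type Renderer interface {
        Paragraph(text []byte)
    }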
This is a better style for a set, since each key can only be present or
absent. With bool as the value type, each key may be absent, or present
with a value of true or false. It also uses slightly more memory.
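For illustration, the two styles side by side:

    // Set keyed on struct{}: membership is present-or-absent, and the
    // empty struct takes up no space per entry.
    set := map[string]struct{}{}
    set["foo"] = struct{}{}
    if _, ok := set["foo"]; ok {
        // present
    }

    // With map[string]bool, flags["bar"] == false is ambiguous: absent,
    // or present-but-false? And every entry stores a byte for the bool.
    flags := map[string]bool{}
    flags["foo"] = true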
This is both nasty and neat at the same time. All the code could handle
nested footnotes just fine; the only place that was not working was the
final loop that printed the list. The loop was in range form, which
couldn't account for another footnote being inserted while processing
existing ones. Changing the loop to the iterative form solves that.
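The gist of the fix, with notes standing in for the parser's footnote
list (assumed to be reachable by processNote so it can append to it):

    // Broken: range evaluates the slice once up front, so footnotes
    // appended by processNote during the loop are silently skipped.
    for _, note := range notes {
        processNote(note) // may append nested footnotes to notes
    }

    // Fixed: the condition is re-checked on every iteration, picking up
    // entries added along the way.
    for i := 0; i < len(notes); i++ {
        processNote(notes[i])
    }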
Closes #193.
Change the approach to fixing #45: don't patch the input markdown in
the preprocess pass; instead, improve special case detection when
parsing paragraphs. Leave the fenced code block detection in the
preprocess pass, though: it has since been put to another use,
suppressing tab expansion inside code blocks.
If a user provides a ReferenceOverride function, then reference IDs
will be passed to that function first, before consulting the generated
reference table.
The goal here is to enable programmable support for
"WikiWords"-style identifiers or other application-specific
user-generated keywords.
Example: writing documentation:

    The [Frobnosticator][] is a very important class in our codebase.
    While it is used to frobnosticate widgets in general, it can also
    be passed to the [WeeDoodler][] to interesting effect.
This might be solvable with the HTML renderer's relative prefix, but I
didn't see a good way of making a short relative link to
'Frobnosticator' without having to write it twice. Maybe
'<Frobnosticator>' should work? Should autolinks work for relative
links?
In addition, I wanted a little more richness. I plan to support Godoc
links by prefixing references with a '!', like so:

    Check out the [Frobnosticator][] helper function
    [!util.Frobnosticate()][]

The first link points to the Frobnosticator architectural overview
documentation, whereas the second points to Godoc.
Better advice on how to implement this sort of thing with Blackfriday
is highly desired.
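A sketch of what such an override might look like from the caller's
side; the signature and the Reference fields are paraphrased from the
description above, and the URLs and the isWikiWord helper are made up
for illustration:

    override := func(reference string) (ref *Reference, overridden bool) {
        if strings.HasPrefix(reference, "!") {
            // Godoc-style reference: link into the package docs.
            return &Reference{
                Link: "https://godoc.example.com/" + reference[1:],
                Text: reference[1:],
            }, true
        }
        if isWikiWord(reference) {
            // WikiWords-style identifier.
            return &Reference{
                Link: "/wiki/" + reference,
                Text: reference,
            }, true
        }
        return nil, false // fall back to the generated reference table
    }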
- Fixes #51, #101, and #102.
- Uses the [code][gfm] mentioned by @shurcooL from his GitHub
Flavored Markdown parser extension in a [comment on #102][comment].
Since this was mentioned, I assumed that @shurcooL would be OK with
this being included under the licence provided by blackfriday (there
is no licence comment on his code).
- I’ve added it behind another flag, EXTENSION_AUTO_HEADER_IDS, that
would need to be turned on for it to work. It works with both prefix
and underline headers.
[gfm]: 3bec0366a8/github_flavored_markdown/main.go (L90-L102)
[comment]: https://github.com/russross/blackfriday/issues/102#issuecomment-51272260
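Roughly the flavor of ID generation involved (a simplified sketch, not
the exact code borrowed from [gfm]):

    import (
        "strings"
        "unicode"
    )

    // headerID turns "Hello, World!" into "hello-world"; duplicate IDs
    // would additionally get a numeric suffix.
    func headerID(text string) string {
        var id []rune
        for _, r := range strings.ToLower(text) {
            switch {
            case unicode.IsLetter(r) || unicode.IsNumber(r):
                id = append(id, r)
            case r == ' ' || r == '-':
                id = append(id, '-')
            }
        }
        return string(id)
    }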
Add tests to make sure we don't break relative URLs again.
Extracted common HTML flags and common extensions for easy access from
tests.
Closes issue #104, which was fixed as a side effect of cf6bfc9.
Use an HTML5-compliant parser that interprets HTML as a browser would
to parse the rendered Markdown, and then sanitize based on the result.
Escape unrecognized and disallowed HTML in the output.
Currently works with a hard-coded whitelist of safe HTML tags and
attributes.
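One way such a pass can look with the golang.org/x/net/html tokenizer;
the whitelist contents here are illustrative, not the actual one:

    import (
        "bytes"
        "strings"

        "golang.org/x/net/html"
    )

    var allowedTags = map[string]bool{
        "p": true, "em": true, "strong": true, "a": true, "code": true,
        "pre": true, "ul": true, "ol": true, "li": true,
    }

    var allowedAttrs = map[string]bool{"href": true, "title": true}

    // Sanitize re-tokenizes rendered output the way a browser would and
    // escapes every tag that isn't whitelisted.
    func Sanitize(input []byte) []byte {
        var out bytes.Buffer
        tok := html.NewTokenizer(bytes.NewReader(input))
        for {
            tt := tok.Next()
            if tt == html.ErrorToken {
                return out.Bytes() // io.EOF: done tokenizing
            }
            t := tok.Token()
            switch tt {
            case html.TextToken:
                out.WriteString(html.EscapeString(t.Data))
            case html.StartTagToken, html.EndTagToken, html.SelfClosingTagToken:
                if !allowedTags[t.Data] {
                    // Disallowed tag: escape it so it renders as text.
                    out.WriteString(html.EscapeString(t.String()))
                    continue
                }
                // Drop non-whitelisted attributes and javascript: URLs.
                kept := t.Attr[:0]
                for _, a := range t.Attr {
                    if allowedAttrs[a.Key] &&
                        !strings.HasPrefix(strings.ToLower(a.Val), "javascript:") {
                        kept = append(kept, a)
                    }
                }
                t.Attr = kept
                out.WriteString(t.String())
            default:
                // Comments, doctypes, etc. are dropped entirely.
            }
        }
    }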
Change firstPass() code that checks for fenced code blocks to check all
of them and properly keep track of lastFencedCodeBlockEnd.
This way, it won't misinterpret the end of a fenced code block as the
beginning of a new one.
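Roughly the shape of the fix; fencedCodeBlock and nextLine are
approximations of the real helpers:

    // Only look for an opening fence once we're past the end of the
    // last block we found, so a closing fence is never misread as a new
    // opening one.
    lastFencedCodeBlockEnd := 0
    for i := 0; i < len(data); i = nextLine(data, i) {
        if i >= lastFencedCodeBlockEnd {
            if size := fencedCodeBlock(data[i:]); size > 0 {
                lastFencedCodeBlockEnd = i + size
            }
        }
        // ... while i < lastFencedCodeBlockEnd, tab expansion stays off ...
    }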
This drops the naive approach to <script> tag stripping and resorts to
full sanitization of the HTML. The general idea (and the regexps) is
borrowed from Stack Exchange's PageDown JavaScript Markdown
processor[1]. As in PageDown, it's implemented as a separate pass over
the resulting HTML.
Includes a metric ton (but not all) of the test cases from here[2].
Several are commented out since they don't pass yet.
Stronger (but still incomplete) fix for #11.
[1] http://code.google.com/p/pagedown/wiki/PageDown
[2] https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet