PHP and JavaScript implementations of a simple Markdown parser

Custom Syntax Modules

Syntax falls into two categories: block syntax and inline syntax. Block syntax operates on one or more whole lines of content to form structures like paragraphs, lists, tables, etc. Inline syntax operates on stretches of text within a block, such as marking a word as bold or inserting a link.

Block Reader

Block readers extend from MDBlockReader.

Each block reader has a priority value that determines when it is tried relative to other readers. Built-in readers use values from 0 to 100, but any priority value is valid. Readers with lower values are tried before readers with higher values. Generally, readers for more complex syntaxes should precede simpler ones. For example, headers have a low priority value because they are easy to match by their leading # characters, while paragraphs have a priority of 100 because they are the fallback when no other format matches.
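As a rough illustration of the ordering rule, the sketch below sorts a few reader descriptions by priority. The objects are plain stand-ins rather than real MDBlockReader instances, and the values 10 and 35 are assumed for illustration; only the paragraph value of 100 comes from the text above.

```javascript
// Plain stand-in objects; only the idea of a numeric priority is taken from the docs.
const readers = [
  { name: "paragraph", priority: 100 }, // fallback, tried last
  { name: "header", priority: 10 },     // assumed value for illustration
  { name: "callout", priority: 35 },    // hypothetical custom reader placed between built-ins
];

// Lower priority values are tried before higher ones.
const ordered = [...readers].sort((a, b) => a.priority - b.priority);

console.log(ordered.map((r) => r.name).join(" -> "));
// header -> callout -> paragraph
```

A custom reader can be slotted between built-in readers by picking a value that falls between theirs.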

Block readers have a readBlock method that must be overridden. The method is called repeatedly with an MDState object, allowing the reader to check whether a matching block begins at the current line pointer. The array of lines of markdown text is in MDState.lines, and the current line pointer is in MDState.p. If the reader does not detect a matching block starting at that line, the method should return null (preferably as soon as possible, for good performance). If the line pointer does point to the beginning of a matching block, the reader should move the p pointer to the line following the end of that block and return an instance of an MDBlock subclass.
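The readBlock contract can be sketched as follows. So that the example runs on its own, MDBlockReader, MDBlock, and MDState are replaced here with minimal stand-ins; their constructors are assumptions and may not match the library. NoteBlock, NoteBlockReader, and the ":::" delimiter are hypothetical names invented for this example.

```javascript
// Minimal stand-ins so the sketch is self-contained; the real classes come from the library.
class MDBlockReader {
  constructor(priority) { this.priority = priority; } // constructor shape is an assumption
  readBlock(state) { throw new Error("readBlock must be overridden"); }
}
class MDBlock {}
class MDState {
  constructor(lines) { this.lines = lines; this.p = 0; }
}

// Hypothetical block type holding the raw lines between two ":::" delimiters.
class NoteBlock extends MDBlock {
  constructor(lines) { super(); this.lines = lines; }
}

// Hypothetical reader for blocks delimited by lines containing only ":::".
class NoteBlockReader extends MDBlockReader {
  constructor() { super(35); } // assumed priority, between headers and paragraphs

  readBlock(state) {
    // Bail out as early as possible when the current line cannot start a block.
    if (state.lines[state.p] !== ":::") return null;

    // Look for the closing delimiter.
    let end = state.p + 1;
    while (end < state.lines.length && state.lines[end] !== ":::") end++;
    if (end >= state.lines.length) return null; // unterminated: leave p untouched

    const body = state.lines.slice(state.p + 1, end);
    state.p = end + 1; // position p on the line after the block
    return new NoteBlock(body);
  }
}

const state = new MDState(["intro", ":::", "be careful", ":::", "outro"]);
const reader = new NoteBlockReader();

const miss = reader.readBlock(state); // line 0 is "intro", so no match
state.p = 1;                          // simulate the parser advancing past "intro"
const block = reader.readBlock(state);

console.log(miss, block.lines, state.p);
```

Note that the reader only moves p when it actually returns a block, so a failed match leaves the state untouched for the next reader.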

A reader can also perform post-processing after the document has been parsed. This can be done by overriding the postProcess method. It will be passed an array of top-level document blocks. This array can be modified with .splice operations to add, remove, or replace top-level blocks. Blocks can also be recursed into by calling MDBlock.visitChildren with a function. The function will be called with every MDBlock or MDSpan by recursing into all child nodes. This gives the ability to manipulate specific nodes easily without walking the node tree manually.
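A post-processing pass along these lines might look like the sketch below. The node classes are simplified stand-ins: the children property, the postProcess(blocks) signature (the real method may take further arguments), and the ParagraphBlock, CommentBlock, and CleanupReader names are all assumptions made for illustration.

```javascript
// Simplified stand-ins for the library's node classes; `children` is an assumed property.
class MDNode {
  constructor(children = []) { this.children = children; }
  // Calls fn with every descendant node, recursing into all children.
  visitChildren(fn) {
    for (const child of this.children) {
      fn(child);
      child.visitChildren(fn);
    }
  }
}
class MDBlock extends MDNode {}
class MDSpan extends MDNode {}
class MDTextSpan extends MDSpan {
  constructor(text) { super(); this.text = text; }
}
class ParagraphBlock extends MDBlock {} // hypothetical block types
class CommentBlock extends MDBlock {}

// Hypothetical reader that only does post-processing.
class CleanupReader {
  postProcess(blocks) {
    // Remove top-level comment blocks; iterate backwards so splice does not skip entries.
    for (let i = blocks.length - 1; i >= 0; i--) {
      if (blocks[i] instanceof CommentBlock) blocks.splice(i, 1);
    }
    // Rewrite text spans anywhere in the tree without walking it by hand.
    for (const block of blocks) {
      block.visitChildren((node) => {
        if (node instanceof MDTextSpan) {
          node.text = node.text.replace(/\(c\)/g, "\u00a9");
        }
      });
    }
  }
}

const blocks = [
  new ParagraphBlock([new MDTextSpan("Example (c) Corp")]),
  new CommentBlock(),
  new ParagraphBlock([new MDTextSpan("plain")]),
];
new CleanupReader().postProcess(blocks);

console.log(blocks.length, blocks[0].children[0].text);
```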

Inline Reader

Inline readers work differently. They extend from MDInlineReader. Inline parsing is done non-linearly by progressively swapping out elements of an MDToken array with MDSpan elements as patterns of tokens are found that match inline syntax structures.

Inline parsing is done in two phases: tokenization and substitution. Each reader specifies a separate priority for each phase. For the substitution phase, a reader can even specify more than one priority as an array, so that it is called multiple times during that phase. As with block readers, lower priority values are evaluated before higher values. Use this to match more complex tokens and token patterns before simpler patterns that might otherwise cause ambiguities between syntaxes that use the same characters.
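To illustrate how a single priority and an array of priorities could be combined into one ordered list of substitution passes, here is a small sketch. The substitutePriority property name and the numeric values are assumptions for this example, not names taken from the library.

```javascript
// Stand-in reader descriptions; `substitutePriority` is a hypothetical property name.
const readers = [
  { name: "emphasis", substitutePriority: [30, 60] }, // runs twice in the phase
  { name: "link", substitutePriority: 20 },
  { name: "strong", substitutePriority: 25 },
];

// Expand every reader into one entry per priority, then sort ascending.
const passes = readers
  .flatMap((r) =>
    [].concat(r.substitutePriority).map((priority) => ({ name: r.name, priority }))
  )
  .sort((a, b) => a.priority - b.priority);

const summary = passes.map((p) => `${p.name}@${p.priority}`).join(", ");
console.log(summary);
// link@20, strong@25, emphasis@30, emphasis@60
```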

In the tokenization phase, each inline reader is given a chance to look for a relevant token at the start of the text it is handed. This is done by the readFirstToken method. It is passed a substring of a line of markdown, and it should check whether that string begins with a recognized token. Only tokens at the very beginning of the string should be looked for. If no token is found, the method should return null. If a token is found, an MDToken should be returned.
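A minimal sketch of the tokenization contract follows. MDToken and MDInlineReader are stand-ins here: the MDToken constructor arguments, the single-argument readFirstToken signature, and the tokenize driver are assumptions, and TildeReader is a hypothetical custom reader.

```javascript
// Stand-ins for the library classes; constructor arguments are assumptions.
class MDToken {
  constructor(original, type) { this.original = original; this.type = type; }
}
class MDInlineReader {
  readFirstToken(text) { return null; }
}

// Hypothetical reader that recognizes a "~~" delimiter.
class TildeReader extends MDInlineReader {
  readFirstToken(text) {
    // Only look at the very beginning of the string.
    return text.startsWith("~~") ? new MDToken("~~", "tilde") : null;
  }
}

// Toy driver showing how a parser might call readFirstToken at each position.
function tokenize(line, readers) {
  const tokens = [];
  let i = 0;
  while (i < line.length) {
    let token = null;
    for (const reader of readers) {
      token = reader.readFirstToken(line.substring(i));
      if (token) break;
    }
    if (token) {
      tokens.push(token);
      i += token.original.length;
    } else {
      // No reader matched: accumulate the character as plain text.
      const last = tokens[tokens.length - 1];
      if (last && last.type === "text") last.original += line[i];
      else tokens.push(new MDToken(line[i], "text"));
      i++;
    }
  }
  return tokens;
}

const tokens = tokenize("a ~~b~~", [new TildeReader()]);
console.log(tokens.map((t) => `${t.type}:${JSON.stringify(t.original)}`).join(" "));
// text:"a " tilde:"~~" text:"b" tilde:"~~"
```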

After a block of markdown content has been tokenized, the substitution phase begins. This phase repeatedly processes an array of MDTokens, allowing readers to swap out one or more tokens with MDSpan subclasses in place. E.g. an emphasis reader will look for an asterisk token, some number of intermediary tokens, and another closing asterisk token. It will swap all of those out with an MDEmphasisSpan. Once the substitution phase is complete, any remaining MDTokens in the array get converted to MDTextSpan.
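The emphasis example can be sketched as below. The class definitions are stand-ins, and the substituteTokens method name, its signature, and its boolean return value are assumptions made for this sketch rather than the library's documented API. For brevity the inner tokens are stored as-is instead of being converted to spans.

```javascript
// Stand-ins for the library classes; shapes are assumptions.
class MDToken {
  constructor(original, type) { this.original = original; this.type = type; }
}
class MDSpan {}
class MDEmphasisSpan extends MDSpan {
  constructor(children) { super(); this.children = children; }
}

class EmphasisReader {
  // Hypothetical method: tries one substitution in place, returns true if it made one.
  substituteTokens(tokens) {
    const isAsterisk = (t) => t instanceof MDToken && t.type === "asterisk";

    const open = tokens.findIndex(isAsterisk);
    if (open < 0) return false;

    // Require at least one token between the opening and closing asterisks.
    const close = tokens.findIndex((t, i) => i > open + 1 && isAsterisk(t));
    if (close < 0) return false;

    const inner = tokens.slice(open + 1, close);
    // Swap asterisk, inner tokens, and closing asterisk for a single span.
    tokens.splice(open, close - open + 1, new MDEmphasisSpan(inner));
    return true;
  }
}

const tokens = [
  new MDToken("say ", "text"),
  new MDToken("*", "asterisk"),
  new MDToken("hi", "text"),
  new MDToken("*", "asterisk"),
  new MDToken("!", "text"),
];

const reader = new EmphasisReader();
while (reader.substituteTokens(tokens)) {
  // Keep substituting until no more matches are found.
}

console.log(tokens.length, tokens[1] instanceof MDEmphasisSpan);
// 3 true
```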

For inline readers with inner content (such as the previous emphasis example), the inner tokens can be converted to MDSpans by calling state.tokensToSpans(). Alternatively, if an inner markdown string needs to be converted to MDSpans, call state.inlineMarkdownToSpan or state.inlineMarkdownToSpans (plural).
