Created by Miguel Ángel Fernández Gutiérrez (https://mianfg.me/)
Markdown is one of the most popular and useful markup languages in computer science, as it lets you write styled notes quickly and comfortably.
LaTeX, on the other hand, is a typesetting system that is far more complex and slower to use. Inserting things as basic as bold, italics or lists can be tedious at times.
`md2tex` is a solution for writing styled text in LaTeX quickly, as it converts Markdown text to LaTeX.
This program only requires a C compiler. To build all the necessary dependencies automatically, you can use the `make` tool with the provided `makefile`:
$> make
You will get the compiled file `md2tex`, which you can run on the test Markdown file `example.md`:
$> ./md2tex example.md
Check and compare the files `example.md` and the newly generated `example_md2tex.tex`.
You can inspect the translation phases by running:
$> ./md2tex example.md --preserve
`md2tex` offers the following features:
- Translation. It translates Markdown text that follows the GFM specification (see GitHub-flavored Markdown below) into LaTeX. It supports the following Markdown categories (and an extra one 😉):
  - HTML (`HTML`). The translator leaves HTML text as-is.
  - Headers (h1-h6) (`HEADER`). Including setext headers.
  - Horizontal line (`LINE`).
  - Images (`IMAGE`). Inserts images and image captions.
  - Tables (`TABLE`). Inserts tables, with the following features:
    - Alignment. By inserting `:` to the left or right of the table separators, the alignment of each column can be specified (as stated in GFM).
    - Dynamic adjusting. If the rows have different numbers of columns, `md2tex` fills each row with empty columns until it reaches the maximum number of columns in the table, in order to avoid rendering errors in LaTeX.
  - Itemize lists (`LIST`).
  - Enumerate lists (`LIST`). With the following additional features:
    - Smart start. If the enumerated list starts at 1, no additional LaTeX directives are included. If it starts at a number greater than 1, an additional LaTeX directive is added so that the list starts at that number.
    - Automatic disarrangement fixing. The translator follows the list ordering and adds extra directives to `\item` only for the elements that do not follow the list order.
  - Blockquotes (`BLOCKQUOTE`).
  - Indentation (`INDENT`).
  - Code blocks (`CODEBLOCK`). Supporting syntax highlighting (the language specified for the Markdown block is inserted upon translation).
  - Styling (`STYLE`) with bold, italics, both, strikethrough and inline code.
  - Links (`LINK`). Keeping both text and URL in the translation.
  - Escaping. A backslash `\` can be placed before characters that would otherwise lead to unwanted behaviour.
  - Verbatim (`VERBATIM`). The code inserted between `$$` is translated as-is, deleting the surrounding `$$`. This is useful, for example, if we want to write LaTeX commands directly in Markdown.
- LaTeX commands customization. As most users have their own set of commands for the elements listed above, `md2tex` uses a C header file, called `latexcommands.h`, that the user can modify. The file is thoroughly documented to make it easy to replace the default commands with the ones the user wants.
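The dynamic adjusting of tables described above can be pictured with a minimal C sketch. This is not the code used by `md2tex`: the function names `max_columns` and `emit_padded_row` are hypothetical, and the output assumes a plain LaTeX `tabular` row (cells separated by `&`, row ended by `\\`).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: largest column count among the rows of a table. */
static size_t max_columns(const size_t *counts, size_t nrows)
{
    size_t max = 0;
    for (size_t r = 0; r < nrows; ++r)
        if (counts[r] > max)
            max = counts[r];
    return max;
}

/* Hypothetical helper: write one tabular row into `out`, padding with
 * empty cells until the row has `max_cols` columns. Cells beyond
 * `max_cols` are dropped, so pass the table-wide maximum. */
static void emit_padded_row(const char *const *cells, size_t ncells,
                            size_t max_cols, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return;
    out[0] = '\0';
    for (size_t c = 0; c < max_cols; ++c) {
        const char *text = (c < ncells) ? cells[c] : "";  /* missing cell */
        const char *sep = (c + 1 < max_cols) ? " & " : " \\\\";
        int w = snprintf(out + n, cap - n, "%s%s", text, sep);
        if (w < 0 || (size_t)w >= cap - n) {  /* buffer too small: stop */
            out[n] = '\0';
            return;
        }
        n += (size_t)w;
    }
}
```

With a row holding only the cells `a` and `b` in a three-column table, the sketch writes a row with a trailing empty cell, so every row ends up with the same number of `&` separators.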
Because of the dependencies between different Markdown commands, and because commands can be nested, the translation runs in four phases:
- Escaping I. Replaces the escaped characters with safe strings that cannot be interpreted as commands.
- Blocks. Translates the main blocks.
- Styling. Translates the styling commands.
- Escaping II. Replaces the safe strings with the original characters.
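The two escaping phases can be sketched in plain C as follows. This is only an illustration of the idea: the placeholder format `@@ESC<code>@@` and the function names are assumptions, not what `md2tex.l` actually uses.

```c
#include <stdio.h>
#include <string.h>

/* Escaping I (sketch): turn each backslash-escaped character into a
 * placeholder "@@ESC<decimal code>@@" (an assumed format) that the
 * block and styling phases will not mistake for a Markdown command. */
static void escape_phase1(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return;
    while (*in != '\0' && n + 1 < cap) {
        if (in[0] == '\\' && in[1] != '\0') {
            int w = snprintf(out + n, cap - n, "@@ESC%d@@",
                             (unsigned char)in[1]);
            if (w < 0 || (size_t)w >= cap - n)
                break;                    /* placeholder does not fit */
            n += (size_t)w;
            in += 2;                      /* skip backslash and character */
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
}

/* Escaping II (sketch): turn each placeholder back into the original
 * character once the other phases have finished. */
static void escape_phase2(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return;
    while (*in != '\0' && n + 1 < cap) {
        int code = 0, len = 0;
        /* %n is only stored when the closing "@@" has matched too. */
        if (sscanf(in, "@@ESC%d@@%n", &code, &len) == 1 && len > 0) {
            out[n++] = (char)code;
            in += len;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
}
```

Running the first function on the text `2 \* 3 = *six*` hides the escaped asterisk, so a styling phase in between would only see the `*six*` markers; the second function then restores the literal `*`.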
Because of the way `lex` matches input, some commands use the parse-with-`ECHO` strategy (in other words, the sequence of characters is read and copied to the output unchanged). As `lex` applies the rule that matches the longest string, other commands inside the matched string are ignored. This can be used to our advantage, but it also imposes some limitations.
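The effect of this strategy can be mimicked with a small hand-written C scanner. This is not `lex` and not the rules from `md2tex.l`; it is a sketch under assumptions: the function name is hypothetical, inline code spans are chosen as the echoed construct, and `\textbf` is assumed as the bold command.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: translate **bold** into \textbf{...}, but copy `inline code`
 * spans unchanged (the "parse with ECHO" idea), so that markers inside
 * them are never interpreted as commands. */
static void translate_styles(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return;
    while (*in != '\0') {
        const char *end;
        int w;
        if (*in == '`' && (end = strchr(in + 1, '`')) != NULL) {
            /* The whole code span is consumed at once and echoed as-is. */
            w = snprintf(out + n, cap - n, "%.*s", (int)(end - in + 1), in);
            in = end + 1;
        } else if (strncmp(in, "**", 2) == 0 &&
                   (end = strstr(in + 2, "**")) != NULL) {
            /* Action: the matched text is rewritten. */
            w = snprintf(out + n, cap - n, "\\textbf{%.*s}",
                         (int)(end - in - 2), in + 2);
            in = end + 2;
        } else {
            w = snprintf(out + n, cap - n, "%c", *in);
            in += 1;
        }
        if (w < 0 || (size_t)w >= cap - n)
            break;                        /* output buffer is full */
        n += (size_t)w;
    }
    out[n] = '\0';
}
```

Given `**a** and `**b**``, the sketch rewrites the first bold span and leaves the second untouched because it sits inside a code span. It also shows the cost: whatever is inside a consumed span is never examined again, so nested commands are not translated.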
The following graph shows which commands are parsed in each translation phase. Consider the tags mentioned above and the following strategies:
- ignore: do not parse.
- ECHO: parse and `ECHO`.
- action: the text is modified in some way.
Each arrow represents the transition to the next phase, starting with the first phase.
As there is no single Markdown standard, this translator is based on the GitHub-flavored Markdown spec, which consists of more than 600 rules and examples.
This translator handles most of these rules. The ones it leaves out fall into two groups:
- They are rules of little practical use (most parsers do not implement them).
- They are rules that cannot be expressed with regular expressions (that is, they do not describe regular languages).
Because of some limitations of `lex` (and because the implementation is very difficult), the following features do not work completely in this translator:
- Style nesting: for example, placing italics inside a portion of bold text. It can be worked around by concatenating the available styles.
- Multiple list levels: the translator is only able to interpret lists with one level of depth.
- Styles in image and link texts: this could be implemented with one extra translation phase. I deemed it unnecessary.
These features may be implemented in future versions of `md2tex`.
| Filename | Description |
|---|---|
| `defines.h` | `#define` directives needed by the lexer; they make the code more readable |
| `example.md` | Example that uses almost all Markdown tags |
| `example_md2tex.tex` | Output of running `md2tex` on the previous example |
| `latexcommands.h` | Allows customization of LaTeX commands (see Translator specs) |
| `lex.yy.c` | File generated by running `lex md2tex.l`; it is included in case the `lex` command does not work properly |
| `makefile` | Eases compilation |
| `md2tex_main.c` | Contains the main function, calls `md2tex.l` (translated to C) and generates files for each phase of translation |
| `md2tex.l` | `lex` file; it contains the code for each phase of translation |