Playing with Pandoc

Markup! As you might know, I write for a living. Much of my income comes from technical writing. So, I have varying degrees of knowledge of documentation tools and techniques. But unlike many technical writers I know, I’m also interested in markup languages (and not just the ones I need for the job). Especially lightweight markup languages and their associated tools.

I do a lot of my writing using Markdown. One limitation of Markdown is that the stock tools can only convert it to HTML. HTML is fine, but sometimes I need to take what I’ve written and formatted using Markdown and convert it to other formats like LaTeX or PDF.

When I need more flexibility in converting Markdown to other formats, I turn to Pandoc.

Pandoc?

Yes, Pandoc. It’s a set of libraries and a command line tool to convert between various markup languages. Among other languages, Pandoc can deal with:

  • Textile
  • reStructuredText
  • HTML
  • LaTeX

And, of course, Markdown.

Pandoc is very powerful and flexible. It’s also a command line tool, but don’t let that scare you.

While Pandoc is not standard kit with most Linux distributions, it is available in the software repositories for many major distros. Check your package manager. You can also download it.

Using Pandoc

I usually use Pandoc with files formatted with Markdown. So, let’s use that as our starting point.

Crack open a terminal window and navigate to the directory that contains the file(s) that you want to convert.

At its most basic, Pandoc’s syntax is fairly simple. You run the command pandoc, followed by -o and the name of the output file, followed by the file that you want to convert. For example:

pandoc -s -o myFile.tex myFile.md

That converts a Markdown file to LaTeX. Notice the -s option on the command line? When converting to another markup language, that option surrounds your text with the proper header and footer. You add it when you’re creating documents that will exist on their own. If you plan to combine a bunch of, say, LaTeX or HTML documents, you can leave the -s option out.

Let’s look at a real example. A while back, I wrote a post about my experiences with a Chromebook. I want to convert that post into three formats:

  • LaTeX
  • PDF
  • ODT

Let’s take a quick look at how. First up, LaTeX. I run the following command:

pandoc -s -o chromebook.tex chromebook.md

The resulting file looks like this in a text editor:

[Image: Pandoc converted to LaTeX]

I can modify the header to use whatever LaTeX class (which determines the look and feel of the output) that I prefer.

When I want to create a PDF, I need to have a functioning TeX system installed on my computer. Most Linux distros either include one or offer one in their software repositories, so you shouldn’t have a problem. If I’m running version 1.9 of Pandoc or newer, I use this command:

pandoc chromebook.md -o chromebook.pdf

If I’m using an older version of Pandoc, I need to run the script markdown2pdf (which is installed with Pandoc):

markdown2pdf chromebook.md

The results look like this:

[Image: Pandoc converted to PDF]

It’s not the prettiest document, but it works. I use this conversion mainly for archiving.

Finally, I want to convert the file to ODT (the OpenDocument text format, which applications such as LibreOffice Writer and AbiWord can open). To do that, I run the following command:

pandoc -o chromebook.odt chromebook.md

Here’s what the results look like in LibreOffice Writer:

[Image: Pandoc converted to ODT]

Final thoughts

Obviously, Pandoc is for much more than converting files formatted with Markdown to other formats. It really is a Swiss Army knife. But, for now, it’s most useful to me as a Markdown converter. I might move on to doing other things with Pandoc. It would be a shame to waste the tool’s power and flexibility.

This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


  • http://twitter.com/airportyh airportyh

    Thanks! I had some trouble figuring out how to pandoc on my own, and this was really helpful.