A look at pdftk

I’ve lost count of the ways you can create PDF files in Linux. Most applications let you save documents directly to PDF, and you can convert files to PDF quite easily. But manipulating those PDFs is a bit trickier. Various applications let you fiddle with PDFs in one or two ways. But if you’re a command-line junkie, an app called pdftk (PDF Toolkit) is practically an all-in-one solution. It’s the closest thing to Adobe Acrobat that I’ve found for Linux.

pdftk’s developer describes it as the PDF equivalent of an “electronic staple remover, hole punch, binder, secret decoder ring, and X-ray glasses.” That’s pretty close to the truth. pdftk can:

  • Join and split PDFs
  • Pull single pages from a file
  • Encrypt and decrypt PDF files
  • Add, update, and export a PDF’s metadata
  • Export bookmarks to a text file
  • Add or remove attachments
  • Fix certain damaged PDFs
  • Fill out PDF forms

You can download pdftk either as source code or as a package for Debian, RPM-based distributions, FreeBSD, and Gentoo, among others. If you’re going to compile pdftk, read this to learn about the program’s dependencies.

As I mentioned earlier, pdftk is a command-line tool. Its options can be complicated, especially for complex operations. You’ll be doing quite a bit of typing, but that shouldn’t put you off using pdftk. When I started working with pdftk, I found myself using only a few of its functions: joining and splitting PDF files, adding metadata, and password-protecting files.

Combining PDF files

pdftk can combine two or more PDF files, much like joinPDF (which I discussed here). To do that, open a terminal window and change to the directory containing the PDF files that you want to combine. Then, type the following command:

pdftk file1.pdf file2.pdf cat output newFile.pdf

cat is short for concatenate (join together, in plain English), and output tells pdftk to write the combined PDFs to a new file; in this case, newFile.pdf.
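If you combine files often, the command can be wrapped in a small shell function. The sketch below is one way to do it and is not part of pdftk: combine_pdfs and the DRY_RUN variable are names made up for illustration. With DRY_RUN=1 the function only prints the command it would run.

```shell
# combine_pdfs OUTPUT INPUT...
# Hypothetical helper: joins the INPUT files, in the order given, into OUTPUT.
# With DRY_RUN=1 it prints the pdftk command instead of running it.
combine_pdfs() {
  out=$1
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "pdftk $* cat output $out"
  else
    pdftk "$@" cat output "$out"
  fi
}

# Example: preview the command for the two files used above.
DRY_RUN=1 combine_pdfs newFile.pdf file1.pdf file2.pdf
```

Because the shell expands wildcards in sorted order, something like combine_pdfs book.pdf chapter*.pdf should join the chapter files alphabetically.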

pdftk doesn’t retain the bookmarks that might have been in one or all of the files you’re combining, but it does keep hyperlinks, both to destinations within the PDF and to external files or Web sites. Where some other applications point hyperlinks to the wrong destinations, the links in the PDFs I combined using pdftk all hit their targets.

Splitting files

Splitting PDF files with pdftk can be … interesting. The burst option breaks a PDF into multiple files. How many? How about one file for each page. To use it, type:

pdftk style_guide.pdf burst

With larger documents you wind up with a lot of files with names corresponding to their page numbers, like pg_0001.pdf and pg_0013.pdf. It’s not very intuitive or useful, especially if you want only a few pages.
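The names come from a printf-style pattern, and burst accepts a different pattern after output. The sketch below only previews the names a pattern would produce. burst_name is a hypothetical helper, and style_guide_p%03d.pdf is just an example pattern.

```shell
# burst_name PATTERN PAGE
# Hypothetical helper: shows the file name a printf-style pattern
# yields for a given page number.
burst_name() {
  # The pattern is used as the printf format on purpose.
  printf "$1\n" "$2"
}

burst_name 'pg_%04d.pdf' 13            # pdftk's default naming
burst_name 'style_guide_p%03d.pdf' 13  # a custom pattern

# The matching burst command (needs pdftk and the input file):
#   pdftk style_guide.pdf burst output style_guide_p%03d.pdf
```

A pattern that includes the document name makes it easier to tell the pages of different documents apart when they end up in the same directory.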

Of course, pdftk can also remove specific pages from a PDF file. For example, to remove pages 10 to 25 from a PDF file, type the following command:

pdftk myDocument.pdf cat 1-9 26-end output removedPages.pdf

The options 1-9 and 26-end tell pdftk to copy pages 1 through 9 and page 26 through the last page to the file removedPages.pdf. Pages 10 to 25, which fall between those ranges, are left out.
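Working out the two ranges by hand is easy to get wrong by one page. As a rough sketch, a shell function can compute them. ranges_without is a hypothetical name, and the function assumes the last page you remove is not the final page of the document.

```shell
# ranges_without FIRST LAST
# Hypothetical helper: prints the pdftk page ranges that keep everything
# except pages FIRST..LAST. Assumes LAST is before the final page.
ranges_without() {
  first=$1
  last=$2
  ranges=""
  if [ "$first" -gt 1 ]; then
    ranges="1-$((first - 1))"
  fi
  # "end" is pdftk's keyword for the last page of the document.
  ranges="${ranges:+$ranges }$((last + 1))-end"
  printf '%s\n' "$ranges"
}

ranges_without 10 25

# Used in a real command (needs pdftk and the input file):
#   pdftk myDocument.pdf cat $(ranges_without 10 25) output removedPages.pdf
```

For pages 10 to 25 the function prints 1-9 26-end, the same ranges used in the command above.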

I’ve used this feature quite a bit — mainly to trim pages from work samples that I have posted on my company’s Web site, and to extract articles from back issues of a magazine to which I contribute. The resulting files are small, and the PDFs are clear and easy to read.

Adding attachments to a PDF

To be honest, I miss Adobe Acrobat’s ability to attach files to a PDF. When working with PDFs on Windows, I regularly used this feature to include addenda, surveys, or additional information with a published PDF. Until I found pdftk, I was forced to move my PDF documents to a computer running Windows whenever I needed to attach a file.

Why attach a file to a PDF instead of sending an archive? Mainly convenience. If you move a PDF from one computer to another, and don’t move the archive along with it, you won’t have access to the attachments. And instead of pulling a file from an archive to view it, you just double-click on the attachment’s icon to open the file from your PDF viewer.

Using pdftk, you can easily attach binary and text files to a PDF. You can even specify what page of the PDF you want the attachment to appear on. Just type the following command:

pdftk htmltidy.pdf attach_files command_ref.html to_page 24 output htmltidybook.pdf

Obviously, attach_files is the option to attach files. to_page 24 tells pdftk to attach the file command_ref.html to page 24 of the resulting PDF.

I’ve attached OpenOffice.org Writer documents, tar.gz and zip archives, and text and HTML files to various PDF documents. Apart from a noticeable increase in the size of the PDF file, there were no nasty side effects.

How do you know a PDF contains an attachment? Look for the thumbtack icon in the PDF. This only works in Adobe’s Acrobat Reader, though. Attachments don’t appear in applications like Xpdf, Evince, KPDF, or gv.
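Even when a viewer doesn’t show attachments, pdftk itself can pull them back out with its unpack_files operation. Here is a minimal sketch, assuming the htmltidybook.pdf file from the earlier example and a hypothetical attachments directory. unpack_attachments is a made-up helper name, and it only prints the command if pdftk or the PDF is missing.

```shell
# unpack_attachments PDF DIR
# Hypothetical helper: saves every file attached to PDF into DIR.
unpack_attachments() {
  pdf=$1
  dir=$2
  if command -v pdftk >/dev/null 2>&1 && [ -f "$pdf" ]; then
    mkdir -p "$dir"
    pdftk "$pdf" unpack_files output "$dir"
  else
    # Nothing to run against, so just show the command.
    echo "would run: pdftk $pdf unpack_files output $dir"
  fi
}

unpack_attachments htmltidybook.pdf attachments
```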

Adding metadata and passwords to a PDF

pdftk has a number of options that you might use infrequently, but that are very useful when you need them. Two of them are update_info and user_pw.

When you create a PDF, its metadata (the information that describes the PDF) might be incomplete or missing altogether. Metadata can come in handy when you or your users need to organize or index a set of PDF files. Using pdftk and a text file, you can change or add metadata to the PDF by typing the following command:

pdftk DocBookOverview.pdf update_info data.txt output DocBookOverview_updated.pdf

In this case, the file data.txt contains an InfoKey and InfoValue pair, like this:

InfoKey: Keywords
InfoValue: DocBook,writing,documentation,background

You can change only the following metadata items with pdftk: title, author, subject, producer, and keywords.
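A data file with several pairs can be generated rather than typed by hand. The function below is only a sketch: write_info_file is a hypothetical helper, and the title and author values in the usage comment are placeholders. It follows the InfoKey and InfoValue layout shown above. Running pdftk DocBookOverview.pdf dump_data prints the existing metadata, which is a good way to check the exact layout your version of pdftk uses.

```shell
# write_info_file FILE TITLE AUTHOR KEYWORDS
# Hypothetical helper: writes one InfoKey/InfoValue pair per field to FILE.
write_info_file() {
  cat > "$1" <<EOF
InfoKey: Title
InfoValue: $2
InfoKey: Author
InfoValue: $3
InfoKey: Keywords
InfoValue: $4
EOF
}

# Usage (placeholder values; needs pdftk for the second step):
#   write_info_file data.txt "DocBook Overview" "Author Name" "DocBook,writing"
#   pdftk DocBookOverview.pdf update_info data.txt output DocBookOverview_updated.pdf
```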

If you’re working with PDFs that contain sensitive information, you may want to make sure that only certain people can view a PDF by applying a password to it with the user_pw option:

pdftk salesreport.pdf output SalesReport.pdf user_pw PROMPT

You will be prompted for a password of up to 32 characters. When someone tries to open the PDF, they will be asked to enter the password.
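In a script you may want to pass the password as an argument instead of typing it at the prompt. Keep in mind that a password given on the command line can show up in your shell history and in the process list, so PROMPT is the safer choice for interactive use. The sketch below checks the 32-character limit before calling pdftk. valid_pw_length is a hypothetical helper, and the password is only an example.

```shell
# valid_pw_length PASSWORD
# Hypothetical helper: succeeds when PASSWORD has 1 to 32 characters.
valid_pw_length() {
  len=${#1}
  [ "$len" -ge 1 ] && [ "$len" -le 32 ]
}

pw='correct horse battery staple'   # example password, 28 characters
if valid_pw_length "$pw"; then
  echo "password length OK"
  # The real command (needs pdftk and the input file):
  #   pdftk salesreport.pdf output SalesReport.pdf user_pw "$pw"
else
  echo "password must be 1 to 32 characters" >&2
fi
```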

Conclusion

pdftk is one of the most useful tools for manipulating PDF files that I’ve found for Linux. It’s not the easiest software to work with, but you’ll get the hang of it after a bit of practice.

This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

  • http://kyle.skrinakcreative.com Kyle

    I wish pdf editing tools on linux weren’t so anemic. Thanks for compiling this list — a very helpful quick reference.

  • http://www.tonyredhead.com Tony Redhead

    Hi,

    Just to let you know i found out the command line. I was leaving “output” off. So the final is pdftk combined.pdf cat 1 3-end 2 output final.pdf.

    thanks,

    Tony

  • Craig

    Thank you, this article really helped me get started.

  • Johnson Kirsten

    This article is very helpful.

    I just want to share a PDF filling out tool I’ve discovered, just in case you need it.

    PDFfiller.com allowed me to upload Word and PowerPoint files to be converted to PDF. It lets me fill out the form neatly and then save, print, fax, or SendToSign the forms. If you’re looking for a way to electronically fill out a W2 form, let me share my experience with a site I’ve come across that might help everyone.

    I was able to get also the form i need through http://goo.gl/31FFdJ.

    Such a great experience!

  • Scott Nesbitt

    Thanks for the comment. I have used PDFFiller.com. Good service, though it’s not open source, which is a big criterion for mention on this blog :-)