Convert PDF file to text with pdftotext


It is often useful to get at the text inside a PDF file, but extracting it by hand (for example, by copying and pasting from a viewer) can be tedious or, for some files, next to impossible. Luckily, Linux has a command-line program called pdftotext, which is included with the xpdf package.

The first step is to make sure that the xpdf package is installed. On Ubuntu, you can use the following command. (On newer Ubuntu releases, pdftotext is also provided by the poppler-utils package.)
$ sudo apt-get install xpdf
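To confirm that the install worked, you can check whether pdftotext is on your PATH before converting anything. This is a minimal sketch assuming a POSIX shell; have_pdftotext is a hypothetical helper name, not part of xpdf.

```shell
# have_pdftotext (hypothetical helper): succeed if pdftotext is on the PATH
have_pdftotext() {
  command -v pdftotext >/dev/null 2>&1
}

if have_pdftotext; then
  # The version banner may be printed on stderr, so merge it into stdout
  pdftotext -v 2>&1 | head -n 1
else
  echo "pdftotext not found; install the xpdf (or poppler-utils) package first" >&2
fi
```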

Now you can convert a PDF to text with pdftotext. This command writes the text to a file named <filename>.txt:
$ pdftotext <filename>.pdf
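Because each run writes <filename>.txt next to the input, converting a whole folder of PDFs is just a loop around the same command. The sketch below assumes a POSIX shell and pdftotext on the PATH; convert_all_pdfs is a hypothetical helper name, and it only matches files ending in lowercase .pdf.

```shell
# convert_all_pdfs [DIR] (hypothetical helper): run pdftotext on every *.pdf
# file in DIR (default: the current directory) using the default output name.
convert_all_pdfs() {
  dir=${1:-.}
  status=0
  for pdf in "$dir"/*.pdf; do
    # If nothing matches, the pattern stays literal, so skip it
    [ -e "$pdf" ] || continue
    if pdftotext "$pdf"; then
      echo "converted: $pdf -> ${pdf%.pdf}.txt"
    else
      echo "failed: $pdf" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, running `convert_all_pdfs` with no argument converts every PDF in the current directory, and the function returns non-zero if any file failed.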

You can also try to preserve some of the PDF's formatting, such as columns and spacing, by using the -layout option:
$ pdftotext -layout <filename>.pdf
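Whether -layout helps depends on the document, so it can be worth comparing both outputs side by side. This is a minimal sketch assuming a POSIX shell with mktemp and diff available; compare_layout is a hypothetical helper that writes the two versions into a temporary directory and prints the first 40 lines of their diff.

```shell
# compare_layout PDF (hypothetical helper): convert PDF once without and once
# with -layout, then print the first lines of the difference between the two.
compare_layout() {
  pdf=$1
  out=$(mktemp -d) || return 1
  # Plain conversion; on failure (missing file, pdftotext not installed) clean up
  pdftotext "$pdf" "$out/plain.txt" || { rm -rf "$out"; return 1; }
  # Same file with -layout to keep columns and spacing closer to the original
  pdftotext -layout "$pdf" "$out/layout.txt" || { rm -rf "$out"; return 1; }
  # diff exits 1 when the files differ; only head's status counts in this pipeline
  diff "$out/plain.txt" "$out/layout.txt" | head -n 40
  echo "Both versions kept in $out"
}
```

Usage, with report.pdf standing in for any PDF you have:
$ compare_layout report.pdf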