How to convert Microsoft Word .doc files to PDF from command line

January 14th, 2009 § 2 comments

I know a lot of people need this: Google is full of requests from hundreds, maybe thousands, of users asking for a doc2pdf converter or something like it. I need it too. It is useful to have all your files in PDF format (and maybe merged into a single file), and if you have a lot of files to convert by hand, believe me, you’re not going to have a nice day.

The easy way

It is pretty easy:

$ abiword --to=pdf filename.doc

There isn’t much to explain here. It converts filename.doc to filename.pdf and saves it in the current directory. That was too easy. Why would you need a hard way? I don’t know about you, but I’m sure I need one. Unfortunately AbiWord’s support for Microsoft .doc files is not so good: it lacks the math and image/clipart features. I’m not sure this affects all versions of AbiWord, but it is certainly true of the one packaged for Ubuntu (which doesn’t actually come preinstalled; you have to apt-get install it).
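If you have a whole directory to convert, the same command can be looped from a script. Here is a minimal Python sketch of that idea. The function names are made up for illustration, and it assumes abiword is on your PATH and accepts exactly the --to=pdf option shown above.

```python
import subprocess
from pathlib import Path


def abiword_command(doc_path):
    """Build the command line from the post: abiword --to=pdf filename.doc"""
    return ["abiword", "--to=pdf", str(doc_path)]


def find_docs(directory="."):
    """Return the .doc files found directly in a directory, sorted by name."""
    return sorted(Path(directory).glob("*.doc"))


def convert_all(directory="."):
    """Run abiword once per .doc file and return the files that failed."""
    failed = []
    for doc in find_docs(directory):
        # Passing a list (not a string) keeps file names with spaces intact.
        result = subprocess.run(abiword_command(doc))
        if result.returncode != 0:
            failed.append(doc)
    return failed
```

Calling convert_all() from the directory that holds your files runs the conversion once per .doc file and returns the ones for which abiword exited with a non-zero status. Whether abiword reliably signals failures that way is an assumption.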

Anyway, I really need to see plots and formulas. What did you say? OpenOffice supports them, check it out? Yes, I know: OpenOffice can almost always read plots and images in .doc files. But bad luck strikes again: OpenOffice lacks the kind of command line interface AbiWord has, so the only way is to open the .doc files one by one and click the Export as PDF button. It is very frustrating. So, here is the hard way.

The hard way

Short version (for those who don’t like reading me but still want to read a lot): check the Python-UNO site.

Long version: you need to know what Python-UNO is. Quoting its documentation:

The Python-UNO bridge allows to

  • use the standard OpenOffice.org API from the well known python scripting language.
  • to develop UNO components in python, thus python UNO components may be run within the OpenOffice.org process and can be called from Java, C++ or the built in StarBasic scripting language.
  • create and invoke scripts with the office scripting framework (OOo 2.0 and later).

You can find the most current version of this document from http://udk.openoffice.org/python/python-bridge.html

Oh no! I’ll have to download this Python-UNO, read manuals to learn how to use its API, and who knows if it’ll work… No. Just don’t panic. I’m going to tell you two things that make this a not-so-hard way:

  • Python-UNO has shipped with OpenOffice since version 1.1, so if you have OpenOffice installed you’re already halfway there: you don’t have to download or install anything.
  • The Python-UNO guys are so cool that their code examples include everything we need.

From the examples page you can download the ooextract.py script. Its usage is very simple; we need to run it like this:

$ openoffice -invisible "-accept=socket,host=localhost,port=2002;urp;"
$ python ooextract.py --pdf filename.doc
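To give an idea of what a script like this does on the other side of that socket, here is a rough sketch of a PDF export through Python-UNO. It is not the code of ooextract.py, just an illustration written under some assumptions: OpenOffice is already listening on port 2002 as started above, the uno module is importable from the Python you run it with, and the helper names (pdf_path_for, convert_with_uno) are invented for this example.

```python
import os


def pdf_path_for(doc_path):
    """Return the output path: same name as the input, .pdf extension."""
    base, _ext = os.path.splitext(doc_path)
    return base + ".pdf"


def convert_with_uno(doc_path, host="localhost", port=2002):
    """Ask an already running OpenOffice instance to export doc_path as PDF."""
    # Imported here because the uno module is only available in a Python
    # that ships with (or is configured for) OpenOffice.
    import uno
    from com.sun.star.beans import PropertyValue

    def prop(name, value):
        p = PropertyValue()
        p.Name = name
        p.Value = value
        return p

    # Connect to the office process listening on the socket.
    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(
        "uno:socket,host=%s,port=%d;urp;StarOffice.ComponentContext"
        % (host, port))
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)

    in_url = uno.systemPathToFileUrl(os.path.abspath(doc_path))
    out_url = uno.systemPathToFileUrl(os.path.abspath(pdf_path_for(doc_path)))

    # Load the document without showing a window, export it, then close it.
    doc = desktop.loadComponentFromURL(
        in_url, "_blank", 0, (prop("Hidden", True),))
    try:
        doc.storeToURL(out_url, (prop("FilterName", "writer_pdf_Export"),))
    finally:
        doc.dispose()
```

The interesting part is the FilterName property: writer_pdf_Export is the export filter that makes storeToURL write a PDF instead of another format.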

The result of this is almost the same as the easy way’s, but it uses OpenOffice for the conversion, so it handles the documents better. You may also like to write a little shell script to automate the conversion of a bunch of files, so here is a very simple version:

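#!/bin/bash

# Start OpenOffice in the background, listening on port 2002
openoffice -invisible "-accept=socket,host=localhost,port=2002;urp;" &
# Give it a few seconds to start accepting connections (adjust as needed)
sleep 5

for i in *.doc; do
	# Quote the name so files with spaces are handled correctly
	python ooextract.py --pdf "$i"
done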

Remember to kill OpenOffice when it’s done :o) OpenOffice now has batteries included.
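If you would rather not remember to kill OpenOffice yourself, the start/convert/stop cycle can be wrapped in a small Python script. This is only a sketch: the function names and the five-second startup wait are assumptions, and if the openoffice launcher hands off to a separate process on your system, terminating the launcher may not be enough to stop it.

```python
import subprocess
import time

# Same command as in the post: OpenOffice listening on port 2002.
OFFICE_CMD = ["openoffice", "-invisible",
              "-accept=socket,host=localhost,port=2002;urp;"]


def ooextract_command(doc_path, script="ooextract.py"):
    """Build the command line from the post for a single file."""
    return ["python", script, "--pdf", doc_path]


def convert_batch(doc_paths, startup_wait=5.0):
    """Start OpenOffice, convert each file, then stop OpenOffice again.

    Returns the list of files whose conversion exited with an error.
    """
    if not doc_paths:
        return []  # nothing to do, so don't start OpenOffice at all

    office = subprocess.Popen(OFFICE_CMD)
    failed = []
    try:
        # Assumed delay: give OpenOffice time to open its socket.
        time.sleep(startup_wait)
        for doc in doc_paths:
            if subprocess.call(ooextract_command(doc)) != 0:
                failed.append(doc)
    finally:
        # Stop the process we started, even if a conversion blew up.
        office.terminate()
        office.wait()
    return failed
```

Something like convert_batch(sorted(glob.glob("*.doc"))), after an import glob, would then do the same job as the shell script and clean up after itself.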


§ 2 Responses to "How to convert Microsoft Word .doc files to PDF from command line"

  • Josir Gomes says:

    Thanks for the post Andrea. Some considerations:

    – abiword generates a formatted PDF that is close to the DOC formatting. Of course, if you don’t have the same fonts on the machine running abiword, the output will never be the same.

    – on the other hand, ooextract.py just prints the bare-bones text inside the doc.

    Since the post is very old, could you share with us whether you have found a better way to convert DOC to PDF on Linux using Open/LibreOffice?

  • Ian says:

    How would I do this in windows?
