How to convert Microsoft Word .doc files to PDF from the command line

January 14th, 2009

I know a lot of people need this: Google is full of requests from hundreds, maybe thousands, of users asking for a doc2pdf converter or something like it. I need it too. It is useful to have all your files in PDF format (and maybe merged into a single file), and if you have a lot of files to convert by hand, believe me, you're not going to have a nice day.

The easy way

It is pretty easy:

$ abiword --to=pdf filename.doc

There isn't much to explain here. It converts filename.doc to filename.pdf and saves it in the current directory. That was too easy. Why would you need a hard way? I don't know about you, but I'm sure I need one. Unfortunately AbiWord's support for Microsoft .doc files is not that good: it lacks the math and image/clipart features. I'm not sure whether this affects all versions of AbiWord, but it certainly affects the one packaged for Ubuntu (which isn't installed by default, you have to apt-get install it).

Anyway, I really need to see plots and formulas. What did you say? OpenOffice supports them, check it out? Yes, I know: OpenOffice can almost always read plots and images in .doc files. Bad luck strikes again, though: OpenOffice lacks the kind of command line interface AbiWord has, so the only way is to open the .doc files one by one and click the Export as PDF button. It is very frustrating. So, here is the hard way.
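If you have a whole directory to convert, the command above is easy to loop over. Here is a minimal Python 3 sketch, assuming abiword is on your PATH. The helper names abiword_command and convert_all are made up for this example; they are not part of AbiWord.

```python
import glob
import subprocess

def abiword_command(doc_path):
    """Build the same command line shown above for one file."""
    return ["abiword", "--to=pdf", doc_path]

def convert_all(pattern="*.doc", run=subprocess.call):
    """Run abiword once per matching file and return the files it was run on.

    `run` is a parameter only so that the loop can be tried without AbiWord
    installed (an assumption of this sketch, not an AbiWord feature).
    """
    done = []
    for doc_path in sorted(glob.glob(pattern)):
        run(abiword_command(doc_path))
        done.append(doc_path)
    return done
```

Calling convert_all() in a directory full of .doc files should leave one PDF next to each of them. Passing the file name as a separate list item means names with spaces need no quoting.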

The hard way

Short version (for those who don't like reading me but still want to read a lot): check the Python-UNO site.

Long version: first you need to know what Python-UNO is.

The Python-UNO bridge allows to

  • use the standard OpenOffice.org API from the well known python scripting language.
  • to develop UNO components in python, thus python UNO components may be run within the OpenOffice.org process and can be called from Java, C++ or the built in StarBasic scripting language.
  • create and invoke scripts with the office scripting framework (OOo 2.0 and later).

You can find the most current version of this document from http://udk.openoffice.org/python/python-bridge.html

Oh no! I'll have to download this Python-UNO, read manuals to learn how to use those APIs, and who knows if it'll work... No. Just don't panic. I'm going to tell you two things that will make this a not-so-hard way:

  • Python-UNO has shipped with OpenOffice since version 1.1, so if you have OpenOffice installed you're already halfway there. You don't have to download and install anything.
  • The Python-UNO guys are so cool that their code examples contain everything we need.

From the examples page you can download the ooextract.py script. Its usage is very simple; we need to run it this way:

$ openoffice -invisible "-accept=socket,host=localhost,port=2002;urp;" &
$ python ooextract.py --pdf filename.doc

The result is almost the same as with the easy way, but this uses OpenOffice for the conversion, so it does a better job. You may also want to write a little shell script to automate the conversion of a bunch of files, so here is a very simple version:

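#!/bin/bash

# Start OpenOffice in the background so the script can go on
openoffice -invisible "-accept=socket,host=localhost,port=2002;urp;" &
sleep 5 # give it a few seconds to start listening (adjust as needed)
for i in *.doc; do
	python ooextract.py --pdf "$i" # quoted, so names with spaces work
done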

Remember to kill OpenOffice when it finishes :o) OpenOffice now has batteries included.

GCJ: Always Turn Left

July 16th, 2008

Hi there :^)

Let's continue with the GCJ "Practice Problems"; the next one is a problem about perfect mazes.

This time the source code will be quite long, sorry :^D. I'm currently having some problems with WordPress and UTF-8 (if you know a quick fix, please leave a comment), so the complete source code is available here.

We have to map the maze: square by square, we need to know where you can move starting from that square. We use a code to do this.

Idea #1
Look at the map of "cases". Choose one of them, any of them. For short, call North N, South S, and so on. Now let (for example) E = 1 if you can move East, else 0, do the same for the other directions, and write the results in the order EWSN. It's binary, yes. Got it? Try converting it to hex and compare the result with the problem's case codes ;^)

I'll import my alien-numbers solution to do the conversion (I know it's simple and there are other ways, but you'll see what I used it for when debugging :^D)

I'm going to comment on my implementation part by part. Let's start.
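The EWSN idea can be checked in a couple of lines. This is only a sketch for Python 3, and case_code is a name made up for this example; the post's own code does the conversion through the alien-numbers solver instead.

```python
def case_code(e, w, s, n):
    """Pack the four open/closed flags as the bits EWSN and return one hex digit."""
    value = (e << 3) | (w << 2) | (s << 1) | n
    return format(value, "x")

# A square open to the East and to the North: binary 1001
print(case_code(True, False, False, True))  # 9
```

Using bit shifts instead of building a string of 0s and 1s is just a shortcut; both give the same digit.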

Importing

from __future__ import with_statement
from copy import deepcopy
from aliensys import solve # the code below calls solve() directly

The with statement is for the input and output files (I didn't show that in the previous post).

deepcopy is very useful. Remember that in Python names are just references. I had serious problems debugging my code because of this!
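To see why names being references matters, here is a small sketch. Point is a hypothetical stand-in for a maze square, not a class from the solution.

```python
from copy import deepcopy

class Point(object):  # hypothetical stand-in for a maze square
    def __init__(self, x, y):
        self.x = x
        self.y = y

a = Point(0, 0)
alias = a            # a second name for the same object
clone = deepcopy(a)  # an independent copy

a.x = 5
print(alias.x)  # 5: the alias sees the change
print(clone.x)  # 0: the deep copy does not
```

This is the trap when a square is appended to a list and then moved: without a copy, the square stored in the list moves too.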

aliensys… well you can now imagine why I’m using this… but that’s not the only reason :^D

class Mode

class mode(object):
    def code(self):
        HEX = '0123456789abcdef'
        BIN = '01'
        number = ''
        for i in (self.e,self.w,self.s,self.n):
            number += str(1*i)
        return solve(number + ' ' + BIN + ' ' + HEX)
 
    def __init__(self, e = False, w = False, s = False, n = False):
        self.e = e
        self.w = w
        self.s = s
        self.n = n
    def change(self, k, val):
        if k == 'e':
            self.e = val
        elif k == 'w':
            self.w = val
        elif k == 'n':
            self.n = val
        elif k == 's':
            self.s = val
 
    def draw(self):
        MAZE = u'.???????????????'
        HEX =   '0123456789abcdef'
        return solve(self.code()+' '+HEX+' '+MAZE)
    def __str__(self):
        try:
            return self.code()
        except AttributeError:
            return 'tmpNone'
    def __repr__(self):
        try:
            return str(self.code())
        except AttributeError:
            return 'tmpNone'

__init__ just takes EWSN and saves them.
code returns the case code built from EWSN.

Now comes one of the interesting parts :^)
Note that I pasted it as is; sorry for the question marks, it's a WordPress problem with UTF-8. It should be clear in my perfect maze solution file, check it.

Debugging this was not simple. I defined __str__ and __repr__ because the IDLE debugger uses them, but that was not enough. I looked for UTF-8 symbols that would do the trick; I can't show them in this post (damned WordPress), but some aliens could use them as a numeral system... and I have a converter to read that :^D

I can draw a “mode” of a square of the maze…

class Square

class sqr(object):
    def __init__(self, x,y,m):
        self.x = x
        self.y = y
        self.m = m
 
    def move(self, d, update=True):
        if d == 'e':
            self.x+=1
            if update: self.m.w=True
        elif d == 'w':
            self.x-=1
            if update: self.m.e=True
        elif d == 'n':
            self.y-=1
            if update: self.m.s=True
        elif d == 's':
            self.y+=1
            if update: self.m.n=True
 
    def __eq__(self, x):
        return ((self.x == x.x) and (self.y == x.y))
    def __cmp__(self, x):
        if ((self.x == x.x) and (self.y == x.y)):
            return 0
        elif self.y==x.y:
            return cmp(self.x,x.x)
        else:
            return cmp(self.y,x.y)

We have to know the position of a square (x and y) and its mode.
We can move a square (E, W, N or S), and from the new position it can, if wanted, go back to the old one (the update part).

__cmp__ and __eq__ are used in the Maze class because I need to sort squares. In a sorted list, the square with the smallest y and the smallest x comes first, then the ones with the same y and a greater x, then those with a greater y starting again from the smallest x... and so on. I think it will be clearer in the next class...
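A note for anyone trying this today: __cmp__ and cmp() only exist in Python 2. In Python 3 the same "rows first, then columns" order can be had with a sort key. A minimal sketch, where Sq is a hypothetical, reduced version of sqr:

```python
class Sq(object):  # hypothetical, reduced version of sqr
    def __init__(self, x, y):
        self.x = x
        self.y = y

squares = [Sq(1, 1), Sq(0, 1), Sq(2, 0), Sq(0, 0)]

# Sort by y first, then by x: top row left to right, then the next row...
ordered = sorted(squares, key=lambda q: (q.y, q.x))
order = [(q.x, q.y) for q in ordered]
print(order)  # [(0, 0), (2, 0), (0, 1), (1, 1)]
```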

ls2str

def ls2str(ls):
    r = ''
    for i in ls:
        r+=str(i)
    return r

That's a simple function that converts a list to a string... No, str(ls) won't do: it is just like print ls, and that's not what we need ;^)
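The difference between the two is easy to see side by side. This sketch uses str.join, which is a shorter way to get the same result as the loop above:

```python
def ls2str(ls):
    """Concatenate the string form of every item, with nothing in between."""
    return "".join(str(i) for i in ls)

joined = ls2str([1, "a", 0])
printed = str([1, "a", 0])
print(joined)   # 1a0
print(printed)  # [1, 'a', 0]
```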

class Maze
Now the big part :^D I'll slice it into several parts because it is very long (120 lines), and they are not in the same order as in the real file. Anyway, I'm trying to keep the line numbers so that you can check ;^)

__init__

(lines 193-198)
    def __init__(self, i2o, o2i):
        self.i2o = list(i2o) # in 2 out
        self.o2i = list(o2i) # out 2 in
        self.head = 's'
        self.sqrs = []
        self.init=False

The problem gives us two paths as strings: the first goes from the entrance to the exit (i2o) and the second from the exit back to the entrance (o2i). Our head is pointing South, and we know no squares yet because we haven't started ;^)

have_sqr

(lines 89-95)
    def have_sqr(self, s):
        check = False
        for i in self.sqrs:
            if s == i:
                check = True
                break
        return check

It's simple: if there is already a square with the same x and y it returns True, else False... it will be useful :^D
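Since the squares define __eq__, the same check can lean on Python's own membership test. A sketch under that assumption, with Sq as a hypothetical reduced square and have_sqr written as a plain function instead of a method:

```python
class Sq(object):  # hypothetical, reduced version of sqr
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

def have_sqr(squares, s):
    """True if a square with the same x and y is already in the list."""
    return any(s == i for i in squares)

squares = [Sq(0, 0), Sq(1, 0)]
print(have_sqr(squares, Sq(1, 0)))  # True
print(have_sqr(squares, Sq(5, 5)))  # False
print(Sq(1, 0) in squares)          # True, "in" uses __eq__ as well
```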

turn

(lines 96-108)
    def turn(self, direction):
        if direction == 'R':
            for i in zip(list('ewsn'), list('snwe')):
                if i[0] == self.head:
                    self.head = i[1]
                    return True
        elif direction == 'L':
            for i in zip(list('ewsn'), list('nsew')):
                if i[0] == self.head:
                    self.head = i[1]
                    return True
        else:
            return False

When you are walking through the maze, turning doesn't change your square. turn returns True if you turned (and changes the heading, read those lines carefully), else it returns False.

walk
That's one of the longest parts. We want to walk through the maze following a path...
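The two zip() pairs above are really lookup tables, so the same turning rule can be sketched with dictionaries. This is a stand-alone variation, not the method from the class: it returns the new heading instead of True, and None instead of False.

```python
# Same pairs as in the method: heading -> heading after the turn
RIGHT = dict(zip("ewsn", "snwe"))
LEFT = dict(zip("ewsn", "nsew"))

def turn(head, direction):
    """Return the new heading, or None if direction is not a turn."""
    if direction == "R":
        return RIGHT[head]
    if direction == "L":
        return LEFT[head]
    return None

print(turn("s", "R"))  # w
print(turn("s", "L"))  # e
print(turn("s", "W"))  # None: 'W' means walk, not turn
```

Remember that y grows going South here, so turning right while facing South points you West.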

    def walk(self, path, reverse=False):
        pre = False
        for i in path:
            if pre:
                pass
            elif reverse:
                pre = deepcopy(self.sqrs[-1])
            else:
                pre = sqr(0,-1,mode())
 
            if self.turn(i) == False:
                if self.have_sqr(pre) == False:
                    self.sqrs.append(pre)
                index = self.sqrs.index(pre)
                self.sqrs[index].m.change(self.head, True)
                new = deepcopy(pre)
                new.m = mode() # resetting modes
                new.move(self.head)
                pre = deepcopy(new) # remember to update pre
        if self.have_sqr(new) == False: # the last ;)
            self.sqrs.append(deepcopy(new))
            if reverse == False: # Changing direction...
                self.last = self.sqrs[-1]
                for dirs in zip(list('ewns'), list('wesn')):
                    if self.head == dirs[0]:
                        self.head = dirs[1]
                        break

It's simple. Don't panic :^)
We start from a square and the path tells us what to do. When it says to turn, we turn; when it says to go, we move to a new square... but are you sure it is new? Check: if we don't have it yet, add it, otherwise update its modes.

If B is below A and you can go from A to B you can also go from B to A, so A has S = 1 and B has N = 1 ;^)
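The "both sides of the door" rule can be sketched on its own. This version swaps the post's classes for a plain dictionary of sets, so openings, OPPOSITE, STEP and open_passage are all names made up for this example.

```python
OPPOSITE = {"n": "s", "s": "n", "e": "w", "w": "e"}
STEP = {"n": (0, -1), "s": (0, 1), "e": (1, 0), "w": (-1, 0)}  # y grows going South

def open_passage(openings, pos, heading):
    """Record a step from pos in the given heading and return the new position."""
    dx, dy = STEP[heading]
    new_pos = (pos[0] + dx, pos[1] + dy)
    openings.setdefault(pos, set()).add(heading)                # A can go to B
    openings.setdefault(new_pos, set()).add(OPPOSITE[heading])  # so B can go to A
    return new_pos

openings = {}
b = open_passage(openings, (0, 0), "s")
print(b)                 # (0, 1)
print(openings[(0, 0)])  # {'s'}
print(openings[(0, 1)])  # {'n'}
```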

walkall

(lines 138-144)
    def walkall(self):
        if self.init:
            pass
        else:
            self.walk(self.i2o)
            self.walk(self.o2i, True)
            self.init = True

To get a clear and correct map we need to walk both i2o and o2i, that is, walk through all the paths...

getlist

(lines 146-178)
    def getlist(self):
        self.walkall()
        m = deepcopy(self.sqrs)
        for i in m:
            if i == self.last:
                index = m.index(i)
                del m[index]
        xs = [i.x for i in m]
        xmin = min(xs)
        xm = max(xs)
        ys = [i.y for i in m]
        ym = max(ys)
 
        m.sort()
        r = []
        a = []
 
        for y in xrange(0,ym+1):
            a = []
            for x in xrange(xmin,xm+1):
                c = sqr(x,y,mode())
                if 1 == 0:
                    pass
                else:
                    if c in m:
                        i = m.index(c)
                        a.append(m[i].m)
                        del m[i]
                    elif xmin<x<xm:
                        a.append('0')
            r.append(ls2str(a))
 
        return r

The requested output is a list of lines, each containing the modes of the squares in the corresponding row of the maze. We need to sort self.sqrs, which is why I defined the __cmp__ method for a square.

We also have to delete the starting and ending points from the map. This method is too long for my taste; I think there is a better way to do this... but it was the first way that came to mind...

draw

(lines 180-191)
    def draw(self):
        MAZE = u'.???????????????'
        HEX =   '0123456789abcdef'
        maxs = []
        p=[]
        for i in self.getlist():
            l = solve(i+' '+HEX+' '+MAZE)
            maxs.append(len(l))
            p.append(l)
        m = max(maxs)
        for i in p:
            print i.center(m)

My draw idea was cool enough that I forgot I had already implemented it in the mode class... I did it again! o_O Don't ask me why, maybe I was drunk when I wrote that :^O

Anyway… we can draw a maze now :^) really :^D

File eater
Final part: we have an input file and we have to produce an output one...
eatFile will do the trick

(lines 209-228)
def eatFile(path_i, path_o, show=True, draw=False):
    with file(path_i, 'r') as f_in:
        lines = f_in.readlines()
        n = int(lines[0].replace('\n', ''))
        del lines[0]
        with file(path_o, 'w') as f_out:
            for i in xrange(0,n):
                s = lines[i].replace('\n', '').split(' ')
                m = maze(s[0],s[1])
                if show:
                    print 'Case #%d:' % (i+1)
                if draw:
                    m.draw()
                    print '\n\n'
 
                # the "Case #x:" header belongs in the output file even without drawing
                f_out.write('Case #%d:\n' % (i+1))
                for j in m.getlist():
                    f_out.write(j+'\n')
                del m

Working with text files is painless with Python. Using the with statement I don't have to close the files myself, it's cool :^D

The mechanism is simple:

  • Get the number of cases
  • For each line of the input file:
    1. Create a maze
    2. Print and draw if you have to
    3. Write to the output file
  • Finish :^D
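The steps above can be sketched in Python 3 without the maze itself. process and solve_case are hypothetical names: solve_case stands for "build the maze and return its rows", and in-memory files are used here only so that the example runs anywhere.

```python
import io

def process(f_in, f_out, solve_case):
    """Read N, then N cases; write a 'Case #i:' header and the rows of each one."""
    n = int(f_in.readline())
    for i in range(1, n + 1):
        i2o, o2i = f_in.readline().split()
        f_out.write("Case #%d:\n" % i)
        for row in solve_case(i2o, o2i):
            f_out.write(row + "\n")

# In-memory files and a dummy solver that just echoes the two paths
src = io.StringIO("1\nWW WW\n")
dst = io.StringIO()
process(src, dst, lambda i2o, o2i: [i2o, o2i])
result = dst.getvalue()
print(result)
```

With real files the call would sit inside "with open(path_i) as f_in, open(path_o, 'w') as f_out:", which closes both files at the end of the block.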

and now the operative part….

(lines 237-252)
# Solving the question... (with draws :D)
import time
start = time.time()
before = time.time()
path1 = 'B2-small.in'
path2 = 'B2-small.out.txt'
eatFile(path1,path2,True,True)
print 'Small in',time.time() - before,'secs'
 
 
before = time.time()
path1 = 'B2-large.in'
path2 = 'B2-large.out.txt'
eatFile(path1,path2,True,True)
print 'Large in',time.time() - before,'secs'
print 'Both in',time.time() - start,'secs'

Lines 243 and 250 (the two eatFile calls) are enough, but I wanted to measure the time it took to solve the problem :^D
On my PC the final lines were:
Small in 9.17199993134 secs
Large in 247.765999794 secs
Both in 256.983999968 secs

But I had too many things open while testing, so that's not a valid test. Anyway, try it on your computer and tell me your time ;^)

Conclusions
There are a few notes I'd like to leave.
I spent about 2 hours solving this, and it is not optimized (I wrote a change method for mode but I don't use it in sqr where I should, and there are lots of little things like that). I was coding with the future in mind, but in the future I forgot about the past and... it's, maybe, TOO object oriented... I'm too slow for this competition, I guess... I hope someone will find this useful; my code needs to be shorter!

Thanks for reading

return ‘Bye’

GCJ: Alien Numbers

July 10th, 2008

These days I've started studying Python. It's a very cool language and I have lots of things to learn, so I need exercise. What better than Google Code Jam for practice?

I'll share my solution to the first of the "Practice Problems" with you; it's about aliens' numeral systems :^) and I have, obviously, written my solution in Python ;^)

I'm posting it to discuss it with you; maybe I'll do the same for the others when I have the time. It isn't meant to be a spoiler, so if you want to solve this on your own just go and code; when you've finished you can come back, comment on my code and tell me if you wrote a better one :^)

Well, the text of the problem is:

Problem

The decimal numeral system is composed of ten digits, which we represent as “0123456789” (the digits in a system are written from lowest to highest). Imagine you have discovered an alien numeral system composed of some number of digits, which may or may not be the same as those used in decimal. For example, if the alien numeral system were represented as “oF8”, then the numbers one through ten would be (F, 8, Fo, FF, F8, 8o, 8F, 88, Foo, FoF). We would like to be able to work with numbers in arbitrary alien systems. More generally, we want to be able to convert an arbitrary number that’s written in one alien system into a second alien system.

Input

The first line of input gives the number of cases, N. N test cases follow. Each case is a line formatted as

alien_number source_language target_language

There are some other things, if you want you can read the whole text here.

OK? That shouldn't be too difficult. I'm a human, a coder and an engineering student; I hear aliens speaking every day, that's fine. The problem is speaking with them :^)

We're going to:

  • Get the number we want to convert from system1 to system2
  • Convert that number from system1 to decimal; we're human and used to seeing decimal numbers :^)
  • Now that we know what number it is, convert the decimal number to system2

Well, try to think about how we convert from binary to decimal and vice versa, or from hex to decimal and vice versa...

Here is my implementation:

class aliensys(object):
    def __init__(self, stringa):
        self.symbols = list(stringa)
        self.N = len(self.symbols)
    def a2d(self, n):
        "Convert n from alien to decimal"
        s = 0
        n = list(n)
        n.reverse()
        for i in xrange(0, len(n)):
            s += self.symbols.index(n[i])*self.N**i
        return s
    def d2a(self, n):
        "Convert n from decimal to alien"
        if n == 0: # zero is the first symbol, not an empty string
            return self.symbols[0]
        s = ''
        while n>=1:
            r = n%self.N
            n//=self.N # integer division
            s = self.symbols[r]+s
        return s
    def convert(self, n, target): # target must be an aliensys
        to = self.a2d(n)
        return target.d2a(to)
 
 
def solve(string):
    s = string.replace('\n', '')
    s = s.replace('\r', '')
    n, src, tgt = s.split(' ') # number, source, target
    src = aliensys(src)
    tgt = aliensys(tgt)
    return src.convert(n,tgt)

And it’s so simple to use:

>>> solve('CODE O!CDE? A?JM!.')
'JAM!'
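For comparison, the same three steps (read, go to decimal, go to the target system) can be sketched as plain functions for Python 3. The names to_decimal, from_decimal and convert are made up for this example; the logic is the usual positional conversion, the one used for binary and hex.

```python
def to_decimal(number, symbols):
    """Read `number` written with `symbols` (lowest digit first in `symbols`)."""
    value = 0
    for ch in number:
        value = value * len(symbols) + symbols.index(ch)
    return value

def from_decimal(value, symbols):
    """Write a non-negative integer using `symbols`."""
    if value == 0:
        return symbols[0]
    digits = []
    while value > 0:
        value, r = divmod(value, len(symbols))
        digits.append(symbols[r])
    return "".join(reversed(digits))

def convert(number, source, target):
    return from_decimal(to_decimal(number, source), target)

print(to_decimal("1011", "01"))                      # 11
print([from_decimal(i, "oF8") for i in range(1, 4)])  # ['F', '8', 'Fo']
print(convert("CODE", "O!CDE?", "A?JM!."))           # JAM!
```

The second line reproduces the start of the "oF8" sequence from the problem statement, and the last one matches the solve() call above.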

PS: did you see that? How WP-Syntax is highlighting "string" and "self" in my script? They're not keywords in Python (str is at least a built-in), so it shouldn't do that... I have to fix this :\

yield ‘Bye’