Re: Is there a utility to open a folder of many pdfs and determine if each one will open? (eom)

From: Posthumus, Etienne <E.Posthumus_at_nyob>
Date: Thu, 29 Jan 2009 12:32:07 +0100
To: CODE4LIB_at_LISTSERV.ND.EDU
Your question reminded me that I wanted to do something similar for a
pile of PDFs from our institutional repository. So I made a small script
in Python that can do this, using the Python module from:
http://pybrary.net/pyPdf/

Once the module is installed, put the following in a script (mind the indentation):
---8<------
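import os, sys
import pyPdf

# Directory to check: first command-line argument, or the current directory.
if len(sys.argv) > 1:
  PATH = sys.argv[1]
else:
  PATH = '.'

# Walk the whole tree below PATH and try to parse every file as a PDF.
for dirpath, dirnames, filenames in os.walk(PATH):
  for filename in filenames:
    # Build the path outside the try block so the error message can always use it.
    filename_path = os.path.join(dirpath, filename)
    try:
      pdf_file = open(filename_path, "rb")
      try:
        checked_file = pyPdf.PdfFileReader(pdf_file)
      finally:
        # Close each file explicitly; a large tree can hold many files.
        pdf_file.close()
    except Exception, e:
      # Anything that cannot be opened or parsed is reported on stderr.
      sys.stderr.write('%s :: %s\n' % (filename_path, e))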
---8<---------

If you run it without arguments, it starts from the current directory;
if you specify a path, it starts from there instead. Either way it walks
down the tree and checks each file found. If a file is not recognised as
a PDF, it barfs on stderr with the file's path and the error message.
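
For cases where pyPdf is not installed, here is a rougher sketch that uses
only the standard library. It is not the same check as the script above: it
does not parse anything, it only looks for the "%PDF-" marker near the start
of each file and the "%%EOF" marker near the end. A file can pass this and
still fail to open, so treat it as a first-pass filter at best. The function
names looks_like_pdf and find_suspect_files, and the 1024-byte window, are
made up for this example.

```python
import os


def looks_like_pdf(path, window=1024):
    """Cheap sanity check, NOT a full parse.

    Returns True if '%PDF-' appears in the first `window` bytes and
    '%%EOF' appears in the last `window` bytes of the file.
    """
    with open(path, "rb") as handle:
        head = handle.read(window)
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(0, size - window))
        tail = handle.read()
    return b"%PDF-" in head and b"%%EOF" in tail


def find_suspect_files(root):
    """Walk the tree below `root` and return paths that fail the check."""
    suspects = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                ok = looks_like_pdf(path)
            except OSError:
                ok = False  # unreadable files are treated as suspect
            if not ok:
                suspects.append(path)
    return suspects
```

Calling find_suspect_files('.') returns a list of the paths that failed, which
you can then print or write to stderr the same way the script above does.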

Have fun.

Etienne Posthumus
TU Delft Library   -  Digital Product Development
t: +31 (0) 15 27 81 949
m: e.posthumus_at_tudelft.nl
skype:  eposthumus
http://www.library.tudelft.nl/
Prometheusplein 1, 2628 ZC, Delft, Netherlands
Received on Thu Jan 29 2009 - 06:33:48 EST