Tuesday, April 19, 2011

Merging Multiple PDF Files

If you have taken one of the RTEMS Classes from me, you will remember that the material for the Open Class comprises over 1000 PowerPoint slides. [1]  These slides are broken down into sections and within each section, there is a unit of 20-100 slides.  Each unit is an individual file.  Getting from 50+ PowerPoint files to printed material by hand is a tedious and error-prone process.  The class and this process have evolved over the past ten years.  In this post, I will provide some insight into how this is done.

The first piece of magic is an MS-Office macro written by someone here at OAR.  It reads in a list of files from a text file.  The files are listed in the order they are to be printed.  This macro automates either generating PDFs or directly printing the files in the various handout formats (1 per page, 3 per page, 6 per page, etc.).  The PDFs are generated using PDFCreator, which makes it possible to specify a unique file name for each PDF file.  Each PDF file name is prefixed with a number so the files sort and print in the correct order when wild-carded.  This produces files like this:

001-OpenClass.pdf
002-IntroToRTEMS.pdf
003-ProfilesAndRTEMS.pdf

...
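
The numbering itself is done by the Office macro, which is not shown here.  As a rough illustration of the naming scheme only, the following shell sketch produces the same style of zero-padded names from an ordered list.  The helper name number_names is made up for this example and is not part of the macro or the merge script.

```shell
#!/bin/sh
# Sketch only: number_names is a hypothetical helper, not the Office macro.
# It reads base names, one per line, on stdin and prints NNN-name.pdf for
# each, counting from 001 so a shell wildcard returns them in list order.
number_names()
{
  n=1
  while IFS= read -r name ; do
    [ -n "${name}" ] || continue            # skip blank lines
    printf '%03d-%s.pdf\n' "${n}" "${name}"
    n=$((n + 1))
  done
}

# Example: the three names shown above, in print order.
printf '%s\n' OpenClass IntroToRTEMS ProfilesAndRTEMS | number_names
```

The zero padding matters: without it, a name starting with 10 would sort before one starting with 2 when the files are wild-carded.
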
Once the PDF files are generated, they can be printed easily.  However, I sometimes teach the class in Munich and have to send the PDFs to the nice folks at embedded brains GmbH to print there.  For the first few classes there, I sent them a large number of PDFs.  When someone dropped the master copy, we learned it didn't have page numbers.  This taught us to add page numbers. :-D

But this still leaves us with a large number of PDFs.  The solution to this was a custom shell script that merges them into proper double-sided "units".  Each unit is then a single PDF file which goes between divider tabs in a binder.  Now there are seven PDF files for the Open Class and each page is numbered.  Much safer and easier.

The script to merge the PDF files was developed and executes on GNU/Linux (no surprise, right?).  The key to this program is this shell function:

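merge_them()
{
  outf=$1    # first argument: name of the merged output PDF
  shift      # remaining arguments: input PDFs, in merge order
  # "$@" keeps each input file name intact even if it contains spaces
  gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE="${outf}" -dBATCH "$@"
}
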
This function takes the name of the output file as the first argument and the set of PDF files to merge as the rest of the arguments.  When invoked, the command looks something like this in my shell script:

merge_them ${mergedir}/01-Intro.pdf 00[1-5]*.pdf
That takes the first five "section" PDF files and merges them to produce the PDF file named 01-Intro.pdf for the Introduction to RTEMS "unit".  This file is placed in the output directory ${mergedir}.  This is repeated for each of the units in the class.
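
To see which files a pattern like 00[1-5]*.pdf picks up without running gs, something along these lines can help.  This is only a sketch: select_sections is a made-up helper name, and the placeholder files are empty stand-ins rather than real class PDFs.

```shell
#!/bin/sh
# Sketch only: select_sections is a hypothetical helper, not part of the
# original script.  It prints the files in directory $1 that match the
# 00[1-5]*.pdf pattern, one per line, in the order the shell expands them.
select_sections()
{
  ( cd "$1" && printf '%s\n' 00[1-5]*.pdf )
}

# Demonstration with empty placeholder files in a scratch directory.
demo=$(mktemp -d)
for n in 1 2 3 4 5 6 7 ; do
  : > "${demo}/00${n}-Section${n}.pdf"
done
select_sections "${demo}"    # lists 001 through 005; 006 and 007 are skipped
rm -rf "${demo}"
```

The shell expands the pattern in sorted order, which is why the numeric prefixes control the order of the pages in the merged unit.
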

But remember -- I want to produce a double-sided master copy.  Sometimes, the merged PDF file for a unit will have an odd number of pages.  The script has another section to detect merged PDFs with an odd number of pages and add a page that says "Intentionally Blank". [2]  The following fragment of the shell script determines how many pages are in the PDF file.  If the number of pages is odd, it then adds the Intentionally Blank PDF file.

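# $1 is the merged PDF for one unit; pdfinfo reports its page count
pages=`pdfinfo "$1" | grep '^Pages:' | cut -d':' -f2`
remainder=`expr ${pages} % 2`

# An odd page count means the unit would end on a front side,
# so append the blank page to keep double-sided printing aligned
if [ "${remainder}" = 1 ] ; then
   mv "$1" XXX.pdf
   merge_them "$1" XXX.pdf "${BLANKPDF}"
   rm -f XXX.pdf
fi
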
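
For anyone who wants to try the odd-page check without having pdfinfo or a real PDF handy, the logic can be split into two small pieces and fed canned text.  This is a sketch under assumptions: pages_from_info and is_odd are names invented for this example, and the sample text only imitates the "Pages:" line that pdfinfo prints.

```shell
#!/bin/sh
# Sketch only: pages_from_info and is_odd are hypothetical helper names.

# Read pdfinfo-style "Key: value" text on stdin and print the Pages value.
pages_from_info()
{
  awk '/^Pages:/ { print $2 ; exit }'
}

# Succeed (exit status 0) when the given page count is odd.
is_odd()
{
  [ $(( $1 % 2 )) -eq 1 ]
}

# Example with canned text standing in for real pdfinfo output.
pages=$(printf 'Title:  Intro\nPages:          7\n' | pages_from_info)
if is_odd "${pages}" ; then
  echo "${pages} pages: append the blank page"
fi
```
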
And that's it.  It only takes about a minute to run and produces double-sided files that are very easy to send to a printer.  We have a nice duplex printer and by using paper that is already 3-hole punched, constructing the material for the RTEMS classes is much simpler than it was 10 years ago.

--joel

[1] OpenOffice did not exist when the slides were created.  I have tried to use OpenOffice with them, but it butchers the slides and destroys them.  If this is ever resolved, I will happily use OpenOffice for the class.

[2] The "Intentionally Blank" page was generated in OpenOffice. :D
