Procedural City Generation.

Extract diagrams from PDF: pdf2svg

PDFs are the primary format for research papers and are used for many other publications as well. With the aid of a few tools, it is possible to extract their diagrams in a friendly, usable format.

Why bother? It's easy to take a screenshot of a PDF diagram, but there are problems:

  1. Resampling the bitmaps in a diagram is very lossy, and file size bloats considerably when oversampling.
  2. Images do not scale gracefully, nor do they provide the detail and clarity of the vector diagrams in PDFs.
  3. Using images in place of vector diagrams bloats file size.

To extract the full detail of diagrams in PDF files, they need to be converted into an equivalent format that preserves both the vector and the bitmap information. An obvious choice is the open SVG (Scalable Vector Graphics) format, which is supported by a wide range of tools and is designed to work on the web.

Initially, a fine online service was available that converted entire PDFs on demand; unfortunately, at the time of writing, this service is no longer available.

The tools required to extract PDF diagrams to SVG on your machine are:

  1. pstoedit
  2. Ghostscript (gs), which pstoedit uses to read PDF and PostScript input
  3. Inkscape to view and edit *.svg files.
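Before running any conversions, it can help to confirm that these tools are actually on your PATH. The script below is a minimal sketch, assuming the usual executable names pstoedit, gs and inkscape; the helper name missing_tools is hypothetical and only used for illustration.

```bash
#!/usr/bin/env bash
# Sketch: report which of the required tools are not on PATH.
# Assumes the usual executable names: pstoedit, gs (Ghostscript) and inkscape.

# Print each argument that cannot be found as a command, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools pstoedit gs inkscape)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so each tool name is printed on its own line
    printf 'Not found: %s\n' $missing >&2
else
    echo "pstoedit, gs and inkscape are all installed."
fi
```

Only pstoedit and gs are needed for the conversion itself; Inkscape is just for viewing and editing the results.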

The commands I use are:

pstoedit -f svg input_file.pdf output_file.svg
pstoedit -f svg -page 1 input_file.pdf output_file.svg
pstoedit -f plot-svg -page 1 input_file.pdf output_file.svg

Note: the -f switch specifies the output format and -page selects which single page to convert. The 'svg' format extracts all bitmaps but has some limitations, mainly that it messes up the colours; 'plot-svg' works well but doesn't extract all the images.
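When a paper has diagrams on several pages, the per-page command can be repeated in a loop. The following is a sketch, assuming pstoedit and Ghostscript are installed and that one SVG file per page is what you want; svg_name and pdf_pages_to_svg are hypothetical helper names, not part of pstoedit, and the output naming scheme is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Sketch: convert a range of pages of one PDF into one SVG file per page
# by calling pstoedit once per page with -page, as in the commands above.
# svg_name and pdf_pages_to_svg are hypothetical helpers for illustration.

# Print the output file name for INPUT, PAGE and FORMAT,
# e.g. "paper.pdf 3 plot-svg" -> "paper-p3-plot-svg.svg".
svg_name() {
    local base
    base=$(basename "$1" .pdf)
    printf '%s-p%s-%s.svg\n' "$base" "$2" "$3"
}

# Usage: pdf_pages_to_svg INPUT FIRST LAST [FORMAT]
# FORMAT defaults to plot-svg; pass svg to keep all bitmaps (colours may suffer).
pdf_pages_to_svg() {
    local input=$1 first=$2 last=$3 format=${4:-plot-svg}
    local page out
    for ((page = first; page <= last; page++)); do
        out=$(svg_name "$input" "$page" "$format")
        # -f selects the output driver, -page selects a single page
        pstoedit -f "$format" -page "$page" "$input" "$out" || return 1
    done
}
```

For example, running pdf_pages_to_svg paper.pdf 1 5 would write paper-p1-plot-svg.svg through paper-p5-plot-svg.svg into the current directory, stopping at the first page pstoedit fails on. Including the format in the file name makes it easy to run both 'svg' and 'plot-svg' on the same pages and compare the results.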

With your diagrams in lovely SVG format, you can import them into your word processor, keeping file sizes small and detail high. For OpenOffice I use an SVG import plugin.