February 14, 2025 | General, PDF, PDF optimization

Optimization of Existing PDF files: Methods



This article was originally a presentation delivered in French at PDF Day – France by Loïc Carrère, CEO of ORPALIS, in April 2019.
Organized by the PDF Association, PDF Days are the meeting place of the PDF industry, where experts give educational (non-commercial) presentations and lead panel and discussion-based sessions about the format.


The richness of the PDF format offers many opportunities to reduce the size of existing documents.

Organizations need to meet more and more legal requirements for archiving and data retention and often adopt a strategy to reduce the amount of storage used by their existing documents.

The PDF Optimization In-Depth Series


This series of articles will address the issues and constraints of such an approach, as well as various optimization methods that can be applied.

We will describe as many optimization techniques as possible, both lossless and lossy, which can be adapted to your requirements. We will illustrate them with case studies covering different kinds of documents (documents with vector content and documents containing only images).

The central question is therefore: is data loss acceptable when compressing? If so, to what extent?

Choosing Lossless or Lossy Compression

We will introduce several compression methods, some lossless and others that degrade the data. For this second category, you must decide in advance whether data loss is tolerable and, if so, to what extent.

When reducing the file size of PDF documents, images are the first logical candidates for compression. The reason is simple: an image can be compressed into an approximate representation of the original data without losing its meaning.

Did you know that applying 50% compression to a single image can decrease its file size by as much as 90%, depending on the image content?

Lossy Compression

Lossy compression often reduces the file size drastically, but at the expense of an irreversible loss of information. Some of the removed data is redundant. Some of it is not, but its absence is mostly unnoticeable to users in the result. Once lossy compression has been applied, there is no way back.

If you intend to process the image further, do not choose lossy compression.

On the other hand, lossy image compression has a clear advantage: it delivers the best compression ratios with good enough approximations.
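The trade-off described above can be illustrated with a minimal sketch. It assumes made-up grayscale data and uses a simple quantization step (a stand-in for any lossy operation, not a technique taken from the presentation); the `quantize` helper is hypothetical. Both versions are then compressed with zlib so that only the lossy step differs.

```python
import random
import zlib


def quantize(samples: bytes, levels: int) -> bytes:
    """Snap 8-bit samples onto `levels` evenly spaced values (the lossy step)."""
    step = 256 // levels
    return bytes((s // step) * step for s in samples)


# Made-up 8-bit grayscale data: 64 rows of a slightly noisy 256-pixel gradient.
random.seed(0)
original = bytes(
    min(255, max(0, x + random.randint(-3, 3)))
    for _ in range(64)
    for x in range(256)
)

degraded = quantize(original, levels=16)

# Flate (zlib) is applied to both versions so only the lossy step differs.
size_original = len(zlib.compress(original, 9))
size_degraded = len(zlib.compress(degraded, 9))

print(f"lossless only:   {size_original} bytes")
print(f"quantized first: {size_degraded} bytes")
print("original recoverable:", degraded == original)
```

The quantized data compresses much better, but nothing in `degraded` allows the original sample values to be rebuilt, which is exactly the "no way back" property.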

Lossy compression mostly involves reprocessing images.
For each image, you will need to decide whether you can:

  • Change its color depth, e.g., from 24-bit to 8-bit or 1-bit per pixel.
  • Downscale it.
  • Alter its pixels (noise suppression, trimming, MRC processing, etc.).
  • Re-encode it with a lossy compression algorithm (JPEG 2000, JBIG2).
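To get a feel for the first two options, the arithmetic can be sketched as follows. The page dimensions (roughly an A4 page scanned at 300 dpi) and the `raw_size_bytes` helper are assumptions chosen for illustration; the figures are uncompressed sizes, before any compression scheme is applied.

```python
def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a raster image, each row padded to a whole byte."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


# Hypothetical scan: roughly an A4 page at 300 dpi.
w, h = 2480, 3508

color = raw_size_bytes(w, h, 24)            # 24-bit color
gray = raw_size_bytes(w, h, 8)              # 8-bit grayscale
bitonal = raw_size_bytes(w, h, 1)           # 1-bit black and white
half = raw_size_bytes(w // 2, h // 2, 24)   # downscaled by 50% per axis

for label, size in [("24-bit", color), ("8-bit", gray),
                    ("1-bit", bitonal), ("24-bit, half size", half)]:
    print(f"{label:>18}: {size:>10} bytes ({size / color:.1%} of original)")
```

Going from 24-bit to 8-bit divides the raw data by 3, going to 1-bit divides it by 24, and halving both dimensions divides it by 4, all before re-encoding.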

Lossless Compression

In contrast, lossless compression reduces file size while leaving the image quality unchanged.
This is achieved by encoding redundant data more compactly and by removing metadata from source images. The size reduction is therefore usually more modest than with lossy compression.

Lossless compression is recommended if the image will be processed further and must keep its original quality.

In other words, it is suitable for discrete data and for any raster image whose exact pixel values must be retained during compression.
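As a minimal sketch of what "retaining raster values" means, the following round trip uses Python's standard `zlib` module (the same deflate algorithm that underlies FlateDecode) on made-up pixel data. The sample rows are an assumption for illustration.

```python
import zlib

# Made-up sample: 100 identical rows of an 8-bit grayscale image.
row = bytes(range(0, 256, 2)) * 2
pixels = row * 100

compressed = zlib.compress(pixels, 9)
restored = zlib.decompress(compressed)

# Lossless: every byte comes back exactly as it went in.
print("identical after round trip:", restored == pixels)
print(f"{len(pixels)} bytes -> {len(compressed)} bytes")
```

How much is saved depends entirely on how repetitive the data is; the guarantee is only that nothing is altered.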

Compression Schemes

The PDF specification allows seven compression schemes for images, which are:

  • LZW (LZWDecode) – An adaptive, lossless compression method, mainly used in the GIF and TIFF digital image formats.
  • RLE (RunLengthDecode) – A simple lossless compression method that encodes runs of repeated bytes, also used in formats such as BMP and PCX.
  • CCITT (CCITTFaxDecode) – A lossless compression method based on the Group 3 and Group 4 fax standards, for bitonal (black and white) images only.
  • JPEG (DCTDecode) – A compression method typically used with loss, for 8-bit grayscale or 24-bit color images.
  • zlib / deflate (FlateDecode) – A lossless compression method that couples the LZ77 algorithm and Huffman coding.
  • JBIG2 (JBIG2Decode) – A compression method, which can be lossy or lossless for bitonal images only.
  • JPEG 2000 (JPXDecode) – A compression method commonly used with loss, for 8-bit grayscale or color images using wavelet transforms.
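To make the simplest of these schemes concrete, here is a sketch of a byte-oriented run-length codec following the RunLengthDecode convention: a length byte of 0 to 127 means "copy the next length + 1 bytes literally", 129 to 255 means "repeat the next byte 257 - length times", and 128 marks the end of data. The function names are illustrative, and the encoder is a straightforward greedy one, not an optimized implementation.

```python
def rle_encode(data: bytes) -> bytes:
    """Greedy run-length encoder producing RunLengthDecode-style output."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (capped at 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out += bytes([257 - run, data[i]])
            i += run
        else:
            # Collect literal bytes until a repeat begins (capped at 128).
            start = i
            i += 1
            while (i < n and i - start < 128
                   and not (i + 1 < n and data[i] == data[i + 1])):
                i += 1
            out.append(i - start - 1)
            out += data[start:i]
    out.append(128)  # end-of-data marker
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        length = encoded[i]
        i += 1
        if length == 128:          # end of data
            break
        if length < 128:           # literal run of length + 1 bytes
            out += encoded[i:i + length + 1]
            i += length + 1
        else:                      # repeated byte, 257 - length copies
            out += encoded[i:i + 1] * (257 - length)
            i += 1
    return bytes(out)


sample = b"\x00" * 300 + b"\xff" * 5 + bytes(range(10))
packed = rle_encode(sample)
print(len(sample), "->", len(packed), "bytes")
```

Long uniform areas collapse to a couple of bytes each, while data with few repeats gains nothing (and can even grow slightly), which is why this scheme suits simple, flat images best.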

For comparison, image formats such as RAW, BMP, GIF, and PNG are all lossless.
JPEG is a lossy format commonly used for digital images.
An alternative to JPEG is the TIFF format with LZW compression, which is lossless.
JBIG2 is an image compression method that supports both lossless and lossy modes.

To summarize, the choice between lossy and lossless is not about what is good or bad. It is about what best suits your purpose.


The next article will be about lossless methods: deleting unnecessary and unused content and objects.

Stay tuned!

FAQs

1. Are PDF files inherently lossy or lossless?

PDF is a flexible container format that can include both lossy and lossless compression methods, depending on the type of content (e.g., text, images) and the compression settings applied during PDF creation or optimization.

2. Which compression methods in PDFs are lossless?

PDFs often use lossless compression for text and bitonal images through techniques like:

  • Flate (zlib/deflate)
  • LZW (Lempel-Ziv-Welch)
  • CCITT Group 3/4 Fax (for black-and-white scans)

These retain 100% of the original data during compression and decompression.

3. Which components of a PDF typically use lossy compression?

Images embedded in PDFs frequently use lossy compression, especially:

  • JPEG (DCTDecode) for color and grayscale images
  • JPEG 2000 for high-ratio image compression
  • JBIG2 (in lossy mode) for scanned black-and-white documents

This reduces file size but can degrade image quality.

4. Can a single PDF contain both lossy and lossless elements?

Yes. A PDF may use lossless compression for text and vector graphics, while using lossy methods for embedded images. This hybrid approach balances quality preservation with reduced file size.

5. How does JBIG2 compression in PDFs differ from other methods?

JBIG2 stands out among bitonal codecs by offering both lossy and lossless modes for black-and-white images. In lossy mode, it may substitute similar-looking symbols (e.g., letters or digits) for one another, which can lead to subtle content errors, especially in OCR-sensitive workflows.
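The substitution risk can be illustrated with a toy model. This is not the actual JBIG2 algorithm: it is a simplified sketch that assumes glyphs are small same-sized bitmaps and that two glyphs are treated as the same symbol when they differ by at most `max_diff` pixels. The names `hamming`, `build_dictionary`, and `max_diff` are hypothetical.

```python
from typing import List, Sequence, Tuple

Glyph = Tuple[str, ...]


def hamming(a: Glyph, b: Glyph) -> int:
    """Number of pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_dictionary(glyphs: Sequence[Glyph],
                     max_diff: int) -> Tuple[List[Glyph], List[int]]:
    """Map each glyph to the first stored symbol within max_diff pixels."""
    symbols: List[Glyph] = []
    indices: List[int] = []
    for g in glyphs:
        for idx, s in enumerate(symbols):
            if hamming(g, s) <= max_diff:
                indices.append(idx)  # reuse: the page will show symbol s
                break
        else:
            symbols.append(g)
            indices.append(len(symbols) - 1)
    return symbols, indices


SIX = ("#####",
       "#....",
       "#####",
       "#...#",
       "#####")
EIGHT = ("#####",
         "#...#",
         "#####",
         "#...#",
         "#####")

page = [SIX, EIGHT, SIX]
strict_symbols, strict_map = build_dictionary(page, max_diff=0)
loose_symbols, loose_map = build_dictionary(page, max_diff=2)
print("exact matching:", len(strict_symbols), "symbols", strict_map)
print("loose matching:", len(loose_symbols), "symbols", loose_map)
```

With exact matching, the two digits stay distinct. With the looser threshold, the "8" is stored as a reference to the "6", so the page would display a different digit while still looking clean, which is why lossy symbol matching needs care on documents where every character matters.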

6. Is it possible to convert a lossy PDF to a lossless version?

No, once a PDF has been compressed using lossy methods (like JPEG or JBIG2 with substitution), the lost data cannot be recovered. You can re-save or optimize it using lossless settings, but quality loss from previous steps remains.

7. How can developers control compression types when generating PDFs?

PDF SDKs like GdPicture.NET allow developers to explicitly set the compression scheme (e.g., Flate, JPEG, JBIG2) based on content type. This ensures control over the trade-off between quality and file size during PDF creation or optimization.
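As a rough illustration of choosing a scheme by content type, the sketch below maps an image's bit depth and a "loss allowed" flag to one of the PDF filter names listed earlier. It is a hypothetical helper, not the GdPicture.NET API or any other SDK's interface, and the rules are deliberately simplistic.

```python
def choose_filter(bits_per_pixel: int, allow_loss: bool) -> str:
    """Pick a PDF image filter name from bit depth and loss tolerance.

    Hypothetical decision rule for illustration only.
    """
    if bits_per_pixel == 1:
        # Bitonal: JBIG2 when loss is acceptable, CCITT fax coding otherwise.
        return "JBIG2Decode" if allow_loss else "CCITTFaxDecode"
    if allow_loss and bits_per_pixel in (8, 24):
        # Grayscale or color photographs tolerate JPEG well.
        return "DCTDecode"
    # Everything else stays lossless.
    return "FlateDecode"


for bpp, lossy in [(1, False), (1, True), (8, True), (24, False)]:
    print(f"{bpp:>2}-bit, loss allowed={lossy}: {choose_filter(bpp, lossy)}")
```

A real pipeline would also weigh image content (photo versus line art), target viewers, and archival constraints before settling on a filter.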

