Normal searchable PDF

Post by lripoll » Mon May 04, 2009 10:22 am

Hi,

I am running several tests with the OCR plugin and I've noticed that the resulting PDF files are much bigger than the input files. The size ratio varies a lot between tests: 318 KB to 2.08 MB, 336 KB to 2.43 MB, 123 KB to 1 MB, 1.4 MB to 14.68 MB, and the most spectacular one, 200 MB to 1.6 GB.
I'm assuming this increase in size is because the OCR plugin creates PDF/A, which produces bigger files whether you want it or not. PDF/A is not a requirement of my customer, so I'm wondering whether it is possible to create normal searchable PDFs, that is, not PDF/A.

So, in short, is my first assumption true? Is the increase in size due to the use of the PDF/A format?
If so, is there any other way of creating searchable PDFs without increasing the size so much?

These are the relevant lines of the code I'm using.
For the Tiff2PDF process:

If oImaging.TiffIsMultiPage(nImageID) Then
   oImaging.PdfOCRCreateFromMultipageTIFFEx nImageID, pathOut, TesseractDictionarySpanish, App.Path & "\AppData"
Else
   oImaging.SaveAsPDFOCREx pathOut, TesseractDictionarySpanish, App.Path & "\AppData"  'AppData should contain the needed dictionary files
End If

For the PDF2PDF process:

For nPage = 1 To oGdViewer.PageCount
   oGdViewer.DisplayFrame (nPage)

   RasterizedPage = oGdViewer.GetNativeImage

   If nPage = 1 Then oImaging.TwainPdfOCRStartEx (pathOut) 'Creates a PDF/A

   Call oImaging.TwainAddGdPictureImageToPdfOCR(RasterizedPage, TesseractDictionarySpanish, App.Path & "\AppData")
Next nPage

Re: Normal searchable PDF

Post by Loïc » Tue May 05, 2009 9:53 am

Hi Luis,

1 - Check that you are using the latest edition. We added better compression support for bitonal images.
2 - For your PDF-to-PDF conversion there are two ways to reduce the output size: reduce the value of the PDFDPIRendering property of the GdViewer control, and convert the image to 1 bpp (PDF rasterization builds a 32 bpp bitmap). For example:

For nPage = 1 To oGdViewer.PageCount
   oGdViewer.DisplayFrame (nPage)
   'Work on a clone of the rasterized page and reduce it to 1 bpp
   oImaging.SetNativeImage (oImaging.CreateClonedImage(oGdViewer.GetNativeImage))
   oImaging.ConvertTo1Bpp

   'Take the converted image from oImaging, not the 32 bpp original from the viewer
   RasterizedPage = oImaging.GetNativeImage

   If nPage = 1 Then oImaging.TwainPdfOCRStartEx (pathOut) 'Creates a PDF/A

   Call oImaging.TwainAddGdPictureImageToPdfOCR(RasterizedPage, TesseractDictionarySpanish, App.Path & "\AppData")
   oImaging.CloseNativeImage 'Release the clone
Next nPage
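
To get a rough feel for why both changes matter, here is a back-of-the-envelope sketch in Python. It is not GdPicture code: the function name and the A4 page size are assumptions for illustration, and it only computes the raw, uncompressed size of a rasterized page. The size of the page inside the final PDF also depends on the compression applied, so treat these numbers as an upper bound on the pixel data, not as a prediction of the file size.

```python
def raw_raster_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a page rasterized at the given DPI.

    Hypothetical helper for illustration only; not part of any library.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each pixel row is padded up to a whole number of bytes.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed page size: A4, expressed in inches.
A4 = (8.27, 11.69)

for dpi in (300, 200):
    for bpp in (32, 1):
        size = raw_raster_bytes(A4[0], A4[1], dpi, bpp)
        print(f"{dpi} dpi, {bpp:>2} bpp: {size / 1_000_000:.2f} MB raw")
```

Under these assumptions, going from 32 bpp to 1 bpp cuts the raw pixel data by a factor of about 32, and lowering the rendering resolution from 300 to 200 dpi cuts it to a bit less than half, since the pixel count scales with the square of the DPI.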

Kind regards,

Loïc
