OCRPages and Result
Posted: Thu Nov 10, 2016 4:11 pm
Hi
I have developed two variants of OCR scanning. The first converts the PDF to a searchable file with OCRPages() and later reads the result out of that file. The second is the "normal" OCR Tesseract recognition, used if no text is found in the PDF during the recognition process.
The advantage of the first method is fast OCR recognition for multi-page PDF files. Unfortunately, there is no function to do this with the OCR Tesseract methods.
The problem I have is that signed documents or documents with PDF/A conformance cannot be converted to searchable PDFs without losing their integrity. It would therefore be a huge advantage to be able to run multi-page OCR processing without writing the result into the PDF (e.g. execute with OCRPages and get the result in an event with GetCharLeft/Top/Right/Bottom/Line/Confidence).