Bug in OCR engine?

acl
Posts: 19
Joined: Wed Oct 03, 2012 7:52 am

Post by acl » Sat Jan 19, 2013 12:28 pm

I am observing some very strange behaviour when OCRing a fairly clean region of a 400 dpi scanned page.

Please check out the attached files (which are 1bpp when fed to the OCR engine).

The file good.gif contains four clear numbers plus some noise at the bottom. All the numbers are correctly recognized. The noise leads to some garbage, but I don't care (my parser filters that). Here is the output of OCRTesseractDoOCR:
21.047,74
20.416,31
4.287,43
25.335,17
f/\A lf\

The file bad.gif contains the exact same four numbers (pixel-for-pixel identical, I checked this with Photoshop) but without the noise. Here, the OCR ignores part of the image entirely: the digits in front of the dot are missing from every line. Here is the output of OCRTesseractDoOCR:
047,74
416,31
287,43
335,17
I don't get why it does this. Any help would be greatly appreciated.
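
For context, the garbage filtering mentioned above can be sketched roughly as follows. This is only an illustration, not the actual parser: the module name, the function name, and the pattern are all assumptions, and the pattern simply describes amounts shaped like the ones in the output above (dot as thousands separator, comma before two decimals).

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module OcrOutputFilter
    ' Assumed amount format: 1-3 digits, optional ".ddd" groups, then ",dd".
    ' This pattern is illustrative and not taken from the original parser.
    Private ReadOnly AmountPattern As New Regex("^\d{1,3}(\.\d{3})*,\d{2}$")

    ' Splits the raw OCR text into lines and keeps only those that look like amounts.
    Function KeepAmounts(ocrText As String) As String()
        Return ocrText.Split({vbCr, vbLf}, StringSplitOptions.RemoveEmptyEntries).
            Select(Function(line) line.Trim()).
            Where(Function(line) AmountPattern.IsMatch(line)).
            ToArray()
    End Function
End Module
```

A filter like this drops the noise line from good.gif, but it also accepts the truncated values from bad.gif (047,74 still matches the pattern), so it cannot detect or compensate for the missing digits.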


Here is the relevant code excerpt (using GDPicture.NET 9.3):


        imagingApi.OCRTesseractReinit()
        imagingApi.OCRTesseractSetOCRContext(OCRContext.OCRContextDocument)
        imagingApi.OCRTesseractSetPassCount(3)
        Dim test = imagingApi.CreateGdPictureImageFromFile("good.bmp")
        Dim s = imagingApi.OCRTesseractDoOCR(test, "fra", "OCR", "") 
        imagingApi.ReleaseGdPictureImage(test)
Attachments
bad-2.gif (2.78 KiB): incorrectly recognized
good.gif (2.95 KiB): correctly recognized

Cedric
Posts: 269
Joined: Sun Sep 02, 2012 7:30 pm

Re: Bug in OCR engine?

Post by Cedric » Tue Jan 22, 2013 2:20 pm

Hello,

I strongly suggest you open a ticket on our support platform; this issue needs investigation.

Thanks!
