Unable to parse Text

Discussions about machine vision support in GdPicture.
josef
Posts: 1
Joined: Sat Mar 05, 2011 4:27 pm

Post by josef » Sat Mar 05, 2011 4:38 pm

Hi,
The code below does not return an error, but the output string is garbage. At first I blamed the quality of the attached image, which is very poor, but a screen capture of a PDF page run through the same code produced garbage as well. I am including the code too, in case this is user error.

Here is the code I am using. I will run it in a batch over dozens of .tif files, extract the text, and work with that text later in the code.

Code:

    GdPictureImaging oGdPictureImaging = new GdPictureImaging();
    oGdPictureImaging.SetLicenseNumber("my key");
    oGdPictureImaging.SetLicenseNumberOCRTesseract("my key");

    // Load the image; the returned ID identifies it in later calls.
    int ImageId = oGdPictureImaging.CreateGdPictureImageFromFile(@"C:\projects\pdf conversion\OCR\3-5-2011 8-19-37 AM.png");

    // Run OCR with the English dictionary; the third argument is the dictionary folder.
    string output = oGdPictureImaging.OCRTesseractDoOCR(ImageId, TesseractDictionary.TesseractDictionaryEnglish, "C:/Program Files/GdPicture.NET/Redist/OCR/", "");
    Console.WriteLine(output);

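
The batch mode mentioned above could look roughly like the sketch below. It is only an outline under assumptions: `BatchOcr`, `Run`, and the `ocr` delegate are hypothetical names, not part of GdPicture, and the delegate stands in for whatever wraps the single-image OCR call.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BatchOcr
{
    // Runs `ocr` on each file matching *.tif in `folder` and returns
    // a map from file path to extracted text.
    // `ocr` is a hypothetical wrapper around the single-image OCR call.
    public static Dictionary<string, string> Run(string folder, Func<string, string> ocr)
    {
        var results = new Dictionary<string, string>();
        foreach (string path in Directory.GetFiles(folder, "*.tif"))
        {
            try
            {
                results[path] = ocr(path);
            }
            catch (Exception ex)
            {
                // Report and continue so one unreadable file does not stop the batch.
                Console.Error.WriteLine("OCR failed for " + path + ": " + ex.Message);
            }
        }
        return results;
    }
}
```

Keeping the OCR step behind a delegate means the loop can be tried with a stub before the real engine is plugged in.
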
Any help would be appreciated.

Thanks,
Josef
Attachments
3-5-2011 8-19-37 AM.tif

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Unable to parse Text

Post by Loïc » Tue Mar 08, 2011 6:38 pm

Hi Josef,

Unfortunately I can't help. The quality of the document is definitely too poor to get good accuracy with the Tesseract engine.

Kind regards,

Loïc
