Zone OCR in PDF

Discussions about machine vision support in GdPicture.
charuvas1
Posts: 38
Joined: Tue Dec 02, 2008 1:49 pm

Zone OCR in PDF

Post by charuvas1 » Fri Jul 31, 2009 3:14 pm

Hi,

I am trying to do zonal OCR on scanned PDF files, but it does not give the correct result: the region that is OCRed is different from the region I selected. Could you suggest some code for this? My code works fine for TIFF files. Here is the sample code:

Call GdViewer1.GetRectCoordinatesOnDocument(LeftArea, TopArea, WidthArea, HeightArea)
Call oGdPictureImaging.SetROI(LeftArea, TopArea, WidthArea, HeightArea)
oGdPictureImaging.OCRTesseractReinit()
oGdPictureImaging.OCRTesseractSetPassCount(1)
sOCR = oGdPictureImaging.OCRTesseractDoOCR(m_ImageID, TesseractDictionary.TesseractDictionaryEnglish, "C:\OCR", "")

Thanks
Charu

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Zone OCR in PDF

Post by Loïc » Sun Aug 02, 2009 6:41 pm

Hi,

First, in GdPicture.NET you should use GetRectCoordinatesOnDocumentInches() for PDF documents.
For example:

Call GdViewer1.GetRectCoordinatesOnDocumentInches(LeftAreaInches, TopAreaInches, WidthAreaInches, HeightAreaInches)

Also, I suppose you rasterized the displayed page to a GdPicture image using the PdfRenderPageToGdPictureImage() method. In this method you specified a resolution (for example, 200 DPI), so you have to convert the coordinates like this:

LeftAreaPixel = LeftAreaInches * (200 / 72)
TopAreaPixel = TopAreaInches * (200 / 72)
WidthAreaPixel = WidthAreaInches * (200 / 72)
HeightAreaPixel = HeightAreaInches * (200 / 72)

Call oGdPictureImaging.SetROI(LeftAreaPixel, TopAreaPixel, WidthAreaPixel, HeightAreaPixel)

This should work.

With best regards,

Loïc

charuvas1
Posts: 38
Joined: Tue Dec 02, 2008 1:49 pm

Re: Zone OCR in PDF

Post by charuvas1 » Mon Aug 03, 2009 1:31 pm

I still don't get anything OCRed in the specified zone. Here is the code I wrote:

Call GdViewer1.GetRectCoordinatesOnDocumentInches(sLeftArea, sTopArea, sWidthArea, sHeightArea)
LeftArea = CInt(sLeftArea * (200 / 72))
TopArea = CInt(sTopArea * (200 / 72))
WidthArea = CInt(sWidthArea * (200 / 72))
HeightArea = CInt(sHeightArea * (200 / 72))
Call oGdPictureImaging.SetROI(LeftArea, TopArea, WidthArea, HeightArea)
oGdPictureImaging.OCRTesseractReinit()
oGdPictureImaging.OCRTesseractSetPassCount(OcrPass)
sOCR = oGdPictureImaging.OCRTesseractDoOCR(m_ImageID, Dictionary, TextBox1.Text, "")

Please note that I have to convert Single to Integer because SetROI takes integer parameters. This results in some loss of precision.
Why do I have to divide the DPI by 72? Or should it be 96, which is the DPI of my screen? (I tried the above code using 96 too, and I still get no results.)

The result, sOCR, is an empty string.

Thanks
Charu

charuvas1
Posts: 38
Joined: Tue Dec 02, 2008 1:49 pm

Re: Zone OCR in PDF

Post by charuvas1 » Mon Aug 03, 2009 2:45 pm

If I multiply by 200 instead of 200/72, I get the correct OCR result. Is that the right way? What is the significance of the 200/72 factor?

Thank you
Charu

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Zone OCR in PDF

Post by Loïc » Mon Aug 03, 2009 6:13 pm

Hi,

You are right. Sorry for the confusion.

Here is the correct code:

LeftAreaPixel = LeftAreaInches * 200
TopAreaPixel = TopAreaInches * 200
WidthAreaPixel = WidthAreaInches * 200
HeightAreaPixel = HeightAreaInches * 200

Call oGdPictureImaging.SetROI(LeftAreaPixel, TopAreaPixel, WidthAreaPixel, HeightAreaPixel)

Kind regards,

Loïc
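
A note on the two factors discussed in this thread, as a minimal sketch rather than GdPicture code. 72 is the number of PDF points per inch, so a factor of dpi / 72 converts a length in points to pixels. A length that is already in inches, which is what GetRectCoordinatesOnDocumentInches() is described as returning here, only needs to be multiplied by the rendering resolution. Applying 200 / 72 to inch values shrinks the zone to a few pixels, which would explain the empty OCR result. The helper names InchesToPixels and PointsToPixels are hypothetical and not part of the GdPicture API, and 200 is the resolution assumed in this thread.

```vbnet
Imports System

Module RoiConversion
    ' Hypothetical helper (not a GdPicture method):
    ' inches -> pixels at the resolution used to rasterize the page.
    Function InchesToPixels(ByVal inches As Single, ByVal dpi As Single) As Integer
        ' CInt rounds to the nearest integer instead of truncating.
        Return CInt(inches * dpi)
    End Function

    ' Hypothetical helper (not a GdPicture method):
    ' PDF points (1/72 inch) -> pixels. This is where a dpi / 72 factor belongs.
    Function PointsToPixels(ByVal points As Single, ByVal dpi As Single) As Integer
        Return CInt(points / 72.0F * dpi)
    End Function

    Sub Main()
        Const Dpi As Single = 200.0F   ' assumed rasterization resolution

        ' A zone edge 1.5 inches from the page origin:
        Console.WriteLine(InchesToPixels(1.5F, Dpi))        ' 300

        ' The same position expressed in points (1.5 in = 108 pt):
        Console.WriteLine(PointsToPixels(108.0F, Dpi))      ' 300

        ' Mixing the two: inches multiplied by 200 / 72 gives a tiny zone.
        Console.WriteLine(CInt(1.5F * (Dpi / 72.0F)))       ' 4
    End Sub
End Module
```

The Integer results can be passed to SetROI as in the code above. Because the rounding is done per value, Left + Width may differ by one pixel from rounding the right edge directly, which is usually negligible for OCR zones.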
