OCR Individual Pages

Discussions about machine vision support in GdPicture.
rgoodson40
Posts: 32
Joined: Sun Jan 30, 2011 8:40 pm

OCR Individual Pages

Post by rgoodson40 » Mon Mar 05, 2012 12:16 am

Hello,

I need to OCR individual pages of TIFF files, but I can't figure out an easy way to do that. Basically, I loop through each page of a document, OCR the page, and store the text in a database. I need to go page by page in order to show progress.

The problem is that the OCRTesseractDoOCR method OCRs an entire GdPicture image, so it seems I could use it if I could load individual pages of a document into a GdPicture image object, but I can't figure out how to do that. The images do not need to be displayed; this will all happen behind the scenes, apart from the progress information.

Thanks,
Reagan

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: OCR Individual Pages

Post by Loïc » Mon Mar 05, 2012 11:01 am

Hello Reagan,

Do you mean you want to OCR a multipage TIFF image?

Regards,

Loïc

rgoodson40
Posts: 32
Joined: Sun Jan 30, 2011 8:40 pm

Re: OCR Individual Pages

Post by rgoodson40 » Mon Mar 05, 2012 6:13 pm

Yes. But I would like to be able to do one page at a time so that I can show progress for it.

Thanks,
Reagan

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: OCR Individual Pages

Post by Loïc » Tue Mar 06, 2012 5:48 pm

Hello,

OK, it's easy to do:

1- Open the image
2- Select the desired page by using the TiffSelectPage() method
3- Run the OCR process

Repeat steps 2-3 for each page of your file.
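
The three steps above can be sketched in C# as follows. This is a minimal sketch, not library code: ITiffOcr is a hypothetical interface that mirrors only the GdPictureImaging method names quoted in this thread (with simplified return types) so that the loop compiles on its own, and OcrEachPage and the onProgress callback are illustrative names as well. It opens the file with TiffCreateMultiPageFromFile because a later post in this thread reports that CreateGdPictureImageFromFile returned the first page's text for every page on v8.3.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface: it mirrors the GdPictureImaging method names quoted
// in this thread, with simplified return types, so the loop compiles alone.
// To use it for real, wrap a GdPictureImaging object behind this interface.
public interface ITiffOcr
{
    int TiffCreateMultiPageFromFile(string fileName);
    int TiffGetPageCount(int imageId);
    void TiffSelectPage(int imageId, int page);
    string OCRTesseractDoOCR(int imageId, string lang, string dictPath, string whiteList);
    void OCRTesseractClear();
}

public static class PageOcr
{
    // Returns one string per page and reports (currentPage, pageCount)
    // after each page so the caller can update a progress display.
    public static List<string> OcrEachPage(
        ITiffOcr imaging, string fileName, string lang, string dictPath,
        Action<int, int> onProgress)
    {
        // Step 1: open the image as a multipage TIFF.
        int imageId = imaging.TiffCreateMultiPageFromFile(fileName);
        int pageCount = imaging.TiffGetPageCount(imageId);
        var pages = new List<string>(pageCount);

        for (int page = 1; page <= pageCount; page++)
        {
            // Step 2: select the page (pages are numbered from 1).
            imaging.TiffSelectPage(imageId, page);

            // Step 3: OCR the selected page only.
            pages.Add(imaging.OCRTesseractDoOCR(imageId, lang, dictPath, string.Empty));
            imaging.OCRTesseractClear();

            if (onProgress != null)
            {
                onProgress(page, pageCount);
            }
        }
        return pages;
    }
}
```

Returning one string per page, rather than one concatenated string, makes it straightforward to store each page's text in its own database row.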

Let me know if I am not clear enough.

Kind regards,

Loïc

rgoodson40
Posts: 32
Joined: Sun Jan 30, 2011 8:40 pm

Re: OCR Individual Pages

Post by rgoodson40 » Thu Mar 08, 2012 1:11 am

Thanks. That worked.

Reagan

mdelbene
Posts: 31
Joined: Wed May 11, 2011 10:03 am

Re: OCR Individual Pages

Post by mdelbene » Wed Oct 10, 2012 10:18 am

Hi Loïc,
I read your hint about running OCR on a multipage TIFF file, but I'm running into some problems. Let me explain.
I'm using the sample C# project installed in GdViewerSamplesv8\OCR\ with some changes.

I open a multipage TIFF, then loop over the pages and call OCR on each one.
This is the code:

Code:

// opens the file
int m_ImageID = oGdPictureImaging.CreateGdPictureImageFromFile(fileName);
string sOCR = string.Empty;

// loop pages
if (oGdPictureImaging.TiffIsMultiPage(m_ImageID))
{
	int pageCount = oGdPictureImaging.TiffGetPageCount(m_ImageID);
	for (int i = 1; i <= pageCount; i++)
	{
		if (i > 1)
		    oGdPictureImaging.TiffSelectPage(m_ImageID, i);

		oGdPictureImaging.Scale(m_ImageID, 300, System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic);
		oGdPictureImaging.OCRTesseractReinit();
		sOCR += oGdPictureImaging.OCRTesseractDoOCR(m_ImageID, txtLang.Text, TextBox1.Text, string.Empty);
		oGdPictureImaging.OCRTesseractClear();
	}
}

At the end of the procedure, the string sOCR contains the text of the first page repeated three times (my TIFF file has three pages).
I tried using the TiffOpenMultiPageForWrite property, but nothing changed.

The only way to get the intended result is to use

Code:

m_ImageID = oGdPictureImaging.TiffCreateMultiPageFromFile(fileName);

instead of

Code:

m_ImageID = oGdPictureImaging.CreateGdPictureImageFromFile(fileName);

The problem with this method is that I sometimes have a stream rather than a file name, in which case I use gdPicture.CreateGdPictureImageFromStream(binaryContent).

I'm probably doing something wrong.
Can you help me?

Thank you in advance.
Michela

P.S. I'm using GdPicture v. 8.3.
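
For the stream case described in this post, one possible workaround is to copy the stream to a temporary file and pass that path to TiffCreateMultiPageFromFile. Nothing in this thread confirms this approach, so treat it as an untested sketch: TiffStreamHelper and WithTempTiff are hypothetical names, and the code uses only standard System.IO calls. It assumes the OCR work finishes and the image is released inside the callback, since the library may keep the file open while the image is loaded.

```csharp
using System;
using System.IO;

public static class TiffStreamHelper
{
    // Copies the stream to a temporary .tif file, passes the path to useFile,
    // and deletes the file once useFile has returned (or thrown).
    public static T WithTempTiff<T>(Stream tiffStream, Func<string, T> useFile)
    {
        string path = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".tif");
        try
        {
            using (FileStream file = File.Create(path))
            {
                tiffStream.CopyTo(file);
            }
            // Assumption: useFile opens the TIFF, runs OCR on each page, and
            // releases the image before returning, so the file is no longer
            // in use when it is deleted below.
            return useFile(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The callback would call oGdPictureImaging.TiffCreateMultiPageFromFile(path), loop over the pages, and return the collected text. The extra disk write is the price of reusing the file-based call that is reported to work here.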

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: OCR Individual Pages

Post by Loïc » Wed Oct 10, 2012 12:23 pm

Hello,

First, I suggest you upgrade to the latest 8.x edition. To get the download link, please create a ticket here: https://www.gdpicture.com/support/getting-support-from-our-team

Also, have you tried replacing CreateGdPictureImageFromFile() with the TiffCreateMultiPageFromFile() method?

Kind regards,

Loïc

mdelbene
Posts: 31
Joined: Wed May 11, 2011 10:03 am

Re: OCR Individual Pages

Post by mdelbene » Wed Oct 10, 2012 2:37 pm

Yes, using the method TiffCreateMultiPageFromFile() I get the expected behaviour.
Thanks a lot.
Michela
