OCR results file

Posted: Wed Aug 17, 2016 6:32 am
by misterT
Hi,

I've built an invoice recognition/learning product that relies on the Recostar OCR engine. In particular, it processes the XML OCR results file created by Recostar. As I can't seem to find any way to license Recostar, and because I want to build a web-based point-and-click indexing solution, I'm considering GdPicture.

What I'd like to know is whether Tesseract produces a similar kind of output file, giving all characters and words along with their locations. Or do you have to use this command to get that information: PdfReaderGetPageTextWithCoords?

Note that I searched for that command in the online documentation and got no hits, which is a bit of a worry.

Thanks, Turhan

Re: OCR results file

Posted: Wed Aug 24, 2016 11:46 pm
by misterT
Do GdPicture staff monitor this forum at all? I've seen many very sensible questions in the forum go unanswered, and I've not had any response in over five days! To me that almost rules this product out, because a product is really only as good as the support provided. Add to that the fact that this new method, "PdfReaderGetPageTextWithCoords", was released back in 2010:

post9116.html?hilit=PdfReaderGetPageTex ... ords#p9116

So why can't I find any mention of it in the documentation six years later? That's pretty much inexcusable from my perspective.

All of this is such a huge shame because the product actually looks really good. But there is no way I can risk launching a commercial product without quality support for the underlying engine driving it.

Re: OCR results file

Posted: Mon Aug 29, 2016 11:08 am
by Cedric
PdfReaderGetPageTextWithCoords is a method that was introduced in GdPicture.NET 7, a long-discontinued version, and it no longer exists in the product.
The reason is simple: since GdPicture.NET 8, the PDF features have grown a lot, and a separate PDF plugin now handles everything PDF-related, including text extraction.

In the current GdPicture.NET release (GdPicture.NET 12) the method you are looking for is in the GdPicturePDF class and is called GetPageTextWithCoords.
Here is a link to the corresponding documentation: https://www.gdpicture.com/guides/gdpicture/web ... oords.html
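
On the Tesseract part of the original question: when run standalone, Tesseract can write an hOCR file (for example `tesseract input.png out hocr`), which is HTML markup where each recognized word is a span of class `ocrx_word` carrying a `bbox x0 y0 x1 y1` entry in its `title` attribute. The Python sketch below only illustrates how such a file could be read into words with pixel coordinates. It is not GdPicture API; the helper names (`parse_bbox`, `extract_words`) are hypothetical, and the sample markup is made up for illustration rather than taken from a real scan.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) for every span of class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []     # text fragments seen inside that span
        self._nested = 0    # spans opened inside the current word span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._bbox is not None:
            self._nested += 1
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or self._bbox is None:
            return
        if self._nested:
            self._nested -= 1
            return
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._bbox = None


def extract_words(hocr):
    """Parse an hOCR string and return a list of (text, bbox) tuples."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Made-up sample in the shape of hOCR word markup (illustration only).
SAMPLE = """<span class='ocr_line' title='bbox 36 92 580 122'>
<span class='ocrx_word' id='word_1_1' title='bbox 36 92 136 122; x_wconf 96'>Invoice</span>
<span class='ocrx_word' id='word_1_2' title='bbox 150 92 210 122; x_wconf 91'><strong>No.</strong></span>
</span>"""

if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

Words whose `title` has no `bbox` entry are skipped by this sketch. Whether GdPicture exposes its OCR results in this form is a separate question; for PDF text, GetPageTextWithCoords is the documented route mentioned above.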