GdPicture Imaging Forums

Posted: **Fri Jan 16, 2009 2:06 pm**

Hi,

Can you show me how to train the OCR engine with user words? I did some research on the Tesseract homepage, but it is complicated and poorly documented for someone new to Tesseract. I've tried editing the nld.user-words file with Notepad like this:

Factuurnummer
factuurnummer
Factuurnr.
factuurnr.
BTW-nr
BTW-nr.
BTW-nr.:
Kvk
KvK
K.v.k
K.v.K

Unfortunately this has absolutely NO effect on the OCR result whatsoever. I believe there are two more files that may help, nld.freq-dawg and nld.word-dawg, but they must be edited and compiled. I couldn't manage to do that.
Can you help? Or is there maybe another way to help Tesseract recognize non-standard words such as

BTW-nr.:
instead of
BTVVenri g

(I have already optimized the image for OCR and the text size is right)

Kind regards,
Slava

Posted: **Thu Jan 22, 2009 12:19 pm**

Hi Slava,

That is the correct way to add your own words to the dictionary. However, I think the problem you are seeing is related to this one, which is now solved: viewtopic.php?t=1233
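
For anyone who would rather generate the word list than type it in Notepad, here is a minimal sketch of writing such a file, one entry per line as in the list above. The helper name `write_user_words`, the UTF-8 encoding, and the output path are assumptions for illustration only; they are not part of GdPicture or Tesseract, and the file still has to be placed wherever your installation expects nld.user-words.

```python
from pathlib import Path


def write_user_words(words, path):
    """Write a plain-text word list, one entry per line.

    Hypothetical helper: strips surrounding whitespace, skips blank
    entries, and drops exact duplicates while keeping the original order.
    UTF-8 is an assumed encoding, not something the thread confirms.
    """
    seen = set()
    lines = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            lines.append(word)
    # End the file with a newline so the last entry is a complete line.
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Calling `write_user_words(["Factuurnummer", "factuurnummer", "BTW-nr.:", "KvK"], "nld.user-words")` would write those four entries to a file in the current directory. Entries that differ only in case are kept as separate lines, matching the list in the first post.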

Let me know if you have other problems.

Best regards,

Loïc

Posted: **Thu Jan 22, 2009 12:22 pm**

Ok. Thanks Loic. I will do some more tests.

Slava

How to train Tesseract with words for more accuracy?
