How to train Tesseract with words for more accuracy?

Posted: Fri Jan 16, 2009 2:06 pm
by Slava
Hi,

Can you show me how to train the OCR engine with user words? I did some research on the Tesseract homepage, but it is complicated and poorly documented for someone new to Tesseract. I've tried editing nld.user-words in Notepad like this:

Factuurnummer
factuurnummer
Factuurnr.
factuurnr.
BTW-nr
BTW-nr.
BTW-nr.:
Kvk
KvK
K.v.k
K.v.K

Unfortunately, this has absolutely NO effect on the OCR result. I believe there are two more files that may help, nld.freq-dawg and nld.word-dawg, but they must be edited and compiled, and I couldn't manage to do that.
Can you help? Or is there perhaps another way to help Tesseract recognize non-standard words such as

BTW-nr.:
instead of
BTVVenri g

(I have already optimized the image for OCR, and the text size is right.)
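The two other files mentioned above, nld.freq-dawg and nld.word-dawg, hold word lists compiled into a DAWG (directed acyclic word graph): a trie in which identical suffixes are stored only once, which keeps a large dictionary compact while still allowing fast lookups. The Python sketch below illustrates only that data structure. It does not read or write Tesseract's binary dawg format, and the names `Node`, `build_trie`, `build_dawg`, `contains` and `count_nodes` are hypothetical. Tesseract's training tools include a `wordlist2dawg` program for the actual compilation; its arguments have changed between versions, so check the documentation for the release you use.

```python
from typing import Dict, Iterable, Tuple


class Node:
    """One graph state: outgoing edges keyed by character, plus an end-of-word flag."""

    def __init__(self) -> None:
        self.edges: Dict[str, "Node"] = {}
        self.final = False


def build_trie(words: Iterable[str]) -> Node:
    """Insert every word into a plain (unminimized) trie."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.edges:
                node.edges[ch] = Node()
            node = node.edges[ch]
        node.final = True
    return root


def minimize(node: Node, registry: Dict[Tuple, Node]) -> Node:
    """Bottom-up: replace each subtree by an already-seen equivalent one, if any.

    Two nodes are equivalent when they agree on the end-of-word flag and have
    the same edges leading to the same (already canonical) children.
    """
    for ch, child in list(node.edges.items()):
        node.edges[ch] = minimize(child, registry)
    signature = (node.final,
                 tuple(sorted((ch, id(child)) for ch, child in node.edges.items())))
    return registry.setdefault(signature, node)


def build_dawg(words: Iterable[str]) -> Node:
    """Hypothetical helper: build a trie, then merge shared suffixes into a DAWG."""
    return minimize(build_trie(words), {})


def contains(root: Node, word: str) -> bool:
    """Follow one edge per character; the word is present if we end on a final state."""
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


def count_nodes(root: Node) -> int:
    """Count distinct states reachable from the root (shared states count once)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.edges.values())
    return len(seen)


if __name__ == "__main__":
    words = ["tap", "taps", "top", "tops"]
    print(count_nodes(build_trie(words)))  # 8 states in the plain trie
    dawg = build_dawg(words)
    print(count_nodes(dawg))  # 5 states once shared suffixes are merged
    print(contains(dawg, "tops"), contains(dawg, "to"))  # True False
```

The point of the merge step is that "tap"/"top" and "taps"/"tops" end in the same suffixes, so those states are stored once; in a real dictionary with many inflected forms the saving is much larger.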

Kind regards,
Slava

Re: How to train Tesseract with words for more accuracy?

Posted: Thu Jan 22, 2009 12:19 pm
by Loïc
Hi Slava,

This is the correct way to add your own words to the dictionary. However, I think the problem you are seeing is related to this one (which is now solved): viewtopic.php?t=1233
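Since a plain word list, one entry per line, is the right input, and the list in the question is mostly the same few words with different capitalisation and trailing punctuation, a small Python sketch can generate those variants instead of typing them by hand. The variant rules and all function names here are assumptions for illustration; the output is simply the text you would otherwise put in nld.user-words. Whether the entries change recognition still depends on the Tesseract version and configuration (later Tesseract releases also accept a user-words file through the `--user-words` command-line option).

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical helpers, not part of Tesseract: they only produce the plain-text
# user-words list (one entry per line) that the thread edits by hand.


def expand_variants(word: str, suffixes: Iterable[str] = ("", ".", ".:")) -> List[str]:
    """Return spelling variants of one word.

    Assumed rules, modelled on the hand-written list in the thread:
    strip trailing '.' and ':', re-append each suffix, and add a lower-case
    first letter only when the rest of the word is already lower case
    (so 'Factuurnummer' gains 'factuurnummer', but 'BTW-nr' stays as is).
    """
    suffix_list = list(suffixes)  # allow any iterable, reused per form below
    stem = word.rstrip(".:")
    forms = [stem]
    lowered = stem[:1].lower() + stem[1:]
    if stem[1:].islower() and lowered != stem:
        forms.append(lowered)
    return [form + suffix for form in forms for suffix in suffix_list]


def format_user_words(words: Iterable[str]) -> str:
    """Join entries one per line, dropping blanks and duplicates but keeping order."""
    seen = set()
    lines = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            lines.append(word)
    return "\n".join(lines) + "\n" if lines else ""


def write_user_words(path: Path, words: Iterable[str]) -> None:
    """Write the list as UTF-8 so accented words survive."""
    path.write_text(format_user_words(words), encoding="utf-8")


if __name__ == "__main__":
    base = ["Factuurnummer", "Factuurnr.", "BTW-nr", "KvK", "Kvk", "K.v.k", "K.v.K"]
    words = [variant for w in base for variant in expand_variants(w)]
    # Print instead of overwriting a real dictionary file; call
    # write_user_words(Path("nld.user-words"), words) to save it.
    print(format_user_words(words), end="")
```

Keeping generation separate from writing makes it easy to review the list before it replaces an existing user-words file.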


Let me know if you have other problems.

Best regards,

Loïc

Re: How to train Tesseract with words for more accuracy?

Posted: Thu Jan 22, 2009 12:22 pm
by Slava
OK. Thanks, Loic. I will do some more tests.

Slava