How to train Tesseract with words for more accuracy?
Posted: Fri Jan 16, 2009 2:06 pm
Hi,
Can you show me how to train the OCR engine with user words? I did some research on the Tesseract homepage, but it is complicated and badly documented for someone new to Tesseract. I tried editing nld.user-words in Notepad, but unfortunately this has absolutely no effect on the OCR result. I believe there are two more files that may help, nld.freq-dawg and nld.word-dawg, but they must be edited and compiled, and I couldn't manage to do that. These are the words I added to nld.user-words:
Factuurnummer
factuurnummer
Factuurnr.
factuurnr.
BTW-nr
BTW-nr.
BTW-nr.:
Kvk
KvK
K.v.k
K.v.K
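In case the way Notepad saves the file matters, here is a minimal Python sketch for generating the same list by script. It assumes (unconfirmed) that the file should be plain UTF-8 text with one word per line, no byte order mark, and Unix line endings. Whether any of that affects Tesseract is a guess, and `write_user_words` and `USER_WORDS` are names made up for this illustration.

```python
# Words copied from the list above; nothing else is assumed about the vocabulary.
USER_WORDS = [
    "Factuurnummer", "factuurnummer", "Factuurnr.", "factuurnr.",
    "BTW-nr", "BTW-nr.", "BTW-nr.:", "Kvk", "KvK", "K.v.k", "K.v.K",
]


def write_user_words(path, words):
    """Write one word per line as UTF-8 (no BOM) with "\n" line endings.

    Blank entries and exact duplicates are skipped. The expected file
    format is an assumption, not something taken from the Tesseract docs.
    """
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    # newline="\n" stops Python from translating line endings to CRLF on Windows;
    # the "utf-8" codec (unlike "utf-8-sig") never writes a byte order mark.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for word in cleaned:
            handle.write(word + "\n")
    return cleaned
```

Calling `write_user_words("nld.user-words", USER_WORDS)` would write the file to the current directory; where Tesseract expects to find it is not covered by this sketch.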
Can you help? Or is there maybe another way to help Tesseract recognize non-standard words such as
BTW-nr.:
instead of
BTVVenri g
(I have already optimized the image for OCR, and the text size is right.)
Kind regards,
Slava