How to train Tesseract with words for more accuracy?

Discussions about machine vision support in GdPicture.

Post by Slava » Fri Jan 16, 2009 2:06 pm

Hi,

Can you show me how to train the OCR engine with user words? I did some research on the Tesseract homepage, but the process is complicated and poorly documented for someone new to Tesseract. I tried editing the nld.user-words file in Notepad like this:


Factuurnummer
factuurnummer
Factuurnr.
factuurnr.
BTW-nr
BTW-nr.
BTW-nr.:
Kvk
KvK
K.v.k
K.v.K
Unfortunately, this has no effect whatsoever on the OCR result. I believe there are two more files that may help, nld.freq-dawg and nld.word-dawg, but they must be edited and compiled, and I couldn't manage to do that.
Can you help? Or is there perhaps another way to help Tesseract recognize non-standard words such as

BTW-nr.:
instead of
BTVVenri g

(I have already optimized the image for OCR, and the text size is right.)

Kind regards,
Slava


Re: How to train Tesseract with words for more accuracy?

Post by Loïc » Thu Jan 22, 2009 12:19 pm

Hi Slava,

That is the correct way to add your own words to the dictionary. However, I think the problem you ran into is related to this one (which is now solved): viewtopic.php?t=1233
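If you would rather generate the word list from a script than edit it by hand, here is a minimal sketch in Python. It assumes the user-words file is a plain list with one word per line, as in your example. The UTF-8 encoding without a BOM, the Unix line endings, and the helper name write_user_words are my own assumptions for illustration, not documented Tesseract requirements, and the sketch writes to a temporary folder rather than to your real tessdata folder.

```python
import tempfile
from pathlib import Path


def write_user_words(words, path):
    """Write one word per line, skipping blanks and exact duplicates.

    Hypothetical helper: the one-word-per-line layout follows the
    example in this thread; encoding and line endings are assumptions.
    """
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    # newline="\n" disables CRLF translation on Windows;
    # encoding="utf-8" writes no byte-order mark.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(cleaned) + "\n")
    return cleaned


# Example input with a duplicate, padded whitespace and an empty entry.
words = ["Factuurnummer", "factuurnummer", "BTW-nr.", "BTW-nr.", " KvK ", ""]

# Written to a temporary folder here; the real target would be the
# nld.user-words file next to the other language data files.
out_path = Path(tempfile.mkdtemp()) / "nld.user-words"
result = write_user_words(words, out_path)
```

Case variants such as Factuurnummer and factuurnummer are kept as separate entries on purpose, since your list contains both.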


Let me know if you have other problems.

Best regards,

Loïc


Re: How to train Tesseract with words for more accuracy?

Post by Slava » Thu Jan 22, 2009 12:22 pm

Ok. Thanks Loic. I will do some more tests.

Slava
