
Formatted Text Output?

Posted: Sun Mar 29, 2020 11:52 pm
by dwg
When saving the OCR result as regular text (.txt), will the formatting be preserved? This is important, for example, if amounts need to line up under a certain column name. Like this (I had to add the --- because posting the question removes the extra spaces!):

Date--------Description---------------Credit------------Debit
01/12-------text------------------------100.00
01/14-------text-----------------------------------------2,392.00

If the formatting is removed it could look like:

Date Description Credit Debit
01/12 text 100.00
01/14 text 2,392.00

Which makes it impossible to tell debit from credit.

Re: Formatted Text Output?

Posted: Tue Mar 31, 2020 4:14 pm
by Hugo
Hi Dwg,

In our latest minor release we have improved text formatting when extracting text after OCR and saving the results as .txt.

This feature was greatly improved/implemented a few weeks ago.
I suggest you try this. Feel free to provide any document you are having trouble with and we'll take a look at it and fix it if necessary.

Regards,

Re: Formatted Text Output?

Posted: Thu Apr 02, 2020 11:38 pm
by dwg
Can you check this example PDF doc? It is important to keep the amounts under the correct columns...
I think I would like to evaluate v14 if this looks good. Thanks

Re: Formatted Text Output?

Posted: Fri Apr 03, 2020 1:19 pm
by Hugo
Hi Dwg,

Currently this is implemented, but improvements can still be made. This is quite complex to implement, as it needs to take into account the font style as well as the spaces.
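To illustrate why the spacing matters, here is a minimal sketch of the general idea and not the product's actual implementation. It assumes the OCR step yields each word with its left x position and top y position (a hypothetical input shape), and it assumes a single average character width, which is exactly the simplification that breaks down with proportional fonts. The function name `layout_words` and its parameters are made up for this example.

```python
from typing import List, Tuple

# Hypothetical input shape: (text, x_left, y_top) in page units.
Word = Tuple[str, float, float]

def layout_words(words: List[Word], char_width: float, line_height: float) -> str:
    """Place each word at a character column proportional to its x position."""
    # Group words into text rows by their vertical position.
    rows = {}
    for text, x, y in words:
        rows.setdefault(round(y / line_height), []).append((x, text))

    lines = []
    for row in sorted(rows):
        line = ""
        for x, text in sorted(rows[row]):
            # Assumption: one average character width for the whole page.
            col = int(round(x / char_width))
            # Keep at least one space if the word would collide with the previous one.
            if line and col <= len(line):
                col = len(line) + 1
            line = line.ljust(col) + text
        lines.append(line)
    return "\n".join(lines)

sample = [
    ("Date", 0, 0), ("Credit", 120, 0), ("Debit", 200, 0),
    ("01/12", 0, 10), ("100.00", 120, 10),
    ("01/14", 0, 20), ("2,392.00", 200, 20),
]
print(layout_words(sample, char_width=10, line_height=10))
```

With a fixed-width font this keeps each amount under its column header. With mixed font styles, the average character width no longer maps cleanly to text columns, which is where the extra work comes in.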

This is how our OCR demo currently renders your document. See attachments.

Regards