[Mayan EDMS: 2265] Can OCR be trained, or otherwise improved?

[Mayan EDMS: 2265] Can OCR be trained, or otherwise improved?

David Reagan
While experimenting with Mayan, I've noticed that the OCR is pretty unreliable.

CHRNGE instead of CHANGE, HOU instead of HOW, CRSHIER instead of CASHIER, UUU instead of WWW, OOESTIONS instead of QUESTIONS, etc.

Those are all examples from just one receipt, and the preview image looks pretty darn good.

So, is there a way to teach the OCR to get better?

Or some other way to improve OCR results? Maybe a newer version of Tesseract?


--

---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
[Mayan EDMS: 2287] Re: Can OCR be trained, or otherwise improved?

Michael Price
OCR itself is very prone to errors. I've had good results using a transformation to reduce the color space of the images before they are OCRed. I wonder why Tesseract doesn't do this itself.
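
To make the color space idea concrete, here is a rough sketch of what such a reduction does: convert each pixel to a grey level, then threshold it to pure black or white. This is not Mayan's transformation code. The function names, the threshold of 128, and the tiny sample image are all made up for illustration, and pixels are plain nested lists so the example runs without an imaging library.

```python
# Minimal sketch: reduce an RGB image to black and white before OCR.
# All names and the default threshold are illustrative assumptions.

def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def binarize(gray_rows, threshold=128):
    """Map each luminance value to 0 (black) or 255 (white)."""
    return [
        [255 if value >= threshold else 0 for value in row]
        for row in gray_rows
    ]


# A tiny 2x2 "image": off-white paper with two dark, text-like pixels.
image = [
    [(240, 240, 235), (30, 25, 40)],
    [(60, 50, 55), (250, 250, 250)],
]

bw = binarize(to_grayscale(image))
print(bw)
```

On a real scan the right threshold depends on lighting and paper color, so it is worth trying a few values and comparing the OCR output.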

As for training, as far as I know Tesseract can be trained, but I don't know the process. I think the language files for Tesseract are actually training files.

Some links I found on the topic:

http://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03–3.05
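
On the language files: Tesseract loads its models from `<lang>.traineddata` files and selects one with the `-l` option, so a custom-trained model would presumably be used the same way. The sketch below only builds the command line and does not run it. The helper name, the `receipts` language name, and the paths are hypothetical, and it is worth checking `tesseract --help` for the options your installed version accepts.

```python
# Minimal sketch: build a Tesseract command that selects a language model.
# The helper and the example names below are hypothetical.

def build_tesseract_command(image_path, output_base, lang="eng", tessdata_dir=None):
    """Return the argument list for a Tesseract run with the given language."""
    command = ["tesseract", image_path, output_base]
    if tessdata_dir is not None:
        # Directory holding the <lang>.traineddata files.
        command += ["--tessdata-dir", tessdata_dir]
    command += ["-l", lang]
    return command


# Assumed example: a custom "receipts.traineddata" in /opt/tessdata.
cmd = build_tesseract_command(
    "receipt.png", "receipt", lang="receipts", tessdata_dir="/opt/tessdata"
)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once Tesseract and the traineddata file are actually in place.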


On Wednesday, February 14, 2018 at 9:33:05 PM UTC-4, David Reagan wrote:
While experimenting with Mayan, I've noticed that the OCR is pretty unreliable.

CHRNGE instead of CHANGE, HOU instead of HOW, CRSHIER instead of CASHIER, UUU instead of WWW, OOESTIONS instead of QUESTIONS, etc.

Those are all examples on just one receipt. And the preview is pretty darn good looking.

So, is there a way to teach the OCR to get better?

Or some other way to improve OCR results? Maybe a newer version of Tesseract?
