[Mayan EDMS: 1805] Troubleshooting OCR with 2.3 in Docker container?


Hans Fritz
Hi,

I'm confused about the OCR backend setting. The documentation (https://mayan.readthedocs.io/en/v2.3/topics/ocr_backend.html) says the default setting is to use Tesseract. But when I go to the Settings for OCR via the web interface, it shows "ocr.backends.pyocr.PyOCR ..." for OCR_BACKEND. It seems PyOCR can use Tesseract among other engines, but I don't know which one it actually selects.

If I try to OCR a document, it doesn't seem to recognize anything (the OCR section is blank, just has - Page 1 -).

Is there a log somewhere I could check to understand what happened during the OCR process? Is there also somewhere I can check the OCR queue?

Thanks,

--

---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

[Mayan EDMS: 1832] Re: Troubleshooting OCR with 2.3 in Docker container?

rosarior
Administrator
The default backend "ocr.backends.pyocr.PyOCR" uses PyOCR, which in turn uses Tesseract by default. Make sure you have installed the Tesseract language pack that corresponds to the language of the document you are processing.
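
One way to check which language packs Tesseract can see is to ask the binary directly from inside the container. The following is only a minimal sketch, assuming the `tesseract` executable is on the PATH of the environment where it runs; the helper names `parse_list_langs` and `installed_tesseract_languages` are made up for illustration and are not part of Mayan EDMS or PyOCR.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


def installed_tesseract_languages():
    """Return installed language codes, or None if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Some Tesseract releases print the list on stderr, others on stdout,
    # so both streams are parsed together.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_tesseract_languages()
    if langs is None:
        print("tesseract not found on PATH")
    else:
        print("installed language packs:", ", ".join(langs))
```

If the code for the document's language (for example `deu` for German) is missing from the printed list, a blank OCR result like the one described above would be a plausible outcome.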

To check the OCR error logs for the entire system go to 'Profile' -> 'Tools' -> 'OCR Errors'.
For the OCR error of a particular document, go to the document's 'OCR' tab, and then select 'OCR Errors' from the <Actions> dropdown.
To view the OCR queue go to 'Profile' -> 'Tools' -> 'Task manager'. Look for the OCR queue line entry and select either the active, reserved or scheduled task buttons to examine each.

On Thursday, June 22, 2017 at 5:00:05 PM UTC-4, Hans Fritz wrote:
Hi,

I'm confused about the OCR backend setting. The documentation (https://mayan.readthedocs.io/en/v2.3/topics/ocr_backend.html) says the default setting is to use Tesseract. But when I go to the Settings for OCR via the web interface, it shows "ocr.backends.pyocr.PyOCR ..." for OCR_BACKEND. It seems PyOCR can use Tesseract among other engines, but I don't know which one it actually selects.

If I try to OCR a document, it doesn't seem to recognize anything (the OCR section is blank, just has - Page 1 -).

Is there a log somewhere I could check to understand what happened during the OCR process? Is there also somewhere I can check the OCR queue?

Thanks,
