[Mayan EDMS: 92] Document version and OCR


[Mayan EDMS: 92] Document version and OCR

SaintGermain
Hello,

Continuing my evaluation, I found 2 issues:
1) I haven't been able to upload multiple versions of the same document.
How do you do it? I haven't managed to find the relevant action.
2) I have uploaded a simple TXT document and queued it for OCR (yes I
know) just to see what happens.
First, it doesn't detect the txt type (the parser didn't recognize the
type), so it launched tesseract. I then got an error:
get_image_cache_name() takes exactly 3 arguments (2 given)

Indeed, in file apps/ocr/api.py we have:
document_filepath = document_page.document.get_image_cache_name(page=document_page.page_number)

So it seems to be a mistake ?
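
For illustration, the error can be reproduced with a minimal sketch. The class below and its `version` parameter are hypothetical stand-ins, since the real signature of get_image_cache_name() is not shown in this thread; the only point is that a method expecting two arguments besides `self` is being called with one. Python 2 reports this as "takes exactly 3 arguments (2 given)" because it counts `self`; Python 3 words the message differently.

```python
class Document(object):
    # Hypothetical stand-in: the real method's parameters are not shown
    # in this thread, so `version` is an assumed second argument.
    def get_image_cache_name(self, page, version):
        return "doc-v{0}-p{1}".format(version, page)


def call_like_ocr_api(document, page_number):
    """Mimic the call quoted from apps/ocr/api.py, which passes only `page`."""
    try:
        return document.get_image_cache_name(page=page_number)
    except TypeError as exc:
        # One required argument is missing, so Python rejects the call.
        return "TypeError: {0}".format(exc)


result = call_like_ocr_api(Document(), 1)
print(result)
```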

Regards,

[Mayan EDMS: 97] Re: Document version and OCR

rosarior
Administrator
Hi,

To upload a new version of a document, select the document in question, select the tab of the document, and select 'Upload new version' under 'Other available actions'. This will bring up a view similar to the new document upload one, but with the ability to set the version number of the new file as well as enter a comment, and a few other options.

----

To keep page appearance in sync with the text content, every document page needs to be rendered properly before Mayan is able to determine the page count of a document. Text files are rendered by LibreOffice if installed; otherwise the page count defaults to 1 and a MIME type placeholder icon is displayed. I'm trying to create a simple renderer for text files as well as other text-based file types such as .RTF, .PY, .RB, .HTML, .CSS, etc.
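
To make the page count idea concrete, here is a rough sketch of how a plain-text file could be split into pages. It is not Mayan code: the function name and the fixed 60 lines per page are assumptions chosen only for illustration, and a real renderer would also have to deal with fonts, line wrapping, and image output.

```python
def paginate_text(text, lines_per_page=60):
    """Split plain text into pages of at most `lines_per_page` lines."""
    lines = text.splitlines()
    if not lines:
        # An empty file still counts as one (blank) page.
        return [""]
    return [
        "\n".join(lines[start:start + lines_per_page])
        for start in range(0, len(lines), lines_per_page)
    ]


# 130 lines at 60 lines per page gives pages of 60, 60 and 10 lines.
sample = "\n".join("line {0}".format(i) for i in range(130))
pages = paginate_text(sample)
print(len(pages))
```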

Pushed a fix for the error you described, thanks.

[Mayan EDMS: 98] Re: Document version and OCR

SaintGermain
> To keep pages appearance in sync with the text content, every document page
> needs to be rendered properly before Mayan is able to determine the page
> count of a document, text files are rendered by LibreOffice if installed,
> otherwise the page count defaults to 1 and a mime type place holder icon is
> displayed.  I'm trying to create a simple renderer for text files as well
> as other text based file types such as .RTF, .PY, .RB, .HTML, .CSS, etc.
>

LibreOffice is installed on my computer (Debian Testing) but I don't
see any rendering.
How do you check for LibreOffice availability?
Do you have some kind of logging for the results of the third-party
tool/library availability checks?

Thanks,

[Mayan EDMS: 102] Re: Document version and OCR

rosarior
Administrator
You need to install unoconv (https://github.com/dagwieers/unoconv). Every time an office document MIME type is detected, Mayan tries to call unoconv to convert that file into PDF. If there is any failure, the page count defaults to 1 and a placeholder icon is used instead of a preview. Since the unoconv included in most distributions is old, download the one from GitHub into /usr/local/bin/, for example, and set the CONVERTER_UNOCONV_PATH configuration option to point to that executable.

If you already have office or text documents uploaded into Mayan, then after setting up and testing unoconv by hand, go to 'Tools', 'Maintenance', 'Update office documents' page count'. This will force a re-detection and re-processing of any document found to be of office format.
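
The convert-or-fall-back behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not Mayan's actual implementation: the function names are made up, `count_pdf_pages` is a hypothetical callable supplied by the caller, and the default path simply mirrors the /usr/local/bin/ example. The `-f` and `-o` options are unoconv's output format and output file flags.

```python
import os
import subprocess

DEFAULT_PAGE_COUNT = 1  # used whenever conversion fails


def convert_to_pdf(input_path, output_path,
                   unoconv_path="/usr/local/bin/unoconv"):
    """Try to convert an office document to PDF; return True on success."""
    command = [unoconv_path, "-f", "pdf", "-o", output_path, input_path]
    try:
        subprocess.check_call(command)
    except (OSError, subprocess.CalledProcessError):
        # Executable missing, not runnable, or conversion returned an error.
        return False
    return os.path.exists(output_path)


def page_count_for(input_path, output_path, count_pdf_pages,
                   unoconv_path="/usr/local/bin/unoconv"):
    """Return the converted PDF's page count, or the default on any failure.

    `count_pdf_pages` is a hypothetical helper passed in by the caller.
    """
    if convert_to_pdf(input_path, output_path, unoconv_path):
        return count_pdf_pages(output_path)
    return DEFAULT_PAGE_COUNT
```

Testing unoconv by hand first, as suggested, makes it easier to tell a missing or outdated executable apart from a problem with a particular document.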
