[Mayan EDMS: 1491] Duplicate document check on watch folders feature

Victor Zele
We have several watch folders set up for contracts, invoices, quotes, etc.

It would be nice if Mayan validated that a new document does not already exist in the system, perhaps by checking it against a table of MD5 checksums of the current documents, and rejected the new document if it does.

Also, for duplicates, it would be nice to run a cleaner on the /opt/mayan-edms/lib/python2.7/site-packages/mayan/media/document_storage directory of PDFs to clean out duplicates. I can write a shell script to check for PDF duplicates via MD5 sums, but there is no way to automate cleaning them out of the Mayan system/DB.
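
For illustration, the checksum scan described above could look roughly like the following in Python instead of shell. This is only a sketch: the function names file_md5 and find_duplicate_files are made up for this example and are not part of Mayan, and the code only reports duplicates. It deliberately deletes nothing, since removing files from the storage directory by hand would leave the Mayan database pointing at missing files.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large PDFs are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root):
    """Map each checksum seen more than once under root to its sorted paths."""
    by_checksum = defaultdict(list)
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            by_checksum[file_md5(path)].append(path)
    # Keep only checksums shared by two or more files.
    return {
        checksum: sorted(paths)
        for checksum, paths in by_checksum.items()
        if len(paths) > 1
    }
```

Calling find_duplicate_files() on the document_storage directory would return a dictionary whose values are the groups of byte-identical files.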

Just an idea,
Victor

[Mayan EDMS: 1502] Re: Duplicate document check on watch folders feature

rosarior
Administrator
Those are very good ideas! 

There was once a duplicate search feature, but it was removed due to lack of usage and because it ran in the foreground and could take a long time: the checksum of each document was checked against the checksum of every other document, so the running time grew quadratically with the number of documents. If checking for duplicates with a script like the one you describe is the first step, the second step would be to search for those documents by checksum using the API. The checksum field is not currently exposed, so that is another API update that would need to be done.
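
To make the cost difference concrete, here is a small Python sketch comparing the all-pairs approach with grouping documents by checksum in a single pass. It is not the code of the removed feature. The record format (a list of (document id, checksum) tuples) and both function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def duplicates_pairwise(records):
    """Compare every record with every other one: n * (n - 1) / 2 checks."""
    comparisons = 0
    pairs = []
    for (id_a, sum_a), (id_b, sum_b) in combinations(records, 2):
        comparisons += 1
        if sum_a == sum_b:
            pairs.append((id_a, id_b))
    return pairs, comparisons


def duplicates_grouped(records):
    """Bucket records by checksum in one pass: n dictionary insertions."""
    groups = defaultdict(list)
    for doc_id, checksum in records:
        groups[checksum].append(doc_id)
    # Only checksums shared by several documents are duplicates.
    return {c: ids for c, ids in groups.items() if len(ids) > 1}
```

With 10,000 documents the pairwise version performs roughly 50 million comparisons, while the grouped version touches each record once.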

Skipping duplicates from the watch folder would be less difficult, since it takes just a single query to see if the checksum is already in the database.
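
As a rough sketch of that single-query check, the Python example below uses an in-memory SQLite table as a stand-in for the document database. The table layout, the function names (open_store, checksum_exists, ingest) and the choice of MD5 are all assumptions for illustration and do not reflect Mayan's actual schema or upload code.

```python
import hashlib
import sqlite3


def open_store():
    """Create an in-memory table standing in for the document database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, label TEXT NOT NULL, checksum TEXT NOT NULL)"
    )
    # The index keeps the existence check fast as the table grows.
    conn.execute("CREATE INDEX idx_documents_checksum ON documents (checksum)")
    return conn


def checksum_exists(conn, checksum):
    """Single query: is any stored document already using this checksum?"""
    row = conn.execute(
        "SELECT 1 FROM documents WHERE checksum = ? LIMIT 1", (checksum,)
    ).fetchone()
    return row is not None


def ingest(conn, label, content):
    """Store the document unless its checksum is known; return True if stored."""
    checksum = hashlib.md5(content).hexdigest()
    if checksum_exists(conn, checksum):
        return False  # duplicate: skip it
    conn.execute(
        "INSERT INTO documents (label, checksum) VALUES (?, ?)", (label, checksum)
    )
    return True
```

A watch folder handler built along these lines would call ingest() once per incoming file and skip any file for which it returns False.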

I'm updating the roadmap wiki (https://gitlab.com/mayan-edms/mayan-edms/wikis/roadmap/) and will add these.

Thank you!

On Monday, January 30, 2017 at 8:22:49 PM UTC-4, Victor Zele wrote:
We have several watch folders set up for contracts, invoices, quotes, etc.

It would be nice if Mayan validated that a new document does not already exist in the system, perhaps by checking it against a table of MD5 checksums of the current documents, and rejected the new document if it does.

Also, for duplicates, it would be nice to run a cleaner on the /opt/mayan-edms/lib/python2.7/site-packages/mayan/media/document_storage directory of PDFs to clean out duplicates. I can write a shell script to check for PDF duplicates via MD5 sums, but there is no way to automate cleaning them out of the Mayan system/DB.

Just an idea,
Victor
