[Mayan EDMS: 1851] Avoiding duplication

[Mayan EDMS: 1851] Avoiding duplication

Kevin Lyda
Say I have a file named FLOYD.PDF. Is there a way to get a hash (MD5, SHA-1, whatever) of the file and then check via the REST API whether I've already added it? I know I could store the hash in metadata and search on it, but is there a better way?

Essentially, I have stored files in a number of places over the years, and I'd like to centralise them in Mayan. However, I'd rather not add files multiple times: I'd like a job, run on several machines, that tries to add files from likely directories and can be run over and over without constantly adding duplicate files.
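A job like this can be sketched with the standard library alone: hash every candidate file, ask whether that checksum is already known, and upload only new content. In this sketch `checksum_exists` and `upload` are hypothetical callables (for example thin wrappers around the Mayan EDMS REST API), and SHA-256 is an assumption; the digest has to match whatever algorithm the server uses for its stored checksums.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path, chunk_size=64 * 1024):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a b"" sentinel keeps reading until end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_directories(roots, checksum_exists, upload, extensions=(".pdf",)):
    """Upload files under `roots` whose content is not yet on the server.

    checksum_exists(checksum) -> bool and upload(path) -> None are
    HYPOTHETICAL callables supplied by the caller, e.g. thin wrappers
    around the Mayan EDMS REST API. Returns (uploaded, skipped) lists.
    """
    wanted = {ext.lower() for ext in extensions}
    seen = set()  # checksums handled during this run, across all roots
    uploaded, skipped = [], []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = Path(dirpath) / name
                if path.suffix.lower() not in wanted:
                    continue  # not a file type we want to import
                checksum = sha256_of(path)
                if checksum in seen or checksum_exists(checksum):
                    skipped.append(path)  # same bytes already handled
                else:
                    upload(path)
                    uploaded.append(path)
                seen.add(checksum)
    return uploaded, skipped
```

Because the decision depends only on file contents, rerunning the job (or running it on another machine against the same server) skips files that were already uploaded. Two machines uploading the same new file at the same moment can still both succeed, so that case has to be caught on the server side.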

Kevin

--

---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
[Mayan EDMS: 1852] Re: Avoiding duplication

rosarior
Administrator
Two changes related to this have just landed in the codebase:

- "Add support to search documents and document pages by checksum." (https://gitlab.com/mayan-edms/mayan-edms/commit/0820d0c0e6519853b1218e73993201ada688b4e8)
- "Add duplicated document scan support." (https://gitlab.com/mayan-edms/mayan-edms/commit/d4e1a506edc6a0d7dd2d25393b84a0795b307ec1)

The first allows documents to be searched by checksum. This is useful when you already know the checksum of a document and want to check, via the UI or the API, whether another copy of it already exists. The second commit adds an automatic duplicate search on upload: after a document is uploaded, the system searches in the background for duplicates and appends any it finds to the list of duplicates. This code branch is at the "Release candidate" stage and will be released soon.
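With checksum search available, the lookup step can go through the REST API. The sketch below uses only the standard library. The endpoint path, the `q` query parameter and the `Token` authorization header are assumptions (the header follows the Django REST Framework token convention); confirm all three against the API documentation of your own Mayan EDMS installation, and compute the local checksum with the same algorithm the server uses.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL endpoint: the search URL, the "q" parameter, and whether the
# checksum field is covered by simple search depend on your Mayan EDMS
# version. Check the API documentation of your installation first.
SEARCH_PATH = "/api/search/documents.Document/"


def build_checksum_search_url(base_url, checksum, search_path=SEARCH_PATH):
    """Return the search URL for one checksum, with the query string encoded."""
    query = urllib.parse.urlencode({"q": checksum})
    return base_url.rstrip("/") + search_path + "?" + query


def count_matches(response_body):
    """Count hits in a JSON body shaped like {"count": n, "results": [...]}."""
    data = json.loads(response_body)
    if "count" in data:
        return int(data["count"])
    return len(data.get("results", []))


def checksum_exists(base_url, checksum, token, timeout=30):
    """True if the search returns at least one document for `checksum`."""
    request = urllib.request.Request(
        build_checksum_search_url(base_url, checksum),
        # Assumes Django REST Framework token authentication.
        headers={"Authorization": "Token " + token, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return count_matches(response.read()) > 0
```

`functools.partial(checksum_exists, base_url, token=token)` turns this into a one-argument predicate for an import loop. Note that `count_matches` only counts hits; it does not confirm that each hit's stored checksum equals the one searched for.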

In reply to Kevin Lyda (Wednesday, July 5, 2017, 9:25:43 AM UTC-4)

[Mayan EDMS: 1852] Re: Avoiding duplication

rosarior
Administrator

In reply to Roberto Rosario (Thursday, July 6, 2017, 3:03:18 AM UTC-4)