[Mayan EDMS: 2470] Watch Folder and Scanner issue


[Mayan EDMS: 2470] Watch Folder and Scanner issue

Eddi
Hi,

I am using Mayan 2.7.3. I have set up a watch folder that a scanner saves files to.

When the scanner starts scanning, it creates a file in the folder. If I watch the folder from the command line while the scanner is scanning, I can see the file grow in size.

The problem occurs when Mayan's watch folder check runs while the scanner is still scanning: the file is present but not yet complete, and Mayan EDMS grabs it anyway.
The result is a corrupt document (I mostly scan to PDF). In Mayan it shows up as a red question mark.

How can I work around this?

Cheers,

Eddi


--

---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: [Mayan EDMS: 2471] Watch Folder and Scanner issue

Jesaja Everling
Hi Eddi,

I'm not familiar with the code that handles the watchfolder monitoring, and it has been a while since I worked with watchfolders myself. I think you probably have two options: make sure the watcher acts on the appropriate event, e.g. IN_CLOSE_WRITE in http://seb.dbzteam.org/pyinotify/pyinotify-module.html#IN_CLOSE_WRITE, or use a spin-loop that checks whether the file is still growing and only processes it once it has stopped growing for some time (which sounds less reliable than relying on pyinotify, watchdog, or a similar solution).
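A minimal sketch of the spin-loop option, using only the Python standard library. The helper `wait_until_stable` and its `interval`, `stable_checks`, and `timeout` parameters are hypothetical names, not part of Mayan; the sketch assumes a file is complete once its size has stayed the same for several consecutive polls. The IN_CLOSE_WRITE option is not shown, since it needs pyinotify or a similar library.

```python
import os
import time


def wait_until_stable(path, interval=2.0, stable_checks=3, timeout=300.0):
    """Poll `path` until its size stops changing (hypothetical helper).

    Returns True once the size has been identical for `stable_checks`
    consecutive polls after the first reading, or False if `timeout`
    seconds pass first.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    unchanged = 0
    while time.monotonic() < deadline:
        size = os.path.getsize(path)
        if size == last_size:
            unchanged += 1
            if unchanged >= stable_checks:
                return True
        else:
            # Size changed (or this is the first reading): restart the count.
            last_size = size
            unchanged = 0
        time.sleep(interval)
    return False
```

The unreliability mentioned above shows up directly in the parameters: a scanner that pauses between pages for longer than roughly `interval * stable_checks` seconds will look finished to this check.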

Don't know how much that helps, hope it does.

Best Regards,

Jesaja Everling


On Thu, May 24, 2018 at 4:44 PM, Eddi <[hidden email]> wrote:

Re: [Mayan EDMS: 2472] Watch Folder and Scanner issue

Jesaja Everling
I took a very cursory look at the source code, and it seems that the WatchfolderSource just checks whether a file exists (using a periodic Celery task, if I'm not mistaken) and processes it if it does. So I think that could be improved.

@Roberto, do you think it would be worth having a standalone watchfolder daemon that can be configured to push files to Mayan, e.g. using the API? That would also allow it to run on a local machine and push to a remote Mayan instance. Or do you prefer to keep it a Celery task? If it's a Celery task, I don't think you can use a file watcher like pyinotify (it requires a continuously running process, which is not really what Celery is intended for), but you could still have the task wait for the files it encounters to stop growing before processing them further.
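A rough sketch of what the standalone-daemon option could look like, in Python with only the standard library. It assumes a slightly different "stopped growing" heuristic than a size comparison: a file counts as finished once its modification time is at least `quiet_seconds` old. `run_once`, `run_forever`, and the `upload` callable are hypothetical names; `upload` stands in for whatever would push the file to Mayan (for example through its API), whose details are not covered in this thread.

```python
import os
import time


def run_once(folder, upload, quiet_seconds=30.0, now=None):
    """One polling pass of a hypothetical standalone watchfolder daemon.

    Files not modified for at least `quiet_seconds` are handed to `upload`
    and then deleted; more recently modified files are left for a later pass.
    Returns the list of paths that were uploaded.
    """
    now = time.time() if now is None else now
    uploaded = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        if now - os.path.getmtime(path) < quiet_seconds:
            continue  # modified too recently; the scanner may still be writing
        upload(path)  # hypothetical callable that pushes the file to Mayan
        os.remove(path)
        uploaded.append(path)
    return uploaded


def run_forever(folder, upload, poll_interval=10.0, quiet_seconds=30.0):
    """Poll `folder` indefinitely as its own long-running process (not a Celery task)."""
    while True:
        run_once(folder, upload, quiet_seconds)
        time.sleep(poll_interval)
```

Because it runs as its own process and only needs a way to reach the server, such a daemon could sit on the machine the scanner writes to and push to a remote Mayan instance.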

On Thu, May 24, 2018 at 7:19 PM, Jesaja Everling <[hidden email]> wrote:

Re: [Mayan EDMS: 2482] Watch Folder and Scanner issue

rosarior
Administrator
We've tried pyinotify in the past and, as you've mentioned, it has its own set of challenges. The proposed idea for the new watchfolder feature is to make it a two-pass process.

A process grabs a lock so that it can scan the watchfolder exclusively. It scans the folder, writes each file's path (or hash; we are still debating this) and current size to the database, and releases the lock. On the next period, the same process or another one grabs the lock and scans the watchfolder again. If a file in the folder appears in the database with the same size, it is assumed to be complete, so it is uploaded and deleted. If the size doesn't match, the file is assumed to still be in the process of being created and is left alone. If the file is not in the database, it is assumed to be new, so it is added to the database with these properties and left alone until the next pass.

This proposal fixes the problem of not knowing when a file has finished being created, while being platform agnostic and not depending on third-party libraries. We wanted to add this for version 3.0, but the changes were too many and the specification still needs finalizing; it will be added in a future minor version.
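A minimal sketch of the two-pass scheme described above, in Python using only the standard library. Several details are assumptions, not part of the proposal: the database is SQLite, the lock is SQLite's `BEGIN EXCLUSIVE` transaction, files are keyed by path rather than hash, a size mismatch updates the stored size, and `open_db`, `two_pass_scan`, the `pending` table, and the `upload` callable are all hypothetical names.

```python
import os
import sqlite3


def open_db(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode so that
    # BEGIN/COMMIT can be issued explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    # Hypothetical schema: one row per file seen on a previous pass.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending (path TEXT PRIMARY KEY, size INTEGER)"
    )
    return conn


def two_pass_scan(conn, folder, upload):
    """Run one pass over `folder`; returns the paths uploaded in this pass."""
    uploaded = []
    # Assumption: an exclusive SQLite transaction stands in for the lock.
    conn.execute("BEGIN EXCLUSIVE")
    try:
        present = set()
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue
            present.add(path)
            size = os.path.getsize(path)
            row = conn.execute(
                "SELECT size FROM pending WHERE path = ?", (path,)
            ).fetchone()
            if row is None:
                # Not in the database: new file, record it and wait a pass.
                conn.execute(
                    "INSERT INTO pending (path, size) VALUES (?, ?)", (path, size)
                )
            elif row[0] == size:
                # Same size as last pass: assume complete, upload and delete.
                upload(path)  # hypothetical callable that ingests the file
                os.remove(path)
                conn.execute("DELETE FROM pending WHERE path = ?", (path,))
                uploaded.append(path)
            else:
                # Size changed: still being written, remember the new size.
                conn.execute(
                    "UPDATE pending SET size = ? WHERE path = ?", (size, path)
                )
        # Forget rows for files that disappeared between passes.
        for (path,) in conn.execute("SELECT path FROM pending").fetchall():
            if path not in present:
                conn.execute("DELETE FROM pending WHERE path = ?", (path,))
        conn.execute("COMMIT")  # releases the lock
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return uploaded
```

Like any polling approach, the size comparison treats a file whose size happens not to change between two passes as complete, so the period between passes should be longer than the scanner's longest pause between pages.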


On Thursday, May 24, 2018 at 1:30:41 PM UTC-4, Jesaja Everling wrote: