[Mayan EDMS: 1713] Large Document Repository

Gerrit Van Dyk
Hi

Our implementation will have at least 5 000 documents added to it per day. This will grow the repository by almost 2 million documents per annum.

As Mayan EDMS stores all documents physically in one folder, should we be concerned about this? If so, how should we split the uploaded files over a directory structure?

Are there any precautions that we should be aware of before we set up such a large repository?

Gerrit

--

---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
[Mayan EDMS: 1718] Re: Large Document Repository

rosarior
Administrator
Hi,

Most common filesystems support many millions of files (see ext4, for example: https://en.wikipedia.org/wiki/Ext4). You can do folder sharding, as you mention, by subclassing the built-in filesystem storage backend and storing each document's file in a folder based on the first character, for example. To avoid any possible problems in the future, though, I would recommend starting with object storage (S3, for example) from the beginning. The other advantage is that you abstract storage further: the Mayan installation has no knowledge of how files are actually stored (filesystem, ext4, XFS, RAID, local, remote, etc.).
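A minimal sketch of the folder sharding idea, assuming the shard is taken from the first character(s) of the stored file name. The function names `sharded_name` and `save_sharded` are made up for illustration and are not part of Mayan or Django; a storage subclass could apply the same naming rule before it writes the file.

```python
import os


def sharded_name(name, depth=1):
    """Prefix a file name with `depth` single-character folders."""
    base = os.path.basename(name)
    stem = os.path.splitext(base)[0]
    # Only letters and digits become folder names; lowercasing keeps the
    # number of folders small on case-sensitive filesystems.
    shards = [ch.lower() for ch in stem if ch.isalnum()][:depth]
    return "/".join(shards + [base])


def save_sharded(root, name, data, depth=1):
    """Write `data` (bytes) under `root` at the sharded location; return the path."""
    path = os.path.join(root, *sharded_name(name, depth).split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

With `depth=2`, a file named `3f2a9c.bin` would land in `3/f/3f2a9c.bin`. If the stored names are random hexadecimal strings, two levels give up to 256 folders, so 2 million documents per year would add roughly 8 000 files per folder per year.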

The other recommendation would be to use RabbitMQ as the broker, in a cluster setup with a few nodes. The same goes for Redis as the result backend.
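As a rough illustration, the broker and result backend could be set in the Django settings file along these lines. The host names, user and password are placeholders, and the old-style Celery setting names `BROKER_URL` and `CELERY_RESULT_BACKEND` are an assumption about the Celery version in use; check the names your installation expects.

```python
# Placeholder host names for a three-node RabbitMQ cluster.
RABBITMQ_NODES = [
    "rabbit1.example.com",
    "rabbit2.example.com",
    "rabbit3.example.com",
]

# Celery accepts several broker URLs of the same transport in one
# semicolon-separated string and fails over between them.
BROKER_URL = ";".join(
    "amqp://mayan:secret@{}:5672//".format(host) for host in RABBITMQ_NODES
)

# Task results are kept in Redis (database 0 on a placeholder host).
CELERY_RESULT_BACKEND = "redis://redis1.example.com:6379/0"
```

Listing every cluster node lets a worker reconnect to another node if the one it was using goes down.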

Lastly, spread the queues over several workers to avoid tasks piling up in one queue and blocking other tasks.
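One way to picture the queue split is a small helper that deals queue names out to workers and prints the matching `celery worker` command for each. This is only a sketch: the queue names in the usage note and the app name `mayan` are illustrative assumptions, so substitute the queues your installation actually defines.

```python
def assign_queues(queues, worker_count):
    """Deal queue names out to workers round-robin; drop empty workers."""
    groups = [[] for _ in range(worker_count)]
    for index, queue in enumerate(queues):
        groups[index % worker_count].append(queue)
    return [group for group in groups if group]


def worker_commands(queues, worker_count, app="mayan"):
    """Build one `celery worker` command line per group of queues."""
    return [
        "celery -A {} worker -Q {} -n worker{}@%h".format(
            app, ",".join(group), number
        )
        for number, group in enumerate(
            assign_queues(queues, worker_count), start=1
        )
    ]
```

For example, `worker_commands(["converter", "ocr", "uploads"], 2)` would put `converter` and `uploads` on the first worker and `ocr` on the second. Round-robin is the simplest split; in practice it may be better to give a slow queue such as OCR a worker of its own so that it cannot hold up uploads.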

This setup will be costly in terms of memory usage, but the trade-off here is memory in exchange for scalability.

I look forward to any details you can share from this setup; it would make a good case study for further improvements.

(In reply to Gerrit Van Dyk's message of Friday, May 19, 2017 at 4:33:54 AM UTC-4.)
