[Mayan EDMS: 2264] I need help with document_analyzer basics

David Reagan
Hey all,

I finally have time to experiment with Mayan-EDMS some more. So I'm back at trying to get https://gitlab.com/startmat/document_analyzer working the way I want.

Unfortunately, I can't seem to figure it out.

I'm currently testing on a vagrant instance. See: https://gitlab.com/mayan-edms/mayan-edms-vagrant

I ended up copying the document_analyzer app into the apps directory to get it loading.

I am using an Albertsons receipt to test with. The first two lines of OCR look like:

4S Albertsons
It's just better.

I made an analyzer and assigned the 'receipt' document type to it. (That is the type I created, and it is the type shown on the Albertsons receipt's properties page.)

Parameter:
first;(?ims)(?P<albertsons>(.*Albertsons.*))


This should cause document_analyzer to add an "albertsons" field to either the metadata or properties of the document. Am I wrong?

I also made an analyzer based on the document_analyzer's README.

Parameter:
first;(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)

I just added "Albertsons" to the list of words to look for.


This should cause document_analyzer to add a "Creator" field to either the metadata or properties of the document. Am I wrong?
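The two parameters above can be checked locally with Python's `re` module before involving Mayan at all. This is only a sketch: the `first;` prefix is left out because how document_analyzer interprets it is not shown in this thread, and `named_groups` is a helper name made up for the example.

```python
import re

# First two OCR lines quoted in the post.
OCR_TEXT = "4S Albertsons\nIt's just better."

# The two analyzer parameters from the post, minus the "first;" prefix
# (how document_analyzer handles that prefix is not covered here).
PATTERNS = {
    "albertsons": r"(?ims)(?P<albertsons>(.*Albertsons.*))",
    "Creator": r"(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)",
}


def named_groups(pattern, text):
    """Return the named groups of the first match, or {} if nothing matches."""
    match = re.search(pattern, text)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, "->", named_groups(pattern, OCR_TEXT))
```

Note that the first pattern captures the entire text, because the `s` flag lets `.` match newlines, while the second captures only the vendor name.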


I used the menu item "Submit to analyze" http://localhost:8080/document_analyzer/analyzer/1/submit/ to run document_analyzer.


All I can see in the logs is that I clicked that menu item. Nothing is added to either the metadata or the properties of the document.


If I test:


(?ims).*albertsons.*


on http://www.pyregex.com/ with the first two lines of the document, it reports a success.


/usr/share/mayan-edms/mayan/settings/local.py looks like:


from __future__ import absolute_import, unicode_literals

from .base import *

SECRET_KEY = '5(kv&ow31r2m9e^#c65v%ppiwiv9epu-hxa*1jsa1#m5bi!g7+'

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mayan_edms',
        'USER': 'mayan',
        'PASSWORD': 'test123',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}

INSTALLED_APPS += (
    'document_analyzer',
)

BROKER_URL = 'redis://127.0.0.1:6379/0'
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/0'

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(name)s %(process)d %(thread)d %(message)s'
        },
        'intermediate': {
            'format': '%(name)s <%(process)d> [%(levelname)s] "%(funcName)s() %(message)s"'
        },
        'simple': {
            'format': '%(levelname)s %(message)s'
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'intermediate'
        }
    },
    'loggers': {
        #'documents': {
        #    'handlers':['console'],
        #    'propagate': True,
        #    'level':'DEBUG',
        #},
        #'common': {
        #    'handlers':['console'],
        #    'propagate': True,
        #    'level':'DEBUG',
        #},
        'document_analyzer': {
            'handlers': ['console'],
            'propagate': True,
            'level': 'DEBUG',
        },
    }
}
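To rule out the logging setup itself, the `document_analyzer` logger entry can be tried outside Django with the standard library's `logging.config.dictConfig`. This is a rough sketch with a trimmed copy of the dict above; the child logger name `document_analyzer.example` is made up for the test and is not a real module of the app.

```python
import logging
import logging.config

# Trimmed copy of the LOGGING dict from local.py: one formatter,
# one console handler, and the document_analyzer logger.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "intermediate": {
            "format": '%(name)s <%(process)d> [%(levelname)s] "%(funcName)s() %(message)s"'
        },
    },
    "handlers": {
        "console": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",
            "formatter": "intermediate",
        },
    },
    "loggers": {
        "document_analyzer": {
            "handlers": ["console"],
            "propagate": True,
            "level": "DEBUG",
        },
    },
}


def configure():
    """Apply the config and return the document_analyzer logger."""
    logging.config.dictConfig(LOGGING)
    return logging.getLogger("document_analyzer")


if __name__ == "__main__":
    logger = configure()
    logger.debug("parent logger reaches the console handler")
    # Hypothetical child logger; it inherits the parent's level and handler.
    logging.getLogger("document_analyzer.example").debug("child logger works too")
```

If both messages show up on stderr, the logger entry is fine, and the missing output is more likely because the app logs little or because the work runs in a separate worker process.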


Does anyone have any tips? Am I missing a step somewhere?

--

---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
[Mayan EDMS: 2266] Re: I need help with document_analyzer basics

Matthias Löblich
Hi David,

You can view the document_analyzer result by opening the document version page and selecting "Analyzer result" from the "Actions" menu of the relevant document version.

The analyzer result is not stored as metadata; it uses its own structure. You can build Mayan indexes based on the analyzer result.

For your example, you can build an index like this: {{ document.analyzer_value_of.Creator }}

br
Matthias

Re: [Mayan EDMS: 2266] Re: I need help with document_analyzer basics

David Reagan
Thanks Matthias.

Now I know where to look.

When I read the docs the other day, I thought indexes seemed similar to
a folder structure. Is that an ok way to think of them?

Is there a way to use document_analyzer to add tags, metadata, or
properties?

For example, if I upload a receipt from Amazon.com, I'd like to add it
to the "2018->Amazon" index, tag it with something pulled from the Items
Ordered section, and add metadata that includes: total, billed date,
ordered date, Amazon.com order number, and what card I used.
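On the regex side, fields like these could each be captured with a named group, in the same style as the analyzer parameters earlier in the thread. The sketch below only shows the extraction. Whether document_analyzer can write the values into tags or metadata is the open question here. The sample text, the label wording, and the order-number format are all invented for illustration, not taken from a real receipt.

```python
import re

# Invented sample text standing in for the OCR output of an order receipt.
SAMPLE = """Order Placed: February 10, 2018
Amazon.com order number: 123-4567890-1234567
Order Total: $42.17
"""

# One pattern per field; the labels and formats are assumptions.
FIELD_PATTERNS = {
    "ordered_date": r"(?i)order placed:\s*(?P<ordered_date>[A-Za-z]+ \d{1,2}, \d{4})",
    "order_number": r"(?i)order number:\s*(?P<order_number>\d{3}-\d{7}-\d{7})",
    "total": r"(?i)order total:\s*\$(?P<total>\d+\.\d{2})",
}


def extract_fields(text):
    """Collect the named groups from the first match of each pattern."""
    fields = {}
    for pattern in FIELD_PATTERNS.values():
        match = re.search(pattern, text)
        if match:
            fields.update(match.groupdict())
    return fields


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```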

--
- David Reagan

Re: [Mayan EDMS: 2441] Re: I need help with document_analyzer basics

Alan
Bump
