BUG: Entities automatically generated from mentions in an investigation during cross reference #2994
Comments
Hi @tom-claessens, thanks for opening this issue!
Just to clarify: when you say "cross-reference", did you only trigger the automatic cross-reference process by clicking the "compute" button in the cross-referencing section? Or did you also manually rate the cross-referencing results ("Same"/"Unsure"/"Different")?
Hi @tillprochaska, I think it happened in both situations. I'm not entirely sure, as both my colleagues and I have encountered it. I think most of us are not very tempted to rate all cross-reference results, as sometimes there are thousands of results to rate. Does this mean that Aleph is supposed to add new entities from the cross-reference results that were manually rated "Same"?
I will need to reproduce the issue and get some more information from others, as I'm not super familiar with the feature. If this is only happening for xref matches that are rated manually, I could imagine that this is intended behavior. I'll get back to you when I have more information.
I have been able to reproduce this issue:
When viewing these entities, you can actually see that they are still linked to the source document using the

For further debugging, these logs may help find the relevant parts of the source code that trigger this behavior. Note that "[Test] Entities generated from mentions" is the title of the investigation I created for testing.
Additional context from @brrttwrks:
I was able to confirm that the current behavior is indeed intended. It was implemented some time ago as an "experiment", with the expectation that there would be more iterations to refine the feature in the future, but that never happened. The idea behind it was the following: when Aleph extracts mentions of names from a document and is then able to find similar Person/Company entities in other datasets (e.g. in a companies registry or census database), it is likely that the name is that of a person or company, respectively. We do, however, understand that the current behavior is confusing and inconsistent and can lead to cluttered investigations, and we will consider adjusting or removing the behavior.
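To make the idea described above concrete, here is a minimal sketch in Python. It is not Aleph's actual implementation: `Mention`, `Entity`, `normalize` and `promote_mentions` are hypothetical names invented for this illustration, and the "similar entity" lookup is simplified to an exact match on a normalized name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    name: str          # name extracted from a document
    document_id: str   # document the name was found in


@dataclass(frozen=True)
class Entity:
    schema: str                                # e.g. "Person" or "Company"
    name: str
    source_document_id: Optional[str] = None   # set for entities derived from a mention


def normalize(name: str) -> str:
    """Rough normalization for illustration: lowercase, collapse whitespace."""
    return " ".join(name.lower().split())


def promote_mentions(mentions, reference_entities):
    """Create an entity for each mention whose name matches a reference entity.

    The new entity takes its schema from the matched reference entity and
    stays linked to the document the mention came from.
    """
    by_name = {}
    for entity in reference_entities:
        by_name.setdefault(normalize(entity.name), entity)

    created = []
    for mention in mentions:
        match = by_name.get(normalize(mention.name))
        if match is not None:
            created.append(Entity(match.schema, mention.name, mention.document_id))
    return created


# Hypothetical example data: a small "registry" and two extracted mentions.
registry = [Entity("Company", "Acme Holdings Ltd"), Entity("Person", "Jane Doe")]
mentions = [Mention("acme  holdings ltd", "doc-1"), Mention("Unknown Name", "doc-2")]
created = promote_mentions(mentions, registry)
```

In this sketch only the first mention is turned into an entity, because only its name has a counterpart in the reference data. The second mention stays a plain mention.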
One additional small detail I just observed: When cross-referencing a collection with mentions, entities are created as outlined in this thread. When I then delete the entity that was automatically created, the respective cross-referencing match is deleted as well (makes sense). When I re-run the cross-referencing, the mention is ignored, i.e., two cross-referencing runs with the same data lead to different results.
Describe the bug
When cross-referencing an investigation with a large number of uploaded documents in Aleph, the mentions within the documents are automatically converted into entities.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Ideally, cross-referencing should happen without generating entities within the investigation, so that the cross-referencing consists of:
Aleph version
Latest version. Problem is encountered within the Aleph instance of Follow the Money (NL).
Screenshots
Example of unwanted generated entities