Documents inexplicably orphaned #1581

Open
mattcg opened this issue Jan 20, 2021 · 7 comments

Comments

@mattcg
Contributor

mattcg commented Jan 20, 2021

This is a bit difficult to reproduce, and I have tried debugging it without getting anywhere. Periodically, some documents that sit deep within a directory hierarchy appear at the root of the dataset as copies of the originals, orphaned from their parent directory. After I delete these orphan documents, some event (a re-index, re-ingest or upgrade) seems to trigger their reappearance.

In other instances, these are not actual documents but empty 'Table' documents. Again, when deleted they reappear. If I were to guess, I'd imagine it's some race condition, such as attempting to index the child before the parent document is indexed, but this is just an uneducated guess.
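For anyone trying to confirm the symptom on their own dataset, here is a rough sketch of how the orphaned copies could be spotted in an export of the indexed documents. It assumes each document is a plain dict with hypothetical `id`, `parent` and `content_hash` keys; these names are illustrative and not taken from the project's actual schema or API. It only detects root-level copies of nested documents, and would not catch the empty 'Table' case unless those carry a hash.

```python
def find_orphan_copies(docs):
    """Return ids of root-level docs whose content hash also appears on a nested doc.

    `docs` is an iterable of dicts with the hypothetical keys
    "id", "parent" (None for root-level docs) and "content_hash".
    """
    docs = list(docs)

    # Hashes of every document that does have a parent, skipping missing hashes.
    nested_hashes = {
        d["content_hash"]
        for d in docs
        if d.get("parent") is not None and d.get("content_hash")
    }

    # A root-level document sharing a hash with a nested one is a suspected orphan copy.
    return [
        d["id"]
        for d in docs
        if d.get("parent") is None and d.get("content_hash") in nested_hashes
    ]


if __name__ == "__main__":
    sample = [
        {"id": "folder", "parent": None, "content_hash": None},
        {"id": "nested", "parent": "folder", "content_hash": "abc"},
        {"id": "stray", "parent": None, "content_hash": "abc"},
    ]
    print(find_orphan_copies(sample))
```

Running this before and after a re-index might at least narrow down which event brings the copies back.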

@pudo
Contributor

pudo commented Jan 20, 2021

Can you debug it and submit a patch, please?

@mattcg
Contributor Author

mattcg commented Jan 20, 2021

Yes, will do so!

@mattcg
Contributor Author

mattcg commented Mar 18, 2021

I was able to replicate this after re-indexing a very large dataset. It's certainly a bug; I just haven't discovered the cause yet.

@pudo
Contributor

pudo commented Mar 25, 2021

If you have any lead on what the parent document of the stray fragment might be (and ideally share it), that would help us debug it.

@sunu
Contributor

sunu commented Feb 8, 2022

This might have been related to #3923

sunu added the ingest label Feb 8, 2022
@pudo
Contributor

pudo commented Feb 8, 2022

What a debug find, @sunu. This one had been killing me for ages. That's a super logical explanation....

@mattcg
Contributor Author

mattcg commented Feb 8, 2022

Wow sunu, incredible find! Yes, this is definitely the reason. It also explains a problem we were constantly facing, of Tables showing up in search results without a parent document that could be downloaded.
