-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to search over triples with their IDs #141
Conversation
Hello, can we merge this to 3.0.0 as well? Because we still have some other modifications based on this PR. |
Hi @AlyHdr , LGTM, but your PR removed a lot of the javadoc, was this intentional? |
Hello @mielvds, there was a problem with the java doc generation when compiling with java 11 I think and my IDE just reformated stuff, so it's not intentional at all... |
There's a jena4 branch which lifts everything to Java 11. I'm planning to include this in 2.3, so I will rebase the 3.0.0 branch on that branch as well. You'll have to probably align your PR's |
Branch has been rebased and is now java 11. Can you check the docs? It's mostly return types that are removed. |
I just saw that there is a branch where Javier tried already to implement this functionality: https://github.com/rdfhdt/hdt-java/tree/triplePosition we maybe need to compare and see what was his proposed solution ..... |
In fact the API that he proposes is much cleaner ... instead of having a separate iterator: IteratorTripleString searchWithId(CharSequence subject, CharSequence predicate, CharSequence object) he adds a new method public long getNextTriplePosition() this is very elegant. I know that there might be performance problems with this. Making this logic work automatically for every iterator might be problematic. I will check Javiers implementation. Maybe he found a way to avoid performance problems |
We already tried first to have a method over the TripleString and TripleID classes to get the position, but that implies that we have to always find the position(i.e performance problems as @D063520 said) because the API doesn't support any flags or something, so one either have to add a new search method as it's done in the PR or do it as Javier did it in order to stay reverse compatible with the original API. |
But for example this one: |
Yes it would be cool if this can be implemented in all the other iterators without the need to do a search... |
We skip this pull request in favour of the pull request #150 since it offers a more elegant solution on the API level with the same functionality |
We introduce a new feature to be able to search over the triples and get an ID of the found list or one hit of triples, as well we provide a function find(long id) over the triples which basically takes an index or ID in the list of triples and looks up in the trees in a bottom up fashion to find the given tripleID.
This code doesn't change the original hdt-api but adds two extra functions:
For the search with ID method all the modifications are over the bitmap triples iterators of the different query patterns, all of the modifications doesn't cost any performance issue because we take into account the indexes used to find the matching triple patterns and we use the index of Z (the object in this case) as an identifier of the triple or the index in the list.
P.S: There is only one case when we use BitmapTriplesIteratorZFOQ to search over a fixed object we need to do a binary search over the extra object list in order to find the position of the given object z.
Regarding the find triple by ID, we go up in the tree using the indexes and rank1 operations over the bitmaps:
example: find(3) in the above figure.
triples list:
1 4 5, pos = 1
2 2 1, pos = 2
2 3 3, pos = 3
2 3 4, pos = 4
2 5 2, pos = 5
3 1 1, pos = 6
4 1 1, pos = 7
solution: z = seqZ.get(3-1) = 3 ( id at position 2), posY = bitmapZ.rank1(3 -1) = 2, y = seqY.get(posY) = 3,
posX = bitmapY.rank1(2 - 1) = 1, x = seqX.get(posX) = 2, then => x = 2, y =3, z =3
Triple (2 3 3) is at position 3 in the list.
All those operations must be done in constant time because we just use some binary operations to access and find the positions of the triple.
A test has been added to make sure all the functionalities added are working over different triple patterns.