Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to search over triples with their IDs #141

Closed
wants to merge 2 commits into from
Closed

Ability to search over triples with their IDs #141

wants to merge 2 commits into from

Conversation

AlyHdr
Copy link

@AlyHdr AlyHdr commented Jan 13, 2022

We introduce a new feature to be able to search over the triples and get an ID of the found list or one hit of triples, as well we provide a function find(long id) over the triples which basically takes an index or ID in the list of triples and looks up in the trees in a bottom up fashion to find the given tripleID.

This code doesn't change the original hdt-api but adds two extra functions:

  • searchWithId(TripleID pattern): Iterates over all triples that match the pattern, and their IDs (i.e. index in the triples list)
  • find(long index): Returns a tripleID of the given index.

For the search with ID method all the modifications are over the bitmap triples iterators of the different query patterns, all of the modifications doesn't cost any performance issue because we take into account the indexes used to find the matching triple patterns and we use the index of Z (the object in this case) as an identifier of the triple or the index in the list.

P.S: There is only one case when we use BitmapTriplesIteratorZFOQ to search over a fixed object we need to do a binary search over the extra object list in order to find the position of the given object z.

Regarding the find triple by ID, we go up in the tree using the indexes and rank1 operations over the bitmaps:

  1. First we use the given ID or index of the triple and find the corresponding object ID from the sequence of objects.
  2. Get the position of the predicate by doing a rank1 operation over the bitmapZ using the object's position. (basically count the ones up to the given position)
  3. Find the predicate ID from the inferred predicate position using the sequence of predicates.
  4. Use the position of the predicate to get the position of the subject exactly by doing the same rank1 but over bitmapY
  5. Get the subject from the subjects sequence using the position got in (4)

Screenshot 2022-01-13 at 16 02 58

example: find(3) in the above figure.
triples list:
1 4 5, pos = 1
2 2 1, pos = 2
2 3 3, pos = 3
2 3 4, pos = 4
2 5 2, pos = 5
3 1 1, pos = 6
4 1 1, pos = 7
solution: z = seqZ.get(3-1) = 3 ( id at position 2), posY = bitmapZ.rank1(3 -1) = 2, y = seqY.get(posY) = 3,
posX = bitmapY.rank1(2 - 1) = 1, x = seqX.get(posX) = 2, then => x = 2, y =3, z =3

Triple (2 3 3) is at position 3 in the list.

All those operations must be done in constant time because we just use some binary operations to access and find the positions of the triple.

A test has been added to make sure all the functionalities added are working over different triple patterns.

@mielvds mielvds added this to the 3.0.0 milestone Jan 13, 2022
@AlyHdr
Copy link
Author

AlyHdr commented Jan 14, 2022

Hello, can we merge this to 3.0.0 as well? Because we still have some other modifications based on this PR.

@mielvds mielvds changed the base branch from master to 3.0.0 January 14, 2022 16:01
@mielvds mielvds self-requested a review January 14, 2022 16:02
@mielvds
Copy link
Member

mielvds commented Feb 3, 2022

Hi @AlyHdr , LGTM, but your PR removed a lot of the javadoc, was this intentional?

@AlyHdr
Copy link
Author

AlyHdr commented Feb 3, 2022

Hello @mielvds, there was a problem with the java doc generation when compiling with java 11 I think and my IDE just reformated stuff, so it's not intentional at all...

@mielvds
Copy link
Member

mielvds commented Feb 3, 2022

There's a jena4 branch which lifts everything to Java 11. I'm planning to include this in 2.3, so I will rebase the 3.0.0 branch on that branch as well. You'll have to probably align your PR's

@mielvds
Copy link
Member

mielvds commented Feb 16, 2022

Branch has been rebased and is now java 11. Can you check the docs? It's mostly return types that are removed.

@D063520
Copy link
Contributor

D063520 commented Feb 17, 2022

I just saw that there is a branch where Javier tried already to implement this functionality:

https://github.com/rdfhdt/hdt-java/tree/triplePosition

we maybe need to compare and see what was his proposed solution .....

@D063520
Copy link
Contributor

D063520 commented Feb 17, 2022

In fact the API that he proposes is much cleaner ... instead of having a separate iterator:

IteratorTripleString searchWithId(CharSequence subject, CharSequence predicate, CharSequence object)

he adds a new method

public long getNextTriplePosition()

this is very elegant. I know that there might be performance problems with this. Making this logic work automatically for every iterator might be problematic. I will check Javiers implementation. Maybe he found a way to avoid performance problems

@AlyHdr
Copy link
Author

AlyHdr commented Feb 17, 2022

We already tried first to have a method over the TripleString and TripleID classes to get the position, but that implies that we have to always find the position(i.e performance problems as @D063520 said) because the API doesn't support any flags or something, so one either have to add a new search method as it's done in the PR or do it as Javier did it in order to stay reverse compatible with the original API.

@D063520
Copy link
Contributor

D063520 commented Feb 17, 2022

But for example this one:
8086aee#diff-fe04f5a5ab3ac989b85ada9c61d8b6689a5ad05b9d74c2bdede5fd07c14bcf72R214
basically it calls the search only if the method is called. I'm not 100% sure but it could work ....

@AlyHdr
Copy link
Author

AlyHdr commented Feb 17, 2022

Yes it would be cool if this can be implemented in all the other iterators without the need to do a search...

@D063520
Copy link
Contributor

D063520 commented Feb 22, 2022

We skip this pull request in favour of the pull request #150 since it offers a more elegant solution on the API level with the same functionality

@D063520 D063520 closed this Feb 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants