Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about Bitmap Triples Iterator ZFOQ implementation #191

Closed
KonradHoeffner opened this issue Feb 8, 2023 · 5 comments
Closed

Question about Bitmap Triples Iterator ZFOQ implementation #191

KonradHoeffner opened this issue Feb 8, 2023 · 5 comments

Comments

@KonradHoeffner
Copy link

I was trying to understand why the Java version is so much faster than our Rust implementation in tests with a ?PO query and landed on the class BitmapTriplesZFOQ, which as far as I understand it uses less operations than in the paper "Exchange and Consumption of Huge RDF Data", however a Google search for "hdt zfoq" didn't return relevant results, so I tried to go through the code but I got stuck on the next() function, so I would appreciate if someone could help me understand it or if there is some written resource point me to that.
The thing I don't understand is the line long posY = adjIndex.get(posIndex);. In our own code, we use the OP-S index to get posZ (the position in the object layer), and then use a rank query to get the position in the y-layer, which takes more time, so I would be really interested on how we can skip this step and get posY directly.

@Override
public TripleID next() {
    long posY = adjIndex.get(posIndex);

    z = patZ!=0 ? patZ : adjIndex.findListIndex(posIndex)+1;
    y = patY!=0 ? patY : adjY.get(posY);
    x = adjY.findListIndex(posY)+1;

	updateOutput();

	posIndex++;

    return returnTriple;
}
@ate47
Copy link
Contributor

ate47 commented Feb 8, 2023

Hello,

In this iterator, the first thing made is to compute the boundaries in the calculateRange() method, it will fill the minIndex and maxIndex value, these ones:

image

So to get the posY, you simply need to sequentially read the adjIndex

@ate47
Copy link
Contributor

ate47 commented Feb 8, 2023

I found this schema in the slide 25 for ISWC 2017 (pdf)

@KonradHoeffner
Copy link
Author

Wow, thank you for the quick and detailed reply, I will read through the slides!
Maybe I built my index wrong because it points to the O-layer and not the P-layer.

@ate47
Copy link
Contributor

ate47 commented Feb 8, 2023

I was working on this index for #187 and I read the FOQ iterators yesterday, so it was fresh in my mind, good luck with your library ;)

@KonradHoeffner
Copy link
Author

Yes, our index indeed pointed to the wrong layer.
Thanks to your advice we got a 27% time reduction with PO queries excluding extraction and a 10% time reduction including extraction!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants