
Fix #126 "Make sure bytebuffer never exceeds Integer.MAX_VALUE" #154

Merged · 3 commits merged into rdfhdt:master on Mar 11, 2022

Conversation

@ate47 (Contributor) commented Mar 2, 2022

I've made this small fix for #126. It uses multiple MappedByteBuffers instead of one when the size of the byte buffer is too big (the maximum size per buffer is set to Integer.MAX_VALUE).

Because the constructors of the ByteBuffer classes in the nio package are package-private, I wasn't able to create a subclass of ByteBuffer, so I cloned the methods that use ByteBuffer so that they use my new class, BigMappedByteBuffer.
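For illustration, here is a minimal sketch of the idea, not the PR's actual BigMappedByteBuffer code (the name ChunkedMappedBuffer, the chunkSize parameter and the layout are illustrative): a wrapper that holds a list of MappedByteBuffers, each of at most chunkSize bytes, and translates a long index into a (buffer, offset) pair:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and layout do not match the PR's BigMappedByteBuffer exactly.
class ChunkedMappedBuffer {
    private final List<MappedByteBuffer> buffers = new ArrayList<>();
    private final long chunkSize;

    // chunkSize would normally be Integer.MAX_VALUE; it is a parameter here so that
    // small values can be used in tests.
    ChunkedMappedBuffer(FileChannel ch, long position, long size, long chunkSize) throws IOException {
        this.chunkSize = chunkSize;
        for (long off = 0; off < size; off += chunkSize) {
            long len = Math.min(chunkSize, size - off); // each map() call stays <= chunkSize
            buffers.add(ch.map(MapMode.READ_ONLY, position + off, len));
        }
    }

    // long-indexed accessor: pick the underlying buffer, then delegate to its int-indexed get()
    byte get(long index) {
        int buffer = (int) (index / chunkSize);
        int offset = (int) (index % chunkSize);
        return buffers.get(buffer).get(offset);
    }
}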

Tests

I've exercised all the reimplemented methods to check that they work across multiple mapped buffers. 5 new tests were added, plus 1 ignored test because it uses the test HDT dbpedia2016-10.hdt (36GB+).
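As a rough illustration of such a boundary-crossing check (again a sketch, using the hypothetical ChunkedMappedBuffer above with a configurable chunk size, which the real tests may not expose), one could map a small file with an artificially small chunk size and read every byte back:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical check: a tiny chunk size forces the wrapper to split the file,
// so reads that cross chunk boundaries are exercised without a 36GB file.
public class ChunkedMappedBufferCheck {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunked", ".bin");
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(tmp, data);

        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // a chunk size of 32 splits the 100-byte file into 4 underlying buffers
            ChunkedMappedBuffer buf = new ChunkedMappedBuffer(ch, 0, data.length, 32);
            for (int i = 0; i < data.length; i++) {
                if (buf.get(i) != data[i]) {
                    throw new AssertionError("mismatch at index " + i);
                }
            }
        }
        System.out.println("all reads across chunk boundaries matched");
    }
}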

@ate47 (Contributor, Author) commented Mar 2, 2022

@mielvds this was your issue, in case you want to review this PR :)

@mielvds self-requested a review Mar 2, 2022 15:34
@mielvds (Member) left a comment

Hi @ate47, very nice! It's not ideal of course, because using more memory somewhat contradicts the point of memory mapping, but it will definitely do. I was wondering: were you able to recreate the original issue in the end?

@mielvds added this to the 2.2.1 milestone Mar 11, 2022
@ate47 (Contributor, Author) commented Mar 11, 2022

The original code was already splitting the file into buffers.

In fact, when I tried the HDT mentioned in the issue, the number of buffers without counting my splitting was 237, and only 5 BigMappedByteBuffers ended up with buffers.size() > 1, so the total count of mapped buffers is 237+6=243. Adding to that the BigMappedByteBuffer objects themselves, which only contain a list, we aren't using that much memory compared to the previous implementation.

I wasn't able to recreate the original issue after my fix.

@mielvds (Member) commented Mar 11, 2022

> The original code was already splitting the file into buffers.
>
> In fact, when I tried the HDT mentioned in the issue, the number of buffers without counting my splitting was 237, and only 5 BigMappedByteBuffers ended up with buffers.size() > 1, so the total count of mapped buffers is 237+6=243. Adding to that the BigMappedByteBuffer objects themselves, which only contain a list, we aren't using that much memory compared to the previous implementation.

Great!

> I wasn't able to recreate the original issue after my fix.

I meant before your fix, because I wasn't sure about the root of the problem.

@ate47 (Contributor, Author) commented Mar 11, 2022

Ah! Yes, it was crashing at this line:

buffers[buffer] = ch.map(MapMode.READ_ONLY, base+bytePos, nextBytePos-bytePos);

Because nextBytePos-bytePos > Integer.MAX_VALUE and the maximum capacity of a mapped buffer is Integer.MAX_VALUE (about 2.14 billion bytes), the 5 mappings mentioned earlier, with sizes of roughly 3.26, 3.23, 5.85, 3.77 and 2.40 billion bytes, were crashing the mapping of the dictionary.
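Conceptually, the fix at this call site replaces the single oversized map() with several smaller ones. A hedged sketch of that shape (illustrative, not the PR's exact code; RegionMapper is a hypothetical helper):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.ArrayList;
import java.util.List;

final class RegionMapper {
    // Map one logical region with several map() calls, each at most Integer.MAX_VALUE bytes,
    // instead of a single call that fails for lengths above 2^31 - 1.
    static List<MappedByteBuffer> mapRegion(FileChannel ch, long position, long length) throws IOException {
        List<MappedByteBuffer> parts = new ArrayList<>();
        for (long off = 0; off < length; off += Integer.MAX_VALUE) {
            long len = Math.min(Integer.MAX_VALUE, length - off);
            parts.add(ch.map(MapMode.READ_ONLY, position + off, len));
        }
        return parts; // the PR wraps such a list in BigMappedByteBuffer so callers keep a single view
    }
}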

@mielvds (Member) commented Mar 11, 2022

Great! I was wondering whether this should be merged into the master branch instead of 3.0.0, and a 2.2.1 released. WDYT @D063520?

@ate47 changed the base branch from 3.0.0 to master Mar 11, 2022 12:41
@ate47 (Contributor, Author) commented Mar 11, 2022

I've rebased my changes onto the master branch.

@mielvds merged commit aede684 into rdfhdt:master Mar 11, 2022
@D063520 (Contributor) commented Mar 11, 2022

Yes, sure, I can do it ... to test the CI again ... but it is very strange that it is not findable on Maven Central yet.

@mielvds (Member) commented Mar 11, 2022

I'm also not sure why, but let's check again on Monday to be sure, because this process is very slow.
