
Fix #126 "Make sure bytebuffer never exceeds Integer.MAX_VALUE" #154

Merged · 3 commits merged into rdfhdt:master on Mar 11, 2022

Conversation

@ate47 (Contributor) commented Mar 2, 2022

I've made this small fix for #126. It uses multiple MappedByteBuffers instead of one when the size of the byte buffer is too big (the maximum size per buffer is set to Integer.MAX_VALUE).

Because the constructors of the ByteBuffer classes in the nio package are package-private, I wasn't able to create a subclass of ByteBuffer, so I cloned the methods that use ByteBuffer so that they use my new class, BigMappedByteBuffer.
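For illustration, here is a minimal sketch of the idea, not the PR's actual BigMappedByteBuffer code (the name ChunkedMappedBuffer, the chunkSize parameter and the layout are illustrative): a wrapper that holds a list of MappedByteBuffers, each of at most chunkSize bytes, and translates a long index into a (buffer, offset) pair:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and layout do not match the PR's BigMappedByteBuffer exactly.
class ChunkedMappedBuffer {
    private final List<MappedByteBuffer> buffers = new ArrayList<>();
    private final long chunkSize;

    // chunkSize would normally be Integer.MAX_VALUE; it is a parameter here so that
    // small values can be used in tests.
    ChunkedMappedBuffer(FileChannel ch, long position, long size, long chunkSize) throws IOException {
        this.chunkSize = chunkSize;
        for (long off = 0; off < size; off += chunkSize) {
            long len = Math.min(chunkSize, size - off); // each map() call stays <= chunkSize
            buffers.add(ch.map(MapMode.READ_ONLY, position + off, len));
        }
    }

    // long-indexed accessor: pick the underlying buffer, then delegate to its int-indexed get()
    byte get(long index) {
        int buffer = (int) (index / chunkSize);
        int offset = (int) (index % chunkSize);
        return buffers.get(buffer).get(offset);
    }
}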

Tests

I've exercised all the reimplemented methods to check that they work across multiple mapped buffers. 5 new tests were added, plus 1 ignored test because it uses the test HDT dbpedia2016-10.hdt (36GB+).
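As a rough illustration of such a boundary-crossing check (again a sketch, using the hypothetical ChunkedMappedBuffer above with a configurable chunk size, which the real tests may not expose), one could map a small file with an artificially small chunk size and read every byte back:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical check: a tiny chunk size forces the wrapper to split the file,
// so reads that cross chunk boundaries are exercised without a 36GB file.
public class ChunkedMappedBufferCheck {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunked", ".bin");
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(tmp, data);

        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // a chunk size of 32 splits the 100-byte file into 4 underlying buffers
            ChunkedMappedBuffer buf = new ChunkedMappedBuffer(ch, 0, data.length, 32);
            for (int i = 0; i < data.length; i++) {
                if (buf.get(i) != data[i]) {
                    throw new AssertionError("mismatch at index " + i);
                }
            }
        }
        System.out.println("all reads across chunk boundaries matched");
    }
}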

@ate47 (Contributor, Author) commented Mar 2, 2022

@mielvds this was your issue, in case you want to review this PR :)

@mielvds self-requested a review Mar 2, 2022 15:34
@mielvds (Member) left a comment

Hi @ate47, very nice! It's not ideal of course, because using more memory somewhat contradicts the point of memory mapping, but it will definitely do. I was wondering: were you able to recreate the original issue in the end?

@mielvds added this to the 2.2.1 milestone Mar 11, 2022
@ate47 (Contributor, Author) commented Mar 11, 2022

The original code was already splitting the file into buffers.

In fact, when I tried the HDT mentioned in the issue, the number of buffers without counting my splitting was 237, and only 5 BigMappedByteBuffers ended up with buffers.size() > 1, so the total count of mapped buffers is 237+6=243. Adding to that the BigMappedByteBuffer objects themselves, which only contain a list, we aren't using that much memory compared to the previous implementation.

I wasn't able to recreate the original issue after my fix.

@mielvds (Member) commented Mar 11, 2022

> The original code was already splitting the file into buffers.
>
> In fact, when I tried the HDT mentioned in the issue, the number of buffers without counting my splitting was 237, and only 5 BigMappedByteBuffers ended up with buffers.size() > 1, so the total count of mapped buffers is 237+6=243. Adding to that the BigMappedByteBuffer objects themselves, which only contain a list, we aren't using that much memory compared to the previous implementation.

Great!

> I wasn't able to recreate the original issue after my fix.

I meant before your fix, because I wasn't sure about the root of the problem.

@ate47 (Contributor, Author) commented Mar 11, 2022

Ah! Yes, it was crashing at this line:

buffers[buffer] = ch.map(MapMode.READ_ONLY, base+bytePos, nextBytePos-bytePos);

Because nextBytePos-bytePos > Integer.MAX_VALUE and the maximum capacity of a mapped buffer is Integer.MAX_VALUE (about 2.14 billion bytes), the 5 mappings mentioned earlier, with sizes of roughly 3.26, 3.23, 5.85, 3.77 and 2.40 billion bytes, were crashing the mapping of the dictionary.
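Conceptually, the fix at this call site replaces the single oversized map() with several smaller ones. A hedged sketch of that shape (illustrative, not the PR's exact code; RegionMapper is a hypothetical helper):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.ArrayList;
import java.util.List;

final class RegionMapper {
    // Map one logical region with several map() calls, each at most Integer.MAX_VALUE bytes,
    // instead of a single call that fails for lengths above 2^31 - 1.
    static List<MappedByteBuffer> mapRegion(FileChannel ch, long position, long length) throws IOException {
        List<MappedByteBuffer> parts = new ArrayList<>();
        for (long off = 0; off < length; off += Integer.MAX_VALUE) {
            long len = Math.min(Integer.MAX_VALUE, length - off);
            parts.add(ch.map(MapMode.READ_ONLY, position + off, len));
        }
        return parts; // the PR wraps such a list in BigMappedByteBuffer so callers keep a single view
    }
}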

@mielvds (Member) commented Mar 11, 2022

Great! I was wondering whether this should be merged into the master branch instead of 3.0.0, and a 2.2.1 released. WDYT @D063520?

@ate47 changed the base branch from 3.0.0 to master Mar 11, 2022 12:41
@ate47 (Contributor, Author) commented Mar 11, 2022

I've rebased my changes onto the master branch.

@mielvds merged commit aede684 into rdfhdt:master Mar 11, 2022
@D063520 (Contributor) commented Mar 11, 2022

Yes, sure, I can do it ... to test the CI again ... but it is very strange that it is not findable on Maven Central yet.

@mielvds (Member) commented Mar 11, 2022

I'm also not sure why, but let's check again on Monday to be sure, because this process is very slow.
