We found several places in the `unsafe` code path where the IO buffer is used inefficiently, and several where the number of memory copies can be reduced. Details:

1. `SplashUnsafeSorter.writeSortedFile` used a `writerBuffer` to hold the serialized data and then wrote the contents of that buffer to the output stream. To avoid this second copy during the write, we add our own `SplashBufferedOutputStream`, which exposes its internal buffer so that the serializer can fill it with serialized data directly. This also saves the memory used by the original `writerBuffer` and makes the buffer mechanism easier to test. Unit tests for `SplashBufferedOutputStream` verify that the buffer is managed correctly.
2. Replace `IOUtils.copy` with `SplashUtils.copy`. The new function borrows most of its code from `IOUtils.copy`; the only difference is that the caller can specify the size of the buffer. Earlier tests showed some 4K IO requests, which were issued by `IOUtils.copy` because it uses a fixed 4K IO buffer. That is neither efficient nor flexible on a shared or distributed file system. The buffer size now comes from the same Spark configuration, `spark.shuffle.file.buffer`. Because this IO buffer already exists, we can use `InputStream` and `OutputStream` directly instead of their buffered versions, which saves more memory. Since the copy runs in a single thread, the same buffer can safely be reused during the copy, which reduces GC time.
3. Add more performance tests.
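The copy change in item 2 can be illustrated with a minimal sketch. The source of `SplashUtils.copy` is not shown in this view, so the class name `BufferedCopySketch` and the method signature below are assumptions for illustration only. The point is that the caller owns the buffer, so its size can follow `spark.shuffle.file.buffer` and the same array can be reused across copies on one thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch; not the actual SplashUtils implementation.
final class BufferedCopySketch {

  /**
   * Copies every byte from in to out through the caller-supplied buffer
   * and returns the number of bytes copied. Each read requests up to
   * buffer.length bytes, so the buffer size bounds the IO request size.
   */
  static long copy(InputStream in, OutputStream out, byte[] buffer)
      throws IOException {
    if (buffer.length == 0) {
      throw new IllegalArgumentException("buffer must not be empty");
    }
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {
      out.write(buffer, 0, n);
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] payload = new byte[100_000];
    // Assumed size for the demo; in Splash the size would come from
    // the spark.shuffle.file.buffer setting.
    byte[] buffer = new byte[32 * 1024];
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    long copied = copy(new ByteArrayInputStream(payload), sink, buffer);
    System.out.println("copied " + copied + " bytes");
  }
}
```

Because the streams passed in are unbuffered, the only intermediate allocation in this sketch is the single `buffer` array.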
Showing 16 changed files with 426 additions and 122 deletions.
68 changes: 68 additions & 0 deletions
src/main/java/com/memverge/splash/SplashBufferedOutputStream.java
@@ -0,0 +1,68 @@

```java
/*
 * Copyright (C) 2019 MemVerge Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.memverge.splash;

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import lombok.val;
import org.apache.spark.unsafe.Platform;

public class SplashBufferedOutputStream extends BufferedOutputStream {

  public SplashBufferedOutputStream(OutputStream out) {
    this(out, StorageFactoryHolder.getFactory().getFileBufferSize());
  }

  public SplashBufferedOutputStream(OutputStream out, int size) {
    super(out, size);
  }

  public byte[] getBuffer() {
    return buf;
  }

  public int getBufferSize() {
    return buf.length;
  }

  public int write(byte[] bytes, long offset) throws IOException {
    val length = bytes.length - (int) offset;
    return write(bytes, Platform.BYTE_ARRAY_OFFSET + offset, length);
  }

  public int write(Object src, Long srcOffset, int length) throws IOException {
    val bufSize = getBufferSize();
    int dataRemaining = length;
    long offset = srcOffset;
    while (dataRemaining > 0) {
      val toTransfer = Math.min(bufSize, dataRemaining);
      if (count + toTransfer > bufSize && count > 0) {
        flush();
      }
      Platform.copyMemory(
          src,
          offset,
          buf,
          Platform.BYTE_ARRAY_OFFSET + count,
          toTransfer);
      count += toTransfer;
      offset += toTransfer;
      dataRemaining -= toTransfer;
    }
    return length - dataRemaining;
  }
}
```
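The chunking loop in `write(Object, Long, int)` can be tried without Spark or Lombok on the classpath. The sketch below is a simplified stand-in, not the committed class: it swaps `Platform.copyMemory` for `System.arraycopy`, so it only accepts on-heap `byte[]` sources, and the names `ExposedBufferStreamSketch`, `writeFrom`, and `getCount` are hypothetical. The flush condition mirrors the one in the diff above.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical JDK-only stand-in for SplashBufferedOutputStream.
final class ExposedBufferStreamSketch extends BufferedOutputStream {

  ExposedBufferStreamSketch(OutputStream out, int size) {
    super(out, size);
  }

  /** Exposes the inherited internal buffer so callers can inspect it. */
  byte[] getBuffer() {
    return buf;
  }

  /** Number of bytes currently held in the buffer and not yet flushed. */
  int getCount() {
    return count;
  }

  /**
   * Copies length bytes from src, starting at srcOffset, into the
   * internal buffer in chunks of at most buf.length bytes. The buffer
   * is flushed whenever the next chunk would not fit.
   */
  int writeFrom(byte[] src, int srcOffset, int length) throws IOException {
    int bufSize = buf.length;
    int remaining = length;
    int offset = srcOffset;
    while (remaining > 0) {
      int toTransfer = Math.min(bufSize, remaining);
      if (count + toTransfer > bufSize && count > 0) {
        flush();
      }
      // Stand-in for Platform.copyMemory; works for byte[] sources only.
      System.arraycopy(src, offset, buf, count, toTransfer);
      count += toTransfer;
      offset += toTransfer;
      remaining -= toTransfer;
    }
    return length - remaining;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    ExposedBufferStreamSketch stream = new ExposedBufferStreamSketch(sink, 4);
    byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    stream.writeFrom(data, 0, data.length);
    System.out.println("in sink before flush: " + sink.size());
    stream.flush();
    System.out.println("in sink after flush: " + sink.size());
  }
}
```

Note that the last chunk stays in the buffer until the caller flushes or closes the stream, which is the usual `BufferedOutputStream` contract.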