-
Notifications
You must be signed in to change notification settings - Fork 120
Fix Issue #211 : Improved Embedding Performance by Handling Base64 Encoding #303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
First commit to fix Issue openai#211 This commit includes the fix described in Issue openai#211. * Addressed the issue where Base64 encoding could not be handled. * Improved performance by using Base64 encoding by default.
Dear @TomerAberbach san, If you have time, please review my PR please? |
Some high-level thoughts:
|
@JsonDeserialize(using = EmbeddingValueDeserializer::class) | ||
class EmbeddingValue( | ||
var base64Embedding: Optional<String> = Optional.empty(), | ||
floatEmbedding: Optional<MutableList<Double>> = Optional.empty(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that you replace all the Optional
type declarations in this class with Kotlin's nullable type which is more natively supported, built into the language, and is more idiomatic and efficient.
E.g.
var base64Embedding: String? = null,
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@essien and @TomerAberbach, Thank you so much, I will update the codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yoshioterada just an FYI that @essien is not a maintainer on this repo
I don't agree with his suggestion because this library is meant to be consumed by Java users (hence OpenAI Java) so we should be exposing optional fields as Optional
instead of nullable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, @TomerAberbach. I understand the concern about maintaining Java compatibility. However, I noticed that this repo already uses Kotlin's nullable types quite extensively (e.g., in HttpRequest, SseMessage, etc.). Given that context, my suggestion aligns with current conventions in the repo and avoids introducing Optional, which is redundant and less idiomatic in Kotlin.
Overview
This commit includes the fix described in Issue #211.
Detail
This pull request introduces several changes to the
Embedding
class and related components in theopenai-java-core
package. The primary goal is to enhance the handling of embedding vectors by supporting both float lists and Base64-encoded strings. The most important changes include the introduction of theEmbeddingValue
class, modifications to theEmbedding
class to useEmbeddingValue
, and updates to the deserialization logic.Enhancements to embedding handling:
openai-java-core/src/main/kotlin/com/openai/models/embeddings/Embedding.kt
: Modified theEmbedding
class to useEmbeddingValue
instead ofList<Double>
for embedding vectors. This includes changes to the constructor, builder, and relevant methods. [1] [2] [3] [4] [5] [6]Introduction of
EmbeddingValue
class:openai-java-core/src/main/kotlin/com/openai/models/embeddings/EmbeddingValue.kt
: Added a new classEmbeddingValue
to represent embedding vectors, which can be either a list of floats or a Base64-encoded string. This class includes methods for converting between these representations.Deserialization improvements:
openai-java-core/src/main/kotlin/com/openai/models/embeddings/EmbeddingValueDeserializer.kt
: Introduced a custom deserializerEmbeddingValueDeserializer
to handle the deserialization ofEmbeddingValue
objects from JSON, supporting both float arrays and Base64 strings.Default encoding format:
openai-java-core/src/main/kotlin/com/openai/models/embeddings/EmbeddingCreateParams.kt
: Set the defaultEncodingFormat
toBASE64
for performance improvements.Test updates:
openai-java-core/src/test/kotlin/com/openai/models/embeddings/CreateEmbeddingResponseTest.kt
andopenai-java-core/src/test/kotlin/com/openai/models/embeddings/EmbeddingTest.kt
: Updated test cases to accommodate the changes in theEmbedding
class and the introduction ofEmbeddingValue
. [1] [2] [3]This code will run look like following Java code.
This PR code will run with look like following code style.