-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Elastic Inference Service] Add ElasticInferenceService Unified ChatCompletions Integration #118871
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortunately this worked mostly out of the box. I had so change EIS a bit to reflect the SSE.
https://github.com/elastic/eis-gateway/pull/207
It sends the response with a data prefix.
Did we want to implement more tests?
return new URI(elasticInferenceServiceComponents().elasticInferenceServiceUrl() + "/api/v1/chat/completions"); | ||
} | ||
|
||
// TODO create the Configuration class? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why you had this TODO? I'm not sure what it brings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jason and I talked about this offline. Basically we need to add a Configuration
class like this: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/openai/completion/OpenAiChatCompletionModel.java#L127
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a follow up, I think we can address this after we merge this PR. Maybe create an issue so we don't forget it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
public static final String NAME = "elastic_inference_service_completion_service_settings"; | ||
|
||
// TODO what value do we put here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timgrein , do you have any suggestion? I'm not up to speed on the state of rate limiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I guess we could use the default from bedrock for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on the environment and quota set... We should leave it as is for now unless any objection. Is it ok to leave the TODO? I'll drop a note in the ES integration issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the other bedrock service settings for chat completion it's like 240: https://github.com/elastic/elasticsearch/blob/32ddbb3449d19a0970b96eefe960d4ab006357fc/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/amazonbedrock/AmazonBedrockServiceSettings.java
Maybe we lower it to something closer to that 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it to 240 for now, but, a customers quota and our shared quota can be different. In any case rate limiting is mildly opaque to me. This is a good enough number for now.
public static ElasticInferenceServiceCompletionServiceSettings fromMap(Map<String, Object> map, ConfigurationParseContext context) { | ||
ValidationException validationException = new ValidationException(); | ||
|
||
// TODO does EIS have this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timgrein, same thing, do we want limit per model at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean rate limit grouping per model? Not yet, I think we'll group on project ids first. When ELSER is available on EIS we can additionally group by model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not clear. I meant in the context of ES. Or did you mean we should rate limit on project id within ES?
private static final String ROLE = "user"; | ||
private static final String USER = "a_user"; | ||
|
||
// TODO remove if EIS doesn't use the model and user fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxhniebergall, we need the model. The user field is a bit ambiguous. Do we set it and ignore it or should we stop sending it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss at the inference sync tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we'll get rid of it for now. It's available for some Bedrock models but it has to passed in an odd way. I'll remove the references to it in the code as well.
As for its usage, I don't think we use it in a meaningful way. My brief Googling shows that its useful for the provider to identify one of your users who is "jailbreaking" the LLM should you get suspended.
Pinging @elastic/search-inference-team (Team:Search - Inference) |
Pinging @elastic/search-eng (Team:SearchOrg) |
…inference/services/elastic/ElasticInferenceServiceSparseEmbeddingsModel.java Co-authored-by: Tim Grein <[email protected]>
…inference/services/elastic/completion/EISCompletionServiceSettingsTests.java Co-authored-by: Tim Grein <[email protected]>
…inference/services/elastic/completion/EISCompletionModelTests.java Co-authored-by: Tim Grein <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we start to use a common prefix for PRs for EIS to make it easier to grep through PRs/commits? Something like [EIS]
or [Elastic Inference Service]
? We could also simply use [Inference API]
- I think that's how we do it for the other integrations..as elastic
is just another provider it could make sense to stick to one common prefix.
docs/changelog/118871.yaml
Outdated
@@ -0,0 +1,5 @@ | |||
pr: 118871 | |||
summary: "Add EIS Unified `ChatCompletions` Integration" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also use Elastic Inference Service
here? We could also use Elastic Inference Service (EIS)
. I think this will land in the changelog, which is often read by customers AFAIK, so probably better to be a bit more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vote for [Elastic Inference Service] (maybe a tag is better long term?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes! 🚢
); | ||
} catch (URISyntaxException e) { | ||
throw new ElasticsearchStatusException( | ||
"Failed to create URI for sparse embeddings service: " + e.getMessage(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Tim is referring to the service name here: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceService.java#L65
So saying something like:
Strings.format(Failed to create URI for spare embeddings for service %s: %s, NAME, e.getMessage())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Jason!
@@ -74,7 +74,8 @@ public void execute( | |||
private static ResponseHandler createCompletionHandler() { | |||
return new ElasticInferenceServiceUnifiedChatCompletionResponseHandler( | |||
"elastic inference service completion", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this string with spaces in it correct? Seems like a bit of a weird value. Normally I think our non-error-message strings uses underscores instead of spaces. Definitely just a nit though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OpenAI one does the same thing. Not sure whats best, but I think they should be consistent. I'll keep this for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the changes 🚢
public void testParseRequestConfig_ThrowsUnsupportedModelType() throws IOException { | ||
try (var service = createServiceWithMockSender()) { | ||
var failureListener = getModelListenerForException( | ||
ElasticsearchStatusException.class, | ||
"The [elastic] service does not support task type [completion]" | ||
); | ||
|
||
service.parseRequestConfig( | ||
"id", | ||
TaskType.COMPLETION, | ||
getRequestConfigMap(Map.of(ServiceFields.MODEL_ID, ElserModels.ELSER_V2_MODEL), Map.of(), Map.of()), | ||
failureListener | ||
); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this for now @jonathan-buttner, feels like we should move the configs where they belong. Too tightly coupled (and somewhat incorrect). Lmk if thats ok. I'll merge otherwise, the merges from main are catching up to me haha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this looks good since we support the completion
task type now 👍
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…ompletions Integration (elastic#118871) (cherry picked from commit 18345c4) # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceServiceSparseEmbeddingsModel.java
… ChatCompletions Integration (#118871) (#120136) * [Elastic Inference Service] Add ElasticInferenceService Unified ChatCompletions Integration (#118871) (cherry picked from commit 18345c4) # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceServiceSparseEmbeddingsModel.java * Fixing switch case issue --------- Co-authored-by: Jason Botzas-Coluni <[email protected]>
Parent PR: #118301
We need to call EIS via Elasticsearch. This PR implements the functionality.
Testing
Run via
It returns
Then we perform inference via
Returns