
[Elastic Inference Service] Add ElasticInferenceService Unified ChatCompletions Integration #118871

Merged: 67 commits into main on Jan 8, 2025

Conversation

@jaybcee (Member) commented Dec 17, 2024:

Parent PR: #118301

We need to call EIS via Elasticsearch. This PR implements the functionality.

Testing

Run via

1. `./gradlew localDistro`
2. `cd build/distribution/local/elasticsearch-9.0.0-SNAPSHOT`
3. `./bin/elasticsearch -E xpack.inference.elastic.url=https://localhost:8443 -E xpack.inference.elastic.http.ssl.verification_mode=none -E xpack.security.enabled=false -E xpack.security.enrollment.enabled=false`
4. Create an endpoint via:

```shell
curl --location --request PUT 'http://localhost:9200/_inference/completion/test' \
--header 'Content-Type: application/json' \
--data '{
    "service": "elastic",
    "service_settings": {
        "model_id": "elastic-model"
    }
}' -k
```

   - We eventually expect to have a default endpoint.
   - The model name is a bit of a placeholder for now; it's unclear to me what we expose. In any case, it's trivial to change, since we have an external-to-internal mapping.

It returns:

```json
{
    "inference_id": "test",
    "task_type": "completion",
    "service": "elastic",
    "service_settings": {
        "model_id": "elastic-model",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}
```

Then we perform inference via:

```shell
curl --location 'http://localhost:9200/_inference/completion/test/_unified' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "In only two digits and nothing else, what is the meaning of life?"
        }
    ],
    "model" : "elastic-model",
    "temperature": 0.7,
    "max_completion_tokens": 300
}' -k
```

which returns:

```
event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"elastic-model","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{"content":"42"},"index":0}],"model":"elastic-model","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{},"index":0}],"model":"elastic-model","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{},"finish_reason":"stop","index":0}],"model":"elastic-model","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{},"index":0}],"model":"elastic-model","object":"chat.completion.chunk","usage":{"completion_tokens":4,"prompt_tokens":22,"total_tokens":26}}

event: message
data: [DONE]
```
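As a side note (not part of the PR): the streamed answer can be reassembled client-side by concatenating the `content` deltas from the `data:` lines, stopping at `[DONE]`. A minimal Python sketch, using a shortened transcript with hypothetical chunk ids:

```python
import json

def collect_completion(sse_text: str):
    """Concatenate delta contents from `data:` lines, stopping at [DONE]."""
    content, usage = [], None
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event: message" lines and blanks
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        usage = chunk.get("usage", usage)  # the last chunk carries token usage
        for choice in chunk.get("choices", []):
            content.append(choice.get("delta", {}).get("content", ""))
    return "".join(content), usage

transcript = """\
data: {"id":"unified-1","choices":[{"delta":{"role":"assistant"},"index":0}],"object":"chat.completion.chunk"}
data: {"id":"unified-1","choices":[{"delta":{"content":"42"},"index":0}],"object":"chat.completion.chunk"}
data: {"id":"unified-1","choices":[{"delta":{},"finish_reason":"stop","index":0}],"object":"chat.completion.chunk"}
data: {"id":"unified-1","choices":[{"delta":{},"index":0}],"object":"chat.completion.chunk","usage":{"completion_tokens":4,"prompt_tokens":22,"total_tokens":26}}
data: [DONE]"""

text, usage = collect_completion(transcript)
print(text)                   # 42
print(usage["total_tokens"])  # 26
```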

@jaybcee (Member Author) left a comment:

Fortunately this worked mostly out of the box. I had to change EIS a bit to reflect the SSE format:

https://github.com/elastic/eis-gateway/pull/207

It now sends the response with a `data:` prefix.

Did we want to implement more tests?

```java
    return new URI(elasticInferenceServiceComponents().elasticInferenceServiceUrl() + "/api/v1/chat/completions");
}
```
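For context, the line above concatenates the configured base URL with the chat completions path. A quick Python sketch of the same construction (the base URL is the one from the Testing section; the trailing-slash normalization is an added precaution, not in the Java code):

```python
base_url = "https://localhost:8443"  # xpack.inference.elastic.url
path = "/api/v1/chat/completions"

# Naive concatenation, as in the Java snippet; strip a trailing slash
# first so a configured "https://host/" doesn't yield a double slash.
url = base_url.rstrip("/") + path
print(url)  # https://localhost:8443/api/v1/chat/completions
```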

```java
// TODO create the Configuration class?
```
@jaybcee (Member Author):

@jonathan-buttner

Can you explain why you had this TODO? I'm not sure what it brings.

Contributor:

Just a follow up, I think we can address this after we merge this PR. Maybe create an issue so we don't forget it.


```java
public static final String NAME = "elastic_inference_service_completion_service_settings";

// TODO what value do we put here?
```
@jaybcee (Member Author):

@timgrein, do you have any suggestions? I'm not up to speed on the state of rate limiting.

Contributor:

Good question. I guess we could use the default from Bedrock for now?

@jaybcee (Member Author) commented Dec 19, 2024:

It depends on the environment and the quota set... We should leave it as is for now unless there are any objections. Is it OK to leave the TODO? I'll drop a note in the ES integration issue.

@jaybcee (Member Author):

I set it to 240 for now, but a customer's quota and our shared quota can be different. In any case, rate limiting is mildly opaque to me; this is a good enough number for now.
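To make the semantics of a `requests_per_minute` value like 240 concrete, here is an illustrative Python sketch of a sliding-window limiter. This is a hypothetical model for discussion, not the Elasticsearch implementation (which has its own rate-limiting machinery and grouping):

```python
from collections import deque

class MinuteRateLimiter:
    """Allow at most `requests_per_minute` requests in any 60-second window."""

    def __init__(self, requests_per_minute: int):
        self.limit = requests_per_minute
        self.stamps = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Evict timestamps older than 60 seconds, then check capacity.
        while self.stamps and now - self.stamps[0] >= 60.0:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False

limiter = MinuteRateLimiter(requests_per_minute=240)
accepted = sum(limiter.allow(now=0.0) for _ in range(300))
print(accepted)                 # 240: the burst beyond the quota is rejected
print(limiter.allow(now=61.0))  # True: the window has slid past
```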

```java
public static ElasticInferenceServiceCompletionServiceSettings fromMap(Map<String, Object> map, ConfigurationParseContext context) {
    ValidationException validationException = new ValidationException();

    // TODO does EIS have this?
```
@jaybcee (Member Author):

@timgrein, same question: do we want a limit per model at all?

Contributor:

Do you mean rate limit grouping per model? Not yet; I think we'll group on project IDs first. When ELSER is available on EIS we can additionally group by model.

@jaybcee (Member Author):

I was not clear; I meant in the context of ES. Or did you mean we should rate limit on project ID within ES?

```java
private static final String ROLE = "user";
private static final String USER = "a_user";

// TODO remove if EIS doesn't use the model and user fields
```
@jaybcee (Member Author):

@maxhniebergall, we need the model. The user field is a bit ambiguous. Do we set it and ignore it or should we stop sending it?

Contributor:

Let's discuss at the inference sync tomorrow.

@jaybcee (Member Author):

Looks like we'll get rid of it for now. It's available for some Bedrock models, but it has to be passed in an odd way. I'll remove the references to it in the code as well.

As for its usage, I don't think we use it in a meaningful way. My brief Googling shows that it's useful for the provider to identify which of your users was "jailbreaking" the LLM should you get suspended.

@jaybcee jaybcee marked this pull request as ready for review December 19, 2024 02:26
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Dec 19, 2024
@jaybcee jaybcee added the :SearchOrg/Inference Label for the Search Inference team label Dec 19, 2024
@elasticsearchmachine elasticsearchmachine added Team:SearchOrg Meta label for the Search Org (Enterprise Search) Team:Search - Inference labels Dec 19, 2024
@elasticsearchmachine (Collaborator):

Pinging @elastic/search-inference-team (Team:Search - Inference)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Dec 19, 2024
@elasticsearchmachine (Collaborator):

Pinging @elastic/search-eng (Team:SearchOrg)

jaybcee and others added 7 commits January 7, 2025 09:30
…inference/services/elastic/ElasticInferenceServiceSparseEmbeddingsModel.java

Co-authored-by: Tim Grein <[email protected]>
…inference/services/elastic/completion/EISCompletionServiceSettingsTests.java

Co-authored-by: Tim Grein <[email protected]>
…inference/services/elastic/completion/EISCompletionModelTests.java

Co-authored-by: Tim Grein <[email protected]>
@timgrein (Contributor) left a comment:

Should we start to use a common prefix for EIS PRs to make it easier to grep through PRs/commits? Something like [EIS] or [Elastic Inference Service]? We could also simply use [Inference API]; I think that's how we do it for the other integrations. As Elastic is just another provider, it could make sense to stick to one common prefix.

@@ -0,0 +1,5 @@

```yaml
pr: 118871
summary: "Add EIS Unified `ChatCompletions` Integration"
```
@timgrein (Contributor) commented Jan 7, 2025:

Should we also use Elastic Inference Service here? We could also use Elastic Inference Service (EIS). I think this will land in the changelog, which is often read by customers AFAIK, so it's probably better to be a bit more explicit.

@jaybcee (Member Author):

I vote for [Elastic Inference Service] (maybe a tag is better long-term?).

@jaybcee jaybcee changed the title EIS Unified ChatCompletions Integration [Elastic Inference Service] Add ElasticInferenceService Unified ChatCompletions Integration Jan 7, 2025
@jaybcee jaybcee requested a review from timgrein January 7, 2025 16:33
@jonathan-buttner (Contributor) left a comment:

Thanks for the changes! 🚢

```java
    );
} catch (URISyntaxException e) {
    throw new ElasticsearchStatusException(
        "Failed to create URI for sparse embeddings service: " + e.getMessage(),
```
Contributor:

I think Tim is referring to the service name here: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceService.java#L65

So saying something like:

```java
Strings.format("Failed to create URI for sparse embeddings for service %s: %s", NAME, e.getMessage())
```

@maxhniebergall (Contributor) left a comment:

LGTM, thanks Jason!

@@ -74,7 +74,8 @@ public void execute(

```java
private static ResponseHandler createCompletionHandler() {
    return new ElasticInferenceServiceUnifiedChatCompletionResponseHandler(
        "elastic inference service completion",
```
Contributor:

Is this string with spaces in it correct? It seems like a bit of a weird value; normally our non-error-message strings use underscores instead of spaces. Definitely just a nit, though.

@jaybcee (Member Author):

The OpenAI one does the same thing. Not sure what's best, but I think they should be consistent, so I'll keep this for now.

@timgrein (Contributor) left a comment:

LGTM, thanks for the changes 🚢

Comment on lines -117 to -132

```java
public void testParseRequestConfig_ThrowsUnsupportedModelType() throws IOException {
    try (var service = createServiceWithMockSender()) {
        var failureListener = getModelListenerForException(
            ElasticsearchStatusException.class,
            "The [elastic] service does not support task type [completion]"
        );

        service.parseRequestConfig(
            "id",
            TaskType.COMPLETION,
            getRequestConfigMap(Map.of(ServiceFields.MODEL_ID, ElserModels.ELSER_V2_MODEL), Map.of(), Map.of()),
            failureListener
        );
    }
}
```

@jaybcee (Member Author) commented Jan 8, 2025:

Removing this for now, @jonathan-buttner; it feels like we should move the configs where they belong. They're too tightly coupled (and somewhat incorrect). Let me know if that's OK. Otherwise I'll merge; the merges from main are catching up to me, haha.

Contributor:

Yep, this looks good since we support the completion task type now 👍

@jaybcee jaybcee enabled auto-merge (squash) January 8, 2025 19:33
@jaybcee jaybcee merged commit 18345c4 into main Jan 8, 2025
17 checks passed
@jaybcee jaybcee deleted the ml-eis-integration-jbc branch January 8, 2025 19:33
@jonathan-buttner (Contributor):

💚 All backports created successfully

Branch: 8.x


jonathan-buttner pushed a commit to jonathan-buttner/elasticsearch that referenced this pull request Jan 14, 2025
…ompletions Integration (elastic#118871)

(cherry picked from commit 18345c4)

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceServiceSparseEmbeddingsModel.java
jonathan-buttner added a commit that referenced this pull request Jan 15, 2025
… ChatCompletions Integration (#118871) (#120136)

* [Elastic Inference Service] Add ElasticInferenceService Unified ChatCompletions Integration (#118871)

(cherry picked from commit 18345c4)

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceServiceSparseEmbeddingsModel.java

* Fixing switch case issue

---------

Co-authored-by: Jason Botzas-Coluni <[email protected]>
Labels: >enhancement, Feature:GenAI, :SearchOrg/Inference, Team:Search - Inference, Team:SearchOrg, v8.18.0, v9.0.0
5 participants