
[Elastic Inference Service] Add ElasticInferenceService Unified ChatCompletions Integration #118871

Merged: 67 commits into main on Jan 8, 2025

Conversation

@jaybcee (Member) commented Dec 17, 2024:

Parent PR: #118301

We need to call EIS via Elasticsearch. This PR implements the functionality.

Testing

Run via

1. `./gradlew localDistro`
2. `cd build/distribution/local/elasticsearch-9.0.0-SNAPSHOT`
3. `./bin/elasticsearch -E xpack.inference.elastic.url=https://localhost:8443 -E xpack.inference.elastic.http.ssl.verification_mode=none -E xpack.security.enabled=false -E xpack.security.enrollment.enabled=false`
4. Create an endpoint via:

```shell
curl --location --request PUT 'http://localhost:9200/_inference/completion/test' \
--header 'Content-Type: application/json' \
--data '{
    "service": "elastic",
    "service_settings": {
        "model_id": "elastic-model"
    }
}' -k
```

   - We eventually expect to have a default endpoint.
   - The model name is a bit of a placeholder for now; it's unclear to me what we expose. In any case, it's trivial to change, since we have an external-to-internal mapping.

It returns:

```json
{
    "inference_id": "test",
    "task_type": "completion",
    "service": "elastic",
    "service_settings": {
        "model_id": "elastic-model",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}
```

Then we perform inference via:

```shell
curl --location 'http://localhost:9200/_inference/completion/test/_unified' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "In only two digits and nothing else, what is the meaning of life?"
        }
    ],
    "model" : "elastic-model",
    "temperature": 0.7,
    "max_completion_tokens": 300
}' -k
```

which returns:

```
event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"elastic-model","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{"content":"42"},"index":0}],"model":"elastic-model","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{},"index":0}],"model":"elastic-model","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{},"finish_reason":"stop","index":0}],"model":"elastic-model","object":"chat.completion.chunk"}

event: message
data: {"id":"unified-a52c5569-6fca-48dd-9a03-cf6b2d999995","choices":[{"delta":{},"index":0}],"model":"elastic-model","object":"chat.completion.chunk","usage":{"completion_tokens":4,"prompt_tokens":22,"total_tokens":26}}

event: message
data: [DONE]
```
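As a side note (not part of the PR): the streamed answer can be reassembled client-side by concatenating the `content` deltas from the `data:` lines, stopping at `[DONE]`. A minimal Python sketch, using a shortened transcript with hypothetical chunk ids:

```python
import json

def collect_completion(sse_text: str):
    """Concatenate delta contents from `data:` lines, stopping at [DONE]."""
    content, usage = [], None
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event: message" lines and blanks
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        usage = chunk.get("usage", usage)  # the last chunk carries token usage
        for choice in chunk.get("choices", []):
            content.append(choice.get("delta", {}).get("content", ""))
    return "".join(content), usage

transcript = """\
data: {"id":"unified-1","choices":[{"delta":{"role":"assistant"},"index":0}],"object":"chat.completion.chunk"}
data: {"id":"unified-1","choices":[{"delta":{"content":"42"},"index":0}],"object":"chat.completion.chunk"}
data: {"id":"unified-1","choices":[{"delta":{},"finish_reason":"stop","index":0}],"object":"chat.completion.chunk"}
data: {"id":"unified-1","choices":[{"delta":{},"index":0}],"object":"chat.completion.chunk","usage":{"completion_tokens":4,"prompt_tokens":22,"total_tokens":26}}
data: [DONE]"""

text, usage = collect_completion(transcript)
print(text)                   # 42
print(usage["total_tokens"])  # 26
```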

@jaybcee (Member Author) left a comment:

Fortunately this worked mostly out of the box. I had to change EIS a bit to reflect the SSE format:

https://github.com/elastic/eis-gateway/pull/207

It now sends the response with a `data:` prefix.

Did we want to implement more tests?

```java
    return new URI(elasticInferenceServiceComponents().elasticInferenceServiceUrl() + "/api/v1/chat/completions");
}
```
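For context, the line above concatenates the configured base URL with the chat completions path. A quick Python sketch of the same construction (the base URL is the one from the Testing section; the trailing-slash normalization is an added precaution, not in the Java code):

```python
base_url = "https://localhost:8443"  # xpack.inference.elastic.url
path = "/api/v1/chat/completions"

# Naive concatenation, as in the Java snippet; strip a trailing slash
# first so a configured "https://host/" doesn't yield a double slash.
url = base_url.rstrip("/") + path
print(url)  # https://localhost:8443/api/v1/chat/completions
```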

```java
// TODO create the Configuration class?
```
@jaybcee (Member Author):

@jonathan-buttner

Can you explain why you had this TODO? I'm not sure what it brings.

Contributor:

Just a follow up, I think we can address this after we merge this PR. Maybe create an issue so we don't forget it.


```java
public static final String NAME = "elastic_inference_service_completion_service_settings";

// TODO what value do we put here?
```
@jaybcee (Member Author):

@timgrein, do you have any suggestions? I'm not up to speed on the state of rate limiting.

Contributor:

Good question. I guess we could use the default from Bedrock for now?

@jaybcee (Member Author) commented Dec 19, 2024:

It depends on the environment and the quota set... We should leave it as is for now unless there are any objections. Is it OK to leave the TODO? I'll drop a note in the ES integration issue.

@jaybcee (Member Author):

I set it to 240 for now, but a customer's quota and our shared quota can be different. In any case, rate limiting is mildly opaque to me; this is a good enough number for now.
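To make the semantics of a `requests_per_minute` value like 240 concrete, here is an illustrative Python sketch of a sliding-window limiter. This is a hypothetical model for discussion, not the Elasticsearch implementation (which has its own rate-limiting machinery and grouping):

```python
from collections import deque

class MinuteRateLimiter:
    """Allow at most `requests_per_minute` requests in any 60-second window."""

    def __init__(self, requests_per_minute: int):
        self.limit = requests_per_minute
        self.stamps = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Evict timestamps older than 60 seconds, then check capacity.
        while self.stamps and now - self.stamps[0] >= 60.0:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False

limiter = MinuteRateLimiter(requests_per_minute=240)
accepted = sum(limiter.allow(now=0.0) for _ in range(300))
print(accepted)                 # 240: the burst beyond the quota is rejected
print(limiter.allow(now=61.0))  # True: the window has slid past
```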

```java
public static ElasticInferenceServiceCompletionServiceSettings fromMap(Map<String, Object> map, ConfigurationParseContext context) {
    ValidationException validationException = new ValidationException();

    // TODO does EIS have this?
```
@jaybcee (Member Author):

@timgrein, same question: do we want a limit per model at all?

Contributor:

Do you mean rate limit grouping per model? Not yet; I think we'll group on project IDs first. When ELSER is available on EIS we can additionally group by model.

@jaybcee (Member Author):

I was not clear; I meant in the context of ES. Or did you mean we should rate limit on project ID within ES?

```java
private static final String ROLE = "user";
private static final String USER = "a_user";

// TODO remove if EIS doesn't use the model and user fields
```
@jaybcee (Member Author):

@maxhniebergall, we need the model. The user field is a bit ambiguous. Do we set it and ignore it or should we stop sending it?

Contributor:

Let's discuss at the inference sync tomorrow.

@jaybcee (Member Author):

Looks like we'll get rid of it for now. It's available for some Bedrock models, but it has to be passed in an odd way. I'll remove the references to it in the code as well.

As for its usage, I don't think we use it in a meaningful way. My brief Googling shows that it's useful for the provider to identify which of your users was "jailbreaking" the LLM should you get suspended.

@jaybcee jaybcee marked this pull request as ready for review December 19, 2024 02:26
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Dec 19, 2024
@jaybcee jaybcee added the :SearchOrg/Inference Label for the Search Inference team label Dec 19, 2024
@elasticsearchmachine elasticsearchmachine added Team:SearchOrg Meta label for the Search Org (Enterprise Search) Team:Search - Inference labels Dec 19, 2024
@elasticsearchmachine (Collaborator):

Pinging @elastic/search-inference-team (Team:Search - Inference)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Dec 19, 2024
@elasticsearchmachine (Collaborator):

Pinging @elastic/search-eng (Team:SearchOrg)

jaybcee and others added 7 commits January 7, 2025 09:30
…inference/services/elastic/ElasticInferenceServiceSparseEmbeddingsModel.java

Co-authored-by: Tim Grein <[email protected]>
…inference/services/elastic/completion/EISCompletionServiceSettingsTests.java

Co-authored-by: Tim Grein <[email protected]>
…inference/services/elastic/completion/EISCompletionModelTests.java

Co-authored-by: Tim Grein <[email protected]>
@timgrein (Contributor) left a comment:

Should we start to use a common prefix for EIS PRs to make it easier to grep through PRs/commits? Something like [EIS] or [Elastic Inference Service]? We could also simply use [Inference API]; I think that's how we do it for the other integrations. As Elastic is just another provider, it could make sense to stick to one common prefix.

@@ -0,0 +1,5 @@

```yaml
pr: 118871
summary: "Add EIS Unified `ChatCompletions` Integration"
```
@timgrein (Contributor) commented Jan 7, 2025:

Should we also use Elastic Inference Service here? We could also use Elastic Inference Service (EIS). I think this will land in the changelog, which is often read by customers AFAIK, so it's probably better to be a bit more explicit.

@jaybcee (Member Author):

I vote for [Elastic Inference Service] (maybe a tag is better long-term?).

@jaybcee jaybcee changed the title EIS Unified ChatCompletions Integration [Elastic Inference Service] Add ElasticInferenceService Unified ChatCompletions Integration Jan 7, 2025
@jaybcee jaybcee requested a review from timgrein January 7, 2025 16:33
@jonathan-buttner (Contributor) left a comment:

Thanks for the changes! 🚢

```java
    );
} catch (URISyntaxException e) {
    throw new ElasticsearchStatusException(
        "Failed to create URI for sparse embeddings service: " + e.getMessage(),
```
Contributor:

I think Tim is referring to the service name here: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceService.java#L65

So saying something like:

```java
Strings.format("Failed to create URI for sparse embeddings for service %s: %s", NAME, e.getMessage())
```

@maxhniebergall (Contributor) left a comment:

LGTM, thanks Jason!

@@ -74,7 +74,8 @@ public void execute(

```java
private static ResponseHandler createCompletionHandler() {
    return new ElasticInferenceServiceUnifiedChatCompletionResponseHandler(
        "elastic inference service completion",
```
Contributor:

Is this string with spaces in it correct? It seems like a bit of a weird value; normally our non-error-message strings use underscores instead of spaces. Definitely just a nit, though.

@jaybcee (Member Author):

The OpenAI one does the same thing. Not sure what's best, but I think they should be consistent, so I'll keep this for now.

@timgrein (Contributor) left a comment:

LGTM, thanks for the changes 🚢

Comment on lines -117 to -132

```java
public void testParseRequestConfig_ThrowsUnsupportedModelType() throws IOException {
    try (var service = createServiceWithMockSender()) {
        var failureListener = getModelListenerForException(
            ElasticsearchStatusException.class,
            "The [elastic] service does not support task type [completion]"
        );

        service.parseRequestConfig(
            "id",
            TaskType.COMPLETION,
            getRequestConfigMap(Map.of(ServiceFields.MODEL_ID, ElserModels.ELSER_V2_MODEL), Map.of(), Map.of()),
            failureListener
        );
    }
}
```

@jaybcee (Member Author) commented Jan 8, 2025:

Removing this for now, @jonathan-buttner; it feels like we should move the configs where they belong. They're too tightly coupled (and somewhat incorrect). Let me know if that's OK. Otherwise I'll merge; the merges from main are catching up to me, haha.

Contributor:

Yep, this looks good since we support the completion task type now 👍

@jaybcee jaybcee enabled auto-merge (squash) January 8, 2025 19:33
@jaybcee jaybcee merged commit 18345c4 into main Jan 8, 2025
17 checks passed
@jaybcee jaybcee deleted the ml-eis-integration-jbc branch January 8, 2025 19:33
@jonathan-buttner (Contributor):

💚 All backports created successfully

Branch: 8.x


jonathan-buttner pushed a commit to jonathan-buttner/elasticsearch that referenced this pull request Jan 14, 2025
…ompletions Integration (elastic#118871)

(cherry picked from commit 18345c4)

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceServiceSparseEmbeddingsModel.java
jonathan-buttner added a commit that referenced this pull request Jan 15, 2025
… ChatCompletions Integration (#118871) (#120136)

* [Elastic Inference Service] Add ElasticInferenceService Unified ChatCompletions Integration (#118871)

(cherry picked from commit 18345c4)

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/request/openai/OpenAiUnifiedChatCompletionRequestEntity.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceServiceSparseEmbeddingsModel.java

* Fixing switch case issue

---------

Co-authored-by: Jason Botzas-Coluni <[email protected]>
Labels: >enhancement, Feature:GenAI, :SearchOrg/Inference, Team:Search - Inference, Team:SearchOrg, v8.18.0, v9.0.0
5 participants