[Inference API] Default eis endpoint #119694

Closed · wants to merge 46 commits

Conversation

@maxhniebergall (Contributor) commented Jan 7, 2025

This PR adds the first-iteration model id for the Elastic Inference Service (EIS).

Model id: rainbow-sprinkles

The default endpoint id: .rainbow-sprinkles-elastic

@elasticsearchmachine added the Team:ML (Meta label for the ML team) label on Jan 7, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine (Collaborator)

Hi @maxhniebergall, I've created a changelog YAML for you.

@maxhniebergall marked this pull request as draft on January 7, 2025, 20:39
@jonathan-buttner added the >non-issue and auto-backport (Automatically create backport pull requests when merged) labels and removed the >enhancement label on Jan 10, 2025
@elasticsearchmachine (Collaborator)

@maxhniebergall please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation.

Comment on lines +184 to +194
if (DEFAULT_EIS_ENDPOINT_IDS.contains(inferenceEntityId)) {
parsedModelListener.onFailure(
new ElasticsearchStatusException(
"[{}] is a reserved inference Id. Cannot create a new inference endpoint with a reserved Id",
RestStatus.BAD_REQUEST,
inferenceEntityId
)
);
return;
}

Member:

Nothing wrong here, but I don't quite understand what's going on. Can you explain:

  1. What is this, loosely?
  2. How could this happen?
  3. What is the result of it happening? Does ES fail to boot?

Contributor:

Yeah sure:

What is this, loosely?

During a PUT request to persist a new inference endpoint, this code checks whether the inferenceEntityId that the user is requesting for the new endpoint matches one of the default ones we've reserved. We don't allow that because it could cause clashes, and we wouldn't know how to handle inference requests against that inference endpoint.

How could this happen?

This can happen when a user is creating a new inference endpoint like this:

PUT http://localhost:9200/_inference/.eis-alpha-1
{
    "service": "elastic",
    "task_type": "completion",
    "service_settings": {
         ...
    }
}

It's protecting us from a user trying to use a reserved value for their inference endpoint id.

What is the result of it happening? Does ES fail to boot?

The user's PUT request will result in a 400 error. I don't believe this code path would be executed during boot up.
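
For illustration, the rejected PUT above would come back with something like the following (a sketch assuming Elasticsearch's standard error envelope; the exact type field may differ):

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "[.eis-alpha-1] is a reserved inference Id. Cannot create a new inference endpoint with a reserved Id"
            }
        ],
        "type": "status_exception",
        "reason": "[.eis-alpha-1] is a reserved inference Id. Cannot create a new inference endpoint with a reserved Id"
    },
    "status": 400
}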

Member:

Thanks!

Member:

Nice! Let's move this check to TransportPutInferenceModelAction; then the check will apply to all the services.

ModelRegistry knows about the default ids via the addDefaultIds(defaultIds) function, which is called by the InferencePlugin at node startup.

In TransportPutInferenceModelAction#masterOperation, the same check can be made by querying ModelRegistry.
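
A minimal sketch of what that could look like (the containsDefaultConfigId name on ModelRegistry and request.getInferenceEntityId() are assumed here for illustration; any lookup over the ids registered through addDefaultIds would do):

// In TransportPutInferenceModelAction#masterOperation, before resolving the service:
if (modelRegistry.containsDefaultConfigId(request.getInferenceEntityId())) {
    listener.onFailure(
        new ElasticsearchStatusException(
            "[{}] is a reserved inference Id. Cannot create a new inference endpoint with a reserved Id",
            RestStatus.BAD_REQUEST,
            request.getInferenceEntityId()
        )
    );
    return;
}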

@jonathan-buttner (Contributor)

@elasticmachine test this please

@jonathan-buttner marked this pull request as ready for review on January 14, 2025, 13:35
@jonathan-buttner (Contributor)

@elasticmachine test this please

private static final EnumSet<TaskType> supportedTaskTypes = EnumSet.of(TaskType.SPARSE_EMBEDDING, TaskType.COMPLETION);
private static final String SERVICE_NAME = "Elastic";
private static final String DEFAULT_EIS_CHAT_COMPLETION_MODEL_ID_V1 = "rainbow-sprinkles";
private static final String DEFAULT_EIS_CHAT_COMPLETION_ENDPOINT_ID_V1 = ".eis-alpha-1";
@joshdevins (Member) commented Jan 14, 2025:

Can we name this the same way we name other default endpoints? I think we settled on modelId-providerName, which would mean .rainbow-sprinkles-elastic?

return List.of(v1DefaultCompletionModel());
}

private ElasticInferenceServiceCompletionModel v1DefaultCompletionModel() {
Member:

Is this initialization of the default endpoint conditional on the health check? We want to add an ACL in the gateway which will dynamically control whether EIS is available or not. If a customer does not have access to EIS, will this skip creating the default endpoint?

Contributor:

That's a great point. The current implementation does not consider that. I believe I also read that individual models might be available separately for different customers, which means we'd need both EIS enabled for the customer and access to this particular model.

@timgrein @maxjakob @jaybcee

Since it sounds like we're going to keep the information in memory to determine what is available to this cluster, I wonder if we could pass that information along via the ElasticInferenceServiceComponents in the constructor 🤔 (see the sketch below). Although if we ever go with a caching approach, then creation of the default inference endpoints should be attempted on each call to retrieve the defaults 🤔
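
A rough sketch of that constructor idea (entirely hypothetical shape: the authorizedModelIds supplier and the availability check are invented for illustration, not an existing API):

import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical: thread the gateway's in-memory authorization state through the components,
// so the service can skip default endpoints for models this cluster cannot access.
public record ElasticInferenceServiceComponents(
    String elasticInferenceServiceUrl,
    Supplier<Set<String>> authorizedModelIds // refreshed from the cached EIS ACL response
) {}

// In ElasticInferenceService, when asked for its default configurations:
private List<ElasticInferenceServiceCompletionModel> defaultCompletionModels() {
    if (components.authorizedModelIds().get().contains(DEFAULT_EIS_CHAT_COMPLETION_MODEL_ID_V1) == false) {
        return List.of(); // customer has no access to this model; create no default endpoint
    }
    return List.of(v1DefaultCompletionModel());
}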

I think we should postpone merging this PR until the logic that determines whether EIS is available, and which models are available, has been merged.

@jonathan-buttner (Contributor)

Closing this in favor of this PR: #120847

Labels: auto-backport (Automatically create backport pull requests when merged), :ml (Machine learning), >non-issue, Team:ML (Meta label for the ML team), v8.18.0, v9.0.0
6 participants