[Inference API] Default eis endpoint #119694
Conversation
Pinging @elastic/ml-core (Team:ML)
Hi @maxhniebergall, I've created a changelog YAML for you.
…inference/services/elastic/completion/EISCompletionServiceSettingsTests.java Co-authored-by: Tim Grein <[email protected]>
…inference/services/elastic/completion/EISCompletionModelTests.java Co-authored-by: Tim Grein <[email protected]>
Force-pushed from f8972cf to 986654b
@maxhniebergall please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation.
if (DEFAULT_EIS_ENDPOINT_IDS.contains(inferenceEntityId)) {
    // Reject ids reserved for the default EIS endpoints
    parsedModelListener.onFailure(
        new ElasticsearchStatusException(
            "[{}] is a reserved inference Id. Cannot create a new inference endpoint with a reserved Id",
            RestStatus.BAD_REQUEST,
            inferenceEntityId
        )
    );
    return;
}
Nothing wrong here, but I don't quite understand what's going on. Can you explain:
- What is this, loosely?
- How could this happen?
- What is the result of it happening? Does ES fail to boot?
Yeah sure:
What is this, loosely?
During a PUT request to persist a new inference endpoint, this code checks whether the inferenceEntityId
that the user is requesting for the new endpoint matches one of the default ids we've reserved. We don't allow that because it could cause clashes, and we wouldn't know how to handle inference requests against that inference endpoint.
How could this happen?
This can happen when a user is creating a new inference endpoint like this:
PUT http://localhost:9200/_inference/.eis-alpha-1
{
"service": "elastic",
"task_type": "completion",
"service_settings": {
...
}
}
It's protecting us from a user trying to use a reserved value for their inference endpoint id.
What is the result of it happening? Does ES fail to boot?
The user's PUT request will result in a 400 error. I don't believe this code path would be executed during boot up.
Thanks!
Nice! Let's move this check to TransportPutInferenceModelAction; then the check will apply to all the services.
ModelRegistry knows about the default ids via the addDefaultIds(defaultIds) function, which is called by the InferencePlugin at node startup.
In TransportPutInferenceModelAction#masterOperation the same check can be made by querying ModelRegistry, as in the sketch below.
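A minimal sketch of what that could look like. Note that containsDefaultConfigId on ModelRegistry and getInferenceEntityId on the request are assumed names for illustration; the real accessors may differ:

// Sketch only: a service-agnostic reserved-id guard in TransportPutInferenceModelAction.
// modelRegistry is assumed to expose the ids registered via addDefaultIds(defaultIds)
// at node startup through a hypothetical containsDefaultConfigId(...) lookup.
@Override
protected void masterOperation(
    Task task,
    PutInferenceModelAction.Request request,
    ClusterState state,
    ActionListener<PutInferenceModelAction.Response> listener
) {
    String inferenceEntityId = request.getInferenceEntityId();

    if (modelRegistry.containsDefaultConfigId(inferenceEntityId)) {
        listener.onFailure(
            new ElasticsearchStatusException(
                "[{}] is a reserved inference Id. Cannot create a new inference endpoint with a reserved Id",
                RestStatus.BAD_REQUEST,
                inferenceEntityId
            )
        );
        return;
    }

    // ... existing masterOperation logic continues here
}

That way every service gets the guard for free, instead of each service re-implementing it.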
@elasticmachine test this please
@elasticmachine test this please
private static final EnumSet<TaskType> supportedTaskTypes = EnumSet.of(TaskType.SPARSE_EMBEDDING, TaskType.COMPLETION);
private static final String SERVICE_NAME = "Elastic";
private static final String DEFAULT_EIS_CHAT_COMPLETION_MODEL_ID_V1 = "rainbow-sprinkles";
private static final String DEFAULT_EIS_CHAT_COMPLETION_ENDPOINT_ID_V1 = ".eis-alpha-1";
Can we name this the same way we name other default endpoints? I think we settled on modelId-providerName, which would mean .rainbow-sprinkles-elastic?
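Applied to the constant above, the suggested convention would look like this (a sketch of the rename only; the constant name is taken from the existing code):

// Hypothetical rename following the modelId-providerName convention
private static final String DEFAULT_EIS_CHAT_COMPLETION_ENDPOINT_ID_V1 = ".rainbow-sprinkles-elastic";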
    return List.of(v1DefaultCompletionModel());
}

private ElasticInferenceServiceCompletionModel v1DefaultCompletionModel() {
Is this initialization of the default endpoint conditional on the health check? We want to add an ACL in the gateway which will dynamically control whether EIS is available or not. If a customer does not have access to EIS, will it skip creating the default endpoint?
That's a great point. The current implementation does not consider that. I believe I also read that individual models might be available separately for different customers, which means we'd need EIS enabled for the customer and this particular model enabled too.
Since it sounds like we're going to keep the information in memory to determine what is available to this cluster, I wonder if we could pass that information along via the ElasticInferenceServiceComponents
in the constructor 🤔 (sketched below). Although if we ever go with a caching approach, then the default inference endpoints should attempt to be created on each call to retrieve the defaults 🤔
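A minimal sketch of that idea, assuming a hypothetical isAuthorized(modelId) accessor on ElasticInferenceServiceComponents; the actual ACL/availability mechanism is still open:

// Sketch only: skip creating the default EIS endpoint when this cluster lacks access.
// components.isAuthorized(...) is a placeholder for whatever the gateway ACL exposes.
private List<Model> defaultConfigs() {
    if (components.isAuthorized(DEFAULT_EIS_CHAT_COMPLETION_MODEL_ID_V1) == false) {
        return List.of(); // no EIS access, or this model isn't enabled: no default endpoint
    }
    return List.of(v1DefaultCompletionModel());
}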
I think we should postpone merging this PR until the logic to determine whether EIS is available, and which models are available, has been merged.
Closing this in favor of this PR: #120847
This PR adds the first-iteration model id for the Elastic Inference Service.
Model id:
rainbow-sprinkles
The default endpoint id:
.rainbow-sprinkles-elastic