Add index_options to semantic_text field mappings #119967

Open: kderusso wants to merge 6 commits into main from kderusso/semantic-text-index-options

@kderusso kderusso commented Jan 10, 2025

Adds index_options support for semantic_text fields using dense models.

Example:

PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}

PUT my-semantic-index
{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text",
        "inference_id": "my-e5-model",
        "index_options": {
          "type": "bbq_hnsw",
          "ef_construction": 100
        }
      }
    }
  }
}

@kderusso kderusso force-pushed the kderusso/semantic-text-index-options branch from e096a61 to 342d769 Compare January 10, 2025 16:17
@kderusso kderusso force-pushed the kderusso/semantic-text-index-options branch from 342d769 to d822301 Compare January 10, 2025 16:29
@kderusso kderusso added the >enhancement, auto-backport, :SearchOrg/Relevance, and v8.18.0 labels Jan 10, 2025
@elasticsearchmachine

Hi @kderusso, I've created a changelog YAML for you.

@kderusso kderusso added the :Search Relevance/Search label Jan 10, 2025
@kderusso kderusso marked this pull request as ready for review January 10, 2025 16:38
@kderusso kderusso requested review from jimczi, Mikep86 and a team January 10, 2025 16:39
@elasticsearchmachine

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine

Pinging @elastic/search-relevance (Team:Search - Relevance)

@Mikep86 Mikep86 left a comment

Good start to this! I have a bunch of comments, but they're mostly interrelated, so it's not as much as it seems.

@@ -0,0 +1,5 @@
pr: 119967
summary: Add `index_options` to `semantic_text` field mappings
area: Relevance
Contributor

Mapping is probably a more appropriate area for this change

Comment on lines +101 to +102
For dense vector models, the configuration of `index_options` models the <<dense-vector-index-options>> configuration and defaults for dense vectors.
There are currently no supported index options for sparse models.
Contributor

Suggested change
- For dense vector models, the configuration of `index_options` models the <<dense-vector-index-options>> configuration and defaults for dense vectors.
- There are currently no supported index options for sparse models.
+ For dense vector models, the configuration of `index_options` specifies the <<dense-vector-index-options>> configuration and defaults for dense vectors.
+ There are currently no supported index options for sparse vector models.

    try {
        builder.field(key, value);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
Contributor

I get why you're doing this, but it would be nice if we didn't have to use UncheckedIOException here. The method already throws IOException. Can we iterate over the map without using forEach so that we can throw IOException directly?
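
A minimal sketch of what the loop-based alternative could look like. It assumes a hypothetical `FieldWriter` interface that only mirrors the `throws IOException` signature of the builder's `field` call; it is not the Elasticsearch `XContentBuilder` API, and the class and method names are illustrative. Because a `for` loop is ordinary method-body code rather than a lambda, the checked exception can propagate without being wrapped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class MapWriterSketch {

    /** Hypothetical stand-in for the builder: field() declares IOException. */
    interface FieldWriter {
        void field(String key, Object value) throws IOException;
    }

    /** Lambda version: BiConsumer cannot throw checked exceptions, so wrapping is forced. */
    static void writeWithForEach(Map<String, Object> params, FieldWriter writer) {
        params.forEach((key, value) -> {
            try {
                writer.field(key, value);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Loop version: IOException propagates directly to the caller. */
    static void writeWithLoop(Map<String, Object> params, FieldWriter writer) throws IOException {
        for (Map.Entry<String, Object> entry : params.entrySet()) {
            writer.field(entry.getKey(), entry.getValue());
        }
    }

    /** Illustrative parameters, taken from the option names used in the tests. */
    static Map<String, Object> demoParams() {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("m", 16);
        params.put("ef_construction", 100);
        return params;
    }

    /** Demo helper: renders the params as key=value; pairs via the loop version. */
    static String render(Map<String, Object> params) {
        StringBuilder out = new StringBuilder();
        try {
            writeWithLoop(params, (k, v) -> out.append(k).append('=').append(v).append(';'));
        } catch (IOException e) {
            // The StringBuilder-backed writer never throws; this branch exists only for the compiler.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(demoParams())); // prints m=16;ef_construction=100;
    }
}
```

In a method that already declares `throws IOException`, the loop version needs no try/catch at all, which is the simplification the comment asks for.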

        return asMap().toString();
    }

    public abstract Map<String, Object> asMap();
Contributor

Can we implement asMap() in the base class and handle type in there? We could have an abstract asMap(Map<String, Object> map) method for writing the rest of the params that vary by implementation.
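
One way to read this suggestion is as a template method: the base class owns `type` and the public `asMap()`, and subclasses only contribute their own parameters. The sketch below is an assumption about the shape, not the PR's code; `IndexOptionsSketch`, `HnswOptionsSketch`, and their fields are hypothetical names, and the real class also implements `ToXContentObject`, which is omitted here to keep the example self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

abstract class IndexOptionsSketch {

    // Kept private; subclasses never need to touch it because asMap() writes it.
    private final String type;

    protected IndexOptionsSketch(String type) {
        this.type = type;
    }

    /** Base class handles "type", then delegates the remaining params to the subclass. */
    public final Map<String, Object> asMap() {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("type", type);
        asMap(map);
        return map;
    }

    /** Subclasses add the parameters that vary by implementation. */
    protected abstract void asMap(Map<String, Object> map);

    @Override
    public String toString() {
        return asMap().toString();
    }
}

/** Hypothetical subclass carrying two HNSW-style parameters. */
final class HnswOptionsSketch extends IndexOptionsSketch {

    private final int m;
    private final int efConstruction;

    HnswOptionsSketch(int m, int efConstruction) {
        super("hnsw");
        this.m = m;
        this.efConstruction = efConstruction;
    }

    @Override
    protected void asMap(Map<String, Object> map) {
        map.put("m", m);
        map.put("ef_construction", efConstruction);
    }

    public static void main(String[] args) {
        // prints {type=hnsw, m=16, ef_construction=100}
        System.out.println(new HnswOptionsSketch(16, 100));
    }
}
```

With this split, every implementation serializes `type` the same way and in the same position, and a new option type only has to implement the one-argument overload.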

@@ -186,6 +194,101 @@ private void validateFieldNotPresent(String field, Object fieldValue) {
        }
    }

    public abstract static class IndexOptions implements ToXContentObject {

        protected final String type;
Contributor

Can we make this private?

Comment on lines +618 to +622
      test_runner_features: [ capabilities ]
      capabilities:
        - method: GET
          path: /_inference
          capabilities: [ default_elser_2 ]
Contributor

Same here

  - do:
      indices.create:
        index: test-index-options
        body:
Contributor

You need to specify the index setting to use the new format for this test:

          settings:
            index:
              mapping:
                semantic_text:
                  use_legacy_format: false

- match: { "test-index-options.mappings.properties.semantic_field.index_options.m": 16 }
- match: { "test-index-options.mappings.properties.semantic_field.index_options.ef_construction": 100 }
- match: { "test-index-options.mappings.properties.semantic_field.index_options.confidence_interval": 1.0 }

Contributor

Can you index a doc with dense vectors here to test that the index options apply successfully? You'll also need to specify the use_legacy_format index setting to use the new format when doing so.

- not_exists: test-index-options.mappings.properties.semantic_field.index_options.m
- not_exists: test-index-options.mappings.properties.semantic_field.index_options.ef_construction
- not_exists: test-index-options.mappings.properties.semantic_field.index_options.confidence_interval

Contributor

Same here, can you index a doc?

Contributor

For the tests that rely on the semantic text format to index docs (Supports index options, Supports partial index options, Index options on sparse_vector fields are ignored), can you replicate them in 10_semantic_text_field_mapping_bwc.yml using the legacy format?

Labels: auto-backport, >enhancement, :Search Relevance/Search, :SearchOrg/Relevance, v8.18.0, v9.0.0

3 participants