Object properties order not preserved in _source for fields with "type": "object", "enabled": false #119347

Closed
nemphys opened this issue Dec 30, 2024 · 10 comments
Assignees: kkrik-es
Labels: :StorageEngine/Mapping, Team:StorageEngine

Comments

@nemphys

nemphys commented Dec 30, 2024

Elasticsearch Version

8.15.2

Installed Plugins

analysis_icu

Java Version

bundled

OS Version

MacOS

Problem Description

According to the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html), fields mapped as "type": "object", "enabled": false are not parsed by Elasticsearch, so one would expect any objects stored in such fields to appear in the document _source exactly as sent in the indexing request.

This does not seem to be the case for the order of their properties: we have cases where such fields in the document _source (retrieved with a plain GET document request, no search involved) do not retain the original order.

For example, a document indexed like this:

{
  "fieldA": {
    "propA": "valueA",
    "propB": "valueB",
    "propC": "valueC"
  }
}

is returned like this right after it is indexed:

{
  "_source": {
    "fieldA": {
      "propC": "valueC",
      "propA": "valueA",
      "propB": "valueB"
    }
  }
}

Is this normal and expected (i.e., object serialization in _source is not guaranteed to preserve property order), or is it a bug?

Steps to Reproduce

PUT /test
{
  "mappings": {
    "_source": {
      "enabled": true,
      "excludes": [
        "*.test"
      ]
    },
    "properties": {
      "content": {
        "type": "object",
        "enabled": false
      }
    }
  }
}

PUT /test/_doc/12345678
{
  "content": {
    "propA": [
      "valueA"
    ],
    "propB": [
      "valueB"
    ],
    "propC": [
      "valueC"
    ],
    "propD": [
      "valueD"
    ],
    "propE": [
      "valueE"
    ],
    "propF": [
      "valueF"
    ],
    "propG": [
      "valueG"
    ],
    "propH": [
      "valueH"
    ],
    "propI": [
      "valueI"
    ]
  }
}

GET /test/_doc/12345678
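To check the symptom on the client side, here is a minimal Python sketch. It assumes the request body and the returned _source have already been parsed with `json.loads` (Python dicts keep insertion order, so the serialized key order survives parsing). The helper name `key_order_changes` is hypothetical and not part of any Elasticsearch client.

```python
import json

def key_order_changes(sent, returned, path=""):
    """Return the paths of objects whose keys come back in a different order.

    Both arguments are parsed JSON values; dicts keep insertion order,
    so list(some_dict) reflects the key order of the serialized JSON.
    """
    changes = []
    if isinstance(sent, dict) and isinstance(returned, dict):
        if list(sent) != list(returned):
            changes.append(path or "<root>")
        # Recurse into keys present in both objects.
        for key in sent:
            if key in returned:
                child = f"{path}.{key}" if path else key
                changes.extend(key_order_changes(sent[key], returned[key], child))
    elif isinstance(sent, list) and isinstance(returned, list):
        # Arrays are ordered, so compare element by element.
        for i, (a, b) in enumerate(zip(sent, returned)):
            changes.extend(key_order_changes(a, b, f"{path}[{i}]"))
    return changes

sent = json.loads('{"content": {"propA": ["valueA"], "propB": ["valueB"], "propC": ["valueC"]}}')
returned = json.loads('{"content": {"propC": ["valueC"], "propA": ["valueA"], "propB": ["valueB"]}}')
print(key_order_changes(sent, returned))  # ['content']
print(sent == returned)  # True: same content, only the order differs
```

Note that dict equality ignores order, which is why the second check passes even though the first one reports a change.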

Logs (if relevant)

No response

@nemphys nemphys added the >bug and needs:triage labels Dec 30, 2024
@astefan
Contributor

astefan commented Dec 30, 2024

Let's first establish that this is actually a bug in theory and then I can provide reproduction steps 😃

Just testing this as you described it, it doesn't reproduce. Unless you provide a reproducible scenario (complete mapping and settings, the exact document indexed, and commands showing the mangled _source), I cannot confirm this as a bug. Our documentation describes behavior similar to what you report for synthetic source; otherwise, _source should be exactly as it was when the document was indexed.
Also, next time please provide a reproducible scenario with the described bug. GitHub is reserved for actual issues; all other types of questions should be posted on our forum. Consider reopening this issue with the full list of steps that reproduces the described behavior.

@astefan astefan closed this as completed Dec 30, 2024
@astefan astefan removed the >bug label Dec 30, 2024
@nemphys
Author

nemphys commented Dec 30, 2024

@astefan I just wanted to make sure that this is not considered normal. If you reopen this I can provide a working example.

@astefan astefan reopened this Dec 30, 2024
@nemphys
Author

nemphys commented Dec 30, 2024

@astefan thank you, will update the issue later tonight.

@nemphys
Author

nemphys commented Dec 30, 2024

@astefan after a little digging, I have discovered the culprit: this only happens when the index mapping for _source has an "excludes" property.

I have updated the issue with a reproducible example; you will see that after the last GET request, the properties of the "content" field object in the document source are in a seemingly random order.

Please let me know if you want me to change the title to something more descriptive, now that I have narrowed down the cause of the issue.

@nemphys
Author

nemphys commented Dec 30, 2024

After further testing, it seems to happen only when the excludes property contains a wildcard 😃
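In the reproduction, it helps to check which source paths the exclude pattern "*.test" actually hits. Below is a minimal Python sketch. It assumes Elasticsearch-style simple wildcards where `*` matches any run of characters, including dots, and approximates them with `fnmatch.fnmatchcase` (which additionally treats `?` and `[...]` specially). The helper `field_paths` is hypothetical.

```python
from fnmatch import fnmatchcase

def field_paths(obj, prefix=""):
    """Yield the dotted path of every field, including nested objects."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        if isinstance(value, dict):
            yield from field_paths(value, path)

source = {"content": {"propA": ["valueA"], "propB": ["valueB"], "propC": ["valueC"]}}
excludes = ["*.test"]

# No path ends in ".test", so nothing would be removed by this pattern;
# any difference in the returned _source would then be in order only.
matched = [p for p in field_paths(source) if any(fnmatchcase(p, pat) for pat in excludes)]
print(matched)  # []
```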

@nemphys
Author

nemphys commented Dec 30, 2024

One final finding: this affects not only the property order of nested objects (like the one in the example), but the order of the whole source object and all of its nested objects.

Therefore, I think we should rewrite the issue title and description from scratch, since it seems to have nothing to do with enabled: false.

@tvernum tvernum added the :StorageEngine/Mapping label and removed the needs:triage label Jan 7, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@tvernum
Contributor

tvernum commented Jan 7, 2025

I don't think we attempt to provide any guarantees about preserving the order of fields in the source.

Most of the time the source will be preserved, because we don't bother parsing and rewriting it unless we need to. But if we need to exclude fields, then we have to parse and rewrite the source, and that may introduce changes.
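To illustrate how a parse, filter and re-serialize pass can change key order without changing content, here is a minimal Python sketch. It is an illustration under assumptions, not Elasticsearch's actual (Java) implementation: the hypothetical `filter_source` collects surviving keys in a `set`, and because Python randomizes string hashes per process, the set's iteration order, and therefore the output key order, can differ from the input.

```python
import json
from fnmatch import fnmatchcase

def filter_source(obj, excludes, prefix=""):
    """Drop fields whose dotted path matches an exclude pattern.

    Surviving keys are collected in a set, so the rebuilt dict follows
    set iteration order rather than the original key order.
    Arrays are copied as-is (a simplification).
    """
    kept = set()
    for key in obj:
        path = f"{prefix}.{key}" if prefix else key
        if not any(fnmatchcase(path, pat) for pat in excludes):
            kept.add(key)
    result = {}
    for key in kept:  # order here depends on string hashes, not on the input
        value = obj[key]
        path = f"{prefix}.{key}" if prefix else key
        result[key] = filter_source(value, excludes, path) if isinstance(value, dict) else value
    return result

original = json.loads('{"content": {"propA": ["valueA"], "propB": ["valueB"], "propC": ["valueC"], "test": 1}}')
filtered = filter_source(original, ["*.test"])
print(json.dumps(filtered))  # "content.test" is gone; key order may vary between runs
```

Building `result` by iterating `obj` directly (skipping excluded keys) would keep the input order; the set-based version only shows that an order-insensitive rewrite is enough to produce the reported symptom.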

Since fields in JSON objects are explicitly unordered [1], I don't think you should rely on the source coming back in the same order as it was stored.

Footnotes

  1. From RFC 8259

    An object is an unordered collection of zero or more name/value
    pairs, where a name is a string and a value is a string, number,
    boolean, null, object, or array.
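Following that advice on the client side, a minimal Python sketch: compare documents by parsed value rather than by serialized text, and, if order genuinely matters to the application, carry it explicitly in an array, since RFC 8259 gives arrays (not objects) an order. The pair-array layout and the helper names `to_ordered_pairs` / `from_ordered_pairs` are hypothetical conventions, not an Elasticsearch feature.

```python
import json

def to_ordered_pairs(obj):
    """Encode an object's key order explicitly as a list of [key, value] pairs."""
    return [[key, value] for key, value in obj.items()]

def from_ordered_pairs(pairs):
    """Rebuild a dict whose insertion order follows the pair list."""
    return {key: value for key, value in pairs}

stored = '{"content": {"propC": ["valueC"], "propA": ["valueA"]}}'
expected = '{"content": {"propA": ["valueA"], "propC": ["valueC"]}}'

# Text comparison is order-sensitive; parsed comparison is not.
print(stored == expected)                          # False
print(json.loads(stored) == json.loads(expected))  # True

pairs = to_ordered_pairs({"propA": ["valueA"], "propB": ["valueB"], "propC": ["valueC"]})
print(list(from_ordered_pairs(pairs)))  # ['propA', 'propB', 'propC']
```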

@kkrik-es kkrik-es self-assigned this Jan 7, 2025
@kkrik-es
Contributor

kkrik-es commented Jan 7, 2025

Well said, @tvernum. Moreover, there are documented modifications when synthetic source is used. In other words, it's not safe to assume that the _source contents match the input at index time verbatim.

I'm closing this as working as intended.

@kkrik-es kkrik-es closed this as not planned Jan 7, 2025
@nemphys
Author

nemphys commented Jan 7, 2025

@kkrik-es please note that I am not using synthetic source.
@tvernum Fair enough, but maybe it could be documented somewhere that the source keys order is not guaranteed to be preserved, so that issues like this can be avoided in the future.
