change_point agg fails with IllegalArgumentException while indexing an array #112805

Closed
weltenwort opened this issue Sep 12, 2024 · 3 comments · Fixed by #119578
Labels
>bug :ml Machine learning Team:ML Meta label for the ML team

Comments

@weltenwort
Member

Elasticsearch Version

8.16.0-SNAPSHOT

Installed Plugins

No response

Java Version

bundled

OS Version

Linux 6.5.0-1024-gcp #26~22.04.1-Ubuntu SMP Fri Jun 14 18:48:45 UTC 2024 x86_64 GNU/Linux

Problem Description

When run on some sets of documents, the change_point aggregation throws an IllegalArgumentException. The Observability Logs UX team is trying to use the aggregation to detect change points in log documents. I was unable to find a pattern in the failures, though, because slight modifications to the buckets make the error disappear or reappear:

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "rank-feature",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "0 > -1",
      "stack_trace": """java.lang.IllegalArgumentException: 0 > -1
	at java.base/java.util.Arrays.copyOfRange(Arrays.java:4090)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testDistributionChange(ChangePointAggregator.java:487)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:241)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:160)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.doReduce(ChangePointAggregator.java:129)
	at [email protected]/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduce(InternalAggregations.java:235)
	at [email protected]/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduceDelayable(InternalAggregations.java:211)
	at [email protected]/org.elasticsearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:682)
	at [email protected]/org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:636)
	at [email protected]/org.elasticsearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:139)
	at [email protected]/org.elasticsearch.action.search.RankFeaturePhase.innerRun(RankFeaturePhase.java:92)
	at [email protected]/org.elasticsearch.action.search.RankFeaturePhase$1.doRun(RankFeaturePhase.java:79)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at [email protected]/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
	at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:991)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
"""
    },
    "stack_trace": """Failed to execute phase [rank-feature], 
	at [email protected]/org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:726)
	at [email protected]/org.elasticsearch.action.search.RankFeaturePhase$1.onFailure(RankFeaturePhase.java:84)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:28)
	at [email protected]/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
	at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:991)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.IllegalArgumentException: 0 > -1
	at java.base/java.util.Arrays.copyOfRange(Arrays.java:4090)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testDistributionChange(ChangePointAggregator.java:487)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:241)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:160)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.doReduce(ChangePointAggregator.java:129)
	at [email protected]/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduce(InternalAggregations.java:235)
	at [email protected]/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduceDelayable(InternalAggregations.java:211)
	at [email protected]/org.elasticsearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:682)
	at [email protected]/org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:636)
	at [email protected]/org.elasticsearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:139)
	at [email protected]/org.elasticsearch.action.search.RankFeaturePhase.innerRun(RankFeaturePhase.java:92)
	at [email protected]/org.elasticsearch.action.search.RankFeaturePhase$1.doRun(RankFeaturePhase.java:79)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	... 6 more
"""
  },
  "status": 400
}
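
For readers unfamiliar with the message: `0 > -1` has the `from > to` shape that `java.util.Arrays.copyOfRange` reports when the requested end index is smaller than the start index, so the trace suggests the aggregator asked for the range `[0, -1)`. Below is a minimal Python sketch of that argument check. The function name `copy_of_range` is a hypothetical stand-in for illustration only; it is not Elasticsearch code and says nothing about why the aggregator computed an end index of -1.

```python
def copy_of_range(original, start, end):
    """Rough Python model of java.util.Arrays.copyOfRange(double[], from, to)."""
    new_length = end - start
    if new_length < 0:
        # Java raises IllegalArgumentException with the message "<from> > <to>".
        raise ValueError(f"{start} > {end}")
    if start < 0 or start > len(original):
        # Java raises ArrayIndexOutOfBoundsException in this case.
        raise IndexError(f"start index {start} out of bounds")
    copied = list(original[start:min(end, len(original))])
    # Java pads the result with zeros when `to` runs past the end of the array.
    return copied + [0.0] * (new_length - len(copied))


if __name__ == "__main__":
    try:
        copy_of_range([0.0, 700.0, 735.0], 0, -1)
    except ValueError as err:
        print(err)
```

Any end index below the start index triggers the same failure, whatever the array contents are.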

Steps to Reproduce

I originally encountered this when running on millions of log entries, but I was able to reduce it to this synthetic scenario:

DELETE change-point-test

POST _bulk
{ "index" : { "_index" : "change-point-test", "_id" : "1" } }
{ "key" : 1, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "2" } }
{ "key" : 2, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "3" } }
{ "key" : 3, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "4" } }
{ "key" : 4, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "5" } }
{ "key" : 5, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "6" } }
{ "key" : 6, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "7" } }
{ "key" : 7, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "8" } }
{ "key" : 8, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "9" } }
{ "key" : 9, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "10" } }
{ "key" : 10, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "11" } }
{ "key" : 11, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "12" } }
{ "key" : 12, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "13" } }
{ "key" : 13, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "14" } }
{ "key" : 14, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "15" } }
{ "key" : 15, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "16" } }
{ "key" : 16, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "17" } }
{ "key" : 17, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "18" } }
{ "key" : 18, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "19" } }
{ "key" : 19, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "20" } }
{ "key" : 20, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "21" } }
{ "key" : 21, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "22" } }
{ "key" : 22, "value" : 700 }
{ "index" : { "_index" : "change-point-test", "_id" : "23" } }
{ "key" : 23, "value" : 735 }
{ "index" : { "_index" : "change-point-test", "_id" : "24" } }
{ "key" : 24, "value" : 715 }
{ "index" : { "_index" : "change-point-test", "_id" : "25" } }
{ "key" : 25, "value" : 0 }
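
The bulk body above follows a simple pattern: keys 1 to 21 have value 0, keys 22 to 24 have values 700, 735, and 715, and key 25 has value 0. When experimenting with variations of the series, it may be easier to generate the body than to edit it by hand. Here is a sketch using only the Python standard library; `build_bulk_body` and the `VALUES` constant are illustrative names, not part of any Elasticsearch client.

```python
import json

INDEX = "change-point-test"
# Values for keys 1..25, copied from the reproduction above.
VALUES = [0] * 21 + [700, 735, 715] + [0]


def build_bulk_body(index=INDEX, values=VALUES):
    """Return an NDJSON _bulk body with one action line and one document line per value."""
    lines = []
    for key, value in enumerate(values, start=1):
        lines.append(json.dumps({"index": {"_index": index, "_id": str(key)}}))
        lines.append(json.dumps({"key": key, "value": value}))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body(), end="")
```

Changing entries in `VALUES` and re-posting the generated body is one way to probe which series trigger the exception.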

POST change-point-test/_search?error_trace
{
  "size": 0,
  "aggs": {
    "buckets": {
      "terms": {
        "field": "key",
        "size": 100
      },
      "aggs": {
        "values": {
          "max": {
            "field": "value",
            "missing": 0
          }
        }
      }
    },
    "change": {
      "change_point": {
        "buckets_path": "buckets>values"
      }
    }
  }
}

Logs (if relevant)

No response

@weltenwort weltenwort added >bug needs:triage Requires assignment of a team area label labels Sep 12, 2024
@weltenwort
Member Author

Changing the values or adding/removing buckets can cause the problem to disappear or reappear. So it seems to depend heavily on the specific metric values that the change point aggregation runs on.

@iverase iverase added the :ml Machine learning label Sep 12, 2024
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Sep 12, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@gareth-ellis
Member

We're encountering a very similar error in 8.14.3. I will see if I can get a reproduction case later, though it's on an internal cluster, so the ML team can get access if required.
Request:

GET rally-results-*/_search?error_trace
{
  "query": {
    "bool": {
      "filter": [
        {
          "query_string": {
            "query": "* AND environment:\"serverless-nightly\" AND active:true AND user-tags.benchmark-name:\"serverless-qa-aws-so_vector\" AND merge_throttle_time"
          }
        },
        {
          "range": {
            "race-timestamp": {
              "gte": "2024-11-18T10:08:35+00:00||-30d/d",
              "lte": "2024-11-18T10:08:35+00:00",
              "format": "strict_date_optional_time"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "race_timestamp_term": {
      "terms": {
        "field": "race-timestamp",
        "size": 30,
        "order": {
          "_key": "asc"
        }
      },
      "aggs": {
        "metric": {
          "avg": {
            "field": "value.single"
          }
        }
      }
    },
    "changes": {
      "change_point": {
        "buckets_path": "race_timestamp_term>metric.value"
      }
    }
  }
}

Error

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "0 > -1",
      "stack_trace": """java.lang.IllegalArgumentException: 0 > -1
	at java.base/java.util.Arrays.copyOfRange(Arrays.java:4090)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testDistributionChange(ChangePointAggregator.java:487)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:241)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:160)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.doReduce(ChangePointAggregator.java:129)
	at [email protected]/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduce(InternalAggregations.java:246)
	at [email protected]/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduceDelayable(InternalAggregations.java:222)
	at [email protected]/org.elasticsearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:678)
	at [email protected]/org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:632)
	at [email protected]/org.elasticsearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:139)
	at [email protected]/org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:99)
	at [email protected]/org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:87)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at [email protected]/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
	at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
"""
    },
    "stack_trace": """Failed to execute phase [fetch], 
	at [email protected]/org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:712)
	at [email protected]/org.elasticsearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:92)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:28)
	at [email protected]/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
	at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.IllegalArgumentException: 0 > -1
	at java.base/java.util.Arrays.copyOfRange(Arrays.java:4090)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testDistributionChange(ChangePointAggregator.java:487)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:241)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:160)
	at [email protected]/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.doReduce(ChangePointAggregator.java:129)
	at [email protected]/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduce(InternalAggregations.java:246)
	at [email protected]/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduceDelayable(InternalAggregations.java:222)
	at [email protected]/org.elasticsearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:678)
	at [email protected]/org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:632)
	at [email protected]/org.elasticsearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:139)
	at [email protected]/org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:99)
	at [email protected]/org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:87)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	... 6 more
"""
  },
  "status": 400
} 
