Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] SearchTransportTelemetryTests testSearchTransportMetricsQueryThenFetch failing #104184

Closed
DaveCTurner opened this issue Jan 10, 2024 · 10 comments · Fixed by #113194
Closed
Assignees
Labels
low-risk An open issue or test failure that is a low risk to future releases :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch >test-failure Triaged test failures from CI

Comments

@DaveCTurner
Copy link
Contributor

CI Link

Only failed locally, sorry no build scan

Repro line

':server:test' --tests "org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests.testSearchTransportMetricsQueryThenFetch" -Dtests.seed=A7DDD2883CF9F74D:5932D02B45F0F45A -Dtests.locale=is-IS -Dtests.timezone=Asia/Dacca -Druntime.java=21

Does it reproduce?

No

Applicable branches

main

Failure history

https://es-delivery-stats.elastic.dev/app/r/s/9tCmJ

Failure excerpt

No failures in CI it seems, but failed on a local run:

org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests > testSearchTransportMetricsQueryThenFetch FAILED
    java.lang.AssertionError: Values should be different. Actual: 0
        at __randomizedtesting.SeedInfo.seed([A7DDD2883CF9F74D:5932D02B45F0F45A]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failEquals(Assert.java:187)
        at org.junit.Assert.assertNotEquals(Assert.java:201)
        at org.junit.Assert.assertNotEquals(Assert.java:213)
        at org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests.testSearchTransportMetricsQueryThenFetch(SearchTransportTelemetryTests.java:81)
  1> [2024-01-10T07:14:05,007][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] before test
  1> [2024-01-10T07:14:05,013][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] [SearchTransportTelemetryTests#testSearchTransportMetricsQueryThenFetch]: setting up test
  1> [2024-01-10T07:14:05,028][INFO ][o.e.t.InternalTestCluster] [testSearchTransportMetricsQueryThenFetch] Setup InternalTestCluster [TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster] with seed [E245A1F88A1CE78F] using [0] dedicated masters, [1] (data) nodes and [0] coord only nodes (master nodes are [auto-managed])
  1> [2024-01-10T07:14:05,035][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] version[8.13.0-SNAPSHOT], pid[1723656], build[unknown/unknown/unknown], OS[Linux/6.2.0-1020-gcp/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/21.0.1/21.0.1+12-29]
  1> [2024-01-10T07:14:05,035][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] JVM home [/home/davidturner/.gradle/jdks/oracle_corporation-21-amd64-linux/jdk-21.0.1], using bundled JDK [false]
  1> [2024-01-10T07:14:05,035][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] JVM arguments [-Des.insecure_network_trace_enabled=true, -Des.scripting.update.ctx_in_params=false, -Des.search.rewrite_sort=true, -Des.transport.cname_in_publish_address=true, -Dgradle.dist.lib=/home/davidturner/.gradle/wrapper/dists/gradle-8.5-all/3zlzzgtsutfj0pbojr50n2l7z/gradle-8.5/lib, -Dgradle.user.home=/home/davidturner/.gradle, -Dgradle.worker.jar=/home/davidturner/.gradle/caches/8.5/workerMain/gradle-worker.jar, -Dio.netty.noKeySetOptimization=true, -Dio.netty.noUnsafe=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Djava.awt.headless=true, -Djava.locale.providers=SPI,COMPAT, -Djava.security.manager=allow, -Djna.nosys=true, -Dorg.gradle.internal.worker.tmpdir=/home/davidturner/src/elasticsearch/server/build/tmp/test/work, -Dorg.gradle.native=false, -Dtests.artifact=server, -Dtests.gradle=true, -Dtests.gradle-loop-iteration=19, -Dtests.logger.level=WARN, -Dtests.security.manager=true, -Dtests.seed=A7DDD2883CF9F74D, -Dtests.task=:server:test, --add-opens=java.base/java.security.cert=ALL-UNNAMED, --add-opens=java.base/java.nio.channels=ALL-UNNAMED, --add-opens=java.base/java.net=ALL-UNNAMED, --add-opens=java.base/javax.net.ssl=ALL-UNNAMED, --add-opens=java.base/java.nio.file=ALL-UNNAMED, --add-opens=java.base/java.time=ALL-UNNAMED, --add-opens=java.management/java.lang.management=ALL-UNNAMED, -XX:+HeapDumpOnOutOfMemoryError, -esa, -XX:HeapDumpPath=/home/davidturner/src/elasticsearch/server/build/heapdump, -Xms512m, -Xmx512m, -Dfile.encoding=UTF-8, -Djava.io.tmpdir=/home/davidturner/src/elasticsearch/server/build/testrun/test/temp, -Duser.country, -Duser.language=en, -Duser.variant, -ea]
  1> [2024-01-10T07:14:05,035][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] Default Locale [is_IS]
  1> [2024-01-10T07:14:05,036][WARN ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] version [8.13.0-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
  1> [2024-01-10T07:14:05,036][WARN ][o.e.d.n.Node             ] [testSearchTransportMetricsQueryThenFetch] data_stream.dataset="deprecation.elasticsearch" data_stream.namespace="default" data_stream.type="logs" elasticsearch.event.category="settings" event.code="shared-data-path" message="setting [path.shared_data] is deprecated and will be removed in a future release"
  1> [2024-01-10T07:14:05,232][INFO ][o.e.e.NodeEnvironment    ] [testSearchTransportMetricsQueryThenFetch] using [1] data paths, mounts [[/ (/dev/root)]], net usable_space [224.8gb], net total_space [242.2gb], types [ext4]
  1> [2024-01-10T07:14:05,233][INFO ][o.e.e.NodeEnvironment    ] [testSearchTransportMetricsQueryThenFetch] heap size [512mb], compressed ordinary object pointers [true]
  1> [2024-01-10T07:14:05,236][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] node name [node_t0], node ID [9pmAQkDpT2-BAiFU_LPorA], cluster name [TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster], roles [master, data_warm, data_content, transform, data_hot, ml, data_frozen, ingest, data_cold, data, remote_cluster_client]
  1> [2024-01-10T07:14:05,328][INFO ][o.e.f.FeatureService     ] [testSearchTransportMetricsQueryThenFetch] Registered local node features [features_supported]
  1> [2024-01-10T07:14:05,350][INFO ][o.e.i.r.RecoverySettings ] [testSearchTransportMetricsQueryThenFetch] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]
  1> [2024-01-10T07:14:05,352][INFO ][o.e.d.DiscoveryModule    ] [testSearchTransportMetricsQueryThenFetch] using discovery type [multi-node] and seed hosts providers [settings, file]
  1> [2024-01-10T07:14:05,569][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] initialized
  1> [2024-01-10T07:14:05,596][INFO ][o.e.n.Node               ] [[test_TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster[T#1]]] starting ...
  1> [2024-01-10T07:14:05,676][INFO ][o.e.t.TransportService   ] [[test_TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster[T#1]]] publish_address {127.0.0.1:28181}, bound_addresses {[::1]:28181}, {127.0.0.1:28181}
  1> [2024-01-10T07:14:05,724][INFO ][o.e.c.c.ClusterBootstrapService] [[test_TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster[T#1]]] this node has not joined a bootstrapped cluster yet; [cluster.initial_master_nodes] is set to [node_t0]
  1> [2024-01-10T07:14:05,747][INFO ][o.e.c.c.Coordinator      ] [[test_TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster[T#1]]] setting initial configuration to VotingConfiguration{9pmAQkDpT2-BAiFU_LPorA}
  1> [2024-01-10T07:14:05,845][INFO ][o.e.c.s.MasterService    ] [node_t0] elected-as-master ([1] nodes joined in term 1)[_FINISH_ELECTION_, {node_t0}{9pmAQkDpT2-BAiFU_LPorA}{a2O6zoWDQFiQrNroIsfMsQ}{node_t0}{127.0.0.1}{127.0.0.1:28181}{cdfhilmrstw}{8.13.0}{7000099-8501000} completing election], term: 1, version: 1, delta: master node changed {previous [], current [{node_t0}{9pmAQkDpT2-BAiFU_LPorA}{a2O6zoWDQFiQrNroIsfMsQ}{node_t0}{127.0.0.1}{127.0.0.1:28181}{cdfhilmrstw}{8.13.0}{7000099-8501000}]}
  1> [2024-01-10T07:14:05,880][INFO ][o.e.c.c.CoordinationState] [node_t0] cluster UUID set to [I8Rrx0LzSuqGvOAS0lB8Jg]
  1> [2024-01-10T07:14:05,900][INFO ][o.e.c.s.ClusterApplierService] [node_t0] master node changed {previous [], current [{node_t0}{9pmAQkDpT2-BAiFU_LPorA}{a2O6zoWDQFiQrNroIsfMsQ}{node_t0}{127.0.0.1}{127.0.0.1:28181}{cdfhilmrstw}{8.13.0}{7000099-8501000}]}, term: 1, version: 1, reason: Publication{term=1, version=1}
  1> [2024-01-10T07:14:05,930][INFO ][o.e.c.f.AbstractFileWatchingService] [node_t0] starting file watcher ...
  1> [2024-01-10T07:14:05,970][INFO ][o.e.c.c.NodeJoinExecutor ] [node_t0] node-join: [{node_t0}{9pmAQkDpT2-BAiFU_LPorA}{a2O6zoWDQFiQrNroIsfMsQ}{node_t0}{127.0.0.1}{127.0.0.1:28181}{cdfhilmrstw}{8.13.0}{7000099-8501000}] with reason [completing election]
  1> [2024-01-10T07:14:05,972][INFO ][o.e.c.f.AbstractFileWatchingService] [[elasticsearch[file-watcher[/home/davidturner/src/elasticsearch/server/build/testrun/test/temp/org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests_A7DDD2883CF9F74D-001/tempDir-002/config/operator/settings.json]]]] file settings service up and running [tid=1396]
  1> [2024-01-10T07:14:05,998][INFO ][o.e.n.Node               ] [[test_TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster[T#1]]] started {node_t0}{9pmAQkDpT2-BAiFU_LPorA}{a2O6zoWDQFiQrNroIsfMsQ}{node_t0}{127.0.0.1}{127.0.0.1:28181}{cdfhilmrstw}{8.13.0}{7000099-8501000}
  1> [2024-01-10T07:14:06,024][INFO ][o.e.g.GatewayService     ] [node_t0] recovered [0] indices into cluster_state
  1> [2024-01-10T07:14:06,040][INFO ][o.e.h.n.s.HealthNodeTaskExecutor] [node_t0] Node [{node_t0}{9pmAQkDpT2-BAiFU_LPorA}] is selected as the current health node.
  1> [2024-01-10T07:14:06,159][INFO ][o.e.c.m.MetadataIndexTemplateService] [node_t0] adding template [random_index_template] for index patterns [*]
  1> [2024-01-10T07:14:06,181][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] [SearchTransportTelemetryTests#testSearchTransportMetricsQueryThenFetch]: all set up test
  1> [2024-01-10T07:14:06,188][INFO ][o.e.c.m.MetadataCreateIndexService] [node_t0] [test2] creating index, cause [api], templates [random_index_template], shards [3]/[0]
  1> [2024-01-10T07:14:06,478][INFO ][o.e.c.r.a.AllocationService] [node_t0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[test2][0]]])." previous.health="YELLOW" reason="shards started [[test2][0]]"
  1> [2024-01-10T07:14:06,502][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] Index [1] docs async: [true] bulk: [false]
  1> [2024-01-10T07:14:06,613][INFO ][o.e.c.m.MetadataMappingService] [node_t0] [test2/LloPpu4jSU6HaBIMKMpbyA] create_mapping
  1> [2024-01-10T07:14:06,994][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] [SearchTransportTelemetryTests#testSearchTransportMetricsQueryThenFetch]: cleaning up after test
  1> [2024-01-10T07:14:07,173][INFO ][o.e.c.m.MetadataDeleteIndexService] [node_t0] [test2/LloPpu4jSU6HaBIMKMpbyA] deleting index
  1> [2024-01-10T07:14:07,203][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][1] start check index
  1> [2024-01-10T07:14:07,214][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][1] end check index
  1> [2024-01-10T07:14:07,221][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][0] start check index
  1> [2024-01-10T07:14:07,223][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][0] end check index
  1> [2024-01-10T07:14:07,227][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][2] start check index
  1> [2024-01-10T07:14:07,230][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][2] end check index
  1> [2024-01-10T07:14:07,246][INFO ][o.e.c.m.MetadataIndexTemplateService] [node_t0] removing template [random_index_template]
  1> [2024-01-10T07:14:07,261][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] stopping ...
  1> [2024-01-10T07:14:07,264][INFO ][o.e.c.f.AbstractFileWatchingService] [[elasticsearch[file-watcher[/home/davidturner/src/elasticsearch/server/build/testrun/test/temp/org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests_A7DDD2883CF9F74D-001/tempDir-002/config/operator/settings.json]]]] shutting down watcher thread
  1> [2024-01-10T07:14:07,270][INFO ][o.e.c.f.AbstractFileWatchingService] [testSearchTransportMetricsQueryThenFetch] watcher service stopped
  1> [2024-01-10T07:14:07,282][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] stopped
  1> [2024-01-10T07:14:07,282][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] closing ...
  1> [2024-01-10T07:14:07,286][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] closed
  1> [2024-01-10T07:14:07,291][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] [SearchTransportTelemetryTests#testSearchTransportMetricsQueryThenFetch]: cleaned up after test
  1> [2024-01-10T07:14:07,292][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] after test
@DaveCTurner DaveCTurner added :Search/Search Search-related issues that do not fall into other categories >test-failure Triaged test failures from CI labels Jan 10, 2024
@elasticsearchmachine elasticsearchmachine added blocker Team:Search Meta label for search team labels Jan 10, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

DaveCTurner added a commit that referenced this issue Jan 10, 2024
@benwtrent
Copy link
Member

@JVerwolf looks like most of SearchTransportTelemetryTests are flaky and muted. Mind determining impact and digging in?

@rjernst
Copy link
Member

rjernst commented Mar 12, 2024

Another instance on a PR, it seems the original test here was not muted so I've done so in ad47ffb

@elasticsearchmachine elasticsearchmachine added the needs:risk Requires assignment of a risk label (low, medium, blocker) label Apr 24, 2024
@javanna javanna added :Search Foundations/Search Catch all for Search Foundations and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 17, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@julio-santana
Copy link
Contributor

@JVerwolf , please provide a risk assessment on this muted test.

@andreidan
Copy link
Contributor

Labeling this as low risk due to it targeting transport telemetry

@andreidan andreidan added low-risk An open issue or test failure that is a low risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels Oct 30, 2024
@JVerwolf
Copy link
Contributor

JVerwolf commented Oct 30, 2024

Agreed, @andreidan. I spent a number of cycles trying to reproduce, but there must be some logic that causes various transport actions to be called differently depending on the random cluster topology (e.g. via an optimization).

IIR, @carlosdelest was adding telemetry that will replace this. When that happens, these can be deleted.

@carlosdelest
Copy link
Member

IIR, @carlosdelest was adding telemetry that will replace this. When that happens, these can be deleted.

@JVerwolf correct, these are removed as part of #113194

@JVerwolf JVerwolf linked a pull request Jan 9, 2025 that will close this issue
@JVerwolf
Copy link
Contributor

JVerwolf commented Jan 9, 2025

Closed by #113194

@JVerwolf JVerwolf closed this as completed Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
low-risk An open issue or test failure that is a low risk to future releases :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.