Issue S3 web identity token refresh call with sufficient permissions #119748

Merged: 8 commits merged into elastic:main on Jan 9, 2025

Conversation

pxsalehi
Member

@pxsalehi pxsalehi commented Jan 8, 2025

Closes #119747

@pxsalehi pxsalehi added >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v9.0.0 v8.18.0 labels Jan 8, 2025
@elasticsearchmachine
Collaborator

Hi @pxsalehi, I've created a changelog YAML for you.

@pxsalehi
Member Author

pxsalehi commented Jan 8, 2025

The fix is simple. I'm working on a test. As the issue is pretty clear, we could also merge this and follow up with a test. I'm gonna leave this on draft until I figure out an IT (probably based on RepositoryS3StsCredentialsRestIT).

@pxsalehi pxsalehi requested review from ywangd and arteam January 8, 2025 11:15
@pxsalehi
Member Author

pxsalehi commented Jan 8, 2025

I tried to reproduce this with an IT in 7339307, based on RepositoryS3StsCredentialsRestIT but I can't make it fail. I don't know why. It is pretty clear that the refresh call needs to be in a SocketAccess.doPrivilegedVoid since making the same call outside the onFileChanged would lead to the AccessControlException, and other refresh calls are also done as a privileged operation. AFAICT, the onFileChanged is not coming from a privileged call itself, and it does successfully lead to a call to the fixture!

In any case, I think the fix on its own is valid. The rest test is also not ideal even if it works, since it relies on periodic (60s) intervals to pick up the file change.
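The reasoning above (the refresh triggered from the file-change callback must run as a privileged operation) can be sketched roughly as follows. This is a minimal illustration, not the actual change in this PR: `doPrivilegedVoid` here is a hypothetical stand-in for the plugin's `SocketAccess.doPrivilegedVoid`, and `Credentials`, `refresh()`, and the shape of `onFileChanged` are assumed for the example. No network call is made.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.concurrent.atomic.AtomicInteger;

class PrivilegedRefreshSketch {

    /** Hypothetical stand-in for the plugin's SocketAccess.doPrivilegedVoid helper. */
    @SuppressWarnings({"removal", "deprecation"})
    static void doPrivilegedVoid(Runnable action) {
        // Run the action with the permissions of this code's protection domain,
        // regardless of which (possibly unprivileged) caller is on the stack.
        AccessController.doPrivileged((PrivilegedAction<Void>) () -> {
            action.run();
            return null;
        });
    }

    /** Hypothetical credentials holder; a real refresh() would contact STS. */
    static final class Credentials {
        final AtomicInteger refreshCount = new AtomicInteger();

        void refresh() {
            // Assumption: this is where a socket would be opened in practice.
            refreshCount.incrementAndGet();
        }
    }

    /** Hypothetical file-change callback that wraps the refresh in a privileged block. */
    static void onFileChanged(Credentials credentials) {
        doPrivilegedVoid(credentials::refresh);
    }

    public static void main(String[] args) {
        Credentials credentials = new Credentials();
        onFileChanged(credentials);
        System.out.println("refreshes: " + credentials.refreshCount.get());
    }
}
```

The point of the wrapper is only that the callback does not rely on its caller being privileged; the refresh itself is unchanged.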

@pxsalehi pxsalehi marked this pull request as ready for review January 8, 2025 16:36
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Jan 8, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Contributor

@mhl-b mhl-b left a comment

LGTM

@ywangd
Member

ywangd commented Jan 9, 2025

> I tried to reproduce this with an IT in 7339307, based on RepositoryS3StsCredentialsRestIT but I can't make it fail. I don't know why

I think you may need to test with real S3, or with a fixture that is not running on localhost, to make it fail. The permission check is for "connect,resolve". If a connection is already available in the pool, it may not trigger either. I had a similar experience, see also #108280 (comment).
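To illustrate the pooling point: a "connect,resolve" check is only relevant when a new connection is opened, so a request served from an idle pooled connection never reaches it. The toy simulation below assumes a single-host pool and uses hypothetical names (`Pool`, `Connection`, `permissionChecks`); it opens no real sockets and is not the HTTP client used by the plugin.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PooledConnectionCheckSketch {

    /** Hypothetical stand-in for a connection; no real socket is opened. */
    static final class Connection {
        final String host;

        Connection(String host) {
            this.host = host;
        }
    }

    /** Toy single-host pool that counts simulated "connect,resolve" checks. */
    static final class Pool {
        private final Deque<Connection> idle = new ArrayDeque<>();
        int permissionChecks = 0;

        Connection acquire(String host) {
            Connection pooled = idle.poll();
            if (pooled != null) {
                return pooled; // reuse: nothing is connected, so no check runs
            }
            permissionChecks++; // new connection: where the check would run
            return new Connection(host);
        }

        void release(Connection connection) {
            idle.push(connection);
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        Connection first = pool.acquire("sts.example"); // opens a "connection"
        pool.release(first);
        pool.acquire("sts.example"); // served from the pool
        System.out.println("permission checks: " + pool.permissionChecks);
    }
}
```

In this sketch the second request does not increment the counter, which matches the observation that a test may pass simply because an earlier privileged call already populated the pool.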

Member

@ywangd ywangd left a comment

LGTM

It would be great if you can confirm this does fix the issue with some manual test if automated test is not feasible.

@pxsalehi
Member Author

pxsalehi commented Jan 9, 2025

Thanks, both!

> If a connection is already available in the pool

Yeah, you're right. That is the "why". As I mentioned, if I add an early refresh call, I can see the permission issue.

> It would be great if you can confirm this does fix the issue with some manual test if automated test is not feasible.

A slightly different ordering of the dirty IT I had does reproduce the exact exception and shows that the fix gets rid of it. See here. Considering this is about the permission issue and not the STS stuff per se, I'm gonna leave it at that. An automated test would also not work in this specific setup due to how we check for the file change. I also don't think it is worth the hassle to set up a whole ECK with STS scenario.

@pxsalehi pxsalehi added the auto-backport Automatically create backport pull requests when merged label Jan 9, 2025
@pxsalehi pxsalehi enabled auto-merge (squash) January 9, 2025 10:33
@pxsalehi pxsalehi merged commit d18e329 into elastic:main Jan 9, 2025
16 checks passed
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
8.x

pxsalehi added a commit to pxsalehi/elasticsearch that referenced this pull request Jan 9, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jan 9, 2025
@ywangd
Member

ywangd commented Jan 9, 2025

I suggest we backport this bug fix to 8.16 and 8.17 as well. What do you think?

Labels
auto-backport: Automatically create backport pull requests when merged
>bug
:Distributed Coordination/Snapshot/Restore: Anything directly related to the `_snapshot/*` APIs
Team:Distributed Coordination: Meta label for Distributed Coordination team
v8.16.4
v8.17.2
v8.18.0
v9.0.0
Development

Successfully merging this pull request may close these issues.

Refreshing S3 web identity token fails due to missing socket permission
4 participants