Workload identity federation doesn't support full aws credential sources. #1408

Open
ksauzz opened this issue May 22, 2024 · 4 comments
Labels
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)

Comments

@ksauzz

ksauzz commented May 22, 2024

InternalAwsSecurityCredentialsSupplier only supports environment variables or the EC2 metadata server as sources of AWS credentials.

In my use case, I can't use workload identity federation from AWS Glue (Spark) to load data into a BigQuery table using spark-bigquery-connector. This Spark environment has no EC2 metadata endpoint, and the Spark driver process's environment variables cannot be updated from a job.

Environment details

AWS Glue 4.0 (spark) + pyspark

Steps to reproduce

  1. Prepare workload identity federation settings
  2. Run an AWS Glue job

External references such as API reference guides

Any additional information below

I think the AWS SDKs, including aws-sdk-java, provide comprehensive ways to get credentials in various AWS environments, so it would be nice to use DefaultCredentialsProvider or something similar instead of a custom implementation in this library. But I guess the Google team would not want to depend on another vendor's library...

DefaultCredentialsProvider's docs

AWS credentials provider chain that looks for credentials in this order:

  1. Java System Properties - aws.accessKeyId and aws.secretKey
  2. Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  3. Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
  4. Credentials delivered through the Amazon EC2 container service if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable
  5. Instance profile credentials delivered through the Amazon EC2 metadata service
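
To make the chain idea above concrete, here is a rough, stdlib-only Java sketch of "try each source in order, take the first that works". `AwsKeys` and `CredentialChain` are hypothetical names made up for illustration; they are not classes from aws-sdk-java or google-auth-library-java, and only sources 1 and 2 from the list are covered.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical holder for an AWS key pair; not a class from any real library. */
final class AwsKeys {
    final String accessKeyId;
    final String secretAccessKey;

    AwsKeys(String accessKeyId, String secretAccessKey) {
        this.accessKeyId = accessKeyId;
        this.secretAccessKey = secretAccessKey;
    }
}

/** Hypothetical chain: asks each source in order and returns the first credentials found. */
final class CredentialChain {
    private final List<Supplier<Optional<AwsKeys>>> sources;

    CredentialChain(List<Supplier<Optional<AwsKeys>>> sources) {
        this.sources = sources;
    }

    AwsKeys resolve() {
        for (Supplier<Optional<AwsKeys>> source : sources) {
            Optional<AwsKeys> keys = source.get();
            if (keys.isPresent()) {
                return keys.get();
            }
        }
        throw new IllegalStateException("No source in the chain produced AWS credentials");
    }

    /** Source 1 in the list above: Java system properties. */
    static Supplier<Optional<AwsKeys>> fromSystemProperties() {
        return () -> pair(System.getProperty("aws.accessKeyId"),
                          System.getProperty("aws.secretKey"));
    }

    /** Source 2 in the list above; the map is passed in so a job can supply its own values. */
    static Supplier<Optional<AwsKeys>> fromEnvironment(Map<String, String> env) {
        return () -> pair(env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY"));
    }

    // A source counts as "present" only when both halves of the key pair are set.
    private static Optional<AwsKeys> pair(String id, String secret) {
        return (id == null || secret == null)
                ? Optional.empty()
                : Optional.of(new AwsKeys(id, secret));
    }
}
```

A caller would build the chain with something like `new CredentialChain(Arrays.asList(CredentialChain.fromSystemProperties(), CredentialChain.fromEnvironment(System.getenv())))` and call `resolve()`. The point is only that adding a new source (profile file, container endpoint, assumed role) means adding one more entry to the list, not rewriting the supplier.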
@lsirac
Contributor

lsirac commented May 23, 2024

Hi @ksauzz, you can supply your own custom AWS credential supplier to the library that handles your use case. See here.

@ksauzz
Author

ksauzz commented May 24, 2024

I think it doesn't work for spark-bigquery-connector because the connector doesn't have a config item to change the supplier. I hope the core auth library can provide this functionality without requiring users to patch anything. Otherwise, GCP users have to patch every Google library that depends on google-auth-library-java.
Thank you.

@lsirac added the "type: feature request" label May 28, 2024
@GrigorievNick

GrigorievNick commented Oct 17, 2024

Hi @lsirac, it's impossible to use a custom awsSecurityCredentialsSupplier. There is no way to set one when the credentials are created from a file through the GoogleCredentials class.

The issue:
While the AwsCredentials class supports a supplier, it is wrapped by ExternalAccountCredentials. When that class creates AWS credentials from a file_type config, it does not allow specifying awsSecurityCredentialsSupplier as a property.

Besides that, you can't write a custom ExternalCredentials, because the list of supported types is hardcoded in the GoogleCredentials class.

Yep, I agree with @ksauzz that we need to fix this part so that different suppliers can be specified.
Otherwise, when you use a library like the BQ Spark connector or hadoop-gcs, you always need to override AccessTokenProvider at the library level, which usually means copying 90% of google-auth-java-lib for AWS but with a different AWS credentials provider,
because by default the Google lib only knows how to take credentials from environment variables and the EC2 metadata server.

P.S.
Also, I am ready to contribute changes to ExternalAccountCredentials that let users choose between AwsCredentialSource and awsSecurityCredentialsSupplier.
But the only way to do it is to create the awsSecurityCredentialsSupplier via reflection,
and this brings new restrictions to the API: restrictions on the constructor arguments.
There are two possible restrictions.

  • The constructor always takes no arguments; in this case, we always create the awsSecurityCredentialsSupplier class with the empty constructor. We then need an additional function that takes credentialSourceMap if we want to pass extra arguments to our awsSecurityCredentialsSupplier implementation.
  • The constructor always takes credentialSourceMap as its argument; in this case, we always create the awsSecurityCredentialsSupplier class with that fixed argument.

Theoretically, we can support both cases, but to me this makes the API even less clean, though more flexible.
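
For discussion, here is a rough sketch of what supporting both constructor conventions via reflection could look like. Everything in it is hypothetical: `KeySupplier` is a one-method stand-in for the real supplier interface, `Configurable`, `SupplierLoader`, and the two example classes are invented names, and `role_arn` is an assumed key in the credential source map, not something the library defines today.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

/** Hypothetical stand-in for the library's supplier interface, reduced to one method. */
interface KeySupplier {
    String accessKeyId();
}

/** Hypothetical hook for the empty-constructor option: receives the map after construction. */
interface Configurable {
    void configure(Map<String, Object> credentialSourceMap);
}

final class SupplierLoader {
    /** Loads className, preferring a Map constructor and falling back to a no-arg one. */
    static KeySupplier load(String className, Map<String, Object> credentialSourceMap)
            throws ReflectiveOperationException {
        Class<? extends KeySupplier> type =
                Class.forName(className).asSubclass(KeySupplier.class);
        try {
            // Second option: the constructor takes credentialSourceMap.
            Constructor<? extends KeySupplier> withMap = type.getDeclaredConstructor(Map.class);
            withMap.setAccessible(true);
            return withMap.newInstance(credentialSourceMap);
        } catch (NoSuchMethodException e) {
            // First option: empty constructor, then an optional configure(...) call.
            Constructor<? extends KeySupplier> noArgs = type.getDeclaredConstructor();
            noArgs.setAccessible(true);
            KeySupplier supplier = noArgs.newInstance();
            if (supplier instanceof Configurable) {
                ((Configurable) supplier).configure(credentialSourceMap);
            }
            return supplier;
        }
    }
}

/** Example of the second option: reads the assumed "role_arn" entry in its constructor. */
final class MapSupplier implements KeySupplier {
    private final String roleArn;

    MapSupplier(Map<String, Object> credentialSourceMap) {
        this.roleArn = String.valueOf(credentialSourceMap.get("role_arn"));
    }

    public String accessKeyId() {
        return "key-for-" + roleArn; // placeholder value, no real AWS call is made
    }
}

/** Example of the first option: empty constructor plus the configure hook. */
final class EmptySupplier implements KeySupplier, Configurable {
    private String roleArn = "unset";

    EmptySupplier() {}

    public void configure(Map<String, Object> credentialSourceMap) {
        this.roleArn = String.valueOf(credentialSourceMap.get("role_arn"));
    }

    public String accessKeyId() {
        return "key-for-" + roleArn; // placeholder value, no real AWS call is made
    }
}
```

With this shape, a config file would only need to name the supplier class, and the loader would pick whichever constructor the class declares. The trade-off mentioned above is visible here: the fallback branch is extra surface area that has to be documented and tested.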

P.P.S.
Why are extra arguments required?
-> There can be arguments for the different authorization mechanisms that AWS supports.
For example, if you want to use the AWS Assume Role feature, awsSecurityCredentialsSupplier must take the ARN of the AWS role to assume.

@GrigorievNick

By the way, in our case the environment is AWS EMR-S, and we use it to populate data in BQ and GCS.
