AWS_ENDPOINT_URL not inferred by cloud I/O #18758

Closed
2 tasks done
hutch3232 opened this issue Sep 15, 2024 · 1 comment
Labels: bug, needs triage, python

Comments

@hutch3232
Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import os
import polars as pl

os.environ["AWS_ENDPOINT_URL"] = "https://my-endpoint.com/"

pl.read_parquet("s3://my-bucket/my-parquet/*.parquet")

---------------------------------------------------------------------------
ComputeError                              Traceback (most recent call last)
Cell In[6], line 1
----> 1 pl.read_parquet("s3://my-bucket/my-parquet/*.parquet")

File /opt/conda/lib/python3.9/site-packages/polars/_utils/deprecation.py:91, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     86 @wraps(function)
     87 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     88     _rename_keyword_argument(
     89         old_name, new_name, kwargs, function.__qualname__, version
     90     )
---> 91     return function(*args, **kwargs)

File /opt/conda/lib/python3.9/site-packages/polars/_utils/deprecation.py:91, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     86 @wraps(function)
     87 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     88     _rename_keyword_argument(
     89         old_name, new_name, kwargs, function.__qualname__, version
     90     )
---> 91     return function(*args, **kwargs)

File /opt/conda/lib/python3.9/site-packages/polars/io/parquet/functions.py:209, in read_parquet(source, columns, n_rows, row_index_name, row_index_offset, parallel, use_statistics, hive_partitioning, glob, hive_schema, try_parse_hive_dates, rechunk, low_memory, storage_options, retries, use_pyarrow, pyarrow_options, memory_map)
    206     else:
    207         lf = lf.select(columns)
--> 209 return lf.collect()

File /opt/conda/lib/python3.9/site-packages/polars/lazyframe/frame.py:2032, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, engine, background, _eager, **_kwargs)
   2030 # Only for testing purposes
   2031 callback = _kwargs.get("post_opt_callback", callback)
-> 2032 return wrap_df(ldf.collect(callback))

ComputeError: Generic S3 error: Error performing list request: Error after 1 retries in 10.103150752s, max_retries:2, retry_timeout:10s, source:error sending request for url (https://s3.us-east-1.amazonaws.com/my-bucket?list-type=2&prefix=my-parquet%2F)

However, specifying endpoint_url explicitly in storage_options does work:

pl.read_parquet("s3://my-bucket/my-parquet/*.parquet",
                storage_options={"endpoint_url": "https://my-endpoint.com/"})
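Until the environment variable is picked up automatically, the workaround above can be wrapped in a small helper. This is only a sketch: `s3_storage_options_from_env` is a hypothetical name, not a Polars function, and it assumes that passing `endpoint_url` alone (as in the call above) still lets the remaining settings, such as credentials, be inferred from the environment.

```python
import os


def s3_storage_options_from_env(environ=None):
    """Hypothetical helper: forward AWS_ENDPOINT_URL as an explicit storage option.

    Returns None when the variable is unset or empty, so the caller can pass
    the result straight to `storage_options` and keep the default behaviour.
    """
    env = os.environ if environ is None else environ
    endpoint = env.get("AWS_ENDPOINT_URL")
    if not endpoint:
        return None
    # "endpoint_url" is the key that worked in the explicit call above.
    return {"endpoint_url": endpoint}


# Intended usage (requires polars and a reachable endpoint):
#   import polars as pl
#   pl.read_parquet("s3://my-bucket/my-parquet/*.parquet",
#                   storage_options=s3_storage_options_from_env())
```

The helper takes an optional mapping so it can be exercised without touching the real process environment.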

Log output

No response

Issue description

Related issue: #18420

The docs for polars.read_parquet say:

If storage_options is not provided, Polars will try to infer the information from environment variables.

And it appears that AWS_ENDPOINT_URL should be supported since aws_endpoint_url is listed as a valid key for storage_options here: https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html#variant.Endpoint

AWS docs on configuring the endpoint: https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-endpoints.html#endpoints-global

Expected behavior

polars read_* and write_* functions for cloud URLs should pick up AWS_ENDPOINT_URL if specified.

Installed versions

1.7.1

@nameexhaustion
Collaborator

This should work on the latest release (1.21.0)
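For code that has to run on both older and newer Polars versions, a version check can decide whether endpoint_url still needs to be passed explicitly. The sketch below treats 1.21.0 as the first known-good version, which is an assumption: the comment only says that release should work, not which release introduced the fix. `needs_endpoint_workaround` and `parse_version` are hypothetical helpers, not part of Polars.

```python
import re

# Assumption: the release named in the comment is used as the threshold.
KNOWN_GOOD = (1, 21, 0)


def parse_version(version):
    """Return the first three numeric components, e.g. "1.7.1" -> (1, 7, 1)."""
    nums = [int(part) for part in re.findall(r"\d+", version)[:3]]
    # Pad short versions such as "1.21" with zeros.
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def needs_endpoint_workaround(version):
    """True if the given Polars version is older than the assumed threshold."""
    return parse_version(version) < KNOWN_GOOD


# Intended usage (requires polars):
#   import polars as pl
#   if needs_endpoint_workaround(pl.__version__):
#       ...  # pass storage_options={"endpoint_url": ...} explicitly
```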
