1.21.0 is 5-9 times slower than 1.17.0 on collect on concatenated Azure blob parquet files. #20959
Open
2 tasks done
Labels: bug, needs triage, python
Checks
Reproducible example
Per this comment on #13381, I have a large collection of parquet blobs, and I can't be certain that they all have the same column order, so I create lazy frames like this, sometimes with hundreds of parquet files concatenated:
Since 1.20.0, operations on these lazy frames are 5-9 times slower than on 1.17. The speed is fine without `concat`, that is, when the columns are consistent and I can avoid using `pl.concat`.
Log output
Issue description
Collect operations on diagonally concatenated parquet blobs from Azure storage are slower than in 1.17.0.
Expected behavior
The speed should be the same as in 1.17.0, before #20610 fixed the scan_parquet filter crash.
Installed versions