Insights: pola-rs/polars
Overview
58 Pull requests merged by 19 people
-
fix: Quadratic allocations when loading nested Parquet column metadata
#21050 merged
Feb 3, 2025 -
fix: Invalidate sortedness flag when sorting from pl.Categorical to pl.Categorical("lexical")
#21044 merged
Feb 3, 2025 -
feat(python): Add CredentialProviderAzure parameter to accept user-instantiated azure credential classes
#21047 merged
Feb 3, 2025 -
feat(python): Expose unity catalog dataclasses and type aliases
#21046 merged
Feb 3, 2025 -
chore: Fix new ruff lints
#21040 merged
Feb 2, 2025 -
fix: Calling `top_k` on list type panics
#21043 merged
Feb 2, 2025 -
fix: Fix rolling on empty DataFrame panicking
#21042 merged
Feb 2, 2025 -
feat: Support max/min method for Time dtype
#19815 merged
Feb 1, 2025 -
docs(python): Add example showing use of `write_delta` with `delta_lake.WriterProperties`
#20746 merged
Feb 1, 2025 -
docs(python): Add missing `shape` param to `Array` docstring
#20747 merged
Feb 1, 2025 -
fix: Fix `set_tbl_width_chars` panicking with negative width
#20906 merged
Feb 1, 2025 -
test(python): Added test to check for the computation of list.len for null
#20938 merged
Feb 1, 2025 -
feat: Implement a streaming merge sorted node
#20960 merged
Feb 1, 2025 -
chore(python): Bump mkdocs-material from 9.5.27 to 9.6.1 in /docs in the documentation group
#21036 merged
Feb 1, 2025 -
fix(python): Ensure `write_excel` recognises the Array dtype and writes it out as a string
#20994 merged
Feb 1, 2025 -
refactor(rust): Use string-based keyboard interrupt panic detection
#21030 merged
Feb 1, 2025 -
docs(python): Add IO plugins to Python API reference
#21028 merged
Jan 31, 2025 -
perf: Remove cast to boolean after comparison in optimizer
#21022 merged
Jan 31, 2025 -
perf: Split last rowgroup among all threads in new-streaming parquet reader
#21027 merged
Jan 31, 2025 -
feat: Automatically use temporary credentials API for scanning Unity catalog tables
#21020 merged
Jan 31, 2025 -
chore: Add make fix for running cargo clippy --fix
#21024 merged
Jan 31, 2025 -
fix: Fix `merge_sorted` producing incorrect results or panicking for some logical types
#21018 merged
Jan 31, 2025 -
fix: Fix all-null list aggregations returning Null dtype
#20992 merged
Jan 31, 2025 -
refactor(rust): Spawn threads on our rayon pool in new-streaming
#21012 merged
Jan 30, 2025 -
perf: Recombine into larger morsels in new-streaming join
#21008 merged
Jan 30, 2025 -
docs: Explicitly call out that the GPU open beta runs on a single GPU
#21000 merged
Jan 30, 2025 -
feat: Add negative slice support to new-streaming engine
#21001 merged
Jan 30, 2025 -
test: Add tests for resolved issues
#20999 merged
Jan 30, 2025 -
ci: Update code coverage workflow to use macos-latest runners
#20995 merged
Jan 30, 2025 -
fix: Ensure scalar-only with_columns are broadcasted on new-streaming
#20983 merged
Jan 30, 2025 -
feat: Allow for more RG skipping by rewriting expr in planner
#20828 merged
Jan 30, 2025 -
refactor(rust): Remove unnecessary unsafe around warning function
#20985 merged
Jan 30, 2025 -
feat: Rename catalog `schema` to `namespace`
#20993 merged
Jan 30, 2025 -
docs: Document IO plugins
#20982 merged
Jan 29, 2025 -
chore: Remove unused arrow file
#20974 merged
Jan 29, 2025 -
perf: Improve `list.min` and `list.max` performance for logical types
#20972 merged
Jan 29, 2025 -
refactor(rust): Remove thiserror dependency
#20979 merged
Jan 29, 2025 -
feat: Add functionality to create and delete catalogs, tables and schemas to Unity catalog client
#20956 merged
Jan 29, 2025 -
feat(python): Allow custom JSONEncoder for the `json_normalize` function, minor speedup
#20966 merged
Jan 29, 2025 -
chore: Deprecate the old streaming engine
#20949 merged
Jan 29, 2025 -
fix: Improve SQL interface behaviour when `INTERVAL` is not a fixed duration
#20958 merged
Jan 29, 2025 -
fix(python): Address minor regression for one-column DataFrame passed to `is_in` expressions
#20948 merged
Jan 29, 2025 -
fix: Add Arrow Float16 conversion DataType
#20970 merged
Jan 29, 2025 -
feat(python): Support passing `aws_profile` in `storage_options`
#20965 merged
Jan 29, 2025 -
fix(rust): Feature-gate `ClosedWindow`
#20963 merged
Jan 29, 2025 -
feat: Improved support for KeyboardInterrupts
#20961 merged
Jan 29, 2025 -
fix: Revert length check of `patterns` in `str.extract_many()`
#20953 merged
Jan 29, 2025 -
chore: Rename over args
#20952 merged
Jan 28, 2025 -
fix: Add maintain order for flaky new-streaming test
#20954 merged
Jan 28, 2025 -
feat(python): Make the available `concat` alignment strategies more generic
#20644 merged
Jan 28, 2025 -
refactor: Extract merge sorted IR node
#20939 merged
Jan 28, 2025 -
feat: Extract timezone info from python datetimes
#20822 merged
Jan 28, 2025 -
chore: Update copyright year
#20764 merged
Jan 28, 2025 -
fix: Allow for respawning of new streaming sinks
#20934 merged
Jan 28, 2025 -
refactor: Move Parquet deserialization to `BitmapBuilder`
#20896 merged
Jan 28, 2025 -
chore: Also publish polars-python
#20933 merged
Jan 28, 2025 -
feat: Add hint for `POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY` to error message
#20942 merged
Jan 28, 2025
15 Pull requests opened by 10 people
-
feat(python): WIP: Add GPU engine to sink_csv
#20940 opened
Jan 28, 2025 -
feat: Add `linear_spaces`
#20941 opened
Jan 28, 2025 -
perf: Speed up list operations that use amortized_iter()
#20964 opened
Jan 28, 2025 -
feat(python): Improve `df.corr`, add "spearman" method and row labels, align with `pl.corr`
#20967 opened
Jan 29, 2025 -
refactor: Switch to new multifile
#20973 opened
Jan 29, 2025 -
fix(python): Ensure `lit` handles datetimes with tzinfo that represents a fixed offset from UTC
#21003 opened
Jan 30, 2025 -
feat(rust): Allow setting custom client options
#21007 opened
Jan 30, 2025 -
perf: Experiments in trying to speed up CI
#21010 opened
Jan 30, 2025 -
feat: Multi/Hive scans in new streaming engine
#21011 opened
Jan 30, 2025 -
fix: Don't silently produce null values from invalid input to `pl.datetime` and `pl.date`
#21013 opened
Jan 30, 2025 -
chore(python): Bump the python group in /py-polars with 2 updates
#21035 opened
Feb 1, 2025 -
build: Bump the rust group with 7 updates
#21037 opened
Feb 1, 2025 -
feat: Hold string cache in new streaming engine
#21039 opened
Feb 1, 2025 -
feat: Implement `merge_sorted` for binary
#21045 opened
Feb 2, 2025 -
fix: Coerce types for join keys after join_where rewrite
#21049 opened
Feb 3, 2025
53 Issues closed by 7 people
-
High memory usage reading Parquet files with many struct fields
#21031 closed
Feb 3, 2025 -
Casting pl.Categorical("physical") to pl.Categorical("lexical") doesn't invalidate sortedness flag
#20864 closed
Feb 3, 2025 -
Allow to explicitly specify Azure `TokenCredential` in `storage_options`
#20635 closed
Feb 3, 2025 -
Panic when calling `top_k` on list-of-lists type
#17225 closed
Feb 2, 2025 -
`group_by` rolling on time with an empty `DataFrame` throws `PanicException`
#21032 closed
Feb 2, 2025 -
Docs on `write_delta` parameter `delta_write_options` to show how to pass `WriterProperties`
#20739 closed
Feb 1, 2025 -
PanicException when printing df with Config `set_tbl_width_chars(-1)`
#18386 closed
Feb 1, 2025 -
computation of list.len for null list seems incorrect
#18522 closed
Feb 1, 2025 -
Polars use nest_asyncio
#17334 closed
Feb 1, 2025 -
Add IO Plugins to API reference
#20996 closed
Jan 31, 2025 -
Bad query plan produced when `all_horizontal` is used with `join_where`
#21009 closed
Jan 31, 2025 -
Incorrect overflowing cast of literals containing Series
#21023 closed
Jan 31, 2025 -
Decimal incorrect result for merge sorted
#20990 closed
Jan 31, 2025 -
Decimal panics for in merge sorted
#20989 closed
Jan 31, 2025 -
Merge sorted throws panic for categorical
#20987 closed
Jan 31, 2025 -
polars List type columns accessors not returning the right dtype
#17361 closed
Jan 31, 2025 -
Result of numerical operations depends on chunking of series
#21016 closed
Jan 30, 2025 -
Add `nulls_last` parameter to `over`
#18419 closed
Jan 30, 2025 -
Extend python Enum interop to DataFrame/Series/Enum construction
#18018 closed
Jan 30, 2025 -
Add `pl.concat_arr`: similar to `pl.concat_list` but returns an array data type
#13846 closed
Jan 30, 2025 -
Support pl.is_in expression in streaming mode
#17514 closed
Jan 30, 2025 -
Add `pl.concat_arr`
#14540 closed
Jan 30, 2025 -
Add `tree_format` for `LogicalPlan`
#11917 closed
Jan 30, 2025 -
Allow passing schema argument to scan_parquet()
#11853 closed
Jan 30, 2025 -
Implement methods like "get", "first", "last" from list namespace in Array namespace
#9718 closed
Jan 30, 2025 -
Add a parameter to join to upcast join columns
#10137 closed
Jan 30, 2025 -
Allow `StringNameSpace::split()` to take an expression as its first argument
#7683 closed
Jan 30, 2025 -
panic calling `collect_schema` on lazy group_by + map_batches
#17327 closed
Jan 30, 2025 -
Support `concat()` with `how="diagonal"` for `Object` type
#14651 closed
Jan 30, 2025 -
Consistently support Python types as aliases for polars types
#13117 closed
Jan 30, 2025 -
groupby for list and struct type columns
#4175 closed
Jan 30, 2025 -
`cast` of float to `pl.Decimal` silently fails but also changes float values
#12775 closed
Jan 30, 2025 -
Rust tests do not run on GitHub runner `macos-latest`
#15917 closed
Jan 30, 2025 -
`with_columns` with reduction differs between `collect` and `collect(new_streaming=True)`
#20930 closed
Jan 30, 2025 -
New-streaming engine `with_columns` with exclusively scalar output collapses length
#20981 closed
Jan 30, 2025 -
Feature request: allow user to use a boto3-native credential provider/session for s3 etc.
#15838 closed
Jan 30, 2025 -
Support lazy schema retrieval in IO Plugins
#18638 closed
Jan 29, 2025 -
Allow other `json` callable (like `orjson.json`) used for loading JSON data
#20950 closed
Jan 29, 2025 -
`is_in` throws a `ComputeError` shape mismatch when comparing with dataframe
#20937 closed
Jan 29, 2025 -
Creating DataFrame from empty arrow table with float16 fails
#20946 closed
Jan 29, 2025 -
`AWS_PROFILE` should be supported in cloud storage I/O config
#18757 closed
Jan 29, 2025 -
Cannot compile `polars-plan` with `range` feature enabled
#20955 closed
Jan 29, 2025 -
str.extract_many throws ComputeError in Polars 1.21.0
#20962 closed
Jan 29, 2025 -
str.contains_any losing matches when adding search queries
#20968 closed
Jan 29, 2025 -
`AWS_ENDPOINT_URL` not inferred by cloud I/O
#18758 closed
Jan 29, 2025 -
Issue reading S3 files
#18907 closed
Jan 29, 2025 -
Support for identity based access to Azure using DefaultAzureCredential
#18931 closed
Jan 29, 2025 -
Allow Overriding Object Store Credential Provider
#18979 closed
Jan 29, 2025 -
Feature Request: Add `how="align_left"` to `pl.concat()` for faster alignment
#20637 closed
Jan 28, 2025 -
Data type List(Datetime) omits the timezone, even if inputs have a set timezone
#19509 closed
Jan 28, 2025 -
Polars does not retain timezone information when reading data from a nested dictionary
#20766 closed
Jan 28, 2025 -
Parquet reading regression from 1.17.0 on
#20298 closed
Jan 28, 2025 -
`scan_csv()` in container fails with disk space error (e.g. AWS lambda, or container)
#17946 closed
Jan 28, 2025
39 Issues opened by 34 people
-
Warning about sortedness doesn't appear in example of `join_asof()`
#21051 opened
Feb 3, 2025 -
bug: `join` may result in duplicate column names
#21048 opened
Feb 3, 2025 -
`iter_batches` support for `read_database_uri` with connectorx
#21041 opened
Feb 1, 2025 -
Rolling mean operation is ignored after diff
#21038 opened
Feb 1, 2025 -
Consider adding Narwhals to Ecosystem page
#21033 opened
Jan 31, 2025 -
`unpivot` after incorrect `unnest` PanicException
#21029 opened
Jan 31, 2025 -
Merge sorted produces wrong result for lexical categorical
#21025 opened
Jan 31, 2025 -
`col` not pickleable
#21021 opened
Jan 31, 2025 -
Have df.head() ignore set_tbl_rows Config
#21019 opened
Jan 31, 2025 -
Automatically parse datetimes with 12-hour time (AM/PM)
#21017 opened
Jan 30, 2025 -
Support `adjust=True` in `ewm_mean_by`
#21015 opened
Jan 30, 2025 -
`list.any/all` block predicate pushdown
#21014 opened
Jan 30, 2025 -
read_csv followed by type conversion panics above a certain number of rows
#21006 opened
Jan 30, 2025 -
OverflowError when initializing DataFrame with large integers using `orient="row"` and Float64
#21005 opened
Jan 30, 2025 -
If weakrefs are ever used with Series's inner Arc, Series._get_inner_mut() could start panicking
#21004 opened
Jan 30, 2025 -
Errors on file close are silently ignored
#21002 opened
Jan 30, 2025 -
Allow native Delta reader to prune partitions before scanning underlying Parquet
#20998 opened
Jan 30, 2025 -
Support using s3 urls of the shape `https://<bucket>.s3.<region>.amazonaws.com/<key>`
#20997 opened
Jan 30, 2025 -
Merge sorted null behavior is undocumented
#20991 opened
Jan 29, 2025 -
Merge sorted not implemented for binary
#20988 opened
Jan 29, 2025 -
Merge sorted not implemented for structs
#20986 opened
Jan 29, 2025 -
DataFrame construction with list vs non-list changes integer dtype
#20984 opened
Jan 29, 2025 -
Python polars write database Time zone issue
#20980 opened
Jan 29, 2025 -
Reading geoparquet files from geopandas
#20978 opened
Jan 29, 2025 -
`pl.datetime` does not raise with argument values outside of the specified domain
#20977 opened
Jan 29, 2025 -
Rust API - Reading a 1 billion file into data frame in parallel
#20976 opened
Jan 29, 2025 -
Example of converting Data frame to json in Rust API
#20975 opened
Jan 29, 2025 -
Allow scan functions to read storage options from UPath/S3Path
#20971 opened
Jan 29, 2025 -
1.21.0 is 5-9 times slower than 1.17.0 on collect on concatenated Azure blob parquet files.
#20959 opened
Jan 28, 2025 -
Improve `df.corr`: remove numpy; add method='spearman'; add labels; make consistent with `pl.corr`
#20957 opened
Jan 28, 2025 -
Varying quantile by group is broken
#20951 opened
Jan 28, 2025 -
Tracking issue for the new streaming engine
#20947 opened
Jan 28, 2025 -
scan_delta & read_delta fails when path contains space
#20944 opened
Jan 28, 2025 -
[Tracking] Flaky tests for new-streaming in CI
#20943 opened
Jan 28, 2025 -
`read_csv` fails schema inference even when `schema_overrides` provides valid dtype
#20936 opened
Jan 27, 2025 -
join_where query normalisation doesn't run type-coercion pass
#20935 opened
Jan 27, 2025 -
Support `lit` Expression with `replace/_strict`
#20932 opened
Jan 27, 2025 -
`group_by` with categoricals fails with `new_streaming=True`
#20931 opened
Jan 27, 2025
50 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
feat: Add `list.pad_start()`
#20674 commented on
Feb 2, 2025 • 7 new comments -
feat(rust,python): Add config to specify GPU polars as the default engine
#20717 commented on
Jan 29, 2025 • 5 new comments -
fix(python): Throw exception if dataframe is too large to be compatible with Excel
#20900 commented on
Jan 29, 2025 • 1 new comment -
Failure when attempting to parse pendulum.DateTime into DataFrames
#20544 commented on
Jan 30, 2025 • 0 new comments -
Support adding prefixes to `DataFrame.unnest`
#9790 commented on
Jan 30, 2025 • 0 new comments -
map_group sometimes can't deduce schema
#20774 commented on
Jan 30, 2025 • 0 new comments -
RecordBatch requires all its arrays to have an equal number of rows when running pipeline in streaming mode
#18599 commented on
Jan 30, 2025 • 0 new comments -
Allocated memory is not being reused
#20708 commented on
Jan 30, 2025 • 0 new comments -
Polars gets progressively slower when doing operations in repeated cycles
#20256 commented on
Jan 30, 2025 • 0 new comments -
Add example to contains
#20581 commented on
Jan 30, 2025 • 0 new comments -
`JoinType::Cross` can be used even when `cross_join` feature is not enabled
#20912 commented on
Jan 31, 2025 • 0 new comments -
Excessive Memory Usage after 1.15.0
#20218 commented on
Jan 31, 2025 • 0 new comments -
Option to automatically set column prefix when using `unnest()`
#19778 commented on
Feb 1, 2025 • 0 new comments -
write_excel writes empty file if >1M rows
#20870 commented on
Feb 2, 2025 • 0 new comments -
Provide a "natural" sort for string columns
#14862 commented on
Feb 2, 2025 • 0 new comments -
Feature request: Faster backward- and forward_fill() functions
#16875 commented on
Feb 3, 2025 • 0 new comments -
`LazyFrame.collect_schema()` cannot resolve the column type after application of a `numpy` 'ufunc'
#17422 commented on
Feb 3, 2025 • 0 new comments -
Support sink_parquet for anonymous scan
#8719 commented on
Feb 3, 2025 • 0 new comments -
`corr` with ignoring null values
#11062 commented on
Feb 3, 2025 • 0 new comments -
docs: Add example of unpivoting multiple sets of columns (#18513)
#18519 commented on
Feb 1, 2025 • 0 new comments -
feat(python): Add a `show` method to DataFrame and LazyFrame
#19634 commented on
Jan 31, 2025 • 0 new comments -
perf: Provide a general fast path for `arg_sort_multiple`.
#20444 commented on
Feb 1, 2025 • 0 new comments -
.list.index_of_in() architectural review PR
#20733 commented on
Jan 28, 2025 • 0 new comments -
feat(python): Add frame-level `all_horizontal` / `any_horizontal`
#20790 commented on
Jan 28, 2025 • 0 new comments -
docs(rust): Add example of list creation in user guide
#20854 commented on
Jan 29, 2025 • 0 new comments -
Join on enums fails to match when sinking to parquet
#20916 commented on
Jan 27, 2025 • 0 new comments -
Parsing M.YYYY dates (without leading zero) raises instead of parsing
#20924 commented on
Jan 27, 2025 • 0 new comments -
polars df.write_database with progress
#20686 commented on
Jan 27, 2025 • 0 new comments -
Improved business day support
#20884 commented on
Jan 27, 2025 • 0 new comments -
Add `linear_spaces` (being to `linear_space` what `int_ranges` is to `int_range`)
#20922 commented on
Jan 27, 2025 • 0 new comments -
Support scan with hive partitioning in GPU engine
#20577 commented on
Jan 27, 2025 • 0 new comments -
Allow inserting or moving columns before or after a specific column
#13233 commented on
Jan 27, 2025 • 0 new comments -
`_arrow_c_stream` does not work with `pl.from_arrow()` but does with `pl.DataFrame`
#20872 commented on
Jan 28, 2025 • 0 new comments -
Allow for schema evolution in scan_parquet/_csv/...
#20926 commented on
Jan 28, 2025 • 0 new comments -
PanicException after simple filter operation with LazyFrame on big dataset
#20894 commented on
Jan 28, 2025 • 0 new comments -
Enable multithread zstd compression in polars-parquet
#15568 commented on
Jan 28, 2025 • 0 new comments -
When should join_where be considered stable?
#20848 commented on
Jan 28, 2025 • 0 new comments -
scan_parquet panics when file is bigger than 2^32 but materialized query isn't. duckdb and pyarrow can do query.
#20777 commented on
Jan 28, 2025 • 0 new comments -
read_csv_batched should return an instance of an iterator
#13885 commented on
Jan 28, 2025 • 0 new comments -
Add an error handling mechanism for collect_all
#20835 commented on
Jan 28, 2025 • 0 new comments -
Cannot pass GCP access token in storage_options (Google Cloud)
#13138 commented on
Jan 29, 2025 • 0 new comments -
Panic on concat_arr method when using Enums
#20917 commented on
Jan 29, 2025 • 0 new comments -
`cs.by_name` fails to return columns in specified order when passed `require_all=False`
#19384 commented on
Jan 29, 2025 • 0 new comments -
removing fsspec in python in favour of object_store in rust
#11056 commented on
Jan 29, 2025 • 0 new comments -
Wrong results with `steaming=True` when concat dataframes.
#20833 commented on
Jan 29, 2025 • 0 new comments -
Allow skipping non-matching schema in json_decode
#19847 commented on
Jan 29, 2025 • 0 new comments -
Inconsistent API of join_asof
#18496 commented on
Jan 29, 2025 • 0 new comments -
DstTzInfo is not a fixed offset timezone
#20898 commented on
Jan 30, 2025 • 0 new comments -
Add polars selectors in filters example in the doc
#20360 commented on
Jan 30, 2025 • 0 new comments -
Create Golang binding
#20269 commented on
Jan 30, 2025 • 0 new comments