Polars is a blazingly fast DataFrame library implemented in Rust, with Apache Arrow as its memory model. It currently consists of an eager API similar to pandas and a lazy API somewhat similar to Spark. Among other things, Polars offers the functionality listed below.

To learn more about the inner workings of Polars, read the WIP book.
Polars is currently transitioning from `py-polars` to `polars`. Some docs may still refer to the old name.

We're working towards a new 0.7.0 release. In the meantime, install a pre-release version; this will likely be more stable than 0.6.7.
Install the latest pre-release version with:
```shell
$ pip install polars==0.7.0-beta.1
```
Functionality | Eager | Lazy (DataFrame) | Lazy (Series) |
---|---|---|---|
Filters | ✔ | ✔ | ✔ |
Shifts | ✔ | ✔ | ✔ |
Joins | ✔ | ✔ | |
GroupBys + aggregations | ✔ | ✔ | |
Comparisons | ✔ | ✔ | ✔ |
Arithmetic | ✔ | ✔ | |
Sorting | ✔ | ✔ | ✔ |
Reversing | ✔ | ✔ | ✔ |
Closure application (User Defined Functions) | ✔ | ✔ | |
SIMD | ✔ | ✔ | |
Pivots | ✔ | ✗ | |
Melts | ✔ | ✗ | |
Filling nulls + fill strategies | ✔ | ✗ | ✔ |
Aggregations | ✔ | ✔ | ✔ |
Moving Window aggregates | ✔ | ✗ | ✗ |
Find unique values | ✔ | ✗ | |
Rust iterators | ✔ | ✔ | |
IO (csv, json, parquet, Arrow IPC) | ✔ | ✗ | |
Query optimization: (predicate pushdown) | ✗ | ✔ | |
Query optimization: (projection pushdown) | ✗ | ✔ | |
Query optimization: (type coercion) | ✗ | ✔ | |
Query optimization: (simplify expressions) | ✗ | ✔ | |
Query optimization: (aggregate pushdown) | ✗ | ✔ | |
Note that almost all operations supported by Eager on `Series`/`ChunkedArray`s can be used in Lazy via UDFs.
Want to know about all the features Polars supports? Read the docs!

- Installation guide: `pip install polars`
- The book
- Reference guide
Polars is written to be performant, and it is! But don't take my word for it; take a look at the results in h2oai's db-benchmark.
Additional cargo features:

- `temporal` (default) - conversions between Chrono and Polars for temporal data
- `simd` (nightly) - SIMD operations
- `parquet` - read Apache Parquet format
- `json` - JSON serialization
- `ipc` - Arrow's IPC format serialization
- `random` - generate arrays with randomly sampled values
- `ndarray` - convert from `DataFrame` to `ndarray`
- `lazy` - lazy API
- `strings` - string utilities for `Utf8Chunked`
- `object` - support for generic ChunkedArrays called `ObjectChunked<T>` (generic over `T`). These will be downcastable from Series through the Any trait.
- `parallel` - ChunkedArrays can be used by `rayon::par_iter()`
- `plain_fmt` | `pretty_fmt` (mutually exclusive) - one of them should be chosen to fmt DataFrames. `pretty_fmt` can deal with overflowing cells and looks nicer, but has more dependencies. `plain_fmt` (default) is plain formatting.
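For Rust users, these features are selected through Cargo. A hypothetical `Cargo.toml` dependency entry might look like the following; the version number is illustrative, and the feature names come from the list above:

```toml
[dependencies]
# Version is illustrative; default features (e.g. temporal, plain_fmt) stay enabled.
polars = { version = "0.7", features = ["lazy", "parquet", "random"] }
```

Note that since `plain_fmt` and `pretty_fmt` are mutually exclusive, switching to `pretty_fmt` would also require disabling the default formatting feature.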
Want to contribute? Read our contribution guideline.
Polars can be configured with the following environment variables:

- `POLARS_PAR_SORT_BOUND` -> sets the lower bound of rows at which Polars will use a parallel sorting algorithm. Default is 1M rows.
- `POLARS_FMT_MAX_COLS` -> maximum number of columns shown when formatting DataFrames.
- `POLARS_FMT_MAX_ROWS` -> maximum number of rows shown when formatting DataFrames.
- `POLARS_TABLE_WIDTH` -> width of the tables used during DataFrame formatting.
- `POLARS_MAX_THREADS` -> maximum number of threads used in the join algorithm. Default is unbounded.
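Depending on the version, some of these variables may be read when Polars is first imported, so setting them beforehand is safest. A small Python sketch (the limit values are arbitrary examples):

```python
import os

# Set formatting limits before Polars is imported (values are arbitrary examples).
os.environ["POLARS_FMT_MAX_ROWS"] = "20"
os.environ["POLARS_FMT_MAX_COLS"] = "10"

# import polars as pl  # imported afterwards so it picks up the settings
```

The same effect can of course be achieved by exporting the variables in the shell before launching the process.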
If you want a bleeding-edge release or maximal performance, you should compile py-polars from source. This can be done by going through the following steps in sequence:

- install the latest Rust compiler
- `$ pip3 install maturin`
- `$ cd py-polars && maturin develop --release`
Note that the Rust crate implementing the Python bindings is called `py-polars` to distinguish it from the wrapped Rust crate `polars` itself. However, both the Python package and the Python module are named `polars`, so you can `pip install polars` and `import polars` (previously, these were called `py-polars` and `pypolars`).
Development of Polars is proudly powered by