Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arrow as its backend.
It currently consists of an eager API similar to pandas and a lazy API that is somewhat similar to Spark. Among other things, Polars offers the functionality listed in the table further down.
To learn more about the inner workings of Polars, read the User Guide (wip).
Polars cannot deploy a new version to crates.io until a new arrow release is issued. Arrow's release cycle takes 3/4 months, which is a lot slower than I'd like to release. If it has been a while since a release was issued, it is recommended to use the current master branch instead of the published version on crates.io.
You can depend on the master branch like this:
polars = {version="0.13.0", git = "https://github.com/ritchie46/polars" }
Or by pinning to a specific revision:
polars = {version="0.13.0", git = "https://github.com/ritchie46/polars", rev = "<optional git tag>" }
Required Rust version >=1.51
Polars is currently transitioning from py-polars to polars. Some docs may still refer to the old name.
Install the latest polars version with:
$ pip3 install polars
Functionality | Eager | Lazy (DataFrame) | Lazy (Series) |
---|---|---|---|
Filters | ✔ | ✔ | ✔ |
Shifts | ✔ | ✔ | ✔ |
Joins | ✔ | ✔ | |
GroupBys + aggregations | ✔ | ✔ | |
Comparisons | ✔ | ✔ | ✔ |
Arithmetic | ✔ | ✔ | |
Sorting | ✔ | ✔ | ✔ |
Reversing | ✔ | ✔ | ✔ |
Closure application (User Defined Functions) | ✔ | ✔ | |
SIMD | ✔ | ✔ | |
Pivots | ✔ | ✗ | |
Melts | ✔ | ✗ | |
Filling nulls + fill strategies | ✔ | ✗ | ✔ |
Aggregations | ✔ | ✔ | ✔ |
Moving Window aggregates | ✔ | ✗ | ✗ |
Find unique values | ✔ | ✗ | |
Rust iterators | ✔ | ✔ | |
IO (csv, json, parquet, Arrow IPC) | ✔ | ✗ | |
Query optimization: (predicate pushdown) | ✗ | ✔ | |
Query optimization: (projection pushdown) | ✗ | ✔ | |
Query optimization: (type coercion) | ✗ | ✔ | |
Query optimization: (simplify expressions) | ✗ | ✔ | |
Query optimization: (aggregate pushdown) | ✗ | ✔ | |
Note that almost all eager operations supported on Series/ChunkedArrays can be used in Lazy via UDFs.
Want to know about all the features Polars supports? Read the docs!
- installation guide:
$ pip3 install polars
- User Guide
- Reference guide
Polars is written to be performant, and it is! But don't take my word for it, take a look at the results in h2oai's db-benchmark.
Additional cargo features:
- temporal (default): Conversions between Chrono and Polars for temporal data
- simd (nightly): SIMD operations
- parquet: Read Apache Parquet format
- json: Json serialization
- ipc: Arrow's IPC format serialization
- random: Generate arrays with randomly sampled values
- ndarray: Convert from DataFrame to ndarray
- lazy: Lazy API
- strings: String utilities for Utf8Chunked
- object: Support for generic ChunkedArrays called ObjectChunked<T> (generic over T). These will be downcastable from Series through the Any trait.
- [plain_fmt | pretty_fmt] (mutually exclusive): one of them should be chosen to fmt DataFrames. pretty_fmt can deal with overflowing cells and looks nicer but has more dependencies. plain_fmt (default) is plain formatting.
Want to contribute? Read our contribution guideline.
- POLARS_PAR_SORT_BOUND -> Sets the lower bound of rows at which Polars will use a parallel sorting algorithm. Default is 1M rows.
- POLARS_FMT_MAX_COLS -> maximum number of columns shown when formatting DataFrames.
- POLARS_FMT_MAX_ROWS -> maximum number of rows shown when formatting DataFrames.
- POLARS_TABLE_WIDTH -> width of the tables used during DataFrame formatting.
- POLARS_MAX_THREADS -> maximum number of threads used in join algorithm. Default is unbounded.
- POLARS_VERBOSE -> print logging info to stderr
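These variables are ordinary process environment variables, so from Python they can be set with the standard library before Polars is used. The sketch below uses a hypothetical helper, `configure_polars_env`, that is not part of Polars. It assumes the numeric variables accept plain integer strings, and it sets them before importing polars as a conservative choice, since this document does not state when each variable is read.

```python
import os


def configure_polars_env(max_threads: int, fmt_max_rows: int) -> dict:
    """Hypothetical helper (not a Polars API): export two of the variables above.

    Assumes both variables take plain integer strings.
    """
    settings = {
        "POLARS_MAX_THREADS": str(max_threads),
        "POLARS_FMT_MAX_ROWS": str(fmt_max_rows),
    }
    # Write the settings into the current process environment.
    os.environ.update(settings)
    return settings


applied = configure_polars_env(max_threads=4, fmt_max_rows=20)

# Import polars only after this point so it sees the values, e.g.:
#   import polars as pl
```

The same effect can be had from the shell by exporting the variables before starting the Python process, which avoids any question about import order.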
If you want a bleeding edge release or maximal performance you should compile py-polars from source.
This can be done by going through the following steps in sequence:
- Install the latest Rust compiler
- Install maturin:
$ pip3 install maturin
- Choose any of:
  - Very long compile times, fastest binary:
    $ cd py-polars && maturin develop --rustc-extra-args="-C target-cpu=native" --release
  - Shorter compile times, fast binary:
    $ cd py-polars && maturin develop --rustc-extra-args="-C codegen-units=16 -C lto=no -C target-cpu=native" --release
Note that the Rust crate implementing the Python bindings is called py-polars to distinguish it from the wrapped Rust crate polars itself. However, both the Python package and the Python module are named polars, so you can pip install polars and import polars (previously, these were called py-polars and pypolars).
Development of Polars is proudly powered by