Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arrow as its backend.
It currently consists of an eager API similar to pandas and a lazy API that is somewhat similar to Spark. Among other things, Polars supports the following functionality:
Functionality | Eager | Lazy |
---|---|---|
Filters | ✔ | ✔ |
Shifts | ✔ | ✔ |
Joins | ✔ | ✔ |
GroupBys + aggregations | ✔ | ✔ |
Comparisons | ✔ | ✔ |
Arithmetic | ✔ | ✔ |
Sorting | ✔ | ✔ |
Reversing | ✔ | ✔ |
Closure application (User Defined Functions) | ✔ | ✔ |
SIMD | ✔ | ✔ |
Pivots | ✔ | ✗ |
Melts | ✔ | ✗ |
Filling nulls + fill strategies | ✔ | ✗ |
Aggregations | ✔ | ✗ |
Find unique values | ✔ | ✗ |
Rust iterators | ✔ | ✗ |
IO (csv, json, parquet, Arrow IPC) | ✔ | ✗ |
Query optimization: (predicate pushdown) | ✗ | ✔ |
Query optimization: (projection pushdown) | ✗ | ✔ |
Query optimization: (type coercion) | ✗ | ✔ |
Note that almost all eager operations supported on Series/ChunkedArray can also be used in the lazy API via UDFs (user-defined functions).
Want to know about all the features Polars supports? Check the current master docs.
Most features are described on the DataFrame, Series, and ChunkedArray structs, in that order. For ChunkedArray, a lot of functionality is also defined by traits in the ops module.
Polars is written to be performant. Below are some comparisons with the (also very fast) pandas DataFrame library.
Take a look at the 10 minutes to Polars notebook to get started.
Want to run the notebook yourself? Clone the repo and run `$ cargo c && docker-compose up`. This will spin up a Jupyter notebook on http://localhost:8891. The notebooks are in the `/examples` directory.
Oh yeah.. and get a cup of coffee, because compilation will take a while on the first run.
A subset of the Polars functionality is also exposed through Python bindings. You can install them with:
$ pip install py-polars
Next, you can check out the 10 minutes to py-polars notebook or take a look at the reference.
Additional cargo features:

* `pretty` (default) - pretty printing of DataFrames
* `temporal` (default) - conversions between Chrono and Polars for temporal data
* `simd` (default) - SIMD operations
* `parquet` - read Apache Parquet format
* `random` - generate arrays with randomly sampled values
* `ndarray` - convert from DataFrame to ndarray
* `lazy` - lazy API
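For example, a downstream crate that only needs the lazy API and Parquet reading could opt out of the defaults and enable just those features (a sketch; feature names follow the list above, and the version is left unpinned):

```toml
[dependencies]
# Disable default features and pick only what is needed.
polars = { version = "*", default-features = false, features = ["lazy", "parquet"] }
```

Trimming features like this keeps compile times down, which matters given the note above about the first build taking a while.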
Want to contribute? Read our contribution guideline.