Polars

Blazingly fast DataFrames in Rust

Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arrow as its backend.

It currently consists of an eager API similar to pandas and a lazy API that is somewhat similar to Spark. Among other things, Polars supports the following functionality.

Functionality Eager Lazy
Filters
Shifts
Joins
GroupBys + aggregations
Comparisons
Arithmetic
Sorting
Reversing
Closure application (User Defined Functions)
SIMD
Pivots
Melts
Filling nulls + fill strategies
Aggregations
Find unique values
Rust iterators
IO (csv, json, parquet, Arrow IPC)
Query optimization: (predicate pushdown)
Query optimization: (projection pushdown)
Query optimization: (type coercion)

Note that almost all eager operations supported on Series/ChunkedArrays can be used in the lazy API via UDFs.
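To make the "predicate pushdown" entry above more concrete, here is a toy sketch of the idea in plain Rust. It does not use the Polars API: the Plan enum, push_down_predicates and execute are hypothetical names made up for this illustration. The assumption is only that a lazy API records operations as a plan first, so an optimizer can move a filter below a more expensive step before anything runs.

```rust
// Toy model of a lazy query plan. This is NOT the Polars API; every type and
// function here is hypothetical and exists only to illustrate the idea.

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Read all rows from an in-memory "table" of (id, value) pairs.
    Scan(Vec<(u32, i64)>),
    /// Keep only rows whose value is >= `min`.
    Filter { input: Box<Plan>, min: i64 },
    /// Sort rows by value (stands in for any expensive operation).
    Sort { input: Box<Plan> },
}

/// Rewrite `Filter(Sort(x))` into `Sort(Filter(x))`, so the sort sees fewer rows.
fn push_down_predicates(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { input, min } => match push_down_predicates(*input) {
            // The filter sits directly above a sort: swap them and keep pushing.
            Plan::Sort { input: inner } => Plan::Sort {
                input: Box::new(push_down_predicates(Plan::Filter { input: inner, min })),
            },
            // Nothing to swap with: leave the filter where it is.
            other => Plan::Filter { input: Box::new(other), min },
        },
        Plan::Sort { input } => Plan::Sort {
            input: Box::new(push_down_predicates(*input)),
        },
        Plan::Scan(rows) => Plan::Scan(rows),
    }
}

/// Evaluate a plan eagerly, node by node, from the scan upwards.
fn execute(plan: &Plan) -> Vec<(u32, i64)> {
    match plan {
        Plan::Scan(rows) => rows.clone(),
        Plan::Filter { input, min } => execute(input)
            .into_iter()
            .filter(|(_, v)| v >= min)
            .collect(),
        Plan::Sort { input } => {
            let mut rows = execute(input);
            rows.sort_by_key(|(_, v)| *v);
            rows
        }
    }
}
```

Building a plan such as Filter over Sort over Scan and passing it through push_down_predicates yields Sort over Filter over Scan. Both plans return the same rows from execute, but the optimized one sorts only the rows that survive the filter.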

Documentation

Want to know about all the features Polars supports? Check the current master docs.

Most features are described on the DataFrame, Series, and ChunkedArray structs in that order. For ChunkedArray a lot of functionality is also defined by Traits in the ops module. Other useful parts of the documentation are:

Performance

Polars is written to be performant. Below are some comparisons with the (also very fast) pandas DataFrame library.

GroupBy

Joins

First run in Rust

Take a look at the 10 minutes to Polars notebook to get started. Want to run the notebook yourself? Clone the repo and run $ cargo c && docker-compose up. This will spin up a Jupyter notebook at http://localhost:8891. The notebooks are in the /examples directory.

Oh yeah... and get a cup of coffee, because compilation will take a while on the first run.

First run in Python

A subset of the Polars functionality is also exposed through Python bindings. You can install them with:

$ pip install py-polars

Next you can check the 10 minutes to py-polars notebook or take a look at the reference.

Features

Additional cargo features:

  • pretty (default)
    • pretty printing of DataFrames
  • temporal (default)
    • Conversions between Chrono and Polars for temporal data
  • simd (default)
    • SIMD operations
  • parquet
    • Read Apache Parquet format
  • random
    • Generate arrays with randomly sampled values
  • ndarray
    • Convert from DataFrame to ndarray
  • lazy
    • Lazy API
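
Optional features like these are switched on in the dependency entry of your Cargo.toml. A minimal sketch, assuming the crate is pulled in under the name polars; the version requirement is a placeholder rather than a recommendation, and the feature names are the ones listed above:

```toml
[dependencies]
# "*" is a placeholder; pin this to the release you actually build against.
# The default features (pretty, temporal, simd) stay enabled unless you also
# set `default-features = false`.
polars = { version = "*", features = ["parquet", "lazy"] }
```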

Contribution

Want to contribute? Read our contribution guideline.
