Commit

docs: Clarify arrow usage (pola-rs#16152)
ritchie46 authored May 10, 2024
1 parent b38aa4b commit d341156
Showing 3 changed files with 4 additions and 4 deletions.
2 changes: 1 addition & 1 deletion docs/user-guide/ecosystem.md
@@ -2,7 +2,7 @@

## Introduction

- On this page you can find a non-exhaustive list of libraries and tools that support Polars. As the data ecosystem is evolving fast, more libraries will likely support Polars in the future. One of the main drivers is that Polars makes use of `Apache Arrow` in it's backend.
+ On this page you can find a non-exhaustive list of libraries and tools that support Polars. As the data ecosystem is evolving fast, more libraries will likely support Polars in the future. One of the main drivers is that Polars makes adheres its memory layout to the `Apache Arrow` spec.

### Table of contents:

2 changes: 1 addition & 1 deletion docs/user-guide/expressions/missing-data.md
@@ -4,7 +4,7 @@ This page sets out how missing data is represented in Polars and how missing dat

## `null` and `NaN` values

- Each column in a `DataFrame` (or equivalently a `Series`) is an Arrow array or a collection of Arrow arrays [based on the Apache Arrow format](https://arrow.apache.org/docs/format/Columnar.html#null-count). Missing data is represented in Arrow and Polars with a `null` value. This `null` missing value applies for all data types including numerical values.
+ Each column in a `DataFrame` (or equivalently a `Series`) is an Arrow array or a collection of Arrow arrays [based on the Apache Arrow spec](https://arrow.apache.org/docs/format/Columnar.html#null-count). Missing data is represented in Arrow and Polars with a `null` value. This `null` missing value applies for all data types including numerical values.

Polars also allows `NotaNumber` or `NaN` values for float columns. These `NaN` values are considered to be a type of floating point data rather than missing data. We discuss `NaN` values separately below.
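
Reviewer note: the `null` versus `NaN` distinction described in this hunk follows from how the Arrow columnar spec lays out an array. Validity is tracked in a separate bitmap, while `NaN` is an ordinary float stored in the values buffer. The snippet below is a toy model in plain Python, not Polars' or Arrow's actual implementation. The helper names `build_float64_array`, `is_valid` and `get` are hypothetical and exist only for illustration.

```python
import math
import struct


def build_float64_array(values):
    """Pack floats/None into (values_buffer, validity_bitmap, null_count).

    Toy model of an Arrow-style float64 array; hypothetical helper.
    """
    n = len(values)
    validity = bytearray((n + 7) // 8)  # one bit per slot, rounded up to whole bytes
    data = bytearray()
    null_count = 0
    for i, v in enumerate(values):
        if v is None:
            # A null slot still occupies 8 bytes; its content is not meaningful.
            null_count += 1
            data += struct.pack("<d", 0.0)
        else:
            # Set the validity bit (least-significant-bit numbering: slot i
            # lives in byte i // 8, bit i % 8).
            validity[i // 8] |= 1 << (i % 8)
            data += struct.pack("<d", v)
    return bytes(data), bytes(validity), null_count


def is_valid(validity, i):
    """Return True if slot i is non-null according to the bitmap."""
    return bool(validity[i // 8] & (1 << (i % 8)))


def get(data, validity, i):
    """Read slot i, returning None for a null slot."""
    if not is_valid(validity, i):
        return None
    return struct.unpack_from("<d", data, i * 8)[0]


data, validity, null_count = build_float64_array([1.0, None, float("nan"), 4.0])
```

In this model the `NaN` at index 2 has its validity bit set and does not contribute to `null_count`, while the `None` at index 1 has a cleared bit. That matches the doc's statement that `NaN` is floating point data rather than missing data.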

4 changes: 2 additions & 2 deletions docs/user-guide/migration/pandas.md
@@ -23,9 +23,9 @@ more explicit, more readable and less error-prone.

Note that an 'index' data structure as known in databases will be used by Polars as an optimization technique.

- ### Polars uses Apache Arrow arrays to represent data in memory while pandas uses NumPy arrays
+ ### Polars adheres to the Apache Arrow memory format to represent data in memory while pandas uses NumPy arrays

- Polars represents data in memory with Arrow arrays while pandas represents data in
+ Polars represents data in memory according to the Arrow memory spec while pandas represents data in
memory with NumPy arrays. Apache Arrow is an emerging standard for in-memory columnar
analytics that can accelerate data load times, reduce memory usage and accelerate
calculations.
