docs: Realign file structure of user guide (pola-rs#14360)
r-brink authored Feb 8, 2024
1 parent 455e7bf commit 41f73f3
Showing 17 changed files with 87 additions and 87 deletions.
File renamed without changes.
20 changes: 10 additions & 10 deletions docs/src/rust/Cargo.toml
@@ -25,19 +25,19 @@ path = "home/example.rs"
required-features = ["polars/lazy"]

[[bin]]
name = "user-guide-basics-expressions"
path = "user-guide/basics/expressions.rs"
name = "user-guide-getting-started-expressions"
path = "user-guide/getting-started/expressions.rs"
required-features = ["polars/lazy"]
[[bin]]
name = "user-guide-basics-joins"
path = "user-guide/basics/joins.rs"
name = "user-guide-getting-started-joins"
path = "user-guide/getting-started/joins.rs"
[[bin]]
name = "user-guide-basics-reading-writing"
path = "user-guide/basics/reading-writing.rs"
name = "user-guide-getting-started-reading-writing"
path = "user-guide/getting-started/reading-writing.rs"
required-features = ["polars/json"]
[[bin]]
name = "user-guide-basics-series-dataframes"
path = "user-guide/basics/series-dataframes.rs"
name = "user-guide-concepts-data-structures"
path = "user-guide/concepts/data-structures.rs"

[[bin]]
name = "user-guide-concepts-contexts"
@@ -81,8 +81,8 @@ name = "user-guide-expressions-lists"
path = "user-guide/expressions/lists.rs"
required-features = ["polars/lazy"]
[[bin]]
name = "user-guide-expressions-null"
path = "user-guide/expressions/null.rs"
name = "user-guide-expressions-missing-data"
path = "user-guide/expressions/missing-data.rs"
required-features = ["polars/lazy"]
[[bin]]
name = "user-guide-expressions-operators"
File renamed without changes.
24 changes: 12 additions & 12 deletions docs/user-guide/concepts/data-structures.md
@@ -7,20 +7,20 @@ The core base data structures provided by Polars are `Series` and `DataFrame`.
Series are a 1-dimensional data structure. Within a series all elements have the same [Data Type](data-types/overview.md) .
The snippet below shows how to create a simple named `Series` object.

{{code_block('user-guide/basics/series-dataframes','series',['Series'])}}
{{code_block('user-guide/concepts/data-structures','series',['Series'])}}

```python exec="on" result="text" session="user-guide/data-structures"
--8<-- "python/user-guide/basics/series-dataframes.py:series"
--8<-- "python/user-guide/concepts/data-structures.py:series"
```

## DataFrame

A `DataFrame` is a 2-dimensional data structure that is backed by a `Series`, and it can be seen as an abstraction of a collection (e.g. list) of `Series`. Operations that can be executed on a `DataFrame` are very similar to what is done in a `SQL` like query. You can `GROUP BY`, `JOIN`, `PIVOT`, but also define custom functions.

{{code_block('user-guide/basics/series-dataframes','dataframe',['DataFrame'])}}
{{code_block('user-guide/concepts/data-structures','dataframe',['DataFrame'])}}

```python exec="on" result="text" session="user-guide/data-structures"
--8<-- "python/user-guide/basics/series-dataframes.py:dataframe"
--8<-- "python/user-guide/concepts/data-structures.py:dataframe"
```

### Viewing data
@@ -31,38 +31,38 @@ This part focuses on viewing data in a `DataFrame`. We will use the `DataFrame`

The `head` function shows by default the first 5 rows of a `DataFrame`. You can specify the number of rows you want to see (e.g. `df.head(10)`).

{{code_block('user-guide/basics/series-dataframes','head',['head'])}}
{{code_block('user-guide/concepts/data-structures','head',['head'])}}

```python exec="on" result="text" session="user-guide/data-structures"
--8<-- "python/user-guide/basics/series-dataframes.py:head"
--8<-- "python/user-guide/concepts/data-structures.py:head"
```

#### Tail

The `tail` function shows the last 5 rows of a `DataFrame`. You can also specify the number of rows you want to see, similar to `head`.

{{code_block('user-guide/basics/series-dataframes','tail',['tail'])}}
{{code_block('user-guide/concepts/data-structures','tail',['tail'])}}

```python exec="on" result="text" session="user-guide/data-structures"
--8<-- "python/user-guide/basics/series-dataframes.py:tail"
--8<-- "python/user-guide/concepts/data-structures.py:tail"
```

#### Sample

If you want to get an impression of the data of your `DataFrame`, you can also use `sample`. With `sample` you get an _n_ number of random rows from the `DataFrame`.

{{code_block('user-guide/basics/series-dataframes','sample',['sample'])}}
{{code_block('user-guide/concepts/data-structures','sample',['sample'])}}

```python exec="on" result="text" session="user-guide/data-structures"
--8<-- "python/user-guide/basics/series-dataframes.py:sample"
--8<-- "python/user-guide/concepts/data-structures.py:sample"
```

#### Describe

`Describe` returns summary statistics of your `DataFrame`. It will provide several quick statistics if possible.

{{code_block('user-guide/basics/series-dataframes','describe',['describe'])}}
{{code_block('user-guide/concepts/data-structures','describe',['describe'])}}

```python exec="on" result="text" session="user-guide/data-structures"
--8<-- "python/user-guide/basics/series-dataframes.py:describe"
--8<-- "python/user-guide/concepts/data-structures.py:describe"
```
2 changes: 1 addition & 1 deletion docs/user-guide/expressions/index.md
@@ -8,7 +8,7 @@ In the `Contexts` sections we outlined what `Expressions` are and how they are i
- [Casting](casting.md)
- [Strings](strings.md)
- [Aggregation](aggregation.md)
- [Null](null.md)
- [Missing data](missing-data.md)
- [Window](window.md)
- [Folds](folds.md)
- [Lists](lists.md)
@@ -10,11 +10,11 @@ Polars also allows `NotaNumber` or `NaN` values for float columns. These `NaN` v

You can manually define a missing value with the python `None` value:

{{code_block('user-guide/expressions/null','dataframe',['DataFrame'])}}
{{code_block('user-guide/expressions/missing-data','dataframe',['DataFrame'])}}

```python exec="on" result="text" session="user-guide/null"
--8<-- "python/user-guide/expressions/null.py:setup"
--8<-- "python/user-guide/expressions/null.py:dataframe"
```python exec="on" result="text" session="user-guide/missing-data"
--8<-- "python/user-guide/expressions/missing-data.py:setup"
--8<-- "python/user-guide/expressions/missing-data.py:dataframe"
```

!!! info
@@ -27,10 +27,10 @@ Each Arrow array used by Polars stores two kinds of metadata related to missing

The first piece of metadata is the `null_count` - this is the number of rows with `null` values in the column:

{{code_block('user-guide/expressions/null','count',['null_count'])}}
{{code_block('user-guide/expressions/missing-data','count',['null_count'])}}

```python exec="on" result="text" session="user-guide/null"
--8<-- "python/user-guide/expressions/null.py:count"
```python exec="on" result="text" session="user-guide/missing-data"
--8<-- "python/user-guide/expressions/missing-data.py:count"
```

The `null_count` method can be called on a `DataFrame`, a column from a `DataFrame` or a `Series`. The `null_count` method is a cheap operation as `null_count` is already calculated for the underlying Arrow array.
@@ -40,10 +40,10 @@ The validity bitmap is memory efficient as it is bit encoded - each value is eit

You can return a `Series` based on the validity bitmap for a column in a `DataFrame` or a `Series` with the `is_null` method:

{{code_block('user-guide/expressions/null','isnull',['is_null'])}}
{{code_block('user-guide/expressions/missing-data','isnull',['is_null'])}}

```python exec="on" result="text" session="user-guide/null"
--8<-- "python/user-guide/expressions/null.py:isnull"
```python exec="on" result="text" session="user-guide/missing-data"
--8<-- "python/user-guide/expressions/missing-data.py:isnull"
```

The `is_null` method is a cheap operation that does not require scanning the full column for `null` values. This is because the validity bitmap already exists and can be returned as a Boolean array.
@@ -59,30 +59,30 @@ Missing data in a `Series` can be filled with the `fill_null` method. You have t

We illustrate each way to fill nulls by defining a simple `DataFrame` with a missing value in `col2`:

{{code_block('user-guide/expressions/null','dataframe2',['DataFrame'])}}
{{code_block('user-guide/expressions/missing-data','dataframe2',['DataFrame'])}}

```python exec="on" result="text" session="user-guide/null"
--8<-- "python/user-guide/expressions/null.py:dataframe2"
```python exec="on" result="text" session="user-guide/missing-data"
--8<-- "python/user-guide/expressions/missing-data.py:dataframe2"
```

### Fill with specified literal value

We can fill the missing data with a specified literal value with `pl.lit`:

{{code_block('user-guide/expressions/null','fill',['fill_null'])}}
{{code_block('user-guide/expressions/missing-data','fill',['fill_null'])}}

```python exec="on" result="text" session="user-guide/null"
--8<-- "python/user-guide/expressions/null.py:fill"
```python exec="on" result="text" session="user-guide/missing-data"
--8<-- "python/user-guide/expressions/missing-data.py:fill"
```

### Fill with a strategy

We can fill the missing data with a strategy such as filling forward:

{{code_block('user-guide/expressions/null','fillstrategy',['fill_null'])}}
{{code_block('user-guide/expressions/missing-data','fillstrategy',['fill_null'])}}

```python exec="on" result="text" session="user-guide/null"
--8<-- "python/user-guide/expressions/null.py:fillstrategy"
```python exec="on" result="text" session="user-guide/missing-data"
--8<-- "python/user-guide/expressions/missing-data.py:fillstrategy"
```

You can find other fill strategies in the API docs.
@@ -92,10 +92,10 @@ You can find other fill strategies in the API docs.
For more flexibility we can fill the missing data with an expression. For example,
to fill nulls with the median value from that column:

{{code_block('user-guide/expressions/null','fillexpr',['fill_null'])}}
{{code_block('user-guide/expressions/missing-data','fillexpr',['fill_null'])}}

```python exec="on" result="text" session="user-guide/null"
--8<-- "python/user-guide/expressions/null.py:fillexpr"
```python exec="on" result="text" session="user-guide/missing-data"
--8<-- "python/user-guide/expressions/missing-data.py:fillexpr"
```

In this case the column is cast from integer to float because the median is a float statistic.
@@ -104,20 +104,20 @@ In this case the column is cast from integer to float because the median is a fl

In addition, we can fill nulls with interpolation (without using the `fill_null` function):

{{code_block('user-guide/expressions/null','fillinterpolate',['interpolate'])}}
{{code_block('user-guide/expressions/missing-data','fillinterpolate',['interpolate'])}}

```python exec="on" result="text" session="user-guide/null"
--8<-- "python/user-guide/expressions/null.py:fillinterpolate"
```python exec="on" result="text" session="user-guide/missing-data"
--8<-- "python/user-guide/expressions/missing-data.py:fillinterpolate"
```

## `NotaNumber` or `NaN` values

Missing data in a `Series` has a `null` value. However, you can use `NotaNumber` or `NaN` values in columns with float datatypes. These `NaN` values can be created from Numpy's `np.nan` or the native python `float('nan')`:

{{code_block('user-guide/expressions/null','nan',['DataFrame'])}}
{{code_block('user-guide/expressions/missing-data','nan',['DataFrame'])}}

```python exec="on" result="text" session="user-guide/null"
--8<-- "python/user-guide/expressions/null.py:nan"
```python exec="on" result="text" session="user-guide/missing-data"
--8<-- "python/user-guide/expressions/missing-data.py:nan"
```

!!! info
@@ -133,8 +133,8 @@ Polars has `is_nan` and `fill_nan` methods which work in a similar way to the `i`

One further difference between `null` and `NaN` values is that taking the `mean` of a column with `null` values excludes the `null` values from the calculation but with `NaN` values taking the mean results in a `NaN`. This behaviour can be avoided by replacing the `NaN` values with `null` values;

{{code_block('user-guide/expressions/null','nanfill',['fill_nan'])}}
{{code_block('user-guide/expressions/missing-data','nanfill',['fill_nan'])}}

```python exec="on" result="text" session="user-guide/null"
--8<-- "python/user-guide/expressions/null.py:nanfill"
```python exec="on" result="text" session="user-guide/missing-data"
--8<-- "python/user-guide/expressions/missing-data.py:nanfill"
```