ci(python,rust): Add some documentation on the CI workflows (pola-rs#…
stinodego authored Jun 16, 2023
1 parent 3a3cf9c commit 0fd7772
Showing 7 changed files with 81 additions and 52 deletions.
45 changes: 45 additions & 0 deletions .github/workflows/README.md
@@ -0,0 +1,45 @@
# Continuous integration setup

Polars uses GitHub Actions as its continuous integration (CI) tool. The setup is reasonably complex, as far as CI setups go. This document explains some of the design choices.

## Goal

Overall, the CI suite aims to achieve the following:

• Enforce code correctness by running automated tests.
• Enforce code quality by running automated linting checks.
• Enforce code performance by running benchmark tests.
• Enforce that code is properly documented.
• Allow maintainers to easily publish new releases.

We rely on a wide range of tools to achieve this for both the Rust and the Python code base, and thus a lot of checks are triggered on each pull request.

It's entirely possible that you submit a relatively trivial fix that subsequently fails a bunch of checks. Do not despair - check the logs to see what went wrong and try to fix it. You can run the failing command locally to verify that everything works correctly. If you can't figure it out, ask a maintainer for help!

## Design

The CI setup is designed with the following requirements in mind:

• Get feedback on each step individually. We want to avoid our test job being cancelled because a linting check failed, only to find out later that we also have a failing test.
• Get feedback on each check as quickly as possible. We want to be able to iterate quickly if it turns out our code does not pass some of the checks.
• Only run checks when they need to be run. A change to the Rust code does not warrant a linting check of the Python code, for example.

This results in a modular setup with many separate workflows and jobs that rely heavily on caching.
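
As an illustration of the first requirement, here is a minimal sketch of two jobs that do not depend on each other. The workflow name, job names, and commands are hypothetical placeholders and are not taken from the Polars workflows. Jobs without a `needs` key run in parallel, so a failing lint job does not cancel the test job.

```yaml
# Hypothetical example; names and commands are placeholders.
name: Example checks

on:
  pull_request:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: echo "run the linting checks here"

  test:
    # No `needs: lint`, so this job starts regardless of the lint result.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: echo "run the tests here"
```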

### Modular setup

The repository consists of two main parts: the Rust code base and the Python code base. The two code bases are interdependent: the Rust code is tested through the Python tests, and the Python code relies on the Rust implementation for most of its functionality.

To make sure CI jobs are only run when they need to be run, each workflow is triggered only when relevant files are modified.
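
A path filter is one way to express this. The sketch below assumes a hypothetical `lint-python.yml` workflow with a placeholder command; only the `py-polars/**` path pattern is taken from the workflows in this repository.

```yaml
# Hypothetical workflow; the file name and lint command are placeholders.
name: Lint Python

on:
  pull_request:
    paths:
      # Run only when Python sources or this workflow file change.
      - py-polars/**
      - .github/workflows/lint-python.yml

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: echo "run the Python linting checks here"
```

With this filter, a pull request that touches only Rust files does not start the workflow at all.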

### Caching

The main challenge is that the Rust code base for Polars is quite large, and consequently, compiling the project from scratch is slow. This is addressed by caching the Rust build artifacts.

However, GitHub Actions does not allow caches to be shared between feature branches, so we also need to run the workflows on the main branch, or at least the part that builds the Rust cache. As a result, many workflows trigger both on pull requests and on pushes to the main branch, with individual steps enabled or disabled depending on the branch the workflow runs on.

Care must also be taken not to exceed the maximum cache space of 10 GB allotted to open source GitHub repositories. Hence we do not save any caches on feature branches; we always use the cache available from the main branch. This also avoids the extra time that would be required to store the cache.
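
The pattern can be sketched as follows. This is a simplified, hypothetical job rather than one of the actual workflows: the `cargo` commands are placeholders, while the `save-if` condition and the `github.ref_name` checks mirror the ones used in `test-python.yml`.

```yaml
# Simplified sketch; the build and test commands are placeholders.
name: Example cached build

on:
  pull_request:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Cache Rust
        uses: Swatinem/rust-cache@v2
        with:
          # Restore the cache everywhere, but only save it on main.
          save-if: ${{ github.ref_name == 'main' }}

      - name: Build
        run: cargo build

      - name: Run tests
        # On main the job only needs to populate the cache, so skip the tests.
        if: github.ref_name != 'main'
        run: cargo test
```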

## Releases

The release jobs for Rust and Python are triggered when a new release is published. Release Drafter is used to draft these releases automatically. Refer to the [contributing guide](/CONTRIBUTING.md#release-flow) for the full release process.
4 changes: 2 additions & 2 deletions .github/workflows/benchmark.yml
@@ -21,8 +21,8 @@ concurrency:
   cancel-in-progress: true

 env:
-  SCCACHE_GHA_ENABLED: "true"
-  RUSTC_WRAPPER: "sccache"
+  SCCACHE_GHA_ENABLED: 'true'
+  RUSTC_WRAPPER: sccache

 jobs:
   main:
39 changes: 0 additions & 39 deletions .github/workflows/cache-rust.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/release-python.yml
@@ -1,4 +1,4 @@
-name: Create Python release
+name: Release Python

 on:
   push:
14 changes: 14 additions & 0 deletions .github/workflows/release-rust.yml
@@ -0,0 +1,14 @@
name: Release Rust

on:
  push:
    tags:
      - rs-*

# TODO: Implement
jobs:
  release-rust:
    if: false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
25 changes: 17 additions & 8 deletions .github/workflows/test-python.yml
@@ -6,11 +6,21 @@ on:
       - py-polars/**
       - polars/**
       - .github/workflows/test-python.yml
+  push:
+    branches:
+      - main
+    paths:
+      - polars/**
+      - py-polars/**
+      - .github/workflows/test-python.yml

 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true

+env:
+  RUSTFLAGS: -C debuginfo=0 # Do not produce debug symbols to keep memory usage down
+
 defaults:
   run:
     working-directory: py-polars
@@ -45,24 +55,24 @@ jobs:
       - name: Cache Rust
         uses: Swatinem/rust-cache@v2
         with:
-          shared-key: shared-ubuntu-latest
           workspaces: py-polars
-          save-if: false
+          save-if: ${{ github.ref_name == 'main' }}

       - name: Install Polars
-        env:
-          RUSTFLAGS: -C debuginfo=0 # Do not produce debug symbols to keep memory usage down
         run: |
           source activate
           maturin develop

       - name: Run tests and report coverage
+        if: github.ref_name != 'main'
         run: pytest --cov -n auto --dist worksteal -m "not benchmark"

       - name: Run doctests
+        if: github.ref_name != 'main'
         run: python tests/docs/run_doctest.py

       - name: Check import without optional dependencies
+        if: github.ref_name != 'main'
         run: |
           declare -a deps=("pandas"
           "pyarrow"
@@ -104,22 +114,21 @@ jobs:
       - name: Cache Rust
         uses: Swatinem/rust-cache@v2
         with:
-          shared-key: shared-windows-latest
           workspaces: py-polars
-          save-if: false
+          save-if: ${{ github.ref_name == 'main' }}

       - name: Install Polars
         shell: bash
-        env:
-          RUSTFLAGS: -C debuginfo=0 # Do not produce debug symbols to keep memory usage down
         run: |
           maturin build
           pip install target/wheels/polars-*.whl

       - name: Run tests
+        if: github.ref_name != 'main'
         run: pytest -n auto --dist worksteal -m "not benchmark"

       - name: Check import without optional dependencies
+        if: github.ref_name != 'main'
         run: |
           pip uninstall pandas -y
           python -c 'import polars'
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -138,7 +138,7 @@ Please adhere to the following guidelines:
 - In the pull request description, [link](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue) to the issue you were working on.
 - Add any relevant information to the description that you think may help the maintainers review your code.
 - Make sure your branch is [rebased](https://docs.github.com/en/get-started/using-git/about-git-rebase) against the latest version of the `main` branch.
-- Make sure all GitHub Actions checks pass.
+- Make sure all [GitHub Actions checks](/.github/workflows/README.md) pass.

 After you have opened your pull request, a maintainer will review it and possibly leave some comments.
 Once all issues are resolved, the maintainer will merge your pull request, and your work will be part of the next Polars release!
@@ -226,7 +226,7 @@ Start by bumping the version number in the source code:
 Directly after merging your pull request, release the new version:

 8. Go back to the [releases page](https://github.com/pola-rs/polars/releases) and click _Edit_ on the appropriate draft release.
-9. On the draft release page, click _Publish release_. This will create a new release and a new tag, which will trigger the GitHub Actions release workflow ([Python](https://github.com/pola-rs/polars/actions/workflows/create-python-release.yml) / [Rust](https://github.com/pola-rs/polars/actions/workflows/release-rust.yml)).
+9. On the draft release page, click _Publish release_. This will create a new release and a new tag, which will trigger the GitHub Actions release workflow ([Python](https://github.com/pola-rs/polars/actions/workflows/release-python.yml) / [Rust](https://github.com/pola-rs/polars/actions/workflows/release-rust.yml)).
 10. Wait for all release jobs to finish, then check [crates.io](https://crates.io/crates/polars)/[PyPI](https://pypi.org/project/polars/) to verify that the new Polars release is now available.

 ### Troubleshooting