feat(python): Overhaul parametric test implementations and update Hypothesis to latest version #16062

stinodego · 2024-05-05T17:16:27Z

The newest Hypothesis version is stricter about randomness - using Python's built-in random module in strategies is not allowed (for good reason), and there is additional detection for tests that do not function properly.

This is great, but it prevented us from upgrading. A thorough revision of the code was needed to adhere to the new requirements. Some minor user-facing changes were necessary, but since this concerns code that is only used in test suites, I think we can go ahead with these changes without a major version increase. More details below.
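
To illustrate what "Hypothesis-controlled randomness" means in practice, here is a minimal sketch using plain Hypothesis, without Polars. The helper name sized_int_lists and its max_len parameter are hypothetical and not part of this PR; the point is only that every random choice goes through draw() rather than Python's random module.

```python
from hypothesis import given, settings
from hypothesis import strategies as st


@st.composite
def sized_int_lists(draw, max_len: int = 5):
    """Hypothetical helper: a list of integers with a randomly chosen length."""
    # Draw the length from Hypothesis instead of calling random.randint(),
    # so Hypothesis can replay and shrink the choice.
    n = draw(st.integers(min_value=0, max_value=max_len))
    return draw(st.lists(st.integers(), min_size=n, max_size=n))


@given(values=sized_int_lists())
@settings(max_examples=25)
def test_length_is_bounded(values):
    # Every generated list respects the default max_len of 5.
    assert len(values) <= 5
```

Calling test_length_is_bounded() directly runs the property over the generated examples.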

Changes

- Randomness is now restricted to Hypothesis-controlled randomness within strategies. This affects the following functions:
  - column no longer selects a dtype upon creation. This now happens within the dataframes strategy. Use of column passed to dataframes is unaltered, but users who used column outside of this intended usage will notice this as a breaking change.
  - columns had built-in functionality to randomly determine a number of columns. This is no longer possible. The function has been deprecated, with the recommendation to build your own columns using column in conjunction with a list comprehension. It continues to function for now by leveraging .example().
  - create_list_strategy needs to determine the exact inner data type to select an appropriate strategy. This is no longer possible in the same way. It has been deprecated. Users can use the lists strategy to do something similar, but they should supply a fully instantiated data type or defaults will be used - this avoids the randomness. The function continues to work for now by leveraging .example().
- A new strategy, dtypes, has been added. It generates a random Polars data type.
- Various improvements to the data generation strategies:
  - Fixed an issue with the decimal strategy.
  - Extended the range of timedeltas. This exposed some bugs in parsing millisecond durations, so the full range for those will be enabled later.
  - Categoricals now have a minimum length of 1. Hypothesis shrinks to empty strings, which leads to reproducible examples that are hard to read.
- Update hypothesis to the latest version.
- Change the default row/column limit from 10/8 to 5/5. This should be enough for parametrized testing in general. It can be overridden by the user if they require more rows/cols.
- Various refactorings / cleanups to make everything work nicely.

codecov · 2024-05-05T17:32:38Z

Codecov Report

Attention: Patch coverage is 81.31579%, with 71 lines in your changes missing coverage. Please review.

Project coverage is 80.99%. Comparing base (0b66308) to head (78349e8).

Files	Patch %	Lines
...lars/polars/testing/parametric/strategies/dtype.py	70.33%	26 Missing and 9 partials ⚠️
...ars/polars/testing/parametric/strategies/legacy.py	56.75%	11 Missing and 5 partials ⚠️
...olars/polars/testing/parametric/strategies/data.py	89.53%	6 Missing and 3 partials ⚠️
...olars/polars/testing/parametric/strategies/core.py	91.91%	4 Missing and 4 partials ⚠️
py-polars/polars/testing/parametric/__init__.py	40.00%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #16062   +/-   ##
=======================================
  Coverage   80.99%   80.99%           
=======================================
  Files        1387     1392    +5     
  Lines      178832   178884   +52     
  Branches     2877     2893   +16     
=======================================
+ Hits       144839   144887   +48     
- Misses      33500    33501    +1     
- Partials      493      496    +3

alexander-beedie · 2024-05-05T18:00:41Z

Nice; happy to see these methods getting some love 😎👍

stinodego · 2024-05-09T14:06:06Z

This got a little out of hand 😅 but I think we have something clean now that works well. @alexander-beedie would you mind taking a look at this if you have the time?

alexander-beedie · 2024-05-10T07:36:01Z

> This got a little out of hand 😅 but I think we have something clean now that works well. @alexander-beedie would you mind taking a look at this if you have the time?

If you can wait until Sunday then I can go over it thoroughly, with pleasure ;))

stinodego · 2024-05-13T04:35:08Z

@alexander-beedie I'm going to go ahead with this one, there's a few more things I want to build on top of this. If you have any comments I'd be happy to address them in a follow-up!

alexander-beedie · 2024-05-13T13:59:45Z

> @alexander-beedie I'm going to go ahead with this one, there's a few more things I want to build on top of this. If you have any comments I'd be happy to address them in a follow-up!

@stinodego: Been working my way through it slowly, heh; looks good to me so far, and I'm really happy to see the foundations being built out and incorporated in more places! Will be poking at some of the updated API design to give it a proper test shortly ✌️

github-actions bot added the enhancement (New feature or an improvement of an existing feature) and python (Related to Python Polars) labels May 5, 2024

stinodego force-pushed the hyp-strats branch 2 times, most recently from 4025ad3 to cef503e Compare May 8, 2024 10:30

stinodego marked this pull request as ready for review May 9, 2024 14:04

stinodego requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli and reswqa as code owners May 9, 2024 14:04

stinodego mentioned this pull request May 9, 2024

Add tests for writing-then-reading randomly-generated dataframes #16121

Open

Bump hypothesis version

da66e00

stinodego force-pushed the hyp-strats branch from af49327 to e775974 Compare May 13, 2024 03:55

stinodego added 2 commits May 13, 2024 06:28

Update implementation

02d4400

Add and update tests

7312551

stinodego force-pushed the hyp-strats branch from e775974 to 6d2e8a8 Compare May 13, 2024 04:28

Update API reference

78349e8

stinodego force-pushed the hyp-strats branch from 6d2e8a8 to 78349e8 Compare May 13, 2024 04:31

stinodego merged commit dbfc6b2 into main May 13, 2024
14 checks passed

stinodego deleted the hyp-strats branch May 13, 2024 04:51

c-peters added the accepted (Ready for implementation) label May 21, 2024

c-peters assigned stinodego May 21, 2024
