
Changelog polars (Python bindings)

The Rust crate polars has its own changelog.

Changelog

py-polars-v0.10.0 (2021-10-08)

Full Changelog

Merged pull requests:

py-polars-v0.9.12 (2021-09-27)

Full Changelog

Closed issues:

  • ShapeMisMatch error when vstacking two dataframes with an equal number of columns but a different number of rows #1452
  • split agg_std and agg_mean for floats #1447
  • .std().over('groups') raises a PanicException when one of the columns has dtype Float32. #1446
  • A numeric type is converted to a string type #1444
  • Adding a column without stating df length #1440
  • Sorting by multiple columns doesn't work when one of the columns is Date64 #1437
  • Include a center option in the rolling function (like pandas) #1436
  • Parquet File Size Larger than CSV File Size #1381
  • Dead links in source #1370

py-polars-v0.9.11 (2021-09-24)

Full Changelog

Closed issues:

  • Adding an optional suffix to overlapping columns when joining dataframes #1432
  • collect reverse #1429
  • indexing bool column gives NotImplemented error #1422
  • use sep instead of delimiter in DataFrame.to_csv #1415

py-polars-v0.9.10 (2021-09-22)

Full Changelog

Closed issues:

  • Rolling_mean only working with floats #1411
  • Add numeric bitwise operations on Series/ Exprs #1410
  • Polars date arithmetic is not correct #1404
  • mean, median, mode, ... probably should not work for pl.Categorical #1401
  • Initially empty series causes memcpy_values to index OOB #1396
  • Filter gets mis-optimized and doesn't filter out false values #1395
  • pl.DataFrame.explode() now gives an empty name for the exploded column. #1391
  • m1 wheels generation #1345
  • Arrow dictionaries -> Polars Categorical #1308
  • RuntimeError: Other("Could not determine output type") #1307
  • Add from_pandas flag for converting NaN to None #1164

py-polars-v0.9.9 (2021-09-19)

Full Changelog

Closed issues:

  • add rolling_std #1388
  • Allow fill_null to accept Expr for Series #1383
  • getting AttributeError when using .is_in on polars 0.9.7, but works on polars 0.8.* #1382

py-polars-v0.9.8 (2021-09-18)

Full Changelog

Closed issues:

  • Faster gzip decompression #1359

py-polars-v0.9.7 (2021-09-16)

Full Changelog

Closed issues:

  • Series with large ints lose precision when divided #1369
  • // integer division in python not working #1362
  • Add a method to extract a column as a series #1346

py-polars-v0.9.6 (2021-09-15)

Full Changelog

Closed issues:

  • Allow rounding within .agg() #1336
  • Filtering of DataFrame based on the column with dates in Date32 format is not working #1332
  • Filters sometime fill null string value with "" #1322
  • Allow conversion from/to list of dicts #1300

py-polars-v0.9.5 (2021-09-10)

Full Changelog

Closed issues:

  • read_csv: dtypes date32 and date64 are not inferred correctly anymore #1330
  • add clip function #1326
  • Allow .agg() to work on ungrouped data frames #1324
  • Add .with_column_renamed() eager method #1323
  • add two list array. #1316
  • TypeError: large_list() takes exactly one argument (0 given) #1303
  • parquet dict encoded panic #1281
  • fix explode on lists with empty values #1177

py-polars-v0.9.4 (2021-09-10)

Full Changelog

Closed issues:

  • Add standard error as an aggregation expression (standard deviation / sqrt(number of measurements) ) #1315
  • Make impossible datetime aggregations consistent with primitives #1311
  • Add a strict argument to constructors, cast, etc. to enforce the integer range in Python polars (and elsewhere) instead of silently converting to null. #1293
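Issue #1315 defines the standard error as the standard deviation divided by the square root of the number of measurements. The following is a minimal pure-Python sketch of that formula only, not the polars expression itself; it assumes the sample standard deviation (ddof=1, via `statistics.stdev`), and the helper name `standard_error` is hypothetical.

```python
import math
import statistics
from typing import Sequence


def standard_error(values: Sequence[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n).

    Hypothetical helper for illustration; assumes the sample (ddof=1)
    standard deviation, which needs at least two measurements.
    """
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    return statistics.stdev(values) / math.sqrt(len(values))


print(standard_error([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

Using the population standard deviation (ddof=0) instead would give a slightly smaller value; which convention an aggregation uses is a design choice worth checking in the library docs.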

py-polars-v0.9.3 (2021-09-04)

Full Changelog

Closed issues:

  • data = data[range(N)] isn't very performant #1283

py-polars-v0.9.2 (2021-09-03)

Full Changelog

Closed issues:

  • cannot explode dataframe with single list column #1288
  • Sort dataframe using Date32 column is causing nulls to appear #1277
  • Drop duplicates #1260
  • rolling functions in lazy #1185

py-polars-v0.9.1 (2021-08-31)

Full Changelog

py-polars-v0.9.0 (2021-08-31)

Full Changelog

Closed issues:

  • Aggregation fails when grouped by multiple columns with one column of datatype Int8 or Int16 #1255
  • [Rust] cross_join coalesce left to right #1254
  • rename all null to none #1247
  • expr filter.count() gives wrong result #1242
  • Lazy: hard error on duplicate names #1241
  • horizontal_sum define null behavior #1173
  • the new types with udf in apply function #1165

py-polars-v0.8.29 (2021-08-27)

Full Changelog

Closed issues:

  • pyo3_runtime.PanicException: should already be coerced to u64 when joining two DataFrames #1231
  • Error when performing modulo operation '%' #1230
  • CsvReader ignores the last character if there's no newline at EOF #1229
  • Bad link in doc #1227

py-polars-v0.8.28 (2021-08-26)

Full Changelog

py-polars-v0.8.27 (2021-08-26)

Full Changelog

Closed issues:

  • Remove HSTACK node #1218
  • Groupby Rank #1209
  • Alias dense_rank to argsort_by + 1 #1207

Full Changelog

Closed issues:

  • Predicate pushdown using a filter with null values doesn't correctly trigger the filter #1217
  • Lazy: use a lazy function to determine type of apply/map #1203
  • [python] missing dependency when using Anaconda #1200
  • Add transpose method to dataframe #1176

Full Changelog

Closed issues:

  • Python: add rolling apply #1194
  • Slow concat with large list of dataframes #1183

py-polars-v0.8.26 (2021-08-21)

Full Changelog

py-polars-v0.8.25 (2021-08-20)

Full Changelog

py-polars-v0.8.24 (2021-08-20)

Full Changelog

Closed issues:

  • Expose interpolate on DataFrame in core #1161
  • [Python] df.filter fails with "out of range" error #1157

py-polars-v0.8.23 (2021-08-18)

Full Changelog

Closed issues:

  • shift_and_fill by groups + other operations by grouping variables #1124

py-polars-v0.8.22 (2021-08-17)

Full Changelog

Closed issues:

  • inspect/debug expr #1146

Full Changelog

Closed issues:

  • Filter expression should call eval_on_groups #1148
  • the trait FromIterator<&&str> is not implemented for polars::prelude::Series #1147
  • Shift in aggregation context #1141
  • Get/set categorical levels? #1115
  • Is it possible to create a dataframe from row-like (Vec<Struct>) data? #1111
  • PanicException when filter on sorted columns #1110
  • Pickle support for Polars dataframes #1109
  • Rosetta stone for groupby between pandas and polars? #1083

py-polars-v0.8.21 (2021-08-13)

Full Changelog

Closed issues:

  • csv schema inference scientific float notation #1134
  • python: csv read_csv dtypes accept list #1133

Full Changelog

Closed issues:

  • Extremely slow pivot.count (compared with pivot.sum, pivot.max, pivot.first, ...) #1129
  • Only first gzip stream of gzipped CSV/TSV files with multiple gzip streams is read. #1126
  • implement __copy__, __deepcopy__ #1120
  • pretty print failure output of frame_equal assertions #1112
  • PanicException when converting non-Utf8 column to Categorical. #1107
  • read_csv of a compressed file fails when selecting a subset of columns #1026
  • csv-parser: remove dependency on csv crate. #956

py-polars-v0.8.20 (2021-08-07)

Full Changelog

py-polars-v0.8.19 (2021-08-06)

Full Changelog

Closed issues:

  • numpy boolean vectors get converted as "objects" instead of cast to bool #1105
  • PanicException: assertion failed: i < (self.bits.len() << 3) #1098

Polars 0.8.18

  • feature

    • select columns by regex
    • support >1M columns in IPC reader
    • make DataFrame.sort arguments equal to LazyFrame.sort
    • pl.all() == pl.col("*")
  • bug fix

    • fix bugs due to filtering in aggregations: #1101
    • fix bug in wildcard in functions 3163ee5

Polars 0.8.17

  • feature
    • keep_name expr
    • exclude expr
    • drop_nulls expr
    • explode accepts expression (thus wildcard)
    • groupby: head + tail added
    • df[::2] slicing added

Polars 0.8.16

patch release to fix panic #1077

Polars 0.8.15

  • feature
    • extract jsonpath
    • more object support
  • performance
    • improve list take performance
  • bug fix
    • return an error instead of panicking on an out-of-bounds take
    • update offsets in case of utf8lossy
    • fix bug in pyarrow round trip with list types

Polars 0.8.13 / 0.8.14 (patch)

  • feature
    • concat_str function
    • more object support
    • hash and row_hash function/ expr
    • reinterpret function/ expr
    • Series.mode expr/function
    • csv file decompression
    • read_sql support
  • performance
    • divide and conquer binary expressions

Polars 0.8.12

  • feature
    • cross join added
    • dot-product
  • performance
    • improve csv-parser performance by ~25%
  • bug fix
    • various minor

Polars 0.8.11

  • feature
    • cross join added
    • dot-product
  • performance
    • improve csv-parser performance by ~25%
  • bug fix
    • various minor

Polars 0.8.10

  • feature
    • is_first expr/method
    • asof join added
    • eager io can open multiple sources with fsspec
    • resolve ~ to homedir
    • python arange add step and run eager
  • performance
    • use fast csv-parser for more python memory buffers/streams
  • bug fix
    • Kleene OR and AND operations
    • maybe fix rayon deadlock
    • concat is a pure function
    • string addition lhs broadcast
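The Kleene OR/AND fix above refers to three-valued logic, where a null boolean means "unknown". A minimal pure-Python sketch of those truth tables follows, assuming `None` stands in for a null value; the function names `kleene_or` and `kleene_and` are hypothetical and are not polars APIs.

```python
from typing import Optional


def kleene_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Kleene OR: True dominates null; the result is null only when unknown."""
    if a is True or b is True:
        return True
    if a is False and b is False:
        return False
    return None  # one side is null and neither side is True


def kleene_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Kleene AND: False dominates null; the result is null only when unknown."""
    if a is False or b is False:
        return False
    if a is True and b is True:
        return True
    return None  # one side is null and neither side is False


# Print the full truth table for both operators.
values = [True, False, None]
for x in values:
    for y in values:
        print(x, y, kleene_or(x, y), kleene_and(x, y))
```

The key property is that `null OR True` is True and `null AND False` is False, because the unknown operand cannot change the result in those cases.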

Polars 0.8.9

  • feature
    • correct type hints for python 3.6
    • csv-parser option to ignore comment lines
  • performance
    • improve take on DataFrame
    • remove bound checks in buffer creation
    • improve performance of sorting by multiple columns
    • improve argsort performance
  • bug fix
    • fix backward/forward fill
    • window groupby context
    • fix is_duplicated dispatch

Polars 0.8.8

  • bug fix
    • fix UB due to slice in take kernel
    • fix join for dates

Polars 0.8.7

  • feature
    • from_pandas accept series and date range #875
    • expr: forward_fill, backward_fill #874
    • gzipped file support in csv parser
  • performance
    • reduce memory usage of multi-key groupby
    • improve variance and std-dev aggregation
  • bug fix
    • cast to large-utf8 before collecting chunks #870
    • various

Polars 0.8.6

  • performance
    • improve hashing performance for grouping on two keys for 64 bit and 32 and 64 bit data.
    • improve cache coherence of the take operation on multiple chunks
  • bug fix
    • fix replacing string with None #802

Polars 0.8.5

  • feature
    • improve compatibility with pyarrow csv parser
  • performance
    • improve hashing performance for grouping on two keys for 64 bit and 32 and 64 bit data.
    • improve cache coherence of the take operation on multiple chunks
    • fast path for categorical unique
    • decrease memory fragmentation and usage of csv-parser
  • bug fix
    • split utf8 data only at valid char boundaries #789
    • fix bug in outer join due to new partitioning algorithm

Polars 0.8.4

  • feature
    • Series.round
    • head/ limit aliases
  • performance
    • partitioned hashing

Polars 0.8.0

  • breaking change

    • str namespace: Series.str_* methods moved to Series.str.
    • dt namespace: Series datetime-related methods moved to Series.dt.
  • feature

    • DataFrame.rows method
    • apply on object types
    • Series.dt.to_python_datetime
    • Series.dt.timestamp
  • bug fix

    • preserve date64 in round trip to parquet #723
    • during arrow conversion coerce categorical to utf8 (this preserves string data) #725
    • fix bug in csv skip rows
  • performance

    • improve hashing of string data in groupby and join
    • improve numeric hashing in join
    • fast path for filtering no data and all data (upstream)

polars 0.7.19

  • feature

    • window function by multiple group columns
  • bug fix

    • fix bug in argsort multiple
    • fix bug in filter with nulls (upstream)
  • performance

    • improve numeric hashing in groupby
    • fast paths for filters (upstream)

polars 0.7.18

  • feature
    • argsort multiple columns

polars 0.7.17

  • feature

    • support more indexing
    • scan_csv low memory argument
    • Series.filter accept list of expressions
    • object type:
      • zip
      • take -> join / groupby agg
      • agg first/ last
  • performance

    • change memory usage of csv-parser
    • binary aggregation in parallel
    • determine groupby keys in threadpool

polars 0.7.16

  • feature

    • Series literal may have any length
    • change global string cache behavior
    • Add Expr.arg_sort
    • Make literals typed
  • bug fix

    • Fix Expr.fill_null
    • set offset in null buffers (fixes aggregation with null values)
  • performance

    • sample cardinality in groupby and choose algorithm

polars 0.7.15

  • feature

    • join allows expression syntax
    • use pyarrow as default ipc backend
  • bug fix

    • fix deadlock in window expressions

polars 0.7.13 / 0.7.14 (patch) 2021-05-08

  • bug fix

    • fix bug in cumsum #604
  • feature

    • DataFrame.describe method #606
    • Multi-level sorting of a DataFrame #607
    • Expand functionality of Expr.is_in #614
    • Csv-parser low_memory option #615
    • Allow expressions in pl.arange #611
  • performance

    • sort().reverse() optimization #605

polars 0.7.12

  • bug fix
    • null handling in mean, std, var, and cov aggregations. #595
    • rev-mapping of categorical stored duplicates. #595
    • fix memory surge after csv-parsing #593

polars 0.7.11

  • bug fix

    • Throw error on join from different string cache #584
    • fix covariance of array with null values #585
  • feature

    • Series describe method #569
    • dsl: take, arg_unique, unique
    • allow lazy expressions in Eager API #588
    • describe Series
  • performance

    • fix accidental expensive appends #592
    • remove chunk_id from ChunkedArray #593

polars 0.7.8 / 0.7.9 (patch)

  • bug fix

    • ensure column name persist after pyarrow cast #563
    • make sure that agg_list maintains dtype #567
    • fix panic in physical dispatch of Date dtypes
  • feature

    • Implicitly Cast dtypes to temporal types in csv parser #560
    • Series describe method #569
  • performance

    • Cache and improve window functions performance #570

polars 0.7.7

  • bug fix

    • fix bug with pyarrow chunkedarray: #545
  • feature

    • DataFrame.apply method
    • Make a Series a Literal
    • Make None a Literal
  • performance

    • Update arrow
      • faster iterators
      • faster kernels

polars 0.7.6

  • bug fix

    • fix bug in downsample: #537
  • feature

    • cast categorical in csv parser: #533
    • add many groupby-context aware operations: #534
    • downsample by month: #537
  • performance

    • improve iterator in no null case: #538
    • remove indirection: #536

polars 0.7.5

  • bug fix

    • fix bug in vectorized hashing algorithm that affected groupbys with null values: #523
    • fix bug in downsample: #528
    • change median algorithm: #527
  • feature

    • use lazy groupby API/DSL in eager API: #522
    • make sort groupby-context aware: #522
  • performance

    • improve sort algorithms for sort and argsort: #526

polars 0.7.4

  • performance

    • [python | rust] multi-threaded outer join
    • [python | rust] better performance in groupby on multiple keys (faster hashmap comparisons)
    • [python | rust] better performance in multi column joins
  • bug fix

    • [python] make horizontal aggregations null aware
  • feature

    • [python | rust] Downsample by week
    • [python | rust] join by unlimited columns
    • [python] Create a list Series directly.
    • [python] Create DataFrame from np.ndarray

polars 0.7.3

  • bug fix

    • [python] pandas to polars date64, maintain time information
    • [python] fix bug in Date64 Series.year
    • [python] fix bug Series.mean (did not correct for null values) #484
    • [python | rust] fix bug in rolling windows #484
    • [python | rust] fix bug lazy csv parser #459
  • feature

    • [python | rust] Series methods
      • Series.week
      • Series.weekday
      • Series.arg_min
      • Series.arg_max
      • Series.shape

polars 0.7.2

  • bug fix

    • [python] More pyarrow -> polars conversions.
  • feature

    • [python] DataFrame methods: [shift_and_fill].
    • [python] eager: sum, min, max, mean horizontal aggregation.

polars 0.7.1

  • performance
    • [python | rust] arrow arrays have one less layer of indirection; 10-20% performance improvement

polars 0.7.0

  • name change: Python bindings module renamed from pypolars to polars

  • name change: Python bindings package renamed from py-polars to polars

  • feature

    • [python] lazy: DataFrame methods: [tail, first, last].
    • [python] eager: DataFrame fold for horizontal aggregation.
    • [python] eager: Series methods: [median, quantile, is_in, to_frame]
    • [python] eager: iterate over groupby and yield groups' DataFrames
    • [python] eager: groupby.get_group('value')
    • [python] add parquet compression
    • [python] shift_and_fill expression
    • [python] implicitly download raw files from the web in read_parquet, read_csv.
    • [python | rust] methods for local peak finding in numerical series
    • [python | rust] faster query optimization due to local memory arenas.
    • [rust] reduce default compile time by enabling fewer features by default.
    • [python | rust] Series zip_with implicitly cast to supertype.
    • [python | rust] window functions have a min_periods argument to control when to compute a result
  • bug fix

    • [python] support file buffers for reading and writing csv and parquet
    • [python | rust] fix csv-parser: allow new-line character in a string field
    • [python | rust] don't let predicate pushdown pass shift or sort operations, to maintain correctness.

py-polars 0.6.7

  • performance
    • [python | rust] use mimalloc global allocator
    • [python | rust] undo performance regression on large number of threads
  • bug fix
    • [python | rust] fix accidental over-allocation in csv-parser
    • [python] support agg (dictionary aggregation) for downsample

py-polars 0.6.6

  • performance
    • [python | rust] categorical type groupby keys (use size hint)
    • [python | rust] remove indirection layer in vector hasher
    • [python | rust] improve performance of null array creation
  • bug fix
    • [python] implement set_with_mask for Boolean type
    • [python | rust] don't panic (instead return null) in dataframe aggregation std and var
  • other
    • [rust] internal refactors

py-polars 0.6.5

  • bug fix
    • [python] fix various pyarrow related bugs

py-polars 0.6.4

  • feature
    • [python] render html tables
  • performance
    • [python] default to pyarrow for parquet reading
    • [python | rust] use u32 instead of usize in groupby and join to increase cache coherence and reduce memory pressure.