Add transpose method to dataframe #1176

Closed
tversteeg opened this issue Aug 20, 2021 · 11 comments

@tversteeg

Similar to pandas.
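
For reference, the pandas behaviour being requested looks like this (a minimal illustrative frame, not taken from the issue):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# In pandas, transposition is available both as a method and as the `.T`
# property: rows become columns and columns become rows.
print(df.transpose())
print(df.T)
```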

@ritchie46
Member

import polars as pl

df = pl.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
print(df)

# Rebuilding a DataFrame from the rows flips rows and columns.
print(pl.DataFrame(df.rows()))
shape: (2, 2)
╭──────┬──────╮
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 1    ┆ 3    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2    ┆ 4    │
╰──────┴──────╯
shape: (2, 2)
╭──────────┬──────────╮
│ column_0 ┆ column_1 │
│ ---      ┆ ---      │
│ i64      ┆ i64      │
╞══════════╪══════════╡
│ 1        ┆ 2        │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3        ┆ 4        │
╰──────────┴──────────╯

@tversteeg
Author

tversteeg commented Aug 20, 2021

Thanks! Would it make sense to create an alias called transpose for that function, which also creates the new dataframe from it? If not, it might be a good idea to mention the keyword "transpose" somewhere in the description of rows, so that searching for it turns it up.

@ritchie46
Member

Yes, will add the alias. 👍

@alippai

alippai commented Aug 20, 2021

@ritchie46 I'm wondering whether it would make sense to introduce a lazy transpose, skipping the extra allocation (turning it into an iterator) when there is a subsequent operation. Edit: this might fit ndarray better, e.g. it could optimize DF * DF.transpose().

@ritchie46
Member

Now that I think of it, I will support this natively, as going through Python rows is super expensive.

> @ritchie46 I'm wondering whether it would make sense to introduce a lazy transpose, skipping the extra allocation (turning it into an iterator) when there is a subsequent operation. Edit: this might fit ndarray better, e.g. it could optimize DF * DF.transpose().

@alippai Currently these operations sadly cannot be done lazily. I need to know the schema of every node in the query plan, and operations like pivot and transpose derive their schema from the data (which is unknown at that point).
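
A small illustration of why the transposed schema is data-dependent (hypothetical frames, using the rows-based workaround from above purely for demonstration):

```python
import polars as pl

# Two frames with the same input schema (one i64 column) but different lengths.
a = pl.DataFrame({"x": [1, 2]})
b = pl.DataFrame({"x": [1, 2, 3]})

# The transposed frame has one column per input row, so its schema cannot be
# derived from the input schema alone; a lazy planner would need the data itself.
print(pl.DataFrame({f"column_{i}": list(r) for i, r in enumerate(a.rows())}).columns)
# -> ['column_0', 'column_1']
print(pl.DataFrame({f"column_{i}": list(r) for i, r in enumerate(b.rows())}).columns)
# -> ['column_0', 'column_1', 'column_2']
```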

@alippai

alippai commented Aug 20, 2021

Makes sense, I really appreciate the implementation detail!

@jorgecarleitao
Collaborator

fwiw, spark supports it (lazily), but it is a shotgun shot to the foot, as it performs two queries, one of them to compute the distincts during planning.

@alippai

alippai commented Aug 20, 2021

Just to gauge the complexity of the task: an n x m, single-type (int/float) 2D specialization of the DF type would be needed for this, right?

@ritchie46
Member

> fwiw, spark supports it (lazily), but it is a shotgun shot to the foot, as it performs two queries, one of them to compute the distincts during planning.

Ouch.. that's definitely a shotgun.

In such cases, I'd rather have the user do something like this, and document why they'd want it:

temp = long_query().transpose().collect()

(temp.lazy()
 .select(..)   # continue from here.
 )
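
A slightly fleshed-out, runnable version of that pattern (the query and column names are hypothetical, and `transpose` stands for the eager method this issue asks for; the exact signature of what landed may differ):

```python
import polars as pl

# Hypothetical stand-in for a long lazy query.
long_query = pl.DataFrame({"col1": [1, 2], "col2": [3, 4]}).lazy().filter(pl.col("col1") > 0)

# Materialize, transpose eagerly, then continue lazily from the result.
temp = long_query.collect().transpose()

result = (
    temp.lazy()
    .select(pl.all())  # continue the rest of the query from here
    .collect()
)
print(result)
```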

@ritchie46
Member

> Just to gauge the complexity of the task: an n x m, single-type (int/float) 2D specialization of the DF type would be needed for this, right?

For max performance, I guess something like that. I am planning to use the AnyValue enums (these are enums over all possible dtypes). Then we can turn the DataFrame into rows, infer the schema per row, and read the rows back as columns.
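
At a Python level, the described strategy might look roughly like this sketch (the real implementation is in Rust over the AnyValue enum; the helper name is made up):

```python
import polars as pl

def transpose_sketch(df: pl.DataFrame) -> pl.DataFrame:
    # Rough Python-level sketch of the described strategy:
    # 1. turn the frame into rows of loosely typed values,
    # 2. treat each original row as one output column,
    # 3. let the Series constructor infer the dtype ("schema") per new column.
    rows = df.rows()
    return pl.DataFrame([pl.Series(f"column_{i}", list(row)) for i, row in enumerate(rows)])

df = pl.DataFrame({"col1": [1, 2], "col2": [3, 4]})
print(transpose_sketch(df))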

@ritchie46
Member

Added in 2fa53db
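
For completeness, the eager method added by that commit can be used roughly like this (the `include_header` keyword is assumed from the released API and may differ in older versions):

```python
import polars as pl

df = pl.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Plain transpose: rows become columns named column_0, column_1, ...
print(df.transpose())

# Optionally keep the original column names as a header column.
print(df.transpose(include_header=True))
```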
