commodity | location | seller | price
-- | -- | -- | --
Wheat | Chicago | Paul | 0.55
Wheat | StPaul | Charlie | 0.70
Wheat | StPaul | Susan | 0.80
Corn | Chicago | Charlie | 1.80
Corn | Chicago | Ed | 2.00
When I try to write the same in polars, I get something different:
pdf = pl.DataFrame(commodity_prices)
pdf.groupby(['location', 'commodity']).agg([
    pl.col('*').sort_by('price').head(2)])
location | commodity | commodity | location | seller | price
-- | -- | -- | -- | -- | --
str | str | list | list | list | list
"Chicago" | "Corn" | [Corn, Corn] | [Chicago, Chicago] | [Charlie, Ed] | [1.8, 2]
"Chicago" | "Wheat" | [Wheat] | [Chicago] | [Paul] | [0.55]
"StPaul" | "Wheat" | [Wheat, Wheat] | [StPaul, StPaul] | [Charlie, Susan] | [0.7, 0.8]
polars seems to be calculating the data I want, but giving it back in a way that's not quite what I want. Other than post-processing the output in Python, is there a better way to get pandas-style output for this query?
Hi, that's an interesting one. A snippet that produces the same output is this:
import polars as pl

df = pl.DataFrame(commodity_prices)
(df.sort(by="price")
    .groupby(["commodity", "location"])
    .agg([
        pl.col("seller").head(2).list().alias("seller"),  # take the first two and aggregate to a list
        pl.col("price").head(2).list().alias("price"),
    ])
    .explode(["price", "seller"])  # explode the lists to long format
    .sort(by="price")  # not really needed, but makes output predictable
)
However, I understand that a head(n) aggregation would be far more ergonomic; I will see if I can add that.
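For anyone following along without either library installed, here is a plain-Python sketch of the same sort, group, head(2), explode sequence, just to make the intended long-format result concrete. It is not polars or pandas code. The sample rows are an assumption (the original `commodity_prices` is not shown in this thread, and the sellers "Mary" and "Bob" are made up to give two groups a third, more expensive row), and `cheapest_per_group` is a hypothetical helper name.

```python
from itertools import groupby, islice
from operator import itemgetter

# Hypothetical sample rows; the issue's original `commodity_prices` is not shown.
rows = [
    {"commodity": "Wheat", "location": "StPaul", "seller": "Susan", "price": 0.80},
    {"commodity": "Corn", "location": "Chicago", "seller": "Ed", "price": 2.00},
    {"commodity": "Wheat", "location": "Chicago", "seller": "Paul", "price": 0.55},
    {"commodity": "Corn", "location": "Chicago", "seller": "Mary", "price": 2.10},
    {"commodity": "Wheat", "location": "StPaul", "seller": "Charlie", "price": 0.70},
    {"commodity": "Corn", "location": "Chicago", "seller": "Charlie", "price": 1.80},
    {"commodity": "Wheat", "location": "StPaul", "seller": "Bob", "price": 0.90},
]


def cheapest_per_group(rows, keys, n=2):
    """Return the n lowest-priced rows of each group as one flat list."""
    key_fn = itemgetter(*keys)
    # Sort by group key, then by price, so each group is contiguous
    # and already ordered by price when groupby walks over it.
    ordered = sorted(rows, key=lambda r: (key_fn(r), r["price"]))
    result = []
    for _, group in groupby(ordered, key=key_fn):
        # Taking the first n rows is the "head(n)" step; extending the flat
        # result list plays the role of "explode" (one row per seller).
        result.extend(islice(group, n))
    # Final sort by price only to make the output order predictable.
    return sorted(result, key=itemgetter("price"))


top2 = cheapest_per_group(rows, ["commodity", "location"])
for r in top2:
    print(r["commodity"], r["location"], r["seller"], r["price"])
```

With these assumed rows, each (commodity, location) pair contributes at most two rows, and a group with a single seller contributes just one, which is the shape the pandas-style output in the issue has.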
Are you using Python or Rust?
Python.
What version of polars are you using?
polars 0.8.16
What operating system are you using polars on?
manylinux x86_64 (google colab)
Describe your bug.
I'm trying to translate the following pandas code to polars:
Output:
commodity | location | seller | price
-- | -- | -- | --
Wheat | Chicago | Paul | 0.55
Wheat | StPaul | Charlie | 0.70
Wheat | StPaul | Susan | 0.80
Corn | Chicago | Charlie | 1.80
Corn | Chicago | Ed | 2.00