Is it possible to create a dataframe from row-like (Vec<Struct>) data? #1111

Closed
tomlister opened this issue Aug 7, 2021 · 10 comments

Comments

@tomlister

tomlister commented Aug 7, 2021

Hi,

Is it possible to create a dataframe from row-like (Vec<Struct>) data?

i.e. a Vec of OHLCFrame?

struct OHLCFrame {
    close_time: i64,
    open_price: f64,
    high_price: f64,
    low_price: f64,
    close_price: f64,
    volume: f64,
    quote_volume: f64
}

Cheers.

Also, I have a pretty cool issue number :)

@kkonghao

kkonghao commented Aug 7, 2021

Yes, I think this is needed. I have the same data processing requirement: adding a new row to the dataframe.

@ritchie46
Member

Haha.. congrats on the number.

You can use from_rows. If performance is important, I'd recommend checking whether you can still create the data in a columnar fashion with the builders.
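
A minimal sketch of the columnar idea, using only plain Rust vectors and no Polars calls. The names `OhlcColumns` and `to_columns` are hypothetical and not part of any library; the assumption is that each resulting vector would then become one column of the dataframe.

```rust
/// Row-oriented record, as in the question.
#[derive(Debug, Clone, PartialEq)]
struct OHLCFrame {
    close_time: i64,
    open_price: f64,
    high_price: f64,
    low_price: f64,
    close_price: f64,
    volume: f64,
    quote_volume: f64,
}

/// Hypothetical column-oriented container: one Vec per struct field.
#[derive(Debug, Default, PartialEq)]
struct OhlcColumns {
    close_time: Vec<i64>,
    open_price: Vec<f64>,
    high_price: Vec<f64>,
    low_price: Vec<f64>,
    close_price: Vec<f64>,
    volume: Vec<f64>,
    quote_volume: Vec<f64>,
}

/// Transpose a slice of rows into per-field columns in a single pass.
fn to_columns(rows: &[OHLCFrame]) -> OhlcColumns {
    let mut cols = OhlcColumns::default();
    for r in rows {
        cols.close_time.push(r.close_time);
        cols.open_price.push(r.open_price);
        cols.high_price.push(r.high_price);
        cols.low_price.push(r.low_price);
        cols.close_price.push(r.close_price);
        cols.volume.push(r.volume);
        cols.quote_volume.push(r.quote_volume);
    }
    cols
}
```

Each field ends up in its own contiguous vector, which matches how a columnar dataframe stores its data.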

@tomlister
Author

tomlister commented Aug 7, 2021

Thank you.
Are you Dutch? I speak a little Dutch.

I'm Australian.

@tomlister
Author

I'm a bit confused about how to cast or convert my data to rows so I can pass them as the argument to from_rows.

@ritchie46
Member

Yep, Dutch. 😄

A Row can have any combination of datatypes. In your case it's almost all f64 (plus one i64), but Polars cannot know that in general. So you will have to wrap each value in the AnyValue enum to be able to put them into a Row.
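
To illustrate why the wrapping is needed, here is a small sketch of the idea in plain Rust. The `Value` enum and `to_row` function are hypothetical stand-ins written for this example; they are not Polars' `AnyValue` or `Row`, and the real types have more variants and a different construction API.

```rust
/// Hypothetical stand-in for a dynamically typed cell value.
/// This is NOT Polars' AnyValue; it only mirrors the idea.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int64(i64),
    Float64(f64),
}

/// Row-oriented record, as in the question.
struct OHLCFrame {
    close_time: i64,
    open_price: f64,
    high_price: f64,
    low_price: f64,
    close_price: f64,
    volume: f64,
    quote_volume: f64,
}

/// Wrap every field in the enum so one Vec can hold mixed types.
fn to_row(f: &OHLCFrame) -> Vec<Value> {
    vec![
        Value::Int64(f.close_time),
        Value::Float64(f.open_price),
        Value::Float64(f.high_price),
        Value::Float64(f.low_price),
        Value::Float64(f.close_price),
        Value::Float64(f.volume),
        Value::Float64(f.quote_volume),
    ]
}
```

Because a `Vec` must hold a single element type, the enum is what lets an `i64` timestamp and `f64` prices sit in the same row.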

@tomlister
Author

Thanks.

Sorry for being a pest. How do I enumerate row by row?

@ritchie46
Member

> Sorry for being a pest. How do I enumerate row by row?

What do you mean? What do you want to do?

@tomlister
Author

tomlister commented Aug 8, 2021

Let's say we have a dataframe.

How would one go about iterating over it by row?

i.e. (this is not the way to do it, but it will suffice to explain what I'm trying to do)

for (idx, row) in df.iter().unwrap().enumerate() {
    let { close_time, close_price } = row;
}

@ritchie46
Member

You can use get_row to access a row by index. This is, however, quite slow, as the data is laid out in columnar form in memory. Every row access incurs many cache misses.

Often the same goal can be achieved in a columnar fashion. In your case, I'd first select the close_time and close_price columns and work on them.

Ideally I would also do the arithmetic on columns.
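
A rough sketch of what working on the two columns could look like, using plain slices rather than Polars types. The function name `close_changes` and the choice of computation (the change between consecutive closes) are assumptions made for illustration; the thread does not say what arithmetic is intended.

```rust
/// Price change between consecutive closes, paired with the later close_time.
/// Operates on two columns directly instead of iterating row structs.
fn close_changes(close_time: &[i64], close_price: &[f64]) -> Vec<(i64, f64)> {
    close_time
        .iter()
        .skip(1) // the first row has no previous close to compare against
        .zip(close_price.windows(2)) // each window is [previous, current]
        .map(|(&t, w)| (t, w[1] - w[0]))
        .collect()
}
```

Both inputs are read sequentially, so memory access stays contiguous within each column, which is the advantage the columnar layout is meant to provide.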

@tomlister
Author

tomlister commented Aug 8, 2021 via email
