How can we convert a dask_cudf column of strings or nanosecond integers to datetime? `to_datetime` is available in both pandas and cudf. See the sample data below.

import pandas as pd
import cudf

# with pandas
df = pd.DataFrame({'city': ['Dallas', 'Bogota', 'Chicago', 'Juarez'],
                   'timestamp': [1664828099973725440, 1664828099972763136, 1664828094775313920, 1664828081313273856]})

df['datetime'] = pd.to_datetime(df['timestamp'])

# with cudf
cdf = cudf.DataFrame({'city': ['Dallas', 'Bogota', 'Chicago', 'Juarez'],
                      'timestamp': [1664828099973725440, 1664828099972763136, 1664828094775313920, 1664828081313273856]})
cdf['datetime'] = cudf.to_datetime(cdf['timestamp'])

print(df)
print(cdf)

In either case, the result is the same:

      city            timestamp                      datetime
0   Dallas  1664828099973725440 2022-10-03 20:14:59.973725440
1   Bogota  1664828099972763136 2022-10-03 20:14:59.972763136
2  Chicago  1664828094775313920 2022-10-03 20:14:54.775313920
3   Juarez  1664828081313273856 2022-10-03 20:14:41.313273856
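For integer nanosecond timestamps like these, `pd.to_datetime` gives the same result as simply reinterpreting the int64 values as `datetime64[ns]` with `astype`. The pandas-only sketch below checks that on the sample data. The idea, an untested assumption on a GPU, is that an elementwise `ddf['timestamp'].astype('datetime64[ns]')` on the dask_cudf column might avoid needing a `to_datetime` function at all.

```python
import pandas as pd

# Nanosecond-since-epoch values copied from the sample data above
timestamps = pd.Series(
    [1664828099973725440, 1664828099972763136,
     1664828094775313920, 1664828081313273856],
    dtype="int64",
)

# pd.to_datetime treats integers as nanoseconds by default
via_to_datetime = pd.to_datetime(timestamps)

# astype reinterprets the same int64 values as datetime64[ns]
via_astype = timestamps.astype("datetime64[ns]")

print(via_astype.iloc[0])  # 2022-10-03 20:14:59.973725440
```

Because `astype` is a plain per-element cast, it does not depend on any backend-specific `to_datetime` implementation, which is why it is worth trying first.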

This recent SO question suggests using dask:

import dask_cudf
from dask import dataframe as dd

ddf = dask_cudf.from_cudf(cdf, npartitions=2)

dd.to_datetime(ddf['timestamp']).head()

produces an error. For context, I am creating the dask_cudf DataFrame from a large number of CSV files in one directory.
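A common dask pattern when a top-level function does not dispatch to a given backend is to apply that backend's own function to each partition via `map_partitions`. The sketch below shows such a per-partition function in plain pandas and simulates two partitions by hand. `convert_partition` is a hypothetical helper name; with dask_cudf the assumption is that you would write the same function using `cudf.to_datetime` and pass it to `ddf.map_partitions(...)`, which has not been verified here.

```python
import pandas as pd


def convert_partition(part: pd.DataFrame, column: str = "timestamp") -> pd.DataFrame:
    """Hypothetical helper: return a copy of one partition with a 'datetime' column.

    Integer columns are treated as nanoseconds since the Unix epoch;
    anything else (e.g. strings) is parsed by pd.to_datetime.
    """
    out = part.copy()  # avoid modifying the caller's slice
    col = out[column]
    if pd.api.types.is_integer_dtype(col):
        out["datetime"] = pd.to_datetime(col, unit="ns")
    else:
        out["datetime"] = pd.to_datetime(col)
    return out


df = pd.DataFrame({
    "city": ["Dallas", "Bogota", "Chicago", "Juarez"],
    "timestamp": [1664828099973725440, 1664828099972763136,
                  1664828094775313920, 1664828081313273856],
})

# Simulate two partitions; dask would pass each one to the function separately
parts = [df.iloc[:2], df.iloc[2:]]
result = pd.concat([convert_partition(p) for p in parts])
print(result)

# The same helper handles a string column
strings = pd.DataFrame({"timestamp": ["2022-10-03 20:14:59"]})
print(convert_partition(strings))
```

Because the function only ever sees one in-memory partition, it can use whichever library-level `to_datetime` matches the partition type.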
