Newest 'rapids+dask' Questions

0 votes

2 answers

79 views

How do I append the output of a dask_cudf apply function to the original dask_cudf DataFrame?

I am applying a function (e.g. letter frequency) to a dask_cudf DataFrame that consists of a single column of fixed-length words. I am trying to merge or append the output into the ...

Austin So

1

asked Oct 28 at 16:23

0 votes

0 answers

116 views

How to Distribute Dask-CUDA Workload Across Multiple GPUs?

I'm working on a project where I need to evenly distribute data processing tasks across multiple GPUs using dask_cudf. Despite my current setup, the workload seems to be handled by only one GPU. I'm ...

allo allo

1

asked May 29 at 9:14

0 votes

0 answers

132 views

Dask DataFrame using memory from a single GPU instead of all GPUs available in the cluster

I have a script running on an EC2 instance that reads vector embeddings from S3 and dumps them into a list variable; from there, it creates a Dask DataFrame that will be used in a Dask KMeans ...

Péricles Serotini

11

asked Feb 21 at 13:03

0 votes

1 answer

281 views

Explain Dask-cuDF behavior

I am trying to read and process an 8 GB CSV file using cuDF. Reading the whole file at once fits neither into GPU memory nor into my RAM. That's why I use the dask_cudf library. Here is the code: import ...

shda

734

asked Jan 25 at 12:13

0 votes

0 answers

103 views

Feature Selection, Outlier Removal, Target Transformer for Dask-ML pipelines

While feature selection (FS), outlier removal (OR), and target transformers (TT) have well-established components in "classic" scikit-learn pipelines, the dask-ml and RAPIDS documentation omits them entirely. What are the best practices to implement ...

Anatoly Alekseev

2,370

asked Jul 2, 2023 at 7:49

1 vote

1 answer

879 views

How to parallelize GPU processing of a Dask DataFrame

I would like to use Dask to parallelize data processing with dask_cudf from a Jupyter notebook on multiple GPUs. import cudf from dask.distributed import Client, wait, get_worker, get_client from ...

mtnt

31

asked Jun 21, 2023 at 0:40

1 vote

1 answer

75 views

NVIDIA RAPIDS filter neither works nor raises warnings/errors

I am using RAPIDS 23.04 and trying to read Parquet/ORC files selectively, based on selected columns and rows. However, the row filter strangely does not work and I am unable to find the cause. Any ...

Quiescent

1,144

asked May 17, 2023 at 15:22

0 votes

1 answer

86 views

RAPIDS AI (DGA Streamz): ERROR - module dask has no attribute distributed

I have been trying to run the DGA detection Streamz example in the RAPIDS AI CLX Streamz Docker container for the last few days without any resolution. I'm following the instructions on the RAPIDS website: ...

Swooz

5

asked May 2, 2023 at 20:36

2 votes

0 answers

200 views

How to convert a dask_cudf column to datetime?

How can we convert a dask_cudf column of strings or nanoseconds to a datetime object? to_datetime is available in pandas and cudf. See the sample data below: import pandas import cudf # with pandas df = ...

dleal

2,304

asked Apr 23, 2023 at 2:10

1 vote

0 answers

192 views

dask_cudf DataFrame: convert a column of datetime strings to a column of datetime objects

I am a new user of Dask and RAPIDS AI. An excerpt of my data (in CSV format): Symbol,Date,Open,High,Low,Close,Volume AADR,17-Oct-2017 09:00,57.47,58.3844,57.3645,58.3844,2094 AADR,17-Oct-2017 10:00,57....

stucash

1,208

asked Mar 30, 2023 at 14:55

1 vote

1 answer

936 views

RuntimeError: Cluster failed to start with dask LocalCUDACluster example setup

I am new to Dask and I ran into problems when executing the example code: from dask.distributed import Client from dask_cuda import LocalCUDACluster cluster = LocalCUDACluster() client = Client(...

LRyougiShikiZ

13

asked Feb 24, 2023 at 8:07

0 votes

2 answers

129 views

cuGraph graph creation yields NoneType

I tried to create a Graph from a dask_cudf DataFrame, but the Graph is NoneType, without any error message. I also tried it with the same dataset as a pandas DataFrame. Then I tried it with three ...

padul

174

asked Sep 23, 2022 at 11:18

1 vote

1 answer

853 views

Dask-cuDF to cuDF DataFrame conversion

Is there any function that converts a Dask-cuDF DataFrame to a cuDF DataFrame, like from_cudf does for cuDF to Dask-cuDF? dgdf = dask_cudf.from_cudf(df, npartitions=2)

Nidhi Kumari

11

asked Aug 15, 2022 at 14:58

0 votes

1 answer

756 views

Running out of memory in Dask cuDF

I've been trying to solve memory management issues with dask_cudf in my recent project for quite some time, but it seems I'm missing something and I need your help. I am working on a Tesla T4 GPU ...

Milos

1

asked Jun 24, 2022 at 15:05

-1 votes

1 answer

395 views

Dask-CUDA on a multi-node EMR cluster is unable to detect nodes

I have set up an AWS EMR cluster using 10 core nodes of type g4dn.xlarge (each machine/node contains 1 GPU). When I run the following commands in a Zeppelin notebook, I see only 1 worker allotted in my ...

Putt

299

asked Jun 7, 2022 at 12:00

1 vote

1 answer

76 views

Cannot create 3rd lagged columns with dask-cudf

I have the following dask_cudf.core.DataFrame: import pandas as pd import numpy as np import dask_cudf import cudf data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), ...

Shawn Brar

1,420

asked Jun 4, 2022 at 10:52

1 vote

2 answers

1k views

List operation with a cuDF DataFrame

I have a cuDF DataFrame which looks like this. The dtype of the columns POSITION_ANTENNA1 and POSITION_ANTENNA2 is list, and I want to construct a column = POSITION_ANTENNA1 - POSITION_ANTENNA2. However,...

Arpan Das

339

asked May 25, 2022 at 14:41

0 votes

1 answer

2k views

Handle "std::bad_alloc: out_of_memory: CUDA error" in Dask-cuDF

I have a PC with an NVIDIA 3090 and 32 GB of RAM. I am loading a 9 GB CSV dataset with millions of rows and 5 columns. Any time I run compute(), it fails and throws std::bad_alloc: out_of_memory: CUDA ...

jack

13

asked May 15, 2022 at 10:29

0 votes

0 answers

37 views

'sub' operator not supported in dask_cudf

I came here due to a question that arose while following the tutorial's methodology at https://docs.rapids.ai/api/cudf/nightly/user_guide/10min.html. I have a DataFrame imported from CSV with the ...

jack

13

asked May 11, 2022 at 6:54

0 votes

0 answers

201 views

How to read Protobuf files with Dask?

Has anyone tried reading Protobuf files with Dask? Each Protobuf file I have contains multiple records, and each record is prefixed with its length (4 bytes), as shown in the snippet. This is ...

Chinmay Chandak

11

asked Apr 26, 2022 at 4:47
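The Protobuf question above describes files in which each record is prefixed by its 4-byte length. A minimal standard-library sketch of splitting that framing is shown here, assuming the prefix is an unsigned 32-bit integer; the excerpt does not state the byte order, so it is a parameter that defaults to little-endian as an assumption. `iter_length_prefixed` is a hypothetical helper name, not part of Dask or protobuf.

```python
import struct
from typing import Iterator


def iter_length_prefixed(data: bytes, byte_order: str = "<") -> Iterator[bytes]:
    """Yield each record payload from a buffer of length-prefixed records.

    Assumption: every record starts with an unsigned 32-bit length prefix.
    byte_order is a struct prefix: "<" little-endian (default), ">" big-endian.
    """
    header = struct.Struct(byte_order + "I")  # 4-byte unsigned length prefix
    offset = 0
    while offset < len(data):
        # A partial prefix at the end of the buffer means the file is truncated.
        if offset + header.size > len(data):
            raise ValueError(f"truncated length prefix at offset {offset}")
        (length,) = header.unpack_from(data, offset)
        offset += header.size
        end = offset + length
        # The declared payload must fit inside the remaining buffer.
        if end > len(data):
            raise ValueError(f"truncated record at offset {offset}")
        yield data[offset:end]
        offset = end
```

Each yielded payload could then be decoded with the generated message class's `ParseFromString` method. To spread the work across many files, one option (not tested here) is to map a per-file reader built on this helper over the file paths with `dask.bag.from_sequence(paths)`.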

0 votes

0 answers

320 views

Runtime Error when running a simple cuML code in a Dask environment

I'm trying to test some simple code using two remote workers. I don't know what is going on or what the error refers to. The code is simple: #!/usr/bin/python3 from cuml.dask.cluster import KMeans from ...

jcfaracco

894

asked Dec 27, 2021 at 15:01

0 votes

1 answer

550 views

Out of memory error with Dask and cudf loop

I am using Dask and RAPIDS to run an XGBoost model on a large (6.9 GB) dataset. The hardware is 4x 2080 Tis with 11 GB of memory each. The raw dataset has a few dozen target columns that have been ...

datahappy

856

asked Jun 8, 2021 at 16:01

0 votes

1 answer

585 views

Unable to load and compute a dask_cudf DataFrame into a BlazingSQL table; seeing memory-related errors (cudaErrorMemoryAllocation out of memory)

Issue: Trying to load a file (CSV and Parquet) using Dask-cuDF and seeing memory-related errors. The dataset can easily fit into memory and the file can be read correctly using BlazingSQL's ...

chaitanyac3

1

asked Apr 29, 2021 at 5:15

1 vote

0 answers

373 views

Can I split physical GPUs into multiple Logical/Virtual GPUS and pass them to dask_cuda.LocalCUDACluster?

I have a workflow that benefits greatly from GPU acceleration, but each task has relatively low memory requirements (2-4 GB). I'm using a combination of dask.dataframe, dask.distributed.Client, ...

Alex Rakowski

39

asked Apr 16, 2021 at 22:02

1 vote

1 answer

1k views

Why am I getting an assertion error when creating a Device Quantile Matrix?

I am using the following code to load a CSV file into a dask_cudf DataFrame and then create a DeviceQuantileDMatrix for XGBoost, which yields the error: cluster = LocalCUDACluster(rmm_pool_size=parse_bytes(...

lara_toff

442

asked Jan 21, 2021 at 1:13

2 votes

1 answer

4k views

How do I install dask_cudf?

I am using the following lines in the terminal to install RAPIDS and then dask_cudf: conda create -n rapids-core-0.14 -c rapidsai -c nvidia -c conda-forge \ -c defaults rapids=0.14 python=3.7 ...

lara_toff

442

asked Jan 20, 2021 at 4:20

1 vote

1 answer

333 views

Why is cuML's predict() method for KNearestNeighbors taking so long with a dask_cudf DataFrame?

I have a large dataset (around 80 million rows) and I am training a KNearestNeighbors regression model using cuML with a dask_cudf DataFrame. I am using 4 GPUs with an rmm_pool_size of 15 GB each: ...

agp

31

asked Nov 11, 2020 at 11:36

4 votes

2 answers

3k views

ERROR: Could not find a version that satisfies the requirement dask-cudf (from versions: none)

Describe the bug: When I try to import dask_cudf, I get the following ERROR: --------------------------------------------------------------------------- ModuleNotFoundError ...

sogu

3,076

asked Oct 28, 2020 at 16:13

0 votes

2 answers

2k views

MemoryError: std::bad_alloc: rapids.ai Dask-cuDF

I would like to load a 5.9 GB CSV without using the pandas library. I have 4 GPUs. I use rapids.ai to load this large dataset faster, but every time I try, this error is shown, although I have ...

Omid Erfanmanesh

617

asked Aug 26, 2020 at 13:03

1 vote

2 answers

2k views

Interpreting package requests conflicts for a failed conda install

Attempting the following conda install operation (derived from the NVIDIA RAPIDS installation instructions): conda config --prepend channels rapidsai && \ conda config --prepend channels ...

Aleksey Bilogur

3,836

asked Jul 30, 2020 at 19:04

2 votes

1 answer

2k views

Warning with CUDF/Python: "User Warning: No NVIDIA GPU detected"

I am having some difficulty running code with the cudf and dask_cudf modules in Python. I am working in JupyterLab through Anaconda. I have been able to correctly install my NVIDIA GPU driver, cudf (...

Maggie

23

asked Jul 13, 2020 at 16:45

2 votes

1 answer

2k views

How can I use xgboost.dask with gpu to model a very large dataset in both a distributed and batched manner?

I would like to utilise multiple GPUs spread across many nodes to train an XGBoost model on a very large data set within Azure Machine Learning using 3 NC12s_v3 compute nodes. The dataset size exceeds ...

HowdyEarth

63

asked Jul 2, 2020 at 11:28

8 votes

2 answers

2k views

Dask vs. RAPIDS: what does RAPIDS provide that Dask doesn't have?

I want to understand the difference between Dask and RAPIDS, and what benefits RAPIDS provides that Dask doesn't. Does RAPIDS internally use Dask code? If so, then why do we have Dask, ...

DjVasu

113

asked Mar 18, 2020 at 11:44

4 votes

2 answers

1k views

Multi-GPU KMeans clustering with RAPIDS freezes

I am new to Python and RAPIDS.AI and I am trying to recreate scikit-learn KMeans in a multi-node GPU setup (I have 2 GPUs) using Dask and RAPIDS (I am using RAPIDS with its Docker image, which mounts a Jupyter ...

JuMoGar

1,760

asked Mar 6, 2020 at 11:56

1 vote

1 answer

478 views

cuML functions running on Dask, and dask_cudf manipulation?

How do I run dask_cuML (logistic regression, for example) on a large dask_cudf dataset? I cannot run cuML on my cuDF DataFrame because the dataset is large, so I get "OUT of MEMORY" as soon as I try anything. ...

Salchem

128

asked Feb 6, 2020 at 6:34

2 votes

1 answer

700 views

How to pre-cache dask.dataframe to all workers and partitions to reduce communication need

It’s sometimes appealing to use dask.dataframe.map_partitions for operations like merges. In some scenarios, when doing merges between a left_df and a right_df using map_partitions, I’d like to ...

Nick Becker

4,214

asked Jul 30, 2019 at 14:47

2 votes

0 answers

314 views

Options for accelerating Python code through parallelizing/ multiprocessing

Below, I've gathered 4 ways to complete the execution of code that involves sorting and updating pandas DataFrames. I would like to apply the best methods to speed up the code execution. Am I using the ...

Kdog

513

asked Feb 19, 2019 at 21:41

2 votes

1 answer

293 views

How much overhead is there per partition when loading dask_cudf partitions into GPU memory?

PCIe bus bandwidth and latency constrain how and when applications should copy data to and from GPUs. When working with cuDF directly, I can efficiently move a single large chunk of data ...

Randy Gelhausen

135

asked Feb 14, 2019 at 18:41
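The per-partition overhead question above can be framed with a simple first-order cost model. This is an illustrative assumption, not a measured result: suppose a dataset of total size $S$ is split into $n$ equal partitions, each host-to-device copy pays a fixed per-transfer cost $\ell$ (launch and synchronization latency), and the data then moves at an effective PCIe bandwidth $B$. The total copy time and the fraction spent on fixed overhead are then:

```latex
T(n) = \sum_{i=1}^{n} \left( \ell + \frac{S/n}{B} \right) = n\,\ell + \frac{S}{B},
\qquad
\text{overhead fraction} = \frac{n\,\ell}{n\,\ell + S/B}
```

Under this model the bandwidth term $S/B$ does not depend on $n$, while the fixed cost grows linearly with the partition count, so fewer, larger partitions amortize it better as long as each still fits in GPU memory. The model ignores overlap of copies with computation and concurrent transfers to multiple GPUs.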
