All Questions

0 votes
2 answers
79 views

how do I append the output of a dask_cudf apply function to the original dask_cudf?

I am applying a function (e.g. letter frequency) to a dask_cudf dataframe that consists of a single column of words of fixed length. I am trying to merge the output or append the output into the ...
Austin So's user avatar
0 votes
0 answers
116 views

How to Distribute Dask-CUDA Workload Across Multiple GPUs?

I'm working on a project where I need to evenly distribute data processing tasks across multiple GPUs using dask_cudf. Despite my current setup, the workload seems to be handled by only one GPU. I'm ...
allo allo's user avatar
0 votes
0 answers
132 views

Dask Dataframe using memory from a single GPU instead of all available in the cluster

I have a script running on an EC2 instance that reads vector embeddings from s3 and dumps them into a list variable; from there, it creates a dask dataframe that will be used in a Dask KMeans ...
Péricles Serotini's user avatar
0 votes
1 answer
281 views

Explain Dask-cuDF behavior

I am trying to read and process an 8 GB CSV file using cudf. Reading the whole file at once fits neither into GPU memory nor into my RAM. That's why I use the dask_cudf library. Here is the code: import ...
shda's user avatar
  • 734
0 votes
0 answers
103 views

Feature Selection, Outlier Removal, Target Transformer for Dask-ML pipelines

While FS, OR, TT have well-established components in "classic" scikit-learn pipelines, documentation of dask-ml and RAPIDS totally omits them. What are the best practices to implement ...
Anatoly Alekseev's user avatar
1 vote
1 answer
879 views

How to parallel GPU processing of Dask dataframe

I would like to use dask to parallelize the data processing for dask cudf from Jupyter notebook on multiple GPUs. import cudf from dask.distributed import Client, wait, get_worker, get_client from ...
mtnt's user avatar
  • 31
1 vote
1 answer
75 views

NVidia Rapids filter neither works nor raises warn/errors

I am using Rapids 23.04 and trying to select reading from parquet/orc files based on select columns and rows. However, strangely the row filter is not working and I am unable to find the cause. Any ...
Quiescent's user avatar
  • 1,144
0 votes
1 answer
86 views

Rapidsai (DGA Streamz): ERROR- module dask has no attribute distributed

I have been trying to run the dga detection streamz on the rapidsai clx streamz docker container for the last few days without any resolution. I'm following the instructions on the rapids website: ...
Swooz's user avatar
  • 5
2 votes
0 answers
200 views

how to convert 'dask_cudf' column to datetime?

How can we convert a dask_cudf column of string or nanoseconds to a datetime object? to_datetime is available in pandas and cudf. See sample data below import pandas import cudf # with pandas df = ...
dleal's user avatar
  • 2,304
1 vote
0 answers
192 views

dask_cudf dataframe convert column of datetime string to column of datetime object

I am a new user of Dask and RapidsAI. An excerpt of my data (in csv format): Symbol,Date,Open,High,Low,Close,Volume AADR,17-Oct-2017 09:00,57.47,58.3844,57.3645,58.3844,2094 AADR,17-Oct-2017 10:00,57....
stucash's user avatar
  • 1,208
1 vote
1 answer
936 views

RuntimeError: Cluster failed to start with dask LocalCudaCluster example setup

I am new to Dask and I run into problems when executing the example code: from dask.distributed import Client from dask_cuda import LocalCUDACluster cluster = LocalCUDACluster() client = Client(...
LRyougiShikiZ's user avatar
0 votes
2 answers
129 views

cugraph create NoneType

I tried to create a Graph from a dask_cudf DataFrame, but the Graph ends up as NoneType without any error message. I also tried it with the same data set as a pandas dataframe. Then I tried it with three ...
padul's user avatar
  • 174
1 vote
1 answer
853 views

Dask-cuDF to CuDF dataframe conversion

Is there any function that converts a Dask-cuDF dataframe to a cuDF dataframe, like from_cudf for cuDF to Dask-cuDF? dgdf = dask_cudf.from_cudf(df, npartitions=2)
Nidhi Kumari's user avatar
0 votes
1 answer
756 views

Running out of memory in Dask cuDF

I've been trying to solve memory management issues in dask_cudf in a recent project for quite some time, but it seems I'm missing something and I need your help. I am working on a Tesla T4 GPU ...
Milos's user avatar
  • 1
-1 votes
1 answer
395 views

DASK CUDA on multi node EMR cluster is unable to detect nodes

I have set up an AWS EMR cluster using 10 core nodes of type g4dn.xlarge (each machine/node contains 1 GPU). When I run the following commands on Zeppelin Notebook, I see only 1 worker allotted in my ...
Putt's user avatar
  • 299
1 vote
1 answer
76 views

Cannot create 3rd lagged columns with dask-cudf

I have the following dask_cudf.core.DataFrame: import pandas as pd import numpy as np import dask_cudf import cudf data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), ...
Shawn Brar's user avatar
  • 1,420
1 vote
2 answers
1k views

List operation with CUDF dataframe

I have a cuDF dataframe which looks like this. The dtypes of columns POSITION_ANTENNA1 and POSITION_ANTENNA2 are lists, and I want to construct a column = POSITION_ANTENNA1 - POSITION_ANTENNA2. However,...
Arpan Das's user avatar
  • 339
0 votes
1 answer
2k views

Handle "std::bad_alloc: out_of_memory: CUDA error" at Dask-cudf

I have a PC with an NVIDIA 3090 and 32 GB of RAM. I am loading a 9 GB CSV dataset with millions of rows and 5 columns. Anytime I run compute() it doesn't work and throws std::bad_alloc: out_of_memory: CUDA ...
jack's user avatar
  • 13
0 votes
0 answers
37 views

'sub' operator not supported Dask_cudf

I came here because of a question that came up while I was following the tutorial at https://docs.rapids.ai/api/cudf/nightly/user_guide/10min.html. I have a dataframe imported as csv with the ...
jack's user avatar
  • 13
0 votes
0 answers
201 views

How to read Protobuf files with Dask?

Has anyone tried reading Protobuf files with Dask? Each Protobuf file I have contains multiple records, and each record is prefixed with the length of the record (4 bytes) as shown in the snippet. This is ...
Chinmay Chandak's user avatar
0 votes
0 answers
320 views

Runtime Error when running a simple cuML code in a Dask environment

I'm trying to test a simple code using two remote workers. I don't know what is going on and what the error refers to. The code is simple: #!/usr/bin/python3 from cuml.dask.cluster import KMeans from ...
jcfaracco's user avatar
  • 894
0 votes
1 answer
550 views

Out of memory error with Dask and cudf loop

I am using Dask and Rapidsai to run an xgboost model on a large (6.9GB) dataset. The hardware is 4x 2080 TIs with 11 GB of memory each. The raw dataset has a few dozen target columns that have been ...
datahappy's user avatar
  • 856
0 votes
1 answer
585 views

Unable to load and compute dask_cudf dataframe into blazing table and seeing some memory related errors. (cudaErrorMemoryAllocation out of memory)

Issue: Trying to load a file (CSV and Parquet) using Dask cuDF and seeing some memory-related errors. The dataset can easily fit into memory and the file can be read correctly using BlazingSQL's ...
chaitanyac3's user avatar
1 vote
0 answers
373 views

Can I split physical GPUs into multiple Logical/Virtual GPUS and pass them to dask_cuda.LocalCUDACluster?

I have a workflow which benefits greatly from GPU acceleration, but each task has relatively low memory requirements (2-4 GB). I'm using a combination of dask.dataframe, dask.distributed.Client, ...
Alex Rakowski's user avatar
1 vote
1 answer
1k views

Why am I getting an assertion error when create Device Quantile Matrix?

I am using the following code to load a csv file into a dask cudf, and then creating a devicequantilematrix for xgboost which yields the error: cluster = LocalCUDACluster(rmm_pool_size=parse_bytes(...
lara_toff's user avatar
  • 442
2 votes
1 answer
4k views

How do I install dask_cudf?

I am using the following lines in the terminal to install rapids and then dask cudf: conda create -n rapids-core-0.14 -c rapidsai -c nvidia -c conda-forge \ -c defaults rapids=0.14 python=3.7 ...
lara_toff's user avatar
  • 442
1 vote
1 answer
333 views

Why is cuml predict() method for KNearestNeighbors taking so long with dask_cudf DataFrame?

I have a large dataset (around 80 million rows) and I am training a KNearestNeighbors Regression model using cuml with a dask_cudf DataFrame. I am using 4 GPUs with an rmm_pool_size of 15GB each: ...
agp's user avatar
  • 31
4 votes
2 answers
3k views

ERROR: Could not find a version that satisfies the requirement dask-cudf (from versions: none)

Describe the bug When I am trying to import dask_cudf I get the following ERROR: --------------------------------------------------------------------------- ModuleNotFoundError ...
sogu's user avatar
  • 3,076
0 votes
2 answers
2k views

MemoryError: std::bad_alloc: rapids.ai Dask-cuDF

I would like to load a 5.9 GB CSV without using the pandas library. I have 4 GPUs. I use rapids.ai to load this large dataset faster, but every time I try, this error is shown although I have ...
Omid Erfanmanesh's user avatar
1 vote
2 answers
2k views

Interpreting package requests conflicts for a failed conda install

Attempting the following conda install operation (derived from the NVIDIA RAPIDS installation instructions): conda config --prepend channels rapidsai && \ conda config --prepend channels ...
Aleksey Bilogur's user avatar
2 votes
1 answer
2k views

Warning with CUDF/Python: "User Warning: No NVIDIA GPU detected"

I am having some difficulty running code with the cudf and dask_cudf modules in python. I am working on Jupyter Labs through Anaconda. I have been able to correctly install my nvidia-gpu driver, cudf (...
Maggie's user avatar
  • 23
2 votes
1 answer
2k views

How can I use xgboost.dask with gpu to model a very large dataset in both a distributed and batched manner?

I would like to utilise multiple GPUs spread across many nodes to train an XGBoost model on a very large data set within Azure Machine Learning using 3 NC12s_v3 compute nodes. The dataset size exceeds ...
HowdyEarth's user avatar
8 votes
2 answers
2k views

Dask Vs Rapids. What does rapids provide which dask doesn't have?

I want to understand the difference between dask and rapids, and what benefits rapids provides that dask doesn't have. Does rapids internally use dask code? If so then why do we have dask, ...
DjVasu's user avatar
  • 113
4 votes
2 answers
1k views

MultiGPU Kmeans clustering with RAPIDs freezes

I am new to Python and Rapids.AI and I am trying to recreate SKLearn KMeans in a multi-GPU setup (I have 2 GPUs) using Dask and RAPIDS (I am using rapids with its docker, which mounts a Jupyter ...
JuMoGar's user avatar
  • 1,760
1 vote
1 answer
478 views

cuML functions running on DASK? and dask_cudf manipulation?

How to run dask_cuML (logistic regression for example) on a large dataset, dask_cudf? I cannot run cuML on my cudf dataframe because the dataset is large, so I get "OUT of MEMORY" as soon as I try anything. ...
Salchem's user avatar
  • 128
2 votes
1 answer
700 views

How to pre-cache dask.dataframe to all workers and partitions to reduce communication need

It’s sometimes appealing to use dask.dataframe.map_partitions for operations like merges. In some scenarios, when doing merges between a left_df and a right_df using map_partitions, I’d like to ...
Nick Becker's user avatar
  • 4,214
2 votes
0 answers
314 views

Options for accelerating Python code through parallelizing/ multiprocessing

Below, I've gathered 4 ways to complete the execution of code that involves sorting and updating Pandas Dataframes. I would like to apply the best methods to speed up the code execution. Am I using the ...
Kdog's user avatar
  • 513
2 votes
1 answer
293 views

How much overhead is there per partition when loading dask_cudf partitions into GPU memory?

PCIe bus bandwidth and latency impose constraints on how and when applications should copy data to and from GPUs. When working with cuDF directly, I can efficiently move a single large chunk of data ...
Randy Gelhausen's user avatar