All Questions
38 questions
0
votes
2
answers
79
views
how do I append the output of a dask_cudf apply function to the original dask_cudf?
I am applying a function (e.g. letter frequency) to a dask_cudf dataframe that consists of a single column of words of fixed length.
I am trying to merge the output or append the output into the ...
0
votes
0
answers
116
views
How to Distribute Dask-CUDA Workload Across Multiple GPUs?
I'm working on a project where I need to evenly distribute data processing tasks across multiple GPUs using dask_cudf. Despite my current setup, the workload seems to be handled by only one GPU. I'm ...
0
votes
0
answers
132
views
Dask Dataframe using memory from a single GPU instead of all available in the cluster
I have a script running on an EC2 instance that reads vector embeddings from s3 and dumps them into a list variable; from there, it creates a dask dataframe that will be used in a Dask KMeans ...
0
votes
1
answer
281
views
Explain Dask-cuDF behavior
I am trying to read and process an 8 GB CSV file using cudf. The whole file fits neither into GPU memory nor into my RAM when read at once. That's why I use the dask_cudf library. Here is the code:
import ...
0
votes
0
answers
103
views
Feature Selection, Outlier Removal, Target Transformer for Dask-ML pipelines
While FS, OR, TT have well-established components in "classic" scikit-learn pipelines, documentation of dask-ml and RAPIDS totally omits them.
What are the best practices to implement ...
1
vote
1
answer
879
views
How to parallel GPU processing of Dask dataframe
I would like to use dask to parallelize the data processing for dask cudf from a Jupyter notebook on multiple GPUs.
import cudf
from dask.distributed import Client, wait, get_worker, get_client
from ...
1
vote
1
answer
75
views
NVidia Rapids filter neither works nor raises warn/errors
I am using Rapids 23.04 and trying to read only selected columns and rows from parquet/orc files. However, strangely the row filter is not working and I am unable to find the cause. Any ...
0
votes
1
answer
86
views
Rapidsai (DGA Streamz): ERROR- module dask has no attribute distributed
I have been trying to run the dga detection streamz on the rapidsai clx streamz docker container for the last few days without any resolution. I'm following the instructions on the rapids website: ...
2
votes
0
answers
200
views
how to convert 'dask_cudf' column to datetime?
How can we convert a dask_cudf column of string or nanoseconds to a datetime object? to_datetime is available in pandas and cudf. See sample data below
import pandas
import cudf
# with pandas
df = ...
1
vote
0
answers
192
views
dask_cudf dataframe convert column of datetime string to column of datetime object
I am a new user of Dask and RapidsAI.
An excerpt of my data (in csv format):
Symbol,Date,Open,High,Low,Close,Volume
AADR,17-Oct-2017 09:00,57.47,58.3844,57.3645,58.3844,2094
AADR,17-Oct-2017 10:00,57....
1
vote
1
answer
936
views
RuntimeError: Cluster failed to start with dask LocalCudaCluster example setup
I am new to Dask and I run into problems when executing the example code:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster()
client = Client(...
0
votes
2
answers
129
views
cugraph create NoneType
I tried to create a Graph from a dask_cudf DataFrame, but the Graph ends up as NoneType without any error message. I also tried the same data set with a pandas dataframe. Then I tried it with three ...
1
vote
1
answer
853
views
Dask-cuDF to CuDF dataframe conversion
Is there any function that converts a Dask-cuDF dataframe to a cuDF dataframe? Something like from_cudf, which goes from cuDF to Dask-cuDF.
dgdf = dask_cudf.from_cudf(df, npartitions=2)
0
votes
1
answer
756
views
Running out of memory in Dask cuDF
I've been trying to solve memory management issues in dask_cudf in my recent project for quite some time, but it seems I'm missing something and I need your help. I am working on Tesla T4 GPU ...
-1
votes
1
answer
395
views
DASK CUDA on multi node EMR cluster is unable to detect nodes
I have set up an AWS EMR cluster using 10 core nodes of type g4dn.xlarge (each machine/node contains 1 GPU). When I run the following commands on Zeppelin Notebook, I see only 1 worker allotted in my ...
1
vote
1
answer
76
views
Cannot create 3rd lagged columns with dask-cudf
I have the following dask_cudf.core.DataFrame:
import pandas as pd
import numpy as np
import dask_cudf
import cudf
data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), ...
1
vote
2
answers
1k
views
List operation with CUDF dataframe
I have a Cudf dataframe which looks like this
The dtype of columns POSITION_ANTENNA1 and POSITION_ANTENNA2 are lists, and I want to construct a column = POSITION_ANTENNA1 - POSITION_ANTENNA2. However,...
0
votes
1
answer
2k
views
Handle "std::bad_alloc: out_of_memory: CUDA error" at Dask-cudf
I have a PC with an Nvidia 3090 and 32 GB of RAM.
I am loading a 9GB csv dataset, with millions of rows and 5 columns.
Anytime I run compute() it doesn't work and throws std::bad_alloc: out_of_memory: CUDA ...
0
votes
0
answers
37
views
'sub' operator not supported Dask_cudf
I came here because of a question that arose while I was following the tutorial at https://docs.rapids.ai/api/cudf/nightly/user_guide/10min.html.
I have a dataframe imported as csv with the ...
0
votes
0
answers
201
views
How to read Protobuf files with Dask?
Has anyone tried reading Protobuf files with Dask? Each Protobuf file I have contains multiple records, and each record is prefixed with the length of the record (4 bytes) as shown in the snippet.
This is ...
0
votes
0
answers
320
views
Runtime Error when running a simple cuML code in a Dask environment
I'm trying to test a simple code using two remote workers. I don't know what is going on and what the error refers to.
The code is simple:
#!/usr/bin/python3
from cuml.dask.cluster import KMeans
from ...
0
votes
1
answer
550
views
Out of memory error with Dask and cudf loop
I am using Dask and Rapidsai to run an xgboost model on a large (6.9GB) dataset. The hardware is 4x 2080 TIs with 11 GB of memory each. The raw dataset has a few dozen target columns that have been ...
0
votes
1
answer
585
views
Unable to load and compute dask_cudf dataframe into blazing table and seeing some memory related errors. (cudaErrorMemoryAllocation out of memory)
Issue :
Trying to load a file (CSV and Parquet) using Dask CUDF and seeing some memory related errors. The dataset can easily fit into memory and the file
can be read correctly using BlazingSQL's ...
1
vote
0
answers
373
views
Can I split physical GPUs into multiple Logical/Virtual GPUS and pass them to dask_cuda.LocalCUDACluster?
I have a workflow that benefits greatly from GPU acceleration, but each task has relatively low memory requirements (2-4 GB). I'm using a combination of dask.dataframe, dask.distributed.Client, ...
1
vote
1
answer
1k
views
Why am I getting an assertion error when create Device Quantile Matrix?
I am using the following code to load a csv file into a dask cudf, and then creating a devicequantilematrix for xgboost which yields the error:
cluster = LocalCUDACluster(rmm_pool_size=parse_bytes(...
2
votes
1
answer
4k
views
How do I install dask_cudf?
I am using the following lines in the terminal to install rapids and then dask cudf:
conda create -n rapids-core-0.14 -c rapidsai -c nvidia -c conda-forge \
-c defaults rapids=0.14 python=3.7 ...
1
vote
1
answer
333
views
Why is cuml predict() method for KNearestNeighbors taking so long with dask_cudf DataFrame?
I have a large dataset (around 80 million rows) and I am training a KNearestNeighbors Regression model using cuml with a dask_cudf DataFrame.
I am using 4 GPUs with an rmm_pool_size of 15GB each:
...
4
votes
2
answers
3k
views
ERROR: Could not find a version that satisfies the requirement dask-cudf (from versions: none)
Describe the bug
When I am trying to import dask_cudf I get the following ERROR:
---------------------------------------------------------------------------
ModuleNotFoundError ...
0
votes
2
answers
2k
views
MemoryError: std::bad_alloc: rapids.ai Dask-cuDF
I would like to load a 5.9 GB CSV without using the pandas library. I have 4 GPUs. I use rapids.ai to load this large dataset faster, but every time I try, this error is shown although I have ...
1
vote
2
answers
2k
views
Interpreting package requests conflicts for a failed conda install
Attempting the following conda install operation (derived from the NVIDIA RAPIDS installation instructions):
conda config --prepend channels rapidsai && \
conda config --prepend channels ...
2
votes
1
answer
2k
views
Warning with CUDF/Python: "User Warning: No NVIDIA GPU detected"
I am having some difficulty running code with the cudf and dask_cudf modules in python.
I am working on Jupyter Labs through Anaconda. I have been able to correctly install my nvidia-gpu driver, cudf (...
2
votes
1
answer
2k
views
How can I use xgboost.dask with gpu to model a very large dataset in both a distributed and batched manner?
I would like to utilise multiple GPUs spread across many nodes to train an XGBoost model on a very large data set within Azure Machine Learning using 3 NC12s_v3 compute nodes. The dataset size exceeds ...
8
votes
2
answers
2k
views
Dask Vs Rapids. What does rapids provide which dask doesn't have?
I want to understand the difference between dask and rapids, and what benefits rapids provides that dask doesn't have.
Does rapids internally use dask code? If so then why do we have dask, ...
4
votes
2
answers
1k
views
MultiGPU Kmeans clustering with RAPIDs freezes
I am new to Python and Rapids.AI and I am trying to recreate SKLearn KMeans in a multinode GPU (I have 2 GPUs) using Dask and RAPIDs (I am using rapids with its docker, which mounts a Jupyter ...
1
vote
1
answer
478
views
cuML functions running on DASK? and dask_cudf manipulation?
How do I run dask_cuML (logistic regression, for example) on a large dask_cudf dataset?
I cannot run cuML on my cudf dataframe because the dataset is large, so I get "OUT of MEMORY" as soon as I try anything. ...
2
votes
1
answer
700
views
How to pre-cache dask.dataframe to all workers and partitions to reduce communication need
It’s sometimes appealing to use dask.dataframe.map_partitions for operations like merges. In some scenarios, when doing merges between a left_df and a right_df using map_partitions, I’d like to ...
2
votes
0
answers
314
views
Options for accelerating Python code through parallelizing/ multiprocessing
Below, I've gathered 4 ways to complete the execution of code that involves sorting and updating Pandas Dataframes.
I would like to apply the best methods to speed up the code execution.
Am I using the ...
2
votes
1
answer
293
views
How much overhead is there per partition when loading dask_cudf partitions into GPU memory?
PCIE bus bandwidth latencies force constraints on how and when applications should copy data to and from GPUs.
When working with cuDF directly, I can efficiently move a single large chunk of data ...