Newest 'ray' Questions

0 votes

0 answers

24 views

AttributeError: 'numpy.ndarray' object has no attribute 'categories'

Modin DataFrame Merge Issue After dropna on Categorical Column: I'm encountering an issue when using Modin to merge DataFrames that contain categorical columns. The issue arose after I performed a ...

Sumukha G C

13

asked yesterday

0 votes

0 answers

16 views

Correct way of using foreach_worker and foreach_env

I am quite new to Reinforcement Learning and can’t understand it. I am unable to update configurations for the batch data using PPO. I am using my custom-defined GYM environment, and want to train it ...

Abid Meraj

1

asked Dec 10 at 11:42

0 votes

1 answer

47 views

How to Parallelize a Flask App with Gunicorn and Distribute GPU Usage Among Workers?

I am building a Flask app to handle facial embeddings using DeepFace. My goal is to serve approximately 50 clients, with an estimated 10 requests per minute. Each request involves running deepface....

gpu-try-deepface

1

asked Dec 3 at 9:39

-1 votes

0 answers

58 views

After ray.get a stored large object within a remote function I get Error

I used the following example code, received from ChatGPT when asking "error ObjectRef in ray.get after ray.put". The model suggested this code example should work, but it doesn't: import ray ...

mike

43

asked Nov 20 at 10:05

0 votes

0 answers

22 views

Getting Serialisation Error on Initial Call to Class Function Decorated with Ray.remote

I'm using Ray with ray.remote to define an InferenceActor class, which includes a method run_inference which contains one parameter (A list of strings) for handling model inference tasks. However, ...

Matthew Dickson

21

asked Nov 19 at 11:02

0 votes

1 answer

32 views

Pytorch + Ray Tune Reporting ImplicitFunc Is Too Large, No Idea Which Reference Is Large

Similar to this question, Ray Tune is reporting to me: ValueError: The actor ImplicitFunc is too large (421 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB). Check that its definition is not implicitly ...

Falcondance

90

asked Nov 12 at 16:21

0 votes

1 answer

74 views

How can I ensure that my Python logic runs exclusively on the Apache Ray Worker Nodes?

I am using Apache Ray to create a customized cluster for running my logic. However, when I submit my tasks with ray.remote, they are executing on the driver node rather than on the worker nodes I ...

Learnings

2,938

asked Nov 11 at 5:14

0 votes

1 answer

70 views

How to configure Ray cluster to utilize the Full Capacity of Databricks Cluster

I have a Databricks cluster configured with a minimum of 1 worker and a maximum of 4 workers, with auto-scaling enabled. What should my Ray configuration (setup_ray_cluster) be to fully utilize the ...

Learnings

2,938

asked Nov 8 at 4:40

0 votes

2 answers

62 views

Ridiculous VMEM usage when using Ray on a cluster

Initial Problem: I am testing out a multiprocessing Python package called Ray to parallelise my code. Original code works fine on my laptop, core-i7-13800H, 32GB RAM. When running on a local cluster ...

Bawb

11

asked Nov 4 at 11:51

0 votes

0 answers

28 views

Understanding the ray.get() method

I am fairly new to Ray and I am struggling to understand what the ray.get() function actually does. I found a small example online here that can help. @ray.remote class Prime: # Constructor ...

Kit Searle

1

asked Oct 24 at 16:49

1 vote

0 answers

58 views

inter core interconnect Checking in simple Slices of TPU

ICI (inter-core interconnect) offers very fast connectivity between TPUs (that are connected to different hosts) and thus also increases the total memory available for TPU calculations (I guess!). ...

Krishna Mohan

11

asked Oct 14 at 11:04

0 votes

0 answers

55 views

helm install raycluster kuberay/ray-cluster --version 1.1.1 stuck at pulling image

I am following the documentation to deploy a RayCluster on Kubernetes. I have already set up a Kubernetes cluster, and now I am deploying a Ray cluster on top of it. https://docs.ray.io/en/latest/cluster/kubernetes/...

Liang

155

asked Oct 2 at 18:14

0 votes

0 answers

24 views

Android sceneView 3D model HitResult

This is my first time trying 3D, I'm using SceneView <io.github.sceneview.SceneView android:id="@+id/sceneView" android:layout_width="match_parent" ...

WeiChen Chen

1

asked Sep 25 at 1:51

0 votes

0 answers

21 views

Policies' directory not present in saved checkpoint

I'm using Ray RLlib, and after switching to the new API version, the checkpoint directory no longer includes the policies folder. Why might this be happening? Currently, the checkpoints contain the ...

Khashayar Ghamati

368

asked Sep 16 at 15:18

0 votes

0 answers

118 views

What is `_serve_asgi_lifespan` in Ray Serve?

I'm trying to use Ray + vLLM and face AttributeError ('VLLMDeployment' object has no attribute '_serve_asgi_lifespan'). I would like to know how to solve this issue. Steps Download the Ray Docker ...

dmjy

1,801

asked Sep 16 at 9:12

0 votes

2 answers

70 views

How can I specify the port number of health check of Ray?

I have two windows servers (192.168.1.11 and 192.168.1.12) and try to run a Ray Docker container (image tag = 2.35.0-py312-gpu) on each server. Steps I run these two commands to start the Ray process....

dmjy

1,801

asked Sep 13 at 11:56

0 votes

0 answers

53 views

K8s Readiness probe failed: success for ray-worker, docs maybe unclear

I’m working on setting up a Ray cluster with a head node and worker pods. While the head node deploys successfully and functions as expected, the worker node fails with the error: “Readiness probe ...

zacko

357

asked Sep 12 at 12:31

1 vote

2 answers

255 views

How to make ray task async

I want to run a function (Ray Task) that may trigger another request afterward. For example, if I have 10 tasks but only 1 CPU, the system will process one task at a time since each task requires 1 ...

zacko

357

asked Sep 9 at 14:12

0 votes

0 answers

39 views

Using Ray and RayCast to look around in first person

I am currently working on a small project where you control the character in first-person mode, although this question is also relevant for third person. What I want to do is look around by just ...

Oliver Hostettler

1

asked Sep 5 at 15:03

1 vote

0 answers

35 views

Discrepancies in Output with Ray Parallelization of ODE Propagation

I am using Ray to parallelize my ODE propagation code, which employs solve_ivp with the lsoda solver. For performance, the ODE code also uses Numba JIT. While the code runs correctly in a single-...

Peng

11

asked Aug 23 at 18:20

0 votes

0 answers

27 views

Slurm step failure capture via trap

I am trying to set up my Ray cluster with an sbatch script. I am starting head & worker nodes as steps in my script. The worker nodes are expected to keep running for as long as the job is alive. mysbatch....

DOOM

1,244

asked Aug 5 at 19:35

0 votes

0 answers

113 views

How to do distributed batch inference using tensor parallelism with Ray?

I want to perform offline batch inference with a model that is too large to fit into one GPU. I want to use tensor parallelism for this. Previously I have used vLLM for batch inference. However, now I ...

ganto

222

asked Aug 5 at 11:18

0 votes

0 answers

33 views

Ray + lightning prepare_data_loader MisconfigurationException

I am trying to start a training session with Ray on GPU but am experiencing errors, while on CPU everything works smoothly. The issues arise from the data modules: I have the following class which ...

magzmag

1

asked Aug 5 at 4:16

0 votes

0 answers

53 views

utilize computer resources on RLLIB config.build().train() effectively

I am trying to learn the best practices for choosing parameters that use computer resources effectively in RLlib. On my laptop, I have 16 CPU cores and 1 GPU. I tried running the ...

Jordan

45

asked Aug 1 at 3:14

0 votes

0 answers

46 views

Extending a Ray Actor or make a subclass be a Ray Actor

Look at the code below: class A: def __init__(self, n): self.n = n class B(A): def __init__(self): super(B, self).__init__(n=10) Adding @ray.remote at the beginning of any ...

Rick Dou

314

asked Jul 30 at 9:32

0 votes

0 answers

89 views

Pyarrow error with ray & lightning on databricks

I am trying to train a neural net with pytorch lightning on ray on a databricks cluster. As a start, I copied the example from https://docs.ray.io/en/latest/train/getting-started-pytorch-lightning....

DataDiver

1

asked Jul 25 at 8:08

0 votes

0 answers

34 views

Memory Issue with Ray in Cluster Environment

I'm new to using Ray, and I've set up a workflow to read and process several .csv files using Pandas. Here's a snippet of my setup: with on_ray( num_cpus=6, object_store_memory=10 * 1024 * ...

Vgamero

1

asked Jul 25 at 3:31

0 votes

0 answers

27 views

When to apply backface culling, depending on the ray and material type?

I am currently implementing a ray tracer, which supports reflection and refraction. I have the following types of rays: camera rays shadow rays reflection rays refraction rays I have the following ...

Kotaka Danski

590

asked Jul 19 at 16:19

0 votes

0 answers

32 views

Is it possible to set up a Ray cluster on top of p2p networks?

I am learning libp2p and Ray. I wonder whether it is possible to get the IPs (reachable among these nodes) of some nodes in a p2p network and then set up a Ray cluster on these nodes. Here, we do ...

Chengcheng Pei

11

asked Jul 15 at 23:57

0 votes

0 answers

164 views

How to Forward gRPC Requests in a Proxy Server for Ray?

I am implementing a reverse proxy for Ray. The reverse proxy works well for HTTP requests, but some communications of Ray use gRPC. For example, the reverse proxy is getting requests like http://...

Emmanuel Murairi

401

asked Jul 10 at 4:45

0 votes

0 answers

65 views

Error: Missing argument 'CLUSTER_CONFIG_FILE'. Ray GCP

I have created a GCP Ray cluster from the ray cluster dashboard within GCP, I have also created a Ray cluster locally via docker compose. Is there an easy way to generate the ray cluster config? For ...

jm-nab

74

asked Jul 8 at 15:02

0 votes

0 answers

152 views

How to implement ray server with multiple gpus?

I'm trying to implement a multi-gpu local server with ray and vllm. I have uploaded my full code and commands to this github repository. In short, I want to serve a big model that requires 2 gpus, but ...

Boyuan Chen

43

asked Jul 3 at 13:19

0 votes

0 answers

141 views

Overriding Ray dashboard url returned by ray.init()

I'm running a JupyterHub installation on Kubernetes (EKS) and an elastic Ray cluster automatically starts for each user when their notebook starts, and automatically stops when the notebook closes. ...

Igor

337

asked Jun 18 at 13:39

0 votes

0 answers

159 views

Ray logging is not working for logging.info calls in main or worker process

I am trying to set up a logger on Windows that can output messages to both a log file as well as stdout with different log levels. I am using ray to run a couple of remote worker processes so I would ...

altwood

131

asked Jun 12 at 10:50

0 votes

1 answer

66 views

Custom MLPPolicy issues in Ray RLLIB

I'm trying to create a custom MLP-based policy in Ray RLlib using the code below: Python: 3.10 Ray RLlib version: 2.23 class CustomMLPModel(TorchModelV2, nn.Module): def __init__(self, obs_space, ...

David

33

asked Jun 12 at 5:07

0 votes

0 answers

23 views

How exactly do Ray resource requests compare with K8s?

I'm reading Ray's Architecture doc, and it says: Note that because resource requests are logical, physical resource limits are not enforced by Ray. It is up to the user to specify accurate resource ...

Ovesh

5,369

asked Jun 7 at 22:35

0 votes

1 answer

42 views

Does Ray Offer any Functional/Declarative Interface to Map a Remote Function to an Iterator/Iterable?

My present Code #!/usr/bin/env python3 # encoding: utf-8 """Demonstration of Ray parallelism""" import ray from typing import Iterator ray.init() @ray.remote def square(n:...

Della

1,602

asked Jun 4 at 7:41

1 vote

1 answer

295 views

Ray serve error: serve.run throws utf-8 can't decode byte 0xf8 in position 0 invalid byte

I am trying to run serve.run in my test method but when the test runs, it throws an error in this part of the code: serve.run(RayFastApiWrapper.bind(), route_prefix=settings.app_root_url) ...

Jorge Cespedes

587

asked May 25 at 16:54

0 votes

0 answers

34 views

How do I access shared object with ray in python?

I got an issue while parallelizing the attached code. Unfortunately it seems that ray cannot access the object spe10. The error it gives is AttributeError: Can't get attribute 'Spe10' on <module '...

user24959319

1

asked May 9 at 16:58

1 vote

0 answers

72 views

How to properly clean up non-serializable states associated with a Ray object?

Suppose I have a Ray actor that can create a Ray object that associates with some non-serializable states. In the following example, the non-serializable state is a temporary directory. class MyObject:...

Yang Bo

3,710

asked May 2 at 18:19

0 votes

0 answers

64 views

!!! FAIL serialization: cannot pickle 'EnhancedModule' object. Problem: Ray Serialization

I've been running a Python task on Ray Cluster and AWS. I have some custom classes and functions and can't get past serialization. Receiving such errors: !!! FAIL serialization: cannot pickle '...

Tim Cvetko

1

asked May 1 at 16:45

0 votes

0 answers

99 views

How to parallelize a for-loop within a defined function using Ray in Python

I have been tasked with parallelizing a piece of code for a huge project I am taking a part in. The code.py I have been tasked with the parallelization goes something like this: import numpy as np #...

user399840

11

asked May 1 at 10:29

0 votes

0 answers

109 views

ray failed to init on win11 in python 3.9 and raise a TimeoutError

Python 3.9.19 (main, Mar 21 2024, 17:21:27) [MSC v.1916 64 bit (AMD64)] ray.__version__: '2.10.0' platform: Win11 When I init Ray as follows: import ray ray.init() it raises an error: Traceback (most ...

yichuan zhang

1

asked Apr 27 at 2:07

2 votes

1 answer

401 views

Ray error when trying to deploy Llama3 70b with VLLM with Vertex AI

Using Vertex AI custom container online predictions, I'm trying to deploy meta-llama/Meta-Llama-3-70B-Instruct with vllm 0.4.1 on 8 NVIDIA_L4 GPUs and getting: /tmp/ray is over 95% full, ...

Tsvi Sabo

665

asked Apr 26 at 14:29

1 vote

0 answers

28 views

Loading pickle file in Mac after writing in linux causing issues

I am using Ray RLlib to save and load checkpoints. I am using the same version across Mac and Linux. I want to be able to train on Linux and infer on my Mac, but I am getting the following error: ...

Prabhjot Singh Rai

2,507

asked Apr 25 at 10:38

0 votes

1 answer

71 views

In a Jupyter notebook, create an object from a class referenced using a string to use with ray

My 3.6.3 Jupyterlab notebook is running Python 3.10.11. I am trying to use Ray to implement some asymmetrical code. This would be simple, except that in my remote function I am trying to implement a ...

Calab

363

asked Apr 10 at 19:25

0 votes

0 answers

71 views

How to serialise a TensorFlow Hub "KerasLayer" to enable parallel computing with libraries like joblib or Ray?

To work within the constraints of a Kaggle competition, I'm using a pre-trained EfficientNetv2-m model that's loaded into my Kaggle notebook using TensorFlow Hub (I can't see another way of loading ...

user23798276

1

asked Apr 8 at 19:04

3 votes

1 answer

353 views

Small PyTorch networks take almost 3 GB of RAM to train on MNIST

I am running into problems using PyTorch. I have to run some experiments on PyTorch custom models, and given that I have to train a lot of them I tried to run them in parallel using ray[tune]. I have ...

Noumeno

161

asked Apr 3 at 16:28

0 votes

0 answers

166 views

GPU management using Tensorflow and Ray clients (OOM)

I am using Ray to simulate several clients learning with Tensorflow (for a federated learning task with Flower). Ray allows the GPU to be shared between the clients. The GPU is therefore divided into ...

Milodupipo

61

asked Apr 3 at 7:56

0 votes

1 answer

266 views

Ray tune checkpoints for training a YOLOv8 network

I've been trying to train my own fine-tuned network with the YOLOv8 architecture, and I also want to optimize hyperparameters and find the best parameters for data augmentation. Now, I'm seeking to ...

Jan

50

asked Apr 2 at 20:22
