0 votes
0 answers

SB3 for imitation learning. How to force demonstration action at given state?

I am trying to train an RL agent using SB3 (PPO algorithm), Gymnasium, and PyTorch. As the dynamics of the environment are quite complex, I have a dataset of about 200 trajectories that I can use as ...
Claudio
0 votes
0 answers

Julia with SB3 for RL in WSL leads to segmentation faults

I am trying to train an RL agent using SB3, torch, and Gymnasium. I use a Linux environment through WSL2 (Ubuntu-22.04) and Visual Studio Code. To speed up the step phase in my environment, I invoke ...
Claudio
1 vote
0 answers

What input should I use to predict with an RL model? Should it be scaled or inverse-scaled?

I am using SB3 DQN to train on stock data, where my observation is the last 120 candles with 7 features (open, high, low, close, hour, minute, RSI, etc.), so the observation shape is (120, 7). The output is discrete with 3 integers: 0, ...
manan5439
  • 958
0 votes
0 answers

PPO stable baselines 3

I am using a custom environment and a custom model for that environment. The goal is to train this custom model using reinforcement learning. I have defined my action space like this: self.action_space = gym....
Adeetya
0 votes
0 answers

Augmented Random Search from Stable Baselines3 Contrib stops training after 2,464M steps

ARS always stops after 2,464M steps, despite exponential reward growth. if __name__ == "__main__": env = CustomEnv() #check_env(env) # Simplified architecture ...
Xardas
  • 1
0 votes
0 answers

Replay buffer in StableBaselines3 for a Gymnasium environment

I'm creating a customized replay buffer class based on ReplayBuffer from stable_baselines3.common.buffers, using a gymnasium environment instead of the gym environment. The return value of the env....
Siqi Wang
2 votes
1 answer

Training a Custom Feature Extractor in Stable Baselines3 Starting from Pre-trained Weights?

I am using the following custom feature extractor for my StableBaselines3 model: import torch.nn as nn from stable_baselines3 import PPO class Encoder(nn.Module): def __init__(self, input_dim, ...
Sayyor Y
  • 1,286
0 votes
0 answers

RL Model training

I trained a PPO model using Stable Baselines3, but loading the model raises this error: NotImplementedError: <class 'stable_baselines3.common.policies.ActorCriticCnnPolicy'> observation space ...
TBG6819
0 votes
1 answer

"requested array would exceed the maximum number of dimension of 1" error in gym

Suppose we have the following code: import gym from stable_baselines3 import PPO env = gym.make("CartPole-v1", render_mode="human") model = PPO("MlpPolicy", env, ...
AI ML
  • 189
0 votes
0 answers

Stable Baselines3: how to impose a policy action_space different from the environment action_space

Normally, with e.g. a SAC policy, you would have observations -> sac -> actions -> environment. But because I want to have observations -> sac -> extra_block -> actions -> ...
meerkatUI
0 votes
0 answers

How can I represent multiple inputs in the observation space?

I am getting this error: "AssertionError: Unsupported structured space '<class 'gym.spaces.dict.Dict'>'" and I cannot figure out what it means. This is my code: self....
SANRAJ LACHHIRAMKA 2022 BatchP
1 vote
1 answer

Stable Baselines3 TD3: reset() method raises "too many values to unpack" error

The environment is Python 3.10 with stable-baselines3 2.3.0, and I'm trying the TD3 algorithm. I keep getting the same error whatever I do. As far as I know, the reset method returns the same as the observation space ...
GatesPlan
  • 497
0 votes
0 answers

Get Q-values in a Stable Baselines3 callback

Is there a way to access the Q-values or the mean Q-value in a DQN using Stable Baselines3? This doesn't work, and I can't seem to find a way documented in the docs or a way I can implement this, given I'm new to ...
Mofasa E
0 votes
0 answers

Multiprocess environment with Stable Baselines3 SubprocVecEnv

I have a working (complex) Gymnasium environment that needs two processes to work properly, and I want to train an agent to accomplish some task in this environment. To train the agent, I would like ...
Ben
  • 7,598