0 votes
0 answers
15 views

Cartpole gym spawn point

How can I change the initial spawn point of the cartpole when resetting the environment? I have to use a custom reward in testing. The reward is like: def new_reward(state, x0): s = state[0] ...
Giuseppe Denina Rivera's user avatar
0 votes
0 answers
82 views

Hide and Seek game with Unity ML agent. Advice on reward system for going towards hider

I am looking for some insights on the rewards for my seeker agent. I want it to be able to find the hider in a maze-like environment. I am using Raycasts (Ray Perception Sensor) as observations of the environment. ...
Milos Stanojevic's user avatar
0 votes
0 answers
16 views

Reward Function Design for RL Agent Switching Between Algorithms Based on State and Resource Use

I'm developing an application using Reinforcement Learning (RL) where my agent can choose between three different algorithms (actions) to determine its set of motions for achieving a task. These ...
aya_kh's user avatar
  • 49
0 votes
0 answers
15 views

DRL EV charging and discharging

I am using the TD3 algorithm for scheduling the charging and discharging of electric vehicles (EVs), considering time-of-use electricity pricing. I have set up three reward values: r_anx, r_dep, and ...
Wen Smith's user avatar
0 votes
0 answers
28 views

Pine Script V5: how do I set SL/TP by pre-inputting a range?

I tried to use the word range, but it didn't work. What I want is to place the SL/TP orders together with the order, based on a preset range in $ or %. Sample: long order at price $1000, preset range ...
Ricardo Marinho's user avatar
0 votes
0 answers
9 views

Multi-information DQN

Hypothetically, how would one pass two sets of information to a DQN on the same map? Imagine you want to pass information to a DQN that is conducting search and rescue in a space and needs to both know ...
Edwin Meriaux's user avatar
1 vote
0 answers
86 views

What kind of reward should I set in q-learning to get values closer to the result I expect?

I'm working on a Q-learning project using OpenAI Gym and PyBullet drones. My goal is to control the height of the drone so that it stays at a height of 1 and remains stable at that point. I'm using ...
gulb's user avatar
  • 21
0 votes
0 answers
149 views

How to integrate Rewardful with a React JS site and Stripe?

I started a new project which requires Rewardful integration. I read the Rewardful documentation and created the same API, but the request is not processed successfully. ...
Hafiz Faisal Ali's user avatar
0 votes
0 answers
20 views

Charging scheduling of electric vehicles with the TD3 algorithm (RL)

I set three reward values, namely r_anx, r_dep and r_price, in which r_dep<=0, and obtained corresponding reward values by setting different weight coefficients. However, two situations would occur....
Wen Smith's user avatar
1 vote
1 answer
320 views

How to make sense of the output of the reward model, how do we know what string it is preferring?

In the process of doing RLHF I made a reward model using a dataset of chosen and rejected string pairs. It is very similar to the example that's there in the official TRL library - Reward Modeling I ...
jar's user avatar
  • 2,888
0 votes
0 answers
94 views

How to save this DDPG model after the reward is saturated?

I have built a DDPG model for my research paper. The training is going fine but I want to save the policy at the end so I don't have to train again. I have used [This DDPG link as my reference](https:/...
Sukhamjot Singh's user avatar
2 votes
1 answer
2k views

Why is the mean reward per episode of my PPO and DQN decreasing over time?

I am training an RL agent to optimise dispatching in a job shop manufacturing system. My approach is based on this code: https://github.com/AndreasKuhnle/SimRLFab. It migrates the environment to a ...
WizardOfCLZ's user avatar
0 votes
1 answer
184 views

nan reward after hyperparameters optimization (ray, gym)

I launched a hyperopt algorithm on a custom gym environment. This is my code: config = { "env": "affecta", "sgd_minibatch_size": 1000, ...
Clm28's user avatar
  • 1
0 votes
1 answer
61 views

How to Record Variables in Pytorch Without Breaking Gradient Computation?

I am trying to implement some policy gradient training, similar to this. However, I would like to manipulate the rewards (such as a discounted future sum and other differentiable operations) before doing ...
Gabriella Chaos's user avatar
1 vote
1 answer
91 views

After the Ethereum merge, how can I know the reward address?

Before the Ethereum merge, the miner received the fee or reward, and the miner could be identified by looking at the JSON-RPC function "eth_getBlockByNumber". Now, I know that people who participated ...
Jmob's user avatar
  • 89
-1 votes
2 answers
320 views

RL reward function with unknown range

For the sake of the argument, let's say that I am trying to minimize a number of mathematical functions using Reinforcement Learning, where the minimum can essentially lie anywhere between -inf and +...
Daniel von Eschwege's user avatar
1 vote
0 answers
754 views

Get a callback when an AdMob reward ad is closed without watching the whole ad in iOS Swift

I am using an AdMob reward ad in my project with the latest SDK. How can I get a proper callback when the user closes the ad partway through? I know there is a delegate method of fullscreencontentdelegate which ...
nishant narola's user avatar
1 vote
2 answers
425 views

Reinforcement learning does nothing when using test forex data

I am experimenting with RL and I am trying to write an AI so it can learn to trade the Forex market. Here is my code below: from gym import Env from gym.spaces import Discrete, Box import numpy as np ...
Joshua Attridge's user avatar
0 votes
0 answers
155 views

Reward Function for automated parking autonomous Robots

I'm implementing a reinforcement learning task to solve a parking problem for autonomous robots. So basically, the idea of the task is to start at a certain point in front of the parking spot and drive ...
emilio ribadeneira's user avatar
1 vote
1 answer
354 views

Can contextual bandit rewards be changed over time?

I am working on implementing a contextual bandit with Vowpal Wabbit for dynamic pricing where arms represent price margins. The cost/reward is determined by taking price – expected cost. Cost is not ...
aab's user avatar
  • 11
0 votes
1 answer
62 views

Can we get 'good' values of predefined constants in a cost function using reinforcement learning?

I am new to reinforcement learning and I know the basic theory behind it. However, I could not map the problem to the existing frameworks. The problem is as follows: Given an environment with ...
Samaresh Bera's user avatar
1 vote
2 answers
944 views

How to prevent my reward sum received during evaluation runs repeating in intervals when using RLlib?

I am using Ray 1.3.0 (for RLlib) with a combination of SUMO version 1.9.2 for the simulation of a multi-agent scenario. I have configured RLlib to use a single PPO network that is commonly updated/...
hridayns's user avatar
  • 697
1 vote
0 answers
299 views

Understanding the reward functionality in reinforcement learning (Atari Breakout)

I'm trying to understand the reward functionality in Atari Breakout as implemented by DeepMind. I'm a little confused about the reward. They represent every state using four frames and depending on that ...
jon's user avatar
  • 11
0 votes
1 answer
485 views

Reward of Pong game - (OpenAI gym)

I know that the Pong game resets to a new game when one side scores 20 points. However, the reward shows that it goes below -20. Why is that so? One thing to expect is that after one side ...
ssw101's user avatar
  • 335
-1 votes
1 answer
85 views

Question about reward in reinforcement learning (RL)

I have a question about reward in RL. Is this sentence true, and if it is, why? Thank you in advance. "The reward each time (for the same action from the same state) need not be the same."
Pouyan's user avatar
  • 51
1 vote
0 answers
445 views

Is the reward related to previous state or next state?

In the reinforcement learning framework, I am a little bit confused about the reward and how it is related to states. For example, in Q-learning, we have the following formula for updating the Q table:...
MadMage's user avatar
  • 186
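
On the question above about which state the reward belongs to: in the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], the reward r is the one observed on the transition from s to s' after taking a. The following is a minimal sketch of that update in plain Python. The function name `q_learning_update`, the dictionary-based table, and the default `alpha` and `gamma` values are illustrative assumptions, not code from the question.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    `reward` is the reward observed on the transition
    (state, action) -> next_state. All names here are hypothetical.
    """
    # Greedy value estimate of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted bootstrap.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: an empty table (all zeros) and one observed transition.
q_table = defaultdict(float)
new_value = q_learning_update(q_table, state=0, action=1, reward=1.0,
                              next_state=1, actions=[0, 1],
                              alpha=0.5, gamma=0.9)
```

The reward is stored with the pair (state, action) that produced it, while the next state only enters through the bootstrap term.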
3 votes
1 answer
1k views

Discount reward in REINFORCE deep reinforcement learning algorithm

I'm implementing a REINFORCE with baseline algorithm, but I have a doubt with the discount reward function. I implemented the discount reward function like this: def disc_r(rewards): r = np....
LRD's user avatar
  • 361
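
For reference, the discounted return in REINFORCE is commonly computed by walking the episode backwards with G_t = r_t + γ G_{t+1}. The following is a minimal sketch in plain Python. The function name `discounted_returns` and the default `gamma` are illustrative assumptions, and this is not a reconstruction of the asker's truncated `disc_r`.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return [G_0, ..., G_{T-1}] with G_t = r_t + gamma * G_{t+1}.

    Hypothetical helper for illustration; `rewards` is the list of
    per-step rewards of one finished episode.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Iterate backwards so each step reuses the return of the following step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Usage: three steps with reward 1 each and gamma = 0.5.
example = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

Many implementations additionally subtract a baseline or normalize the returns before the policy-gradient step; that is a separate design choice and is not shown here.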
0 votes
2 answers
955 views

Why isn't my code increasing the score on reward video ad watched?

When I click a button to load a reward video ad the score should double. Instead the reward ad plays and when I close after watching it, nothing happens. I am using public static variable points to ...
Azhar Waheed's user avatar
1 vote
0 answers
47 views

What is the best way to deal with an imbalanced sample database with rewards?

I am looking for a way to train a DNNClassifier (4 classes, 20 numeric features) from an imbalanced data file of rewarded samples. Each class represents a game action and the reward is the action score. Features ...
GerardL's user avatar
  • 83
1 vote
1 answer
137 views

How to train a bad reward with a classifying Neural Net?

I am trying to train a neural net to play Tic Tac Toe via reinforcement learning with Keras in Python. Currently the net gets the current board as input: array([0,1,0,-1,0,1,0,0,0]) 1 = X -...
nailuj05's user avatar
1 vote
0 answers
310 views

Why are my rewards converging but still showing a lot of variation?

I am training a reinforcement learning agent on an episodic task of fixed episode length. I am tracking the training process by plotting the cumulative rewards over an episode. I am using tensorboard ...
chink's user avatar
  • 1,623
0 votes
1 answer
71 views

Formulation of a reward structure

I am new to reinforcement learning and experimenting with training RL agents. I have a question about reward formulation: from a given state, if an agent takes a good action I give a positive reward, ...
chink's user avatar
  • 1,623
0 votes
1 answer
154 views

Why is my reward function returning None in Python?

OK, so, I'm trying to make an intrinsic-curiosity agent using Keras and TensorFlow. This agent's reward function is the difference of an autoencoder's loss between the previous and current state, and ...
ZeroMaxinumXZ's user avatar
0 votes
1 answer
146 views

Reward distribution Reinforcement Learning

Problem 1: We want to go from s to e. In each cell we can move right (R) or down (D). The environment is fully known. The table has 4 × 5 = 20 cells. The challenge is that we do not know what the reward of ...
Mohammad Abdollahi's user avatar
2 votes
2 answers
2k views

How to create multiple reward videos in a Unity application?

For the last few days I have been trying to implement reward videos (AdMob) in my Unity app. I want to have multiple reward videos people can watch, with different types of rewards. I feel like I am close (...
DaanNetherlands's user avatar
1 vote
1 answer
2k views

Custom environment Gym for step function processing with DDPG Agent

I'm new to reinforcement learning, and I would like to process audio signals using this technique. I built a basic step function that I wish to flatten, to get hands-on experience with OpenAI Gym and reinforcement ...
Post. T.'s user avatar
1 vote
1 answer
2k views

Discounted rewards in basic reinforcement learning

I'm wondering how discounting rewards for reinforcement learning actually works. I believe the idea is that rewards later in an episode get weighted heavier than early rewards. That makes perfect ...
Perks's user avatar
  • 11
0 votes
1 answer
485 views

MDP implementation using Python - dimensions

I have a problem implementing an MDP (Markov decision process) in Python. I have these matrices: states (1 x n) and actions (1 x m). The transition matrix is calculated by this code: p = np.zeros((n,n))...
Nasrin 's user avatar
2 votes
1 answer
453 views

Convergence of Q-learning on the inverted pendulum

Hello, I'm working on full control of the cartpole problem (inverted pendulum). My aim is for the system to reach stability, meaning all the states (x, xdot, theta and thetadot) should converge to zero. I ...
Stevy KUIMI's user avatar
0 votes
0 answers
68 views

AdMob: is it possible to put a link to another app on the market and earn money upon download?

The question is clear, I assume, though I couldn't find a relevant answer on the web. For example, if my app puts a link to another app's Play Store download page and the user downloads it, can I get money for that? ...
mears's user avatar
  • 491
0 votes
2 answers
3k views

Rewarded video: No fill from ad server. Failed to load? (Android)

I am trying to implement AdMob ads in a fragment, but it's been a month and I am still getting error 3 (no ads to fill). I have tried with a new ID but am still getting the same error; test ads are working fine....
Samir Shaikh's user avatar
0 votes
0 answers
2k views

AdMob Rewarded Video - Ad Not Loading

I have to implement an AdMob reward video in Android Studio in my current project. I have tried everything, such as YouTube tutorials and the official AdMob tutorials and scripts, but nothing is working for me. ...
Rakib Shoahil 's user avatar
0 votes
1 answer
255 views

Reward Function in MIT DeepTraffic Challenge?

I have been playing around with the MIT DeepTraffic Challenge, and have also been watching the lecture and reading the slides. After getting a general understanding of the architecture, I was wondering what exactly ...
mrk's user avatar
  • 10.3k
1 vote
1 answer
1k views

Keras Reinforcement Learning: How to pass reward to the model

import numpy as np import gym from gym import wrappers # added from keras.models import Sequential from keras.layers import Dense, Activation, Flatten from keras.optimizers import Adam from rl.agents....
leppy's user avatar
  • 49
1 vote
0 answers
85 views

Canvas problems: not able to reproduce design

I need to build a canvas animation as the design requires. I have spent almost 3 days on it, but I'm not able to produce anything like the design. Here is the requested design. And here is what I've got for now: current ...
MR.QUESTION's user avatar
3 votes
0 answers
2k views

Limit on the number of reward video plays

I am developing an Android game. I am planning to integrate reward videos in my app. I am planning to go for AdMob mediation. Below is my question. User can watch a reward video to save his/her ...
Sam's user avatar
  • 3,044
1 vote
1 answer
689 views

WebView remote site and reward videos

I have a simple game developed in PHP. I have loaded the remote site in an Android WebView. If the user clicks on a FREE life button on my remote PHP site, I want to detect that and start a ...
Sam's user avatar
  • 3,044
-4 votes
1 answer
192 views

How to integrate an SDK in Android Studio

I own a reward app and the developer of the app has abandoned the project. So I'm alone in this and I don't know how to integrate 2 advertising networks into my app... They share the SDK for monetization and I ...
Gmstate's user avatar
1 vote
0 answers
68 views

Rewarded Videos - Time Left Counter

I would like to ask a question about rewarded videos in Android. I have set the rewarded videos to show once per hour, which means every user can watch one video per hour. My question is, what is ...
Filjan Kishija's user avatar
2 votes
2 answers
808 views

Unity RewardAd function is called multiple times, not just once

I want a RewardAd in my game. When you watch a video you get +10 added to your current score, not your high score. Say you have a high score of 45 and you are now at 37, so you watch a video for +10 score and ...
antal1208's user avatar