Newest 'reward' Questions

0 votes

0 answers

15 views

Cartpole gym spawn point

How can I change the initial spawn point of the cartpole when resetting the environment? I have to use a custom reward in testing; the reward is like: def new_reward(state, x0): s = state[0] ...

Giuseppe Denina Rivera

11

asked Nov 29 at 7:16

0 votes

0 answers

82 views

Hide and Seek game with Unity ML-Agents: advice on a reward system for going towards the hider

I am looking for some insights on my rewards for the seeker agent. I want him to be able to find the hider in a maze-like environment. I am using Raycasts (Ray Perception Sensor) as observations of the environment. ...

Milos Stanojevic

11

asked Sep 14 at 15:04

0 votes

0 answers

16 views

Reward Function Design for RL Agent Switching Between Algorithms Based on State and Resource Use

I'm developing an application using Reinforcement Learning (RL) where my agent can choose between three different algorithms (actions) to determine its set of motions for achieving a task. These ...

aya_kh

49

asked Jul 2 at 16:00

0 votes

0 answers

15 views

DRL EV charging and discharging

I am using the TD3 algorithm for scheduling the charging and discharging of electric vehicles (EVs), considering time-of-use electricity pricing. I have set up three reward values: r_anx, r_dep, and ...

Wen Smith

1

asked May 22 at 8:03

0 votes

0 answers

28 views

Pine Script v5: how do I set SL/TP by pre-inputting a range?

I tried to use the word range, but it didn't work. What I want is to position the SL/TP orders together with the order, based on a preset range in $ or %. Example: long order at price $1000, preset range ...

Ricardo Marinho

1

asked May 20 at 10:46

0 votes

0 answers

9 views

Multi-information DQN

Hypothetically, how would one pass two sets of information to a DQN on the same map? Imagine you want to pass information to a DQN that conducts search and rescue in a space and needs to both know ...

Edwin Meriaux

1

asked May 11 at 13:52

1 vote

0 answers

86 views

What kind of reward should I set in q-learning to get values closer to the result I expect?

I'm working on a Q-learning project using OpenAI Gym and PyBullet drones. My goal is to control the height of the drone so that it stays at a height of 1 and remains stable at that point. I'm using ...

gulb

21

asked May 1 at 7:08

0 votes

0 answers

149 views

How to integrate Rewardful with a React JS site and Stripe?

I started a new project which requires Rewardful integration. I read the Rewardful documentation and created the same API, but the request is not processed successfully. ...

Hafiz Faisal Ali

1

asked Apr 14 at 4:32

0 votes

0 answers

20 views

Charging scheduling of electric vehicles by TD3 algorithm (RL)

I set three reward values, namely r_anx, r_dep and r_price, in which r_dep<=0, and obtained corresponding reward values by setting different weight coefficients. However, two situations would occur....

Wen Smith

1

asked Apr 3 at 8:08

1 vote

1 answer

320 views

How to make sense of the output of the reward model, how do we know what string it is preferring?

In the process of doing RLHF I made a reward model using a dataset of chosen and rejected string pairs. It is very similar to the example that's there in the official TRL library - Reward Modeling I ...

jar

2,888

asked Feb 26 at 13:32

0 votes

0 answers

94 views

How to save this DDPG model after the reward is saturated?

I have built a DDPG model for my research paper. The training is going fine but I want to save the policy at the end so I don't have to train again. I have used [This DDPG link as my reference](https:/...

Sukhamjot Singh

35

asked Dec 21, 2023 at 7:13

2 votes

1 answer

2k views

Why is the mean reward per episode of my PPO and DQN decreasing over time?

I am training an RL agent to optimise dispatching in a job shop manufacturing system. My approach is based on this code: https://github.com/AndreasKuhnle/SimRLFab. It migrates the environment to a ...

WizardOfCLZ

21

asked Mar 11, 2023 at 9:14

0 votes

1 answer

184 views

NaN reward after hyperparameter optimization (Ray, Gym)

I launched a hyperopt algorithm on a custom gym environment. This is my code: config = { "env": "affecta", "sgd_minibatch_size": 1000, ...

Clm28

1

asked Jan 24, 2023 at 17:27

0 votes

1 answer

61 views

How to Record Variables in Pytorch Without Breaking Gradient Computation?

I am trying to implement some policy gradient training, similar to this. However, I would like to manipulate the rewards (like discounted future sum and other differentiable operations) before doing ...

Gabriella Chaos

33

asked Jan 17, 2023 at 14:52

1 vote

1 answer

91 views

After the Ethereum merge, how can I find the reward address?

Before the Ethereum merge, the miner received the fee or reward, and the miner could be identified by calling the JSON-RPC method "eth_getBlockByNumber". Now, I know that people who participated ...

Jmob

89

asked Sep 16, 2022 at 7:56

-1 votes

2 answers

320 views

RL reward function with unknown range

For the sake of the argument, let's say that I am trying to minimize a number of mathematical functions using Reinforcement Learning, where the minimum can essentially lie anywhere between -inf and +...

Daniel von Eschwege

521

asked Jul 15, 2022 at 19:47

1 vote

0 answers

754 views

Get a callback when an AdMob rewarded ad is closed without watching the whole ad in iOS Swift

I am using an AdMob rewarded ad in my project with the latest SDK. How can I get a proper callback when the user closes the ad partway through? I know there is a delegate method of fullscreencontentdelegate which ...

nishant narola

13

asked Jul 12, 2022 at 17:04

1 vote

2 answers

425 views

Reinforcement learning does nothing when using test forex data

I am experimenting with RL and I am trying to write an AI so it can learn to trade the Forex market. Here is my code below: from gym import Env from gym.spaces import Discrete, Box import numpy as np ...

Joshua Attridge

25

asked Apr 6, 2022 at 10:35

0 votes

0 answers

155 views

Reward function for automated parking of autonomous robots

I'm implementing a reinforcement learning task to solve parking for autonomous robots. Basically, the idea of the task is to start at a certain point in front of the parking spot and drive ...

emilio ribadeneira

1

asked Feb 15, 2022 at 7:13

1 vote

1 answer

354 views

Can contextual bandit rewards be changed over time?

I am working on implementing a contextual bandit with Vowpal Wabbit for dynamic pricing where arms represent price margins. The cost/reward is determined by taking price – expected cost. Cost is not ...

aab

11

asked Dec 29, 2021 at 16:40

0 votes

1 answer

62 views

Can we get 'good' values of predefined constants in a cost function using reinforcement learning?

I am new to reinforcement learning and I know the basic theory behind it. However, I could not map the problem to the existing frameworks. The problem is as follows: Given an environment with ...

Samaresh Bera

19

asked Aug 19, 2021 at 12:26

1 vote

2 answers

944 views

How to prevent the reward sum received during evaluation runs from repeating at intervals when using RLlib?

I am using Ray 1.3.0 (for RLlib) with a combination of SUMO version 1.9.2 for the simulation of a multi-agent scenario. I have configured RLlib to use a single PPO network that is commonly updated/...

hridayns

697

asked Jun 21, 2021 at 15:08

1 vote

0 answers

299 views

Understanding the reward functionality in Reinforcement learning (Atari Breakout)

I'm trying to understand the reward functionality in Atari Breakout as implemented by DeepMind. I'm a little confused about the reward. They represent every state using four frames, and depending on that ...

jon

11

asked Mar 4, 2021 at 14:34

0 votes

1 answer

485 views

Reward of Pong game (OpenAI Gym)

I know that the Pong game starts a new game when one side scores 20 points. However, the reward goes below -20. Why is that? One thing to expect is that after one side ...

ssw101

335

asked Feb 25, 2021 at 6:08

-1 votes

1 answer

85 views

Question about reward in reinforcement learning (RL)

I have a question about reward in RL. Is this sentence true, and if so, why? Thank you in advance. "The reward each time (for the same action from the same state) need not be the same."

Pouyan

51

asked Feb 22, 2021 at 15:26
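For context on this question: in the usual MDP formulation the reward for a state-action pair is a random variable, so repeated samples from the same state and action can differ, and learning methods estimate its expected value. A minimal sketch, assuming a hypothetical single state and action whose reward is 1.0 with probability 0.5 and 0.0 otherwise (`noisy_reward` and `estimate_expected_reward` are illustrative names, not from the question):

```python
import random


def noisy_reward(rng: random.Random) -> float:
    """Hypothetical stochastic reward: same state, same action, different outcomes."""
    return 1.0 if rng.random() < 0.5 else 0.0


def estimate_expected_reward(n_samples: int, seed: int = 0) -> float:
    """Average many reward samples with an incremental mean update."""
    rng = random.Random(seed)
    estimate = 0.0
    for k in range(1, n_samples + 1):
        r = noisy_reward(rng)
        # Incremental sample mean: after k samples, estimate equals their average.
        estimate += (r - estimate) / k
    return estimate


print(estimate_expected_reward(10_000))  # close to 0.5, though no single sample is 0.5
```

The estimate settles near 0.5 even though every individual reward is 0.0 or 1.0. A Q-learning update with a small constant learning rate behaves in a related way: it keeps an exponentially weighted average that fluctuates around the expected reward instead of copying the last sample.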

1 vote

0 answers

445 views

Is the reward related to previous state or next state?

In the reinforcement learning framework, I am a little bit confused about the reward and how it is related to states. For example, in Q-learning, we have the following formula for updating the Q table:...

MadMage

186

asked Jan 3, 2021 at 16:46
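For context on this question: the standard tabular Q-learning update is Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)], where r is the reward observed on the transition from s to s' after taking action a. The reward is therefore credited to the pair (s, a) that was left; s' enters only through the bootstrapped max term. A minimal sketch, assuming a dictionary-backed Q table and made-up state and action labels (`q_update`, "A", "B", "left", "right" are hypothetical):

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, done=False):
    """One tabular Q-learning step for the transition (s, a, r, s_next).

    r is the reward received after taking action a in state s, so it updates
    Q[(s, a)], the entry for the previous state-action pair. s_next only
    contributes through the max over its action values (zero if terminal).
    """
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)          # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
Q[("B", "left")] = 2.0          # pretend we already value "left" in state B

new_value = q_update(Q, "A", "right", r=1.0, s_next="B", actions=actions)
print(new_value)  # approximately 0.28 = 0.1 * (1.0 + 0.9 * 2.0)
```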

3 votes

1 answer

1k views

Discounted reward in the REINFORCE deep reinforcement learning algorithm

I'm implementing a REINFORCE with baseline algorithm, but I have a doubt about the discounted reward function. I implemented the discounted reward function like this: def disc_r(rewards): r = np....

LRD

361

asked Dec 10, 2020 at 11:12
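For context on this question: in REINFORCE, the quantity usually called the discounted reward is the return G_t = r_t + gamma * G_{t+1}, computed for every step of a single episode by working backwards from the final step. A minimal sketch in plain Python, assuming `rewards` holds one episode's rewards in time order (the question's own `disc_r` is truncated, so this is not a reconstruction of it):

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode.

    Computed backwards so each step costs O(1); the last step's return is
    just its own reward.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

If one list holds several episodes back to back, `running` has to be reset to 0 at each episode boundary, otherwise returns leak across episodes. With a baseline, the policy-gradient weight for step t is G_t - b(s_t) rather than G_t itself.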

0 votes

2 answers

955 views

Why isn't my code increasing the score after a rewarded video ad is watched?

When I click a button to load a rewarded video ad, the score should double. Instead, the ad plays, and when I close it after watching, nothing happens. I am using a public static variable points to ...

Azhar Waheed

1

asked Jun 15, 2020 at 7:48

1 vote

0 answers

47 views

What is the best way to deal with an imbalanced sample database with rewards?

I am looking for a way to train a DNNClassifier (4 classes, 20 numeric features) from an imbalanced data file of rewarded samples. Each class represents a game action, and the reward is the action's score. Features ...

GerardL

83

asked Jan 23, 2020 at 16:34

1 vote

1 answer

137 views

How to train a bad reward with a classifying Neural Net?

I am trying to train a Neural Net on playing Tic Tac Toe via Reinforcement Learning with Keras, Python. Currently the Net gets an Input of the current board: array([0,1,0,-1,0,1,0,0,0]) 1 = X -...

nailuj05

37

asked Jan 4, 2020 at 15:13

1 vote

0 answers

310 views

Why are my rewards converging but still showing a lot of variation?

I am training a reinforcement learning agent on an episodic task of fixed episode length. I am tracking the training process by plotting the cumulative rewards over an episode. I am using tensorboard ...

chink

1,623

asked Nov 29, 2019 at 10:30

0 votes

1 answer

71 views

Formulation of a reward structure

I am new to reinforcement learning and experimenting with training of RL agents. I have a doubt about reward formulation: from a given state, if an agent takes a good action, I give a positive reward, ...

chink

1,623

asked Nov 26, 2019 at 10:28

0 votes

1 answer

154 views

Why is my reward function returning None in Python?

OK, so, I'm trying to make an intrinsic-curiosity agent using Keras and TensorFlow. This agent's reward function is the difference of an autoencoder's loss between the previous and current state, and ...

ZeroMaxinumXZ

387

asked Sep 17, 2019 at 14:04

0 votes

1 answer

146 views

Reward distribution Reinforcement Learning

Problem 1: We want to go from s to e. In each cell we can move right (R) or down (D). The environment is fully known. The table has 4*5 = 20 cells. The challenge is that we do not know what the reward of ...

Mohammad Abdollahi

1

asked Sep 16, 2019 at 7:03

2 votes

2 answers

2k views

How to create multiple rewarded videos in a Unity application?

For the last few days I have been trying to implement rewarded videos (AdMob) in my Unity app. I want to have multiple rewarded videos people can watch, with different types of rewards. I feel like I am close (...

DaanNetherlands

43

asked Sep 14, 2019 at 11:11

1 vote

1 answer

2k views

Custom environment Gym for step function processing with DDPG Agent

I'm new to reinforcement learning, and I would like to process audio signals using this technique. I built a basic step function that I wish to flatten to get my hands on OpenAI Gym and reinforcement ...

Post. T.

89

asked Jul 8, 2019 at 8:32

1 vote

1 answer

2k views

Discounted rewards in basic reinforcement learning

I'm wondering how discounting rewards for reinforcement learning actually works. I believe the idea is that rewards later in an episode get weighted more heavily than early rewards. That makes perfect ...

Perks

11

asked Apr 21, 2019 at 1:12
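For context on this question: under the standard definition with 0 <= gamma < 1, a reward k steps in the future is multiplied by gamma^k, so later rewards are weighted less, not more, when computing the return from time t. The notation below follows the common convention where R_{t+1} is the reward received after the action taken at time t and T is the final step of the episode:

```latex
G_t = \sum_{k=0}^{T-t-1} \gamma^{k} R_{t+k+1}
    = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots

\text{With } \gamma = 0.9:\quad
G_t = R_{t+1} + 0.9\,R_{t+2} + 0.81\,R_{t+3} + 0.729\,R_{t+4} + \cdots
```

So a reward of 1 received four steps after time t adds only 0.729 to G_t, while the same reward received immediately adds 1.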

0 votes

1 answer

485 views

MDP implementation using python - dimensions

I have a problem implementing an MDP (Markov decision process) in Python. I have these matrices: states: (1 x n) and actions: (1 x m). The transition matrix is calculated by this code: p = np.zeros((n,n))...

Nasrin

1

asked Dec 31, 2018 at 20:21
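For context on this question: a single (n, n) matrix can only describe one action. One common convention is a three-dimensional array P of shape (m, n, n) with P[a, s, s'] = Pr(s' | s, a), where every row P[a, s, :] sums to 1, plus a reward array R of shape (n, m). A minimal NumPy sketch, assuming hypothetical deterministic dynamics and rewards, that builds arrays in this layout and runs value iteration on them (`value_iteration` is an illustrative helper, not from the question):

```python
import numpy as np

n_states, n_actions = 3, 2  # hypothetical sizes: n = 3 states, m = 2 actions

# P[a, s, s_next] = probability of landing in s_next after taking action a in state s.
P = np.zeros((n_actions, n_states, n_states))
# Hypothetical dynamics: action 0 stays put, action 1 moves to the next state (wrapping).
for s in range(n_states):
    P[0, s, s] = 1.0
    P[1, s, (s + 1) % n_states] = 1.0

# R[s, a] = expected immediate reward for taking action a in state s (hypothetical).
R = np.zeros((n_states, n_actions))
R[n_states - 1, 1] = 1.0  # only "move on" from the last state pays off

# Every (action, state) row must be a probability distribution over next states.
assert np.allclose(P.sum(axis=2), 1.0)


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Return optimal state values V (shape (n,)) and a greedy policy (shape (n,))."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new


V, policy = value_iteration(P, R)
print(policy)  # [1 1 1]
```

The `einsum` call contracts over the next-state axis, so (m, n, n) combined with (n,) yields an (n, m) array that lines up with R. Keeping actions on the first axis also means `P[a]` is an ordinary (n, n) transition matrix for action a.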

2 votes

1 answer

453 views

Convergence of the Q-learning on the inverted pendulum

Hello, I'm working on total control of the cartpole problem (inverted pendulum). My aim is for the system to reach stability, meaning all the states (x, xdot, theta and thetadot) should converge to zero. I ...

Stevy KUIMI

57

asked Nov 5, 2018 at 16:29

0 votes

0 answers

68 views

AdMob: is it possible to put a link to another app on the market and earn money upon download?

I assume the question is clear, though I couldn't find a relevant answer on the web. For example, if my app puts a link to another app's Play Store download page and a user downloads it, can I earn money from that? ...

mears

491

asked Sep 11, 2018 at 17:05

0 votes

2 answers

3k views

Rewarded video: "No fill from ad server. Failed to load" on Android?

I am trying to implement AdMob ads in a fragment, but it's been a month and I am still getting error 3 (No ads to fill). I have tried a new ID but still get the same error; test ads are working fine....

Samir Shaikh

56

asked Aug 3, 2018 at 11:41

0 votes

0 answers

2k views

Admob Rewarded Video - Ad Not Loading

I have to implement AdMob Rewarded Video in Android Studio in my current project. I have tried everything: YouTube tutorials, the official AdMob tutorials and scripts, but nothing is working for me. ...

Rakib Shoahil

19

asked Aug 1, 2018 at 16:45

0 votes

1 answer

255 views

Reward Function in MIT Deep Traffic Challenge?

I have been playing around with the MIT DeepTraffic Challenge, also watching the lecture and reading the slides. After getting a general understanding of the architecture, I was wondering what exactly ...

mrk

10.3k

asked Jun 22, 2018 at 13:45

1 vote

1 answer

1k views

Keras Reinforcement Learning: How to pass reward to the model

import numpy as np import gym from gym import wrappers # added from keras.models import Sequential from keras.layers import Dense, Activation, Flatten from keras.optimizers import Adam from rl.agents....

leppy

49

asked Jun 12, 2018 at 5:22

1 vote

0 answers

85 views

Canvas problems: not able to reproduce the design

I need to build a canvas animation as the design requires. I have spent almost 3 days, but I'm not able to make anything like the design. Here is the REQUESTED design! And here is what I've got for now: current ...

MR.QUESTION

359

asked Jan 25, 2018 at 18:09

3 votes

0 answers

2k views

Limit on the number of rewarded video plays

I am developing an Android game and plan to integrate rewarded videos in my app. I am planning to go with AdMob mediation. Below is my question. A user can watch a rewarded video to save his/her ...

Sam

3,044

asked Jan 25, 2018 at 15:05

1 vote

1 answer

689 views

WebView remote site and reward videos

I have a simple game developed in PHP, and I have loaded the remote site in an Android WebView. When the user clicks a FREE life button on my remote PHP site, I want to start a ...

Sam

3,044

asked Jan 23, 2018 at 14:17

-4 votes

1 answer

192 views

How to integrate SDK in android studio

I own a reward app, and its developer has abandoned the project, so I'm on my own and I don't know how to integrate 2 advertising networks into my app... They share the SDK for monetization and I ...

Gmstate

1

asked Jan 9, 2018 at 21:52

1 vote

0 answers

68 views

Rewarded Videos - Time Left Counter

I would like to ask a question about rewarded videos in Android. I have set the rewarded videos to show once per hour, which means every user can watch one video per hour. My question is: what is ...

Filjan Kishija

11

asked Nov 10, 2017 at 18:04

2 votes

2 answers

808 views

Unity RewardAd function called more than once

I want a RewardAd in my game: when you watch a video, you get +10 added to your current score, not your high score. Say your high score is 45 and you are now at 37, so you watch a video for +10 and ...

antal1208

85

asked Oct 25, 2017 at 18:29
