72 questions
0
votes
0
answers
15
views
Cartpole gym spawn point
How can I change the initial spawn point of the cartpole when resetting the environment? I have to use a custom reward; in testing, the reward looks like:
def new_reward(state, x0):
    s = state[0]
...
0
votes
0
answers
82
views
Hide and Seek game with Unity ML agent. Advice on reward system for going towards hider
I am looking for some insights on my rewards for the seeker agent. I want it to be able to find the hider in a maze-like environment. I am using raycasts (Ray Perception Sensor) as observations of the environment. ...
0
votes
0
answers
16
views
Reward Function Design for RL Agent Switching Between Algorithms Based on State and Resource Use
I'm developing an application using Reinforcement Learning (RL) where my agent can choose between three different algorithms (actions) to determine its set of motions for achieving a task. These ...
0
votes
0
answers
15
views
DRL EV charging and discharging
I am using the TD3 algorithm for scheduling the charging and discharging of electric vehicles (EVs), considering time-of-use electricity pricing. I have set up three reward values: r_anx, r_dep, and ...
0
votes
0
answers
28
views
Pine Script v5: how do I set SL/TP by pre-inputting a range?
I tried to use the word range, but it didn't work. What I want is to place the SL/TP orders together with the order, based on a preset range in $ or %.
Sample: order long price $1000, preset range ...
0
votes
0
answers
9
views
Multi-information DQN
Hypothetically, how would one pass two sets of information to a DQN on the same map?
Imagine you want to pass information to a DQN that is conducting search and rescue in a space and needs to both know ...
1
vote
0
answers
86
views
What kind of reward should I set in q-learning to get values closer to the result I expect?
I'm working on a Q-learning project using OpenAI Gym and PyBullet drones. My goal is to control the height of the drone so that it stays at a height of 1 and remains stable at that point. I'm using ...
0
votes
0
answers
149
views
How to integrate Rewardful with a React JS site and Stripe?
I started a new project which requires Rewardful integration. I read the Rewardful documentation and created the same API, but the request is not processed successfully. ...
0
votes
0
answers
20
views
Charging scheduling of electric vehicles by TD3 algorithm (RL)
I set three reward values, namely r_anx, r_dep and r_price, in which r_dep<=0, and obtained corresponding reward values by setting different weight coefficients. However, two situations would occur....
1
vote
1
answer
320
views
How to make sense of the output of the reward model, how do we know what string it is preferring?
In the process of doing RLHF I made a reward model using a dataset of chosen and rejected string pairs. It is very similar to the example that's there in the official TRL library - Reward Modeling
I ...
0
votes
0
answers
94
views
How to save this DDPG model after the reward is saturated?
I have built a DDPG model for my research paper. The training is going fine but I want to save the policy at the end so I don't have to train again.
I have used [This DDPG link as my reference](https:/...
2
votes
1
answer
2k
views
Why is the mean reward per episode of my PPO and DQN decreasing over time?
I am training an RL agent to optimise dispatching in a job shop manufacturing system. My approach is based on this code: https://github.com/AndreasKuhnle/SimRLFab. It migrates the environment to a ...
0
votes
1
answer
184
views
nan reward after hyperparameters optimization (ray, gym)
I launched a hyperopt algorithm on a custom gym environment.
This is my code:
config = {
"env": "affecta",
"sgd_minibatch_size": 1000,
...
0
votes
1
answer
61
views
How to Record Variables in PyTorch Without Breaking Gradient Computation?
I am trying to implement some policy gradient training, similar to this. However, I would like to manipulate the rewards (like discounted future sum and other differentiable operations) before doing ...
1
vote
1
answer
91
views
After the Ethereum merge, how can I find the reward address?
Before the Ethereum merge, the miner received the fee or reward, and the miner could be identified by looking at the JSON-RPC method "eth_getBlockByNumber".
Now, I know that people who participated ...
-1
votes
2
answers
320
views
RL reward function with unknown range
For the sake of the argument, let's say that I am trying to minimize a number of mathematical functions using Reinforcement Learning, where the minimum can essentially lie anywhere between -inf and +...
1
vote
0
answers
754
views
Get callback when AdMob reward ad is closed without seeing the whole ad in iOS Swift
I am using an AdMob reward ad in my project with the latest SDK. How can I get a proper callback when the user closes the ad partway through? I know there is a delegate method of fullscreencontentdelegate which ...
1
vote
2
answers
425
views
Reinforcement learning does nothing when using test forex data
I am experimenting with RL and I am trying to write an AI so it can learn to trade the Forex market. Here is my code below:
from gym import Env
from gym.spaces import Discrete, Box
import numpy as np
...
0
votes
0
answers
155
views
Reward Function for automated parking of autonomous Robots
I'm implementing a reinforcement learning task to solve a parking problem for autonomous robots. So basically, the idea of the task is to start at a certain point in front of the parking spot and drive ...
1
vote
1
answer
354
views
Can contextual bandit rewards be changed over time?
I am working on implementing a contextual bandit with Vowpal Wabbit for dynamic pricing where arms represent price margins. The cost/reward is determined by taking price – expected cost. Cost is not ...
0
votes
1
answer
62
views
Can we get 'good' values of predefined constants in a cost function using reinforcement learning?
I am new to reinforcement learning and I know the basic theory behind it. However, I could not map the problem to the existing frameworks. The problem is as follows:
Given an environment with ...
1
vote
2
answers
944
views
How to prevent my reward sum received during evaluation runs repeating in intervals when using RLlib?
I am using Ray 1.3.0 (for RLlib) with a combination of SUMO version 1.9.2 for the simulation of a multi-agent scenario. I have configured RLlib to use a single PPO network that is commonly updated/...
1
vote
0
answers
299
views
Understanding the reward functionality in Reinforcement learning (Atari Breakout)
I'm trying to understand the reward functionality in Atari Breakout as implemented by DeepMind. I'm a little confused about the reward. They represent every state using four frames and depending on that ...
0
votes
1
answer
485
views
Reward of Pong game - (OpenAI gym)
I know that the Pong game resets to a new game when one side scores 20 points.
However, the reward shows that it goes below -20.
Why is that so?
One thing to expect is that after one side ...
-1
votes
1
answer
85
views
Question about reward in reinforcement learning (RL)
I have a question about reward in RL.
Is this sentence true? And if it is, why?
Thank you in advance.
"the reward each time (for the same action from the same state) needs not to be the same."
1
vote
0
answers
445
views
Is the reward related to previous state or next state?
In the reinforcement learning framework, I am a little bit confused about the reward and how it is related to states. For example, in Q-learning, we have the following formula for updating the Q table:...
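For orientation, here is a minimal sketch of the standard tabular Q-learning update, not the asker's code. The function name `q_update` and the toy table size are assumptions made for illustration. It shows that the reward `r` belongs to the transition: it is observed after taking action `a` in state `s` and arriving in `s_next`.

```python
import numpy as np

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step in place and return the table.

    r is the reward observed for the transition (s, a) -> s_next.
    """
    # Bootstrapped target: immediate reward plus discounted best next value.
    td_target = r + gamma * np.max(q[s_next])
    # Move the current estimate a fraction alpha toward the target.
    q[s, a] += alpha * (td_target - q[s, a])
    return q

# Toy example: 3 states, 2 actions, all values initialised to zero.
q = np.zeros((3, 2))
q = q_update(q, s=0, a=1, r=1.0, s_next=2)
```

Only the entry for the state and action that were actually taken, `q[0, 1]`, changes. The next state enters solely through the bootstrapped maximum.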
3
votes
1
answer
1k
views
Discount reward in REINFORCE deep reinforcement learning algorithm
I'm implementing a REINFORCE with baseline algorithm, but I have a doubt with the discount reward function.
I implemented the discount reward function like this:
def disc_r(rewards):
r = np....
0
votes
2
answers
955
views
Why isn't my code increasing the score on reward video ad watched?
When I click a button to load a reward video ad the score should double. Instead the reward ad plays and when I close after watching it, nothing happens. I am using public static variable points to ...
1
vote
0
answers
47
views
What is the best way to deal with an imbalanced sample database with rewards?
I am looking for a solution to train a DNNClassifier (4 classes, 20 numeric features) from an imbalanced, rewarded sample data file. Each class represents a game action, and the reward represents the action score. Features ...
1
vote
1
answer
137
views
How to train a bad reward with a classifying Neural Net?
I am trying to train a Neural Net on playing Tic Tac Toe via Reinforcement Learning with Keras, Python.
Currently the Net gets an Input of the current board:
array([0,1,0,-1,0,1,0,0,0])
1 = X
-...
1
vote
0
answers
310
views
Why are my rewards converging but still showing a lot of variation?
I am training a reinforcement learning agent on an episodic task of fixed episode length. I am tracking the training process by plotting the cumulative rewards over an episode. I am using TensorBoard ...
0
votes
1
answer
71
views
Formulation of a reward structure
I am new to reinforcement learning and experimenting with training of RL agents.
I have a question about reward formulation: from a given state, if an agent takes a good action, I give a positive reward, ...
0
votes
1
answer
154
views
Why is my reward function returning None in Python?
OK, so, I'm trying to make an intrinsic-curiosity agent using Keras and TensorFlow. This agent's reward function is the difference of an autoencoder's loss between the previous and current state, and ...
0
votes
1
answer
146
views
Reward distribution Reinforcement Learning
Problem 1:
We want to go from s to e. In each cell we can move right R or down D. The environment is fully known. The table has (4*5) 20 cells. The challenge is that we do not know what the reward of ...
2
votes
2
answers
2k
views
How to create multiple reward videos in a Unity application?
For the last few days I have been trying to implement reward videos (AdMob) in my Unity app. I want to have multiple reward videos people can watch, with different types of rewards. I feel like I am close (...
1
vote
1
answer
2k
views
Custom environment Gym for step function processing with DDPG Agent
I'm new to reinforcement learning, and I would like to process an audio signal using this technique. I built a basic step function that I wish to flatten to get my hands on OpenAI Gym and reinforcement ...
1
vote
1
answer
2k
views
Discounted rewards in basic reinforcement learning
I'm wondering how discounting rewards for reinforcement learning actually works. I believe the idea is that rewards later in an episode get weighted heavier than early rewards. That makes perfect ...
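As a point of reference, here is a minimal sketch of how discounted returns are commonly computed, assuming the usual definition G_t = r_t + gamma * G_{t+1}. The function name `discounted_returns` and the sample rewards are illustrative, not taken from the question. Under this definition, a reward that lies further in the future contributes less to the return at a given step, not more.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted return G_t for every time step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Three steps with reward 1 each and gamma = 0.5.
example = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(example)  # [1.75, 1.5, 1.0]
```

The first step has the largest return because it accumulates every later reward, each scaled down by one more factor of gamma.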
0
votes
1
answer
485
views
MDP implementation using python - dimensions
I have a problem implementing an MDP (Markov decision process) in Python.
I have these matrices: states: (1 x n) and actions: (1 x m).
The transition matrix is calculated by this code:
p = np.zeros((n,n))...
2
votes
1
answer
453
views
Convergence of the Q-learning on the inverted pendulum
Hello, I'm working on full control of the cartpole problem (inverted pendulum). My aim is for the system to reach stability, meaning all the states (x, xdot, theta and thetadot) should converge to zero. I ...
0
votes
0
answers
68
views
AdMob: is it possible to put a link to another app on the market and earn money upon download?
I assume the question is clear, though I couldn't find a relevant answer on the web. For example, if my app puts a link to another app's Play Store download page and the user downloads it, can I get money for that? ...
0
votes
2
answers
3k
views
Rewarded video: "No fill from ad server", failed to load (Android)
I am trying to implement AdMob ads in a fragment, but it's been a month and I am still getting error 3 (No ads to fill).
I have tried with a new ID but still get the same error; test ads are working fine....
0
votes
0
answers
2k
views
AdMob Rewarded Video - Ad Not Loading
I have to implement AdMob Reward Video in Android Studio in my current project. I have tried everything, such as YouTube tutorials and official AdMob tutorials and scripts, but nothing is working for me. ...
0
votes
1
answer
255
views
Reward Function in MIT Deep Traffic Challenge?
I have been playing around with the MIT DeepTraffic Challenge,
also watching the lecture and reading the slides.
After getting a general understanding of the architecture, I was wondering what exactly ...
1
vote
1
answer
1k
views
Keras Reinforcement Learning: How to pass reward to the model
import numpy as np
import gym
from gym import wrappers # added
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam
from rl.agents....
1
vote
0
answers
85
views
Canvas problems. Not able to reproduce design
I need to build a canvas animation as the design requires. I have spent almost 3 days but I'm not able to produce anything like the design. Here is the REQUESTED design. And here is what I've got for now: current ...
3
votes
0
answers
2k
views
Reward videos number of plays limit
I am developing an Android game. I am planning to integrate reward videos in my app. I am planning to go for AdMob mediation. Below is my question.
A user can watch a reward video to save his/her ...
1
vote
1
answer
689
views
WebView remote site and reward videos
I have a simple game developed in PHP. I have loaded the remote site in an Android WebView. I want to detect when the user clicks a FREE life button on my remote PHP site, and then start a ...
-4
votes
1
answer
192
views
How to integrate an SDK in Android Studio
I own a reward app and the dev of the app has abandoned the project.
So I'm alone in this and I don't know how to integrate 2 advertising networks into my app... They share the SDK for monetization and I ...
1
vote
0
answers
68
views
Rewarded Videos - Time Left Counter
I would like to ask a question about rewarded videos in Android. I have set the rewarded videos to show once per hour, which means every user can watch one video per hour.
My question is, what is ...
2
votes
2
answers
808
views
Unity RewardAd function is called multiple times, not just once
I want a RewardAd in my game. When you watch a video you get +10 added to your current score, not your high score.
You have a high score of 45 and you are now at 37, so you watch a video for +10 score and ...