Indian Institute of Technology, Kanpur
Electrical Engineering
This paper presents a predictive controller for handling plug-and-play (P&P) charging requests of electric vehicles (EVs) in a distribution system. The proposed method uses a two-stage hierarchical control scheme based on a model...
In recently proposed multiple access techniques such as IDMA and OFDM-IDMA, user separation is done by user-specific interleavers, in contrast to the conventional CDMA scheme, where user separation is assured with user-specific signature...
We consider the problem where M agents interact with M identical and independent environments with S states and A actions using reinforcement learning for T rounds. The agents share their data with a central server to minimize their...
Teleoperated surgical robots can provide immediate medical assistance in austere and hostile environments. However, such scenarios are time-sensitive and thus require high-bandwidth and low-latency communication links, which might be...
In the future, deployable, teleoperated surgical robots can save the lives of critically injured patients in battlefield environments. These robotic systems will need to have autonomous capabilities to take over during communication...
Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term...
We consider the constrained Markov Decision Process (CMDP) problem, where an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward. Further, there are K cost functions. The agent aims...
Many real-world problems like Social Influence Maximization face the dilemma of choosing the best K out of N options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses K out of N arms at each time,...
Quantum key distribution (QKD) allows two distant parties to share encryption keys with security based on the laws of quantum mechanics. In order to share the keys, the quantum bits have to be transmitted from the sender to the receiver over...
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated...
We consider the bandit problem of selecting $K$ out of $N$ arms at each time step. The reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing...
Many real-world problems face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modelled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an...
We consider the problem where N agents collaboratively interact with an instance of a stochastic K-armed bandit problem for K ≫ N. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of T time...
Gradient descent and its variants are widely used in machine learning. However, oracle access to the gradient may not be available in many applications, limiting the direct use of gradient descent. This paper proposes a method of estimating...
We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints. Various learning applications with constraints, such as robotics, do not allow for policies that can violate...
Reinforcement Learning (RL) is being increasingly applied to optimize complex functions that may have a stochastic component. RL is extended to multi-agent systems to find policies to optimize systems that require agents to coordinate or...
Reinforcement learning is widely used in applications where one needs to make sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some...