Papers by Luis Paulo Reis
Journal of Intelligent and Robotic Systems, Jan 8, 2019
Stochastic search and optimization techniques are used in a vast number of areas, ranging from refining the design of vehicles and determining the effectiveness of new drugs to developing efficient strategies in games and learning proper behaviors in robotics. However, they are specialized for the specific problem they are solving, and if the problem's context changes even slightly, they cannot adapt properly. In fact, they require complete re-learning to perform correctly in new, unseen scenarios, regardless of how similar those are to previously learned environments. Contextual algorithms have recently emerged as a solution to this problem: they learn the policy for a task as a function of a given context, such that widely different contexts belonging to the same task are learned simultaneously. That said, the state-of-the-art proposals in this class of algorithms converge prematurely and simply cannot compete with algorithms that learn a policy for a single context. We describe the Contextual Relative Entropy Policy Search (CREPS) algorithm, which belongs to the aforementioned class of contextual algorithms, and extend it with a technique that severely increases its performance, yielding Contextual Relative Entropy Policy Search with Covariance Matrix Adaptation (CREPS-CMA). We propose two variants and demonstrate their behavior on a set of classic contextual optimization problems and on complex simulated robot tasks.
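As a rough illustration of the idea, the sketch below performs one contextual policy-search update with a CMA-style covariance blend. Everything here is an assumption for illustration: the function name, the fixed temperature `eta`, the blend rate `alpha`, and the quadratic toy objective; the actual algorithm optimizes the temperature under a KL constraint rather than fixing it.

```python
import numpy as np

def creps_cma_update(mean_w, cov, contexts, thetas, rewards, eta=1.0, alpha=0.7):
    """One simplified contextual policy-search update (illustrative only).

    mean_w:   (d_ctx, d_theta) linear map from context to mean parameters
    cov:      (d_theta, d_theta) shared sampling covariance
    contexts: (N, d_ctx), thetas: (N, d_theta), rewards: (N,)
    """
    # Exponentiated-reward weights, as in REPS-style algorithms.
    w = np.exp((rewards - rewards.max()) / eta)
    w /= w.sum()

    # Weighted linear regression gives the new context-conditioned mean.
    C = contexts * w[:, None]
    reg = 1e-6 * np.eye(contexts.shape[1])
    mean_w_new = np.linalg.solve(contexts.T @ C + reg, C.T @ thetas)

    # CMA-style blend of old and weighted sample covariance, which slows
    # the premature covariance shrinkage the abstract describes.
    diff = thetas - contexts @ mean_w_new
    cov_sample = diff.T @ (diff * w[:, None])
    return mean_w_new, alpha * cov + (1 - alpha) * cov_sample

# Toy usage: 1-D context, 2-D parameters, quadratic reward.
ctx = np.random.randn(50, 1)
th = np.random.randn(50, 2)
r = -np.sum(th ** 2, axis=1)
W, S = creps_cma_update(np.zeros((1, 2)), np.eye(2), ctx, th, r)
```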
Group Decision and Negotiation, Jan 13, 2020
In this work, we propose a Dynamic Difficulty Adjustment methodology to achieve automatic video game balance. The balance task is modeled as a meta game: a game whose actions change the rules of another, base game. Based on the Reinforcement Learning (RL) model, an agent assumes the role of a game master and learns its optimal policy by playing the meta game. In this new methodology we extend traditional RL with a meta environment whose state transition depends on the evolution of a base environment. In addition, we propose a Multi-Agent System training model for the game master agent, where it plays against multiple opponent agents, each with a distinct behavior and proficiency level in the base game. Our experiment is conducted on an adaptive grid-world environment in single-player and multiplayer scenarios. Our results are twofold: (i) the resulting decision making by the game master through gameplay, which must comply with the balance objective established by the game designer; (ii) the initial conception of a framework for automatic game balance, where the balance task design is reduced to the modulation of a reward function (balance reward), an action space (balance strategies), and the definition of a balance state space.
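To make the meta-game structure concrete, here is a minimal, self-contained sketch under assumed names: a toy base game whose rules the game master can mutate, wrapped in a meta environment whose balance reward measures distance to a designer-chosen target win rate. None of this is the paper's actual environment.

```python
import random

class ToyBaseGame:
    """Illustrative base game: the player wins with a probability that the
    game master can shift up or down through rule changes."""
    def __init__(self):
        self.win_prob = 0.8
    def apply_rule_change(self, delta):
        self.win_prob = min(1.0, max(0.0, self.win_prob + delta))
    def play_episode(self):
        return random.random() < self.win_prob

class MetaEnv:
    """Meta environment: a balance action changes the base game's rules,
    and the balance reward measures how close outcomes are to the target."""
    def __init__(self, base_env, target_win_rate=0.5):
        self.base_env = base_env
        self.target = target_win_rate
    def step(self, balance_action):
        self.base_env.apply_rule_change(balance_action)
        win_rate = sum(self.base_env.play_episode() for _ in range(10)) / 10
        return win_rate, -abs(win_rate - self.target)  # (meta state, reward)

# Usage: a trivial game master nudging difficulty toward a 50% win rate.
env = MetaEnv(ToyBaseGame())
for _ in range(20):
    win_rate, reward = env.step(-0.05 if env.base_env.win_prob > 0.5 else 0.05)
```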
When compared with their single-agent counterparts, multi-agent systems pose an additional set of challenges for reinforcement learning algorithms, including increased complexity, non-stationary environments, credit assignment, partial observability, and achieving coordination. Deep reinforcement learning has been shown to achieve successful policies through implicit coordination, but does not handle partial observability. This paper describes a deep reinforcement learning algorithm, based on multi-agent actor-critic, that simultaneously learns action policies for each agent and communication protocols that compensate for partial observability and help enforce coordination. We also study the effects of noisy communication, where messages can be late, lost, noisy, or jumbled, and how this affects the learned policies. We show how agents are able to learn both high-level policies and complex communication protocols for several different partially observable environments. We also show how our proposal outperforms other state-of-the-art algorithms that do not take advantage of communication, even over noisy communication channels.
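The noise model studied (late, lost, noisy, or jumbled messages) can be sketched as a channel wrapper around the agents' message exchange; the class name and the probability parameters below are illustrative, not the paper's exact settings.

```python
import random

class NoisyChannel:
    """Sketch of a channel that can lose, delay, perturb, or jumble the
    real-valued messages agents exchange. Probabilities are assumptions."""

    def __init__(self, p_loss=0.1, p_delay=0.1, noise_std=0.05):
        self.p_loss, self.p_delay, self.noise_std = p_loss, p_delay, noise_std
        self.delayed = []  # messages held back for the next step

    def transmit(self, messages):
        pending = self.delayed + list(messages)
        self.delayed, out = [], []
        for msg in pending:
            r = random.random()
            if r < self.p_loss:
                continue                      # lost message
            if r < self.p_loss + self.p_delay:
                self.delayed.append(msg)      # late: delivered next step
                continue
            out.append([x + random.gauss(0, self.noise_std) for x in msg])
        random.shuffle(out)                   # jumbled ordering
        return out

# Usage: two 3-dimensional messages pass through the channel each step.
chan = NoisyChannel()
print(chan.transmit([[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]))
```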
arXiv (Cornell University), Mar 28, 2023
The FC Portugal 3D team is developed upon the structure of our previous Simulation League 2D/3D teams and our Standard Platform League team. Our research concerning the robot's low-level skills is focused on developing behaviors that may be applied to real robots with minimal adaptation, using model-based approaches. Our research on high-level soccer coordination methodologies and team playing is mainly focused on adapting methodologies previously developed for our 2D soccer teams to the 3D humanoid environment and on creating new coordination methodologies based on them. The research-oriented development of our team has pushed it to be one of the most competitive over the years.
Advances in Intelligent Systems and Computing, Nov 20, 2019
Given the plethora of Reinforcement Learning algorithms available in the literature, it can prove challenging to decide on the most appropriate one to solve a given Reinforcement Learning task. This work presents a benchmark study on the performance of several Reinforcement Learning algorithms in discrete learning environments. The study includes several deep as well as non-deep learning algorithms, with a special focus on the Deep Q-Network algorithm and its variants. Neural Fitted Q-Iteration, the predecessor of Deep Q-Network, as well as Vanilla Policy Gradient and a planner, were also included in this assessment in order to provide a wider range of comparison between different approaches and paradigms. Three learning environments were used to carry out the tests: a 2D maze and two OpenAI Gym environments, namely a custom-built Foraging/Tagging environment and the CartPole environment.
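A benchmark of this shape reduces to one evaluation loop shared by all algorithms and environments. The sketch below assumes the classic OpenAI Gym step API (pre-0.26) and a hypothetical `agent.act` interface; it is not the study's actual harness.

```python
import gym  # assumes the classic Gym API: step() returns (obs, reward, done, info)

def evaluate(agent, env_id, episodes=100):
    """Run one agent on one environment and report the mean return."""
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(agent.act(obs))
            total += reward
        returns.append(total)
    return sum(returns) / episodes
```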
Lecture Notes in Computer Science, 2019
Reinforcement learning techniques bring a new perspective to enduring problems. Developing skills from scratch is appealing not only due to the artificial creation of knowledge; it can also replace years of work and refinement in a matter of hours. Of all the skills developed in the RoboCup 3D Soccer Simulation League, running is still considerably relevant in determining the winner of any match. However, current approaches do not make full use of the robotic soccer agents' potential. To narrow this gap, we propose a way of leveraging Proximal Policy Optimization using the information provided by the simulator for official RoboCup matches. To do this, our algorithm uses a mix of raw, computed and internally generated data. The final result is a sprinting and a stopping behavior that work in tandem to bring the agent from point A to point B in a very short time. The sprinting speed stabilizes at around 2.5 m/s, which is a great improvement over current solutions. Both the sprinting and stopping behaviors are remarkably stable.
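The "mix of raw, computed and internally generated data" suggests an observation vector assembled from several sources before being fed to the policy. The sketch below is one plausible composition; the feature names and dimensions are made up for illustration, not the team's actual state vector.

```python
import numpy as np

def build_observation(joint_angles, foot_forces, step_counter, phase):
    """Illustrative mix of raw simulator data (joint angles, foot forces),
    computed data (gait phase encoded as sin/cos), and internally generated
    data (a normalized step counter)."""
    computed = [np.sin(phase), np.cos(phase)]
    internal = [step_counter / 1000.0]
    return np.concatenate([joint_angles, foot_forces, computed, internal])

# Toy usage with placeholder dimensions.
obs = build_observation(np.zeros(22), np.zeros(6), step_counter=42, phase=1.3)
```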
IEEE Transactions on Games, 2023
Frontiers in Robotics and AI, Apr 10, 2023
This work develops a hybrid framework that combines machine learning and control approaches for legged robots, achieving new capabilities of balancing against external perturbations. The framework embeds a kernel, a model-based, fully parametric, closed-loop analytical controller, as the gait pattern generator. On top of that, a neural network with symmetric partial data augmentation learns to automatically adjust the parameters of the gait kernel and to generate compensatory actions for all joints, thus significantly augmenting the stability under unexpected perturbations. Seven neural network policies with different configurations were optimized to validate the effectiveness of the combined use of kernel parameter modulation and compensation for the arms and legs through residual actions. The results confirm that modulating the kernel parameters alongside the residual actions improves stability significantly. Furthermore, the performance of the proposed framework was evaluated across a set of challenging simulated scenarios, demonstrating considerable improvements over the baseline in recovering from large external forces (up to 118%). The robustness of the framework in the presence of measurement noise and model inaccuracies was also assessed through simulations. Finally, the trained policies were validated across a set of unseen scenarios and showed generalization to dynamic walking.
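The core composition, a network that both modulates the gait kernel's parameters and adds per-joint residual actions, can be sketched as below. The split sizes, scaling factors, and stand-in kernel/policy are assumptions for illustration, not the framework's actual interfaces.

```python
import numpy as np

def hybrid_action(policy_net, gait_kernel, state, base_params):
    """Sketch of one hybrid control step: the network output is split into
    kernel parameter adjustments and per-joint residual actions."""
    out = policy_net(state)
    d_params, residuals = out[: len(base_params)], out[len(base_params):]
    joint_targets = gait_kernel(base_params + 0.1 * d_params)  # modulated gait
    return joint_targets + 0.05 * residuals                    # residual actions

# Toy usage with stand-in kernel (12 joints) and a zero policy.
kernel = lambda p: np.tanh(p.sum()) * np.ones(12)
net = lambda s: np.zeros(4 + 12)
print(hybrid_action(net, kernel, np.zeros(30), np.zeros(4)))
```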
Integrated Computer-Aided Engineering, Sep 11, 2020
Tackling multi-agent environments where each agent has a limited local observation of the global state is a non-trivial task that often requires hand-tuned solutions. A team of agents coordinating in such scenarios must handle the complex underlying environment while each agent only has partial knowledge about it. Deep reinforcement learning has been shown to achieve superhuman performance in single-agent environments, and has since been adapted to the multi-agent paradigm. This paper proposes A3C3, a multi-agent deep learning algorithm in which agents are evaluated by a centralized referee during the learning phase, but remain independent of each other during actual execution. The referee's neural network is augmented with a permutation-invariant architecture to increase its scalability to large teams. A3C3 also allows agents to learn communication protocols with which they share relevant information with their team members, allowing them to overcome their limited knowledge and achieve coordination. A3C3 and its permutation-invariant augmentation are evaluated in multiple multi-agent test-beds, including partially observable scenarios, swarm environments, and complex 3D soccer simulations.
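One standard way to obtain permutation invariance, sketched below with illustrative layer sizes, is to encode each agent's observation with a shared network and pool the embeddings; whether this matches A3C3's exact referee architecture is an assumption.

```python
import torch
import torch.nn as nn

class PermutationInvariantCritic(nn.Module):
    """Sketch: each agent's observation passes through a shared encoder, and
    the embeddings are mean-pooled, so the value estimate is independent of
    agent ordering and of team size."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, agent_obs):                      # (n_agents, obs_dim)
        pooled = self.encoder(agent_obs).mean(dim=0)   # order-independent
        return self.value_head(pooled)

critic = PermutationInvariantCritic(obs_dim=8)
print(critic(torch.randn(5, 8)))  # same value for any permutation of rows
```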
Advances in Intelligent Systems and Computing, 2019
Deep learning models have recently emerged as popular function approximators for single-agent reinforcement learning challenges, accurately estimating the value function of complex environments and generalizing to new unseen states. In multi-agent settings, agents must cope with the non-stationarity of the environment, due to the presence of other agents, and can take advantage of information-sharing techniques for improved coordination. We propose a neural-based actor-critic algorithm which learns communication protocols between agents and implicitly shares information during the learning phase. Large numbers of agents communicate with a self-learned protocol during distributed execution, and reliably learn complex strategies and protocols for partially observable multi-agent environments.
Electrical Impedance Tomography (EIT) is a recent medical imaging technique based on multiple impedance measurements using surface electrodes. The electrical impedance of a tissue, which characterizes its ability to conduct electrical current, depends on its structure and physiological state. In cellular mediums, the transmission of electrical signals involves the ionic conduction in the interstitial space and cytoplasm, and the capacitive properties of the cell membranes (1). The ac impedance of a tissue varies with the applied signal ...
arXiv (Cornell University), Nov 27, 2020
This work combines machine learning and control approaches for legged robots, developing a hybrid framework to achieve new capabilities of balancing against external perturbations. The framework embeds a kernel which is a fully parametric, closed-loop gait generator based on analytical control. On top of that, a neural network with symmetric partial data augmentation learns to automatically adjust the parameters of the gait kernel and to generate compensatory actions for all joints as residual dynamics, thus significantly augmenting the stability under unexpected perturbations. The performance of the proposed framework was evaluated across a set of challenging simulated scenarios. The results showed considerable improvements over the baseline in recovering from large external forces. Moreover, the produced behaviours are more natural, human-like, and robust against noisy sensing.
arXiv (Cornell University), Mar 1, 2021
Humanoid robots are made to resemble humans, but their locomotion abilities are far from ours in terms of agility and versatility. When humans walk on complex terrains or face external disturbances, they combine a set of strategies, unconsciously and efficiently, to regain stability. This paper tackles the problem of developing a robust omnidirectional walking framework, which is able to generate versatile and agile locomotion on complex terrains. The Linear Inverted Pendulum Model and Central Pattern Generator concepts are used to develop a closed-loop walk engine, which is then combined with a reinforcement learning module. This module learns to regulate the walk engine parameters adaptively, and generates residuals to adjust the robot's target joint positions (residual physics). Additionally, we propose a proximal symmetry loss function to increase the sample efficiency of the Proximal Policy Optimization algorithm, by leveraging model symmetries and the trust region concept. The effectiveness of the proposed framework was demonstrated and evaluated across a set of challenging simulation scenarios. The robot was able to generalize what it learned in unforeseen circumstances, displaying human-like locomotion skills, even in the presence of noise and external pushes.
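The proximal symmetry loss leverages model symmetries; a simplified version (without the paper's trust-region weighting, which is omitted here) penalizes the gap between the policy's response to a mirrored state and the mirrored response to the original state. The mirroring operators below, plain sign flips, and the linear policy are toy assumptions.

```python
import torch

def symmetry_loss(policy, states, mirror_state, mirror_action):
    """Sketch of a symmetry regularizer: the action for a mirrored state
    should be the mirror of the action for the original state."""
    actions = policy(states)
    mirrored_actions = policy(mirror_state(states))
    return torch.mean((mirrored_actions - mirror_action(actions)) ** 2)

# Toy usage: mirroring flips the sign of every state/action dimension.
policy = torch.nn.Linear(6, 3)
loss = symmetry_loss(policy, torch.randn(32, 6), lambda s: -s, lambda a: -a)
loss.backward()  # would be added alongside the PPO objective
```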
Chess was for many years the main domain for Artificial Intelligence (AI) research. However, after the victory of Deep Blue over Garry Kasparov, the world champion at the time, it was clear that new challenges and domains had to be considered. The scientific community, searching for new and complex problems, found robotic soccer to be a new domain for research in Distributed Artificial Intelligence. Object-oriented applications are becoming less common because the nature of the problems is changing: an open and distributed environment is emerging, and centralized approaches are less suitable for solving problems in this kind of environment. Agent-oriented applications are a good paradigm for solving problems in open, heterogeneous and distributed environments. The main difference between agents and objects was introduced by Wooldridge.
Springer eBooks, 2023
FC Portugal, a team from the universities of Porto and Aveiro, won the main competition of the 2022 RoboCup 3D Simulation League, with 17 wins, 1 tie and no losses. Over the course of the competition, the team scored 84 goals while conceding only 2. FC Portugal also won the 2022 RoboCup 3D Simulation League Technical Challenge, accumulating the maximum number of points by finishing first in both of its events: the Free/Scientific Challenge and the Fat Proxy Challenge. The team presented at this year's competition was rebuilt from the ground up after the last RoboCup. No previous code was used or adapted, with the exception of the 6D pose estimation algorithm and the get-up behaviors, which were re-optimized. This paper describes the team's new architecture and development approach. Key strategy elements include team coordination, role management, formation, communication, skill management and path planning. New lower-level skills were based on a deterministic analytic model and a shallow neural network that learned residual dynamics through reinforcement learning. This process, together with an overlapped learning approach, improved seamless transitions, learning time, and behavior efficiency and stability. In comparison with the previous team, the omnidirectional walk is more stable and went from 0.70 m/s to 0.90 m/s, the long kick from 15 m to 19 m, and the new close-control dribble reaches up to 1.41 m/s.
Journal of Intelligent and Robotic Systems, May 12, 2021
ICERI Proceedings, Nov 1, 2017
Advances in Intelligent Systems and Computing, Dec 21, 2017
There are many open issues and challenges in the reinforcement learning field, such as handling high-dimensional environments. Function approximators, such as deep neural networks, have been successfully used in both single- and multi-agent environments with high-dimensional state spaces. The multi-agent learning paradigm faces even more problems, due to the effect of several agents learning simultaneously in the environment. One of its main concerns is how to learn mixed policies that prevent opponents from exploiting them in competitive environments, achieving a Nash equilibrium. We propose an extension of several algorithms able to achieve Nash equilibria in single-state games to the deep-learning paradigm. We compare their deep-learning and table-based implementations, and demonstrate how WPL (Weighted Policy Learner) is able to achieve an equilibrium strategy in a complex environment, where agents must find each other in an infinite-state game and play a modified version of the Rock Paper Scissors game.
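For reference, the table-based WPL update for a single-state game can be sketched as below; weighting each action's gradient by its distance to the policy boundary is what damps oscillation around mixed equilibria. The learning rate, clipping, and the Rock-Paper-Scissors toy usage are illustrative assumptions.

```python
import numpy as np

def wpl_update(pi, action_values, lr=0.01):
    """One Weighted Policy Learner step for a single-state game (sketch).
    Gradients toward an action are weighted by (1 - pi), gradients away
    from it by pi, slowing updates near the policy boundary."""
    grad = action_values - pi @ action_values       # advantage per action
    weight = np.where(grad > 0, 1.0 - pi, pi)
    pi = np.clip(pi + lr * weight * grad, 1e-3, None)
    return pi / pi.sum()                            # re-project to simplex

# Toy usage: Rock-Paper-Scissors payoffs for the row player against a
# fixed opponent mixing (0.5, 0.3, 0.2).
payoff = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
opp = np.array([0.5, 0.3, 0.2])
pi = np.ones(3) / 3
for _ in range(1000):
    pi = wpl_update(pi, payoff @ opp)
print(pi)  # drifts toward exploiting the biased opponent
```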
Robotics and Autonomous Systems, Dec 1, 2021
This paper proposes a modular framework to generate robust biped locomotion using a tight coupling between an analytical walking approach and deep reinforcement learning. The framework is composed of six main modules which are hierarchically connected to reduce overall complexity and increase flexibility. The core of the framework is a specific dynamics model which abstracts a humanoid's dynamics into two masses, modeling the upper and lower body. This dynamics model is used to design an adaptive reference trajectory planner and an optimal controller, both fully parametric. Furthermore, a learning framework is developed, based on a Genetic Algorithm (GA) and Proximal Policy Optimization (PPO), to find the optimum parameters and to learn how to improve the stability of the robot by moving the arms and changing its center of mass (COM) height. A set of simulations is performed to validate the performance of the framework using the official RoboCup 3D League simulation environment. The results validate the performance of the framework, not only in creating a fast and stable gait but also in learning to improve upper-body efficiency.
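The GA half of that learning framework amounts to black-box search over the controller's parameters. Below is a minimal GA sketch (truncation selection plus Gaussian mutation) with an illustrative fitness; the framework's actual operators and its walk-engine fitness are more elaborate.

```python
import random

def genetic_search(fitness, dim, pop_size=20, generations=50, sigma=0.1):
    """Minimal GA sketch: keep the top quarter of the population each
    generation and refill it with Gaussian mutations of the elites."""
    pop = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]
        pop = elite + [
            [g + random.gauss(0, sigma) for g in random.choice(elite)]
            for _ in range(pop_size - len(elite))
        ]
    return max(pop, key=fitness)

# Toy usage: recover a known parameter vector by maximizing negative error.
target = [0.3, -0.5, 0.8]
best = genetic_search(lambda p: -sum((a - b) ** 2 for a, b in zip(p, target)), 3)
```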