E1 277 Reinforcement Learning (January-April, 3:1)
Instructor
Shalabh Bhatnagar
Email: [email protected]
Teaching Assistants
Sindhu P.R., Raghuram Bharadwaj
Email: [email protected], [email protected]
Announcements
Course Description
Many modern applications involve problems of sequential decision making under uncertainty. Stochastic dynamic programming is a general framework for modelling such problems. However, it requires knowledge of the transition probabilities (i.e., the system dynamics) as well as the associated cost function. These quantities are normally not known, and one only has access to data obtained by interacting with the system: for instance, one may not know the transition probabilities, but one can observe the next state given the current state and the chosen action or control. The course first builds up the model-based dynamic programming techniques and subsequently the model-free, data-driven algorithms. Topics include finite horizon models, the dynamic programming algorithm, infinite horizon discounted cost and average cost problems, numerical solution methodologies, full state representations, function approximation techniques, approximate dynamic programming, partially observable Markov decision processes, Q-learning, and temporal difference learning. Students completing the course will be able to model problems of sequential decision making under uncertainty. They will know the algorithms they can apply when faced with such problems and the convergence and accuracy guarantees that such algorithms provide.
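
To make the model-free, data-driven setting concrete, below is a minimal sketch (in Python) of tabular Q-learning on a toy chain MDP, using the cost-minimization convention of the course. The environment and all parameters in it (the chain length, alpha, gamma, epsilon) are illustrative assumptions for this sketch only and are not taken from the course material.

    # Tabular Q-learning on a toy chain MDP (cost minimization).
    # The agent never sees the transition probabilities; it only
    # observes sampled (next_state, cost) pairs from step().
    import random

    N_STATES = 5            # states 0..4; state 4 is the absorbing goal
    ACTIONS = [-1, +1]      # move left or right along the chain

    def step(state, action):
        # Sample one transition; unit cost per step until the goal.
        next_state = min(max(state + action, 0), N_STATES - 1)
        cost = 0.0 if next_state == N_STATES - 1 else 1.0
        return next_state, cost

    alpha, gamma, epsilon = 0.1, 0.95, 0.1   # illustrative hyperparameters
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

    for episode in range(2000):
        state = 0
        while state != N_STATES - 1:
            # Epsilon-greedy: mostly pick the action with lowest estimated cost.
            if random.random() < epsilon:
                a = random.randrange(len(ACTIONS))
            else:
                a = min(range(len(ACTIONS)), key=lambda i: Q[state][i])
            next_state, cost = step(state, ACTIONS[a])
            # Q-learning update: bootstrap from the best action at the next state.
            target = cost + gamma * min(Q[next_state])
            Q[state][a] += alpha * (target - Q[state][a])
            state = next_state

    # Greedy policy recovered from the learned Q-values.
    policy = [ACTIONS[min(range(len(ACTIONS)), key=lambda i: Q[s][i])]
              for s in range(N_STATES)]
    print(policy)  # expected: +1 (move right) in every non-goal state

The key point the sketch illustrates is that the update uses only observed transitions, never the model itself; a model-based dynamic programming method such as value iteration would instead sweep over all states using the known transition probabilities and costs.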
Grading policy
Two mid-term exams, one course project, and one final exam.
Assignments
Resources