Off-policy reinforcement learning algorithms let us train and evaluate policies using data collected from other policies. This makes them attractive for real-world settings like robotics, where collecting real data is expensive and time-consuming. This talk covers two ways we’ve used off-policy algorithms. First, I’ll talk about how we’ve used off-policy learning to solve a challenging vision-based robotic manipulation task on real robots. Second, I’ll discuss our recent work on off-policy policy evaluation, and why it is important for future research on real-world reinforcement learning problems.
Alex Irpan is a software engineer at Google Brain, where he works on applying deep reinforcement learning to robotics and other real-world problems. His research focuses on leveraging real-world data as much as possible for robotic manipulation, through techniques like transfer learning and off-policy learning. He received his BA in computer science from UC Berkeley in 2016, where he did undergraduate research in the Berkeley AI Research Lab, mentored by Pieter Abbeel and John Schulman.