With the unreasonable effectiveness of data in the deep-learning era, most state-of-the-art computer vision solutions today require large amounts of training data and ever-increasing training resources. Furthermore, amassing large amounts of labeled data for task-specific needs is becoming increasingly tedious and expensive. While gathering large amounts of cross-modal robot data poses a challenge in itself, we envision that robots will be able to self-supervise in certain tasks by transferring or bootstrapping capabilities from the rich set of cross-modal information that they typically collect. In this talk, we show that this bootstrap mechanism can also leverage spatio-temporal constraints that are implicitly maintained in robots via techniques like Simultaneous Localization and Mapping (SLAM).
We envision that self-supervised solutions to task learning will have far-reaching implications, especially in the context of life-long learning in autonomous systems, while alleviating the need to procure large amounts of labeled data. To conclude, I will talk about some of the recent machine learning efforts at Toyota Research Institute, and how we hope self-supervision will be at the core of our vision of petabyte-scale learning from robot data.
Sudeep is a Machine Learning Research Scientist at Toyota Research Institute. He recently received his PhD in Computer Science from MIT, where he focused on enabling self-supervised perception and learning in SLAM-aware mobile robots. Prior to MIT, he was a software developer working on real-time computer-vision-related technologies. He completed his Bachelor's in Mechanical Engineering at the University of Michigan - Ann Arbor. He has also worked as a research intern at companies such as Mitsubishi Electric Research Labs (MERL) and Segway.