Multimodal AI is useful for video content analysis across a variety of problems, e.g., Visual Dialog, Object Detection, Scene Understanding, and Content Recommendation. In this talk, we focus on multimodal content recommendation, where an AI agent processes data in multiple modalities (e.g., video, images, audio, language) and learns to recommend new multimedia content to a user. We show how the agent can process characteristics of the current video as well as those in the user's history, infer the relevant matching characteristics, and on that basis recommend new videos of interest to the user. The particular characteristics we focus on are the fine-grained categories of the video. In this context, we have developed several key innovations: (a) a hierarchy of fine-grained categories, customized to the domain under consideration; (b) novel modeling approaches, such as temporal coherence-based regularization and a hierarchical loss function, which improve the accuracy with which deep learning models predict fine-grained categories from videos; and (c) a novel multimodal fusion architecture, which uses approaches such as sparse fusion and a gated mixture of experts to combine predictions from multiple modalities into a final category prediction. We will discuss how these innovations come together in our proposed architecture for multimodal content recommendation, which outperforms current state-of-the-art models.
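To make the fusion idea in (c) concrete, here is a minimal sketch of a gated mixture-of-experts combination of per-modality predictions. This is an illustrative assumption, not the talk's actual architecture: each modality is treated as an "expert" that emits category logits, and a learned gate (here, a single linear map over concatenated modality features) produces per-example weights that mix the experts' predictions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_moe_fusion(features, logits, gate_weights):
    """Fuse per-modality category predictions with a gated mixture of experts.

    features:     list of (batch, feat_dim) arrays, one per modality
    logits:       list of (batch, num_categories) arrays, one per modality
    gate_weights: (num_modalities * feat_dim, num_modalities) gating matrix
                  (hypothetical parameterization for illustration)
    Returns fused (batch, num_categories) logits.
    """
    concat = np.concatenate(features, axis=-1)        # (batch, M * feat_dim)
    gates = softmax(concat @ gate_weights)            # (batch, M), rows sum to 1
    stacked = np.stack(logits, axis=1)                # (batch, M, num_categories)
    # Convex combination of the experts' predictions, weighted by the gate.
    return (gates[..., None] * stacked).sum(axis=1)   # (batch, num_categories)

# Example: fuse three modalities (e.g., video, audio, text) for 4 examples.
rng = np.random.default_rng(0)
features = [rng.normal(size=(4, 8)) for _ in range(3)]
logits = [rng.normal(size=(4, 10)) for _ in range(3)]
W = rng.normal(size=(24, 3))
fused = gated_moe_fusion(features, logits, W)         # (4, 10) fused logits
```

Because the gate weights form a convex combination, the fused prediction always lies between the individual experts' predictions; a sparse variant would additionally push most gate weights toward zero so that only a few modalities contribute per example.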
Dr. Shalini Ghosh is the Director of AI Research at the Artificial Intelligence Center of Samsung Research America, where she leads a group working on Situated AI and Multimodal Learning (i.e., learning from computer vision, language, and speech). She has extensive experience and expertise in Machine Learning (ML), especially Deep Learning, and has worked on applications in multiple domains. Before joining Samsung Research, Dr. Ghosh was a Principal Computer Scientist in the Computer Science Laboratory at SRI International, where she was the Principal Investigator/Tech Lead of several impactful DARPA and NSF projects. She was a Visiting Scientist at Google Research in 2014-2015, where she worked on applying deep learning (Google Brain) models to dialog systems and natural language applications. Dr. Ghosh has a Ph.D. in Computer Engineering from the University of Texas at Austin. She has won several grants and awards for her research, including a Best Paper award and a Best Student Paper Runner-up award for applications of ML to dependable computing. Dr. Ghosh also serves on the program committees of multiple impactful conferences and journals in ML and AI (e.g., NIPS, ICML, KDD, AAAI), has served as an invited panelist on multiple panels, and has been an invited guest lecturer at UC Berkeley.