State-of-the-art Deep Learning systems at hyper-scale AI companies attack the toughest problems with distributed deep learning. Distributed Deep Learning systems make AI researchers and practitioners more productive and enable the training of models that would be intractable on a single GPU server. In this talk, we will introduce the latest developments in distributed Deep Learning (synchronous stochastic gradient descent) and show how distribution can both massively reduce training time and enable parallel experimentation through large-scale hyperparameter optimization. We will introduce different distributed architectures, including the parameter server and Ring-AllReduce models. In particular, we will describe open-source TensorFlow frameworks that leverage Apache Spark to manage distributed training, such as Yahoo’s TensorFlowOnSpark, Uber’s Horovod platform, and Hops’ TfSpark. We will introduce the different programming models they support and highlight the importance of cluster support for managing GPUs as a resource. To this end, we will also introduce Hops, an open-source distribution of Hadoop with support for GPUs as a resource, and show how TensorFlow/Spark applications can easily be run from a Jupyter Notebook. We will also show that on-premises distributed Deep Learning is gaining traction, as both enterprise and commodity GPUs can be integrated into a single platform.
Jim Dowling is the CEO of Logical Clocks AB, an Associate Professor at KTH Royal Institute of Technology in Stockholm, and a Senior Researcher at SICS RISE. His research concentrates on building systems support for machine learning at scale. He is the lead architect of Hops Hadoop, the world's fastest and most scalable Hadoop distribution and the only Hadoop platform with support for GPUs as a resource. He is a regular speaker at Big Data and AI industry conferences and blogs about AI at O'Reilly.