Abstract:
Deep neural networks have seen empirical successes across computer vision tasks, but training them requires tens of thousands to millions of examples, which typically come in the form of an image or images and human-annotated ground
truth. Curating vision datasets, in general, amounts to numerous man-hours; tasks like depth estimation require an even more massive effort. I will introduce an alternative form of supervision that leverages multi-sensor validation as an unsupervised (or self-supervised)
training objective for depth estimation. I will demonstrate how one can leverage synthetic data and the abundance of publicly available pretrained models, which have largely relied on expensive manual labeling, to learn or distill the regularities of our visual
world. In doing so, I show that one can design smaller and faster models that can operate in real-time with state-of-the-art performance. Not only that, these models can be adapted online to novel environments in which they are deployed. Additionally, I will
discuss the current limitations of data augmentation procedures used during unsupervised training, which involves reconstructing the inputs as the supervision signal, and detail a method that allows one to scale up and introduce previously inviable augmentations
to boost performance. Finally, I will show that unsupervised depth training can serve as a feasible form of large-scale pretraining to produce backbones suitable for semantic tasks.
Bio:
Alex Wong is an Assistant Professor in the Department of Computer Science and the director of the Vision Laboratory at Yale University. He received his Ph.D. in Computer Science from the University of California, Los Angeles (UCLA) in 2019 and was co-advised by Stefano Soatto and Alan Yuille. He was previously a post-doctoral research scholar at UCLA under
the guidance of Stefano Soatto. His research lies at the intersection of machine learning, computer vision, and robotics and largely focuses on multi-sensor fusion for 3D reconstruction, robust vision under adverse conditions, unsupervised learning, and medical
image analysis. His work has received the outstanding student paper award at the Conference on Neural Information Processing Systems (NeurIPS) 2011 and the best paper award in robot vision at the International Conference on Robotics and Automation (ICRA) 2019.
Organizer:
Prof. Lindi Liao, PhD
Department of Information Sciences & Technology
School of Computing | George Mason University
http://mason.gmu.edu/~dliao2/