Deep Learning

Instructor: Bhiksha Raj

Timings: 1:30 p.m. -- 2:50 p.m.
Days: Mondays and Wednesdays
Location: GHC 4102

Credits: 12

Instructor contact: email: bhiksha@cs.cmu.edu, Phone: 8-9826, Office: GHC 6705
Office hours: TBD
TAs: Haohan Wang (Office hours: TBD), Haoqi Fan (Office hours: TBD)

Deep learning algorithms attempt to learn multi-level representations of data, embodying a hierarchy of factors that may explain them. Such algorithms have been shown to be effective at uncovering underlying structure in data, and have been applied successfully to a wide variety of problems, ranging from image classification to natural language processing and speech recognition.

In this course, students will learn about this resurgent subject. The course presents the subject through a series of seminars and labs, which will explore it from its early beginnings and work up to some of the state of the art. The presentations will cover the basics of deep learning and the underlying theory, the breadth of application areas to which it has been applied, and the latest issues in learning from very large amounts of data. Although the concept of deep learning has been applied to a number of different models, we will concentrate largely, though not entirely, on the connectionist architectures that are most commonly associated with it.

The labs will exercise the basics of implementing and investigating these networks.

Students who participate in the course are expected to present at least two papers, in addition to completing all labs. Presentations are expected to be thorough and, where applicable, illustrated through experiments and simulations conducted by the student.

Attendance is mandatory.


Papers and presentations

Date | Topic/paper | Presenter | Additional Links
31 Aug 2016 | Introduction | Bhiksha Raj |
7 Sep 2016 | Story so far; Theano tutorial | Haohan Wang, Haoqi Fan | [material]
12 Sep 2016 | Training a network through backpropagation [slides] | | Backpropagation through time: what it does and how to do it, Proc. IEEE 1990, P. Werbos
14 Sep 2016 | On the problem of local minima in backpropagation, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 14(1), 76-86, 1992. M. Gori and A. Tesi | Dan Schwatz |
 | Training a 3-node neural network is NP-complete, Avrim Blum and Ron Rivest, COLT 88 | |
 | Backpropagation fails where perceptrons succeed, IEEE Trans. on Circuits and Systems, Vol. 36:5, May 1989. M. Brady, R. Raghavan, J. Slawny | Ian Quah |
19 Sep 2016 | Speeding up BP: Rprop, Acceleration, Nesterov's method [slides] | | Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research 12 (2011) 2121-2159. J. Duchi, E. Hazan, Y. Singer
 | | | ADADELTA: An Adaptive Learning Rate Method. Matthew Zeiler, ArXiv, 2012
 | | | Adam: A Method for Stochastic Optimization. D. Kingma, J. Ba. ArXiv 2014