|Timings:||1:30-2:50 p.m.|
|Days:||Mondays and Wednesdays|
|Instructor:||Bhiksha Raj|
|Contact:||Email: firstname.lastname@example.org; Phone: 8-9826; Office: GHC 6705|
|Office hours:||TBD|
|TAs:||Haohan Wang (office hours: TBD), Haoqi Fan (office hours: TBD)|
Deep learning algorithms attempt to learn multi-level representations of data, embodying a hierarchy of factors that may explain them. Such algorithms have been shown to be effective at uncovering underlying structure in data, and have been successfully applied to a wide variety of problems, ranging from image classification to natural language processing and speech recognition.
In this course, students will learn about this resurgent subject. The course presents the subject through a series of seminars and labs that trace it from its early beginnings up to some of the current state of the art. The presentations will cover the basics of deep learning and its underlying theory, the breadth of areas to which it has been applied, and the latest issues in learning from very large amounts of data. Although the concept of deep learning has been applied to a number of different models, we will concentrate largely, though not entirely, on the connectionist architectures most commonly associated with it.
The labs will provide hands-on practice with the basics of implementing and investigating these networks.
Students who participate in the course are expected to present at least two papers, in addition to completing all labs. Presentations are expected to be thorough and, where applicable, illustrated through experiments and simulations conducted by the student.
Attendance is mandatory.
|31 Aug 2016||Introduction||Bhiksha Raj|
|7 Sep 2016||Story so far|
|Theano tutorial||Haohan Wang, Haoqi Fan||[material]|
|12 Sep 2016||Training a network through backpropagation||[slides]||Backpropagation through time: what it does and how to do it. Proc. IEEE, 1990. P. Werbos|
|14 Sep 2016||On the problem of local minima in backpropagation. IEEE Trans. Pattern Analysis and Machine Intelligence, 14(1):76-86, 1992. M. Gori and A. Tesi||Dan Schwatz|
|Training a 3-node neural network is NP-complete. COLT, 1988. A. Blum and R. Rivest|
|Backpropagation fails where perceptrons succeed. IEEE Trans. Circuits and Systems, 36(5), May 1989. M. Brady, R. Raghavan, J. Slawny||Ian Quah|
|19 Sep 2016||Speeding up BP: Rprop, acceleration, Nesterov's method||[slides]|
|Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12:2121-2159, 2011. J. Duchi, E. Hazan, Y. Singer|
|ADADELTA: An Adaptive Learning Rate Method. arXiv, 2012. M. Zeiler|
|Adam: A Method for Stochastic Optimization. arXiv, 2014. D. Kingma, J. Ba|