11-785 DEEP LEARNING

Instructor: Bhiksha Raj

Course number: 11-785
Time: 1:30 p.m. - 2:50 p.m.
Days: Mondays and Wednesdays
Location: GHC 4211
Website: http://deeplearning.cs.cmu.edu

Credits: 12

Instructor: Bhiksha Raj
Contact: Email: bhiksha@cs.cmu.edu, Phone: 8-9826, Office: GHC 6705
Office hours: TBD
TAs: Zhenzhong (Danny) Lan (office hours: Friday, 4:00-5:00 PM, GHC 6225) and Volkan Cirik

Deep learning algorithms attempt to learn multi-level representations of data, embodying a hierarchy of factors that may explain them. Such algorithms have proven effective both at uncovering underlying structure in data and at solving a large variety of problems, ranging from image classification to natural language processing and speech recognition.

In this course students will learn about this resurgent subject. The course presents the subject through a series of seminars and labs, which will trace it from its early beginnings and work up to some of the state of the art. The seminars will cover the basics of deep learning and the underlying theory, the breadth of application areas to which it has been applied, and the latest issues in learning from very large amounts of data. Although the concept of deep learning has been applied to a number of different models, we will concentrate largely, although not entirely, on the connectionist architectures most commonly associated with it.

The labs will exercise several aspects of implementing and investigating these networks.

Students who participate in the course are expected to present at least two papers, in addition to completing all labs. Presentations are expected to be thorough and, where applicable, illustrated through experiments and simulations conducted by the student.

Attendance is mandatory.

Labs

Papers and presentations

Date | Topic/paper | Author(s) | Presenter | Links
31 Aug 2015 | Introduction | | Bhiksha Raj | [slides]
21 Sep 2015 | Torch, Theano and AWS | | Danny Lan and Prasanna Muthukumar | [Prasanna's code] [Danny's Slides]
9 Sep 2015 | Bain on Neural Networks. Brain and Cognition, 33:295-305, 1997 | Alan L. Wilkes and Nicholas J. Wade | Stephanie Rosenthal | [slides]
9 Sep 2015 | A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5:115-137, 1943 | W.S. McCulloch and W.H. Pitts | Fatima Talib Al-Raisi | [slides]
    See also: Michael Marsalli's tutorial on the McCulloch and Pitts neuron
    See also: The First Computational Theory of Mind and Brain: A Close Look at McCulloch and Pitts' "Logical Calculus of Ideas Immanent in Nervous Activity", Gualtiero Piccinini, Synthese, 141:175-215, 2004
14 Sep 2015 | The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65(6):386-408, 1958 | F. Rosenblatt | Manu | [slides]
    See also: More about threshold logic, R. O. Winder, Proc. Second Annual Symposium on Switching Circuit Theory and Logical Design, 1961
14 Sep 2015 | The Organization of Behavior, 1949 | D. O. Hebb | Srivaths R. | [slides]
16 Sep 2015 | The Widrow-Hoff learning rule (ADALINE and MADALINE) | Widrow | Xuanchong | [slides]
21 Sep 2015 | Backpropagation through time: what it does and how to do it. Proc. IEEE, 1990 | P. Werbos | Bernie | [slides]
21 Sep 2015 | On the problem of local minima in backpropagation. IEEE Trans. Pattern Analysis and Machine Intelligence, 14(1):76-86, 1992 | M. Gori and A. Tesi | Sai | [slides]
    See also: Training a 3-node neural network is NP-complete, Avrim Blum and Ron Rivest, COLT 1988
23 Sep 2015 | Backpropagation fails where perceptrons succeed. IEEE Trans. Circuits and Systems, 36(5), May 1989 | Martin Brady, Raghu Raghavan and Joseph Slawny | Suruchi | [slides]
23 Sep 2015 | Multilayer feedforward networks are universal approximators. Neural Networks, 2(3):359-366, 1989 | K. Hornik, M. Stinchcombe and H. White | Aman Gupta | [slides]
    See also: Neural networks with a continuous squashing function in the output are universal approximators, J.L. Castro, C.J. Mantas and J.M. Benitez, Neural Networks, 13:561-563, 2000
28 Sep 2015 | A visual illustration of how neural networks approximate functions | Michael Nielsen | Nikolas Wolfe | [slides]
28 Sep 2015 | A Simplified Neuron Model as a Principal Component Analyzer. J. Math. Biology, 15:267-273, 1982 | Erkki Oja | Amir Zade | [slides]
30 Sep 2015 | The self-organizing map. Proc. IEEE, 78:1464-1480, 1990 | Teuvo Kohonen | Karishma Agrawal | [slides]
30 Sep 2015 | Self-Organizing Maps and Learning Vector Quantization for Feature Sequences | Panu Somervuo and Teuvo Kohonen | Aditya Sharma | [slides]
5 Oct 2015 | Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sciences, 79:2554-2558, 1982 | John Hopfield | Hinton | [slides]
5 Oct 2015 | A learning algorithm for Boltzmann machines. Cognitive Science, 9:147-169, 1985 | D. Ackley, G. Hinton and T. Sejnowski | Shi Zong | [slides]
    See also: Learning and Relearning in Boltzmann machines, T. Sejnowski and G. Hinton
    See also: Improved simulated annealing, Boltzmann machine, and attributed graph matching, Lei Xu and Erkki Oja, EURASIP Workshop on Neural Networks, LNCS vol. 412, Springer, pp. 151-160, 1990
5 Oct 2015 | Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics, Speech, and Signal Processing, 37(3), March 1989 | Waibel, Hanazawa, Hinton, Shikano and Lang | Allard Dupuis | [slides]
7 Oct 2015 | Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15(6):455-469, 1982 | Kunihiko Fukushima and Sei Miyake | Chenchen Zhu | [slides]
7 Oct 2015 | An artificial neural network for spatio-temporal bipolar patterns: application to phoneme classification | Toshiteru Homma | Serim Park | [slides]
7 Oct 2015 | Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278-2324, Nov 1998 | Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner | Lu Jiang | [slides]
12 Oct 2015 | Supervised sequence labelling with recurrent neural networks, Chapters 4 and 7. PhD dissertation, T.U. Munchen, 2008 | Alex Graves | Kazuya | [slides]
12 Oct 2015 | Bidirectional Recurrent Neural Networks | Mike Schuster and Kuldip K. Paliwal | Praveen Palanisamy | [slides]
12 Oct 2015 | Long Short-Term Memory | Sepp Hochreiter and Jurgen Schmidhuber | Zihang Dai | [slides]
14 Oct 2015 | The Cascade-Correlation Learning Architecture | Scott E. Fahlman and Christian Lebiere | Scott E. Fahlman | [slides]
14 Oct 2015 | The Recurrent Cascade-Correlation Architecture | Scott E. Fahlman | Scott E. Fahlman |
19 Oct 2015 | ImageNet Classification with Deep Convolutional Neural Networks | Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton | Guillaume Lample | [slides]
19 Oct 2015 | Very Deep Convolutional Networks for Large-Scale Image Recognition | Karen Simonyan and Andrew Zisserman | Pradeep | [slides]
21 Oct 2015 | Visualizing and Understanding Convolutional Networks | Matthew D. Zeiler and Rob Fergus | Wanli | [slides]
21 Oct 2015 | Dropout: A Simple Way to Prevent Neural Networks from Overfitting | Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov | Xuanchong |
21 Oct 2015 | Maxout Networks | Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville and Yoshua Bengio | Sandeep |