11-785 Introduction to Deep Learning
Spring 2018

“Deep Learning” systems, typified by deep neural networks, are increasingly taking over AI tasks ranging from language understanding, speech and image recognition, and machine translation to planning, game playing, and even autonomous driving. As a result, expertise in deep learning is fast changing from an esoteric specialty into a mandatory prerequisite in many advanced academic settings, and a major advantage in the industrial job market.

In this course we will learn about the basics of deep neural networks and their applications to various AI tasks. By the end of the course, students are expected to have significant familiarity with the subject and to be able to apply deep learning to a variety of tasks. They will also be positioned to understand much of the current literature on the topic and to extend their knowledge through further study.

Instructor: Bhiksha Raj


Lecture: Monday and Wednesday, 9.00am-10.20am

Recitation: Friday, 9.00am-10.20am, Newell Simon 3002

Office hours:


  1. We will be using one of several toolkits (typically TensorFlow or PyTorch). These toolkits are largely programmed in Python, so you will need to be able to program in Python. Alternatively, you will be responsible for finding and learning a toolkit that can be programmed in a language you are comfortable with.
  2. You will need familiarity with basic calculus (differentiation, chain rule), linear algebra, and basic probability.
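To give a concrete sense of how these prerequisites combine, here is a minimal sketch (illustrative only, not course material) of the kind of Python the assignments assume: a single sigmoid neuron fit by gradient descent, with the gradient obtained via the chain rule. The toy data and all names here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: label is 1 when x0 + x1 > 1, else 0.
X = rng.uniform(0, 1, size=(200, 2))
y = (X.sum(axis=1) > 1.0).astype(float)

w = np.zeros(2)   # weights
b = 0.0           # bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(500):
    p = sigmoid(X @ w + b)        # forward pass
    # Chain rule for cross-entropy through a sigmoid: d(loss)/dz = p - y.
    grad_z = (p - y) / len(y)
    w -= lr * (X.T @ grad_z)      # gradient step on the weights
    b -= lr * grad_z.sum()        # gradient step on the bias

accuracy = ((sigmoid(X @ w + b) > 0.5) == (y > 0.5)).mean()
```

If you can follow both the calculus step (where `p - y` comes from) and the NumPy vectorization, you have the background the course expects.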


This course is worth 12 units.

Course Work


Grading will be based on weekly quizzes, homework assignments and a final project.

There will be five assignments in all. Note that assignments 4 and 5 are released simultaneously and will also be due on the same date.

Quizzes: 13 quizzes (the bottom 3 quiz scores will be dropped); contributes 25% of the grade
Assignments: 5 assignments; contributes 50% of the grade
Project: 1 project; contributes 25% of the grade


The course will not follow a specific book, but will draw from a number of sources. We list relevant books at the end of this page. We will also put up links to relevant reading material for each class. Students are expected to familiarize themselves with the material before the class. The readings will sometimes be arcane and difficult to understand; if you find them so, do not worry: we will present simpler explanations in class.

Discussion board: Piazza

We will use Piazza for discussions. Here is the link. Please sign up.

Wiki page

We have created an experimental wiki explaining the types of neural networks in use today. Here is the link.

You can also find a nice catalog of models that are current in the literature here. We expect that by the end of the course you will be in a position to interpret, if not fully understand, many of the architectures on the wiki and in the catalog.


Kaggle

Kaggle is a popular data science platform where visitors compete to produce the best model for learning from or analyzing a data set.

For assignments 4 and 5 you will be submitting your evaluation results to a Kaggle leaderboard.

Academic Integrity

You are expected to comply with the University Policy on Academic Integrity and Plagiarism.
  • You are allowed to talk with and work with other students on homework assignments.
  • You may share ideas but not code; you must submit your own code.
Your course instructor reserves the right to determine an appropriate penalty based on the act of academic dishonesty that occurs. Violations of the university policy can result in severe penalties, including failing this course and possible expulsion from Carnegie Mellon University. If you have any questions about this policy or about any work you are doing in the course, please feel free to contact your instructor for help.

Tentative Schedule

Lecture Start date Topics Lecture notes/Slides Additional readings, if any Quizzes/Assignments
1 January 17
  • Introduction to deep learning
  • Course logistics
  • History and cognitive basis of neural computation.
  • The perceptron / multi-layer perceptron
slides Quiz 1 (Due 20th)
2 January 22
  • The neural net as a universal approximator
3 January 24
  • Training a neural network
  • Perceptron learning rule
  • Empirical Risk Minimization
  • Optimization by gradient descent
slides Assignment 1
Quiz 2 (Due 27th)
4 January 29
  • Back propagation
  • Calculus of back propagation
5 January 31
  • Convergence in neural networks
  • Rates of convergence
  • Loss surfaces
  • Learning rates, and optimization methods
  • RMSProp, Adagrad, Momentum
slides Quiz 3 (Due Feb 3rd)
6 February 5
  • Stochastic gradient descent
  • Acceleration
  • Overfitting and regularization
  • Tricks of the trade:
    • Choosing a divergence (loss) function
    • Batch normalization
    • Dropout
7 February 7 Guest Lecture (Scott Fahlman) Quiz 4 (Due 10th)
8 February 12
  • Convolutional Neural Networks (CNNs)
  • Weights as templates
  • Translation invariance
  • Training with shared parameters
  • Arriving at the convolutional model
slides Assignment 2
9 February 14
  • Models of vision
  • Neocognitron
  • Mathematical details of CNNs
  • Alexnet, Inception, VGG
slides Quiz 5 (Due 17th)
10 February 19
  • Recurrent Neural Networks (RNNs)
  • Modeling series
  • Back propagation through time
  • Bidirectional RNNs
11 February 21
  • Stability
  • Exploding/vanishing gradients
  • Long Short-Term Memory Units (LSTMs) and variants
  • Resnets
slides Quiz 6 (Due 24th)
12 February 26
  • Loss functions for recurrent networks
  • Connectionist Temporal Classification (CTC)
  • Sequence prediction
13 February 28
  • What do networks represent
  • Autoencoders and dimensionality reduction
  • Representation learning
slides Assignment 3
Quiz 7 (Due 3rd)
14 March 5
  • Variational Autoencoders (VAEs) Part 1
  • Factor Analysis
  • Expectation Maximization and Variational Inference
15 March 7
  • Variational Autoencoders (VAEs) Part 2
slides Quiz 8 (Due 10th)
16 March 12 Spring break
17 March 14 Spring break
18 March 19 NNets in Speech Recognition, Guest Lecture (Stern)
19 March 21
  • Generative Adversarial Networks (GANs) Part 1
slides Quiz 9 (Due 24th)
Assignments 4 and 5
20 March 26
  • Generative Adversarial Networks (GANs) Part 2
21 March 28
  • Hopfield Networks
  • Energy functions
slides Quiz 10 (Due 31st)
22 April 2
  • Boltzmann Machines
  • Learning in Boltzmann machines
23 April 4
  • Restricted Boltzmann Machines
  • Deep Boltzmann Machines
slides Quiz 11 (Due 7th)
24 April 9 Guest lecture (Graham Neubig)
25 April 11
  • Reinforcement Learning 1
Quiz 12 (Due 14th)
26 April 16
  • Reinforcement Learning 2
27 April 18
  • Reinforcement Learning 3
Quiz 13 (Due 21st)
28 April 23
  • Q Learning
  • Deep Q Learning
29 April 25 Guest Lecture (TBD)
30 April 30
  • Multi-task and multi-label learning, transfer learning with NNets
31 May 2
  • Newer models and trends
  • Review

Tentative Schedule of Recitations

Recitation Start date Topics
1 January 19 Amazon Web Services (AWS)
2 January 26 Practical Deep Learning in Python
3 February 2 Optimization methods
4 February 9 Tuning methods
5 February 16 TBD
6 February 23 TBD
7 March 2 RNNs and LSTMs
8 March 9 TBD
9 March 16 TBD
10 March 23 Practical implementation of VAEs
11 March 30 Practical implementation of GANs
12 April 6 Practice with BMs and RBMs
13 April 13 TBD
14 April 20 TBD
15 April 27 TBD

Documentation and Tools


Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Online book, 2017.
Neural Networks and Deep Learning, by Michael Nielsen. Online book, 2016.
Deep Learning with Python, by J. Brownlee.
Parallel Distributed Processing, by Rumelhart and McClelland. Out of print, 1986.