11-785 Introduction to Deep Learning
Spring 2018

“Deep Learning” systems, typified by deep neural networks, are increasingly taking over AI tasks, ranging from language understanding, speech and image recognition, and machine translation to planning, game playing, and autonomous driving. As a result, expertise in deep learning is fast changing from an esoteric specialty into a mandatory prerequisite in many advanced academic settings, and a significant advantage in the industrial job market.

In this course we will learn about the basics of deep neural networks and their applications to various AI tasks. By the end of the course, students are expected to have significant familiarity with the subject and be able to apply these techniques to a variety of tasks. They will also be positioned to understand much of the current literature on the topic and extend their knowledge through further study.

Instructor: Bhiksha Raj


Lecture: Monday and Wednesday, 9.00am-10.20am

Location: Porter Hall 125C

Recitation: Friday, 9.00am-10.20am, Location: GHC 4307

Office hours:


  1. We will be using one of several toolkits (the primary toolkit for recitations/instruction is PyTorch). These toolkits are largely programmed in Python, so you will need to be able to program in Python. Alternatively, you will be responsible for finding and learning a toolkit that requires programming in a language you are comfortable with.
  2. You will need familiarity with basic calculus (differentiation, chain rule), linear algebra and basic probability.
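As an informal self-check of the calculus and Python background listed above, the following sketch (our own illustration, not course code) verifies the chain rule numerically for sin(x²) using a finite-difference approximation:

```python
import math

def f(u):
    # Outer function
    return math.sin(u)

def g(x):
    # Inner function
    return x ** 2

def analytic_grad(x):
    # Chain rule: d/dx sin(x^2) = cos(x^2) * 2x
    return math.cos(g(x)) * 2 * x

def numeric_grad(x, h=1e-6):
    # Central finite-difference approximation of d/dx f(g(x))
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)

x = 1.3
print(analytic_grad(x), numeric_grad(x))  # the two values should agree closely
```

If this kind of computation feels comfortable, you have the minimum background the course assumes.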


This course is worth 12 units.

Course Work


Grading will be based on weekly quizzes, homework assignments and a final project.

There will be five assignments in all. Note that assignments 4 and 5 are released simultaneously. They will also be due on the same date.

  • Quizzes: 13 quizzes (the bottom 3 quiz scores will be dropped); 25% of the grade
  • Assignments: 5 assignments; 50% of the grade
  • Project: 1 project; 25% of the grade
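To make the weighting concrete, here is an illustrative (unofficial) computation of a final grade under the stated scheme, with all component scores as percentages:

```python
def course_grade(quiz_scores, assignment_scores, project_score):
    """Combine components per the stated weights: quizzes 25%,
    assignments 50%, project 25%. The bottom 3 of the 13 quiz
    scores are dropped. Illustrative only -- not official code."""
    kept = sorted(quiz_scores)[3:]  # drop the 3 lowest quiz scores
    quiz_avg = sum(kept) / len(kept)
    asgn_avg = sum(assignment_scores) / len(assignment_scores)
    return 0.25 * quiz_avg + 0.50 * asgn_avg + 0.25 * project_score

# e.g., three missed quizzes (scored 0) are dropped and do not hurt the grade
grade = course_grade([80] * 10 + [0, 0, 0], [90] * 5, 85)
print(grade)  # 0.25*80 + 0.50*90 + 0.25*85 = 86.25
```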


The course will not follow a specific book, but will draw from a number of sources. We list relevant books at the end of this page. We will also put up links to relevant reading material for each class. Students are expected to familiarize themselves with the material before the class. The readings will sometimes be arcane and difficult to understand; if so, do not worry, we will present simpler explanations in class.

Discussion board: Piazza

We will use Piazza for discussions. Please sign up.

Wiki page

We have created an experimental wiki explaining the types of neural networks in use today.

You can also find a nice catalog of models that are current in the literature here. We expect that you will be in a position to interpret, if not fully understand, many of the architectures on the wiki and the catalog by the end of the course.


Kaggle is a popular data science platform where visitors compete to produce the best model for learning or analyzing a data set.

For assignments 4 and 5 you will be submitting your evaluation results to a Kaggle leaderboard.
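For reference, a Kaggle submission is typically a small CSV file with a header row. The sketch below shows one way to write such a file with the Python standard library; the column names "Id" and "Label" are placeholders, so use the exact names specified on the competition page:

```python
import csv

def write_submission(ids, labels, path="submission.csv"):
    """Write predictions in a typical Kaggle submission format.
    Column names here are hypothetical -- check the competition page."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label"])   # header row
        writer.writerows(zip(ids, labels)) # one (id, prediction) pair per row

write_submission([0, 1, 2], ["cat", "dog", "cat"])
```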

Academic Integrity

You are expected to comply with the University Policy on Academic Integrity and Plagiarism.
  • You are allowed to talk with and work with other students on homework assignments.
  • You can share ideas, but not code; you must submit your own code.
Your course instructor reserves the right to determine an appropriate penalty for any act of academic dishonesty that occurs. Violations of the university policy can result in severe penalties, including failing this course and possible expulsion from Carnegie Mellon University. If you have any questions about this policy and any work you are doing in the course, please feel free to contact your instructor for help.

Tentative Schedule

Lecture Start date Topics Lecture notes/Slides Additional readings, if any Quizzes/Assignments
1 January 17
  • Introduction to deep learning
  • Course logistics
  • History and cognitive basis of neural computation.
  • The perceptron / multi-layer perceptron
Quiz 1 (Due 20th)
2 January 22
  • The neural net as a universal approximator
3 January 24
  • Training a neural network
  • Perceptron learning rule
  • Empirical Risk Minimization
  • Optimization by gradient descent
Assignment 1
Quiz 2 (Due 27th)
4 January 29
  • Back propagation
  • Calculus of back propagation
See note about video to the right.
  • Rumelhart, Hinton and Williams, 1986
  • Note: We re-recorded the introduction for the online section after the lecture. To see the slides in order, watch the last few minutes of the video and then watch the video from the beginning. We will edit the video and upload soon.
5 January 31
  • Convergence in neural networks
  • Rates of convergence
  • Loss surfaces
  • Learning rates, and optimization methods
  • RMSProp, Adagrad, Momentum
video (except final minutes)
video (final minutes)
Quiz 3 (Due Feb 3rd)
6 February 5
  • Stochastic gradient descent
  • Acceleration
  • Overfitting and regularization
  • Tricks of the trade:
    • Choosing a divergence (loss) function
    • Batch normalization
    • Dropout
Video (first 5 mins)
Video (rest)
7 February 7 Guest Lecture (Scott Fahlman) slides Quiz 4 (Due 11th)
8 February 12
  • Optimization continued
Assignment 1 due
Assignment 2
9 February 14
  • Convolutional Neural Networks (CNNs)
  • Weights as templates
  • Translation invariance
  • Training with shared parameters
  • Arriving at the convolutional model
Quiz 5 (Due 17th)
10 February 19
  • Models of vision
  • Neocognitron
  • Mathematical details of CNNs
  • Alexnet, Inception, VGG
11 February 21
  • Recurrent Neural Networks (RNNs)
  • Modeling series
  • Back propagation through time
  • Bidirectional RNNs
slides Quiz 6 (Due 24th)
12 February 26
  • Stability
  • Exploding/vanishing gradients
  • Long Short-Term Memory Units (LSTMs) and variants
  • Resnets
13 February 28
  • Loss functions for recurrent networks
  • Connectionist Temporal Classification (CTC)
  • Sequence prediction
slides Assignment 2 due
Assignment 3
Quiz 7 (Due 3rd)
14 March 5
  • What do networks represent?
  • Autoencoders and dimensionality reduction
  • Representation learning
15 March 7
  • Variational Autoencoders (VAEs) Part 1
  • Factor Analysis
  • Expectation Maximization and Variational Inference
slides Quiz 8 (Due 10th)
16 March 12 Spring break
17 March 14 Spring break
18 March 19 Variational Autoencoders (VAEs) Part 2
NNets in Speech Recognition, Guest Lecture (Stern)
19 March 21
  • Generative Adversarial Networks (GANs) Part 1
slides Assignment 3 due
Quiz 9 (Due 24th)
Assignments 4 and 5
20 March 26
  • Generative Adversarial Networks (GANs) Part 2
21 March 28
  • Hopfield Networks
  • Energy functions
slides Quiz 10 (Due 31st)
22 April 2
  • Boltzmann Machines
  • Learning in Boltzmann machines
23 April 4
  • Restricted Boltzmann Machines
  • Deep Boltzmann Machines
slides Quiz 11 (Due 7th)
24 April 9
  • Reinforcement Learning 1
25 April 11
  • Reinforcement Learning 2
Quiz 12 (Due 14th)
26 April 16
  • Reinforcement Learning 3
27 April 18
  • Q Learning
  • Deep Q Learning
Quiz 13 (Due 21st)
28 April 23 Guest lecture (Graham Neubig) Assignments 4 and 5 due
29 April 25 Guest Lecture (Byron Yu)
30 April 30
  • Multi-task and multi-label learning, transfer learning with NNets
31 May 2
  • Newer models and trends
  • Review

Tentative Schedule of Recitations (note: dates may shift)

Recitation Start date Topics Lecture notes/Slides
1 January 19 Amazon Web Services (AWS) slides
2 January 26 Practical Deep Learning in Python slides
3 February 2 Optimization methods slides
4 February 9 Convolutional Networks slides
5 February 16 Basics of Recurrent networks slides
6 February 23 Recurrent networks 2: Loss functions, CTC
7 March 2 Visualization: What does the network learn?
8 March 9 Sequence-to-sequence models, Attention models, examples from speech and language
Week of 12th: Spring break
9 March 23 Variational autoencoders
10 March 30 Generative Adversarial Networks
11 April 6 Embeddings
12 April 13 Hopfield Nets, Boltzmann machines, RBMs
13 April 20 Reinforcement Learning: Deep Q nets, policy gradient methods
14 April 27 Wrap up

Some ideas for projects

Documentation and Tools


Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Online book, 2017.
Neural Networks and Deep Learning, by Michael Nielsen. Online book, 2016.
Deep Learning with Python, by J. Brownlee.
Parallel Distributed Processing, by Rumelhart and McClelland. Out of print, 1986.