11-785 Introduction to Deep Learning
Spring 2018

If you're looking for the webpage for Fall 2018, click here.

“Deep Learning” systems, typified by deep neural networks, are increasingly taking over all AI tasks, ranging from language understanding, speech, and image recognition to machine translation, planning, and even game playing and autonomous driving. As a result, expertise in deep learning is fast changing from an esoteric desirable to a mandatory prerequisite in many advanced academic settings, and a large advantage in the industrial job market.

In this course we will learn about the basics of deep neural networks, and their applications to various AI tasks. By the end of the course, students are expected to have significant familiarity with the subject and to be able to apply it to a variety of tasks. They will also be positioned to understand much of the current literature on the topic and extend their knowledge through further study.

Instructor: Bhiksha Raj


Lecture: Monday and Wednesday, 9.00am-10.20am

Location: Porter Hall 125C

Recitation: Friday, 9.00am-10.20am, Location: GHC 4307

Office hours:


Prerequisites

  1. We will be using one of several toolkits (the primary toolkit for recitations/instruction is PyTorch). The toolkits are largely programmed in Python, so you will need to be able to program in Python. Alternatively, you will be responsible for finding and learning a toolkit that requires programming in a language you are comfortable with.
  2. You will need familiarity with basic calculus (differentiation, chain rule), linear algebra and basic probability.
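As a rough gauge of the expected background, the following minimal sketch (plain Python, no toolkit; the function and step size are purely illustrative) applies the chain rule to a composed function and checks the result with a finite difference:

```python
# Differentiate f(x) = (3x + 1)^2 analytically via the chain rule
# and compare against a numerical (central-difference) estimate.

def f(x):
    return (3 * x + 1) ** 2

def df_chain(x):
    # Chain rule: d/dx u^2 = 2u * du/dx, with u = 3x + 1 and du/dx = 3
    return 2 * (3 * x + 1) * 3

def df_numeric(x, h=1e-6):
    # Central difference approximation of the derivative
    return (f(x + h) - f(x - h)) / (2 * h)

print(df_chain(2.0))             # 42.0
print(round(df_numeric(2.0), 3)) # 42.0
```

If this level of calculus and Python is comfortable, you meet the stated prerequisites.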


This course is worth 12 units.

Course Work


Grading will be based on weekly quizzes, homework assignments and a final project.

There will be five assignments in all. Note that assignments 4 and 5 are released simultaneously. They will also be due on the same date.

  • Quizzes: 13 quizzes (bottom 3 quiz scores dropped), 25% of grade
  • Assignments: 5 assignments, 50% of grade
  • Project: 1 project, 25% of grade


The course will not follow a specific book, but will draw from a number of sources. We list relevant books at the end of this page. We will also put up links to relevant reading material for each class. Students are expected to familiarize themselves with the material before the class. The readings will sometimes be arcane and difficult to understand; if so, do not worry, as we will present simpler explanations in class.

Discussion board: Piazza

We will use Piazza for discussions. Here is the link. Please sign up.

Wiki page

We have created an experimental wiki explaining the types of neural networks in use today. Here is the link.

You can also find a nice catalog of models that are current in the literature here. We expect that you will be in a position to interpret, if not fully understand, many of the architectures on the wiki and the catalog by the end of the course.


Kaggle

Kaggle is a popular data science platform where visitors compete to produce the best model for learning or analyzing a data set.

For assignments 4 and 5 you will be submitting your evaluation results to a Kaggle leaderboard.

Academic Integrity

You are expected to comply with the University Policy on Academic Integrity and Plagiarism.
  • You may discuss and work with other students on homework assignments.
  • You may share ideas but not code; you must submit your own code.
Your course instructor reserves the right to determine an appropriate penalty based on the severity of the academic-integrity violation that occurs. Violations of the university policy can result in severe penalties, including failing this course and possible expulsion from Carnegie Mellon University. If you have any questions about this policy and any work you are doing in the course, please feel free to contact your instructor for help.

Tentative Schedule

Lecture | Start date | Topics | Lecture notes/Slides | Additional readings, if any | Quizzes/Assignments
1 January 17
  • Introduction to deep learning
  • Course logistics
  • History and cognitive basis of neural computation.
  • The perceptron / multi-layer perceptron
Quiz 1 (Due 20th)
2 January 22
  • The neural net as a universal approximator
3 January 24
  • Training a neural network
  • Perceptron learning rule
  • Empirical Risk Minimization
  • Optimization by gradient descent
Assignment 1
Quiz 2 (Due 27th)
4 January 29
  • Backpropagation
  • Calculus of backpropagation
See the note about the video below.
  • Rumelhart, Hinton and Williams, 1986
  • Note: We re-recorded the introduction for the online section after the lecture. To see the slides in order, watch the last few minutes of the video and then watch the video from the beginning. We will edit the video and upload soon.
5 January 31
  • Convergence in neural networks
  • Rates of convergence
  • Loss surfaces
  • Learning rates, and optimization methods
  • RMSProp, Adagrad, Momentum
Quiz 3 (Due Feb 3rd)
6 February 5
  • Stochastic gradient descent
  • Acceleration
  • Overfitting and regularization
  • Tricks of the trade:
    • Choosing a divergence (loss) function
    • Batch normalization
    • Dropout
7 February 7 Guest Lecture (Scott Fahlman) slides Quiz 4 (Due 11th)
8 February 12
  • Optimization continued
Assignment 1 due
Assignment 2
9 February 14
  • Convolutional Neural Networks (CNNs)
  • Weights as templates
  • Translation invariance
  • Training with shared parameters
  • Arriving at the convolutional model
Quiz 5 (Due 17th)
10 February 19
  • Models of vision
  • Neocognitron
  • Mathematical details of CNNs
  • Alexnet, Inception, VGG
11 February 21
  • Recurrent Neural Networks (RNNs)
  • Modeling series
  • Backpropagation through time
  • Bidirectional RNNs
Quiz 6 (Due 24th)
12 February 26
  • Stability
  • Exploding/vanishing gradients
  • Long Short-Term Memory Units (LSTMs) and variants
  • Resnets
13 February 28
  • Loss functions for recurrent networks
  • Sequence prediction
Assignment 2 due
Assignment 3
Quiz 7 (Due 3rd)
14 March 5
  • Sequence To Sequence Methods
  • Connectionist Temporal Classification (CTC)
15 March 7
  • What do networks represent
  • Autoencoders and dimensionality reduction
  • Learning representations
Quiz 8 (Due 10th)
-- March 12 Spring break
-- March 14 Spring break
-- Extra (Lecture 14, part 2) March 17
(during spring break)
Sequence-to-sequence models, Attention models, examples from speech and language slides
16 March 19 Variational Autoencoders (VAEs) slides
17 March 21
  • Generative Adversarial Networks (GANs) Part 1
Assignment 3 due
Quiz 9 (Due 24th)
Assignments 4 and 5
18 March 26
  • Generative Adversarial Networks (GANs) Part 2
19 March 28
  • Guest Lecture: Gerald Friedland
Quiz 10 (Due 31st)
20 April 2
  • Hopfield Networks
  • Energy functions
21 April 4
  • Training Hopfield Networks
  • Stochastic Hopfield Networks
Quiz 11 (Due 7th)
22 April 9
  • Restricted Boltzmann Machines
  • Deep Boltzmann Machines
23 April 11
  • Reinforcement Learning 1
video Quiz 12 (Due 14th)
24 April 16
  • Reinforcement Learning 2
25 April 18
  • Reinforcement Learning 3
Quiz 13 (Due 21st)
26 April 23 Guest lecture (Graham Neubig) Assignments 4 and 5 due
27 April 25 Guest Lecture (Byron Yu)
28 April 30
  • Q Learning
  • Deep Q Learning
29 May 2
  • Newer models and trends
  • Review
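As a small taste of the material above, the perceptron learning rule listed under Lecture 3 can be sketched in a few lines of plain Python (the AND-gate data, learning rate, and epoch count here are illustrative choices, not from the course):

```python
# Perceptron learning rule: w <- w + lr * (target - prediction) * x,
# applied per example, sketched on the AND function.

def predict(w, b, x):
    # Threshold unit: fire (1) if the weighted sum exceeds zero
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(data, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            err = target - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(and_data)
print([predict(w, b, x) for x, _ in and_data])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the rule converges to a perfect classifier within a few epochs; why it cannot do so for non-separable data is part of what the lectures cover.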

Tentative Schedule of Recitations (note: dates may shift)

Recitation | Start date | Topics | Lecture notes/Slides
1 January 19 Amazon Web Services (AWS) slides
2 January 26 Practical Deep Learning in Python slides
3 February 2 Optimization methods slides
4 February 9 Convolutional Networks slides
5 February 16 Basics of Recurrent networks slides
6 February 23 Recurrent networks 2: Loss functions, CTC slides
7 March 2 Visualization: What does the network learn slides
-- Extra (Lecture 14, part 2) March 17
(during spring break)
Sequence-to-sequence models, Attention models, examples from speech and language slides
8 March 23 Attention slides
9 March 30 Variational autoencoders slides
10 April 6 GANs video
11 April 13 Embeddings & HW Baselines video
12 April 20 Hopfield Nets, Boltzmann machines, RBMs
13 April 27 Reinforcement Learning: Deep Q nets, policy gradient methods
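Recitation 3 covers optimization methods; as a rough sketch (plain Python on a toy 1-D quadratic loss; the hyperparameter values are illustrative, not from the course), the momentum and RMSProp update rules mentioned in Lecture 5 look like:

```python
# Update rules on the toy loss L(w) = (w - 3)^2, with gradient 2 * (w - 3).

def grad(w):
    return 2 * (w - 3)

def sgd_momentum(w, steps=200, lr=0.1, beta=0.9):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)   # accumulate a "velocity" across steps
        w -= lr * v
    return w

def rmsprop(w, steps=200, lr=0.1, beta=0.9, eps=1e-8):
    s = 0.0
    for _ in range(steps):
        g = grad(w)
        s = beta * s + (1 - beta) * g * g  # running mean of squared gradients
        w -= lr * g / (s ** 0.5 + eps)     # scale the step by the gradient RMS
    return w

print(round(sgd_momentum(0.0), 3))  # 3.0
print(round(rmsprop(0.0), 2))       # settles near 3.0
```

On this convex toy problem both reach the minimum at w = 3; the lectures discuss how their behavior differs on realistic, ill-conditioned loss surfaces.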

Some ideas for projects

Documentation and Tools


Books

  • Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Online book, 2017.
  • Neural Networks and Deep Learning, by Michael Nielsen. Online book, 2016.
  • Deep Learning with Python, by J. Brownlee.
  • Parallel Distributed Processing, by Rumelhart and McClelland. Out of print, 1986.