11-785 Introduction to Deep Learning
Fall 2017

“Deep Learning” systems, typified by deep neural networks, are increasingly taking over all AI tasks, ranging from language understanding, speech and image recognition, to machine translation, planning, and even game playing and autonomous driving. As a result, expertise in deep learning is fast changing from an esoteric specialty to a mandatory prerequisite in many advanced academic settings, and a large advantage in the industrial job market.

In this course we will learn the basics of deep neural networks and their applications to various AI tasks. By the end of the course, students are expected to have significant familiarity with the subject and to be able to apply deep networks to a variety of tasks. They will also be positioned to understand much of the current literature on the topic and extend their knowledge through further study.

Instructor: Bhiksha Raj


Time: Mondays, Thursdays, 9:00am-10:20am

Office hours:


Prerequisites

  1. We will be using one of several toolkits, which are largely programmed in Python or Lua. You will need to be able to program in at least one of these languages. Alternately, you will be responsible for finding and learning a toolkit that requires programming in a language you are comfortable with.
  2. You will need familiarity with basic calculus (differentiation, chain rule), linear algebra, and basic probability.


This course is worth 12 units.

Course Work


Grading will be based on weekly quizzes, homework assignments and a final project. There will be six assignments in all.

Quizzes: 12 or 13; total contribution to grade: 25%
Assignments: 6; total contribution to grade: 50%
Project: total contribution to grade: 25%


Deep learning is a relatively new, fast-developing topic, and there are no standard textbooks on the subject that cover the state of the art, although there are several excellent tutorial books that one can refer to. The topics in this course are collected from a variety of sources, including recent papers. As a result, we do not specify a single standard textbook. However, we list a number of useful books at the end of this page, which we greatly encourage students to read, as they will provide much of the background for the course. We will also put up links to relevant reading material for each class. Students are expected to familiarize themselves with the material before the class. The readings will sometimes be arcane and difficult to understand; if so, do not worry: we will present simpler explanations in class.

Discussion board: Piazza

We will use Piazza for discussions. Here is the link. Please sign up.

Academic Integrity

You are expected to comply with the University Policy on Academic Integrity and Plagiarism.
  • You are allowed to talk with and work with other students on homework assignments
  • You can share ideas but not code; you must submit your own code
Your course instructor reserves the right to determine an appropriate penalty for any act of academic dishonesty that occurs. Violations of the university policy can result in severe penalties, including failing this course and possible expulsion from Carnegie Mellon University. If you have any questions about this policy and any work you are doing in the course, please feel free to contact your instructor for help.

Tentative Schedule

Lecture Start date Topics Lecture notes/Slides Additional readings, if any Quizzes/Assignments
1 August 28
  • Introduction to deep learning
  • Course logistics
  • The perceptron/multi-layer perceptron
  • Hebbian learning
2 August 30
  • The neural net as a universal approximator
3 September 6
  • Training a neural network
  • Perceptron learning rule
  • Empirical Risk Minimization
  • Optimization by gradient descent
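To give a flavor of the optimization topic above, here is a minimal sketch of gradient descent on a one-dimensional quadratic loss. The loss function, target value, and learning rate are arbitrary illustrative choices, not course material:

```python
# A minimal sketch of gradient descent on the one-dimensional quadratic
# loss L(w) = (w - 3)^2, whose gradient is dL/dw = 2(w - 3).
# The minimizer (w = 3) and the learning rate are illustrative choices.

def grad(w):
    """Gradient of L(w) = (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0            # initial parameter
lr = 0.1           # learning rate (step size)
for _ in range(100):
    w = w - lr * grad(w)   # step opposite the gradient

print(round(w, 4))  # prints 3.0, the minimizer of the loss
```

Each step multiplies the distance to the minimizer by (1 - 2·lr), so the iterate converges geometrically for this choice of learning rate; too large a step size would instead diverge, a point the lecture on convergence takes up in detail.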
4 September 11
  • Back propagation
  • Calculus of back propagation
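As a small illustration of the calculus involved, the following sketch backpropagates through a two-layer scalar "network" y = w2 · sigmoid(w1 · x) with squared-error loss, applying the chain rule one layer at a time. All values (input, target, initial weights) are arbitrary illustrative choices:

```python
# A hedged sketch of backpropagation on a scalar two-layer model:
#   h = sigmoid(w1 * x),  y = w2 * h,  L = 0.5 * (y - target)^2
# The numbers below are illustrative, not from the course.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.0, 0.5
w1, w2 = 0.6, -0.4

# Forward pass
a = w1 * x
h = sigmoid(a)
y = w2 * h
loss = 0.5 * (y - target) ** 2

# Backward pass: chain rule, one layer at a time
dL_dy = y - target          # dL/dy
dL_dw2 = dL_dy * h          # dL/dw2 = dL/dy * dy/dw2
dL_dh = dL_dy * w2          # dL/dh  = dL/dy * dy/dh
dL_da = dL_dh * h * (1.0 - h)  # sigmoid'(a) = h * (1 - h)
dL_dw1 = dL_da * x          # dL/dw1 = dL/da * da/dw1
```

The same layer-by-layer pattern, with vectors and Jacobians in place of scalars, is exactly what deep learning toolkits automate.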
5 September 13
  • Convergence in neural networks
  • Rates of convergence
  • Loss surfaces
  • Learning rate and data normalization
  • RMSProp, Adagrad, Momentum
slides Assignment 1
6 September 18
  • Stochastic gradient descent
  • Acceleration
  • Overfitting and regularization
  • Tricks of the trade:
    • Choosing a divergence (loss) function
    • Batch normalization
    • Dropout
7 September 20
  • Convolutional Neural Networks (CNNs)
  • Weights as templates
  • Translation invariance
  • Training with shared parameters
  • Arriving at the convolutional model
Goodfellow Chapter 9
8 September 25
  • TBD (Mike Tarr)
9 September 27
  • Cascade Correlation (Scott Fahlman)
10 October 2
  • Models of vision
  • Neocognitron
  • Mathematical details of CNNs
  • Alexnet, Inception, VGG
11 October 4
  • Recurrent Neural Networks (RNNs)
  • Modeling series
  • Back propagation through time
  • Bidirectional RNNs
Goodfellow Chapter 10
12 October 9
  • More about recurrence
  • Exploding/vanishing gradients
  • Long Short-Term Memory Units (LSTMs)
  • Examples
13 October 11
  • What do networks represent
  • Autoencoders and dimensionality reduction
  • Representation learning
14 October 16
  • Hopfield Networks
  • Energy functions
  • Boltzmann Machines
15 October 18
  • Restricted Boltzmann Machines
  • Deep Boltzmann Machines
16 October 23
  • Variational Autoencoders (VAEs)
  • Factor Analysis
  • Expectation Maximization and Variational Inference
17 October 25
  • Generative Adversarial Networks (GANs)
18 October 30
  • Sequence-to-sequence modeling
19 November 1
  • TBD (Pulkit Agarwal)
20 November 6
  • Deep nets for speech recognition (Rich Stern)
21 November 8
  • Transfer learning, multi-task learning
22 November 13
  • TBD (Graham Neubig)
23 November 15
  • Reinforcement Learning (part 1)
24 November 20
  • Reinforcement Learning (part 2)
25 November 27
  • Reinforcement Learning (part 3)
26 November 29
  • Reinforcement Learning (part 4)
27 December 4
  • Newer models and trends. Memory, tape, and Turing machines
28 December 6
  • Review

Documentation and Tools


Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Online book, 2017.
Neural Networks and Deep Learning, by Michael Nielsen. Online book, 2016.
Deep Learning with Python, by J. Brownlee.
Parallel Distributed Processing, by Rumelhart and McClelland. Out of print, 1986.

A nice catalog of the various types of neural network models that are current in the literature can be found here. We expect that by the end of the course you will be in a position to interpret, if not fully understand, many of these architectures.