08:00:35 Anon. Asynchronous Update: Were the slides posted to Piazza?
08:00:41 bgaind@andrew.cmu.edu (TA): Posting them now.
08:00:51 Anon. ICA: Vanishing and exploding gradients
08:00:55 Anon. Train: Vanishing gradients
08:00:55 Anon. Gh0stR1d3r: Backpropagation through time
08:00:56 Anon. Refractory Period: Exploding gradients
08:01:28 Anon. Refractory Period: Because of its memory
08:02:41 bgaind@andrew.cmu.edu (TA): The slides for today and last class are on Piazza.
08:02:47 Anon. Asynchronous Update: Thanks!
08:03:57 Anon. Neuron: Vanishing gradients
08:05:29 Anon. is_leaf: Parameters
08:08:29 Anon. Imagenet: Use ReLU?
08:08:57 Anon. CUDAError: Can we use skip connections between layers?
08:11:15 Anon. is_leaf: So this is equivalent to all weights = 1, but can't activations still shrink values?
08:11:48 Anon. DistilBERT: I think we are assuming the activation is the identity.
08:12:25 Anon. is_leaf: Thanks
08:12:32 Anon. Imagenet: Is this supposed to be a motivation for LSTMs?
08:13:03 bgaind@andrew.cmu.edu (TA): Kind of, yes. LSTMs try to solve the vanishing gradient problem.
08:22:55 Anon. Refractory Period: How does the carousel decide what constitutes a useful pattern to store?
08:23:09 Anon. Refractory Period: In the input
08:26:35 Anon. Refractory Period: Nevermind**
08:30:30 Anon. Derivative: Is the peephole connection kind of like the skip layers?
08:31:01 Anon. Imagenet: Yes
08:31:55 Anon. ICA: What was the intuition for coming up with LSTMs? Like how convolutional networks came from studies on the visual cortex of the brain, how did the first LSTM come about?
08:33:48 Anon. SVM: So the difference between the carousel and the “peephole” is whether C is used or not?
08:34:05 Anon. Derivative: Thank you
08:34:48 Anon. Supervised: So in these diagrams, “number of layers” = 1, right? How does the picture change with more layers?
08:40:18 Anon. Git: Should be multiple-choice
08:40:25 Anon. Markov Chain: Could only choose 1 :-/
08:40:32 Anon. Attractor: I think so
08:41:05 Anon. Activation Function: Cannot submit due to a Zoom issue...
08:41:27 bgaind@andrew.cmu.edu (TA): Don't worry. It's fine.
08:45:52 Anon. Refractory Period: Are we limited to only sigmoid functions for the forget gates, or can we use other activation functions?
08:48:22 Anon. Seq2Seq: Quiz question spotted xD
08:49:04 Anon. Leakage: autograd?
08:49:06 Anon. Git: autograd
08:49:07 Anon. Tensor: autograd
08:49:09 Anon. Derivative: autograd
08:49:10 Anon. C++: ^
08:49:41 Anon. is_leaf: We can define forward/backward for the cell
08:50:13 Anon. is_leaf: Should be faster than using autograd
09:17:00 Anon. Imagenet: Could you explain again what we mean by time synchrony?
09:34:43 Anon. is_leaf: Why must they be one-hots instead of just a number, for example?
09:35:53 Anxiang Zhang (TA): That would be a problem.
09:36:20 Anxiang Zhang (TA): If you have {h: 1, e: 2, l: 3},
09:36:34 Anxiang Zhang (TA): you are assuming e is more important than h.
09:41:27 Anon. Adam: For interpretability?
09:55:40 Anon. Derivative: Niceeee
09:55:54 Anon. CTC: Why do we randomly draw words here instead of picking the next word with the highest probability?
09:56:04 Anon. Mask-RCNN: Thanks!
09:56:22 Anon. Derivative: Thank you!
09:56:36 Anon. Git: Thanks!
10:02:09 Anon. CasCor: Thank you, Professor!
10:02:21 Anon. Directed Edge: Thanks
10:02:33 Anxiang Zhang (TA): Thanks for understanding
10:02:43 Anon. Matrix: Thanks
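
On the 08:49-08:50 exchange about skipping autograd: a minimal PyTorch sketch of what "define forward/backward for the cell" can look like. The op here is a plain tanh chosen for brevity, not the lecture's LSTM cell; an actual cell would follow the same pattern with all gate computations in forward and their hand-derived gradients in backward.

    import torch

    class MyTanh(torch.autograd.Function):
        # Illustrative op only; a hand-written cell would compute its gates
        # in forward and return their analytic gradients in backward.
        @staticmethod
        def forward(ctx, x):
            y = torch.tanh(x)
            ctx.save_for_backward(y)   # keep only what backward needs
            return y

        @staticmethod
        def backward(ctx, grad_output):
            (y,) = ctx.saved_tensors
            return grad_output * (1.0 - y * y)   # d/dx tanh(x) = 1 - tanh(x)^2

    x = torch.randn(3, requires_grad=True)
    MyTanh.apply(x).sum().backward()
    print(x.grad)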
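
On the 09:34 question and the {h: 1, e: 2, l: 3} example: a minimal NumPy sketch of why one-hot vectors are preferred over raw integer codes; the toy vocabulary is made up for illustration.

    import numpy as np

    # Toy character vocabulary (illustrative only).
    vocab = {'h': 0, 'e': 1, 'l': 2, 'o': 3}

    def one_hot(char, vocab):
        vec = np.zeros(len(vocab))
        vec[vocab[char]] = 1.0
        return vec

    # Integer codes such as {h: 1, e: 2, l: 3} impose an arbitrary ordering
    # and magnitude ('l' looks "three times" 'h'); one-hot vectors keep
    # every symbol equally distant from every other symbol.
    print(one_hot('e', vocab))   # [0. 1. 0. 0.]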
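
On the 09:55 question about randomly drawing words: a minimal sketch contrasting greedy decoding (always take the argmax) with sampling from the predicted distribution. The word list and probabilities are made up; in the lecture's setting they would come from the network's softmax over the vocabulary. Sampling yields varied generations, whereas always taking the argmax tends to repeat the same high-probability continuations.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical next-word distribution from a softmax output.
    words = ['the', 'cat', 'sat', 'on', 'mat']
    probs = np.array([0.45, 0.25, 0.15, 0.10, 0.05])

    greedy  = words[int(np.argmax(probs))]             # always 'the'
    sampled = words[rng.choice(len(words), p=probs)]   # varies per draw

    print(greedy, sampled)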