00:19:56 Reshmi Ghosh (TA): Post your brilliant questions here, folks! :)
00:22:36 Anon. Retinal Ganglion: :(
00:23:49 Reshmi Ghosh (TA): Don't worry, this was just for fun
00:24:13 Reshmi Ghosh (TA): But I would pay more attention to the hidden slides for the upcoming quiz, that's your hint
00:28:29 Anon. Gh0stR1d3r: Did we start on slide 18?
00:28:39 Anon. Hessian: Hi, I got a "Failed to submit poll" error
00:28:39 Reshmi Ghosh (TA): There were hidden slides
00:29:35 Reshmi Ghosh (TA): @pinxu: It might be the case that you answered right when the poll closed
00:29:44 Reshmi Ghosh (TA): If this repeats, let us know
00:30:27 Anon. Gh0stR1d3r: Hmm, hopefully finding the number of slides isn't too hard; I've been here since 8
00:30:46 Anon. Sanger's Rule: Sorry, I am confused by the last option: don't we, in practice, run a batch through the network and then compute and propagate the loss of that batch?
00:31:41 Anon. Decoder: How do you make sure that the effect of the adjustment at one training point is local?
00:39:37 Anon. Depolarization: Is it random without replacement?
00:39:47 Anon. Instance: Can't we divide the data into small groups?
00:40:20 bgaind@andrew.cmu.edu (TA): @Jon: Yes. In a single epoch, each instance is used only once in SGD.
00:40:29 Anon. Depolarization: I see that now on the slide, thanks
00:40:35 Reshmi Ghosh (TA): Yes, without replacement
00:40:45 Anon. Retinal Ganglion: When you push the left part downward, why does the right part go upward? I thought the right part would remain the same
00:41:03 bgaind@andrew.cmu.edu (TA): @Vidhi: We do that in mini-batch gradient descent, which is different from gradient descent (all examples) and SGD (one example).
00:42:21 bgaind@andrew.cmu.edu (TA): @Daniel: I think that is just for illustration and not to be taken too literally. The idea is to compare the relative effect on the handkerchief when using a single example vs. all examples.
00:42:32 Anon. Mask-RCNN: avg divergence over training set
00:42:41 Anon. Dropout (for NNs): sum of divergence over the whole set
00:44:28 Anon. Decoder: average
00:46:10 Anon. Decoder: Average of blue arrows
00:46:12 Anon. Dropout (for NNs): average of all the blue arrows
00:49:01 Anon. Boltzmann: Doesn't that also depend on the function?
00:50:36 bgaind@andrew.cmu.edu (TA): Can you tell us what you mean by "that"?
00:51:04 Anon. ResNet18: add a tolerance?
00:51:20 Anon. Deep Blue: plateau on error
00:52:03 Reshmi Ghosh (TA): @Anurag: What were you referring to when you said "that"? Sorry, we missed it
00:54:10 Anon. Fast-RCNN: Will applying Nesterov accelerated gradient make SGD harder to converge?
00:54:40 Anon. YOLO: C
00:54:47 Anon. Git: c
00:55:34 Anon. Residual: eta less than 1?
00:58:57 Anon. Activation Function: How could the loss be infinite?
01:00:58 Anon. Egg Salad: 1/n
01:01:14 Anon. Linear Algebra: harmonic
01:01:17 Anon. DFS: 1 + 1/2 + 1/3 + … vs 1 + 1/4 + 1/9 + …
01:05:32 Anon. Decoder: Why would SGD arrive at poorer minima?
01:06:06 bgaind@andrew.cmu.edu (TA): Because the ideal (the most accurate) update would have been to use all the examples.
01:06:09 Anon. Instance: What do you mean by the online version?
01:06:48 bgaind@andrew.cmu.edu (TA): Instead, you chose just one out of thousands of training examples because you were worried about speed, so the minimum might suffer.
01:07:10 Anon. Gh0stR1d3r: @vidhi I think it means you update as you go?
01:08:23 Anon. Transformer: Var(x_i)/N
01:10:36 Anon. Decoder: So even with a sufficient number of epochs, would SGD perform worse than batch updates?
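The 1 + 1/2 + 1/3 + … vs. 1 + 1/4 + 1/9 + … exchange above refers to the standard conditions on SGD step sizes: the learning rates should sum to infinity (so the optimizer can travel arbitrarily far) while their squares sum to a finite value (so the updates eventually settle). Below is a minimal sketch on a toy least-squares problem (not the lecture's code) of SGD that uses each training instance exactly once per epoch, shuffling without replacement as confirmed above, with a harmonically decaying learning rate:

```python
# Illustrative sketch (toy data, hypothetical setup): plain SGD on least squares.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                   # toy inputs
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)     # toy targets

w = np.zeros(5)
eta0 = 0.1
for epoch in range(20):
    eta = eta0 / (epoch + 1)          # harmonic decay: sum(eta) diverges, sum(eta^2) converges
    order = rng.permutation(len(X))   # each instance used exactly once per epoch (no replacement)
    for i in order:
        grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i . w - y_i)^2
        w -= eta * grad

print("final mean squared error:", np.mean((X @ w - y) ** 2))
```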
01:13:21 Anon. Baseline: no
01:13:30 Anon. Git: no
01:13:53 Anon. Supervised: \
01:15:21 Anon. Mask-RCNN: get more samples
01:17:57 Anon. Retinal Ganglion: mini-batch :)
01:21:07 Anon. Egg Salad: Factor of b instead of n
01:24:09 Anon. Decoder: What happens in the long run?
01:24:51 Anon. Decoder: Would the SGD loss gradually drop to the same level as the other two updates?
01:25:11 Anon. Spiking NN: Is there a way to determine b?
01:28:37 Reshmi Ghosh (TA): Poll, folks
01:30:03 Anon. Hessian: Hi, I got a "Failed to submit poll" error, code 5003
01:30:14 Reshmi Ghosh (TA): Oh dear lord
01:30:24 Reshmi Ghosh (TA): I will see what the issue is after the lecture
01:30:32 Reshmi Ghosh (TA): Thank you for letting us know
01:30:35 Anon. Hessian: Thank you!
01:30:49 bgaind@andrew.cmu.edu (TA): Maybe you submitted it right when we closed it.
01:31:03 Reshmi Ghosh (TA): Yeah, I really think that is the issue
01:31:16 Reshmi Ghosh (TA): We usually keep polls open for ~50-60 seconds
01:31:33 Anon. Hessian: I submitted quite early though
01:31:38 Reshmi Ghosh (TA): Oh!
01:31:39 Anon. Actor-Critic: I encountered the same issue.
01:32:15 Anon. Sanger's Rule: It might be a random Zoom server issue for error 5003? https://fixingport.com/fix-zoom-error-code-5003
01:32:16 Reshmi Ghosh (TA): @haoxuanz we will look into it
01:32:25 Anon. Actor-Critic: Thanks!
01:35:14 Anon. ASGD: no
01:35:16 Anon. Egg Salad: no, but the second has larger variance
01:35:23 Anon. Dropout (for NNs): no
01:35:47 Anon. Baseline: no
01:46:46 Anon. Egg Salad: Why is Adadelta so good?
01:47:04 Anon. ResNet18: ^
01:47:51 bgaind@andrew.cmu.edu (TA): Please ask all remaining questions on Piazza
01:47:53 Reshmi Ghosh (TA): I am gonna post this question on Piazza, as we have already exceeded 9:20.
01:48:40 Reshmi Ghosh (TA): Yep, go ahead and post; I have started the thread already.
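The "Var(x_i)/N" and "Factor of b instead of n" answers above refer to the variance of the gradient estimate: a mini-batch gradient is the average of b per-sample gradients, so its variance around the full-batch gradient shrinks by roughly a factor of b. A small illustrative check on toy data (a hypothetical example, not from the lecture):

```python
# Illustrative sketch: empirical variance of mini-batch gradient estimates for several b.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 5))                   # toy inputs
w_true = rng.normal(size=5)
y = X @ w_true + 0.5 * rng.normal(size=10000)     # toy targets

w = np.zeros(5)                                   # evaluate all gradients at the same point
per_sample = (X @ w - y)[:, None] * X             # per-sample gradients, shape (N, 5)
full = per_sample.mean(axis=0)                    # full-batch gradient

for b in (1, 10, 100):
    # average b per-sample gradients, sampled without replacement, many times
    est = np.array([
        per_sample[rng.choice(len(X), size=b, replace=False)].mean(axis=0)
        for _ in range(2000)
    ])
    msd = np.mean(np.sum((est - full) ** 2, axis=1))
    print(f"b={b:3d}  mean squared deviation from full-batch gradient: {msd:.4f}")
```

Under these assumptions the printed deviations drop by roughly a factor of 10 each time b grows by 10, which is the trade-off the lecture draws between SGD (b = 1), mini-batch updates, and full-batch updates.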