00:13:06 Anon. EC2: yash nishant
00:13:25 Anon. EC2: sorry about that :p
00:15:51 Reshmi Ghosh (TA): Oh no! Caught!!
00:21:46 Anon. Cable Theory: yes
00:21:49 Anon. ReLU: yes
00:22:14 Anon. Layer: y
00:22:16 Anon. ReLU: yi?
00:24:47 Anon. Sparse Coding: No
00:28:35 Reshmi Ghosh (TA): 10 seconds
00:33:21 Anon. Tensor: Why do we need to shift the mean back away from zero?
00:39:10 Reshmi Ghosh (TA): 10 seconds
00:50:51 Anon. Sequence: yes
00:53:28 Anon. Sequence: 3?
00:53:49 Anon. Sequence: direct / through mean / through variance
00:57:09 Anon. PaddedSequence: 1
00:58:04 Anon. TensorFlow: only to eliminate ??. Didn't catch that
00:58:10 Anon. TensorFlow: For epsilon*
00:58:17 Anon. ASGD: underflow
00:58:27 Anon. TensorFlow: Thanks, Pranav
00:59:03 Reshmi Ghosh (TA): 10 seconds
01:00:16 Anon. Sequence: 1
01:00:23 Anon. PaddedSequence: 1/B
01:02:59 Anon. PaddedSequence: 2
01:04:34 Anon. Sequence: 2 z_i * 1/B
01:05:24 Anon. Sequence: B times it
01:05:59 Anon. Args: 0
01:10:05 Anon. Cable Theory: 0
01:10:09 Anon. Kernel: 0
01:12:12 Anon. AlphaGo: So this will cause vanishing gradients, right?
01:12:58 Reshmi Ghosh (TA): What is “this”?
01:13:15 Anon. AlphaGo: If the variation in the mini-batch is very small
01:14:27 Reshmi Ghosh (TA): 10 seconds
01:21:12 Anon. ASGD: Is there a difference in performance if we do batch norm -> activation vs. activation -> batch norm?
01:21:35 Anon. Git: Yes,
01:22:20 Reshmi Ghosh (TA): Batch norm appears after activation
01:22:26 Anon. Git: The original paper used batch norm -> activation, but a later study found that the latter ordering (activation -> batch norm) gives better performance
01:22:41 Reshmi Ghosh (TA): ^
01:22:45 Anon. ASGD: Thanks!
01:31:25 Anon. Nonlinear Transform: So deeper models tend to be smoother, and wider models tend to be not as smooth?
01:32:25 Anon. YOLOv5: ^ Does that mean deeper models will not overfit as easily? But I guess in reality it’s the opposite, right?
01:32:40 Anon. Nonlinear Transform: Yeah, I also want to clarify that ^
01:33:07 Anon. Git: Remember we are talking about a deep model and a shallow model with the same number of parameters
01:33:32 Anon. YOLOv5: Yeah, that makes sense
01:33:34 Anon. YOLOv5: Thanks!
01:33:37 Anon. Nonlinear Transform: Cool!
01:33:51 Anon. Nonlinear Transform: Why is it that the deeper model tends to be smoother?
01:34:01 Anon. NLLLoss: What are the topics for the extra lecture?
01:34:01 Anon. Nonlinear Transform: Compared to, say, a wider layer
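
Several of the chat threads above touch on the batch-norm equations themselves: why epsilon is in the denominator (to avoid division by zero / underflow when the batch variance is tiny), why a learnable scale and shift move the mean back away from zero, and why backprop through batch norm has three paths (direct, through the mean, and through the variance, with d(var)/d(z_i) = 2(z_i - mu)/B). The sketch below is a minimal NumPy illustration of those points, not the lecture's own code; the function and variable names (batchnorm_forward, z, gamma, beta, eps) are my own.

```python
import numpy as np

def batchnorm_forward(z, gamma, beta, eps=1e-5):
    """Normalize a mini-batch z of shape (B, D), then scale and shift.

    eps keeps the denominator away from zero when the batch variance is
    tiny (the "underflow" point in the chat); gamma and beta let the
    network shift the mean back away from zero and rescale the variance,
    so normalization does not limit what the layer can represent.
    """
    mu = z.mean(axis=0)                      # per-feature batch mean
    var = z.var(axis=0)                      # per-feature batch variance
    z_hat = (z - mu) / np.sqrt(var + eps)    # normalized activations
    out = gamma * z_hat + beta               # learnable scale and shift
    cache = (z, mu, var, z_hat, gamma, eps)
    return out, cache

def batchnorm_backward(dout, cache):
    """Backprop through batch norm.

    Each z_i influences the output through three paths (the "3" in the
    chat): directly, through the batch mean mu, and through the batch
    variance var. Note d(var)/d(z_i) = 2 (z_i - mu) / B, which is where
    the "2 z_i * 1/B" answer comes from.
    """
    z, mu, var, z_hat, gamma, eps = cache
    B = z.shape[0]
    std_inv = 1.0 / np.sqrt(var + eps)

    dz_hat = dout * gamma
    dvar = np.sum(dz_hat * (z - mu) * -0.5 * std_inv**3, axis=0)
    dmu = np.sum(-dz_hat * std_inv, axis=0) + dvar * np.mean(-2.0 * (z - mu), axis=0)

    dz = dz_hat * std_inv                    # path 1: direct
    dz += dvar * 2.0 * (z - mu) / B          # path 2: through the variance
    dz += dmu / B                            # path 3: through the mean

    dgamma = np.sum(dout * z_hat, axis=0)
    dbeta = np.sum(dout, axis=0)
    return dz, dgamma, dbeta

# Toy check: a batch of 4 examples with 3 features each.
z = np.random.randn(4, 3)
gamma, beta = np.ones(3), np.zeros(3)
out, cache = batchnorm_forward(z, gamma, beta)
print(out.mean(axis=0), out.var(axis=0))     # roughly 0 mean, 1 variance per feature
```

This also connects to the vanishing-gradient question around 01:12: when the batch variance is very small, eps dominates the denominator and keeps the normalized values (and their gradients) from blowing up or collapsing numerically.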
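
On the ordering question around 01:21 (batch norm before or after the activation), the two variants are easy to build side by side and compare on the same data. A minimal PyTorch sketch, assuming a small MLP with placeholder layer sizes; this is not the architecture from the lecture.

```python
import torch.nn as nn

# Ordering used in the original batch-norm paper (Ioffe & Szegedy, 2015):
# linear -> batch norm -> activation.
bn_before_act = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Alternative ordering discussed in the chat: linear -> activation -> batch norm.
bn_after_act = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.BatchNorm1d(64),
    nn.Linear(64, 10),
)
```

Which ordering performs better is an empirical question; the chat notes that the original paper normalized before the activation, while a later study reported better results normalizing after it.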