00:16:23	Reshmi Ghosh (TA):	Side note:Just a reminder if anyone hasn’t done the hw2p2 sample submission on Kaggle yet, please do it soon as the early deadline is tonight 11:59pm
00:22:08	Anon. Softmax:	So the “stride” refers to the step size of shifting the filter/mask?
00:22:21	Reshmi Ghosh (TA):	yes
00:23:00	Anon. Softmax:	And the value of stride directly determines the shape of convolved features?
00:23:02	Reshmi Ghosh (TA):	By the way just a quick terminology it is also called as kernel
00:23:27	Anon. Actor-Critic:	What is kernel?
00:23:38	Reshmi Ghosh (TA):	The filter itself
00:23:38	Anon. MiniMax:	The filter
00:23:44	Anon. Actor-Critic:	Thx!
00:23:51	Reshmi Ghosh (TA):	Yiwei yes, the stride impacts output size
00:24:30	Reshmi Ghosh (TA):	There is an output size formula that should be covered / has been covered. The output size is based on the width/heigh of the image, the padding size, and the stride
00:24:34	Anon. YOLOv2:	seems I have mic issues today. I wanted to find out the advantages of striding at a width of 2+. Besides reducing the number of neurons per layer as we increase in depth, is there any other advantage?
00:24:44	Reshmi Ghosh (TA):	I will ask it for yoy
00:24:48	Reshmi Ghosh (TA):	Don’t worry
00:24:54	Anon. YOLOv2:	Thank you Reshmi
00:25:07	Reshmi Ghosh (TA):	But an high level overview, it depends on the task
00:25:14	Reshmi Ghosh (TA):	The stride is an hyperparameter
00:25:49	Anon. Softmax:	@Reshmi Thank you. What is the padding size?
00:27:16	Anon. YOLOv2:	Got it.
00:27:22	Reshmi Ghosh (TA):	Padding is included when the filter needs to move by a stride > 1 and the input dimensions are limited, because of which the filter will move beyond the input size
00:27:23	Anon. Neurotransmitter:	why the index start from 1 instead of 0?
00:27:51	Anon. Connectionist:	just convention, you can think it as 0
00:28:02	Reshmi Ghosh (TA):	Yes Eason is right
00:28:05	Anon. Neurotransmitter:	thanks
00:28:38	Anon. GRU:	so each layer will only output 1 map?
00:28:56	Reshmi Ghosh (TA):	Depends on channels
00:29:04	Reshmi Ghosh (TA):	Input channels**
00:29:31	Reshmi Ghosh (TA):	And also the kernel channels
00:29:43	Reshmi Ghosh (TA):	Did you understand the concept of channels?
00:30:08	Reshmi Ghosh (TA):	Folks this is the output formula I was talking about
00:32:56	Anon. Connectionist:	what happens if N-M /S is not an integer
00:33:06	Anon. D33p_M1nd:	you force it to be
00:33:24	Anon. D33p_M1nd:	ignore the remainder
00:33:37	Anon. Connectionist:	so 2.5 -> 2, 1.5 -> 1 right?
00:33:42	Anon. D33p_M1nd:	yes
00:33:46	Anon. Connectionist:	cool
00:35:41	Anon. YOLOv5:	Why are we using odd filter size ? I have only seen 3, 5, ...
00:36:47	Reshmi Ghosh (TA):	It can be anything actually
00:37:02	Anon. Softmax:	Would we lose some of the inputs if the stride is > 1 and the filter is forced to stay within edges in some cases?
00:37:21	Anon. Actor-Critic:	Where is padding used again? Here we’re doing pooling means we want to make the size of the matrix smaller right? Why’d we want to pad things?
00:37:35	Reshmi Ghosh (TA):	It depends on the task, if you use a stride greater than 1 for small images, then yes, but for large sized images no
00:37:44	Anon. YOLOv2:	padding could
00:37:48	Reshmi Ghosh (TA):	Think of some pixels in an image which do not have any information
00:37:54	Reshmi Ghosh (TA):	They are just noise
00:37:56	Reshmi Ghosh (TA):	Can be ignored
00:38:15	Reshmi Ghosh (TA):	Pooling is a separate layer
00:38:41	Reshmi Ghosh (TA):	In the convolution layer you do padding @Qiyun
00:39:04	Anon. Actor-Critic:	Icic thx!
00:39:52	Anon. Softmax:	By downsampling do we mean the  # of convolved features < # of inputs?
00:39:54	Anon. VC Dimension:	why is this ceiling?
00:40:02	Anon. Variance:	avg
00:40:05	Anon. GRU:	min?
00:40:08	Anon. Weight Decay:	mean
00:40:37	Reshmi Ghosh (TA):	Number of convolved features (output size) can anyways be smaller than input
00:40:57	Reshmi Ghosh (TA):	Max pooling is further reducing these “parameters”
00:41:03	Anon. Connectionist:	what is the puprpose of doing pooling? Just for saving computation?
00:41:29	Reshmi Ghosh (TA):	Yep reducing complexity
00:41:32	Anon. Softmax:	So we usually use zero padding to make the number of convolved features = number of inputs and then do the pooling to reduce dimension?
00:41:54	Reshmi Ghosh (TA):	Uhh no
00:42:08	Reshmi Ghosh (TA):	Suppose you have a input of size 6X6
00:42:20	Reshmi Ghosh (TA):	And filter size of 2x2
00:42:44	Reshmi Ghosh (TA):	And you are moving the filter with stride = 3
00:42:49	Reshmi Ghosh (TA):	What will happen?
00:42:51	Anon. Pooling:	doesn't pooling lead to loss of information, or there is a hyperparameter that controls how much downsampling the pooling layer does?
00:42:56	Anon. YOLOv2:	nice!
00:43:32	Anon. GRU:	so when we do this we are basically making a filter that does max pooling?
00:43:55	Anon. GRU:	if we don't use explicit down layers
00:44:12	Reshmi Ghosh (TA):	Pooling is a separate layer
00:44:16	Anon. Softmax:	Is pooling layer just an alias for downsampling layer?
00:44:21	Reshmi Ghosh (TA):	That is added after convD layer
00:44:42	Reshmi Ghosh (TA):	Ya pooling is a downsampling technique
00:46:58	Reshmi Ghosh (TA):	Please raise hands
00:50:23	Anon. GRU:	Relu
00:50:23	Anon. YOLOv2:	That's like scaling right?
00:50:25	Anon. Fourier Transform:	full mlp?
00:50:26	Anon. Scheduler:	linear
00:51:08	Anon. Softmax:	Dense linear layer?
00:52:27	Reshmi Ghosh (TA):	10 seconds
00:52:45	Anon. Softmax:	Does the context in HW1P2 count as a kind of distributed scanning?
00:52:56	Reshmi Ghosh (TA):	You mean hw2p2?
00:54:12	Anon. Softmax:	No, I mean the frames you added to both sides for HW1P2
00:54:34	Reshmi Ghosh (TA):	no
00:54:38	Reshmi Ghosh (TA):	Context was different
00:55:03	Reshmi Ghosh (TA):	Distributed scanning related more to how you share parameters
00:55:15	Anon. Connectionist:	is it possible that the depth of filter is not 3?
00:55:28	Anon. Softmax:	I see
00:55:32	Anon. Dendrite:	why do we need K_1 filters?
00:55:46	Reshmi Ghosh (TA):	Number of filters is dependent on task
00:55:54	Reshmi Ghosh (TA):	It can be 1, 2, 4, 5 anything
00:56:56	Anon. Dendrite:	I think in the past we just had a single filter which weights were all 1 that scanned the input rather than K_1 right.
00:57:36	Reshmi Ghosh (TA):	That would be just an example
00:58:53	Anon. Actor-Critic:	One filter filters the whole image right?
00:59:36	Reshmi Ghosh (TA):	One filter “convolves “/ scans around the entire image yes.
00:59:38	Anon. Softmax:	Do we have something equivalent to a pooling layer in distributed scanning MLP?
01:00:32	Reshmi Ghosh (TA):	But inputs also have channels remember (RGB channel = 3 channels?))
01:00:39	Anon. MiniMax:	So is the number of channels the number of outputs of that particular layer?
01:00:41	Anon. Dendrite:	ooh I get it now thanks
01:00:42	Reshmi Ghosh (TA):	It is okay to be confused about all these
01:00:45	Reshmi Ghosh (TA):	Keep asking questions
01:00:57	Reshmi Ghosh (TA):	I will try my best to answer and also poke bhiksha
01:01:23	Reshmi Ghosh (TA):	@vaidehi, number of filter channels == number of output channe;s
01:01:34	Anon. MiniMax:	Cool, thanks!
01:03:20	Anon. Visual Cortex:	Slightly unrelated - are we supposed to put an activation after a convolution?
01:03:38	Anon. Connectionist:	yes
01:03:41	Reshmi Ghosh (TA):	yes
01:03:47	Anon. Visual Cortex:	thanks
01:04:37	Anon. Softmax:	So the model should look like Convolution -> Activation -> Max Pooling?
01:04:57	Reshmi Ghosh (TA):	Pooling is a choice
01:05:04	Reshmi Ghosh (TA):	You may or may not implement
01:05:13	Anon. Softmax:	Thank you
01:05:25	Anon. Visual Cortex:	300
01:05:31	Anon. Visual Cortex:	oops
01:06:19	Anon. Softmax:	3?
01:07:34	Anon. Softmax:	But isn’t it the case that we can also write the filter in this “cubic” shape?
01:08:22	Anon. Connectionist:	cannot understand we need at least 3
01:08:31	Anon. Connectionist:	the filter is a cube right?
01:08:35	Anon. D33p_M1nd:	to keep same number of points
01:09:16	Reshmi Ghosh (TA):	Eason, each filter is 2d because image is 2d
01:09:27	Reshmi Ghosh (TA):	Stacked over each other, it is a cube
01:09:31	Reshmi Ghosh (TA):	Does it make sense?
01:10:40	Anon. Actor-Critic:	I think the confusion originated from the slides where it says filter size LxLx3
01:11:24	Anon. Softmax:	So even though each filter takes input from all of the R,G,B channels it only produces a single output?
01:11:41	Reshmi Ghosh (TA):	noooo
01:12:37	Anon. Epoch:	So there’s one LxL filter for each of RGB channels, so LxLx3?
01:12:55	Reshmi Ghosh (TA):	Just think of in this example
01:13:03	Reshmi Ghosh (TA):	RGB is three channel in input
01:13:14	Reshmi Ghosh (TA):	And each filter 2D is scanning each channel
01:14:02	Reshmi Ghosh (TA):	But number of filter channels can be greater than 3 as well
01:14:05	Reshmi Ghosh (TA):	This is an example
01:15:58	Reshmi Ghosh (TA):	https://towardsdatascience.com/a-beginners-guide-to-convolutional-neural-networks-cnns-14649dbddce8
01:17:07	Anon. Actor-Critic:	Just to clarify, I believe in this example, one filter has size 100 x 100 x 3 (where 3 is the depth), one filter is scanning the whole pictures, and the *output* of one filter is 2d, and we can do downsampling afterwards, etc.
01:17:31	Anon. Dendrite:	i don't see a poll?
01:17:39	Reshmi Ghosh (TA):	10 seconds
01:17:52	Anon. Dendrite:	hmmm
01:18:00	Reshmi Ghosh (TA):	sorry
01:18:08	Reshmi Ghosh (TA):	Bhiksha has questions on one note
01:18:11	Anon. Dendrite:	I think the poll didn't appear for me
01:18:22	Reshmi Ghosh (TA):	But rn If I tell him to pull out the questions
01:18:35	Reshmi Ghosh (TA):	He might get more frustrated
01:18:42	Anon. GRU:	th poll also did not appear for me
01:18:47	Reshmi Ghosh (TA):	We will share the poll questions later on
01:18:56	Anon. Dendrite:	Like the zoom poll not the questions
01:19:04	Anon. YOLOv2:	thanks Reshmi.
01:19:49	Reshmi Ghosh (TA):	I know
01:19:54	Reshmi Ghosh (TA):	Zoom is weird
01:20:03	Reshmi Ghosh (TA):	But we also show questions in the background
01:20:10	Anon. Dendrite:	what should I do for attendance
01:20:14	Reshmi Ghosh (TA):	Which is One note:P
01:20:21	Anon. Dropout:	just to be clear, does pooling cause a loss of information?
01:20:21	Reshmi Ghosh (TA):	You have answered one of the polls right?
01:20:32	Anon. Dendrite:	right, the first one
01:20:34	Anon. Leakage:	pad
01:20:35	Anon. Pooling:	pad
01:21:27	Anon. Leakage:	oh ok
01:21:30	Anon. GRU:	can we use the same filter multiple times to downscale later? or would there be no point
01:21:48	Anon. Pooling:	16
01:38:34	Reshmi Ghosh (TA):	10 seconds
01:39:29	Anon. Actor-Critic:	Thank you!