00:10:23 Anon. Spiking NN: 😂
00:10:46 Reshmi Ghosh (TA): We need more than 1:20 to do justice to the class :P
00:12:49 Reshmi Ghosh (TA): Please lower your hands after you raise them
00:12:54 Reshmi Ghosh (TA): It shall pain :P
00:14:40 Anon. Spiking NN: reminds me of this: https://en.wikipedia.org/wiki/La_plume_de_ma_tante_(phrase)
00:15:04 Reshmi Ghosh (TA): haha
00:15:52 Anon. Spiking NN: bruh if u didn't know when the Buddha was born this looks completely legit
00:16:09 Anon. Indifferentiable: ^
00:16:21 Reshmi Ghosh (TA): Bruh, you gotta not believe anything AI generates :P
00:21:20 Reshmi Ghosh (TA): Post questions and respond on chat
00:21:41 Anon. Directed Edge: wait, was the left figure fully connected or not?
00:21:48 Anon. Directed Edge: Ok, it's fully connected
00:21:54 Reshmi Ghosh (TA): Yes they are :))
00:22:06 Anon. Spiking NN: 8am the morning after a homework was due is my guess as to attendance 🤔
00:23:17 Anon. Eta: recurrent
00:23:22 Anon. Indifferentiable: RNN
00:23:29 Anon. Axon: convolution
00:23:37 Anon. YOLOv5: Could a 1-D TDNN work?
00:23:39 Anon. Spiking NN: memory unit
00:23:40 Anon. Directed Edge: cnn
00:23:40 Anon. VC Dimension: CNN
00:26:37 Anon. Indifferentiable: lots of memory
00:30:29 Reshmi Ghosh (TA): 10 seconds
00:32:31 Anon. YOLOv5: Is this similar to transitioning between Markov states?
00:33:39 Anti (TA): In Markov states, an input does not affect the output for the rest of time
00:35:56 Reshmi Ghosh (TA): @nikhil what was your question?
00:36:02 Reshmi Ghosh (TA): I saw you had raised your hand
00:36:50 Anon. YOLOv5: Is there ever a point where we want outputs too far from the current timestep to stop influencing our current output? To refer back to the example about stock prices, wouldn't we want Thanksgiving to stop influencing our prediction long after November?
00:37:51 Akshat Gupta (TA): Ideally we want the network to learn how far back to refer
00:38:15 Reshmi Ghosh (TA): Yep ^^
00:38:38 Anti (TA): @nikhil you may actually be right for the first example, since we only look at the last output and it didn't yet have memory
00:39:09 Reshmi Ghosh (TA): Imagine a sentence: how do you want to train your model so that it can predict the next word that comes after a part of the sentence?
00:42:59 Anon. Regularization: Are these blocks identical to each other?
00:43:48 Reshmi Ghosh (TA): 10 seconds
00:45:08 Anon. Center Loss: What are the pros of introducing memory units?
00:45:19 Anon. hello_world.py: Hi, I failed to submit the poll due to a system error
00:45:53 Reshmi Ghosh (TA): Did you not see the poll?
00:45:59 Reshmi Ghosh (TA): It could be a Zoom issue
00:46:22 Anon. Loss Surface: No, I did not see the poll as well
00:46:32 Anon. hello_world.py: I pressed the submit button, but it did not submit. And when time was up there was a failure report
00:46:35 Anti (TA): @Feng-Guang Su this slide is about that question
00:47:03 Reshmi Ghosh (TA): Can you leave and rejoin, @junwei? If not, it is okay. I am noting names, no worries there
00:47:36 Reshmi Ghosh (TA): @pinxu that is weird
00:49:00 Anon. RCNN: What is the motivation behind creating a separate hidden state for the network instead of directly recursing on the output?
00:51:00 Anon. RCNN: Thank you
00:51:20 Anon. Boolean: Is there a connection between RNNs and HMMs?
00:51:54 Anon. Regularization: the block diagrams definitely look similar
00:52:03 Anon. Regularization: but the blocks here are NNs
00:52:14 Anon. YOLOv5: I think the difference is that the Markov property isn't guaranteed
00:52:41 Anon. Spiking NN: there's no Markov property at all imo
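The 00:49:00 and 00:52:14 exchange (separate hidden state vs. recursing on the output, and whether the Markov property holds) can be summarized roughly as below; the notation is an assumption in the spirit of the lecture slides, not copied from them.

```latex
% Recursing directly on the output: the past reaches Y(t) only through Y(t-1),
% i.e. a first-order, Markov-like state.
Y(t) = f\big(X(t),\, Y(t-1)\big)

% Separate hidden (memory) state: h(t) is updated recursively and the output is
% read off from it. Unrolling gives h(t) = f(X(t), f(X(t-1), \dots)), so h(t)
% can carry information about the entire input history -- no Markov property
% is guaranteed (or required).
h(t) = f\big(X(t),\, h(t-1)\big), \qquad Y(t) = g\big(h(t)\big)
```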
00:53:47 Anon. Cable Theory: Is the first hidden state learned or fixed before training?
00:54:07 Anon. Regularization: I think he said fixed
00:54:18 Akshat Gupta (TA): It is learnt
00:54:27 Anon. Regularization: we know the initial state, the weights of influence can be learned
00:55:42 Anon. Gabor transforms: What would h(-2) be for time step 1 in that prior example?
00:56:30 Akshat Gupta (TA): We have to start with an initialized hidden state at t = -2
00:57:04 Akshat Gupta (TA): Along with a hidden state at t = -1
00:58:02 Anon. RCNN: So essentially we have two recurrences: one along the time axis and one independent of time?
00:59:25 Reshmi Ghosh (TA): When we talk about recurrence we are talking about time. The layers of the network still exist.
01:01:45 Anon. Matrix: What does the subscript i represent?
01:02:29 Anon. Encoder: +1
01:02:35 Anon. RCNN: Which term represents the self-loop/recurrent weights?
01:03:07 Akshat Gupta (TA): Reusing h(t-1) for h(t) represents recurrence
01:03:22 Reshmi Ghosh (TA): h_i(t-1) is being used to feed into the next step
01:03:56 Anon. RCNN: What does t represent?
01:03:57 Anon. Encoder: it does
01:03:59 Anon. Matrix: yes, makes sense
01:04:00 Anon. Encoder: thank you
01:04:19 Reshmi Ghosh (TA): T is the time
01:04:28 Reshmi Ghosh (TA): Time step**
01:04:46 Akshat Gupta (TA): or t is the index into the sequence, for an input sequence of length T
01:04:58 Reshmi Ghosh (TA): ^^
01:05:03 Anon. RCNN: But why would there be a change in time step if you are moving across different layers?
01:05:27 Reshmi Ghosh (TA): You have layers and you have sequences
01:06:10 Reshmi Ghosh (TA): Like I said earlier, when we talk about recurrence we are referring to the sequence or time dimension
01:07:23 Reshmi Ghosh (TA): @yiwei the blocks you see of different colors? Each one is a layer, but the columns of those blocks run through each time step
01:07:28 Reshmi Ghosh (TA): Does that make sense?
01:08:14 Anon. Indifferentiable: Are state-space models updated with backprop all the way back? Or stopping at the previous state unit? Is that what is meant by true recurrence?
01:08:37 Anon. RCNN: Yes, but when we talk about self-loops (output of one block being fed into the same block) there shouldn't be a change in time step, right?
01:09:47 Akshat Gupta (TA): @Kinori, all the way back
01:10:33 Anon. Indifferentiable: Thanks
01:10:48 Anon. YOLOv5: Is the hidden unit not changing with training? I'm confused because he said shared parameters
01:12:14 Anti (TA): It changes. But each green block has the same parameters (weights)
01:13:10 Reshmi Ghosh (TA): @yiwei
01:13:16 Reshmi Ghosh (TA): Aah, I see where you are confusing the concepts. So when Bhiksha showed a single column with a loop, it was just representative. But indeed that figure should be unrolled wrt time
01:14:18 Anon. RCNN: @Reshmi I see. Thank you
01:19:58 Anon. YOLOv5: Is there a limit to how many time steps can be allocated when implementing this? Because otherwise it seems like you could recurse almost infinitely when computing backprop
01:21:34 Akshat Gupta (TA): There is no limit, which is a problem with RNNs. Good spot
01:24:23 Anon. YOLOv5: So once the derivatives wrt Div are computed at time step T, they don't need to be computed again for the previous time steps?
01:24:30 Reshmi Ghosh (TA): I know a lot of people had apprehensions about raising hands. Should I ask Bhiksha to repeat some concept?
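A minimal sketch tying together the points from roughly 00:53:47 through 01:21:34 (learned initial state, weights shared across time steps, and backprop running all the way back through the unrolled graph). This is an illustrative toy, not the course's starter code; the class and variable names are assumptions.

```python
# Single-layer RNN unrolled over time, illustrating three things from the chat:
#   - the same weights (W_xh, W_hh) are reused at every time step,
#   - the initial hidden state h(-1) can itself be a learned parameter,
#   - backprop through the unrolled graph reaches all the way back to t = 0.
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.W_xh = nn.Linear(input_dim, hidden_dim)               # input -> hidden
        self.W_hh = nn.Linear(hidden_dim, hidden_dim, bias=False)  # hidden -> hidden (recurrent)
        self.W_hy = nn.Linear(hidden_dim, output_dim)              # hidden -> output
        # Learned initial state h(-1); could also be fixed to zeros instead.
        self.h_init = nn.Parameter(torch.zeros(hidden_dim))

    def forward(self, x):               # x: (T, input_dim) -- one sequence
        h = self.h_init
        outputs = []
        for t in range(x.shape[0]):     # same W_xh / W_hh reused at every step
            h = torch.tanh(self.W_xh(x[t]) + self.W_hh(h))
            outputs.append(self.W_hy(h))
        return torch.stack(outputs)     # (T, output_dim), one output per time step

model = TinyRNN(input_dim=4, hidden_dim=8, output_dim=3)
x = torch.randn(10, 4)                  # T = 10 time steps
y = model(x)
loss = y.pow(2).mean()                  # placeholder divergence
loss.backward()                         # BPTT: gradients flow back through every
                                        # time step, including into h_init
print(model.h_init.grad.shape)          # torch.Size([8])
```

Truncating the unrolled loop after a fixed number of steps (truncated BPTT) is the usual practical answer to the "no limit" issue raised at 01:21:34.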
01:24:43 Anon. Indifferentiable: Is the divergence only found at the end of the generated sequence? Or is it done at every output?
01:27:51 Anon. Gabor transforms: I can see it
01:28:05 Anon. Center Loss: Why don't we directly propagate Z instead of h? Any motivation?
01:30:11 Anon. Center Loss: Oh true
01:30:27 Anon. Center Loss: Yes
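On the 01:24:43 divergence question, a common formulation (assumed here, in the spirit of how the slides set it up): when the target sequence provides a desired output d(t) at every step, the total divergence is a sum over time steps and each term sends gradients back through the unrolled network; if only the final output is scored, only the last term remains.

```latex
% Target at every output: sum the per-step divergences.
\mathrm{DIV} = \sum_{t=1}^{T} \mathrm{Div}\big(Y(t),\, d(t)\big)

% Target only at the end of the sequence: a single term at t = T.
\mathrm{DIV} = \mathrm{Div}\big(Y(T),\, d(T)\big)
```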