08:19:14 Anon. Ivy: Hi
08:19:18 Anon. Star-Lord: hello professor :D
08:19:24 Anon. IronMan: Hello!
08:19:29 Anon. Jarvis: hi professor
08:19:33 Anon. S. Aiken: Hi!
08:19:37 Anon. Centre: hi
08:19:37 Anon. P.J. McArdle: Hi professor!
08:21:29 Anon. Centre: Professor, I think there are some background noises
08:23:04 Anon. BlackWidow: Sorry, those are ambient noises that I don't think the prof. can help with
08:23:19 Anon. BlackWidow: Will let him know for the next lecture
08:23:50 Anon. P.J. McArdle: Someone's mic is on
08:25:24 Anon. Grandview: it might be a local minimum
08:25:26 Anon. Thor: No, because it could be a local minimum and not a global one
08:25:28 Anon. Fifth: No, could be a local minimum
08:25:41 Anon. Baum: no
08:25:46 Anon. SilverSurfer: We minimize relative to a loss, which isn't necessarily the classification error
08:25:46 Anon. P.J. McArdle: No, the empirical risk should be equal to the true risk
08:25:54 Anon. P.J. McArdle: for generalization
08:26:55 Anon. Forbes: Doesn't that also depend on the complexity of the model?
08:27:27 Anon. Morewood: divergence
08:28:53 Anon. SpyKid2: left
08:29:18 Anon. Morewood: yes
08:29:22 Anon. Baum: yes
08:32:05 Anon. Star-Lord: yes
08:33:25 Anon. Jarvis: yes
08:33:40 Anon. SpyKid2: no
08:33:41 Anon. Murdoch: no
08:33:47 Anon. Bigelow: no
08:33:49 Anon. SilverSurfer: Not really
08:34:07 Anon. Strange: Outliers are not accounted for in backprop.
08:34:19 Anon. Morewood: But isn't that an outlier?
08:35:28 Anon. Morewood: ok
08:36:46 Anon. Bigelow: a feature
08:36:47 Anon. Strange: Depending on the task
08:36:47 Anon. SilverSurfer: There are positives
08:36:49 Anon. Friendship: we don't want to account for noise, right?
08:45:45 Anon. Mantis: How do we decide which direction to move in if we detect we have ended up at a saddle point?
08:48:41 Anon. Rocket: You could add noise to escape from the saddle
08:48:49 Anon. Star-Lord: ^^ The professor mentioned that the Hessian might be useful when we are at a saddle point
08:49:15 Anon. P.J. McArdle: do we add a regularizer to escape the saddle points?
08:49:57 Anon. SilverSurfer: If a point is a saddle point, there should be at least one direction (an eigenvector of the Hessian) in which the function is decreasing
08:50:31 Anon. P.J. McArdle: In ridge regression, we add a regularizer to make it a convex function. I am not sure if that holds here
08:50:35 Anon. Wilkins: you can adjust the step size
08:57:01 Anon. Star-Lord: takes too long to reach the minimum
08:58:06 Anon. Strange: Many steps
09:00:03 Anon. Star-Lord: yes
09:00:23 Anon. SilverSurfer: three
09:00:23 Anon. Morewood: 3
09:00:24 Anon. Hobart: 3
09:00:58 Anon. SilverSurfer: exact
09:00:58 Anon. Penn: Exact
09:01:00 Anon. Baum: exact
09:01:00 Anon. Fifth: exact
09:01:05 Anon. Fury: exact
09:01:09 Anon. SilverSurfer: yes
09:03:21 Anon. Atom: Second derivative
09:03:38 Anon. Jarvis: second derivative
09:03:43 Anon. Morewood: Inverse of the second derivative
09:05:36 Anon. Morewood: Takes too long
09:05:37 Anon. SilverSurfer: It takes a few steps
09:05:53 Anon. SilverSurfer: yes
09:06:07 Anon. Wilkins: it will oscillate
09:06:31 Anon. Myrtle: yes
09:06:34 Anon. Star-Lord: stays at the same height
09:06:37 Anon. Morewood: It won't converge?
09:06:37 Anon. SilverSurfer: It'll jump back and forth at the same height
09:08:27 Anon. Morewood: yes
09:08:29 Anon. Star-Lord: yes
09:13:24 Anon. Star-Lord: yes
09:16:41 Anon. Ivy: so we can compute each minimum separately and combine them together?
09:17:01 Anon. S. Highland: Same doubt
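[The 09:00-09:06 exchange above (optimal step size as the inverse of the second derivative, and oscillation at the same height when the step is too large) can be checked on a one-dimensional quadratic. The sketch below is illustrative and not from the lecture; the function, constants, and names are assumptions. For f(x) = 0.5 * a * x^2, a step of 1/a reaches the minimum in one step, a step of 2/a bounces back and forth at the same height, and anything larger diverges.]

    def gradient_descent(a, eta, x0=5.0, steps=5):
        # Plain gradient descent on the 1-D quadratic f(x) = 0.5 * a * x**2,
        # whose derivative is a * x and whose second derivative is the constant a.
        x = x0
        trajectory = [round(x, 3)]
        for _ in range(steps):
            x = x - eta * (a * x)       # gradient step
            trajectory.append(round(x, 3))
        return trajectory

    a = 2.0                                   # second derivative of the quadratic
    print(gradient_descent(a, eta=1.0 / a))   # optimal step (1 / f''): minimum reached in one step
    print(gradient_descent(a, eta=2.0 / a))   # twice the optimal step: jumps between +5 and -5 at the same height
    print(gradient_descent(a, eta=2.5 / a))   # larger still: the iterates diverge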
09:18:40 Anon. Star-Lord: overshoot in one direction
09:18:52 Anon. Atom: Yes
09:18:52 Anon. Butler: Yes, in the vertical direction
09:18:53 Anon. Wasp: yes
09:18:55 Anon. Darlington: yes
09:18:56 Anon. Wilkins: yes
09:18:56 Anon. Nebula: yes
09:18:58 Anon. Bigelow: yes
09:19:03 Anon. Forward: yes
09:19:54 Anon. Butler: 0.7
09:19:56 Anon. Morewood: 0.7
09:19:57 Anon. Wilkins: 0.7
09:22:58 Anon. Morewood: no
09:23:01 Anon. S. Highland: no
09:23:03 Anon. Bigelow: no
09:23:03 Anon. Wilkins: no
09:23:28 Anon. GreenArrow: no
09:23:29 Anon. Fury: no
09:23:32 Anon. Wilkins: no
09:26:01 Anon. Ivy: something like simulated annealing?
09:28:15 Anon. Morewood: Doesn't the learning rate depend on the point from which we start descending? If we start near the global minimum with a large learning rate, it might overshoot and diverge
09:33:05 Anon. Star-Lord: what happens if we overshoot and reach another bowl?
09:40:01 Anon. Ivy: what if we start from a local minimum and we want to get out of it?
09:40:25 Anon. Ivy: start inside a local minimum
09:54:20 Anon. Ivy: what if we start from inside a local minimum and want to get out of it using momentum, since momentum computes an average of the previous steps?
10:07:39 Anon. Groot: Thank you!
10:07:48 Anon. Star-Lord: It was great, professor :D
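[Regarding the 09:54:20 question: the momentum step is an exponentially weighted sum of past gradient steps, so velocity accumulated while descending can carry the iterate across a shallow local minimum that plain gradient descent settles into; starting at rest exactly inside a local minimum gives no accumulated velocity, so momentum alone provides no push out of it. The sketch below is illustrative, not the course's example; the toy function and all constants are assumptions.]

    def grad(x):
        # Derivative of the toy function f(x) = x**4 - 2*x**2 + 0.5*x, which has a
        # shallow local minimum near x = 0.93 and a deeper one near x = -1.06.
        return 4 * x**3 - 4 * x + 0.5

    def descend(x0, eta=0.05, beta=0.9, steps=300):
        # Heavy-ball momentum: the velocity v is an exponentially weighted sum of
        # past gradient steps; beta = 0 recovers plain gradient descent.
        x, v = x0, 0.0
        for _ in range(steps):
            v = beta * v - eta * grad(x)
            x = x + v
        return round(x, 3)

    print(descend(1.6, beta=0.0))   # plain gradient descent settles in the shallow minimum (~0.93)
    print(descend(1.6, beta=0.9))   # accumulated velocity carries it into the deeper basin (~ -1.06)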