08:25:07 Anon. Synapse: So in the case of PCA, the dimension of z should be strictly less than the dimension of x?
08:26:44 Anon. Train: Why should the data be centered in this case?
08:28:33 ksaharan@andrew.cmu.edu (TA): What case are you referring to?
08:28:43 Anon. Train: PCA
08:29:56 ksaharan@andrew.cmu.edu (TA): So that we only capture the variance and are not affected by the mean
08:30:28 Anon. Train: Thank you
08:52:15 Anon. Synapse: Usually, how do we choose the non-linear activation so that the model has enough expressive power?
08:52:31 Anon. Oja’s Rule: ^^
08:52:47 ksaharan@andrew.cmu.edu (TA): We’ll see that soon in the context of DL
08:53:43 ksaharan@andrew.cmu.edu (TA): That’s what VAE is all about
09:00:59 Anon. Synapse: What is the encoder component in this case of NLGM?
09:02:56 ksaharan@andrew.cmu.edu (TA): There is no explicit encoder as you have in an AE or VAE
09:06:11 ksaharan@andrew.cmu.edu (TA): But if your non-linear activation is nice enough to work with, you can call p(z|x) the encoder (if you can construct it in closed form or otherwise)
09:07:43 Anon. Train: I think p(z|x) can be a neural network. It takes x as input and gives you z. So the encoder is just an NN model
09:08:37 ksaharan@andrew.cmu.edu (TA): That’s a good guess. It takes you into the realm of VAE, away from conventional NLGM
09:14:28 Anon. Synapse: Why can we make the assumption that p(z|x) approximately follows a Gaussian distribution?
09:15:03 Anon. Train: Because the world is a boring place
09:15:16 Anon. Train: I might be wrong though lol
09:15:28 ksaharan@andrew.cmu.edu (TA): Well, you can assume whatever you want, to be honest, but you need to be cognizant of the fact that it should be something that looks similar to the real p(z|x)
09:15:57 ksaharan@andrew.cmu.edu (TA): If it doesn’t, then you are doing something random. Your encoder has no relationship with your decoder
09:16:16 ksaharan@andrew.cmu.edu (TA): Not sure how useful that is going to be for anything at all
09:18:04 Anon. MiniMax: standard normalization
09:20:47 Anon. Kernel: So that was assuming our covariance is a diagonal matrix and the off-diagonal terms are 0?
09:26:22 ksaharan@andrew.cmu.edu (TA): Yeah, that’s the standard assumption
09:30:44 ksaharan@andrew.cmu.edu (TA): Most of the math that you’ll see and implementations that you’ll find work with this standard assumption
09:45:52 ksaharan@andrew.cmu.edu (TA): People who find this interesting may want to check out something called Variational Inference
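[Editor's note: a minimal numpy sketch of the PCA points from the opening exchange: center the data so the components capture variance rather than the mean, then project to a strictly lower-dimensional z. The data and shapes here are illustrative, not from the lecture.]

```python
import numpy as np

# Toy data: 100 points in 5 dimensions (shapes are illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Center the data so PCA captures variance, not the mean
X_centered = X - X.mean(axis=0)

# SVD of the centered data gives the principal directions (rows of Vt)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep k < dim(x) components: z is strictly lower-dimensional than x
k = 2
Z = X_centered @ Vt[:k].T   # shape (100, 2)
```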
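[Editor's note: the guess at 09:07:43 that p(z|x) can be a neural network, combined with the diagonal-covariance assumption discussed at 09:20:47, is essentially the VAE encoder. A minimal PyTorch sketch under those assumptions; layer sizes and names are illustrative.]

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Approximates p(z|x) with a diagonal Gaussian q(z|x) = N(mu(x), diag(sigma^2(x)))."""
    def __init__(self, x_dim=784, h_dim=256, z_dim=16):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)        # mean of q(z|x)
        self.log_var = nn.Linear(h_dim, z_dim)   # log of the diagonal variances

    def forward(self, x):
        h = self.hidden(x)
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return z, mu, log_var
```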
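[Editor's note: the TA's point that the assumed q(z|x) should look similar to the real p(z|x) is what Variational Inference (mentioned at 09:45:52) makes precise, by minimizing a KL divergence. For a diagonal Gaussian measured against a standard normal prior, the KL has a closed form; a sketch continuing the names from the encoder above:]

```python
def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims:
    # 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)
    return 0.5 * torch.sum(torch.exp(log_var) + mu**2 - 1.0 - log_var, dim=-1)
```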