19:58:54 Reshmi Ghosh (TA): Mute when you join, please!:D
20:01:19 Anon. Reinforcement: Any chance that we can have video disabled by default? Just to protect people’s privacy 😆
20:01:43 Jacob Lee (TA): https://docs.google.com/forms/d/e/1FAIpQLSfwUVnapDbqRmff_dcyxi8SyYeSYtJU6gRtsbMFzWWNycs0qQ/viewform
20:01:59 Anon. Photoreceptor: will this be recorded?
20:02:01 Anon. Psilocybin: I guess u can choose to close video before entering the room
20:02:28 Reshmi Ghosh (TA): Yes it will be recorded
20:02:39 Anon. Reinforcement: yeah I found that, thanks!
20:03:29 Reshmi Ghosh (TA): Please fill in the form, it is anonymous, we just want to get a sense of how you are doing in hw1
20:07:52 Anon. Markov Chain: Will these bootcamps be a regular thing?
20:07:59 Reshmi Ghosh (TA): Nope!
20:08:23 Reshmi Ghosh (TA): this homework onwards you should get the hang of p1
20:08:49 Reshmi Ghosh (TA): Well hopefully:P
20:09:36 Anon. Hessian: Hello - Will this bootcamp be recorded?
20:09:44 jinhyun1@andrew.cmu.edu (TA): yup
20:09:49 Anon. Hessian: thx
20:09:49 Anon. Shufflenet: Yea I think so
20:10:22 Reshmi Ghosh (TA): Yes it will be!
20:12:54 Jacob Lee (TA): https://www.cs.cmu.edu/~112/notes/notes-oop-part1.html
20:13:36 Reshmi Ghosh (TA): We will collect all links and post on Piazza for future reference
20:14:02 Anon. Fast-RCNN: Will the recording be available?
20:14:11 Anon. Gentoo: yes
20:15:40 Anon. Gentoo: Are the tensor and node the same thing?
20:17:12 Anon. Oja’s Rule: what does "store the node on the output tensor" mean? Tensor is just like an nd array, correct? how do you store an object to a tensor?
20:17:24 Anon. Gentoo: Ty Jacob, but i’m still curious about what the node object refers to?
20:17:24 Anon. PyTorch: So in Function.apply() —> backward_function = BackwardFunction(cls), is that basically the node object being created?
20:17:34 Anon. PyTorch: And cls is the operation?
20:18:31 Tony Qin (TA): Zexi, a Tensor is not like an nd array. It holds more information than that. Check out its definition in tensor.py
20:19:17 Anon. All-or-nothing: How to define the constant grad_fn?
20:19:28 jinhyun1@andrew.cmu.edu (TA): What do you mean by constant grad_fn?
20:19:29 Tony Qin (TA): Yueqing, you could think of a node as a BackwardFunction or AccumulateGrad… Not exactly true but could be helpful
20:20:10 Anon. Gentoo: Thank you!
20:20:27 Anon. All-or-nothing: Constant node, sorry
20:21:27 Jinhyung David Park (TA): @Di it would be None
20:21:29 Jinhyung David Park (TA): as default
20:21:58 Anon. Linear Algebra: Yeah, I would like to know the order
20:22:03 Anon. All-or-nothing: Thanks!
20:24:13 Anti (TA): Can you review what apply will return?
20:24:47 Anon. Dropout: Can you explain .apply in detail?
20:24:49 Anon. Gaussian: the *args in Function.apply will be Tensor objects, right?
20:25:17 Anon. Linear Algebra: No, reshape() will pass a tuple in *args
20:25:17 Anxiang Zhang (TA): not really, sometimes there are other arguments
20:26:07 Anon. Thalamus: In Function.apply(), do we have to create an object for the accumulate-grad type of node as well?
20:26:35 Anon. Saltatory: In autograd_engine.backward(), will grad_fn always be either a BackwardFunction object or an AccumulateGrad object? So is every element in grad_fn.next_functions one of those objects?
20:26:45 Anon. pdb.set_trace(): can u plz explain ContextManager as well?
20:26:59 Tony Qin (TA): Jeff, it could be any of the 3 types of nodes
20:27:30 Anon. Saltatory: @Tony right, so one of those 2 objects + None?
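A minimal sketch of how Function.apply might tie these pieces together, assuming the Tensor, ContextManager, BackwardFunction, and AccumulateGrad classes discussed above; the attribute names (ctx, next_functions, grad_fn, is_leaf) are illustrative and may differ from the actual handout code:

```python
# Hypothetical sketch of Function.apply, loosely following the discussion above.
# Class and attribute names are assumptions, not the official handout code.
class Function:
    @classmethod
    def apply(cls, *args):
        # The "node" is a BackwardFunction wrapping this operation (cls).
        backward_function = BackwardFunction(cls)

        # Run the forward pass; ctx is the ContextManager ("usb stick").
        output_tensor = cls.forward(backward_function.ctx, *args)

        # For each parent, record the right kind of node in next_functions:
        #   - AccumulateGrad for leaf tensors that require grad
        #   - the parent's existing grad_fn for intermediate tensors
        #   - None for constants (requires_grad=False) and non-tensor args
        for arg in args:
            if isinstance(arg, Tensor) and arg.requires_grad:
                node = AccumulateGrad(arg) if arg.is_leaf else arg.grad_fn
            else:
                node = None
            backward_function.next_functions.append(node)

        # "Store the node on the output tensor": the output simply keeps a
        # reference to the BackwardFunction object in its grad_fn attribute.
        output_tensor.grad_fn = backward_function
        return output_tensor
```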
20:27:40 Jinhyung David Park (TA): Yup
20:28:16 Jinhyung David Park (TA): @Debdas ContextManager you can basically think of as storage. When you do the forward of a function, you will need to save some stuff to compute the backward
20:28:38 Tony Qin (TA): Baishali, you will most likely run into the case where creating an AccumulateGrad node would be appropriate
20:28:39 Jinhyung David Park (TA): @Debdas basically, it’s a usb stick we give you for the forward so you can save stuff in it, and we give you the same usb stick when you do backward
20:29:17 Anon. Thalamus: We are storing valid parent nodes in next_functions. However, the context manager stores all the *args in the forward function of each operation. How do we prevent the context manager from storing all parent nodes?
20:30:15 Jinhyung David Park (TA): Context Manager stores whatever you want to store in it - not necessarily all the *args.
20:31:28 Anon. Operationtimedout: W/r/t Context Manager, why/when do we need to take the first element of a “list” (e.g., in Log) when pulling it in backward()? Why is a list, not the tensor, recovered?
20:31:35 Anon. Thalamus: We are finding the valid parents and storing them in next_functions after the forward call to the operation class.
20:31:51 Anon. Saltatory: So in the writeup, if the node is a BackwardFunction object, it says pass gradients only if requires_grad==True, but how do you access .requires_grad? I guess my question is how do you access a tensor in a BackwardFunction obj?
20:33:21 Anon. Saltatory: Thank you!
20:34:11 Anon. Perceptron: Is it possible to draw a diagram and illustrate which functions/classes go where? I'm having a hard time understanding, at a high level, what goes where
20:34:46 Anon. Activation: +1
20:35:02 Anon. Shufflenet: +1
20:35:14 Anon. Dropout: +1
20:35:16 Anon. Giant Squid Neuron: +1
20:35:22 Anon. Kaggle: +1
20:35:28 Anon. Linear Algebra: +1
20:35:58 Anon. Operationtimedout: I think the file structure does a good job of this. I know it took me a while to get it though
20:36:50 Anon. pdb.set_trace(): yes, that video is grt
20:37:04 Anon. Array: I had a question: if we have some way to test our functions by being given a tree already and the expected output to debug, would this be sandbox.py?
20:37:24 Anon. Gentoo: Is this the link?
20:37:25 Anon. Linear Algebra: Why does Autograd Step 1 say "It then passes (without storing) its gradient to the graph traversal method"? Why without storing?
20:37:25 Anon. Gentoo: https://www.youtube.com/watch?v=MswxJw-8PvE
20:38:10 Jinhyung David Park (TA): @matias sandbox gives you a layout for doing so
20:38:19 Anon. Array: thank you
20:38:55 Anon. Linear Algebra: If I store the gradient, would that be a problem?
20:39:23 Anon. Linear Algebra: Got it, thank you
20:40:46 Anon. Giant Squid Neuron: so shapes must be equivalent?
20:40:59 Anon. Operationtimedout: With respect to the SGD step, we do “NOT” add this to the comp graph. However, we’ve previously overloaded some required operators with the assumption that these operations would be added. Would you recommend numpy in this case? It’s currently imported. OR do these directions mean that the SGD step is not added, but the internal operations are in the graph?
20:42:19 Anon. Operationtimedout: thank you
20:44:20 Anon. Gentoo: Could you please make the font larger? I can barely read it……
20:44:57 Anon. ReduceLROnPlateau: we only need to add broadcasting in the add func, for the linear autograd. Right?
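A minimal sketch of the "usb stick" idea, using Log (the example from the question above) as a subclass of the hypothetical Function from the earlier sketch: save the input in forward, read it back in backward. saved_tensors hands back a list even when only one tensor was saved, which is why the [0] index shows up; names follow the discussion and may differ slightly from the handout:

```python
import numpy as np

# Hypothetical Log operation showing how ContextManager ("ctx") is used.
# save_for_backward / saved_tensors mirror the interface discussed above;
# exact names and Tensor construction here are illustrative assumptions.
class Log(Function):
    @staticmethod
    def forward(ctx, a):
        # Save whatever backward will need - here, just the input tensor.
        ctx.save_for_backward(a)
        return Tensor(np.log(a.data), requires_grad=a.requires_grad)

    @staticmethod
    def backward(ctx, grad_output):
        # saved_tensors returns a list/tuple of everything saved in forward,
        # even if only one tensor was saved - hence the [0].
        a = ctx.saved_tensors[0]
        # d/da log(a) = 1/a, chained with the incoming gradient.
        return Tensor(grad_output.data / a.data)
```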
20:45:05 Anon. Gentoo: Thank you so much
20:45:45 Jacob Lee (TA): for the linear question yeah, pretty much only need broadcasting for add
20:45:52 Anon. Perceptron: what's ctx?
20:45:58 Jacob Lee (TA): ContextManager object
20:45:59 Anon. Gentoo: context
20:46:02 Anon. GRU: context
20:46:08 Anon. Perceptron: thanks all
20:46:24 Anon. Recall Capacity: Will we talk about BatchNorm in this bootcamp?
20:46:46 Jacob Lee (TA): ^ Yeah we'll try to
20:46:54 Anon. Seq2Seq: Can you please talk a bit about derivative rules for element-wise matrix multiplication/division?
20:47:10 Jacob Lee (TA): ^ Recitation 2 has some discussion of that
20:47:18 Jacob Lee (TA): The hints that are very big imo
20:47:22 Jacob Lee (TA): the hints there*
20:49:18 Anon. Recall Capacity: The "is_parameter" attribute of Tensor is actually important for deciding "requires_grad" and "is_leaf"; do we need to consider it when constructing the computation graph?
20:50:07 Jacob Lee (TA): ^ You need to set it during forward, but you probably won't need to check it yourself
20:50:12 Jacob Lee (TA): like check it to do anything
20:50:21 Anon. Loss Function: There is a situation when using autograd .apply() in a backward function where it outputs a list of tensors. How to deal with this situation?
20:50:46 Anon. Retinal Ganglion: Isn't this too idealistic? what if our network wants to add more than 2 terms? or is this a toy example?
20:51:09 Anon. Giant Squid Neuron: will the autograder be able to tell us if our operations are correct before we implement the rest
20:52:31 Jinhyung David Park (TA): @xinyue every function that we’ll use will only output a single result, so no. I believe any multi-output function can be deconstructed into multiple single-output functions
20:52:38 Jinhyung David Park (TA): we did not implement this because it adds another layer of complexity
20:52:55 Jinhyung David Park (TA): @Rohan If you want to do a + b + c, this is just (a+b) + c, a sequence of double additions
20:53:04 Jinhyung David Park (TA): So you don’t need to implement multiple additions explicitly.
20:53:16 Jinhyung David Park (TA): Later functions may take in more than 2 terms and some inputs may not even be tensors
20:53:36 Anon. Reinforcement: for the linear layer we do need matmul right? Somebody might have asked this before but just wanna confirm it quickly
20:53:48 Jinhyung David Park (TA): @nicky I believe so
20:53:58 Jinhyung David Park (TA): @daniel yes
20:54:10 Anon. Reinforcement: Thanks!
20:55:15 Anon. YOLOv2: Do we only need unbroadcasting for addition, or also other matrix operations?
20:55:42 Anon. YOLOv2: Thanks!
20:56:11 Anon. Operationtimedout: could you explain the structure of the param object?
20:56:30 Anon. Operationtimedout: what does self.params hold?
20:56:40 Anon. Operationtimedout: SGD
20:56:47 Anon. Operationtimedout: and optimizer
20:56:52 Anon. Thalamus: In the backward function of the operation classes, we are finding the gradient of both inputs a and b. If either one has requires_grad=False, we will not be computing its gradient. How do we handle that in the operation's backward function?
20:57:10 Anon. pdb.set_trace(): how to test questions 2 & 3? sandbox does not test this..
20:57:56 Jinhyung David Park (TA): @baishali you can just pretend that they have it True and deal with it in Function.apply
20:58:02 Jinhyung David Park (TA): *kind of
20:58:11 Jinhyung David Park (TA): as in you can deal with it later during the backward pass
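On the broadcasting/unbroadcasting questions above, a minimal NumPy sketch of one common way to reduce a gradient back to the shape of a broadcast input by summing over the broadcast axes (the helper name is hypothetical):

```python
import numpy as np

def unbroadcast(grad: np.ndarray, shape: tuple) -> np.ndarray:
    """Hypothetical helper: reduce grad down to `shape` by summing over the
    axes that numpy broadcasting expanded, so the gradient of a broadcast
    input matches that input's original shape."""
    # Sum away leading axes that broadcasting prepended.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Sum over axes that were size 1 in the original shape but got expanded.
    for axis, dim in enumerate(shape):
        if dim == 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

# Example: a bias of shape (3,) broadcast against a (4, 3) activation in x + b.
grad_wrt_sum = np.ones((4, 3))            # gradient flowing into x + b
grad_b = unbroadcast(grad_wrt_sum, (3,))  # -> shape (3,), summed over the batch axis
```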
20:58:24 Anon. Recall Capacity: A quick conceptual question: in the backward, the gradient of each sample in the same batch is actually accumulated for SGD, right?
20:58:27 Anon. Shufflenet: Will we also have the chatbox contents when the recorded video gets uploaded for us? Thx!
20:58:35 Tony Qin (TA): Baishali, if requires_grad=False then it will be a constant (None). Deal with it in backward somehow
20:58:35 Jinhyung David Park (TA): @qiyun sure
20:58:36 Anon. GRU: yes we will share the chat
20:58:45 Jinhyung David Park (TA): @zhihao yes
20:58:49 Jinhyung David Park (TA): @zhihao averaged over the batch
20:59:46 Anon. Reinforcement: Do we need backprop for batchnorm and activations
21:00:10 Anon. Reinforcement: gotcha.
21:00:14 Anon. Recall Capacity: @Jinhyung, Sorry, where have we done the average part? Like for the bias, I think when I passed the test case, I did not do the average part.
21:00:51 Anon. Gentoo: Could you plz explain what the gamma and beta mean?
21:01:26 Jinhyung David Park (TA): @zhihao Actually… I don't think you need to explicitly worry about that for this homework
21:01:28 Jinhyung David Park (TA): maybe only for batchnorm
21:01:46 Anon. Gentoo: Okay, thank you.
21:02:55 Tony Qin (TA): Gamma and beta are just learnable parameters. They will be updated during backprop
21:03:29 Anon. Saltatory: for p2 on the leaderboard, is “TA Submission” the only TA submission? im just curious and also want to know where I am with the progress of this assignment.
21:03:31 Anon. Recall Capacity: @Jinhyung, ok. Just very curious where the average part was executed.
21:04:22 Tony Qin (TA): Jeff, some other TAs have submitted as well… Tentative cutoffs will be announced on Wednesday. Shoot for at least 70
21:05:14 Anon. Saltatory: @tony thank you! Also I think the writeup isn’t updated on the website or autolab. I’m looking at it right now and it’s different from jacob’s
21:05:40 Anon. CNN: What is the ideal practice for choosing batch_size?
21:05:43 Anon. Shufflenet: Did the code part get updated by any chance? Do we need to redownload that part as well?
21:06:06 Jinhyung David Park (TA): @shriti you can test a bunch of batch sizes
21:06:14 Jinhyung David Park (TA): start at a number, multiply in exponents of 2
21:06:22 Anon. Shufflenet: gotcha!
21:06:29 Jinhyung David Park (TA): run one or two epochs and see which has the highest result and go with it
21:06:31 Tony Qin (TA): Shriti, increase batch size as long as training time per epoch decreases. Take advantage of the many cores on a GPU
21:06:57 Anon. CNN: Ok t
21:07:01 Tony Qin (TA): Nvm
21:07:03 Anon. CNN: Thanks!
21:08:41 Anon. Oja’s Rule: I'm a little lost in terms of what to submit for the 9/16 deadline. We're testing MNIST but on Kaggle we are asked to do speech classification... What are we submitting for the 9/16 deadline?
21:09:16 Jinhyung David Park (TA): 9/16 deadline is for hw1p2
21:09:23 Reshmi Ghosh (TA): Early deadline!
21:09:34 Jinhyung David Park (TA): you can double check the writeup for it
21:09:47 Anon. ReLU: What can we use for p2? Only the things that we implement in p1?
21:09:55 Anon. Shufflenet: How specifically is the GPU going to be used for hw1p2 if we’re suggested to run locally first?
21:09:55 Reshmi Ghosh (TA): anything
21:09:59 Reshmi Ghosh (TA): You have to experiment
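A minimal PyTorch sketch of the batch-size advice above (try powers of 2, train an epoch or two, keep whichever does best); the synthetic data and tiny model are placeholders standing in for the real hw1p2 pipeline:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 2048 samples of 40-dim features, 10 classes (illustrative sizes).
X, y = torch.randn(2048, 40), torch.randint(0, 10, (2048,))
train_set = TensorDataset(X, y)

def short_run(batch_size: int, epochs: int = 2) -> float:
    """Train a tiny model briefly at one batch size and return a quick accuracy."""
    model = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 10))
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    criterion = nn.CrossEntropyLoss()
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            opt.step()
    # Quick check on the training data; use a proper validation split in practice.
    with torch.no_grad():
        return (model(X).argmax(dim=1) == y).float().mean().item()

# Sweep powers of 2 and keep the best-performing batch size.
results = {bs: short_run(bs) for bs in (64, 128, 256, 512, 1024)}
best = max(results, key=results.get)
print(results, "-> best batch size:", best)
```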
21:10:00 Anon. Shufflenet: *amazon aws
21:10:01 Jinhyung David Park (TA): you should use the actual pytorch
21:10:05 Reshmi Ghosh (TA): With models
21:10:17 Jinhyung David Park (TA): actually you can totally do p2 with p1
21:10:22 Jinhyung David Park (TA): but it’ll be 100x slower
21:10:27 Reshmi Ghosh (TA): XD
21:10:48 Anon. Oja’s Rule: https://piazza.com/class/k9lk3ucwopisb?cid=200 The piazza post says to complete problem 1 and submit
21:11:07 Jinhyung David Park (TA): oh
21:11:22 Jinhyung David Park (TA): uhh that's like a recommendation for how much of hw1p1 you should get done
21:11:25 Anon. Gentoo: Can we just submit something like “print(“Hello World”)” before the early ddl? I may not have enough time to finish the baseline work……:p
21:11:26 Jinhyung David Park (TA): and that is 9/15
21:11:34 Reshmi Ghosh (TA): That is the recommended schedule btw
21:11:49 Tony Qin (TA): Yueqing, the hw1p2 early submission must be in the correct format at least.
21:11:49 Reshmi Ghosh (TA): Which we highly recommend:P
21:11:50 Jinhyung David Park (TA): @yueqing you have to submit an actual prediction file
21:12:00 Anon. Shufflenet: Is there a submission limit for the Kaggle competition in total..?
21:12:05 Jinhyung David Park (TA): nope only daily
21:12:08 Anon. Gentoo: Okay gotcha
21:12:25 Anon. Oja’s Rule: thx
21:12:59 Reshmi Ghosh (TA): Submit the submission.csv file if needed by 9/16
21:13:00 Anon. Shufflenet: Just following up with the previous qn as well: what is the difference between using amazon aws for hw1p2 vs. running locally, since a TA suggested doing it locally first?
21:13:07 Reshmi Ghosh (TA): But you need your names up on kaggle
21:13:21 Tony Qin (TA): Running locally will be magnitudes slower if you don’t have a GPU with CUDA
21:13:24 Jinhyung David Park (TA): locally just to see if your code has any bugs
21:13:28 Jinhyung David Park (TA): ^what tony said
21:14:00 Anon. Shufflenet: Icic, but running locally will take forever ish…? So if training looks alright and running, then we can transfer to running on was?
21:14:04 Anon. Shufflenet: *on aws
21:14:12 Jinhyung David Park (TA): yup
21:14:19 Anon. Shufflenet: Thanks! That’s super clear then
21:14:26 Reshmi Ghosh (TA): I will make a reminder post about the hw1p2 early deadline
21:14:29 Anon. ReduceLROnPlateau: we have to submit on kaggle or autolab?
21:14:33 Tony Qin (TA): You may run into additional bugs on AWS, but most of the debugging should be done if it’s working locally
21:14:38 Anon. ReduceLROnPlateau: for hw1p2
21:14:40 Tony Qin (TA): Hw1p2 is on kaggle
21:14:41 Reshmi Ghosh (TA): Hw1p2 submission file should be on kaggle
21:14:43 Jinhyung David Park (TA): if you are ok with git, I think a good pipeline is to develop locally in a repository and directly git clone to colab and run it
21:14:53 Anon. CNN: Does anybody have experience using Google Cloud Platform?
21:15:10 Jinhyung David Park (TA): TA Jiachen does
21:15:15 Anon. Recall Capacity: What's HW1 Bonus?
21:15:16 Anon. Shufflenet: Oh yea is there a limit on the times you can submit to the kaggle competition? Either for the early ddl or the final deadline?
21:15:17 Jinhyung David Park (TA): maybe you can go to his OH
21:15:25 Jinhyung David Park (TA): @zhihao to be released after hw1p1 is done
21:15:32 Jinhyung David Park (TA): @qiyun only a daily limit of 10
21:15:34 Jinhyung David Park (TA): no overall limit
21:15:34 Reshmi Ghosh (TA): Yep per day - 10 submissions
21:15:43 Anon. Array: have the coupons already been shared?
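On the "debug locally, then move to AWS/Colab" point above, a minimal device-agnostic PyTorch sketch: the same script runs on a CPU laptop for debugging and unchanged on a CUDA GPU instance; the feature dimension, class count, and layer sizes are placeholders, not the course setup:

```python
import torch
from torch import nn

# Pick CUDA when it exists (AWS/Colab), otherwise fall back to CPU (local debugging).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("training on", device)

num_features, num_classes = 40, 10  # placeholder sizes
model = nn.Sequential(
    nn.Linear(num_features, 1024), nn.ReLU(), nn.Linear(1024, num_classes)
).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Inside the training loop, move each batch to the same device as the model.
x = torch.randn(8, num_features, device=device)          # stand-in batch of features
y = torch.randint(0, num_classes, (8,), device=device)   # stand-in labels
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```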
21:15:49 Jinhyung David Park (TA): will be released soon
21:15:50 Reshmi Ghosh (TA): Will be shared soon
21:15:55 Reshmi Ghosh (TA): We are working on it:)
21:16:07 Reshmi Ghosh (TA): Expect it in 1-2 days
21:16:08 Anon. Array: thank you
21:16:13 Anon. Shufflenet: Thank you @Jinhyung!
21:16:14 Anon. Weight Decay: How long did p1 and p2 take for you guys (TAs)?
21:16:18 Anon. Operationtimedout: Request: please upload the lectures to youtube on the same day
21:16:30 Anon. Giant Squid Neuron: this was very helpful thank you!
21:17:14 Tony Qin (TA): P1 - different homework from last semester
21:17:19 Reshmi Ghosh (TA): P1 is completely new.
21:17:25 Tony Qin (TA): P2 - long time
21:17:27 Jinhyung David Park (TA): p1 depends on your background
21:17:31 Reshmi Ghosh (TA): Can’t benchmark our time
21:17:34 Reshmi Ghosh (TA): As David said
21:17:37 Jinhyung David Park (TA): i think the ballpark we’re saying is 8 - 20?
21:17:43 Anon. Gaussian: would it be possible to share some augmentations to improve accuracy for p2? I have only done augmentations with image data
21:17:44 Reshmi Ghosh (TA): Also on the YouTube request: that’s actually on Bhiksha
21:17:48 Anon. Shufflenet: How long is the training expected to run for hw1p2?
21:18:00 Jinhyung David Park (TA): @dhruv sure we can post a link
21:18:01 Reshmi Ghosh (TA): Depends on your model and what you experiment with
21:18:03 Jinhyung David Park (TA): you can do fine without aug though
21:18:07 Anon. CUDAError: Same question
21:18:12 Anon. comicstrip: where is the sample submission for hw1p2?
21:18:14 Tony Qin (TA): @Qiyun, if you’re running g4dn.xlarge maybe give yourself 5 hours
21:18:15 Anon. CUDAError: How long does it take using the suggested model?
21:18:23 Anon. CUDAError: 1024 - BatchNorm(20) - Linear
21:18:24 Reshmi Ghosh (TA): It's in the kaggle download
21:18:27 Reshmi Ghosh (TA): Please download
21:18:28 Anon. Shufflenet: @Tony thank u!
21:18:36 Jinhyung David Park (TA): @hfei not very long
21:18:43 Tony Qin (TA): @hfei that is not the suggested model. Just a starter. You will not do well with that model
21:18:46 Anon. GRU: @Alvin, sample submission will be on the kaggle site
21:18:48 Jinhyung David Park (TA): probably like.. 1 min per epoch?
21:20:16 Anon. Fast RCNN: What's a ballpark figure for expected accuracy for p2?
21:20:39 Tony Qin (TA): ~70 +- 5
21:20:51 Anon. Fast RCNN: Thanks!
21:22:25 Anon. ReduceLROnPlateau: thanks guys!!
21:22:31 Anon. Gaussian: thanks guys!
21:22:35 Anon. MiniMax: Thank you!
21:22:35 Anon. YOLOv2: Thanks!
21:22:37 Anon. Recall Capacity: 3q
21:22:41 Anon. Array: thank you
21:22:49 Anon. Kaggle: Thanks!
21:23:01 Anon. ReLU: thanks
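For the model discussion above, a minimal sketch of an MLP with BatchNorm between hidden layers in PyTorch; the widths and input/output sizes are placeholders and this is not the starter or baseline model referenced in the chat:

```python
import torch.nn as nn

# Illustrative MLP builder: Linear -> BatchNorm1d -> ReLU blocks, then a classifier.
# All sizes are placeholders, not the course's baseline architecture.
def make_mlp(in_dim: int, num_classes: int, hidden: int = 1024, depth: int = 3) -> nn.Sequential:
    layers, width = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(width, hidden), nn.BatchNorm1d(hidden), nn.ReLU()]
        width = hidden
    layers.append(nn.Linear(width, num_classes))
    return nn.Sequential(*layers)

model = make_mlp(in_dim=40, num_classes=10)  # placeholder sizes
print(model)
```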