In this assignment you will be introduced to basic numpy functionality, vectorization, and slicing/indexing. The goals of the assignment are as follows:
You will be given a set of problems in this homework to test your grasp of the basics. You must finish all of them to get full points.
Although this homework is worth only 1% of your final grade, it is essential that you complete it in full. This homework acts as an introduction to Python; if you cannot solve it, you will struggle with the coming assignments. Make sure you understand the concepts introduced here and in recitation 0, and use them to determine your initial level in the course.
| On-time submission deadline: | 20 Jan 2019, 11:59:59 EDT |
|---|---|
| Late submission deadline: | 31 Jan 2019, 11:59:59 EDT |
| Expected time required for this homework: | Six hours. If you're quick, you can do it in under an hour. |
The late submission deadline is intended for late registrants to the course.
Your solutions will be autograded by Autolab. For this reason, it is important that you do not change the signature of any of the functions contained in the template.
In order to submit your solution, create a tar file containing your code. The root of the tar file should have a directory named hw0 containing your module code.
You can create the tar file with the tar command on the command line:
tar --create --file=handin.tar files_to_include ...
You can also untar the handout using the following command:
tar --extract --file=handout.tar
# --extract: extract files from an archive.
# --file: the archive file to operate on.
For more information on using tar, please refer to https://www.computerhope.com/unix/utar.htm
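If you prefer to build and check the archive from Python, the standard-library `tarfile` module can do the same job. The sketch below is only an illustration: the helper names `make_handin` and `root_entries` and the placeholder file `hw0.py` are assumptions, not part of the handout.

```python
import os
import tarfile
import tempfile


def make_handin(src_dir, tar_path):
    """Pack src_dir into tar_path so the archive root holds a 'hw0' directory."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname="hw0")


def root_entries(tar_path):
    """Return the set of top-level names stored in the archive."""
    with tarfile.open(tar_path, "r") as tar:
        return {name.split("/")[0] for name in tar.getnames()}


# Demonstration in a temporary directory with a placeholder module file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "hw0")
os.makedirs(src)
with open(os.path.join(src, "hw0.py"), "w") as f:  # hypothetical file name
    f.write("# placeholder module\n")

tar_path = os.path.join(workdir, "handin.tar")
make_handin(src, tar_path)
print(root_entries(tar_path))
```

Checking the top-level entries before submitting is a cheap way to confirm that the archive has the directory layout described above.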
In this problem you will be given snippets of code. The snippets are functions that you will encounter throughout the course, along with common functions used in basic machine learning algorithms. These functions will not be vectorized.
Your task is to vectorize the functions. That is, you have to replace the loops with numpy operations while preserving each function's behavior.
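To illustrate what vectorizing means, here is a small sketch on a made-up function that is not one of the assignment problems. The names `center_rows_loop` and `center_rows_vectorized` are hypothetical; the point is that the nested loops collapse into one broadcasted numpy expression that returns the same result.

```python
import numpy as np


def center_rows_loop(X):
    """Subtract each row's mean from that row, using explicit loops."""
    out = np.zeros(X.shape, dtype=float)
    for i in range(X.shape[0]):
        row_mean = 0.0
        for j in range(X.shape[1]):
            row_mean += X[i, j]
        row_mean /= X.shape[1]
        for j in range(X.shape[1]):
            out[i, j] = X[i, j] - row_mean
    return out


def center_rows_vectorized(X):
    """Same computation: keepdims=True keeps shape (rows, 1) so it broadcasts."""
    return X - X.mean(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
print(np.allclose(center_rows_loop(X), center_rows_vectorized(X)))
```

Comparing the looped and vectorized versions with `np.allclose` on random inputs is a useful habit when checking your own solutions.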
In this problem you will be given a variable-length synthetic dataset. You will be given two different types of data: univariate time-series data and multivariate time-series data.
Univariate time-series data has shape $(N, -)$, where $N$ is the number of instances and $-$ marks a variable dimension whose size depends on the length of each instance.
Multivariate time-series data has shape $(N, -, F)$, where $N$ is the number of instances, $-$ is the variable dimension that depends on the length of each instance, and $F$ is the dimension of the features of an instance.
Your task will revolve around processing the data so that time-series arrays have the same length. You can use loops in this part.
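As a rough picture of the two data layouts, the sketch below builds a tiny dataset by hand. It assumes the dataset is stored as a numpy object array holding one array per instance; the actual handout data may be stored differently, and the lengths and feature dimension here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
lengths = [3, 5, 4]  # hypothetical per-instance lengths
F = 2                # hypothetical feature dimension

# Univariate, shape (N, -): each instance is a 1-D array of its own length.
uni = np.empty(len(lengths), dtype=object)
for i, n in enumerate(lengths):
    uni[i] = rng.standard_normal(n)

# Multivariate, shape (N, -, F): each instance is a (length, F) array.
multi = np.empty(len(lengths), dtype=object)
for i, n in enumerate(lengths):
    multi[i] = rng.standard_normal((n, F))

# The shortest and longest instances bound the lengths you can slice or pad to.
shortest = min(len(inst) for inst in multi)
longest = max(len(inst) for inst in multi)
print(uni.shape, multi[1].shape, shortest, longest)
```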
In this part of the problem you are required to slice the data to smaller lengths. That is, you will be chopping part of an instance to make all the instances in the dataset of the same length. To do that you have multiple options as to how to chop the dataset:
Note that no matter which method you use, you need to make sure that the length you reduce the instances to is smaller than or equal to the length of every instance in the dataset. In this problem we give you a length that is achievable for all the utterances in the dataset.
Here are some examples of how the functions should behave. The examples are 2-dimensional arrays. You should implement methods for 3-dimensional arrays.
# For any (uXlY) in data, X stands for the index of the utterance and Y stands for the index of the feature in the feature vector of the utterance X.
Note that we cannot give you an example for the random-point method, because the starting position is chosen independently for each utterance.
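The slicing idea can be sketched as follows. This is a minimal illustration, assuming the options are keeping the first frames, keeping the last frames, or starting from a random point; the function names and signatures are hypothetical and may not match the template.

```python
import numpy as np


def slice_first(data, length):
    """Keep the first `length` time steps of every instance."""
    return np.stack([inst[:length] for inst in data])


def slice_last(data, length):
    """Keep the last `length` time steps of every instance."""
    return np.stack([inst[len(inst) - length:] for inst in data])


def slice_random(data, length, rng):
    """Keep `length` consecutive time steps from a random start per instance."""
    out = []
    for inst in data:
        # The upper bound is exclusive, so the last valid start is included.
        start = rng.integers(0, len(inst) - length + 1)
        out.append(inst[start:start + length])
    return np.stack(out)


# Three instances of lengths 3, 5 and 4, each with 2 features.
data = [np.arange(2 * n).reshape(n, 2) for n in (3, 5, 4)]
rng = np.random.default_rng(0)

print(slice_first(data, 3).shape)
print(slice_last(data, 3)[1])
print(slice_random(data, 3, rng).shape)
```

All three helpers assume `length` does not exceed the shortest instance; because slicing acts on the first axis only, the same code handles both the 2-dimensional and 3-dimensional cases.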
In this part of the problem you are required to pad the data to a larger or equal length. That is, you will be adding values to an instance to make all the instances in the dataset the same length. To do that you have multiple options:
Here are some examples of how the functions should behave.
# For any (uXlY) in data, X stands for the index of the utterance and Y stands for the index of the feature in the feature vector of the utterance X.
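One possible way to pad is with `np.pad`. The sketch below assumes constant-value padding appended at the end of each instance; the assignment may ask for other patterns, and the name `pad_end` and its parameters are hypothetical.

```python
import numpy as np


def pad_end(data, length, value=0.0):
    """Append `value` along the time axis until every instance has `length` steps."""
    out = []
    for inst in data:
        missing = length - len(inst)  # assumes length >= len(inst)
        # Pad only axis 0 (time); leave any feature axes untouched.
        pad_width = [(0, missing)] + [(0, 0)] * (inst.ndim - 1)
        out.append(np.pad(inst, pad_width, mode="constant", constant_values=value))
    return np.stack(out)


# Two instances of lengths 3 and 5, each with 2 features.
data = [np.ones((3, 2)), np.ones((5, 2))]
padded = pad_end(data, 5)
print(padded.shape)
print(padded[0])
```

Changing the `(0, missing)` pair to `(missing, 0)` would pad at the start instead, and `np.pad` also offers other modes such as `"edge"` if repeating boundary values is wanted.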
PyTorch is an open-source deep learning library for Python, and will be the primary framework throughout the course. You can install PyTorch by following the instructions at https://PyTorch.org/get-started/locally/.
One of the fundamental concepts in PyTorch is the Tensor, a multi-dimensional matrix containing elements of a single type. Tensors are similar to numpy ndarrays and support most of the functionality that numpy arrays do.
In the following exercises, you will familiarize yourself with tensors and, more importantly, the PyTorch documentation. It is important to note that for this section we are simply using PyTorch's tensors as a matrix library, just like numpy. So please do not use functions in torch.nn, like torch.nn.ReLU.
In PyTorch, it is very simple to convert between numpy arrays and tensors. PyTorch's tensor library provides functions to perform the conversion in either direction. In this task, you are to write 2 functions:
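As a short illustration of the two directions, the sketch below uses `torch.from_numpy` and `Tensor.numpy`. It assumes a CPU tensor; the variable names are arbitrary and this is not the template's function layout.

```python
import numpy as np
import torch

a = np.array([[1.0, 2.0], [3.0, 4.0]])

t = torch.from_numpy(a)  # numpy -> tensor, shares memory with `a`
b = t.numpy()            # tensor -> numpy, shares memory with `t` (CPU only)

# Because the memory is shared, an in-place change is visible everywhere.
a[0, 0] = 10.0
print(t[0, 0].item(), b[0, 0])

# torch.tensor(a) makes an independent copy instead.
c = torch.tensor(a)
a[0, 0] = -1.0
print(c[0, 0].item())
```

The dtype is carried over in both directions, so a float64 array becomes a `torch.float64` tensor unless you convert it explicitly.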
In this task, you are to implement the function tensor sumproducts that takes two tensors as input, and returns the sum of the element-wise products of the two tensors.
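The operation described here can be sketched in one line of tensor arithmetic. The function name `sum_of_products` is a placeholder chosen for this example and is not the template's signature; the sketch assumes both tensors have the same shape.

```python
import torch


def sum_of_products(x, y):
    """Multiply element-wise, then sum every entry into a scalar tensor."""
    return (x * y).sum()


x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

result = sum_of_products(x, y)  # 1*5 + 2*6 + 3*7 + 4*8 = 70
print(result.item())
```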
In this task, you are to implement the ReLU and ReLU Prime functions for PyTorch Tensors.
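One way to write these without touching `torch.nn` is sketched below. The names `relu` and `relu_prime` are placeholders, and the sketch assumes the common convention that the derivative at exactly zero is 0; check the template for the convention it expects.

```python
import torch


def relu(x):
    """Element-wise max(x, 0) using plain tensor operations."""
    return torch.clamp(x, min=0)


def relu_prime(x):
    """Derivative of ReLU: 1 where x > 0, else 0 (0 at x == 0 by assumption)."""
    return (x > 0).to(x.dtype)


x = torch.tensor([-1.0, 0.0, 2.0])
print(relu(x))
print(relu_prime(x))
```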