## TensorBoard for Visualization

TensorBoard is a neural network visualization library developed by Google as part of TensorFlow. In the past, PyTorch users could only use TensorBoard through third-party adaptors such as tensorboardX. Starting from version 1.2.0 (the latest version at the time of writing), **PyTorch officially supports TensorBoard**. We recommend using the latest version of PyTorch and its built-in TensorBoard support for visualization.

This tutorial covers how to use PyTorch's official support of TensorBoard. You can also refer to the [official documentation](https://pytorch.org/docs/stable/tensorboard.html). If you insist on using an older version of PyTorch, try [tensorboardX](https://github.com/lanpa/tensorboardX).

Let's take up the same task as defined in Recitation 2. We will train a neural network to classify whether a point $(x_1, x_2)$ lies inside a circle of radius $1$. For more details on the task, please revisit Recitation 2.


In [1]:
# Install required libraries
!pip install "torch>=1.2.0" tensorboard future tqdm

In [8]:
import torch
import torch.nn as nn

Similar to Recitation 2, we first sample points in polar coordinates, randomly distributed within a circle of radius 2 centered at the origin, i.e. $(0,0)$.

In [9]:
import math

def sample_points(n):
    """
    :param n: Total number of data points
    :return: A tuple (X, y) where X is a float tensor with shape (n, 2)
               and y is a boolean tensor with shape (n,); y[i] is True
               when point i lies inside the circle of radius 1
    """
    radius = torch.rand(n) * 2
    angle = torch.rand(n) * 2 * math.pi
    x1 = radius * angle.cos()
    x2 = radius * angle.sin()
    y = radius < 1
    x = torch.stack([x1, x2], dim=1)
    return x, y

In [10]:
# Generating the data

X_train, y_train = sample_points(1000)
X_val, y_val = sample_points(500)

print(X_train.size(), y_train.size())

torch.Size([1000, 2]) torch.Size([1000])


In [11]:
# Build a simple MLP
def build_model(dims, activation):
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(activation())
    return nn.Sequential(*layers)

# Test the function
print(build_model([2, 12, 1], nn.Sigmoid))

Sequential(
  (0): Linear(in_features=2, out_features=12, bias=True)
  (1): Sigmoid()
  (2): Linear(in_features=12, out_features=1, bias=True)
)


A SummaryWriter writes all the values we want to visualize to a given directory. The lines below create a SummaryWriter that writes event files to the `./runs/example` directory.
```python
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter("./runs/example")
```
Use a separate run directory under a common root directory for each run of your model. TensorBoard looks for runs in the root directory, so for this example we start TensorBoard with:

```sh
tensorboard --logdir=./runs
```
Then visit `localhost:6006` in your browser to open TensorBoard.
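
One possible way to keep runs apart is to derive each run directory from a name and a timestamp. The sketch below is only an illustration: `make_run_dir` is a hypothetical helper (not part of PyTorch or TensorBoard), and the naming scheme is an assumption you can change freely.

```python
import os
from datetime import datetime


def make_run_dir(name, root="./runs"):
    """Return a per-run log directory under a common root.

    Hypothetical helper for illustration; the "<name>_<timestamp>"
    naming scheme is an assumption, not a TensorBoard requirement.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return os.path.join(root, f"{name}_{stamp}")


run_dir = make_run_dir("sigmoid")
print(run_dir)  # pass this path to SummaryWriter(run_dir)
```

Because every run still lives under `./runs`, the same `tensorboard --logdir=./runs` command picks up all of them.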

Each time we add a value, we specify a **tag** and a **step**. Each tag is a string and corresponds to a plot on TensorBoard. The step is an integer (`epoch` in this example) that serves as the X axis on the plot.

To plot a single scalar, use [*SummaryWriter.add_scalar()*](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_scalar). To plot multiple scalars on a plot, use [*SummaryWriter.add_scalars()*](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_scalars) and pass in a dict of scalars.

Using [*SummaryWriter.add_histogram()*](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_histogram) to plot a histogram of values in a tensor is also useful for understanding the dynamics of the network.
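
Before wiring these calls into a training loop, it may help to see them in isolation. The following is a minimal sketch that logs made-up values: the `demo/...` tags, the fake loss and accuracy numbers, and the temporary log directory are all assumptions chosen for illustration.

```python
import os
import tempfile

import torch
from torch.utils.tensorboard import SummaryWriter

# Throwaway log directory (an assumption, so the sketch does not touch ./runs).
log_dir = tempfile.mkdtemp(prefix="tb_demo_")
writer = SummaryWriter(log_dir)

for step in range(5):
    fake_loss = 1.0 / (step + 1)  # made-up value, for illustration only
    # One curve under the tag "demo/loss"; `step` is the X axis.
    writer.add_scalar("demo/loss", fake_loss, step)
    # Two curves on the same plot, one per dict entry.
    writer.add_scalars(
        "demo/acc",
        {"train": 0.5 + 0.1 * step, "val": 0.5 + 0.08 * step},
        step,
    )
    # Distribution of the values in a tensor at this step.
    writer.add_histogram("demo/weights", torch.randn(100), step)

writer.close()  # flush pending events to disk

event_files = [f for f in os.listdir(log_dir) if "tfevents" in f]
print(log_dir, event_files)
```

Pointing `tensorboard --logdir` at the printed directory should show one scalar plot per tag and a histogram for `demo/weights`.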

In [12]:
from tqdm import tqdm

def train(model, writer, epochs=1000):
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters())
    for epoch in tqdm(range(epochs)):
        model.zero_grad()
        out = model(X_train).flatten()
        loss = criterion(out, y_train.float())
        train_loss = loss.item()
        train_acc = ((out > 0) == y_train).float().mean().item()

        loss.backward()
        # Plot histogram of gradient of all parameters
        for name, param in model.named_parameters():
            writer.add_histogram('grad_' + name, param.grad.data, epoch)
        optimizer.step()
    
        with torch.no_grad():
            out = model(X_val).flatten()
            val_loss = criterion(out, y_val.float()).item()
            val_acc = ((out > 0) == y_val).float().mean().item()
        # Plot loss and accuracy on train and val
        writer.add_scalars('loss', {'train': train_loss, 'val': val_loss}, epoch)
        writer.add_scalars('acc', {'train': train_acc, 'val': val_acc}, epoch)

## Using Different Activation Functions

Let's see how each of these activation functions performs.

- Sigmoid
    * Outputs values between 0 and 1.
    * A Sigmoid layer saturates easily. When the input is very negative or very positive, the output flattens out near 0 or 1 and the gradient becomes close to zero, so almost no learning signal flows back through the neuron.

- Tanh
    * Outputs values between -1 and 1. It is zero-centered, so it does not have the problem of all-positive or all-negative gradients.
    * Better than Sigmoid, but the saturation problem persists.

- ReLU
    * Converges quickly because it is a simple threshold-based activation that does not saturate for positive inputs.
    * Neurons can die. A large weight update during backpropagation can set the weights in such a way (for example, they become negative) that the neuron never fires for any data point. It is therefore important to use lower learning rates with ReLU.
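
The saturation behaviour described above can be checked numerically. The sketch below is written in plain Python (an assumption made so it runs without PyTorch) and evaluates the derivative of each activation at a few hand-picked inputs; the helper names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def d_sigmoid(x):
    # Derivative of sigmoid: s(x) * (1 - s(x)); peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def d_tanh(x):
    # Derivative of tanh: 1 - tanh(x)^2; peaks at 1 when x = 0.
    return 1.0 - math.tanh(x) ** 2


def d_relu(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(
        f"x={x:6.1f}  sigmoid'={d_sigmoid(x):.6f}  "
        f"tanh'={d_tanh(x):.6f}  relu'={d_relu(x):.1f}"
    )
```

For large positive or negative inputs the Sigmoid and Tanh derivatives shrink toward zero, while the ReLU derivative stays at 1 for positive inputs and is exactly 0 for negative ones, which is the "dead neuron" case.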
    
    
**TRY IT OUT**

Use all three activation functions. See which performs best and try to find out why.

In [13]:
from torch.utils.tensorboard import SummaryWriter

# One run directory per activation, so TensorBoard shows them as separate runs
writer = SummaryWriter("./runs/sigmoid")
model = build_model([2, 12, 1], nn.Sigmoid)
train(model, writer)
writer.close()  # flush pending events to disk

writer = SummaryWriter("./runs/tanh")
model = build_model([2, 12, 1], nn.Tanh)
train(model, writer)
writer.close()

writer = SummaryWriter("./runs/relu")
model = build_model([2, 12, 1], nn.ReLU)
train(model, writer)
writer.close()

100%|██████████| 1000/1000 [00:09<00:00, 107.35it/s]
100%|██████████| 1000/1000 [00:11<00:00, 85.39it/s]
100%|██████████| 1000/1000 [00:11<00:00, 86.21it/s]


Open TensorBoard and see the result!

![TensorBoard: accuracy](tensorboard_acc.png)

![TensorBoard: gradient distribution](tensorboard_grad.png)