In Lab2, you will learn deep learning basics and use them to build an NN4Sys application, Learned Index. For the basics, we heavily borrow from the textbook Dive into Deep Learning (d2l), in particular S3.2 Object-Oriented Design for Implementation, which provides a nice "object-oriented" interface for organizing deep learning code, with some useful utility functions. In the second part, you will train a learned index of your own.
$ conda create -n cs7670 python=3
: create a conda environment called "cs7670"
$ conda activate cs7670
: activate this environment
$ pip install d2l==1.0.0a1
: install necessary packages from d2l
$ pip install torch torchvision termcolor
: install PyTorch and a color printing package
$ conda install ipykernel
: install the IPython kernel
$ cd ~
$ git clone git@github.com:NEU-CS7670-labs/lab2-<Your-GitHub-Username>.git lab2
Note that the repo address git@github.com:... can be obtained by going to the GitHub repo page (your cloned lab2), clicking the green "Code" button, and then choosing "SSH".
$ cd ~/lab2; ls
// you should see:
FighterJet.mp4 Lab2.ipynb data utils.ipynb
$ cd ~/lab2
$ conda activate cs7670 # if you haven't
$ jupyter-notebook
This should open your default browser. Click the file named Lab2.ipynb.
A note about Jupyter notebooks: if you're not familiar with Jupyter, here is a quick tutorial. We will only use the basics, and you don't have to be an expert in this tool.
This section is a revised version of "Object-Oriented Design for Implementation" from d2l.ai.
Below, you should start running the code snippets (click the "Run" button on the top toolbar, or press "Ctrl + Enter" by default).
from termcolor import colored
def info(msg):
assert isinstance(msg, str)
print(colored(msg, "magenta", attrs=['bold']))
info("Active environment should be cs7670:")
! conda info | grep 'active env'
import time
import random
import numpy as np
import torch
from torch import nn
from d2l import torch as d2l
We need a few utilities to simplify object-oriented programming in Jupyter notebooks. One of the challenges is that class definitions tend to be fairly long blocks of code. Notebook readability demands short code fragments, interspersed with explanations, a requirement incompatible with the style of programming common for Python libraries. The first utility function allows us to register functions as methods in a class after the class has been created. In fact, we can do so even after we’ve created instances of the class! It allows us to split the implementation of a class into multiple code blocks.
def add_to_class(Class):
def wrapper(obj):
setattr(Class, obj.__name__, obj)
return wrapper
class HelloWorld:
def __init__(self):
super().__init__()
self.msg = "nothing"
# create an instance of HelloWorld
hello = HelloWorld()
info(f"an instance of class HelloWorld has a message hello.msg=``{hello.msg}''")
Next, let's add one function to the above class HelloWorld.
# add one function to class
@add_to_class(HelloWorld)
def update_msg(self, msg):
self.msg = msg
# update the msg to "hello world"
hello.update_msg("hello world")
info(f"the same instance now has a message hello.msg=``{hello.msg}''")
# Exercise: use "@add_to_class" helper to implement a function "print_msg"
# that prints "self.msg" stored in the HelloWorld instance.
# TODO: your code here
# print message
info('Expected to see "hello world"')
hello.print_msg()
The second one is a utility class that saves all arguments in a class's __init__ method as class attributes. This allows us to extend constructor call signatures implicitly without additional code.
# The HyperParameters class saves all arguments in a class's
# `__init__` method as class attributes.
class HyperParameters:
def save_hyperparameters(self, ignore=[]):
# saves all arguments in a class's `__init__` method as class attributes.
pass
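The real implementation is provided by d2l (we use d2l.HyperParameters below). For intuition only, here is a minimal sketch of how such a method could work; it uses Python's inspect module to read the arguments of the calling __init__, and the names HyperParametersSketch and Example are made up for this illustration.
import inspect

class HyperParametersSketch:
    def save_hyperparameters(self, ignore=[]):
        # look at the frame of the calling `__init__` and grab its local variables
        frame = inspect.currentframe().f_back
        _, _, _, local_vars = inspect.getargvalues(frame)
        # keep everything except `self`, private names, and explicitly ignored names
        for k, v in local_vars.items():
            if k not in set(ignore + ['self']) and not k.startswith('_'):
                setattr(self, k, v)   # each argument becomes an attribute

class Example(HyperParametersSketch):
    def __init__(self, a, b):
        self.save_hyperparameters(ignore=['b'])

e = Example(a=1, b=2)
print(e.a)              # 1
print(hasattr(e, 'b'))  # False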
To use it, we define our class that inherits from HyperParameters and calls save_hyperparameters in the __init__ method.
# Call the fully implemented HyperParameters class saved in d2l
class A(d2l.HyperParameters):
def __init__(self, a, b):
print('self.a =', self.a)
info("you should see an AttributeError.\n")
tmp = A(a=1, b=2)
@add_to_class(A)
def __init__(self, a, b):
self.save_hyperparameters(ignore=['b'])
print('self.a =', self.a)
print('There is no self.b =', not hasattr(self, 'b'))
info("you should see no errors now.")
tmp = A(a=1, b=2)
The last utility allows us to plot experiment progress interactively while it is going on. In deference to the much more powerful (and complex) TensorBoard, we name it ProgressBoard.
The draw function plots a point (x, y) in the figure, with label specified in the legend. The optional every_n smooths the line by only showing 1/n of the points in the figure; their values are averaged from the n neighboring points in the original figure.
board = d2l.ProgressBoard('this is name')
for x in np.arange(0, 10, 0.1):
board.draw(x, np.sin(x), 'sin', every_n=2)
board.draw(x, np.cos(x), 'cos', every_n=10)
The Module class (Lab2Module below) is the base class of all models we will implement. At a minimum we need to define three methods:
- the __init__ method stores the learnable parameters,
- the training_step method accepts a data batch and returns the loss value,
- the configure_optimizers method returns the optimization method, or a list of them, that is used to update the learnable parameters.

class Lab2Module(nn.Module, d2l.HyperParameters):
def __init__(self, plot_train_per_epoch=2, plot_valid_per_epoch=1):
super().__init__()
self.save_hyperparameters()
self.board = d2l.ProgressBoard()
def loss(self, y_hat, y):
raise NotImplementedError
def forward(self, X):
        assert hasattr(self, 'net'), 'Neural network is not defined'
return self.net(X)
def plot(self, key, value, train):
"""Plot a point in animation."""
assert hasattr(self, 'trainer'), 'Trainer is not inited'
self.board.xlabel = 'epoch'
if train:
x = self.trainer.train_batch_idx / \
self.trainer.num_train_batches
n = self.trainer.num_train_batches / \
self.plot_train_per_epoch
else:
x = self.trainer.epoch + 1
n = self.trainer.num_val_batches / \
self.plot_valid_per_epoch
self.board.draw(x, value.to(d2l.cpu()).detach().numpy(),
('train_' if train else 'val_') + key,
every_n=int(n))
def training_step(self, batch):
l = self.loss(self(*batch[:-1]), batch[-1])
self.plot('loss', l, train=True)
return l
def configure_optimizers(self):
raise NotImplementedError
You may notice that Module is a subclass of nn.Module, the base class of neural networks in PyTorch. It provides convenient features to handle neural networks. For example, if we define a forward method, such as forward(self, X), then for an instance a we can invoke this function by a(X). This works because the built-in __call__ method calls the forward method.
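As a quick, optional illustration of this dispatch (a toy class, not part of the lab code; it reuses the torch and nn imports from the top of the notebook):
# calling an nn.Module instance dispatches to its forward() via __call__
class Doubler(nn.Module):
    def forward(self, X):
        return 2 * X

a = Doubler()
print(a(torch.tensor([1.0, 2.0])))  # tensor([2., 4.]), same as a.forward(...)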
The DataModule class (Lab2Data below) is the base class for data. Quite frequently the __init__ method is used to prepare the data. This includes downloading and preprocessing if needed. The train_dataloader method returns the data loader for the training dataset. A data loader is a (Python) generator that yields a data batch each time it is used. This batch is then fed into the training_step method of Module to compute the loss.
class Lab2Data(d2l.HyperParameters):
def __init__(self, root='./data', num_workers=4):
self.save_hyperparameters()
def get_dataloader(self, train):
raise NotImplementedError
def train_dataloader(self):
return self.get_dataloader(train=True)
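To make the "a data loader is a generator of batches" idea concrete, here is a small, self-contained illustration using PyTorch's TensorDataset and DataLoader; the toy tensors and the names toy_dataset/toy_loader are made up for this example:
import torch

X = torch.arange(10).float().reshape(-1, 1)    # 10 toy inputs
Y = 2 * X                                      # 10 toy labels
toy_dataset = torch.utils.data.TensorDataset(X, Y)
toy_loader = torch.utils.data.DataLoader(toy_dataset, batch_size=4, shuffle=True)
for x_batch, y_batch in toy_loader:            # each iteration yields one batch
    print(x_batch.shape, y_batch.shape)        # torch.Size([4, 1]) twice, then torch.Size([2, 1])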
The Trainer class (Lab2Trainer below) trains the learnable parameters in the Module class with data specified in DataModule. The key method is fit, which accepts two arguments: model, an instance of Module, and data, an instance of DataModule. It then iterates over the entire dataset max_epochs times to train the model.
class Lab2Trainer(d2l.HyperParameters):
def __init__(self, max_epochs, num_gpus=0, gradient_clip_val=0):
self.save_hyperparameters()
assert num_gpus == 0, 'No GPU support yet'
def prepare_data(self, data):
self.train_dataloader = data.train_dataloader()
self.num_train_batches = len(self.train_dataloader)
def prepare_model(self, model):
model.trainer = self
model.board.xlim = [0, self.max_epochs]
self.model = model
def fit(self, model, data):
self.prepare_data(data)
self.prepare_model(model)
self.optim = model.configure_optimizers()
self.epoch = 0
self.train_batch_idx = 0
self.val_batch_idx = 0
for self.epoch in range(self.max_epochs):
self.fit_epoch()
def fit_epoch(self):
raise NotImplementedError
Get yourself reasonably comfortable with how Module, Data, and Trainer interact with each other, because you will need to fill in the raise NotImplementedError pieces soon.
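If it helps, here is one way to picture the control flow once everything is filled in (a comment-only sketch, not code you need to run):
# Rough picture of one training run:
#
#   trainer = Lab2Trainer(max_epochs=...)
#   trainer.fit(model, data)
#       prepare_data(data)   -> stores data.train_dataloader()
#       prepare_model(model) -> links model.trainer = trainer and self.model = model
#       for each epoch:
#           fit_epoch()                              # implemented later in this lab
#               for each batch from the train dataloader:
#                   loss = model.training_step(batch)   # calls model.loss(model(X), y)
#                   zero gradients, loss.backward(), optimizer.step()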
Watch the "AI fighter jet" problem in the video: either see it here or play ~/lab2/FighterJet.mp4.
To summarize, we want to train a fighter jet NN that follows a safety rule:
the fighter jet fires iff the number of missiles on-the-fly is greater than zero.
According to the safety rule, if no missiles are on-the-fly, the jet will not fire no matter how many other (enemy) jets exist.
FighterJetData

You will implement a dataset (fill in __init__) for the fighter jet, and use it for training.
- self.X will store the NN inputs (see below). It will be a tensor of "[#jets, #missiles]" pairs.
- self.Y will store the outputs (a tensor of size one). It will contain a firing score: 0 means not-fire, 1 means fire.
- Fill in self.Y in __init__ according to our safety rule.

class FighterJetData(Lab2Data):
def __init__(self, num_train=1000, batch_size=32):
super().__init__()
self.save_hyperparameters()
# prepare training inputs
        n = num_train # total number of instances
jets = torch.randint(0, 20, (n,)).float() # get a random #jets from [0,20)
missiles = torch.randint(0, 3, (n,)).float() # get a random #missiles from [0,3)
self.X = torch.stack((jets, missiles), -1) # stack tensors to [[#jets, #missiles], ...]
# TODO: your code here
self.Y = None
def get_dataloader(self, train):
assert train, "We only use this dataset for training."
dataset = torch.utils.data.TensorDataset(self.X, self.Y)
return torch.utils.data.DataLoader(dataset, self.batch_size, shuffle=train)
info("""
you should see something like:
x= tensor([[13., 1.]]) y= tensor([[1.]])
x= tensor([[14., 0.]]) y= tensor([[0.]])
x= tensor([[1., 2.]]) y= tensor([[1.]])
x= tensor([[6., 0.]]) y= tensor([[0.]])
x= tensor([[5., 2.]]) y= tensor([[1.]])
x= tensor([[4., 1.]]) y= tensor([[1.]])
x= tensor([[15., 0.]]) y= tensor([[0.]])
x= tensor([[12., 0.]]) y= tensor([[0.]])
x= tensor([[8., 2.]]) y= tensor([[1.]])
x= tensor([[18., 1.]]) y= tensor([[1.]])
Check if the output value follows our safety rule:
x[1]>0 => y=1 and x[1]=0 => y=0
If not, you need to fix it.
""")
a = FighterJetData(10,1)
for x,y in a.train_dataloader():
print("x=",x, "y=",y)
FighterJetModule

Next, you will implement an NN to learn from the training data.
class FighterJetModule(Lab2Module):
def __init__(self, plot_train_per_epoch=2, plot_valid_per_epoch=1):
super().__init__()
self.save_hyperparameters()
self.board = d2l.ProgressBoard()
# TODO: your code here
self.net = None
info("""
you should see something like:
Sequential(
(0): Linear(in_features=2, out_features=16, bias=True)
(1): ReLU()
(2): Linear(in_features=16, out_features=16, bias=True)
(3): ReLU()
(4): Linear(in_features=16, out_features=1, bias=True)
)
""")
m = FighterJetModule()
print(m.net)
loss and configure_optimizers of FighterJetModule

Next, we need to implement the loss function and add an optimizer to the module. For the optimizer, use torch.optim.Adam or torch.optim.SGD.

@add_to_class(FighterJetModule)
def loss(self, y_hat, y):
# `y` is the true label
# `y_hat` is the predicted label from the current NN
# TODO: your code here
return None
@add_to_class(FighterJetModule)
def configure_optimizers(self):
    # you should return an optimizer from `torch.optim`.
# TODO: your code here
return None
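If you have not used torch.optim before, here is a generic, self-contained illustration of how a loss and an optimizer are typically wired to a module's parameters. The toy model, data, and loss choice (MSE) are made up for this sketch; it is not the exercise solution.
import torch
from torch import nn

toy_net = nn.Linear(2, 1)                                  # a made-up, tiny model
toy_optim = torch.optim.SGD(toy_net.parameters(), lr=0.01)

x = torch.randn(8, 2)                                      # toy inputs
y = torch.randn(8, 1)                                      # toy targets
l = nn.MSELoss()(toy_net(x), y)                            # the loss is a scalar tensor

toy_optim.zero_grad()                                      # clear old gradients
l.backward()                                               # compute new gradients
toy_optim.step()                                           # update toy_net's parameters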
Until now, you've implemented the NN architecture (exercise 1), the training dataset (exercise 2), the loss function, and an optimizer (exercise 3). Next is an implementation of one round of training. Read it line by line to make sure you understand each step.
Here are some pointers:
- Line A puts the model into training mode (this matters for layers like dropout and batch normalization).
- Line B clears the gradients accumulated from the previous batch.
- Line C stops autograd from recording the update operations; loss.backward() still computes gradients because the graph was already built in training_step.
- Line D applies one optimizer step to update the learnable parameters.
@add_to_class(Lab2Trainer)
def fit_epoch(self):
self.model.train() # Line A
for batch in self.train_dataloader:
loss = self.model.training_step(batch)
self.optim.zero_grad() # Line B
with torch.no_grad(): # Line C
loss.backward()
self.optim.step() # line D
self.train_batch_idx += 1
# Training
model = FighterJetModule() # create a model
data = FighterJetData() # create dataset
trainer = Lab2Trainer(max_epochs=20) # create trainer, train 20 epochs
# Train!
# you will see the loss changes while training (lower loss is better)
trainer.fit(model, data)
# check if the NN learned the safety rule
info("given an input Tensor([1345, 0]) [#jets, #missiles], should we fire?\n \
 (by the safety rule in the video, no), but...")
with torch.no_grad():
ret = model.forward(torch.Tensor([1345,0]))
print("fire?", ret > 0.5)
%run utils.ipynb
p_points = 0
n_points = 0
with torch.no_grad():
for p,n in zip(get_positive_tests(), get_negative_tests()):
if model.forward(p).item() >= 0.5:
p_points += 1
if model.forward(n).item() < 0.5:
n_points += 1
info(f"=== points ===\n"
f" positive: [{p_points}/{get_num_positive_cases()}]\n"
f" negative: [{n_points}/{get_num_negative_cases()}]\n"
f" total: [{p_points+n_points}/{get_num_positive_cases() + get_num_negative_cases()}]")
Try to train an NN that produces:
=== points ===
positive: [500/500]
negative: [500/500]
total: [1000/1000]
Hint: this is supposed to be a non-trivial job (but sometimes people get lucky). If you're struggling, you might want to re-implement the NN (what NNs have larger learning capacity?), and also modify the training dataset (what data will let your NN learn the safety rule?).
In this section, we try to replicate "S2.3 A First, Naive Learned Index" from the learned index paper, where we use one neural network to learn a sorted dataset.
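Before training anything, it may help to recall what a learned index is used for. The snippet below is a hypothetical illustration, not part of the lab's utilities: the model approximates the mapping from a database key to its position in a sorted array, and a lookup only searches a small window around the predicted position, bounded by a known error bound (the prediction and bound here are made-up numbers).
import bisect

sorted_keys = [2, 3, 5, 8, 13, 21, 34, 55]   # toy sorted "database"

def lookup(key, predicted_pos, err_bound):
    # only search the window [predicted_pos - err_bound, predicted_pos + err_bound]
    lo = max(0, predicted_pos - err_bound)
    hi = min(len(sorted_keys), predicted_pos + err_bound + 1)
    i = bisect.bisect_left(sorted_keys, key, lo, hi)
    return i if i < hi and sorted_keys[i] == key else None

# suppose a model predicts position 5 for key 21 with error bound 1
print(lookup(21, 5, 1))  # 5, the true position of key 21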
# below are some global variables (hyperparameters)
# they are here for easier hyperparameter tuning
# (you will need to come back and change them)
m_learning_rate = 0.01
m_batch_size = 128
m_max_epochs = 40
m_normalize = True
%run utils.ipynb
import matplotlib.pyplot as plt
# (1) study datasets
datasets = {
"easy" : get_linear_dataset(batch_size=m_batch_size, normalize=m_normalize),
"medium" : get_lognormal_dataset(batch_size=m_batch_size, normalize=m_normalize),
"hard" : get_wiki_dataset(batch_size=m_batch_size, normalize=m_normalize)
}
# visualize the distribution of three cases
def plot_distribution(name):
xs = []
ys = []
for x,y in datasets[name].dataset:
xs.append(x.item())
ys.append(y.item())
plt.plot(xs, ys)
plt.xlabel("database key")
plt.ylabel("data position")
plt.title(f"dataset [{name}]")
plt.show()
for name in datasets:
plot_distribution(name)
Of course, a simple starting point would be an MLP. You can implement whatever NNs you want and compare their performance.
# (2) define your model (NN)
class LearnedIndex(d2l.Module):
def __init__(self):
super().__init__()
self.save_hyperparameters()
# TODO: your code here
self.net = None
def loss(self, y_hat, y):
# TODO: your code here
return None
def configure_optimizers(self):
        # TODO: your code here; remember to use the global var `m_learning_rate`
# (for simpler parameter tuning)
return None
# TODO: choose the dataset to learn
my_dataset = datasets["easy"]
# prepare training
model = LearnedIndex() # create a model
data = my_dataset # create dataset
trainer = d2l.Trainer(max_epochs=m_max_epochs) # create trainer
# Train!
trainer.fit(model, data)
Below is a test of how well your learned index performs. The higher the "index points", the better.
# see how well our learned index is
%run utils.ipynb
ind_points = 0
with torch.no_grad():
# assert "index_err_bound" in globals(), "run %run utils.ipynb"
for x,y in my_dataset.dataset:
if abs(model.forward(x).item() - y.item() ) <= my_dataset.get_err_bound():
ind_points += 1
info(f"=== index points ===\n"
f" [{ind_points}/{len(my_dataset.dataset)}]\n")
Can you achieve the following learned index performance?
# for easy dataset
=== index points ===
[9000/10000]
# for medium dataset
=== index points ===
[8000/10000]
# for hard dataset
=== index points ===
[7000/10000]
Hints: try to tune parameters and hyperparameters (go back to the code block with global parameters), including the learning rate (m_learning_rate), the batch size (m_batch_size), the number of epochs (m_max_epochs), and normalization (m_normalize); you may also revisit your NN architecture.
Implement the RMI (recursive model index) from the learned index paper, and train your RMIs to achieve:
# for easy dataset
=== index points ===
[10000/10000]
# for medium dataset
=== index points ===
[10000/10000]
# for hard dataset
=== index points ===
[10000/10000]
# write whatever code you need to build RMI here