Since there is no optimization course I can transfer credit for, I may have to take a time series course next semester. It just so happens that this chapter is about time series, so maybe it is fate.

Implementing a custom time series dataset by subclassing torch.utils.data.Dataset

torch.utils.data.Dataset
  • This is an abstract class; we only need to subclass it and override two methods:
    • __len__: implements len(dataset), returning the size of the whole dataset
    • __getitem__: fetches a sample by index, so that dataset[i] returns the i-th sample in the dataset
    • Note: if these methods are not overridden, calling them raises an error directly
      import torch
      from torch.utils.data import Dataset

      WINDOW_SIZE = 8

      class Covid19Dataset(Dataset):
          def __len__(self):
              # dfdiff is assumed to be the preprocessed DataFrame prepared earlier
              return len(dfdiff) - WINDOW_SIZE

          def __getitem__(self, i):
              # feature: a window of WINDOW_SIZE consecutive days
              x = dfdiff.loc[i:i+WINDOW_SIZE-1, :]
              feature = torch.tensor(x.values)
              # label: the day immediately following the window
              y = dfdiff.loc[i+WINDOW_SIZE, :]
              label = torch.tensor(y.values)
              return (feature, label)
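
The dl_train and dl_val loaders used in the training section below are not defined in this post. The following is a minimal sketch of how they might be built from the dataset above; the batch size of 38 and the reuse of the training loader for validation are assumptions, consistent with the remark at the end of the post.

    from torch.utils.data import DataLoader

    ds_train = Covid19Dataset()

    # One full batch of all ~38 windows; validation reuses the training loader
    dl_train = DataLoader(ds_train, batch_size=38)
    dl_val = dl_train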

Define the model

import torch
from torch import nn
import torchkeras

torch.manual_seed(42)

class Block(nn.Module):
    def __init__(self):
        super(Block, self).__init__()

    def forward(self, x, x_input):
        # Treat x as a relative growth rate applied to the last observed step,
        # clamped at 0 so the prediction cannot go negative
        x_out = torch.max((1 + x) * x_input[:, -1, :], torch.tensor(0.0))
        return x_out

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=3, num_layers=5, batch_first=True)
        self.linear = nn.Linear(3, 3)
        self.block = Block()

    def forward(self, x_input):
        x = self.lstm(x_input)[0][:, -1, :]
        x = self.linear(x)
        y = self.block(x, x_input)
        return y

net = Net()
print(net)
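
As a quick sanity check (not in the original post), a random batch can be pushed through the network to confirm that a (batch, WINDOW_SIZE, 3) input produces a (batch, 3) prediction:

    # Hypothetical smoke test: 4 windows of 8 time steps with 3 features each
    dummy = torch.randn(4, 8, 3)
    print(net(dummy).shape)  # torch.Size([4, 3])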

Train the Model

  • This section uses a way of training that does not require writing a custom loop (torchkeras' KerasModel), though I suspect I will not use it much in the future; a bare-bones manual loop is sketched at the end of this post for comparison.
    from torchmetrics.regression import MeanAbsolutePercentageError

    # Mean squared percentage error, with a 1e-7 floor on the denominator
    # to avoid division by zero
    def mspe(y_pred, y_true):
        err_percent = (y_true - y_pred)**2 / torch.max(y_true**2, torch.tensor(1e-7))
        return torch.mean(err_percent)

    net = Net()
    loss_fn = mspe
    metric_dict = {"mape": MeanAbsolutePercentageError()}

    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.0001)
    from torchkeras import KerasModel

    keras_model = KerasModel(net,
                             loss_fn=loss_fn,
                             metrics_dict=metric_dict,
                             optimizer=optimizer,
                             lr_scheduler=lr_scheduler)
    dfhistory = keras_model.fit(train_data=dl_train,
                                val_data=dl_val,
                                epochs=100,
                                ckpt_path='checkpoint',
                                patience=10,
                                monitor='val_loss',
                                mode='min',
                                callbacks=None,
                                plot=True,
                                cpu=True)
  • I think training and evaluating on a dataset of only 38 samples is a bit absurd, and using the same data as both the training set and the validation set is even more so.
  • I feel like I have gone through LSTMs and RNNs several times by now, so why are they still fuzzy to me? What is going on?
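
For comparison with KerasModel.fit, here is a bare-bones sketch of the equivalent hand-written training loop. It is not from the original tutorial; it assumes the net, mspe, optimizer, lr_scheduler, and dl_train defined above, and it omits validation, checkpointing, and early stopping.

    for epoch in range(100):
        net.train()
        for features, labels in dl_train:
            optimizer.zero_grad()
            preds = net(features)
            loss = mspe(preds, labels)
            loss.backward()
            optimizer.step()
        lr_scheduler.step()
        if (epoch + 1) % 10 == 0:
            print(f"epoch {epoch + 1}: train loss = {loss.item():.6f}")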