I am training a neural network on video frames (converted to greyscale) to output a tensor with two values. The first iteration always evaluates an acceptable loss (mean squared error generally between 15-40), followed by an exponential rise in the second pass, and then infinite.
The net is quite vanilla:
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(100 * 291, 29100),
nn.ReLU(),
nn.Linear(29100, 29100),
nn.ReLU(),
nn.Linear(29100, 2),
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
As is the training loop:
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to("cpu"), y.to("cpu")
# Compute prediction error
pred = model(X)
loss = loss_fn(pred, y)
# Backpropogation
optimizer.zero_grad()
loss.backward()
optimizer.step()
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
Example of loss function growth:
ITERATION 1
prediction: tensor([[-1.2239, -8.2337]], grad_fn=<AddmmBackward>)
actual: tensor([[0.0321, 0.0325]])
loss: tensor(34.9545, grad_fn=<MseLossBackward>)
ITERATION 2
prediction: tensor([[ 314636.5625, 2063098.2500]], grad_fn=<AddmmBackward>)
actual: tensor([[0.0330, 0.0323]])
loss: tensor(2.1777e 12, grad_fn=<MseLossBackward>)
ITERATION 3
prediction: tensor([[-8.0924e 22, -5.3062e 23]], grad_fn=<AddmmBackward>)
actual: tensor([[0.0334, 0.0317]])
loss: tensor(inf, grad_fn=<MseLossBackward>)
Here is an example of the video data: it's a 291x100 greyscale image and there are 1100 of them in the training dataset:
dataset.video_frames.size()
> torch.Size([1100, 100, 291])
dataset.video_frames[0]
> tensor([[21., 29., 28., ..., 33., 27., 26.],
[22., 27., 25., ..., 25., 25., 30.],
[23., 26., 26., ..., 24., 24., 28.],
...,
[24., 33., 31., ..., 41., 40., 42.],
[26., 34., 31., ..., 26., 20., 22.],
[25., 32., 32., ..., 21., 20., 18.]])
And the labeled training data:
dataset.y.size()
> torch.Size([1100, 2])
dataset.y[0]
> tensor([0.0335, 0.0315], dtype=torch.float)
I've fiddled the learning rate, number of hidden layers, and nothing seems to keep the loss from going to infinite.
CodePudding user response:
Properly scaling the inputs is crucial for proper training. Weights are initialized based on some assumptions on the way inputs are scaled. See this part of a lecture on weight initialization and see how critical it is for proper convergence.
More details on the mathematical analysis of the influence of weight initialization can be found in Sec. 2 of this paper:
Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (ICCV 2015).
