ValueError: Can not squeeze dim[1], expected a dimension of 1 for '{{node binary

I'm trying to fit an LSTM model to my data with a Masking layer in front, and I get this error:

ValueError: Can not squeeze dim[1], expected a dimension of 1, got 4 for '{{node binary_crossentropy/weighted_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](Cast)' with input shapes: [128,4].

This is my code:

from tensorflow.keras.layers import LSTM, Dense, BatchNormalization, Masking
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Nadam
import numpy as np

if __name__ == '__main__':
    
    # define stub data
    samples, timesteps, features = 128, 4, 99
    X = np.random.rand(samples, timesteps, features)
    Y = np.random.randint(0, 2, size=(samples))
    
    # create model
    model = Sequential()
    model.add(Masking(mask_value=0., input_shape=(None, 99)))
    model.add(LSTM(100, return_sequences=True))
    model.add(BatchNormalization())
    model.add(Dense(1, activation='sigmoid'))
    optimizer = Nadam(learning_rate=0.0001)
    loss = BinaryCrossentropy(from_logits=False)
    model.compile(loss=loss, optimizer=optimizer)

    # train model
    model.fit(
        X,
        Y,
        batch_size=128)

I see from this related post that I can't use one-hot encoded labels, but my labels are not one-hot encoded. Also, when I remove the Masking layer, training works.

As I understand it, one sample consists of 4 timesteps with 99 features each, so the shape of X is (128, 4, 99). I therefore only have to provide one label per sample, which gives Y the shape (128,).

But it seems like the dimensions of X and/or Y are not correct, since TensorFlow tries to squeeze a dimension. I have also tried providing one label per timestep of each sample (Y = np.random.randint(0, 2, size=(samples, timesteps))), with the same result.

Why does adding the masking layer introduce this error? And how can I keep the masking layer without getting the error?

System Information:

Python version: 3.9.5
TensorFlow version: 2.5.0
OS: Windows

Answer:

I don't think the Masking layer is the root cause. Because return_sequences is set to True in the LSTM layer, the layer returns one output per input timestep, each with 100 units, so its output has the shape (128, 4, 100), where 128 is the batch size. The BatchNormalization layer keeps that shape, and the final Dense layer reduces it to (128, 4, 1). Your labels, however, have the shape (128,), one label per sample, which Keras treats as (128, 1), while your model produces a 3D output with one prediction per timestep. The [128,4] shape in the error message matches (batch, timesteps), which suggests that the propagated mask is what turns this mismatch into an error once the Masking layer is added. Setting return_sequences to False makes the LSTM return only its last output, so the model output becomes (128, 1) and matches your labels. See also this post.
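To check the shapes described above before training, here is a minimal sketch that builds the same layer stack twice and only toggles return_sequences. The helper name build_model is hypothetical, and the input shape is declared with tf.keras.Input rather than input_shape on the Masking layer, which is an assumption made so the sketch also runs on newer Keras versions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, BatchNormalization, Masking
from tensorflow.keras.models import Sequential


def build_model(return_sequences):
    # Hypothetical helper: same layers as in the question,
    # only the return_sequences flag of the LSTM varies.
    return Sequential([
        tf.keras.Input(shape=(None, 99)),
        Masking(mask_value=0.),
        LSTM(100, return_sequences=return_sequences),
        BatchNormalization(),
        Dense(1, activation='sigmoid'),
    ])


# stub data with the same dimensions as in the question
samples, timesteps, features = 128, 4, 99
X = np.random.rand(samples, timesteps, features).astype('float32')

# call each model once on the stub batch and record the output shape
seq_shape = tuple(build_model(True)(X).shape)
last_shape = tuple(build_model(False)(X).shape)

print('return_sequences=True :', seq_shape)
print('return_sequences=False:', last_shape)
```

With return_sequences=True the output keeps the timestep axis, (128, 4, 1), while with return_sequences=False it is (128, 1), which is the shape that lines up with one label per sample.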

Here is a working example:

from tensorflow.keras.layers import LSTM, Dense, BatchNormalization, Masking
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Nadam
import numpy as np

if __name__ == '__main__':
    
    # define stub data
    samples, timesteps, features = 128, 4, 99
    X = np.random.rand(samples, timesteps, features)
    Y = np.random.randint(0, 2, size=(samples))
    
    # create model
    model = Sequential()
    model.add(Masking(mask_value=0., input_shape=(None, 99)))
    model.add(LSTM(100, return_sequences=False))
    model.add(BatchNormalization())
    model.add(Dense(1, activation='sigmoid'))
    optimizer = Nadam(learning_rate=0.0001)
    loss = BinaryCrossentropy(from_logits=False)
    model.compile(loss=loss, optimizer=optimizer)

    # train model
    model.fit(
        X,
        Y,
        batch_size=128)
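
If you actually want one prediction per timestep instead, a possible variant is to keep return_sequences=True and give the labels the same shape as the model output. This is only a sketch under assumptions: Y_steps is hypothetical stub data, and the trailing axis of size 1 is added so the labels match the (samples, timesteps, 1) output. It changes the task from one label per sample to one label per timestep, so it only applies if your data has such labels.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, BatchNormalization, Masking
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Nadam

# stub data with the same dimensions as in the question
samples, timesteps, features = 128, 4, 99
X = np.random.rand(samples, timesteps, features).astype('float32')

# Hypothetical per-timestep labels: one 0/1 target per timestep,
# with a trailing axis of size 1 to match the model output shape.
Y_steps = np.random.randint(0, 2, size=(samples, timesteps, 1)).astype('float32')

# same layers as above, but the LSTM keeps the timestep axis
model = Sequential([
    tf.keras.Input(shape=(None, 99)),
    Masking(mask_value=0.),
    LSTM(100, return_sequences=True),
    BatchNormalization(),
    Dense(1, activation='sigmoid'),
])
model.compile(loss=BinaryCrossentropy(from_logits=False),
              optimizer=Nadam(learning_rate=0.0001))

# train for a single epoch just to confirm that the shapes are accepted
history = model.fit(X, Y_steps, batch_size=128, epochs=1, verbose=0)
print(tuple(model(X).shape), Y_steps.shape)
```

Here the per-timestep loss has the shape (samples, timesteps), the same as the mask produced by the Masking layer, so masked timesteps can be excluded from the loss without a shape conflict.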