Why is the TensorFlow Maxout layer not computing any gradients, and where is my mistake?

I am trying to use the TensorFlow Addons Maxout layer (https://www.tensorflow.org/addons/api_docs/python/tfa/layers/Maxout), but I am struggling with it.

To illustrate my problem, suppose I have the following model:

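import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input

d = 3

# A single Dense layer with ReLU: a 3x3 kernel plus 3 biases to train
x_in = Input(shape=(d,))
x_out = Dense(d, activation='relu')(x_in)
model = Model(inputs=x_in, outputs=x_out)

model.compile(optimizer='adam', loss='MeanAbsoluteError')

# Random inputs and targets, only to exercise the training loop
X = tf.random.normal([200, 3])
Y = tf.random.normal([200, 3])

model.fit(X, Y, epochs=5, batch_size=32)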

This works as expected: the loss decreases steadily, and I can inspect the fitted weights:

model.layers[1].get_weights()
Out[141]: 
[array([[-0.15133516, -0.14892222, -0.64674205],
        [ 0.34437487,  0.7822309 , -0.08931279],
        [-0.8330534 , -0.13827904, -0.23096593]], dtype=float32),
 array([-0.03069788, -0.03311999, -0.02603031], dtype=float32)]

However, when I use a Maxout layer instead, it does not work:

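import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import Model
from tensorflow.keras.layers import Input

d = 3

# Maxout is applied directly to the input, with no layer in front of it
x_in = Input(shape=(d,))
x_out = tfa.layers.Maxout(3)(x_in)
model = Model(inputs=x_in, outputs=x_out)

model.compile(optimizer='adam', loss='MeanAbsoluteError')

X = tf.random.normal([200, 3])
Y = tf.random.normal([200, 3])

model.fit(X, Y, epochs=5, batch_size=32)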

The loss stays constant across all epochs, and the layer reports no weights:

model.layers[1].get_weights()
Out[141]: []

Where is my mistake?

Answer:

Maxout only works in combination with another layer, for example a Dense layer. The Maxout layer itself has no trainable weights, as you can see in the model summary, so a model that contains nothing else has nothing to update and its loss cannot change. The only thing it has is the hyperparameter num_units:

import tensorflow as tf
import tensorflow_addons as tfa

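d = 3
x_in = tf.keras.layers.Input(shape=(d,))
# The Dense layer supplies the trainable weights (3*3 kernel + 3 biases = 12)
x = tf.keras.layers.Dense(3)(x_in)
# Maxout adds no parameters; it only takes maxima over the Dense outputs
x_out = tfa.layers.Maxout(3)(x)
model = tf.keras.Model(inputs=x_in, outputs=x_out)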

model.compile(optimizer='adam', loss='MeanAbsoluteError')

X=tf.random.normal([200,3])
Y=tf.random.normal([200,3])

model.fit(X, Y, epochs=5, batch_size=32)
print(model.summary())

Epoch 1/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0404
Epoch 2/5
7/7 [==============================] - 0s 3ms/step - loss: 1.0361
Epoch 3/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0322
Epoch 4/5
7/7 [==============================] - 0s 2ms/step - loss: 1.0283
Epoch 5/5
7/7 [==============================] - 0s 3ms/step - loss: 1.0244
Model: "model_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_6 (InputLayer)        [(None, 3)]               0         
                                                                 
 dense_5 (Dense)             (None, 3)                 12        
                                                                 
 maxout_4 (Maxout)           (None, 3)                 0         
                                                                 
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
None
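
To make the "no trainable weights" point concrete, here is a minimal plain-Python sketch of what a maxout step computes on a single feature vector. The helper name maxout is hypothetical, and the grouping (feature j is compared with features j + num_units, j + 2*num_units, ...) is my reading of how tfa.layers.Maxout reshapes its input, so treat it as an assumption rather than the library's exact code.

```python
def maxout(features, num_units):
    """Sketch of a maxout step on one feature vector (hypothetical helper).

    Assumption: the vector is split into k = len(features) // num_units
    consecutive chunks of length num_units, and output j is the maximum
    of element j across those chunks.
    """
    if len(features) % num_units != 0:
        raise ValueError("number of features must be a multiple of num_units")
    k = len(features) // num_units  # number of pieces per output unit
    return [
        max(features[i * num_units + j] for i in range(k))
        for j in range(num_units)
    ]


# num_units equal to the number of features: every group has one element,
# so the input comes back unchanged and there is nothing to learn.
print(maxout([1.0, -2.0, 3.0], 3))    # [1.0, -2.0, 3.0]

# Six features reduced to three units: each output is the max of two inputs.
print(maxout([1, 5, 2, 4, 0, 9], 3))  # [4, 5, 9]
```

Under that assumption, Maxout(3) applied to 3 features passes its input through unchanged, which would explain why the loss in the question never moves. The function contains only a max, no kernel or bias, which matches the "Param # 0" row in the summary.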

Maybe also take a look at the paper regarding Maxout:

The maxout model is simply a feed-forward architecture, such as a multilayer perceptron or deep convolutional neural network, that uses a new type of activation function: the maxout unit.
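
For reference, the maxout unit in that paper ("Maxout Networks", Goodfellow et al., 2013) is defined as follows, as far as I recall the notation, for an input x with d features, m output units and k pieces per unit:

```latex
h_i(x) = \max_{j \in [1, k]} z_{ij},
\qquad
z_{ij} = x^{\top} W_{\cdot i j} + b_{ij},
\qquad
W \in \mathbb{R}^{d \times m \times k},\quad b \in \mathbb{R}^{m \times k}
```

The weights W and b live entirely in the affine part z. In Keras terms that part is the Dense layer, and Maxout only takes the maximum over the k pieces. To get m outputs with k pieces each, the Dense layer would need m * k units, followed by Maxout(m).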