Gradient is None when using a custom layer in TensorFlow 2.0

I designed a custom layer for use like this:

class SquaredWeightedLayer(tf.keras.layers.Layer):
    def __init__(self, units=1):
        super(SquaredWeightedLayer, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units), initializer="random_normal", trainable=True)
        self.square_w = self.w ** 2
        self.b = self.add_weight(shape=(self.units,), initializer="zeros", trainable=True)
        super(SquaredWeightedLayer, self).build(input_shape)

    def call(self, inputs):
        return tf.sigmoid(tf.add(tf.matmul(inputs, self.square_w), self.b))

However, tape.gradient(loss, self.w) returns None, while tape.gradient(loss, self.square_w) returns a normal value. The loss is binary_crossentropy.

I would greatly appreciate any suggestions for fixing this. Thanks!

CodePudding user response:

The problem is that self.w ** 2 is computed once in the build function, outside any GradientTape context, so TensorFlow has no record that square_w was derived from w, and the gradient with respect to w comes back as None. You can fix it by moving the square operation into call, so that it is recorded on the tape each time the layer runs:

def call(self, inputs):
    return tf.sigmoid(tf.add(tf.matmul(inputs, self.w**2), self.b))
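
To see the difference, here is a minimal sketch that runs the corrected layer under a tf.GradientTape and then repeats the same computation with the square taken outside the tape. The input tensors x and y, their shapes, and the random labels are assumptions made up for illustration; they are not from the original question.

```python
import tensorflow as tf


class SquaredWeightedLayer(tf.keras.layers.Layer):
    def __init__(self, units=1):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        # The square is computed here, so it is recorded on the active tape.
        return tf.sigmoid(tf.matmul(inputs, self.w ** 2) + self.b)


# Hypothetical toy data: 8 samples, 3 features, binary labels.
x = tf.random.normal((8, 3))
y = tf.cast(tf.random.uniform((8, 1)) > 0.5, tf.float32)

layer = SquaredWeightedLayer(units=1)

# Case 1: square inside call, gradient flows back to w.
with tf.GradientTape() as tape:
    pred = layer(x)
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y, pred))
grad_w, grad_b = tape.gradient(loss, [layer.w, layer.b])

# Case 2: square computed before the tape starts recording, as in build.
square_w = layer.w ** 2
with tf.GradientTape() as tape:
    pred = tf.sigmoid(tf.matmul(x, square_w) + layer.b)
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y, pred))
grad_w_outside = tape.gradient(loss, layer.w)

print("inside call:", grad_w is not None)
print("outside tape:", grad_w_outside is not None)
```

In the first case the tape sees the read of w and the square, so the gradient is a tensor with the same shape as w. In the second case the tape only sees an already-computed tensor, so it cannot connect the loss to w.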