I was trying to understand how K.layers.Dropout might be implemented, since in the literature it is always described as random, independent sampling of a 0/1 mask for each element.
Since the literature is pretty clear to me, I moved on to coding it and stumbled upon an issue: because TF uses graphs, the batch size is not known when the graph is built. In particular:
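# Assumes K is the Keras module, e.g. `from tensorflow import keras as K`.
class CustomLayer(K.layers.Layer):
    def call(self, inputs):
        tf.print(inputs.shape)  # static shape, fixed when the graph is traced
        return inputs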
will indeed print None as the first dimension (assuming eager execution is turned off).
Having said that, how is TF able to sample an independent mask for each sample in each minibatch?
At the moment, my best guess is that it uses something like tf.vectorized_map, which would explain how it stays fast while drawing a random mask for each element in the minibatch.
CodePudding user response:
I traced the code for tf.keras.layers.Dropout.call (TensorFlow 2.9) in an effort to answer the following question:
how is TF able to sample an independent mask for each sample in each minibatch?
In summary, a noise tensor with the same shape as the input (including the batch dimension) is sampled from a uniform distribution on [0, 1). Because every element gets its own draw, each sample in the minibatch gets an independent mask. The noise tensor is then turned into a boolean keep mask by comparing it against the dropout rate. All of this assumes that noise_shape=None is kept when instantiating the Dropout layer.
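To make the idea concrete, here is a minimal sketch of the same masking scheme written in NumPy rather than TensorFlow. The names dropout_sketch and rng are hypothetical and not part of any TF API. The rescaling of surviving elements by 1 / (1 - rate) follows the documented behaviour of tf.nn.dropout; it is an assumption that this matches the traced code path exactly.

```python
import numpy as np


def dropout_sketch(x, rate, rng):
    """Hypothetical helper: inverted dropout with one independent draw per element."""
    # The noise has the full input shape, batch dimension included.
    random_tensor = rng.uniform(0.0, 1.0, size=x.shape)
    # Keep an element when its draw is >= rate, i.e. with probability 1 - rate.
    keep_mask = random_tensor >= rate
    # Rescale the survivors so the expected value matches the input.
    return x * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
x = np.ones((4, 5), dtype=np.float32)
y = dropout_sketch(x, rate=0.4, rng=rng)
print(y)  # every entry is either 0 or 1 / 0.6
```

No per-sample loop or vectorized map is needed here: a single sampling call over the whole batch already gives every element its own mask value.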
I have copied the relevant lines below.
noise_shape = _get_noise_shape(x, noise_shape)
# Sample a uniform distribution on [0.0, 1.0) and select values larger
# than or equal to `rate`.
random_tensor = uniform_sampler(shape=noise_shape, dtype=x_dtype)
keep_mask = random_tensor >= rate
ret = gen_math_ops.mul(ret, gen_math_ops.cast(keep_mask, x_dtype))
In the case that noise_shape=None in the Dropout layer, _get_noise_shape will return the shape of the input x. This is done with the graph-compatible method tf.shape, which evaluates the shape of the tensor at runtime.
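The difference between the static shape and tf.shape can be checked with a small experiment. This is only a sketch under stated assumptions: make_keep_mask and traced_static_shapes are hypothetical names, the rate of 0.5 is arbitrary, and the input signature with an unknown batch dimension is there to mimic what a Keras layer sees in graph mode.

```python
import tensorflow as tf

traced_static_shapes = []  # filled in by Python code that runs at trace time


@tf.function(input_signature=[tf.TensorSpec(shape=[None, 3], dtype=tf.float32)])
def make_keep_mask(x):
    # Static shape: known when the graph is traced, so the batch dimension is None.
    traced_static_shapes.append(x.shape.as_list())
    # Dynamic shape: a tensor whose value is computed each time the graph runs.
    dynamic_shape = tf.shape(x)
    random_tensor = tf.random.uniform(dynamic_shape, dtype=x.dtype)
    keep_mask = random_tensor >= 0.5
    return dynamic_shape, keep_mask


shape_a, mask_a = make_keep_mask(tf.zeros([2, 3]))
shape_b, mask_b = make_keep_mask(tf.zeros([7, 3]))
print(traced_static_shapes)              # static shape seen while tracing
print(shape_a.numpy(), shape_b.numpy())  # runtime shapes of the two batches
```

The same traced graph handles both batch sizes, because the shape passed to the sampler is itself a tensor that is resolved at run time.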
Here is an overview of the process for the TensorFlow / Keras v2 API.
- Instantiate tf.keras.layers.Dropout layer (with noise_shape=None).
- Call the Dropout.call instance method on an input x.
- Call self._random_generator.dropout, which calls BaseRandomLayer._random_generator.dropout, which calls tf.nn.experimental.stateless_dropout.
  - There is conditional logic in BaseRandomLayer._random_generator.dropout: the v2 API will use stateless_dropout and the v1 API will use tf.nn.dropout.
- Call private method _dropout, which then constructs the noise array to be the same shape as the input tensor x.
- Apply the noise array to the input, and return the result.
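The stateless op at the end of this chain can also be called directly, which makes it easy to experiment with. The snippet below is a minimal sketch: the seed values and the rate are arbitrary choices for illustration, and the Keras layer manages its own seeds internally rather than taking them from the user like this.

```python
import tensorflow as tf

x = tf.ones([4, 5])
seed = [1, 2]  # stateless ops take a shape-[2] integer seed; these values are arbitrary

# With noise_shape left at its default, the noise has the same shape as x,
# so every element of every sample gets its own mask value.
y1 = tf.nn.experimental.stateless_dropout(x, rate=0.4, seed=seed)
y2 = tf.nn.experimental.stateless_dropout(x, rate=0.4, seed=seed)

print(y1)
print(bool(tf.reduce_all(y1 == y2)))  # same seed, same mask
```

Because the op is stateless, the mask is a pure function of the seed, so a caller has to supply a fresh seed on each call to get a fresh mask.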
