Weighted Average: Custom layer weights don't change in TensorFlow 2.2.0


Question

I am trying to implement a weighted average between two tensors in TensorFlow, where the weight can be learned automatically. Following the advice on how to design a custom layer for a Keras model, my attempt is the following:

class WeightedAverage(tf.keras.layers.Layer):
    def __init__(self):
        super(WeightedAverage, self).__init__()

        init_value = tf.keras.initializers.Constant(value=0.5)

        self.w = self.add_weight(name="weight",
                                 initializer=init_value,
                                 trainable=True)

    def call(self, inputs):
        return tf.keras.layers.average([inputs[0] * self.w,
                                        inputs[1] * (1 - self.w)])

Now the problem is that after training the model, saving, and loading it again, the value for w remains 0.5. Is it possible that the parameter does not receive any gradient updates? When printing the trainable variables of my model, the parameter is listed and should therefore be included when calling model.fit.


Answer

Here is one way to implement a weighted average between two tensors where the weights are learned automatically. I also introduce the constraint that the weights must sum to 1. To guarantee this, we simply apply a softmax to the weights. In the dummy example below I use this method to combine the outputs of two fully-connected branches, but the same layer can be used in any other scenario where the inputs have matching shapes.
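To make the softmax constraint concrete, here is a minimal NumPy sketch of the same idea outside of Keras. The arrays `a` and `b`, the raw weight values, and the helper `softmax` are hypothetical and chosen only for illustration; they are not part of the model below.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability; the result is unchanged.
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Two hypothetical branch outputs of shape (n_batch, n_feat).
a = rng.uniform(size=(4, 3))
b = rng.uniform(size=(4, 3))

# Unconstrained raw weights, one per input, shaped to broadcast
# against a tensor of shape (n_batch, n_feat, n_inputs).
raw_w = np.array([0.3, -1.2]).reshape(1, 1, 2)
weights = softmax(raw_w, axis=-1)  # positive, sums to 1 on the last axis

stacked = np.stack([a, b], axis=-1)            # (n_batch, n_feat, n_inputs)
combined = np.sum(weights * stacked, axis=-1)  # (n_batch, n_feat)

print(weights.ravel(), weights.sum())
```

Because the weights are positive and sum to 1, every entry of `combined` lies between the corresponding entries of `a` and `b`, whatever values the raw weights take during training.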

Here is the custom layer:

import tensorflow as tf
from tensorflow.keras.layers import Layer

class WeightedAverage(Layer):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def build(self, input_shape):
        # input_shape is a list with one shape per input tensor,
        # so len(input_shape) is the number of inputs to average
        self.W = self.add_weight(
                    shape=(1, 1, len(input_shape)),
                    initializer='uniform',
                    dtype=tf.float32,
                    trainable=True)

    def call(self, inputs):
        # inputs is a list of tensors of shape [(n_batch, n_feat), ..., (n_batch, n_feat)]
        # stack them on a new last axis (same result as expand_dims + concatenate)
        inputs = tf.stack(inputs, axis=-1) # (n_batch, n_feat, n_inputs)
        weights = tf.nn.softmax(self.W, axis=-1) # (1, 1, n_inputs)
        # weights sum up to one on last dim

        return tf.reduce_sum(weights * inputs, axis=-1) # (n_batch, n_feat)

Here is the full example on a regression problem:

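import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# two fully-connected branches whose outputs are combined by WeightedAverage
inp1 = Input((100,))
inp2 = Input((100,))
x1 = Dense(32, activation='relu')(inp1)
x2 = Dense(32, activation='relu')(inp2)
W_Avg = WeightedAverage()([x1,x2])
out = Dense(1)(W_Avg)

m = Model([inp1,inp2], out)
m.compile('adam','mse')

# random dummy data
n_sample = 1000
X1 = np.random.uniform(0,1, (n_sample,100))
X2 = np.random.uniform(0,1, (n_sample,100))
y = np.random.uniform(0,1, (n_sample,1))

m.fit([X1,X2], y, epochs=10)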

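Finally, you can inspect the learned, normalized weights like this:

# index -3 is the WeightedAverage weight; the last two entries of
# get_weights() are the kernel and bias of the final Dense layer
tf.nn.softmax(m.get_weights()[-3]).numpy()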
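On the question of whether the parameter receives gradient updates at all, one way to check is to compute the gradient directly with `tf.GradientTape`. The following is only a sketch: the tensors `x1`, `x2` and `target` are random stand-ins for the branch outputs and labels, and `W` is a plain `tf.Variable` that mimics the layer's weight rather than the weight of the trained model.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical stand-ins for the two branch outputs and the target,
# each of shape (n_batch, n_feat).
x1 = tf.random.uniform((8, 4))
x2 = tf.random.uniform((8, 4))
target = tf.random.uniform((8, 4))

# Raw weights with the same shape the layer uses: (1, 1, n_inputs).
W = tf.Variable(tf.random.uniform((1, 1, 2), -0.05, 0.05), name="W")

with tf.GradientTape() as tape:
    stacked = tf.stack([x1, x2], axis=-1)                 # (n_batch, n_feat, n_inputs)
    weights = tf.nn.softmax(W, axis=-1)                   # sums to 1 on the last axis
    combined = tf.reduce_sum(weights * stacked, axis=-1)  # (n_batch, n_feat)
    loss = tf.reduce_mean(tf.square(combined - target))

# A result of None would mean W is disconnected from the loss.
grad = tape.gradient(loss, W)
print("gradient w.r.t. W:", grad.numpy().ravel())
```

If the gradient is not `None` and not all zeros, the weight is connected to the loss and an optimizer step will move it. The two gradient entries sum to approximately zero, which is expected because softmax is unchanged when the same constant is added to all raw weights.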