Question
I am trying to implement a weighted average between two tensors in TensorFlow, where the weight can be learned automatically. Following the advice on how to design a custom layer for a keras model here, my attempt is the following:
class WeightedAverage(tf.keras.layers.Layer):
def __init__(self):
super(WeightedAverage, self).__init__()
init_value = tf.keras.initializers.Constant(value=0.5)
self.w = self.add_weight(name="weight",
initializer=init_value,
trainable=True)
def call(self, inputs):
return tf.keras.layers.average([inputs[0] * self.w,
inputs[1] * (1 - self.w)])
Now the problem is that after training the model, saving, and loading it
again, the value for w
remains 0.5. Is it possible that the parameter does
not receive any gradient updates? When printing the trainable variables of my
model, the parameter is listed and should therefore be included when calling
model.fit
.
Answer
Here is a possibility to implement a weighted average between two tensors, where the weight can be learned automatically. I also introduce the constrain that the weights must sum up to 1. To grant this we have to simply apply a softmax on our weights. In the dummy example below I combine with this method the output of two fully-connected branches but you can manage it in every other scenario
here the custom layer:
class WeightedAverage(Layer):
def __init__(self):
super(WeightedAverage, self).__init__()
def build(self, input_shape):
self.W = self.add_weight(
shape=(1,1,len(input_shape)),
initializer='uniform',
dtype=tf.float32,
trainable=True)
def call(self, inputs):
# inputs is a list of tensor of shape [(n_batch, n_feat), ..., (n_batch, n_feat)]
# expand last dim of each input passed [(n_batch, n_feat, 1), ..., (n_batch, n_feat, 1)]
inputs = [tf.expand_dims(i, -1) for i in inputs]
inputs = Concatenate(axis=-1)(inputs) # (n_batch, n_feat, n_inputs)
weights = tf.nn.softmax(self.W, axis=-1) # (1,1,n_inputs)
# weights sum up to one on last dim
return tf.reduce_sum(weights*inputs, axis=-1) # (n_batch, n_feat)
here the full example in a regression problem:
inp1 = Input((100,))
inp2 = Input((100,))
x1 = Dense(32, activation='relu')(inp1)
x2 = Dense(32, activation='relu')(inp2)
W_Avg = WeightedAverage()([x1,x2])
out = Dense(1)(W_Avg)
m = Model([inp1,inp2], out)
m.compile('adam','mse')
n_sample = 1000
X1 = np.random.uniform(0,1, (n_sample,100))
X2 = np.random.uniform(0,1, (n_sample,100))
y = np.random.uniform(0,1, (n_sample,1))
m.fit([X1,X2], y, epochs=10)
in the end, you can also visualize the value of the weights in this way:
tf.nn.softmax(m.get_weights()[-3]).numpy()