tl;dr: What is the most efficient way to dynamically choose some entries of a tensor?
I am trying to implement a syntactic GCN in TensorFlow. Basically, I need a different weight matrix for every label (let's ignore biases for this question) and, at each run, choose the relevant entries to use; those are selected by a sparse matrix (for each entry there is at most one label in one direction, and mostly no edge at all).
More concretely, when I have a sparse matrix of labeled edges (zero-one), is it better to use it as a mask, in a sparse-dense tensor multiplication, or maybe just in normal multiplication (I guess not the latter, but for simplicity I use it in the example)?
example:
# Inside a custom layer's call(); these are the internal TF modules that Keras layers use
from tensorflow.python.framework import ops
from tensorflow.python.ops import math_ops, standard_ops
from tensorflow.python.ops.array_ops import expand_dims, tile

units = 6  # output size
x = ops.convert_to_tensor(inputs[0], dtype=self.dtype)
labeled_edges = ops.convert_to_tensor(inputs[1], dtype=self.dtype)
edges_shape = labeled_edges.get_shape().as_list()
# Add a size-`units` axis so the 0/1 edge tensor lines up with the kernel
labeled_edges = expand_dims(labeled_edges, -2)
labeled_edges = tile(
    labeled_edges, [1] * (len(edges_shape) - 1) + [units, 1])
graph_kernel = math_ops.multiply(self.kernel, labeled_edges)  # here is the question basically
outputs = standard_ops.tensordot(x, graph_kernel, [[1], [0]])
outputs = math_ops.reduce_sum(outputs, [-1])
Answer
To efficiently choose some entries of a tensor based on a sparse matrix, you should focus on using masking or sparse-dense matrix multiplication. Here's a breakdown of the different approaches and their pros and cons:
1. Using a Mask
This is a common and efficient approach when working with sparse data. You multiply the kernel by a mask (a zero-one tensor) that keeps only the relevant entries. The mask can be generated from your sparse adjacency matrix, which in your case is already a binary (zero-one) matrix.
Here’s how you can implement it:
# Assuming labeled_edges is a matrix of 0s and 1s (e.g., shape [batch_size, n_edges])
labeled_edges = ops.convert_to_tensor(inputs[1], dtype=self.dtype)
# The binary edge matrix itself is the mask: 1 where an entry should be kept, 0 elsewhere
# (expand/tile or broadcast it to the kernel's shape first, as in the question's code)
mask = labeled_edges
# Apply the mask to the kernel (element-wise multiplication)
graph_kernel = math_ops.multiply(self.kernel, mask)
# Now do the normal contraction with the masked kernel
outputs = standard_ops.tensordot(x, graph_kernel, [[1], [0]])
outputs = math_ops.reduce_sum(outputs, [-1])
Pros:
- Simplicity: You just multiply your kernel with a binary mask (the labeled edges) and perform the regular operations.
- Efficient in practice: the element-wise multiply and the dense contraction that follows run on well-optimized dense kernels, so this is usually fast on GPUs, even though the zero entries are still computed.
Cons:
- If your sparse matrix has a very low density, then the mask matrix could still result in a lot of zero multiplications, which might be computationally wasteful. In such cases, you can consider other options like sparse tensor operations.
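For a self-contained picture of the masking idea, here is a minimal sketch with made-up shapes (a per-label kernel of shape [n_labels, input_dim, units] and a one-hot label indicator); broadcasting the 0/1 tensor over the kernel avoids the explicit tile from the question:
import tensorflow as tf

# Hypothetical shapes, for illustration only
n_labels, input_dim, units = 3, 5, 6
kernel = tf.random.normal([n_labels, input_dim, units])  # one weight matrix per label
label_mask = tf.constant([0.0, 1.0, 0.0])                # 0/1 indicator: only label 1 is active
x = tf.random.normal([1, input_dim])                     # a single input vector
# Broadcast the mask over the kernel instead of tiling it explicitly,
# zeroing out the weight matrices of inactive labels
graph_kernel = kernel * label_mask[:, None, None]        # [n_labels, input_dim, units]
# Sum the surviving per-label kernels and contract with the input
outputs = tf.tensordot(x, tf.reduce_sum(graph_kernel, axis=0), axes=[[1], [0]])
print(outputs.shape)  # (1, 6)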
2. Sparse-Dense Matrix Multiplication
If your edge matrix (labeled_edges) is sparse, it may be more efficient to use sparse tensor multiplication directly. TensorFlow has built-in support for sparse tensors (tf.sparse), and sparse operations are more efficient for matrices with a large number of zero entries.
Here’s how to implement sparse-dense matrix multiplication:
# Convert the dense 0/1 labeled_edges matrix to a tf.SparseTensor
sparse_labeled_edges = tf.sparse.from_dense(labeled_edges)
# Sparse-dense matmul (efficient for large sparse matrices). Note that this contracts the
# edge matrix against the kernel rather than masking it element-wise, and it expects the
# operands to be compatible 2-D matrices.
graph_kernel = tf.sparse.sparse_dense_matmul(sparse_labeled_edges, self.kernel)
# Now do the tensordot with the resulting graph_kernel
outputs = standard_ops.tensordot(x, graph_kernel, [[1], [0]])
outputs = math_ops.reduce_sum(outputs, [-1])
Pros:
- Memory Efficiency: Sparse tensors only store non-zero entries and their indices, which can save memory for large sparse matrices.
- Compute Efficiency: TensorFlow can optimize sparse matrix operations, resulting in more efficient computation, especially when the sparse matrix has low density (most of the entries are zero).
Cons:
- Complexity: Sparse operations are less straightforward and might require additional transformations (like from_dense), which could add some overhead.
- Edge cases: You might need to handle cases where the sparse tensor is too dense (e.g., you might end up in a dense matrix multiplication scenario, which is less efficient).
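As a quick sanity check of the sparse-dense route, here is a toy example (shapes made up: a 0/1 adjacency matrix of shape [n_nodes, n_nodes] and dense features of shape [n_nodes, units]); both paths compute the same product, but the sparse one only stores and touches the non-zero entries:
import tensorflow as tf

# Toy 0/1 adjacency matrix and dense features (shapes are made up for the demo)
adj = tf.constant([[0., 1., 0.],
                   [1., 0., 0.],
                   [0., 0., 0.]])
features = tf.random.normal([3, 6])
sparse_adj = tf.sparse.from_dense(adj)                    # keeps only the non-zero entries
sparse_out = tf.sparse.sparse_dense_matmul(sparse_adj, features)
dense_out = tf.matmul(adj, features)
# The two results agree; the sparse version skips the zeros
print(tf.reduce_max(tf.abs(sparse_out - dense_out)).numpy())  # ~0.0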
3. Normal Multiplication (Not Recommended)
Using normal dense multiplication (math_ops.multiply(self.kernel, labeled_edges)) computes the product for every entry, including the zero entries of the sparse matrix. This is inefficient because you do unnecessary work on the zero entries, which defeats the purpose of having a sparse matrix.
You should avoid using normal multiplication unless the matrix is dense (i.e., there are few zeros).
Conclusion:
- If you are working with a sparse adjacency matrix (labeled_edges) and you want to avoid unnecessary computations, sparse-dense matrix multiplication (tf.sparse.sparse_dense_matmul) is typically the most efficient approach.
- If the matrix is not very sparse, masking with math_ops.multiply will work fine and is simple to implement.
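If you are unsure which regime you are in, you can measure the density of the edge tensor and pick the strategy from that. A minimal eager-mode sketch (the helper name and the 0.1 threshold are made up and should be tuned for your data):
import tensorflow as tf

def choose_strategy(labeled_edges, threshold=0.1):
    # Fraction of non-zero entries in the 0/1 edge tensor
    nonzero = tf.math.count_nonzero(labeled_edges).numpy()
    density = nonzero / tf.size(labeled_edges).numpy()
    return "sparse_dense_matmul" if density < threshold else "dense_mask"

edges = tf.constant([[0., 1., 0.], [0., 0., 0.], [1., 0., 0.]])
print(choose_strategy(edges))  # density is ~0.22 here, so "dense_mask"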