I am looking for an efficient way to multiply a dense matrix by a sparse vector, `Av`, where `A` is of size (M x N) and `v` is (N x 1). The vector `v` is a `scipy.sparse.csc_matrix`.
I have two methods I use at the moment:
In method 1, I pick off the non-zero values in `v`, say `v_i`, and element-wise multiply `v_i` with the corresponding column of `A`, then sum up these columns. So if `y = Av`, then `y = A[:, 0]*v_0 + ... + A[:, N-1]*v_{N-1}`, summing only over the non-zero `i`.
```python
import numpy as np

def dense_dot_sparse(dense_matrix, sparse_column):
    prod = np.zeros(dense_matrix.shape[0])
    rows, cols = sparse_column.nonzero()  # indices of the non-zero entries
    for r, c in zip(rows, cols):
        # for an (N x 1) column vector, the non-zero row index r
        # selects the matching column of the dense matrix
        prod = prod + dense_matrix[:, r] * sparse_column[r, c]
    return prod
```
In method 2, I perform the multiplication by simply converting the sparse vector with `.todense()` and using `np.dot()`.
```python
def dense_dot_sparse2(dense_matrix, sparse_column):
    # densify the (N x 1) column and fall back to a regular dense dot product
    return np.dot(dense_matrix, sparse_column.todense())
```
The typical size of `A` is (512 x 2048) and the number of non-zeros in `v` varies between 1 and 200. I choose which method to employ based on the sparsity of `v`. When `v` has ~200 non-zeros, method 1 takes ~45ms and method 2 takes ~5ms. But when `v` is very sparse, with ~1 non-zero, method 1 takes ~1ms whereas method 2 still takes 5ms. Checking the sparsity of `v` (via `.nnz`) adds nearly another 0.2ms.
I have to perform about 1500 of these multiplications (after splitting up my data and multiprocessing), so the time adds up.
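For reference, the two methods can be compared with a small `timeit` harness like the sketch below (`compare_methods` is just an illustrative name; the absolute timings will of course vary by machine):

```python
import timeit

# Hypothetical helper: average per-call time of both methods
# for a given dense matrix and sparse column.
def compare_methods(dense_matrix, sparse_column, repeats=100):
    t1 = timeit.timeit(lambda: dense_dot_sparse(dense_matrix, sparse_column),
                       number=repeats) / repeats
    t2 = timeit.timeit(lambda: dense_dot_sparse2(dense_matrix, sparse_column),
                       number=repeats) / repeats
    print("loop-based: %.3f ms, densify+dot: %.3f ms" % (t1 * 1e3, t2 * 1e3))
```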
[EDIT: Adding a simple representative example
```python
import numpy as np
from scipy.sparse import csc_matrix

rows = 512
cols = 2048
sparsity = 0.001  # very sparse: 0.001 for ~2 non-zeros, moderately sparse: 0.1 for ~200 non-zeros

big_matrix = np.random.rand(rows, cols)  # use as dense matrix
col = np.random.rand(cols, 1)
col[col >= sparsity] = 0.0               # zero out all but ~sparsity fraction of the entries
sparse_col = csc_matrix(col)             # use as (cols x 1) sparse vector
print(sparse_col.nnz)
```
END EDIT]
I am looking for a single implementation that is fast for both very sparse and moderately sparse v.
Answer
You're facing a typical trade-off when working with sparse vectors and dense matrices. The performance difference between your two methods arises because method 1 iterates over the non-zero elements of the sparse vector (efficient when there are very few of them, but each iteration pays Python-level overhead), while method 2 converts the sparse vector to a dense format (a roughly constant cost that dominates when the vector is very sparse).
For a single implementation that works efficiently across both very sparse and moderately sparse vectors, we need to avoid the dense conversion when the vector has only a handful of non-zeros, and avoid the slow Python loop when it has many.
Strategy for Optimization
You can combine the best parts of both approaches:
- For very sparse vectors, use the first method (looping over the non-zeros), as it avoids unnecessary operations on the zero elements.
- For moderately sparse vectors, convert the sparse vector to dense and use `np.dot()`; once the number of non-zero elements is large enough, the per-element overhead of the Python loop outweighs the cost of the conversion.
Optimized Implementation
A possible solution is to check the number of non-zero elements (`nnz`) of the sparse vector and decide which approach to use based on that. Reading `nnz` costs far less than the millisecond-scale gap between the two methods, so the dispatch itself is cheap.
```python
import numpy as np
from scipy.sparse import csc_matrix

def optimized_dense_dot_sparse(dense_matrix, sparse_column):
    # Dispatch on the number of non-zero entries in the sparse column
    nnz = sparse_column.nnz
    if nnz <= 10:  # threshold; adjust based on profiling results
        # Very sparse: accumulate the few relevant columns of the dense matrix.
        # For a CSC matrix with a single column, .indices holds the row indices
        # of the non-zeros and .data holds the corresponding values.
        prod = np.zeros(dense_matrix.shape[0])
        for r, v in zip(sparse_column.indices, sparse_column.data):
            prod += dense_matrix[:, r] * v
        return prod
    else:
        # Moderately sparse: densify and use a regular dense dot product
        return np.dot(dense_matrix, sparse_column.toarray().ravel())
```
```python
# Example usage:
rows = 512
cols = 2048
sparsity = 0.001  # very sparse: ~2 non-zeros on average

big_matrix = np.random.rand(rows, cols)          # use as dense matrix
col = np.random.rand(cols, 1)
col[np.random.rand(cols, 1) >= sparsity] = 0.0   # keep each entry with probability `sparsity`
sparse_col = csc_matrix(col)                     # (cols x 1) sparse column vector

result = optimized_dense_dot_sparse(big_matrix, sparse_col)
print(result)
```
Explanation
- For very sparse vectors (with a low `nnz`), the function loops over the non-zero entries (method 1), multiplying the corresponding columns of the dense matrix by the values in the sparse vector. This is efficient because we avoid multiplying by zero entries.
- For moderately sparse vectors (with a higher `nnz`), we use `np.dot()` after converting the sparse vector to a dense one. This is faster because the conversion cost is amortized once the number of non-zero elements is large enough to warrant the dense computation.
- The threshold (`nnz <= 10`) can be adjusted based on profiling; it depends on the relative cost of looping over sparse entries versus converting to dense. You can experiment with it for your specific use case and matrix sizes, as in the sketch below.
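One rough way to pick the threshold is to time both branches over a range of `nnz` values and look for the crossover point. The sketch below does this (`find_threshold` is an illustrative name, not part of the solution above, and the exact numbers will depend on your hardware):

```python
import timeit

import numpy as np
from scipy.sparse import csc_matrix

def find_threshold(rows=512, cols=2048, nnz_values=(1, 2, 5, 10, 20, 50, 100, 200)):
    """Time the loop-based and densify-based branches for several nnz values."""
    dense_matrix = np.random.rand(rows, cols)
    for nnz in nnz_values:
        # build a (cols x 1) column with exactly `nnz` non-zero entries
        col = np.zeros((cols, 1))
        col[np.random.choice(cols, size=nnz, replace=False), 0] = np.random.rand(nnz)
        sparse_column = csc_matrix(col)
        loop = timeit.timeit(
            lambda: sum(dense_matrix[:, r] * v
                        for r, v in zip(sparse_column.indices, sparse_column.data)),
            number=50) / 50
        dense = timeit.timeit(
            lambda: np.dot(dense_matrix, sparse_column.toarray().ravel()),
            number=50) / 50
        print("nnz=%4d  loop: %.3f ms  dense: %.3f ms" % (nnz, loop * 1e3, dense * 1e3))
```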
Performance Considerations
- Small `nnz` (few non-zero elements): the function avoids the cost of converting the sparse vector to dense and computes the result by looping over the non-zero entries only.
- Large `nnz` (many non-zero elements): the function avoids the overhead of a Python loop and computes the result directly with `np.dot()` on the densified vector.
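It is also worth sanity-checking that both dispatch branches agree with a plain dense reference. A minimal check, reusing `big_matrix` and `cols` from the example above:

```python
# Sanity check: both branches should match a plain dense dot product.
for sparsity in (0.001, 0.1):  # exercises the loop branch and the dense branch
    col = np.random.rand(cols, 1)
    col[np.random.rand(cols, 1) >= sparsity] = 0.0
    sparse_col = csc_matrix(col)
    reference = np.dot(big_matrix, col.ravel())
    assert np.allclose(optimized_dense_dot_sparse(big_matrix, sparse_col), reference)
```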
Conclusion
This hybrid approach dynamically chooses the most efficient method based on the sparsity of the vector, giving good performance across the full range of sparsities you describe. You can tune the threshold to match the performance characteristics of your specific hardware and use case.