I have a nD array, say of dimensions: (144, 522720) and I need to compute its FFT.
PyFFTW
seems slower than numpy
and scipy
, that it is NOT expected.
Am I doing something obviously wrong?
Below is my code
import numpy
import scipy
import pyfftw
import time
n1 = 144
n2 = 522720
loops = 2
pyfftw.config.NUM_THREADS = 4
pyfftw.config.PLANNER_EFFORT = 'FFTW_ESTIMATE'
# pyfftw.config.PLANNER_EFFORT = 'FFTW_MEASURE'
Q_1 = pyfftw.empty_aligned([n1, n2], dtype='float64')
Q_2 = pyfftw.empty_aligned([n1, n2], dtype='complex_')
Q_ref = pyfftw.empty_aligned([n1, n2], dtype='complex_')
# repeat a few times to see if pyfft planner helps
for i in range(0,loops):
Q_1 = numpy.random.rand(n1,n2)
s1 = time.time()
Q_ref = numpy.fft.fft(Q_1, axis=0)
print('NUMPY - elapsed time: ', time.time() - s1, 's.')
s1 = time.time()
Q_2 = scipy.fft.fft(Q_1, axis=0)
print('SCIPY - elapsed time: ', time.time() - s1, 's.')
print('Equal = ', numpy.allclose(Q_2, Q_ref))
s1 = time.time()
Q_2 = pyfftw.interfaces.numpy_fft.fft(Q_1, axis=0)
print('PYFFTW NUMPY - elapsed time = ', time.time() - s1, 's.')
print('Equal = ', numpy.allclose(Q_2, Q_ref))
s1 = time.time()
Q_2 = pyfftw.interfaces.scipy_fftpack.fft(Q_1, axis=0)
print('PYFFTW SCIPY - elapsed time = ', time.time() - s1, 's.')
print('Equal = ', numpy.allclose(Q_2, Q_ref))
s1 = time.time()
fft_object = pyfftw.builders.fft(Q_1, axis=0)
Q_2 = fft_object()
print('FFTW PURE Elapsed time = ', time.time() - s1, 's')
print('Equal = ', numpy.allclose(Q_2, Q_ref))
Answer
The fact that PyFFTW
is slower than numpy
or scipy
in your code is unusual, as PyFFTW
typically offers better performance for larger arrays due to its use of FFTW (Fastest Fourier Transform in the West) which is highly optimized for various hardware architectures. However, there are several possible factors that could lead to suboptimal performance. Let's go through your code and examine potential issues and optimizations.
1. PyFFTW Configuration:
-
You have set
pyfftw.config.NUM_THREADS = 4
, but it's worth noting that the performance benefit of multithreading inPyFFTW
depends on the hardware (e.g., number of available CPU cores) and the size of the data. If the data is relatively small or your CPU has fewer than 4 cores, multithreading may not result in a performance boost and could even cause overhead. -
Try running the benchmark with
NUM_THREADS = 1
to see if threading is the issue. FFT is often more efficient in single-threaded mode for certain array sizes.
pyfftw.config.NUM_THREADS = 1
2. FFTW Planner Effort:
- You are using the default
FFTW_ESTIMATE
planner effort. While this is faster in terms of initial planning, it might not result in the best performance, especially for larger datasets. For larger arrays, usingFFTW_MEASURE
could yield better performance as it will measure the best strategy for the FFT computation.
pyfftw.config.PLANNER_EFFORT = 'FFTW_MEASURE'
- You may also experiment with
FFTW_PATIENT
, which can sometimes give better results thanFFTW_MEASURE
for specific cases, but it will be slower during the planning stage.
3. Array Alignment:
-
PyFFTW
benefits from aligned memory for performance. You are usingpyfftw.empty_aligned()
to allocate your arrays, which is good. However, ensure that the arrays are actually aligned to the memory boundaries expected by FFTW. Sometimes, just callingpyfftw.empty_aligned()
without setting the alignment strategy may not provide the expected performance. -
If performance is still not satisfactory, try using
pyfftw.interfaces.cache.enable()
and check if this improves performance by caching FFTW plans.
4. Comparing FFT Implementations:
-
You are comparing
PyFFTW
againstnumpy.fft
andscipy.fft
in your code. Note thatscipy.fft
internally usesnumpy.fft
, so the performance difference you're seeing betweenscipy
andnumpy
might just be due to the overhead of callingscipy
's wrapper. -
It would be good to test
numpy.fft
andscipy.fft
separately, ensuring they are using the same backend (i.e., they both useFFTW
in the case ofscipy.fft
), and that they aren't doing anything extra that might introduce overhead.
5. Use pyfftw.builders
:
-
When you use
pyfftw.interfaces.numpy_fft.fft
, it works similarly tonumpy.fft.fft
, but it doesn't take full advantage ofpyfftw.builders
, which is optimized for repeated FFT calls. -
Using
pyfftw.builders.fft
is the recommended approach for better performance in repetitive FFT computations. Thepyfftw.builders
interface allows for the pre-planning of FFT computations, which is more efficient when you're running multiple FFTs on the same size data. -
Correct Use of
pyfftw.builders
:fft_object = pyfftw.builders.fft(Q_1, axis=0) Q_2 = fft_object()
This approach should yield the best performance, especially for repeated FFT calls.
6. Benchmarking Across Different Libraries:
-
It's also important to ensure that your benchmark is consistent. The following code structure can help you isolate the results more effectively by ensuring the data is the same across all libraries:
import numpy import scipy import pyfftw import time n1 = 144 n2 = 522720 loops = 2 pyfftw.config.NUM_THREADS = 1 pyfftw.config.PLANNER_EFFORT = 'FFTW_MEASURE' # Initialize the data Q_1 = numpy.random.rand(n1, n2) # Create FFT objects once for PyFFTW fft_numpy = pyfftw.interfaces.numpy_fft.fft fft_scipy = pyfftw.interfaces.scipy_fftpack.fft fft_fftw = pyfftw.builders.fft(Q_1, axis=0) # Benchmark for i in range(loops): # Numpy FFT start_time = time.time() Q_ref = numpy.fft.fft(Q_1, axis=0) print(f'NUMPY - elapsed time: {time.time() - start_time:.6f}s.') # Scipy FFT start_time = time.time() Q_2 = scipy.fft.fft(Q_1, axis=0) print(f'SCIPY - elapsed time: {time.time() - start_time:.6f}s.') print(f'Equal = {numpy.allclose(Q_2, Q_ref)}') # PyFFTW with numpy interface start_time = time.time() Q_2 = fft_numpy(Q_1, axis=0) print(f'PYFFTW NUMPY - elapsed time: {time.time() - start_time:.6f}s.') print(f'Equal = {numpy.allclose(Q_2, Q_ref)}') # PyFFTW with scipy interface start_time = time.time() Q_2 = fft_scipy(Q_1, axis=0) print(f'PYFFTW SCIPY - elapsed time: {time.time() - start_time:.6f}s.') print(f'Equal = {numpy.allclose(Q_2, Q_ref)}') # Pure PyFFTW FFTW builder start_time = time.time() Q_2 = fft_fftw() print(f'FFTW PURE - elapsed time: {time.time() - start_time:.6f}s.') print(f'Equal = {numpy.allclose(Q_2, Q_ref)}')
-
Key Adjustments:
- The
pyfftw.builders.fft
object is created before the loop to avoid recreating it on each iteration. - The results are consistently compared using
numpy.allclose
to ensure the results match across all libraries. - The
NUM_THREADS = 1
andFFTW_MEASURE
are used to ensure optimal single-threaded performance with better planning for large arrays.
- The
7. Check Hardware and Environment:
- Make sure that your hardware has enough resources (CPU cores, memory, etc.) to benefit from the optimizations offered by
PyFFTW
. - If you're running on a multi-core machine, consider checking CPU utilization to verify if
PyFFTW
is utilizing multiple cores. For smaller arrays, threading may not provide a significant benefit.
Summary of Suggestions:
- Use
pyfftw.config.NUM_THREADS = 1
and check performance again. - Set
pyfftw.config.PLANNER_EFFORT = 'FFTW_MEASURE'
. - Use
pyfftw.builders.fft
for repeated FFTs. - Ensure that you are comparing FFTs across different libraries correctly and consistently.
- Check hardware and multithreading settings.
By adjusting these configurations, you should see an improvement in the performance of PyFFTW
relative to numpy
and scipy
.