TL;DR Question
Regarding NumPy arrays that are neither C- nor F-contiguous (the array's `c_contiguous` and `f_contiguous` flags are both `False`):
- Can an array really be neither C- nor F-contiguous? Or do `False` flags just mean NumPy can't figure out the correct contiguity?
- What are the performance implications of such arrays? Are there any optimizations we miss by staying in this state?
An array for example:

```python
import numpy as np

arr = np.random.randint(0, 255, (1000, 1000, 3), dtype='uint8')
arr = arr[:, :, ::-1]
assert arr.flags.c_contiguous is False
assert arr.flags.f_contiguous is False
```
Background
I am trying to optimize a simple code block which is called many times during a program. This code block is responsible for loading a PIL image, converting it to a NumPy array, reversing its channels, and returning the result. Something like this:

```python
import numpy as np
from PIL import Image

def load_image(path):
    arr = np.asarray(Image.open(path).convert('RGB'))
    return arr[:, :, ::-1].copy()
```
The original `copy()` call was there to force the return value to be a C-ordered array. However, I was wondering if there's a way to achieve the same effect without copying the array every time, as that sounds very expensive.

I tried replacing the `copy()` call with `np.ascontiguousarray()`, but benchmarks showed they perform the same, so I guess it also copies behind the scenes.
I decided to apply this solution in the end:

```python
import numpy as np
from PIL import Image

def load_image(path):
    arr = np.ascontiguousarray(Image.open(path).convert('RGB'))
    return arr[:, :, ::-1]
```

Here, I convert the image to a C-ordered array first, which could incur a copy behind the scenes, but in fact it doesn't: in benchmarks this function is 3-4x faster than the previous one. However, I want to be sure I'm not ruling out any future optimizations by returning an array which is neither C- nor F-ordered.
Answer
TL;DR Answer:
- **Can an array be neither C nor F contiguous?** Yes. This happens when the strides of the array don't match the pattern required for either C or Fortran order. Slicing with a negative step (e.g., `arr[:, :, ::-1]`) produces exactly such an array: neither contiguity flag is set to `True`.
- **What are the performance implications?** Arrays that are neither C- nor F-contiguous can still be used normally, but they are slower for operations that assume a contiguous memory layout. NumPy and many other libraries (such as `PIL` or `TensorFlow`) expect C-contiguous arrays for efficient memory access. Non-contiguity can slow element-wise operations, force copies in operations like `reshape` or `ravel`, and prevent certain optimizations (e.g., SIMD instructions or cache-friendly access patterns).
Explanation of the Behavior:
When you perform `arr[:, :, ::-1]`, you're reversing the order of the channels (the last axis of the array). NumPy implements this as a view with a negative stride on that axis, so no data is moved; but the resulting memory layout matches neither the C-contiguous nor the Fortran-contiguous pattern, so both flags end up `False`.
How this can happen:
- C-contiguous arrays are stored in memory row-wise (the last axis is contiguous in memory).
- F-contiguous arrays are stored column-wise (the first axis is contiguous in memory).
- When you reverse an axis, as in `arr[:, :, ::-1]`, you get a view on the data that maintains neither the C- nor the F-contiguous property.

This is because the strides (which tell NumPy how to step through memory) are modified in such a way that neither the C nor the F memory access pattern holds. In this case, NumPy sets both `c_contiguous` and `f_contiguous` to `False`.
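The strides make this concrete: reversing the last axis simply negates that axis's stride. A minimal sketch, using the same `(1000, 1000, 3)` `uint8` shape as the question:

```python
import numpy as np

arr = np.zeros((1000, 1000, 3), dtype='uint8')
rev = arr[:, :, ::-1]

print(arr.strides)  # (3000, 3, 1)  -> C-contiguous
print(rev.strides)  # (3000, 3, -1) -> negative stride on the last axis
print(rev.flags.c_contiguous, rev.flags.f_contiguous)  # False False
```

No data is copied here; only the stride changes, which is exactly what makes the view fail both contiguity checks.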
Performance Implications:
- **Access Speed:** NumPy can traverse contiguous memory faster thanks to cache locality and vectorized inner loops. When the array is neither C- nor F-contiguous, element access follows non-unit strides, leading to more cache misses and slower iteration.
- **NumPy Optimizations:** Some operations (e.g., `reshape`, `ravel`, or handing the buffer to external code) are free on contiguous memory but require a copy on non-contiguous arrays, so extra work is done behind the scenes.
- **Hidden Copies Downstream:** Many functions and libraries require contiguous input and will silently copy a non-contiguous array. Returning a non-contiguous view therefore doesn't always eliminate the cost of a copy; it may just defer it to whichever consumer needs contiguous data.
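The `ravel` behavior illustrates this: on a contiguous array it returns a free view, while on the reversed view it is forced to copy:

```python
import numpy as np

a = np.zeros((1000, 1000, 3), dtype='uint8')
rev = a[:, :, ::-1]                      # non-contiguous view, no copy yet

flat_a = a.ravel()                       # contiguous input: ravel returns a view
flat_rev = rev.ravel()                   # non-contiguous input: ravel must copy

print(np.shares_memory(a, flat_a))       # True  -> no copy
print(np.shares_memory(rev, flat_rev))   # False -> a copy was made
```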
Why `np.ascontiguousarray` Works:
`np.ascontiguousarray` guarantees that the returned array is laid out as a C-contiguous array. If the input is already C-contiguous, it returns the input unchanged, with no copy; only if the input is non-contiguous does it copy.

That explains both of your benchmarks. In your first attempt, `np.ascontiguousarray()` was applied to the non-contiguous `arr[:, :, ::-1]`, so it had to copy, just like `copy()`. In your final version it is applied to the array fresh from PIL, which is already C-contiguous, so no copy happens at all, and the slice at the end is just a view.
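A small sketch of the copy/no-copy behavior:

```python
import numpy as np

a = np.zeros((4, 4, 3), dtype='uint8')         # already C-contiguous
same = np.ascontiguousarray(a)
print(same is a)                               # True  -> returned unchanged, no copy

rev = a[:, :, ::-1]                            # non-contiguous view
fixed = np.ascontiguousarray(rev)
print(fixed is rev)                            # False -> a copy was made
print(fixed.flags.c_contiguous)                # True
```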
Recommendations:
- **Keep using `np.ascontiguousarray`:** It is a good way to ensure your arrays are C-contiguous before returning them, especially since it performs better in your benchmarks. Where the input is already C-contiguous, `np.ascontiguousarray` won't incur a copy; where it is non-contiguous, it will.
- **Avoid unnecessary copying:** If you want to avoid copies entirely, avoid operations that make the array non-contiguous (like `arr[:, :, ::-1]`). If you can adjust the pipeline to maintain a C-contiguous layout, you avoid the need for an extra copy.
- **Profile the code:** Profile to see whether the non-contiguous array actually hurts performance elsewhere in the program. The difference may be insignificant for your specific use case, especially if the downstream array operations aren't performance-critical.
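The last point can be checked with a minimal `timeit` sketch. The `mean` here is a hypothetical stand-in for whatever your program actually does with the loaded image, and absolute numbers will vary by machine:

```python
import timeit
import numpy as np

arr = np.random.randint(0, 255, (1000, 1000, 3), dtype='uint8')
view = arr[:, :, ::-1]             # non-contiguous view, no up-front copy
copied = arr[:, :, ::-1].copy()    # C-contiguous, copied up front

# Time a representative downstream operation on each layout.
for name, a in [('view', view), ('copy', copied)]:
    t = timeit.timeit(lambda a=a: a.astype('float32').mean(), number=10)
    print(f'{name}: {t:.4f}s')
```

If the two timings are close for your real workload, the non-contiguous return value costs you little in practice.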
Conclusion:
While non-contiguous arrays can carry performance penalties, using `np.ascontiguousarray` is an effective way to ensure an optimal layout where it matters. Your approach of making the array contiguous before the final slice is valid and has been shown to be faster in your benchmarks. Just keep in mind that any downstream consumer which needs contiguous data will still trigger a copy of the returned view, so avoid non-contiguous slices where you can.