Does ComputeBandStats take nodata into account?

ghz 8hours ago ⋅ 1 views

I am trying to compute the stats for an image which is only partly covered by data. I would like to know if ComputeBandStats ignores the pixels whose value equals the file's nodata value.

Here is my code:

from osgeo import gdal

inIMG = gdal.Open(infile)

# getting stats for the first 3 bands
# ComputeRasterMinMax gives (min, max); ComputeBandStats gives (mean, sd)
print("Computing band statistics")
bandas = [inIMG.GetRasterBand(b+1) for b in range(3)]
minMax = [b.ComputeRasterMinMax() for b in bandas]
meanSD = [b.ComputeBandStats(1) for b in bandas]
print(minMax)
print(meanSD)

For the image without the nodata attribute the output is:

Computing band statistics
[(0.0, 26046.0), (0.0, 24439.0), (0.0, 22856.0)]
[(762.9534697777777, 647.9056493556284), (767.642869, 516.0531530834181), (818.0449643333334, 511.5360132592902)]

For the image with nodata = 0 the output is:

Computing band statistics
[(121.0, 26046.0), (202.0, 24439.0), (79.0, 22856.0)]
[(762.9534697777777, 647.9056493556284), (767.642869, 516.0531530834181), (818.0449643333334, 511.5360132592902)]

The min and max values have changed so that 0 is no longer the minimum, which makes sense: in the second version 0 is nodata and is therefore ignored by ComputeRasterMinMax(). However, the mean and standard deviation have not changed.

Does this mean that ComputeBandStats doesn't disregard the nodata values? Is there any way to force ComputeBandStats to disregard the nodata values?

Answer

Yes, you're reading the output correctly: ComputeBandStats does not take the nodata value into account. It computes the mean and standard deviation over every sampled pixel, which is why your two runs return identical numbers even though the ComputeRasterMinMax() result changed. If a large part of the image is filled with 0, both values are skewed by those pixels.

In the Python bindings, ComputeBandStats takes a single optional argument, the sample step (the 1 in your call means no subsampling). There is no mask or nodata parameter, so it cannot be told to skip those pixels.

To get statistics that leave out nodata, use ComputeStatistics() (or GetStatistics()) instead:

  1. Check that the band has a nodata value set, using GetNoDataValue().
  2. Call ComputeStatistics(False) on each band. It returns [min, max, mean, stddev] and skips nodata pixels.

Here's an updated version of your code:

from osgeo import gdal

inIMG = gdal.Open(infile)

# Stats for the first 3 bands
print("Computing band statistics")
bandas = [inIMG.GetRasterBand(b + 1) for b in range(3)]

# Nodata value of each band (None if it is not set)
no_data_values = [b.GetNoDataValue() for b in bandas]
print(no_data_values)

# ComputeStatistics(False) scans every pixel (no approximation) and skips nodata.
# It returns [min, max, mean, stddev] for the band.
stats = [b.ComputeStatistics(False) for b in bandas]

minMax = [(s[0], s[1]) for s in stats]
meanSD = [(s[2], s[3]) for s in stats]
print(minMax)
print(meanSD)

Explanation:

  • GetNoDataValue(): Returns the nodata value of the band, or None if none is set. If it returns None, nothing is excluded and the statistics cover every pixel.
  • ComputeStatistics(False): The argument is approx_ok. False asks for exact statistics from all pixels rather than from overviews or a subsample. Pixels equal to the nodata value are left out.
  • Side effect: ComputeStatistics() also stores the result in the band metadata, which for many formats means an .aux.xml file is written next to the image.

Result:

With nodata = 0 set, the mean and standard deviation should now differ from the ComputeBandStats values, because the 0 pixels are no longer part of the calculation. For the image without a nodata attribute, the numbers should be essentially the same as before.
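
To see why skipping nodata changes the mean and standard deviation, here is a minimal pure-Python sketch of a nodata-aware calculation. The helper name mean_std_ignoring_nodata and the sample pixel values are made up for illustration; it is not a GDAL function, and it assumes the population standard deviation is what you want.

```python
import math


def mean_std_ignoring_nodata(values, nodata=None):
    """Return (mean, population std dev) of values, skipping any equal to nodata.

    Hypothetical helper for illustration only; not part of GDAL.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean (Welford's method)
    for v in values:
        if nodata is not None and v == nodata:
            continue  # nodata pixel: leave it out of the statistics
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    if count == 0:
        raise ValueError("no valid pixels")
    return mean, math.sqrt(m2 / count)


# Made-up pixel values: half of the "image" is background filled with 0
pixels = [0, 0, 0, 0, 100, 200, 300, 400]

print(mean_std_ignoring_nodata(pixels))            # zeros counted as data
print(mean_std_ignoring_nodata(pixels, nodata=0))  # zeros skipped as nodata
```

On a real raster, if NumPy is installed, band.ReadAsArray() combined with numpy.ma.masked_equal(arr, nodata) is likely a faster route to the same numbers than looping over pixels in Python.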
