validation accuracy is 0 with Keras fit_generator

ghz 2days ago ⋅ 5 views

I recently upgraded to Keras 2.2.2, and the validation accuracy is now zero for all training epochs. My training data has samples in 2 categories, i.e., train_data_dir has 2 subfolders. My validation data (val_data_dir) has samples in only 1 category, but it still contains 2 subfolders (one for each category), and one of those subfolders is empty. I used to get a meaningful non-zero validation accuracy with an earlier version of Keras. Please help me spot the issue with my code (below) and Keras 2.2.2.

Below is part of my code:

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = "categorical")

validation_generator = val_datagen.flow_from_directory(
    val_data_dir,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = "categorical",
    save_to_dir = 'home/dir')

nb_train_samples = train_generator.n
nb_validation_samples = validation_generator.n

sample_steps = int(nb_train_samples/batch_size)
validation_steps = int(nb_validation_samples/batch_size)

parallel_model.fit_generator(
    train_generator,
    steps_per_epoch = sample_steps,
    epochs = 1,
    validation_data = validation_generator,
    validation_steps = validation_steps,
    callbacks=[early])

Answer

A validation accuracy of zero on every epoch most likely comes from how your validation data is set up and how Keras reads it. The likely causes are:

1. Validation Data Contains Only One Category in Practice:

Your val_data_dir contains two subfolders, one per category, but one of them is empty, so every validation sample belongs to a single class. flow_from_directory should still treat each subfolder as a class, so the labels stay two-class one-hot vectors, but accuracy is then measured on that one class only. A value of exactly zero means the model predicted the other class for every validation sample, which says little about overall performance.

Solution:

The most reliable fix is to put samples of both categories in the validation directory, even if you can only spare a small number per category. Keeping the two-subfolder structure is correct; the problem is that one of the subfolders has no data.

If you don't have any validation samples for one category, you can also consider either:

  • Using a custom validation generator where you manage the data flow yourself (e.g., loading the images and labels manually and passing the arrays to ImageDataGenerator.flow).
  • Splitting your validation data from the available training data to ensure both categories are represented.
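The second option, holding out part of the training data so that both categories appear in validation, could be sketched roughly as below. This is a stdlib-only illustration: the function name stratified_split, the val_fraction and seed parameters, and the example file names are all assumptions made for the sketch, not part of Keras. Your Keras version may also offer a validation_split argument on ImageDataGenerator; check its documentation before relying on it.

```python
import random


def stratified_split(files_by_class, val_fraction=0.2, seed=0):
    """Split each class's file list into training and validation parts.

    files_by_class maps a class name to a list of file names (hypothetical
    input; you would build it by listing each training subfolder).
    Every non-empty class contributes at least one validation file.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = {}, {}
    for name, files in files_by_class.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        n_val = max(1, int(len(shuffled) * val_fraction)) if shuffled else 0
        val[name] = shuffled[:n_val]
        train[name] = shuffled[n_val:]
    return train, val


# Made-up file names, only to show the shape of the result.
example = {
    "category_1": ["a%d.jpg" % i for i in range(10)],
    "category_2": ["b%d.jpg" % i for i in range(5)],
}
train_files, val_files = stratified_split(example)
```

The files selected for validation would then be moved (or copied) into the matching subfolders of val_data_dir, so that neither validation subfolder is empty.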

2. Class Imbalance in Validation Data:

If one category has data and the other has none, Keras still reports a validation accuracy, but the number only tells you how often the model picks the one class that is present. It can sit at 0% or 100% depending on which class the model favors and on how the folder names map to label indices, so it is not a meaningful measure of overall accuracy.

Solution:

Ensure that both categories are present in the validation data. If this isn't possible, read the reported value as the accuracy on the single class that is present, not as the overall accuracy of the model.
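To see why a single-class validation set can report exactly 0% or 100%, here is a small self-contained sketch. It does not call Keras; the accuracy helper and the label lists are hypothetical stand-ins for the argmax of the one-hot labels and of the model's predictions.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class index equals the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no samples to score")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical validation set: every sample belongs to class index 1.
y_true = [1, 1, 1, 1]

always_class_0 = [0, 0, 0, 0]  # model always predicts the absent class
always_class_1 = [1, 1, 1, 1]  # model always predicts the present class
mixed = [1, 0, 1, 1]           # model is right on three of four samples

acc_zero = accuracy(y_true, always_class_0)
acc_full = accuracy(y_true, always_class_1)
acc_mixed = accuracy(y_true, mixed)
```

A model that always outputs the absent class scores 0.0 here, and one that always outputs the present class scores 1.0, even though neither has learned to separate the two categories.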

3. Incorrect Directory Structure (Validation Data):

flow_from_directory expects one subfolder per category under the data directory, and for validation to be meaningful each subfolder should contain at least one sample. The structure should look like this:

val_data_dir/
    category_1/
        image_1.jpg
        image_2.jpg
        ...
    category_2/
        image_1.jpg
        image_2.jpg
        ...

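If one of the category folders is completely empty, flow_from_directory should still count it as a class, but no sample of that class ever reaches the validation step, so the reported accuracy reflects only the other class.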

Solution:

Ensure that both categories in val_data_dir have at least one image, or consider splitting your training data into a validation set instead of having an empty category in val_data_dir.
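A quick way to confirm that no category folder is empty is to count the image files in each subfolder before building the generators. The sketch below uses only the standard library; the helper names and the list of extensions are assumptions, and it only looks at files directly inside each class folder, not in nested folders.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(data_dir):
    """Return {subfolder name: number of image files directly inside it}."""
    counts = {}
    for sub in sorted(Path(data_dir).iterdir()):
        if sub.is_dir():
            counts[sub.name] = sum(
                1
                for f in sub.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


def empty_classes(data_dir):
    """Names of class subfolders that contain no image files."""
    counts = count_images_per_class(data_dir)
    return [name for name, n in counts.items() if n == 0]
```

Calling empty_classes(val_data_dir) on the setup described in the question would be expected to return the name of the empty category folder.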

4. Potential Issue with steps_per_epoch and validation_steps:

When using fit_generator, Keras expects steps_per_epoch and validation_steps to be integers giving the number of batches to run per epoch. Because int() truncates, int(nb_validation_samples / batch_size) evaluates to zero when there are fewer validation samples than batch_size, and otherwise it drops the final partial batch.

Solution:

Print nb_train_samples, nb_validation_samples, sample_steps, and validation_steps before calling fit_generator. If there are fewer validation samples than batch_size, force validation_steps to at least 1.

For instance:

validation_steps = max(1, int(nb_validation_samples / batch_size))  # Ensure at least 1 step
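An alternative to the max(1, int(...)) guard is to round up, so the last partial batch is also covered. The sketch below is a plain-Python helper; steps_for is a hypothetical name, and it assumes the generator yields the final smaller batch, which is how flow_from_directory iterators normally behave.

```python
import math


def steps_for(n_samples, batch_size):
    """Number of batches needed to see every sample once (rounds up)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if n_samples <= 0:
        # No point in validating on an empty generator.
        raise ValueError("generator found no samples")
    return math.ceil(n_samples / batch_size)


# Truncation gives 0 steps for 10 samples with batch_size 32; rounding up gives 1.
truncated = int(10 / 32)
rounded_up = steps_for(10, 32)
```

With this helper, validation_steps = steps_for(nb_validation_samples, batch_size) never evaluates to zero, and an empty validation directory fails loudly instead of silently.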

5. Debugging the Training Process:

To see what is happening during training, print the class-to-index mapping of train_generator and validation_generator to confirm that Keras is reading the data correctly:

print(f'Train Generator Class Labels: {train_generator.class_indices}')
print(f'Validation Generator Class Labels: {validation_generator.class_indices}')

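This shows how Keras maps folder names to class labels. If your validation data is set up correctly, both generators should print the same mapping, with an entry for each of the two categories. If the mappings differ, the validation labels do not line up with the training labels.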
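Comparing the two mappings by eye gets error-prone with more classes, so the check can be automated. The helper below is a hypothetical sketch that works on plain dictionaries of the kind class_indices returns (folder name to integer index); the example dictionaries are made up.

```python
def compare_class_indices(train_indices, val_indices):
    """Return a list of problems; an empty list means the mappings agree."""
    problems = []
    for name in sorted(set(train_indices) | set(val_indices)):
        if name not in val_indices:
            problems.append("class %r missing from validation" % name)
        elif name not in train_indices:
            problems.append("class %r missing from training" % name)
        elif train_indices[name] != val_indices[name]:
            problems.append(
                "class %r has index %d in training but %d in validation"
                % (name, train_indices[name], val_indices[name])
            )
    return problems


# Made-up mappings standing in for train_generator.class_indices and
# validation_generator.class_indices.
matching = compare_class_indices({"cat": 0, "dog": 1}, {"cat": 0, "dog": 1})
mismatched = compare_class_indices({"cat": 0, "dog": 1}, {"dog": 0})
```

In practice you would call compare_class_indices(train_generator.class_indices, validation_generator.class_indices) and stop before training if the returned list is not empty.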

Final Corrected Code Example:

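Assuming both validation categories now contain data, here is the updated version of your code. The save_to_dir argument is left out because it only writes the generated images to disk for inspection; add it back if you still need it: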

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode="categorical"
)

validation_generator = val_datagen.flow_from_directory(
    val_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode="categorical"
)

nb_train_samples = train_generator.n
nb_validation_samples = validation_generator.n

sample_steps = int(nb_train_samples / batch_size)
validation_steps = max(1, int(nb_validation_samples / batch_size))  # Ensure validation_steps is at least 1

parallel_model.fit_generator(
    train_generator,
    steps_per_epoch=sample_steps,
    epochs=1,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    callbacks=[early]
)

Summary of Suggestions:

  • Ensure that both categories are present in the validation directory.
  • Check your validation data directory structure.
  • Verify the values of steps_per_epoch and validation_steps and adjust if necessary.
  • If you can, balance the validation categories or use a custom validation generator.