Use CrossEntropyLoss with LogSoftmax


From the PyTorch documentation, CrossEntropyLoss combines LogSoftmax and NLLLoss in a single class.

But I am curious: what happens if I use CrossEntropyLoss as the criterion and also LogSoftmax in my classifier, like this:

model_x.fc = nn.Sequential(nn.Linear(num_ftrs, 2048, bias=True), nn.ReLU(),
                           nn.Linear(2048, 1024), nn.ReLU(),
                           nn.Linear(1024, 256), nn.ReLU(),
                           nn.Linear(256, 128), nn.ReLU(),
                           nn.Linear(128, num_labels), nn.LogSoftmax(dim=1))

criterion = nn.CrossEntropyLoss()

Then, if I have saved a trained model using the code above, how can I check the criterion used by the saved model?

Answer

If you use both CrossEntropyLoss and a final LogSoftmax layer, you end up applying the log-softmax transformation twice. CrossEntropyLoss already combines LogSoftmax and NLLLoss (negative log-likelihood loss) into a single operation, so adding LogSoftmax to the model's output before passing it to CrossEntropyLoss is a redundant double computation.

Why Using Both is Problematic:

  1. CrossEntropyLoss already includes:

    • LogSoftmax: It first applies the log of the softmax function to the model's output.
    • NLLLoss: It then computes the negative log-likelihood loss on those log-probabilities.
  2. LogSoftmax on its own:

    • If you add LogSoftmax explicitly at the end of the model, its output is already log-probabilities rather than raw logits.

Thus, if you use both, you apply log-softmax to values that are already log-probabilities. In exact arithmetic log-softmax is idempotent, so the loss value usually comes out the same, but the extra transformation is redundant work, and the model's outputs are no longer the raw logits that CrossEntropyLoss is documented to expect.
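
A quick way to convince yourself that CrossEntropyLoss is exactly LogSoftmax followed by NLLLoss is to compare the two on the same tensor (a minimal sketch; the shapes and values are just illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 3)            # batch of 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # ground-truth class indices

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True: CrossEntropyLoss == LogSoftmax + NLLLoss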

Correct Usage:

Instead of using both LogSoftmax in the model and CrossEntropyLoss as the criterion, you should remove the LogSoftmax from the model and just use CrossEntropyLoss. Here’s how you should define your model:

model_x.fc = nn.Sequential(
    nn.Linear(num_ftrs, 2048, bias=True),
    nn.ReLU(),
    nn.Linear(2048, 1024),
    nn.ReLU(),
    nn.Linear(1024, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, num_labels)  # no LogSoftmax here
)

# Criterion (CrossEntropyLoss automatically applies LogSoftmax internally)
criterion = nn.CrossEntropyLoss()
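
If you would rather keep LogSoftmax in the model (for example, because you want log-probabilities at inference time), the consistent alternative is to pair it with NLLLoss instead. A sketch of that variant, using the same layer sizes as above:

# Alternative: keep LogSoftmax and switch the criterion to NLLLoss
model_x.fc = nn.Sequential(
    nn.Linear(num_ftrs, 2048, bias=True), nn.ReLU(),
    nn.Linear(2048, 1024), nn.ReLU(),
    nn.Linear(1024, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, num_labels),
    nn.LogSoftmax(dim=1)  # model now outputs log-probabilities
)

criterion = nn.NLLLoss()  # NLLLoss expects log-probabilities, so this pairing is consistent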

How to Check the Criterion Used by the Saved Model:

When you save a model in PyTorch using torch.save(model.state_dict(), path), the saved file only contains the model's parameters (weights and biases); it does not store the criterion (loss function) used during training. Loss functions like CrossEntropyLoss are part of the training process, not part of the model's architecture, so they are not captured by the state_dict.

However, you can infer the criterion used based on the model architecture and how the loss was computed during training. If you save your training settings (for example, in a config file or documentation), you can refer to that information to determine which criterion was used.
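
As a quick sanity check, you can look at the keys of the state_dict itself: they are only parameter names, with no record of the loss function (a minimal sketch, assuming the model_x.fc defined above):

state = model_x.fc.state_dict()
print(list(state.keys()))  # e.g. ['0.weight', '0.bias', '2.weight', '2.bias', ...] - parameters only, no criterion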

How to Check the Model Architecture and Loss:

  1. Model Architecture:

    • You can inspect the model's architecture (i.e., the layers in model_x.fc) to see whether a LogSoftmax layer is included. If it is there, that is a clue that the intended criterion was likely not CrossEntropyLoss but something like NLLLoss (see the sketch after this list).
  2. Training Script or Configuration:

    • If you have saved your training settings (in a configuration file, training log, or script), you can refer to it to check which criterion was used.
  3. Manual Tracking:

    • If you're saving and loading models, it's good practice to also save the hyperparameters and training settings in a separate file to keep track of what was used (for example, using a JSON or YAML file for configuration).
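
For the architecture check in point 1, printing the classifier head is usually enough. A minimal sketch, assuming the weights were saved to a file such as model_x.pth (the filename is just an example):

# Rebuild the architecture, load the saved weights, then inspect the layers
model_x.load_state_dict(torch.load('model_x.pth'))
print(model_x.fc)  # if the last layer is LogSoftmax(dim=1), NLLLoss is the natural pairing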

Example for Saving Model and Training Settings:

# Save the model's state dict and other training settings
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': epoch,
    'loss': loss,
    'config': {
        'criterion': 'CrossEntropyLoss',
        'learning_rate': learning_rate,
        'batch_size': batch_size,
        # ... other settings
    }
}, 'model_checkpoint.pth')

To check the criterion later, you can load the saved settings:

checkpoint = torch.load('model_checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
config = checkpoint['config']

print(config['criterion'])  # Check the criterion used

This way, you explicitly track the training setup along with the model's weights.
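
If you also want to turn that stored string back into a loss object, one option is to look it up by name on torch.nn (a sketch that assumes 'criterion' holds a class name from torch.nn, as in the checkpoint above):

# Recreate the criterion from the name stored in the config
criterion = getattr(nn, config['criterion'])()  # e.g. nn.CrossEntropyLoss()
print(criterion)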

Conclusion:

  • Don't use both LogSoftmax and CrossEntropyLoss together. CrossEntropyLoss already includes LogSoftmax internally.
  • To check the criterion used by a saved model, store and load the training configuration yourself (for example, inside the checkpoint dict or in a separate config file); PyTorch's state_dict does not record the loss function.