From the PyTorch documentation, CrossEntropyLoss combines LogSoftmax and NLLLoss in a single class.
But I am curious: what happens if I use both CrossEntropyLoss as the criterion and LogSoftmax in my classifier:
model_x.fc = nn.Sequential(
    nn.Linear(num_ftrs, 2048, bias=True), nn.ReLU(),
    nn.Linear(2048, 1024), nn.ReLU(),
    nn.Linear(1024, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, num_labels), nn.LogSoftmax(dim=1)
)
criterion = nn.CrossEntropyLoss()
Then, if I have saved a trained model using the code above, how can I check the criterion used by the saved model?
Answer
If you use both CrossEntropyLoss and LogSoftmax together, you apply the log-softmax transformation twice. CrossEntropyLoss already combines LogSoftmax and NLLLoss (negative log-likelihood loss) into a single operation, so applying LogSoftmax separately to the output before passing it to CrossEntropyLoss results in a redundant double computation.
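To see the "combines both" claim concretely, here is a quick sanity check with random logits (a minimal sketch, not part of your training code):

import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw outputs for a batch of 4, 3 classes
targets = torch.tensor([0, 2, 1, 0])

ce  = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

print(torch.allclose(ce, nll))  # True: CrossEntropyLoss == LogSoftmax + NLLLoss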
Why Using Both is Problematic:
- CrossEntropyLoss already includes:
  - LogSoftmax: it first applies the log of the softmax function to the model's raw outputs (the logits).
  - NLLLoss: it then computes the negative log-likelihood loss from those log-probabilities.
- LogSoftmax on its own: if you add LogSoftmax explicitly to the model, its output is already log-probabilities rather than logits by the time the criterion sees it.
Thus, if you use both, the log-softmax transformation is applied twice. Log-softmax happens to be idempotent (applying it to log-probabilities leaves them unchanged), so the loss values coincide in this particular case, but the extra pass wastes computation and makes your model's outputs ambiguous: CrossEntropyLoss is documented to expect raw logits, and downstream code can no longer tell which convention your model follows. The sketch below verifies this numerically.
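A minimal sketch of the double application, again with random logits:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 0])

log_probs = F.log_softmax(logits, dim=1)

# log-softmax is idempotent: applying it to log-probabilities is a no-op
print(torch.allclose(log_probs, F.log_softmax(log_probs, dim=1)))  # True

# so the loss coincides, and the extra LogSoftmax is pure redundancy
print(torch.allclose(F.cross_entropy(logits, targets),
                     F.cross_entropy(log_probs, targets)))  # True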
Correct Usage:
Instead of using both LogSoftmax in the model and CrossEntropyLoss as the criterion, remove the LogSoftmax from the model and just use CrossEntropyLoss. Here's how you should define your model:
model_x.fc = nn.Sequential(
nn.Linear(num_ftrs, 2048, bias=True),
nn.ReLU(),
nn.Linear(2048, 1024),
nn.ReLU(),
nn.Linear(1024, 256),
nn.ReLU(),
nn.Linear(256, 128),
nn.ReLU(),
nn.Linear(128, num_labels) # no LogSoftmax here
)
# Criterion (CrossEntropyLoss automatically applies LogSoftmax internally)
criterion = nn.CrossEntropyLoss()
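For completeness, a training step with this setup might look like the following (a sketch that assumes loader, optimizer, and device are defined elsewhere in your script):

for inputs, labels in loader:
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    logits = model_x(inputs)          # raw logits; no LogSoftmax in the model
    loss = criterion(logits, labels)  # CrossEntropyLoss applies log-softmax internally
    loss.backward()
    optimizer.step()

If you need class probabilities at inference time, apply torch.softmax(logits, dim=1) explicitly there; the loss function does not need it.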
How to Check the Criterion Used by the Saved Model:
When you save a model in PyTorch using torch.save(model.state_dict()), the saved file contains only the parameters (weights and biases); it does not store the criterion or loss function used during training. This is because loss functions (like CrossEntropyLoss) are part of the training process, not part of the model's architecture, which is what the state_dict captures.
However, you can infer the criterion used based on the model architecture and how the loss was computed during training. If you save your training settings (for example, in a config file or documentation), you can refer to that information to determine which criterion was used.
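You can verify this yourself: a state dict is just an ordered mapping from parameter names to tensors, with no trace of the loss function (the key names printed below are illustrative; yours depend on the architecture):

import torch

state = model_x.state_dict()
print(list(state.keys())[:4])  # e.g. ['fc.0.weight', 'fc.0.bias', 'fc.2.weight', 'fc.2.bias']
print(all(torch.is_tensor(v) for v in state.values()))  # True: tensors only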
How to Check the Model Architecture and Loss:
- Model Architecture:
  - You can inspect the model's architecture (i.e., the layers in model_x.fc) to see whether a LogSoftmax layer is included. If it is there, that is a clue that the criterion was likely not CrossEntropyLoss, but something like NLLLoss (see the sketch after this list).
- Training Script or Configuration:
  - If you have saved your training settings (in a configuration file, training log, or script), you can refer to it to check which criterion was used.
- Manual Tracking:
  - If you're saving and loading models, it's good practice to also save the hyperparameters and training settings in a separate file (for example, a JSON or YAML configuration file) to keep track of what was used.
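The first check in the list above can be automated in a couple of lines (a sketch assuming the model has already been re-instantiated and loaded):

import torch.nn as nn

# Walk all submodules and look for a LogSoftmax layer anywhere in the network
has_log_softmax = any(isinstance(m, nn.LogSoftmax) for m in model_x.modules())
print(has_log_softmax)
# True  -> the model was probably trained with NLLLoss
# False -> the head emits raw logits, consistent with CrossEntropyLoss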
Example for Saving Model and Training Settings:
# Save the model's state dict and other training settings
torch.save({
'model_state_dict': model.state_dict(),
'optimizer_state_dict': optimizer.state_dict(),
'epoch': epoch,
'loss': loss,
'config': {
'criterion': 'CrossEntropyLoss',
'learning_rate': learning_rate,
'batch_size': batch_size,
# ... other settings
}
}, 'model_checkpoint.pth')
To check the criterion later, you can load the saved settings:
checkpoint = torch.load('model_checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
config = checkpoint['config']
print(config['criterion']) # Check the criterion used
This way, you explicitly track the training setup along with the model's weights.
Conclusion:
- Don't use both LogSoftmax and CrossEntropyLoss together; CrossEntropyLoss already includes LogSoftmax internally.
- To check the criterion used by a saved model, store and load the training configuration separately, as PyTorch does not store the criterion in the model checkpoint.