lmp.util.metric
Evaluation metrics.
- lmp.util.metric.nll(batch_tkids: Tensor, batch_tkids_pd: Tensor, use_log2: bool = True) → Tensor
Calculate negative log-likelihood \(-\log p\) on a batch of token ids.
Let \(x = \pa{x^1, \dots, x^B}\) be a batch of token sequences. Suppose that each token sequence has length \(S+1\) and each token is defined in a vocabulary of size \(V\). Let \(x^b = \pa{x_1^b, \dots, x_{S+1}^b}\) be the \(b\)-th token sequence in the batch \(x\). Suppose that the probability \(\Pr\pa{x_{t+1}^b \vert x_1^b, \dots, x_t^b}\) of the next token \(x_{t+1}^b\) given the previous tokens \(x_1^b, \dots, x_t^b\) is known. Let \(R\) be the returned tensor with shape \((B, S)\). Then the entry in the \(b\)-th row, \(t\)-th column of \(R\) is defined as follows:
\[R_{b,t} = -\log_2 \Pr\pa{x_{t+1}^b \vert x_1^b, \dots, x_t^b}.\]If \(x_{t+1}^b\) is the padding token, then we set \(R_{b,t}\) to zero.
- Parameters
  - batch_tkids (torch.Tensor) – Batch of token ids which represent prediction targets. batch_tkids has shape \((B, S)\) and dtype == torch.long.
  - batch_tkids_pd (torch.Tensor) – Batch of token id prediction probability distributions. batch_tkids_pd has shape \((B, S, V)\) and dtype == torch.float.
  - use_log2 (bool, default: True) – Set to True to use \(\log_2\). Set to False to use \(\ln\).
- Returns
  \(-\log p\) tensor. Returned tensor has shape \((B, S)\) and dtype == torch.float.
- Return type
  torch.Tensor
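The definition above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name nll_sketch and the assumption that the padding token id is 0 are hypothetical, and the probability distributions are assumed to be already normalized over the vocabulary dimension.

```python
import torch

def nll_sketch(batch_tkids: torch.Tensor, batch_tkids_pd: torch.Tensor,
               use_log2: bool = True, pad_tkid: int = 0) -> torch.Tensor:
  # Gather the probability each distribution assigns to its target token.
  # batch_tkids has shape (B, S); the gathered result has shape (B, S).
  p = batch_tkids_pd.gather(dim=2, index=batch_tkids.unsqueeze(2)).squeeze(2)
  # Negative log-likelihood in the chosen base: log2 or natural log.
  out = -torch.log2(p) if use_log2 else -torch.log(p)
  # Zero out positions whose prediction target is the padding token.
  return out.masked_fill(batch_tkids == pad_tkid, 0.0)

# Example with B=1, S=2, V=3 and uniform predictions: each entry of the
# result equals -log2(1/3), roughly 1.585.
tkids = torch.tensor([[1, 2]])
pd = torch.full((1, 2, 3), 1.0 / 3.0)
result = nll_sketch(tkids, pd)
```

Gathering along the vocabulary dimension avoids materializing a one-hot target tensor, and the padding mask matches the convention above that \(R_{b,t}\) is zero wherever the target is the padding token.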