lmp.util.metric

Evaluation metrics.

lmp.util.metric.nll(batch_tkids: Tensor, batch_tkids_pd: Tensor, use_log2: bool = True) → Tensor

Calculate negative log-likelihood \(-\log p\) on batch token ids.

Let \(x = \pa{x^1, \dots, x^B}\) be a batch of token sequences. Suppose that each token sequence has length \(S+1\) and each token is defined in a vocabulary of size \(V\). Let \(x^b = \pa{x_1^b, \dots, x_{S+1}^b}\) be the \(b\)-th token sequence in the batch \(x\). Suppose that the probability \(\Pr\pa{x_{t+1}^b \vert x_1^b, \dots, x_t^b}\) of the next token \(x_{t+1}^b\), given the previous tokens \(x_1^b, \dots, x_t^b\), is given. Let \(R\) be the returned tensor with shape \((B, S)\). Then the \(b\)-th row, \(t\)-th column of the returned tensor \(R\) is defined as follows:

\[R_{b,t} = -\log_2 \Pr\pa{x_{t+1}^b \vert x_1^b, \dots, x_t^b}.\]

If \(x_{t+1}^b\) is a padding token, then \(R_{b,t}\) is set to zero.

Parameters
  • batch_tkids (torch.Tensor) – Batch of token ids which represent prediction targets. batch_tkids has shape \((B, S)\) and dtype == torch.long.

  • batch_tkids_pd (torch.Tensor) – Batch of token id prediction probability distributions. batch_tkids_pd has shape \((B, S, V)\) and dtype == torch.float.

  • use_log2 (bool, default: True) – Set to True to use \(\log_2\). Set to False to use \(\ln\).

Returns

\(-\log p\) tensor. Returned tensor has shape \((B, S)\) and dtype == torch.float.

Return type

torch.Tensor
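The computation above can be sketched with plain PyTorch operations. This is a minimal illustration, not the library's implementation: it assumes `batch_tkids_pd` already holds probabilities (not logits) and that the padding token has id `0`, neither of which is specified here.

```python
import torch

def nll(batch_tkids: torch.Tensor, batch_tkids_pd: torch.Tensor, use_log2: bool = True) -> torch.Tensor:
  # Gather the predicted probability of each target token: shape (B, S).
  p = batch_tkids_pd.gather(dim=2, index=batch_tkids.unsqueeze(2)).squeeze(2)
  # Apply -log2 or -ln depending on `use_log2`.
  out = -torch.log2(p) if use_log2 else -torch.log(p)
  # Zero out positions whose target is the padding token (assumed id 0).
  return out.masked_fill(batch_tkids == 0, 0.0)

# Example with B=1, S=2, V=3 and uniform predictions:
# each non-padding entry equals -log2(1/3) = log2(3).
tkids = torch.tensor([[1, 2]])
pd = torch.full((1, 2, 3), 1.0 / 3.0)
print(nll(tkids, pd))
```

With uniform probability \(1/V\) over three tokens, every entry of the result is \(\log_2 3 \approx 1.585\), which matches the intuition that NLL in base 2 measures bits of surprise.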