lmp.infer._top_1

Top-1 inference method.

class lmp.infer._top_1.Top1Infer(*, max_seq_len: int = 32, **kwargs: Any)

Bases: BaseInfer

Top-1 inference method.

At each inference step, this method picks the token id with the maximum (top-1) probability from the next-token-id probability distribution over the tokenizer’s vocabulary and uses it as the next token id prediction. If multiple token ids share the same maximum probability, this method picks the smallest token id. It is a greedy algorithm: simple, but its output lacks diversity.
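The selection rule described above, including the tie-break toward the smallest token id, can be illustrated with a minimal pure-Python sketch. The function name `pick_top_1` is hypothetical and not part of `lmp`; the actual implementation presumably operates on tensors rather than Python lists.

```python
from typing import Sequence


def pick_top_1(probs: Sequence[float]) -> int:
    """Return the token id with the largest probability.

    Hypothetical helper for illustration only. Ties are resolved
    in favour of the smallest token id.
    """
    if not probs:
        raise ValueError("probs must contain at least one entry")
    best_id = 0
    for tkid, prob in enumerate(probs):
        # Strict '>' keeps the earliest (smallest) id when values tie.
        if prob > probs[best_id]:
            best_id = tkid
    return best_id


# Token ids 1 and 2 tie for the maximum, so id 1 is chosen.
next_tkid = pick_top_1([0.1, 0.4, 0.4, 0.1])
```

Because the same input always yields the same id, repeated generation from the same prompt gives identical text, which is the lack of diversity noted above.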

infer_name

The CLI name of the top-1 inference method is top-1.

Type

ClassVar[str]

See also

lmp.infer

All available inference methods.

lmp.script.gen_txt

Use a pre-trained language model checkpoint to generate a continuation of a given text segment.

classmethod add_CLI_args(parser: ArgumentParser) → None

Add the inference method's hyperparameters to the CLI argument parser.

Parameters

parser (argparse.ArgumentParser) – CLI argument parser.

Return type

None
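As a rough illustration of what registering a hyperparameter on an `argparse` parser can look like, here is a minimal sketch. The function `add_cli_args` and the flag name `--max_seq_len` are assumptions chosen to mirror the constructor's `max_seq_len: int = 32` parameter; the flags that `add_CLI_args` actually registers are not shown in this page.

```python
import argparse


def add_cli_args(parser: argparse.ArgumentParser) -> None:
    """Hypothetical stand-in for an ``add_CLI_args``-style class method."""
    # Assumed flag name; the default mirrors the constructor default of 32.
    parser.add_argument(
        '--max_seq_len',
        type=int,
        default=32,
        help='Maximum length of the generated sequence (assumed flag).',
    )


parser = argparse.ArgumentParser()
add_cli_args(parser)
args = parser.parse_args(['--max_seq_len', '16'])
default_args = parser.parse_args([])
```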

See also

lmp.script.gen_txt

Use a pre-trained language model checkpoint to generate a continuation of a given text segment.

gen(model: BaseModel, tknzr: BaseTknzr, txt: str) → str

Generate a text continuation conditioned on the given text segment.

The top-1 inference algorithm is structured as follows:

  1. Encode the input text as a batch containing one sequence.

  2. Remove token ids after <eos>, since the model is not trained to predict tokens after seeing <eos>.

  3. Loop over the conditional token ids to generate conditional hidden states.

  4. Loop to generate token ids. In each iteration, the generated token id is the one with the maximum probability in the next-token-id probability distribution. The generation loop stops when <eos> is generated or the maximum length is reached.

  5. Decode generated token ids into text and return.
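The generation loop in steps 2 and 4 can be sketched on plain token id lists as follows. Everything here is hypothetical: `greedy_generate`, `toy_next_probs`, and the special token ids are illustrative stand-ins, not `lmp` APIs. The toy model is stateless, so step 3 (building conditional hidden states) has no counterpart, and tokenizer encoding and decoding (steps 1 and 5) are omitted. The sketch also assumes that <eos> itself is dropped along with the ids after it, so that generation can continue from the prompt.

```python
from typing import Callable, List, Sequence

BOS_ID, EOS_ID = 0, 1  # hypothetical special token ids


def toy_next_probs(ids: List[int]) -> Sequence[float]:
    """Toy stand-in for a language model over a 5-token vocabulary.

    Puts most probability on ``last_id + 1`` and on <eos> once the
    last vocabulary id has been reached.
    """
    vocab_size = 5
    last = ids[-1] if ids else BOS_ID
    target = EOS_ID if last >= vocab_size - 1 else last + 1
    probs = [0.05] * vocab_size
    probs[target] = 0.8
    return probs


def greedy_generate(
    next_probs: Callable[[List[int]], Sequence[float]],
    cond_ids: List[int],
    max_seq_len: int,
) -> List[int]:
    """Greedy (top-1) decoding on token ids; an illustrative sketch."""
    # Step 2 (assumed to drop <eos> as well as everything after it).
    if EOS_ID in cond_ids:
        cond_ids = cond_ids[: cond_ids.index(EOS_ID)]
    ids = list(cond_ids)
    # Step 4: append the most probable id until <eos> or the length limit.
    while len(ids) < max_seq_len:
        probs = next_probs(ids)
        # max() returns the first maximal item, i.e. the smallest id on ties.
        next_id = max(range(len(probs)), key=lambda i: probs[i])
        ids.append(next_id)
        if next_id == EOS_ID:
            break
    return ids


generated = greedy_generate(toy_next_probs, [BOS_ID, 2], max_seq_len=32)
# generated == [0, 2, 3, 4, 1]
```

With `max_seq_len=3` the same call stops at `[0, 2, 3]`, showing the length limit ending generation before <eos> is produced.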

Parameters
  • model (BaseModel) – Pre-trained language model which will be used to generate text.

  • tknzr (BaseTknzr) – Pre-trained tokenizer which performs text encoding and decoding.

  • txt (str) – Text segment which the generation process is conditioned on.

Returns

Generated text.

Return type

str

See also

lmp.script.gen_txt

Use a pre-trained language model checkpoint to generate a continuation of a given text segment.