Language Model Playground 1.0.0 documentation

lmp.tknzr

Tokenizer module.

lmp.tknzr.ALL_TKNZRS

All available tokenizers.

Type

list[lmp.tknzr.BaseTknzr]

lmp.tknzr.TKNZR_OPTS

Mapping from tokenizer name (tknzr_name) to tokenizer class.

Type

Final[dict[str, type[lmp.tknzr.BaseTknzr]]]

Examples

Get lmp.tknzr.CharTknzr by its name.

>>> from lmp.tknzr import CharTknzr, TKNZR_OPTS
>>> CharTknzr.tknzr_name in TKNZR_OPTS
True
>>> TKNZR_OPTS[CharTknzr.tknzr_name] == CharTknzr
True
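
The lookup above relies on a name-to-class registry. The following is a minimal, self-contained sketch of how such a registry could be built and queried. It does not import lmp: the classes are hypothetical stand-ins for the real tokenizers, the tknzr_name strings are placeholders, and get_tknzr_cls is an illustrative helper that is not part of the documented API. How lmp actually constructs TKNZR_OPTS is an assumption here.

```python
from typing import Dict, Final, List, Type


class BaseTknzr:
  """Hypothetical stand-in for lmp.tknzr.BaseTknzr."""

  # Each subclass overrides this with its own unique name.
  tknzr_name: str = 'base'


class CharTknzr(BaseTknzr):
  """Hypothetical stand-in; the name string is a placeholder."""

  tknzr_name = 'character'


class WsTknzr(BaseTknzr):
  """Hypothetical stand-in; the name string is a placeholder."""

  tknzr_name = 'whitespace'


# List of tokenizer classes (not instances).
ALL_TKNZRS: Final[List[Type[BaseTknzr]]] = [CharTknzr, WsTknzr]

# Assumed construction: derive the name-to-class mapping from the list,
# so registering a new tokenizer only requires appending to ALL_TKNZRS.
TKNZR_OPTS: Final[Dict[str, Type[BaseTknzr]]] = {
  tknzr_cls.tknzr_name: tknzr_cls for tknzr_cls in ALL_TKNZRS
}


def get_tknzr_cls(name: str) -> Type[BaseTknzr]:
  """Return the tokenizer class registered under ``name``.

  Illustrative helper only. Raises ValueError for unknown names.
  """
  if name not in TKNZR_OPTS:
    choices = ', '.join(sorted(TKNZR_OPTS))
    raise ValueError(f'unknown tokenizer name {name!r}; choose from: {choices}')
  return TKNZR_OPTS[name]
```

A mapping of this shape is convenient for command-line scripts: the keys can serve as the accepted values of a tokenizer-name argument, and the selected class is instantiated only after the name has been validated.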

Submodules:

  • lmp.tknzr._base
  • lmp.tknzr._bpe
  • lmp.tknzr._char
  • lmp.tknzr._ws
Copyright © 2022, ProFatXuanAll