lmp.script.tknz_txt

Use a pre-trained tokenizer to tokenize text.

You must run the script lmp.script.train_tknzr before running this script.

The following example uses the pre-trained tokenizer under experiment my_tknzr_exp to tokenize the text 'Hello World'.

python -m lmp.script.tknz_txt --exp_name my_tknzr_exp --txt "Hello World"

Use the -h or --help option to list the supported CLI arguments.

python -m lmp.script.tknz_txt -h
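The command-line calls above can also be driven from Python. The following is a minimal sketch, assuming lmp is installed in the current interpreter, that the experiment my_tknzr_exp has already been trained, and that the script writes its result to standard output. The helper names build_cmd and run_tknz_txt are hypothetical and are not part of lmp.

```python
import subprocess
import sys
from typing import List


def build_cmd(exp_name: str, txt: str) -> List[str]:
    """Build the argument list for `python -m lmp.script.tknz_txt`.

    Hypothetical helper. Passing arguments as a list avoids shell quoting
    problems when the text contains spaces or quotes.
    """
    return [
        sys.executable, '-m', 'lmp.script.tknz_txt',
        '--exp_name', exp_name,
        '--txt', txt,
    ]


def run_tknz_txt(exp_name: str, txt: str) -> str:
    """Run the script and return whatever it wrote to standard output.

    Hypothetical helper. Assumes lmp is installed and the experiment exists;
    `check=True` raises CalledProcessError if the script exits with an error.
    """
    result = subprocess.run(
        build_cmd(exp_name, txt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Only build the command here; call run_tknz_txt(...) once a tokenizer is trained.
cmd = build_cmd('my_tknzr_exp', 'Hello World')
print(cmd[1:])
```

Building the command as a list rather than a single string keeps 'Hello World' as one argument without extra quoting.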

See also

lmp.script.train_tknzr

Train tokenizer.

lmp.tknzr

All available tokenizers.

lmp.script.tknz_txt.main(argv: List[str]) → List[str]

Script entry point.

Parameters

argv (list[str]) – List of CLI arguments.

Return type

None

lmp.script.tknz_txt.parse_args(argv: List[str]) → Namespace

Parse CLI arguments.

Parameters

argv (list[str]) – List of CLI arguments.

See also

sys.argv

Python CLI arguments interface.

Returns

Parsed CLI arguments.

Return type

argparse.Namespace
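A parser with this interface could be sketched as follows. This is an illustration built on the standard argparse module, not the lmp source. The flag names --exp_name and --txt come from the usage example, while the required and type settings and the help strings are assumptions.

```python
import argparse
from typing import List


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse CLI arguments (illustrative sketch, not the lmp implementation)."""
    parser = argparse.ArgumentParser(
        prog='python -m lmp.script.tknz_txt',
        description='Use a pre-trained tokenizer to tokenize text.',
    )
    # Flag names are taken from the usage example; marking them required
    # and the help strings are assumptions made for this sketch.
    parser.add_argument(
        '--exp_name',
        required=True,
        type=str,
        help='Name of the pre-trained tokenizer experiment.',
    )
    parser.add_argument(
        '--txt',
        required=True,
        type=str,
        help='Text to tokenize.',
    )
    # Parsing an explicit list instead of reading sys.argv implicitly
    # makes the function easy to call from tests.
    return parser.parse_args(argv)


args = parse_args(['--exp_name', 'my_tknzr_exp', '--txt', 'Hello World'])
print(args.exp_name)
print(args.txt)
```

When run as a script, the list passed in would typically be sys.argv[1:], which drops the program name and leaves only the CLI arguments.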