Tokenize given text

Use a pre-trained tokenizer to tokenize text.

A tokenizer must already be trained with the script lmp.script.train_tknzr before this script is run.

The following example uses the pre-trained tokenizer from the experiment my_tknzr_exp to tokenize the text 'Hello World'.

python -m lmp.script.tknz_txt --exp_name my_tknzr_exp --txt "Hello World"

Use the -h or --help option to list the supported CLI arguments.

python -m lmp.script.tknz_txt -h

See also

lmp.script.train_tknzr

Train tokenizer.

lmp.tknzr

All available tokenizers.

lmp.script.tknz_txt.main(argv: List[str]) → List[str]

Script entry point.

Parameters

argv (list[str]) – List of CLI arguments.

Return type

None
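
As a minimal sketch, the example command line can be turned into the argv list that main accepts. It assumes main expects only the arguments that follow the module name (what sys.argv[1:] would give). That convention is an assumption, not something stated above. The commented-out call requires lmp to be installed and a tokenizer already trained under my_tknzr_exp.

```python
import shlex

# The example command from the top of this page.
cmd = 'python -m lmp.script.tknz_txt --exp_name my_tknzr_exp --txt "Hello World"'

# shlex.split follows shell quoting rules, so "Hello World" stays one argument.
tokens = shlex.split(cmd)

# Assumption: drop 'python', '-m' and the module name, keeping only the CLI arguments.
argv = tokens[3:]
print(argv)  # ['--exp_name', 'my_tknzr_exp', '--txt', 'Hello World']

# With lmp installed and the tokenizer trained, the entry point could then be called:
# import lmp.script.tknz_txt
# lmp.script.tknz_txt.main(argv=argv)
```

Passing the text as a single list element avoids any dependence on shell quoting when the script is driven from Python.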

lmp.script.tknz_txt.parse_args(argv: List[str]) → Namespace

Parse CLI arguments.

Parameters

argv (list[str]) – List of CLI arguments.

See also

sys.argv

Python CLI arguments interface.

Returns

Parsed CLI arguments.

Return type

argparse.Namespace
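
The sketch below illustrates how a parser of this shape could be built with argparse. It is not the lmp implementation: parse_args_sketch is a hypothetical stand-in, only the two flags from the example command (--exp_name and --txt) are modeled, and marking them as required is an assumption.

```python
import argparse
from typing import List


def parse_args_sketch(argv: List[str]) -> argparse.Namespace:
    """Hypothetical stand-in for lmp.script.tknz_txt.parse_args."""
    parser = argparse.ArgumentParser(
        prog='python -m lmp.script.tknz_txt',
        description='Tokenize text with a pre-trained tokenizer.',
    )
    # Assumption: both flags are required; the real script may define defaults.
    parser.add_argument('--exp_name', type=str, required=True,
                        help='Name of the pre-trained tokenizer experiment.')
    parser.add_argument('--txt', type=str, required=True,
                        help='Text to tokenize.')
    # argparse adds -h / --help automatically.
    return parser.parse_args(argv)


args = parse_args_sketch(['--exp_name', 'my_tknzr_exp', '--txt', 'Hello World'])
print(args.exp_name)  # my_tknzr_exp
print(args.txt)       # Hello World
```

Taking argv as a parameter instead of reading sys.argv inside the function keeps the parser easy to call from tests and from other Python code.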