Sample dataset
Use this script to sample data points from a dataset.
We can use the following script to sample text from WikiText2Dset.
python -m lmp.script.sample_dset wiki-text-2
The default sampling index is 0 and the default version of WikiText2Dset is train.
Thus the following script has the same sampling result as above.
python -m lmp.script.sample_dset wiki-text-2 --idx 0 --ver train
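The equivalence of the two commands above follows from how default option values work. The snippet below is a minimal sketch of that behavior, not the actual parser of lmp.script.sample_dset: the `build_parser` helper and the `sample_dset_sketch` program name are hypothetical, and only the `--idx` and `--ver` options and their defaults are taken from the text above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in parser; NOT the real lmp.script.sample_dset parser."""
    parser = argparse.ArgumentParser(prog="sample_dset_sketch")
    # Dataset name, e.g. wiki-text-2.
    parser.add_argument("dset_name")
    # Defaults as stated in the documentation: index 0, version train.
    parser.add_argument("--idx", type=int, default=0)
    parser.add_argument("--ver", type=str, default="train")
    return parser


parser = build_parser()
implicit = parser.parse_args(["wiki-text-2"])
explicit = parser.parse_args(["wiki-text-2", "--idx", "0", "--ver", "train"])

# Omitting the options and passing the default values explicitly
# yield the same parsed arguments.
print(implicit == explicit)  # prints True
```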
The following script samples text from WikiText2Dset with the index set to 1 and the version set to
test.
python -m lmp.script.sample_dset wiki-text-2 --idx 1 --ver test
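To sample several data points in one go, the commands above can be driven from Python. This is a minimal sketch, assuming the lmp package is installed in the current interpreter's environment; `build_sample_cmd` and `run_samples` are hypothetical helper names, and only the module path and the `--idx` and `--ver` options come from the commands shown above.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def build_sample_cmd(dset_name: str, idx: int = 0, ver: str = "train") -> List[str]:
    """Build the argument list for one sample_dset invocation (hypothetical helper)."""
    return [
        sys.executable, "-m", "lmp.script.sample_dset",
        dset_name, "--idx", str(idx), "--ver", ver,
    ]


def run_samples(dset_name: str, picks: Sequence[Tuple[int, str]]) -> List[str]:
    """Run the script once per (idx, ver) pair and collect what it prints.

    Assumes lmp is installed; raises CalledProcessError if the script fails.
    """
    outputs = []
    for idx, ver in picks:
        cmd = build_sample_cmd(dset_name, idx=idx, ver=ver)
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return outputs


# Example usage with the same index/version pairs as the commands above:
# samples = run_samples("wiki-text-2", [(0, "train"), (1, "test")])
```

Passing the arguments as a list rather than a single shell string avoids quoting issues if a dataset name or version ever contains special characters.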
You can use the -h or --help option to get a list of available datasets.
python -m lmp.script.sample_dset -h
You can use the -h or --help option on a specific dataset to get a list of its supported CLI arguments, including all
available versions of that dataset.
python -m lmp.script.sample_dset wiki-text-2 -h
See also
- lmp.dset
All available datasets.