Dataset base class
- class lmp.dset.BaseDset(*, ver: Optional[str] = None)
Bases: Dataset
Dataset base class.
Most datasets must be downloaded from the web; only some can be generated locally. A dataset is downloaded or generated automatically if it is not already on your local machine. If the dataset files already exist locally, nothing is downloaded or generated.
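The skip-if-present behavior described above can be sketched as follows. This is only an illustration, not the library's implementation: `ensure_file` and `write_dummy` are hypothetical helpers, and a local file write stands in for the real download or generation step.

```python
import os
import tempfile
from typing import Callable


def ensure_file(path: str, fetch: Callable[[str], None]) -> bool:
    """Hypothetical helper: run fetch(path) only when path is missing.

    Returns True if fetch was called, False if the file already existed.
    """
    if os.path.exists(path):
        # File is already on the local machine: do nothing.
        return False
    # Create the parent directory before fetching into it.
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    fetch(path)
    return True


def write_dummy(path: str) -> None:
    """Stand-in for a real download or generation step (assumption)."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write('sample text\n')


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'data', 'toy.txt')
    first = ensure_file(target, write_dummy)   # file missing, so fetch runs
    second = ensure_file(target, write_dummy)  # file present, so fetch is skipped
```

Checking for the file before fetching keeps repeated dataset construction cheap, since the expensive step runs at most once per file.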
- Parameters
ver (Optional[str], default: None) – Version of the dataset. Set to None to use the default version self.__class__.df_ver.
See also
- lmp.dset
All available datasets.
- __iter__() → Iterator[str]
Iterate through each sample in the dataset.
- Yields
str – One sample in self.spls, ordered by sample index.
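A minimal sketch of how the default version and iteration might fit together. `ToyDset` is a hypothetical stand-in, not a class from `lmp.dset`; only the names `df_ver`, `ver`, and `spls` come from the documentation above, and the sample contents are made up for illustration.

```python
from typing import Iterator, List, Optional


class ToyDset:
    """Hypothetical stand-in that mimics the documented interface."""

    # Default version used when ver is None (value is an assumption).
    df_ver = 'train'

    def __init__(self, *, ver: Optional[str] = None):
        # Fall back to the class-level default version when ver is None.
        self.ver = self.__class__.df_ver if ver is None else ver
        # Made-up samples; a real dataset would load them from files.
        self.spls: List[str] = [f'{self.ver} sample {i}' for i in range(3)]

    def __iter__(self) -> Iterator[str]:
        # Yield samples in index order.
        for spl in self.spls:
            yield spl


samples = list(ToyDset())
```

Because `__iter__` is defined, an instance can be used directly in a `for` loop or passed to `list`.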
- static download_file(mode: str, download_path: str, url: str) → None
Download file from url.
- static norm(txt: str) → str
Text normalization.
Text is NFKC normalized. Consecutive whitespace is collapsed, and leading and trailing whitespace is stripped.
See also
unicodedata.normalize
Python built-in Unicode normalization.
Examples
>>> from lmp.dset import BaseDset
>>> BaseDset.norm('123456789')
'123456789'
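The normalization steps described above can be approximated with the standard library. This is a sketch under assumptions, not the actual body of `BaseDset.norm`: `norm_sketch` is a hypothetical name, and collapsing whitespace into a single space with a regular expression is one plausible reading of "collapsed".

```python
import re
import unicodedata


def norm_sketch(txt: str) -> str:
    """Hypothetical re-implementation of the documented behavior."""
    # NFKC maps compatibility characters (e.g. full-width digits) to
    # their canonical equivalents.
    txt = unicodedata.normalize('NFKC', txt)
    # Collapse every run of whitespace into one space, then strip both ends.
    return re.sub(r'\s+', ' ', txt).strip()


full_width = norm_sketch('１２３')      # full-width digits become ASCII digits
collapsed = norm_sketch('  a \t b\n')  # whitespace runs become single spaces
```

Running NFKC first matters because it also converts characters such as the ideographic space (U+3000) into a plain space, which the whitespace step then collapses.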