Demo Dataset

class lmp.dset.DemoDset(*, ver: Optional[str] = None)

Bases: BaseDset

Demo dataset.

This dataset consists of 2-digit addition sentences. Every sample has the following format:

If you add \(a\) to \(b\) you get \(a + b\) .

where \(a\) and \(b\) are integers from \(0\) to \(99\) (inclusive).

Here we describe the dataset in detail. Let \(N = \{0, 1, \dots, 99\}\) be the set of non-negative integers less than \(100\), and let \(a, b \in N\). Each version contains the samples whose \((a, b)\) pair satisfies the constraint listed for it.

Version	Design Philosophy	Constraint
`train`	Training set.	\(a < b\)
`valid`	Check whether the model learns the commutative law of 2-digit integer addition.	\(a > b\)
`test`	Check whether the model learns to generalize 2-digit addition.	\(a = b\)
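
The constraints in the table can be illustrated with a short sketch. The helper `make_demo_samples` below is hypothetical and not part of `lmp`; the iteration order (over \(a\), then \(b\)) is an assumption chosen so that the first training sample matches the one shown in the Examples section.

```python
from typing import List


# Hypothetical helper, not part of lmp: rebuilds samples from the constraint table.
def make_demo_samples(ver: str) -> List[str]:
    # Each version keeps the (a, b) pairs that satisfy its constraint.
    constraints = {
        'train': lambda a, b: a < b,
        'valid': lambda a, b: a > b,
        'test': lambda a, b: a == b,
    }
    if ver not in constraints:
        raise ValueError(f'unsupported version: {ver}')
    keep = constraints[ver]
    # Assumed ordering: outer loop over a, inner loop over b.
    return [
        f'If you add {a} to {b} you get {a + b} .'
        for a in range(100)
        for b in range(100)
        if keep(a, b)
    ]


train_spls = make_demo_samples('train')
valid_spls = make_demo_samples('valid')
test_spls = make_demo_samples('test')
print(len(train_spls), len(valid_spls), len(test_spls))  # 4950 4950 100
```

Under these constraints the three versions are disjoint and together cover all \(100 \times 100\) pairs, so no validation or test sample appears verbatim in the training set.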

Parameters: ver (Optional[str], default: None) – Version of the dataset. Set to None to use the default version self.__class__.df_ver.

df_ver

Default version is 'train'.

Type: ClassVar[str]

dset_name

CLI name of the demo dataset is demo.

Type: ClassVar[str]

spls

All samples in the dataset.

Type: list[str]

ver

Version of the dataset.

Type: str

vers

Supported versions: 'train', 'test' and 'valid'.

Type: ClassVar[list[str]]

See also

lmp.dset: All available datasets.

Examples

>>> from lmp.dset import DemoDset
>>> dset = DemoDset(ver='train')
>>> dset[0]
'If you add 0 to 1 you get 1 .'

__getitem__(idx: int) → str

Get a sample by index.

Parameters: idx (int) – Sample index.
Returns: The sample whose index equals idx.
Return type: str

__iter__() → Iterator[str]

Iterate through each sample in the dataset.

Yields: str – One sample in self.spls, ordered by sample indices.

__len__() → int

Get dataset size.

Returns: Number of samples in the dataset.
Return type: int
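
The three methods above together make the dataset behave like a read-only sequence. The following is a minimal sketch of that protocol, assuming the samples live in a list attribute named `spls` as documented; `ToyDset` is a hypothetical stand-in and not the `lmp` implementation.

```python
from typing import Iterator, List


class ToyDset:
    """Hypothetical stand-in for a dataset class; not lmp's implementation."""

    def __init__(self, spls: List[str]) -> None:
        # All samples are held in memory as a list of strings.
        self.spls = spls

    def __getitem__(self, idx: int) -> str:
        # Index directly into the sample list.
        return self.spls[idx]

    def __iter__(self) -> Iterator[str]:
        # Yield samples in index order.
        yield from self.spls

    def __len__(self) -> int:
        # Dataset size is the number of samples.
        return len(self.spls)


dset = ToyDset([
    'If you add 0 to 1 you get 1 .',
    'If you add 0 to 2 you get 2 .',
])
print(len(dset))  # 2
for spl in dset:
    print(spl)
```

Iterating yields the same samples, in the same order, as indexing from `0` to `len(dset) - 1`.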

static download_file(mode: str, download_path: str, url: str) → None

Download file from url.

Parameters

mode (str) – Can only be 'binary' or 'text'.
download_path (str) – File path of the downloaded file.
url (str) – URL of the file to be downloaded.

Return type

None
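
As an illustration of how the `mode` parameter might affect the written file, here is a rough sketch built on the standard library's `urllib.request`. The function `download_file_sketch` is hypothetical; the HTTP client, the UTF-8 decoding in text mode, and the creation of parent directories are all assumptions, not confirmed behavior of `lmp`.

```python
import os
import urllib.request


# Hypothetical sketch, not lmp's implementation.
def download_file_sketch(mode: str, download_path: str, url: str) -> None:
    if mode not in ('binary', 'text'):
        raise ValueError("mode must be 'binary' or 'text'")

    # Assumption: create the parent directory if it does not exist yet.
    parent = os.path.dirname(download_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    # Fetch the whole response body into memory.
    with urllib.request.urlopen(url) as resp:
        data = resp.read()

    if mode == 'binary':
        # Write raw bytes unchanged.
        with open(download_path, 'wb') as f:
            f.write(data)
    else:
        # Assumption: text files are UTF-8 encoded.
        with open(download_path, 'w', encoding='utf-8') as f:
            f.write(data.decode('utf-8'))
```

Reading the whole body into memory keeps the sketch short; a large download would more likely be streamed in chunks.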

static norm(txt: str) → str

Text normalization.

Text is NFKC normalized. Consecutive whitespace characters are collapsed into a single space, and whitespace is stripped from both ends.

Parameters: txt (str) – Text to be normalized.
Returns: Normalized text.
Return type: str
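
The normalization described above can be approximated with the standard library. This is a minimal sketch, assuming NFKC normalization is applied before whitespace handling; `norm_sketch` is a hypothetical name and the exact order of operations in `lmp` is not confirmed here.

```python
import re
import unicodedata


# Hypothetical sketch, not lmp's implementation.
def norm_sketch(txt: str) -> str:
    # NFKC maps compatibility characters (e.g. full-width letters and digits)
    # to their canonical equivalents.
    txt = unicodedata.normalize('NFKC', txt)
    # Collapse runs of whitespace into one space, then strip both ends.
    return re.sub(r'\s+', ' ', txt).strip()


print(norm_sketch('  Ｉｆ you   add １ to ２\n'))  # If you add 1 to 2
```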