lmp.dset._demo
Demo dataset.
- class lmp.dset._demo.DemoDset(*, ver: Optional[str] = None)
Bases: BaseDset
Demo dataset.
This dataset consists of 2-digit addition texts. All texts have the following format:
If you add \(a\) to \(b\) you get \(a + b\) .
where \(a, b\) are integers within \(0\) to \(99\) (inclusive).
Here we describe the dataset in detail. Let \(N = \set{0, 1, \dots, 99}\) be the set of non-negative integers which are less than \(100\). Let \(a, b \in N\).
| Version | Design Philosophy | Constraint |
|---------|-------------------|------------|
| train | Training set. | \(a < b\) |
| valid | Check whether the model learns the commutative law of 2-digit integer addition. | \(a > b\) |
| test | Check whether the model learns to generalize 2-digit addition. | \(a = b\) |
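The three constraints partition all pairs \((a, b) \in N \times N\), which can be illustrated with a minimal sketch. This is not the library's implementation: the helper name `build_demo_splits` and the enumeration order are assumptions made for illustration only.

```python
from typing import Dict, List


def build_demo_splits() -> Dict[str, List[str]]:
    """Hypothetical helper: bucket every (a, b) pair by its constraint."""
    template = 'If you add {} to {} you get {} .'
    splits: Dict[str, List[str]] = {'train': [], 'valid': [], 'test': []}
    # Assumed order: a is the outer loop, b the inner loop.
    for a in range(100):
        for b in range(100):
            text = template.format(a, b, a + b)
            if a < b:
                splits['train'].append(text)
            elif a > b:
                splits['valid'].append(text)
            else:
                splits['test'].append(text)
    return splits


splits = build_demo_splits()
# Print the number of texts in each version.
print({ver: len(texts) for ver, texts in splits.items()})
```

Under these assumptions, train and valid each hold 4950 texts and test holds the 100 texts with \(a = b\), so every training pair \((a, b)\) has its swapped counterpart \((b, a)\) in the validation set.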
- Parameters
ver (Optional[str], default: None) – Version of the dataset. Set to None to use the default version self.__class__.df_ver.
See also
- lmp.dset
All available datasets.
Examples
>>> from lmp.dset import DemoDset
>>> dset = DemoDset(ver='train')
>>> dset[0]
'If you add 0 to 1 you get 1 .'
- static norm(txt: str) -> str
Text normalization.
Text is NFKC normalized. Consecutive whitespace is collapsed, and leading and trailing whitespace is stripped.
See also
unicodedata.normalize
Python built-in Unicode normalization.
Examples
>>> from lmp.dset import BaseDset
>>> BaseDset.norm('123456789')
'123456789'
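The normalization steps described for norm can be approximated with the standard library alone. The function below is a sketch under assumptions, not the method's actual source: the name `norm_sketch` is hypothetical, and it assumes that collapsing whitespace means replacing each whitespace run with a single space.

```python
import re
import unicodedata


def norm_sketch(txt: str) -> str:
    """Hypothetical stand-in for the documented normalization steps."""
    # NFKC folds compatibility characters, e.g. fullwidth digits to ASCII.
    txt = unicodedata.normalize('NFKC', txt)
    # Assumption: each whitespace run becomes one space, then ends are stripped.
    return re.sub(r'\s+', ' ', txt).strip()


print(norm_sketch('\uff11\uff12\uff13'))  # fullwidth digits -> '123'
print(norm_sketch('  a \t\n b  '))        # -> 'a b'
```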