lmp.dset._wnli
WNLI dataset.
- class lmp.dset._wnli.WNLIDset(*, ver: Optional[str] = None)
Bases: BaseDset
Winograd NLI dataset.
Winograd NLI is a relaxation of the Winograd Schema Challenge [1], proposed as part of the GLUE [2] benchmark. This dataset only extracts sentences from WNLI; the NLI labels are not used.
The statistics of each supported version are listed here. Tokens are separated by whitespace.
Version  Number of samples  Maximum number of tokens  Minimum number of tokens
dev      142                63                        4
test     292                60                        4
train    1270               63                        3
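The three statistics in the table can be reproduced for any list of sentences with a few lines of Python. This is a minimal sketch, not the library's own code: `token_stats` is a hypothetical helper name, and it assumes that "tokens" means the pieces produced by splitting on runs of whitespace, as stated above.

```python
from typing import Iterable, Tuple


def token_stats(samples: Iterable[str]) -> Tuple[int, int, int]:
    """Return (number of samples, max token count, min token count).

    Hypothetical helper: tokens are obtained by splitting on whitespace.
    """
    # str.split() with no argument splits on any run of whitespace.
    lengths = [len(sample.split()) for sample in samples]
    if not lengths:
        raise ValueError('samples must not be empty')
    return len(lengths), max(lengths), min(lengths)


if __name__ == '__main__':
    print(token_stats(['Mark was timid .', 'a b']))
```

Under this definition, a sentence such as `Mark was timid .` counts as four tokens because the final period is separated by a space.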
- Parameters
ver (Optional[str], default: None) – Version of the dataset. Set to None to use the default version self.__class__.df_ver.
Examples
>>> from lmp.dset import WNLIDset
>>> dset = WNLIDset(ver='test')
>>> dset[0]
Mark was timid .
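The fallback described for the ver parameter (None selects the class-level default df_ver) can be illustrated with a toy class. This is only a sketch of the pattern: `ToyDset`, its `vers` list, and the choice of `'train'` as the default are assumptions made for illustration and are not taken from the library.

```python
from typing import ClassVar, List, Optional


class ToyDset:
    """Hypothetical stand-in for a dataset class with a default version."""

    # Assumed values, for illustration only.
    df_ver: ClassVar[str] = 'train'
    vers: ClassVar[List[str]] = ['dev', 'test', 'train']

    def __init__(self, *, ver: Optional[str] = None):
        # None means "use the default version of this class".
        if ver is None:
            ver = self.__class__.df_ver
        if ver not in self.__class__.vers:
            raise ValueError(f'unsupported version: {ver!r}')
        self.ver = ver


if __name__ == '__main__':
    print(ToyDset().ver)
    print(ToyDset(ver='test').ver)
```

Reading the default through `self.__class__` lets a subclass override `df_ver` without redefining the constructor.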
- classmethod download_dataset() -> None
Download WNLI dataset.
Download the zip file from https://dl.fbaipublicfiles.com/glue/data/WNLI.zip and extract the raw files from it. Raw files are named wnli.ver.tsv, where ver is the version of the dataset. After the raw files are extracted, the downloaded zip file is deleted.
- Return type
None
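The download, extract, and delete sequence described above might look roughly like the following. This is a sketch using only the standard library, not the actual implementation: the function names are hypothetical, and it assumes that each `.tsv` member inside the archive is named after its version (for example `train.tsv`), which is not confirmed by this page.

```python
import os
import urllib.request
import zipfile
from typing import List

WNLI_URL = 'https://dl.fbaipublicfiles.com/glue/data/WNLI.zip'


def extract_raw_files(zip_path: str, out_dir: str) -> List[str]:
    """Extract each .tsv member as wnli.<ver>.tsv, then delete the zip file."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            base = os.path.basename(name)
            # Skip directories, hidden metadata files and non-TSV members.
            if not base.endswith('.tsv') or base.startswith('.'):
                continue
            # Assumption: the member's base name encodes the version.
            ver = os.path.splitext(base)[0]
            out_path = os.path.join(out_dir, f'wnli.{ver}.tsv')
            with zf.open(name) as src, open(out_path, 'wb') as dst:
                dst.write(src.read())
            written.append(out_path)
    # The archive is no longer needed once the raw files are written.
    os.remove(zip_path)
    return sorted(written)


def download_wnli(out_dir: str) -> List[str]:
    """Download the WNLI zip file into out_dir and extract its raw files."""
    os.makedirs(out_dir, exist_ok=True)
    zip_path = os.path.join(out_dir, 'WNLI.zip')
    urllib.request.urlretrieve(WNLI_URL, zip_path)
    return extract_raw_files(zip_path, out_dir)
```

Keeping the extraction step separate from the network call makes it possible to exercise the renaming and cleanup logic on a local archive.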
- static norm(txt: str) -> str
Text normalization.
Text is NFKC normalized. Consecutive whitespace is collapsed, and leading and trailing whitespace is stripped.
See also
unicodedata.normalize
Python built-in unicode normalization.
Examples
>>> from lmp.dset import BaseDset
>>> BaseDset.norm('123456789')
'123456789'
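The normalization described for norm can be approximated with the standard library alone. The sketch below is an assumption about how the two steps combine (NFKC first, then whitespace collapsing and stripping); `norm_sketch` is a hypothetical name and is not the library's function.

```python
import re
import unicodedata


def norm_sketch(txt: str) -> str:
    """NFKC-normalize txt, collapse whitespace runs, and strip both ends."""
    # NFKC maps compatibility characters (e.g. full-width digits) to
    # their canonical equivalents.
    txt = unicodedata.normalize('NFKC', txt)
    # Replace every run of whitespace with a single space, then trim.
    return re.sub(r'\s+', ' ', txt).strip()


if __name__ == '__main__':
    print(norm_sketch('  \uff11\uff12\uff13 \t abc\n'))
```

Full-width digits such as U+FF11 to U+FF13 become the ASCII digits 1 to 3 under NFKC, which is the kind of input where this step changes the text.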
[1] Hector Levesque, Ernest Davis, and Leora Morgenstern. The Winograd schema challenge. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning. 2012. URL: https://cs.nyu.edu/~davise/papers/WinogradSchemas/WS.html.
[2] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. GLUE: a multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 353–355. Brussels, Belgium, November 2018. Association for Computational Linguistics. URL: https://aclanthology.org/W18-5446, doi:10.18653/v1/W18-5446.