Winograd NLI dataset
- class lmp.dset.WNLIDset(*, ver: Optional[str] = None)
Bases: BaseDset
Winograd NLI dataset.
Winograd NLI (WNLI) is a relaxation of the Winograd Schema Challenge [1], proposed as part of the GLUE [2] benchmark. This dataset extracts only the sentences from WNLI; the NLI labels are not used.
Here are the statistics of each supported version. Tokens are separated by whitespace.
Version  Number of samples  Maximum number of tokens  Minimum number of tokens
dev      142                63                        4
test     292                60                        4
train    1270               63                        3
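As a rough illustration of how statistics like these could be computed, the sketch below counts whitespace-separated tokens per sample. The function name token_stats and the sample sentences are hypothetical and are not taken from lmp or from the WNLI files.

```python
def token_stats(samples):
    """Return (number of samples, max tokens, min tokens) for a list of strings."""
    # str.split() with no argument splits on any run of whitespace.
    counts = [len(sample.split()) for sample in samples]
    return len(samples), max(counts), min(counts)


# Hypothetical samples for illustration only.
samples = ['Mark was timid .', 'The trophy is too large .', 'It rained']
print(token_stats(samples))  # (3, 6, 2)
```

Punctuation counts as its own token here only because it is already separated by a space in the sample text.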
- Parameters
ver (Optional[str], default: None) – Version of the dataset. Set to None to use the default version self.__class__.df_ver.
Examples
>>> from lmp.dset import WNLIDset
>>> dset = WNLIDset(ver='test')
>>> dset[0]
Mark was timid .
- __iter__() -> Iterator[str]
Iterate through each sample in the dataset.
- Yields
str – One sample in self.spls, ordered by sample indices.
- classmethod download_dataset() -> None
Download WNLI dataset.
Download the zip file from https://dl.fbaipublicfiles.com/glue/data/WNLI.zip and extract the raw files from it. Raw files are named wnli.ver.tsv, where ver is the version of the dataset. After the raw files are extracted, the downloaded zip file is deleted.
- Return type
None
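The download-and-extract steps described above might look roughly like the following sketch. It is not the lmp implementation: the function names, the out_dir parameter, and the assumed archive layout (WNLI/dev.tsv, WNLI/test.tsv, WNLI/train.tsv) are all assumptions made for illustration. The archive is held in memory, so there is no zip file on disk to delete afterwards.

```python
import io
import os
import urllib.request
import zipfile

URL = 'https://dl.fbaipublicfiles.com/glue/data/WNLI.zip'


def extract_raw_files(zip_bytes, out_dir, vers=('dev', 'test', 'train')):
    """Write wnli.<ver>.tsv files from zip archive bytes into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for ver in vers:
            # Assumption: each version is stored as WNLI/<ver>.tsv in the archive.
            data = zf.read(f'WNLI/{ver}.tsv')
            path = os.path.join(out_dir, f'wnli.{ver}.tsv')
            with open(path, 'wb') as raw_file:
                raw_file.write(data)
            written.append(path)
    return written


def download_wnli(out_dir):
    """Fetch the archive over HTTP and extract the raw files (hypothetical helper)."""
    with urllib.request.urlopen(URL) as resp:
        zip_bytes = resp.read()
    return extract_raw_files(zip_bytes, out_dir)
```

Splitting extraction from the network call keeps the extraction logic easy to check against a small, locally built archive.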
- static norm(txt: str) -> str
Text normalization.
Text is NFKC normalized. Whitespace is collapsed and stripped from both ends.
See also
unicodedata.normalize
Python built-in Unicode normalization.
Examples
>>> from lmp.dset import BaseDset
>>> BaseDset.norm('123456789')
'123456789'
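For readers who want to see the described normalization spelled out, here is a minimal sketch built on the standard library. The name norm_sketch is hypothetical, and the code only approximates the behavior described above (NFKC normalization, whitespace collapsing, stripping); the actual BaseDset.norm may differ in detail.

```python
import re
import unicodedata


def norm_sketch(txt: str) -> str:
    """NFKC-normalize txt, collapse whitespace runs, and strip both ends."""
    # NFKC maps compatibility characters (e.g. fullwidth forms) to canonical ones.
    txt = unicodedata.normalize('NFKC', txt)
    # Replace every run of whitespace with one space, then trim the ends.
    return re.sub(r'\s+', ' ', txt).strip()


# Fullwidth 'ABC' and '123' written as escapes, padded with mixed whitespace.
print(norm_sketch('  \uff21\uff22\uff23\t\uff11\uff12\uff13  \n'))  # ABC 123
```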
[1] Hector Levesque, Ernest Davis, and Leora Morgenstern. The Winograd schema challenge. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning. 2012. URL: https://cs.nyu.edu/~davise/papers/WinogradSchemas/WS.html.
[2] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. GLUE: a multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 353–355. Brussels, Belgium, November 2018. Association for Computational Linguistics. URL: https://aclanthology.org/W18-5446, doi:10.18653/v1/W18-5446.