Validating Human Judgements on Verb Semantic Selection
Jezek, E.
2019-01-01
Abstract
This paper addresses the problem of validating human judgments on verb semantic selection acquired through manual clustering of concordances from a corpus. In addition to the well-known method based on inter-annotator agreement, we propose a methodology in which the judgments are compared with automatically obtained clusters of word embeddings of argument fillers extracted from corpora. Our working assumption is that judgments and clusters overlap semantically, and we want to verify this hypothesis empirically. We extract the human judgments from the T-PAS resource (Jezek et al., 2014), which contains semantic preferences for subject, object, and prepositional complements for about 1200 Italian verbs, and the argument fillers from the ItWaC corpus (Baroni et al., 2009). We provide a proof of concept that the methodology based on automatically obtained clusters of word embeddings of argument fillers is effective in validating the judgments, with two caveats.