Evaluating Distributional Representations of Verb Semantic Selection
Jezek E.
2019-01-01
Abstract
The purpose of this paper is to evaluate whether distributional techniques applied to lexical sets, i.e. the sets of fillers of verb argument slots, constitute a useful heuristic for modeling verb semantic selection. To this end, we extract the word vectors corresponding to our lexical set vocabulary from the word2vec distributional semantic model and then perform k-means clustering on them. We focus on verbs undergoing the causative/inchoative alternation as a case study, as they offer an interesting challenge due to the theoretical assumption that the lexical sets of the transitive Object (O) and the intransitive Subject (S) overlap. We analyze the obtained clusters from a qualitative point of view, calculate a prototype vector for each cluster based on its centroid, and evaluate the clusters against human judgments on verb semantic selection acquired from a lexical resource. We present an in-depth linguistic analysis of the Italian verb suonare ‘to ring, to play’. The analysis demonstrates that automatically obtained clusters and human judgments based on manual clustering match closely, although the centroids appear not to be systematically the best indicators of the cluster semantics, and metonymic uses lead to incorrect automatic analyses.
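The pipeline summarized in the abstract (vectors for slot fillers, k-means clustering, a centroid-based prototype per cluster) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two-dimensional vectors are invented toy values rather than word2vec embeddings, the filler nouns are hypothetical examples rather than the paper's lexical sets, and the helper names `kmeans` and `prototype` are not taken from the paper.

```python
import math

# Toy 2-D "embeddings" invented for illustration; they are NOT word2vec
# vectors, and the filler nouns are hypothetical, not data from the paper.
VECTORS = {
    "campana": (0.9, 0.1),     # bell
    "telefono": (1.0, 0.2),    # telephone
    "sveglia": (0.8, 0.0),     # alarm clock
    "chitarra": (0.1, 0.9),    # guitar
    "pianoforte": (0.0, 1.0),  # piano
    "violino": (0.2, 0.8),     # violin
}


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(vectors, seeds, iterations=10):
    """Plain k-means with fixed seed words, so the result is deterministic."""
    centroids = [vectors[w] for w in seeds]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each word goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for word, vec in vectors.items():
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(vec, centroids[i]))
            clusters[nearest].append(word)
        # Update step: recompute each centroid (keep the old one if empty).
        centroids = [mean([vectors[w] for w in c]) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters, centroids


def prototype(cluster, centroid, vectors):
    """Word of the cluster whose vector lies closest to the centroid."""
    return min(cluster, key=lambda w: math.dist(vectors[w], centroid))


clusters, centroids = kmeans(VECTORS, seeds=("campana", "chitarra"))
prototypes = [prototype(c, m, VECTORS) for c, m in zip(clusters, centroids)]
print(clusters)
print(prototypes)
```

On this toy data the two clusters separate the "ring" fillers from the "play" fillers. With real embeddings, the word nearest the centroid need not be the most informative label for a cluster, which is consistent with the caveat about centroids stated in the abstract.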