For natural language processing and other applications, it has long seemed desirable to group words together according to their essential semantic type ([[Human]], [[Animate]], [[Artefact]], [[Physical Object]], [[Event]], etc.) and to arrange them into a hierarchy. Vast lexical and conceptual ontologies such as WordNet and BSO have been built on this foundation. Examples such as fire a [[Human]] (= dismiss from employment) vs. fire a [[Weapon]] (= cause to discharge a projectile) have led to the expectation that semantic types such as [[Weapon]] and [[Human]] can be used systematically for word sense disambiguation. Unfortunately, this expectation is often unwarranted. For example, one attends an [[Event]] (a meeting, a lecture, a funeral, a coronation, etc.), but there are many events (e.g. a thunderstorm, a suicide) that people do not attend, while some of the things that people do attend (e.g. a school, a church, a clinic) are not [[Event]]s, but rather [[Location]]s where specific events take place. The CPA (Corpus Pattern Analysis) project at Masaryk University, Brno, provides two steps for dealing with this kind of inconvenient linguistic phenomenon: 1) Non-canonical lexical items are coerced into "honorary" membership of a lexical set in particular contexts, e.g. school, church, clinic are coerced into membership of the [[Event]] set in the context of attend, but not, for example, in the context of arrange; 2) The ontology is not a rigid yes/no structure, but a statistically based structure of shimmering lexical sets, like this: [[Event]]: ... meeting <attend __ 663/5355>, where 633 is the total number of occurrences of meeting with attend, and 5355 the total number of occurrences of attend in our reference corpus (the British National Corpus). Thus, each canonical member of a lexical set is recorded with statistical contextual information.
The semantic ontology is therefore a shimmering hierarchy populated with words which come in and drop out according to context, and whose relative frequency in those contexts is measured. A shimmering ontology of this kind preserves, albeit in a weakened form, the predictive benefits of hierarchical conceptual organization, while maintaining the empirical validity of natural-language description.
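
The two steps described in the abstract (context-bound coercion and statistically annotated membership) can be pictured with a small data structure. The following is only a minimal sketch under stated assumptions, not the CPA project's actual implementation: the class `ShimmeringSet` and its methods `record`, `coerce`, `admits`, `relative_frequency` and `notation` are hypothetical names invented for illustration. The only corpus figures used are those of the bracketed notation for meeting with attend (663/5355, taken as printed there); no counts are supplied for school, church or clinic, because the abstract gives none.

```python
from dataclasses import dataclass, field


@dataclass
class ShimmeringSet:
    """Hypothetical model of a lexical set whose membership depends on context."""
    semantic_type: str
    # verb -> total occurrences of that verb in the reference corpus
    verb_totals: dict = field(default_factory=dict)
    # (word, verb) -> occurrences of a canonical member with that verb
    counts: dict = field(default_factory=dict)
    # (word, verb) pairs in which a non-canonical word is coerced into the set
    coerced: set = field(default_factory=set)

    def record(self, word, verb, hits, verb_total):
        """Store the corpus statistics for a canonical member in one context."""
        self.verb_totals[verb] = verb_total
        self.counts[(word, verb)] = hits

    def coerce(self, word, verb):
        """Grant 'honorary' membership, valid only in the given context."""
        self.coerced.add((word, verb))

    def admits(self, word, verb):
        """Is the word a member of the set in the context of this verb?"""
        return (word, verb) in self.counts or (word, verb) in self.coerced

    def relative_frequency(self, word, verb):
        """Share of the verb's occurrences taken by this word, if recorded."""
        hits = self.counts.get((word, verb))
        if hits is None:
            return None  # no statistics recorded, e.g. for coerced members here
        return hits / self.verb_totals[verb]

    def notation(self, word, verb):
        """Render an entry in the bracketed style used in the abstract."""
        return f"{word} <{verb} __ {self.counts[(word, verb)]}/{self.verb_totals[verb]}>"


event = ShimmeringSet("Event")
# Figures as printed in the abstract's notation for meeting with attend.
event.record("meeting", "attend", 663, 5355)
# Locations coerced into the [[Event]] set, but only in the context of attend.
for noun in ("school", "church", "clinic"):
    event.coerce(noun, "attend")

print(event.notation("meeting", "attend"))
print(event.admits("school", "attend"))
print(event.admits("school", "arrange"))
print(event.relative_frequency("meeting", "attend"))
```

Keying both the counts and the coercions on a (word, verb) pair is one way to express the idea that words come in and drop out of the set according to context: membership is never asked of a word in isolation, only of a word in the context of a particular verb.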

Shimmering lexical sets

JEZEK, ELISABETTA
2008-01-01

Year: 2008
ISBN-10: 8496742679
ISBN-13: 9788496742673


Use this identifier to cite or link to this document: https://hdl.handle.net/11571/139680