Basic Research

Basic research at ILIT falls largely in the realm of knowledge-based natural language processing. We continue to develop the theory and applications of ontological semantics. Within this general framework, we work on a variety of microtheories, including basic semantic dependency extraction, methods for ambiguity resolution (knowledge-based and corpus-based, syntactic and semantic, static and dynamic), non-literal language, recovery from unexpected input, discourse relations, reference resolution, aspect, modality, semantics of modifiers, lexical and compositional semantics of closed-class lexical items, and many others.

- Ontological Semantics

Ontological semantics studies the processes of extracting, representing and manipulating meaning in natural language texts.
Computational processing in ontological semantics relies on a language-independent ontology, an ontology-related lexicon (and onomasticon, or lexicon of proper names) for each language involved and a fact repository consisting of instances of ontological concepts as well as remembered text meaning representations.
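As a rough illustration of how these static knowledge resources relate to one another, the following minimal Python sketch models them as plain dictionaries. The class name, method names, concept names (HUMAN, GOVERNMENT-MINISTER) and the identifier HUMAN-1 are assumptions made for this example only; they are not taken from the actual ILIT resources, which are considerably richer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class KnowledgeResources:
    """Hypothetical container for the resources named in the text."""
    # Language-independent ontology: concept name -> parent concept (None for a root).
    ontology: Dict[str, Optional[str]] = field(default_factory=dict)
    # One lexicon per language: language -> {word: concept name}.
    lexicons: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # One onomasticon per language: language -> {proper name: fact id}.
    onomasticons: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # Fact repository: fact id -> properties of a remembered concept instance.
    fact_repository: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def concept_of(self, language: str, word: str) -> Optional[str]:
        """Look up the ontological concept a word maps to, if any."""
        return self.lexicons.get(language, {}).get(word)

    def fact_for_name(self, language: str, name: str) -> Optional[str]:
        """Look up the remembered instance a proper name refers to, if any."""
        return self.onomasticons.get(language, {}).get(name)


# Illustrative content only (assumed, not from the real ontology or lexicons).
kb = KnowledgeResources()
kb.ontology["HUMAN"] = None
kb.ontology["GOVERNMENT-MINISTER"] = "HUMAN"
kb.lexicons["en"] = {"minister": "GOVERNMENT-MINISTER"}
kb.lexicons["es"] = {"ministro": "GOVERNMENT-MINISTER"}
kb.fact_repository["HUMAN-1"] = {"instance_of": "HUMAN", "name": "Abdullah Abdullah"}
kb.onomasticons["en"] = {"Abdullah Abdullah": "HUMAN-1"}
```

The point of the sketch is that words of different languages map to the same language-independent concept, while proper names map to remembered instances rather than to concepts.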

The major R&D projects associated with this approach included KBMT-89, Pangloss, Mikrokosmos, CREST and MOQA.

A recent slide presentation can provide a medium-depth introduction to ontological semantics.

A book-length introduction to ontological semantics, Ontological Semantics, by Sergei Nirenburg and Victor Raskin, will be published in September 2004 by MIT Press.

- Treatment of Reference

It is said that during Napoleon’s march from Elba to Paris at the beginning of his “One Hundred Days” in 1815, the successive headlines in Paris newspapers ran something like:

The Corsican Monster Lands at Toulon
The Usurper Marches North
Bonaparte Reaches Lyons
Ex-Emperor in Fontainebleau
Paris Welcomes His Imperial Majesty

It might be difficult to attempt to build a computer system that would emulate the nimbleness, inventiveness and political adroitness of the Parisian editors. However, it is within our reach to build a system that would determine that the text elements in boldface all refer to the same person, and that this person is Emperor Napoleon I Bonaparte.

We understand processing reference in NLP as finding all referring expressions in a text or a corpus and associating them with the representation of real-world entities or events. This definition implies that coreference—that is, the search for surface antecedents—is only a means to a more fundamental end. We propose a novel approach to the treatment of reference in NLP that extends the current state of the art in both breadth and depth. It covers a broader array of phenomena and uses deeper – but available or attainable – sources of knowledge to power its heuristic algorithms than any extant approach. The proposed algorithms for treatment of reference will be incorporated into an existing natural language analysis system (Mahesh et al. 1997, Beale et al. 2002) that includes semantic analysis and produces meaning representations of input texts with the help of a formal ontology.

Our procedure for reference treatment addresses all the types of referring expressions and consists of two components, detection and resolution. Detection consists of the following three tasks: a) determining which objects and events have referential function (not all do, as in My son is a doctor); b) categorizing the referential ones, of which there are many subclasses (as shown in Figure 1); and c) detecting elliptical references. Resolution then finds conceptual references for the expressions, possibly using textual antecedents as clues.

Figure 1. Types of expressions.

We will illustrate the types of referring expressions with examples drawn from the following text, taken at random from the CNN website (it is a typical text and it demonstrates how important it is to be able to treat reference adequately). In the text itself, for purposes of illustration, we marked just two of the many coreference chains in the text (the one referring to the Afghan foreign minister, in bold; the one referring to the Afghan people, in italics).

WASHINGTON (CNN) -- Afghanistan's interim foreign minister expressed optimism Saturday that his nation can rebuild after more than two decades of conflict, provided that the international community remains committed to supplying support. “What we need is continued engagement from the United States, first of all, in the war against terror, which will help stability in Afghanistan and the whole region ... and also in the reconstruction efforts of our people,” Abdullah Abdullah told CNN. “It is a major challenge. We are aware of it.” “What is going on in the political process is a transition from war to peace. After 22 years of war, we have won the war, virtually, and we have to win the peace,” Abdullah said. “It is rebuilding the state from scratch in all aspects of it – political, economical, from the infrastructure point of view, cultural, social. It is an enormous task. But I'm sure the Afghans will do it with the support of the international community,” he said. Abdullah is in Washington to prepare for a visit by interim Afghanistan chairman Hamid Karzai, who is scheduled to meet with President Bush Monday, his first official meeting with Bush since assuming control after the fall of the Taliban regime. On Friday, Abdullah met with Secretary of State Colin Powell and National Security Advisor Condoleezza Rice. Powell, who visited Kabul, the Afghan capital, this month, vowed that the United States would stand by the Afghan people. Abdullah also gave the Council on Foreign Relations an outline of Afghanistan's reconstruction plan to rebuild the devastated country. He told the group that the interim administration is developing a constitution for Afghanistan and will make substantial efforts to include women and the nation's various ethnic groups in the government. Members of the commission that will organize the tribal council or Loya Jirga, whose task is to choose a transitional government at mid-year, were announced Friday. 
Women are included among the commission's members. “The opportunity is there,” Abdullah said Saturday. “We were optimistic even before September 11 when there were no opportunities and we were trying hard, struggling hard, to create that opportunity,” he said. “We, as Afghans, have to seize it, and have to seize it quickly, and our friends should support us. Together we can make it.”

Direct reference is referring to an object or an event by its basic name. For people, this will typically be their full name (Abdullah Abdullah, the Afghans), their full name expanded by a description (interim Afghanistan chairman Hamid Karzai, Secretary of State Colin Powell) or a canonical abbreviation (President Bush, Bush, Powell, Abdullah). For organizations and places, this will typically be their full name (the United States, the Council on Foreign Relations, Loya Jirga) or a known acronym (CNN, Washington [for Washington, DC]). For events, this will typically be their full name (the war against terror) or a known abbreviation (September 11). One can view these expressions as keys for the database records for their referents. These referring expressions can in some cases be ambiguous (e.g., if the database contains more than one Abdullah Abdullah).

All other referring expressions are indirect. They subdivide into descriptions and pointers. Descriptions denote their referents by mentioning some of their non-key properties. They can be definite (e.g., Afghanistan's interim foreign minister, the international community, the Afghan capital) or indefinite (a transition from war to peace, continued engagement from the United States). Unlike descriptions, pointers contain just enough information to allow hearers to reconstruct which referring expressions they point to. Pointers can be further subdivided into textual pointers (those that typically point to coreferents in the text itself, like he) and deictic pointers (pointing to objects in the “universe of discourse”, that is, to some expected properties of facts – e.g., time, like at mid-year; space, like here; identity of the speaker and hearer, like we; etc.). People are adept at resolving references in well-constructed texts. Our task is to build a computer program that emulates that capability.
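The classification described in the last two paragraphs can be summarized in a small Python sketch. It follows only the prose above, not Figure 1 itself, so the enumeration, its member names and the helper functions are assumptions made for illustration; the example expressions are the ones quoted in the text.

```python
from enum import Enum


class RefType(Enum):
    """Types of referring expressions as described in the prose (names assumed)."""
    DIRECT = "direct"
    DEFINITE_DESCRIPTION = "definite description"
    INDEFINITE_DESCRIPTION = "indefinite description"
    TEXTUAL_POINTER = "textual pointer"
    DEICTIC_POINTER = "deictic pointer"


DESCRIPTIONS = {RefType.DEFINITE_DESCRIPTION, RefType.INDEFINITE_DESCRIPTION}
POINTERS = {RefType.TEXTUAL_POINTER, RefType.DEICTIC_POINTER}


def is_indirect(ref_type: RefType) -> bool:
    """Every referring expression that is not direct is indirect."""
    return ref_type is not RefType.DIRECT


def is_description(ref_type: RefType) -> bool:
    return ref_type in DESCRIPTIONS


def is_pointer(ref_type: RefType) -> bool:
    return ref_type in POINTERS


# Expressions from the sample text, labeled as in the discussion above.
EXAMPLES = {
    "Abdullah Abdullah": RefType.DIRECT,
    "the Afghan capital": RefType.DEFINITE_DESCRIPTION,
    "a transition from war to peace": RefType.INDEFINITE_DESCRIPTION,
    "he": RefType.TEXTUAL_POINTER,
    "we": RefType.DEICTIC_POINTER,
}

for expression, ref_type in EXAMPLES.items():
    kind = "indirect" if is_indirect(ref_type) else "direct"
    print(f"{expression!r}: {ref_type.value} ({kind})")
```

A real detector would have to assign these labels automatically; here they are simply hand-assigned to show the shape of the taxonomy.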

Detecting ellipsis involves locating syntactic gaps as well as semantically incomplete structures. In English, syntactic gaps include such things as elided verbs in gapping structures (Mary likes politics but Bill Ø only sports), elided VPs (Mary wants to watch CNN but Bill doesn’t Ø), and elided head nouns after modifiers (Mary watched CNN for 40 minutes and Bill for only five Ø). Semantically incomplete structures are found in phrases like continued engagement from the United States, where the full interpretation of the term engagement requires the addition of modifiers like military and peace-keeping.

The output of the detection step in reference treatment is, then, a list of all referring expressions, marked by their type, that were either overtly present in the text or were introduced in it through the detection of ellipsis.

Once all referring expressions are detected and classified, they must be resolved. We use the term ‘resolve’ in a broader sense than is typical in the literature: for us, resolution means that all referring expressions must ultimately be associated with representations of objects or events, not merely placed in a coreference relation with another text element. In our approach, the representations are stored in a fact database and the ontology (see below). Direct referring expressions are resolved through a direct link to the relevant database entry (or, if none exists yet, to a newly created one), whereas indirect referring expressions require specialized processing by type. Definite and indefinite descriptions, e.g., Afghanistan's interim foreign minister, must be linked to the expression corresponding to a database key, e.g., Abdullah Abdullah. All pointers and ellipses must be expanded into their full referential form based upon the establishment of a coreference chain within the text (he -> Abdullah Abdullah) or extra-textual information (mid-year -> the middle [perhaps May, June, July, August] of the year 2002). Once expanded, they, too, must be either linked to a database entry or initiate a new one.
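The resolution step can be sketched in Python as follows. This is a toy illustration under strong assumptions, not the system's actual algorithm: the FactDatabase class, the resolve function, the "role" property and the string type labels are all hypothetical, and the hard part (finding the antecedent of a pointer or the property a description mentions) is assumed to have been done by an earlier step and passed in.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FactDatabase:
    """Hypothetical fact database: key (basic name) -> known properties."""
    records: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def link_or_create(self, key: str) -> str:
        """Link to the entry for this key, creating an empty one if none exists."""
        self.records.setdefault(key, {})
        return key

    def find_by_property(self, prop: str, value: str) -> Optional[str]:
        """Return the key of the first entry whose property has the given value."""
        for key, props in self.records.items():
            if props.get(prop) == value:
                return key
        return None


def resolve(expression: str, ref_type: str, db: FactDatabase,
            described: Optional[Tuple[str, str]] = None,
            antecedent: Optional[str] = None) -> str:
    """Associate a referring expression with a database key."""
    if ref_type == "direct":
        # Direct references act as database keys themselves.
        return db.link_or_create(expression)
    if ref_type == "description":
        # Descriptions mention a non-key property; look for a matching entry.
        key = db.find_by_property(*described) if described else None
        return key if key is not None else db.link_or_create(expression)
    if ref_type == "pointer":
        # Pointers are first expanded to their full form (supplied here),
        # then linked like any other expression.
        if antecedent is None:
            raise ValueError(f"no expansion available for pointer {expression!r}")
        return db.link_or_create(antecedent)
    raise ValueError(f"unknown reference type: {ref_type}")


db = FactDatabase()
db.records["Abdullah Abdullah"] = {"role": "Afghanistan's interim foreign minister"}

direct = resolve("Abdullah Abdullah", "direct", db)
described = resolve("Afghanistan's interim foreign minister", "description", db,
                    described=("role", "Afghanistan's interim foreign minister"))
pointer = resolve("he", "pointer", db, antecedent="Abdullah Abdullah")
created = resolve("Hamid Karzai", "direct", db)  # no entry yet, so one is created
```

In this toy run the direct name, the definite description and the pronoun all end up linked to the same database entry, while Hamid Karzai, not yet in the database, initiates a new one.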

ILIT University of Maryland Baltimore County ECS 202 1000 Hilltop Circle Baltimore, MD 21250
Phone: 410-455-8480 Fax: 410-455-8488 E-mail: ILIT@UMBC.EDU