1. Team
2. Project Tasks
3. Deliverables

MOQA (Meaning-Oriented Question Answering) is a project sponsored by ARDA under its AQUAINT program.

1. Team

This project is carried out by a team consisting of New Mexico State University’s Computing Research Laboratory (CRL) of Las Cruces, NM; the Institute for Language and Information Technologies (ILIT) at the University of Maryland Baltimore County; and CoGenTex, Inc. of Ithaca, NY. It is a comprehensive system-level effort covering all three main technical areas of the ARDA AQUAINT program: question understanding and interpretation, determining the answer, and presenting the answer.

The first complete version of the system concentrates on answering questions about travel and meetings and uses two kinds of data sources: open text (in English, Arabic and Persian) and a structured database, called the fact database (Fact DB), whose entries are instances of concepts in the system’s ontology, or world model. The Fact DB, the ontology and the lexicons for the three languages, together with the working memory that holds the intermediate results of the system’s operation, are the major static knowledge sources in the system under development. The top-level architecture of the system is illustrated in Figure 1:


Figure 1. Architecture of the System
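
The relationship between the ontology and the Fact DB described above (Fact DB entries are instances of ontology concepts) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the concept names (EVENT, TRAVEL-EVENT, MEETING), the property names and the validation rule are hypothetical and are not taken from the actual MOQA resources.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    """A node in a toy ontology (hypothetical, simplified shape)."""
    name: str
    parent: Optional[str]          # name of the parent concept, None for a root
    properties: Tuple[str, ...]    # property names introduced by this concept


@dataclass
class Fact:
    """A Fact DB entry: an instance of an ontology concept with property fillers."""
    fact_id: str
    concept: str
    fillers: Dict[str, str] = field(default_factory=dict)


class Ontology:
    def __init__(self, concepts):
        self._concepts = {c.name: c for c in concepts}

    def allowed_properties(self, name):
        """Collect properties defined on the concept and inherited from its ancestors."""
        props = set()
        current = self._concepts[name]
        while current is not None:
            props.update(current.properties)
            current = self._concepts.get(current.parent) if current.parent else None
        return props

    def validate(self, fact):
        """True if the fact instantiates a known concept and uses only allowed properties."""
        if fact.concept not in self._concepts:
            return False
        return set(fact.fillers) <= self.allowed_properties(fact.concept)


# Hypothetical concepts for the travel and meetings domain.
ontology = Ontology([
    Concept("EVENT", None, ("agent", "location", "time")),
    Concept("TRAVEL-EVENT", "EVENT", ("origin", "destination")),
    Concept("MEETING", "EVENT", ("participants",)),
])

# A hypothetical Fact DB entry; the fillers point to other (imaginary) instances.
trip = Fact("TRAVEL-EVENT-1", "TRAVEL-EVENT",
            {"agent": "PERSON-7", "destination": "CITY-12", "time": "2003-05-14"})
```

In this sketch, `ontology.validate(trip)` accepts the entry because `agent` and `time` are inherited from EVENT and `destination` is defined on TRAVEL-EVENT, while a MEETING fact with a `destination` filler would be rejected.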

The general strategy for the development of this system is rapid prototyping: developing a working system in a short period of time so that it can be tested and evaluated as a whole. We do not believe that separately evaluating the component routines and algorithms of a comprehensive application such as QA is a cost-effective pursuit. Therefore, once the basic system is assembled, we will work on adding new domains, new knowledge and new types of processing to it; our development approach facilitates rapid porting to other domains. We will evaluate the performance of the complete system continuously by comparing the time and effort it takes an analyst to perform a task with and without the system. The development of an unconstrained system of this kind will take much more time and many more resources than can be made available through the AQUAINT program, certainly more than its initial two years allow.
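
The whole-system evaluation described above, comparing analyst effort with and without the system, could be summarized along the following lines. This is only a sketch: the project text does not define a metric, so the relative-time-saving formula, the `TaskTrial` record and the sample timings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskTrial:
    """Timings for one analyst task (all values here are made-up examples)."""
    task_id: str
    minutes_without_system: float
    minutes_with_system: float


def relative_time_saving(trial):
    """Fraction of analyst time saved; negative if the system slowed the analyst down."""
    saved = trial.minutes_without_system - trial.minutes_with_system
    return saved / trial.minutes_without_system


def mean_saving(trials):
    """Average relative saving over a set of trials."""
    return sum(relative_time_saving(t) for t in trials) / len(trials)


# Illustrative numbers only, not measured results.
trials = [
    TaskTrial("find-meeting-attendees", 40.0, 25.0),   # saving 15/40 = 0.375
    TaskTrial("trace-itinerary", 60.0, 45.0),          # saving 15/60 = 0.25
]
print(f"mean relative time saving: {mean_saving(trials):.4f}")
```

A fuller comparison would also need a measure of effort and of answer quality, which this sketch leaves out.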

The CRL/ILIT/CoGenTex team has had a running start on this project: collectively, we have already developed a significant percentage of the resources, processors, formalisms and control architectures required for the MOQA system. This state of affairs is summarized in Figure 2:


Figure 2. Development status of the components of the proposed system.

2. Project Tasks

The work on the project involves the following tasks.

Task 1: Design and Implementation of System Architecture. This task involves integrating all the required system components available to our team from previous projects, developing a testing and debugging environment, and continuously integrating and testing new and expanded system modules.

Task 2: Knowledge Acquisition. This task involves:

• acquiring the goal component of the ontology (size estimate: 25 concepts);
• extending the plan/script (“complex event”) component of the ontology to include both domain scripts and workflow scripts, whose instantiations in the Fact DB and the extended text meaning representation (TMR) will, inter alia, encode dialog history and context, user profiles, the status of goal attainment, etc. (size estimate: 1,000 concepts);
• acquiring semantic lexicons for Arabic and the third language (Persian, Russian or Spanish; in the case of Spanish, CRL already has a semantic lexicon);
• expanding the semantic lexicon for English (the target size of each lexicon in Phase I is set at 20,000 lexical units);
• adapting and further developing a module (first developed in the TIDES CREST effort) for ontology-based automatic acquisition of Fact DB elements; and
• populating the Fact DB (size estimates for Phase I: 100,000 facts for the travel and meetings domain; 1,000 facts related to workflow, user profiles, user intentions and QA context).
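
The Phase I size estimates quoted in Task 2 can be tallied as follows. The figures come from the task description; the constant and function names are hypothetical, and the sketch assumes that the 25-concept and 1,000-concept estimates both count concepts to be acquired, which the text does not state explicitly.

```python
# Phase I size estimates as quoted in Task 2.
LEXICON_TARGET_PER_LANGUAGE = 20_000   # lexical units per semantic lexicon
NUM_LANGUAGES = 3                      # English, Arabic and the third language
GOAL_CONCEPTS = 25                     # goal component of the ontology
SCRIPT_CONCEPTS = 1_000                # plan/script ("complex event") component
DOMAIN_FACTS = 100_000                 # travel and meetings domain
CONTEXT_FACTS = 1_000                  # workflow, user profile, intention and QA context


def phase_one_totals():
    """Sum the per-item estimates into overall acquisition targets."""
    return {
        "lexical_units": LEXICON_TARGET_PER_LANGUAGE * NUM_LANGUAGES,
        "concepts_to_acquire": GOAL_CONCEPTS + SCRIPT_CONCEPTS,  # assumption, see lead-in
        "facts": DOMAIN_FACTS + CONTEXT_FACTS,
    }


for name, total in phase_one_totals().items():
    print(f"{name}: {total:,}")
```

Under these assumptions the targets come to 60,000 lexical units, 1,025 concepts and 101,000 facts.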

Task 3: Question Understanding. This task includes improving the coverage and quality of the preprocessing modules, especially the tokenizers and the syntactic analyzers, for each language involved (a usable version of each preprocessing module already exists at CRL for each of the languages mentioned in the proposal, and for many others); adjusting and enhancing the coverage and quality of the Mikrokosmos semantic analyzer, with special attention paid to co-reference and the treatment of unattested lexical items; and testing and evaluating semantic analysis throughput for texts in all three languages. Note that while the analyst/system dialog will be conducted in English, open text IE will be carried out in each of the system’s languages, which necessitates translating the results into TMR form.

Task 4: Question Interpretation. This task uses knowledge (stored in the Fact DB and/or in the extended TMR) about the current and past dialog context, about the user, about the user’s intentions (goals) and about the status of the tasks to present a complete view of the state of affairs in task completion and dialog communication. The decision about what action(s) the system must take at this juncture in the dialog and task completion is also made at this stage.
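
The kind of decision made at this stage can be pictured with a small rule-based sketch. It is not the project's decision procedure: the `DialogState` fields, the `Action` names and the ordering of the rules are all assumptions chosen to show how dialog context and task status might drive the choice of the next action.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Action(Enum):
    """Hypothetical actions the system could take at a point in the dialog."""
    ASK_CLARIFICATION = "ask_clarification"
    QUERY_FACT_DB = "query_fact_db"
    SEARCH_OPEN_TEXT = "search_open_text"
    REPORT_ANSWER = "report_answer"


@dataclass
class DialogState:
    """Simplified stand-in for the context kept in the Fact DB and extended TMR."""
    question_tmr: dict                                   # meaning of the current question
    unresolved_references: List[str] = field(default_factory=list)
    fact_db_tried: bool = False
    open_text_tried: bool = False
    answer: Optional[str] = None


def choose_action(state):
    """Pick the next action from the dialog state (rule order is an assumption)."""
    if state.unresolved_references:
        # The question cannot be interpreted yet, so ask the user.
        return Action.ASK_CLARIFICATION
    if state.answer is not None:
        return Action.REPORT_ANSWER
    if not state.fact_db_tried:
        return Action.QUERY_FACT_DB
    if not state.open_text_tried:
        return Action.SEARCH_OPEN_TEXT
    # Both sources were tried without success: go back to the user.
    return Action.ASK_CLARIFICATION


state = DialogState(question_tmr={"instance-of": "TRAVEL-EVENT"},
                    unresolved_references=["he"])
print(choose_action(state).value)
```

Trying the structured Fact DB before open text is a design choice of this sketch only; the project description does not prescribe an order.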

Task 5: Answer Determination. The decisions made at the previous step are carried out during answer determination. The actions may involve looking for information in either of two kinds of sources: open text (the TREC TDT corpus will be used for this) and the structured Fact DB. They will also, centrally, include maintaining a relevant dialog with the user: the system will provide a running commentary on its own actions, ask clarification questions, make judgments about task priorities and the order in which tasks are attempted, etc. Work on open text answer generation involves generating queries in any of the three languages from the extended TMR obtained through question understanding and interpretation; testing the available IR and IE systems integrated in Task 1; and translating the results of IE (template slot fillers) into the language of TMR.
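
The last step mentioned above, translating IE template slot fillers into the language of TMR, might look roughly like the sketch below. The slot names, the slot-to-property mapping and the dictionary layout of the frame are hypothetical; the actual TMR formalism is considerably richer.

```python
# Hypothetical mapping from IE template slot names to ontology property names.
SLOT_TO_PROPERTY = {
    "traveler": "agent",
    "to_city": "destination",
    "from_city": "origin",
    "date": "time",
}


def template_to_tmr(concept, slots, instance_id):
    """Convert a filled IE template into a TMR-like frame.

    Returns the frame plus any slots that have no known ontology property,
    so that they can be reviewed instead of being silently dropped.
    """
    frame = {"instance-of": concept, "id": instance_id}
    unmapped = {}
    for slot, filler in slots.items():
        prop = SLOT_TO_PROPERTY.get(slot)
        if prop is None:
            unmapped[slot] = filler
        else:
            frame[prop] = filler
    return frame, unmapped


# A made-up template as an IE system might return it for one news sentence.
frame, unmapped = template_to_tmr(
    "TRAVEL-EVENT",
    {"traveler": "PERSON-7", "to_city": "CITY-12", "airline": "ORG-3"},
    "TRAVEL-EVENT-42",
)
print(frame)
print(unmapped)
```

Because the frame uses ontology property names rather than language-specific slot labels, results extracted from English, Arabic or Persian text would end up in the same representation in this sketch.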

Task 6: Answer Formulation. This task is the main text generation task in the system. It involves generating a hypertext response to the user query as well as generating the running commentary of the system’s operations, decisions, and inferences. The output language will be English.

Task 7: Documentation; User, Tester and Evaluator Training; Testing; and System Evaluation. This set of tasks will be ongoing over the entire duration of the project. Evaluations of the complete system will be prepared and run at the end of months eight and sixteen and at the end of the project. The complexity of the system and the limited amount of resources that can be made available make the formal evaluation of individual components of the system cost-inefficient.

3. Deliverables

At the end of the project, the CRL/ILIT/CoGenTex team plans to deliver the following components:

• a comprehensive, self-aware, goal-and-plan-based, context-sensitive, ontological-semantic QA system in the domain of travel and meetings, with a capability to search for information in open texts in three languages and in a structured, ontology-based Fact DB;
• an enhanced text analysis system for each of the languages;
• a question interpretation module that takes into account user goals and the context of the dialog, as well as awareness of the quality of intermediate and final results and of the rate of progress toward a goal;
• an integrated IR/IE module working on open text in three languages, on the basis of ontologically defined extraction templates;
• a decision-making module that determines the answer(s) and action(s) that the system must produce at each step of the dialog/task processing;
• an ontological-semantic text generation module;
• an enhanced ontology of about 6,500 concepts;
• an enhanced Fact DB of about 100,000 facts;
• a system for automating the acquisition of the Fact DB;
• a semantic lexicon for each of the languages in the system, of about 20,000 entries each;
• a set of system evaluation results;
• a final technical report describing the system;
• a user manual for the system.

The CRL/ILIT/CoGenTex team expects that at least a subset of the above deliverables will be included in the integrated testbed demonstrations and evaluations.


ILIT University of Maryland Baltimore County ECS 202 1000 Hilltop Circle Baltimore, MD 21250
Phone: 410-455-8480 Fax: 410-455-8488 E-mail: ILIT@UMBC.EDU