OLD INEX 2008
QA@INEX Guidelines for Topic Development

Introduction

The aim of the QA@INEX task is to evaluate how current XML retrieval technology can handle focused information needs formulated as natural language questions. The QA task is defined as XML element retrieval, so any ad-hoc XML retrieval system can participate in the task.

Specifically, the QA@INEX task is formulated as follows:

Given a large encyclopedic XML document collection (English Wikipedia), extended with multiple layers of linguistic structure annotated in XML, and a natural language question, an automatic system returns a ranked list of XML elements (sections, paragraphs, sentences, clauses, phrases, names, etc.) that, according to the system, best answer the question.
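In practice, a system response to one question is a ranked list of XML elements. The following minimal Python sketch shows one way to represent and order such a run. It assumes each element is identified by an article name and an XPath-like path; the `ElementResponse` class, its fields and the example paths are hypothetical illustrations, not the official INEX run format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ElementResponse:
    """One retrieved XML element (hypothetical fields, not the INEX run format)."""
    article: str   # Wikipedia article name
    xpath: str     # path to the element: a section, paragraph, sentence, ...
    score: float   # system-assigned score; higher means a better answer


def rank_responses(responses: List[ElementResponse]) -> List[ElementResponse]:
    """Order responses best-first; ties are broken by article and path for determinism."""
    return sorted(responses, key=lambda r: (-r.score, r.article, r.xpath))


if __name__ == "__main__":
    run = rank_responses([
        ElementResponse("Hot dog", "/article[1]/section[1]/p[1]", 0.62),
        ElementResponse("Hot dog", "/article[1]/section[1]/p[2]", 0.91),
        ElementResponse("Sausage", "/article[1]/p[3]", 0.40),
    ])
    for rank, r in enumerate(run, start=1):
        print(rank, r.article, r.xpath, r.score)
```

Because elements of any granularity can appear in the same list, the sketch ranks on the score alone and does not distinguish a whole section from a single sentence.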

Topics

A test topic for QA@INEX consists simply of one English natural language question such that an answer can be found in the INEX collection (English Wikipedia), and moreover, a proper response to the question can be described in terms of its semantic type (e.g., "the answer should contain a list of instructions" or "the answer should be a person name"). In other words, looking at a question, a knowledgeable human should be able to describe what a perfect answer would look like, even without knowing the answer.

Some examples of questions suitable for the QA@INEX task:

Examples of questions not suitable for the QA@INEX task:

During the assessment phase, assessors will be asked to mark (highlight) the correct answer(s) in the pooled responses of the participating systems. As a rough guideline, an exact answer to a QA@INEX question should not be longer than a paragraph.
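Topic creators who want to check their own answers against the paragraph guideline could do so mechanically. The following is a minimal sketch, assuming the article is available as plain text with paragraphs separated by blank lines; the function name `within_one_paragraph` is hypothetical, and since the guideline is only a rough one, the check is advisory.

```python
def within_one_paragraph(article_text: str, answer: str) -> bool:
    """Return True if `answer` occurs verbatim inside a single paragraph.

    Paragraphs are assumed to be separated by blank lines ("\\n\\n").
    An answer that crosses a paragraph break, or is absent from the
    article, yields False.
    """
    paragraphs = [p.strip() for p in article_text.split("\n\n")]
    return any(answer in p for p in paragraphs)


if __name__ == "__main__":
    article = "First paragraph about sausages.\n\nHot dogs are made of pork and beef."
    print(within_one_paragraph(article, "pork and beef"))
    print(within_one_paragraph(article, "sausages.\n\nHot dogs"))
```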

Topic classification

To be able to focus on different types of questions during the evaluation, we ask topic creators to assign each question a type from a predefined list. For QA@INEX 2008 we use the following types:

Procedure for topic development

Questions for QA@INEX 2008 should be taken from or inspired by queries Wikipedia users actually submitted to Wikipedia's own search engine. We provide a simple search interface to access about 13,000,000 queries submitted to Wikipedia in 2004 (see section "Tools" below). The test questions should be grammatically correct.
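The query log is accessed through the search interface listed under "Tools". For readers who want to reproduce step 2 of the pipeline below offline, here is a minimal sketch. It assumes the log is available as plain text with one query per line, which is an assumption since the log format is not described here; `search_queries` is a hypothetical helper.

```python
import re
from typing import Iterable, Iterator


def search_queries(queries: Iterable[str], term: str) -> Iterator[str]:
    """Yield logged queries that contain `term` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for q in queries:
        q = q.strip()  # drop trailing newlines from lines read out of a file
        if q and pattern.search(q):
            yield q


if __name__ == "__main__":
    log = ["what hot dogs are made of\n", "hotdog stand", "Dogs breeds\n"]
    for q in search_queries(log, "dogs"):
        print(q)
```

Whole-word matching is a design choice: it keeps "hotdog stand" out of the results for "dogs". An open text file can be passed directly as `queries`, since iterating over it yields one line at a time.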

Example of the topic development pipeline:

  1. I choose an arbitrary topic: "dogs"
  2. I use the query log search engine to find queries containing "dogs"
  3. I browse the found queries and come across "what hot dogs are made of" which looks interesting
  4. I use the TopX search engine with the query "hot dog" to find possible answers to the question in the official INEX Wikipedia collection
  5. The first TopX result links to the article "Hot dog", where, in the section "General description", I find a text snippet that provides an answer:
    • There is no fixed specification for hot dog meat, with pork and beef being the most popular. Hot dogs are generally regarded as unhealthy insofar as most have high sodium and fat content. Contents can also be questionable, with cheaper types of hot dogs having been known to contain snouts, ears and organ meat blended. In recent years, manufacturers have turned to turkey and even vegetarian ingredients as well as lowering the salt content.
  6. I submit the question "What are hot dogs made of?" as a topic, together with the answer and the title of the article.
Note that answers to the created questions do not have to correspond to any XML elements. Topic creators should simply provide text snippets that answer the question. Although participating systems will return XML elements as responses, they will not be required to provide exact answers (see the section on evaluation in the QA@INEX task description).
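Because an answer is a free text snippet rather than an XML element, a submitted topic can be stored as plain text and located in the article as a character span. The following is a minimal sketch of such a record, mirroring the three items submitted in step 6 (question, answer, article title). The `TopicSubmission` class and `normalise` helper are hypothetical and do not describe the official submission format; whitespace is collapsed on both sides so that line breaks in the article text do not prevent a match.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


def normalise(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


@dataclass
class TopicSubmission:
    """Hypothetical record of one created topic (step 6 of the pipeline)."""
    question: str
    answer: str         # free text snippet; need not match any XML element
    article_title: str

    def locate_in(self, article_text: str) -> Optional[Tuple[int, int]]:
        """Return the (start, end) span of the answer in the whitespace-normalised
        article text, or None if the answer does not occur there."""
        text, answer = normalise(article_text), normalise(self.answer)
        start = text.find(answer)
        return None if start < 0 else (start, start + len(answer))


if __name__ == "__main__":
    topic = TopicSubmission(
        question="What are hot dogs made of?",
        answer="There is no fixed specification for hot dog meat, with pork "
               "and beef being the most popular.",
        article_title="Hot dog",
    )
    article = ("General description\nThere is no fixed specification for hot dog "
               "meat, with pork and\nbeef being the most popular.")
    print(topic.locate_in(article))
```

Note that the returned offsets refer to the normalised text, not to the raw article.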

We ask every participating team to create (at least) 15 test topics, i.e., questions with answers. Topic creation should take no more than 2 hours. Later, topic creators will be asked to assess the responses of all submitted runs for their topics. The assessment will be done by simply highlighting correct answers in the pooled articles, using a web-based interface. The assessment will take at most 2 working days per participant.
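As a quick check on the time budget above, spreading at most 2 hours over the minimum of 15 topics leaves about 8 minutes per topic for the whole pipeline (query log search, retrieval and writing up the answer):

```latex
\frac{2\,\text{h}}{15\ \text{topics}} = \frac{120\,\text{min}}{15\ \text{topics}} = 8\ \text{min per topic}
```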

Tools available for topic development

Questions and comments

Please send your questions and comments to Valentin Jijkoun.