OLD INEX 2008 QA@INEX Guidelines for Topic Development
Introduction
The aim of the QA@INEX task is to evaluate how current XML retrieval technology
can handle focused information needs formulated as natural language questions.
The QA task is defined as XML element retrieval, so any ad-hoc XML retrieval
system can participate in the task.
Specifically, the QA@INEX task is formulated as follows:
Given a large encyclopedic XML document collection (English Wikipedia) extended
with multiple layers of linguistic structures annotated in XML, as a response
to a natural language question, an automatic system returns a ranked list of
XML elements (sections, paragraphs, sentences, clauses, phrases, names, etc.)
that provide the best answers to the question, according to the system.
Topics
A test topic for QA@INEX consists simply of one English natural language question such
that an answer can be found in the INEX collection (English Wikipedia), and moreover,
a proper response to the question can be described in terms of its semantic
type (e.g., "the answer should contain a list of instructions" or "the answer
should be a person name").
In other words, looking at a question, a knowledgeable human should be able to describe
what a perfect answer would look like, even without knowing the answer.
Some examples of questions suitable for the QA@INEX task:
- How do researchers define a separate species?
- An appropriate response to this question can be a list of criteria that
are used to define a group of organisms as a separate species.
- What was the weather like during the 1941 German invasion in Russia?
- A response would provide a description of the weather, e.g.,
temperature or atmospheric conditions.
- What is the probability density function of a uniform distribution?
- A response should contain a formula or a verbal description of the function.
- Where was a temporary capital of the United States?
- A response should contain a city/town name.
- When was Texas Instruments founded?
- A date is expected as a response.
- Why is there no future tense in the Finnish language?
- Explanations are expected as a response.
- What did Svante Arrhenius discover?
- Something "discoverable" is expected as a response: a phenomenon, a theory, a location etc.
(In fact, a correct response can be "greenhouse effect".)
Examples of questions not suitable for the QA@INEX task:
- What was Freud's theory?
- A response would be a general description of a theory, and it's difficult to describe it
more specifically. This topic would be more suitable for the Ad-Hoc task.
- What is the history of the toothbrush?
- A very general, "adhoc-ish" question, for which it's difficult to define a specific response type.
- What do Esperanto and English have in common?
- Again, it's hard to describe the type of the response.
During the assessment phase, assessors will be asked to mark (highlight) the correct answer(s) in the pooled responses
of the participating systems. As a rough guideline, an exact answer to a QA@INEX question should not
be longer than a paragraph.
Topic classification
To be able to focus on different types of questions during the evaluation, we ask topic creators
to use question types from a pre-defined list of types. We use the following types for QA@INEX 2008:
- numerical factoid: a question asking for a number or measurement (including dates). Examples:
- How many zeros are in the billion?
- What is the elevation of new york city?
- When was Bill Cosby born?
- entity factoid: a question asking for a name of an entity. Examples:
- what is the highest mountain in ireland?
- where is montserrat located?
- who was joseph smith's first wife?
- general factoid: a question asking for a simple "fact", excluding measurements or entity names. Examples:
- what is 3.141592653 called?
- what are the perks after presidency in the US?
- what is ceramic used for?
- entity list: a question asking for a list of entity names (i.e., names that are normally capitalized in English). Examples:
- what are the races in star wars?
- Who are the members of the band Broken Social Scene?
- Which countries did Hitler invade?
- Note that for a list question, one response of a system should provide a complete answer, so the topic creators should only include
list questions for which a complete answer is provided in some Wikipedia article (e.g., there may be an article
explicitly listing the countries invaded by Hitler).
- general list: a question asking for a list of simple "facts", excluding lists of entity names. Examples:
- what are the tenses of the English language?
- what stages does a tadpole undergo when changing into a frog?
- Find names of traditional dances in New Zealand.
- (Note that, grammatically, the latter is not even a question, though
the information need can also be formulated as a question).
- definition: a question asking for a short dictionary-style definition or description. Examples:
- what is absolutism?
- what is political asylum
- who is marlene dietrich
- Topic creators should avoid definition questions that can be answered by the first paragraph of the Wikipedia
article devoted to the focus of the question.
- why question: a question asking for an explanation. Examples:
- Why was Dresden bombed?
- What causes excessive flatulence?
- Why is Washington DC not a state?
Procedure for topic development
Questions for QA@INEX 2008 should be taken from or inspired by queries Wikipedia users actually
submitted to Wikipedia's own search engine. We provide a simple search interface to access about 13,000,000
queries submitted to Wikipedia in 2004 (see section "Tools" below). The test questions should be grammatically correct.
Example of the topic development pipeline:
- I choose an arbitrary topic: "dogs"
- I use the query log search engine to find queries containing "dogs"
- I browse the found queries and come across "what hot dogs are made of" which looks interesting
- I use the TopX search engine to find possible answers to the question in the official INEX Wikipedia collection, using the query "hot dog"
- The first result of TopX links to the article "Hot dog", where, in section "General description" I find a text snippet that provides an answer:
-
There is no fixed specification for hot dog meat, with pork and beef
being the most popular. Hot dogs are generally regarded as unhealthy
insofar as most have high sodium and fat content. Contents can also be
questionable, with cheaper types of hot dogs having been known to contain
snouts, ears and organ meat blended. In recent years, manufacturers have
turned to turkey and even vegetarian ingredients as well as lowering
the salt content.
- I submit the question "What are hot dogs made of?" as a topic, together with the answer and the title of the article.
Note that answers to the created questions do not have to correspond to any XML elements. Topic creators should simply provide
text snippets that answer the question. Although participating systems will return XML elements as responses, they will not
be required to provide exact answers (see section on evaluation in the QA@INEX
task description).
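As an illustration only, the information a topic creator submits (question, question type, answer snippet, article title) could be sketched as a small record. The class name `Topic`, its field names, and the validation rules are hypothetical and not part of any official submission format. The question type assigned to the hot dog example is our own guess, since the guidelines do not classify it.

```python
from dataclasses import dataclass

# The seven QA@INEX 2008 question types listed in these guidelines.
QUESTION_TYPES = (
    "numerical factoid",
    "entity factoid",
    "general factoid",
    "entity list",
    "general list",
    "definition",
    "why question",
)


@dataclass(frozen=True)
class Topic:
    """Hypothetical record for one submitted topic (not an official format)."""

    question: str
    question_type: str
    answer_snippet: str   # free text; need not match any XML element
    article_title: str

    def __post_init__(self):
        # The type must come from the pre-defined list.
        if self.question_type not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.question_type!r}")
        # Every text field must carry some content.
        for name in ("question", "answer_snippet", "article_title"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


# The hot dog example from the pipeline above; the type is an assumption.
hot_dog_topic = Topic(
    question="What are hot dogs made of?",
    question_type="general factoid",
    answer_snippet=(
        "There is no fixed specification for hot dog meat, "
        "with pork and beef being the most popular."
    ),
    article_title="Hot dog",
)
```

A record like this only checks that a type from the list was chosen and that no field is empty. Whether the snippet really answers the question remains a human judgment.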
We ask every participating team to create (at least) 15 test topics, i.e., questions with answers. Topic creation should
take no more than 2 hours. Later, topic creators will be asked to assess the responses of all submitted runs for their topic.
The assessment will be done by simply highlighting correct answers in the pooled articles, using a web-based interface.
The assessment will take at most 2 working days per participant.
Tools available for topic development
Questions and comments
Please send your questions and comments to Valentin Jijkoun.