INEX 2010 Book Track


The goal of the Book Track is to promote interdisciplinary research into techniques for supporting users in reading, searching, and navigating the full texts of digitized books, and to provide a forum for the exchange of research ideas and contributions. Focusing on topics of interest in the fields of information retrieval (IR), human-computer interaction (HCI), digital libraries (DL), and eBooks, the track in 2010 will explore the following four tasks:
  1. Prove It task: Find evidence in books to confirm or refute a factual statement. Systems need to return page XML elements from books that contain evidence regarding the statement. This task tests the application of focused retrieval approaches to a collection of over 50,000 digitized books, with the aim of returning relevant book pages to users.
  2. Best Books for Reference task: Find the best, most relevant books on the subject of a given factual statement. This task tests domain-specific full-text search methods on a collection of over 50,000 digitized books.
  3. Active Reading task: Conduct user studies into active reading, i.e., explore how and why readers use eBooks in specific scenarios, with a focus on eBook usability.
  4. Structure Extraction task: Build navigation tools for digitized books by constructing hyperlinked tables of contents from OCR text and layout information for a sample of 1,000 books.

Book corpus

The track builds on a collection of over 50,000 digitized, out-of-copyright books, provided by Microsoft Live Book Search and the Internet Archive (for non-commercial purposes only). The OCR content of the books is stored in an XML format, referred to as BookML. Most books also have an associated metadata file (*.mrc), which contains publication (author, title, etc.) and classification information in MAchine-Readable Cataloging (MARC) record format.
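
As an illustration of how the metadata files might be read, the following is a minimal sketch of an ISO 2709 (MARC binary exchange format) parser that uses only the Python standard library. It assumes that a *.mrc file holds a standard binary MARC record and that decoding the text as UTF-8 is acceptable; neither assumption is confirmed by the corpus description. The function names parse_marc_record and first_subfield are hypothetical helpers introduced for this example.

```python
SUBFIELD_MARK = b"\x1f"  # ISO 2709 subfield delimiter


def parse_marc_record(raw: bytes) -> dict:
    """Parse one ISO 2709 record into {tag: [field, ...]}.

    Control fields (tags below 010) become strings; data fields become
    lists of (subfield_code, value) pairs.
    """
    base = int(raw[12:17])        # leader bytes 12-16: base address of data
    directory = raw[24:base - 1]  # directory lies between leader and data
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])  # field length, including its terminator
        start = int(entry[7:12])  # offset relative to the base address
        data = raw[base + start:base + start + length - 1]
        if tag < "010":
            value = data.decode("utf-8", errors="replace")
        else:
            # Skip the two indicator bytes, then split on the delimiter.
            value = [
                (chunk[:1].decode("ascii", errors="replace"),
                 chunk[1:].decode("utf-8", errors="replace"))
                for chunk in data[2:].split(SUBFIELD_MARK)
                if chunk
            ]
        fields.setdefault(tag, []).append(value)
    return fields


def first_subfield(fields: dict, tag: str, code: str):
    """Return the first value of subfield `code` in data field `tag`, or None."""
    for field in fields.get(tag, []):
        for sub_code, value in field:
            if sub_code == code:
                return value
    return None
```

With a record read via open(path, "rb").read(), the title and main author would typically be found with first_subfield(fields, "245", "a") and first_subfield(fields, "100", "a"), since MARC 21 uses field 245 for the title statement and field 100 for a personal-name main entry. Records encoded in MARC-8 rather than UTF-8 would need a dedicated decoder.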

To access the corpus, you will first need to register to participate in the INEX 2010 Book Track. Once you have registered, email Gabriella Kazai. The corpus can then either be downloaded or requested on a USB 2.0 HDD (at a cost of about 70 GBP).


Participants have access to:


Prove It and Best Books tasks:

June: Book corpus ready and available for download
July 19: Topic creation guidelines distributed
July 31: Topic submission deadline
Aug 30: Topics and task descriptions distributed
Sep 30: Run submissions deadline
Oct 20: Relevance assessments deadline
Oct 30: Release of assessments and results
Nov 22: Papers due for the INEX 2010 proceedings
Dec 13-15: INEX Workshop in Amsterdam

Active Reading Task:

Sep 30: Submission deadline for user study results
Oct 20: Distribution of collected data
Nov 22: Papers due for the INEX 2010 workshop
Dec 13-15: INEX Workshop in Amsterdam

Structure Extraction Task:

July 31: Data available for download
Sep 30: Submissions due
Oct 20: Ground-truth annotation due
Oct 30: Result announcement and distribution of ground truth
Nov 22: Papers due for the INEX 2010 workshop
Dec 13-15: INEX Workshop in Amsterdam


Best Books and Prove It search tasks

Gabriella Kazai
Microsoft Research Cambridge

Marijn Koolen
University of Amsterdam

Structure Extraction task

Antoine Doucet
University of Caen

Active Reading task

Monica Landoni
University of Lugano