OLD INEX 2008 Book Track
Task and Submission Guidelines
Gabriella Kazai, Monica Landoni, Antoine Doucet

Introduction

The goal of the Book track is to evaluate techniques for providing a range of services on collections of digitized books (e.g., browsing, searching, reading, and annotating), and to study user interface issues and the behaviour of the users of such services.

To work towards this goal, the track investigates the following tasks: Book Retrieval, Page in Context, Structure Extraction, and Active Reading.

Participants may take part in any or all of the tasks. The minimum participation requirement is to contribute results or study participants to at least one of the tasks and to provide relevance judgements for at least one search topic.

Book Retrieval task

Goals

The goal of this task is to compare book-specific IR techniques with standard IR methods for the retrieval of books, where (whole) books are retrieved. The objective is to measure the cost (e.g., increased processing time, storage) and the benefit (e.g., improved retrieval performance) of applying domain-specific indexing and retrieval approaches over standard IR methods.

User scenario

The scenario underlying this task is that of a user searching for books on a given topic with the intent to build a reading or reference list. The list may be for entertainment, for research purposes, or in preparation for lecture materials, etc. Books on a reading list may be purchased or borrowed from libraries.

Task description

The task is to return a ranked list of books estimated relevant to the user's information need. Participants may use generic approaches, i.e., standard IR or structured document retrieval (SDR) methods that are not specific to the book domain, as well as book-specific IR/SDR approaches that make use of domain-specific information or technologies (e.g., library catalogue information, back-of-book indexes, specialised ranking strategies or tuning methods).

The task builds on the corpus of over 50,000 digitized books. The test topics for this task will be created (and later on judged) by the participants. Relevance judgements will be collected at the book and page levels, where the latter will be used for the evaluation of the Page in Context task. Book-level precision/recall and NDCG metrics will be used to evaluate retrieval effectiveness. Information on aspects of efficiency will be collected from participants through an online questionnaire.
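The guidelines name NDCG but do not fix a particular variant. As a rough sketch only, the following assumes the common formulation with a log2 rank discount and one graded gain value per book; the function names and the gain scale are assumptions, and the official evaluation may differ.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (assumed variant)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(ranked_gains, all_gains, k=None):
    """Normalised DCG for one topic.

    ranked_gains: gain of each returned book, in rank order (0 = not relevant).
    all_gains:    gains of all books judged relevant for the topic.
    k:            optional rank cutoff; defaults to the length of the ranking.
    """
    k = len(ranked_gains) if k is None else k
    ideal = dcg(sorted(all_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0
```

A ranking that places all relevant books at the top scores 1.0 under this sketch; relevant books placed further down the list lower the score.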

Participants of this task are invited to submit either single runs or pairs of runs. Participants may submit up to 10 runs in total. This may be 10 single runs, 5 pairs of runs, or any other combination of individual or paired runs. A single run may either be the result of generic (non-specific) or book-specific IR methods. When pairs of runs are submitted, one run should be the result of applying non-specific IR techniques; and the other run should be generated using the same techniques (where possible), but with the use of additional book-specific features (e.g., back-of-book index, citation statistics, book reviews, etc.) or specifically tuned methods.

A minimum of one automatic run (i.e., using only the topic title part of a test topic for searching and without any human intervention) is compulsory.

Each run should contain, for each test topic, the top 1000 books estimated relevant to the given topic, ranked in order of estimated relevance.

Assumptions

Retrieval results are whole books, presented in a ranked list to the user. Users view the ranked list moving from the top of the list down, examining each rank. Relevant books are those that the user would add to their reading list for the given topic.

Research questions

Effectiveness. Given that a book represents the desired unit of retrieval, should book search be treated simply as an application area for traditional document retrieval methods? Is there any benefit in developing and applying book-specific algorithms and using book-specific features? What properties associated with books can lead to improved retrieval effectiveness? How does the use of structural information affect retrieval performance of books? For example, does information extracted from the table of contents, back-of-book index, or associated MARC records improve the quality of the ranking? Is there a need for specialized length normalization techniques in book search?

Efficiency. How should indexing and retrieval methods be optimized for large collections of books, each hundreds of pages long? How is efficiency affected by incorporating book-specific features or ranking strategies?

Submission format

The DTD describing the submission format for the Book Retrieval task is as follows:

<!ELEMENT bs-submission (topic-fields, description, topic+)>
<!ATTLIST bs-submission
participant-id 	CDATA 	#REQUIRED
run-id 	CDATA 	#REQUIRED
paired-run-id	CDATA	#REQUIRED
task 	(book-retrieval) #REQUIRED
query 	(automatic | manual) #REQUIRED
result-type	(book)	 #REQUIRED
retrieval-type	(non-specific | book-specific) #REQUIRED
>
<!ELEMENT topic-fields EMPTY>
<!ATTLIST topic-fields
title 	(yes|no) #REQUIRED
description 	(yes|no) #REQUIRED
narrative 	(yes|no) #REQUIRED
>
<!ELEMENT description (#PCDATA)>
<!ELEMENT topic (book+)>
<!ATTLIST topic topic-id CDATA #REQUIRED >
<!ELEMENT book (bookid, rank?, rsv?)>
<!ELEMENT bookid 	(#PCDATA)>
<!ELEMENT rank	(#PCDATA)>
<!ELEMENT rsv 	(#PCDATA)>
Each submission run must contain the following:

•@participant-id: The Participant ID number of the submitting institute.
•@run-id: A run ID (which must be unique across all submissions sent from one organization - also please use meaningful, but short names if possible).
•@paired-run-id: The run-id identifying the run that the current submission is paired with (i.e., if the current run is the book-specific ranking then the paired run-id should be the id of the generic ranking - these two runs can then be compared to each other). If a single run is submitted, please use "NA".
•@task: Identification of the task, which should just be "book-retrieval".
•@query: Specification whether the search query was constructed automatically ("automatic") or manually ("manual") from the topic.
•@result-type: Specification of the result-type, which should just be set to "book".
•@retrieval-type: Specifies whether the run is a result of generic, "non-specific" IR methods, or "book-specific" IR techniques that make use of book-specific features or algorithms.
•topic-fields: Specification of which topic fields were used for constructing the search query (i.e., title and/or description and/or narrative).
•description: A description of the retrieval approach applied to generate the run. Please add as much detail as you can, as this would help with the comparison and analysis of the results later on.

Furthermore, a run should contain the search results for each topic, conforming to the following criteria:

•topic: Contains the ranked list of books estimated relevant to the given topic, ordered by decreasing value of relevance. Only a maximum of 1000 books should be returned for each topic.
•@topic topic-id: Identifies the topic.
•book: Contains information for each book result in the ranking.
•bookid: Each book should be identified using its bookID, which is the name of the directory that contains the XML source of the book (along with the MARC metadata file).
•rank/rsv: The rank position and RSV value can be recorded for each book in the ranking. Please note, however, that the evaluation will likely rely on the actual ordering of results alone (values of the rank and rsv fields may thus be ignored).
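The attribute and result rules above can be checked before submission. The following is a minimal sketch using only the Python standard library; it is not a full DTD validation, and the function name and problem messages are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Enumerated attribute values taken from the Book Retrieval DTD.
ALLOWED = {
    "task": {"book-retrieval"},
    "query": {"automatic", "manual"},
    "result-type": {"book"},
    "retrieval-type": {"non-specific", "book-specific"},
}
REQUIRED = ["participant-id", "run-id", "paired-run-id", *ALLOWED]

def check_book_retrieval_run(xml_text, max_books=1000):
    """Return a list of problems found in a Book Retrieval run (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "bs-submission":
        problems.append(f"unexpected root element <{root.tag}>")
    for name in REQUIRED:
        value = root.get(name)
        if value is None:
            problems.append(f"missing attribute {name}")
        elif name in ALLOWED and value not in ALLOWED[name]:
            problems.append(f"bad value {value!r} for attribute {name}")
    topics = root.findall("topic")
    if not topics:
        problems.append("no topic elements")
    for topic in topics:
        topic_id = topic.get("topic-id")
        if topic_id is None:
            problems.append("topic without topic-id")
        books = topic.findall("book")
        if not books:
            problems.append(f"topic {topic_id}: no book results")
        if len(books) > max_books:
            problems.append(f"topic {topic_id}: more than {max_books} books")
        for book in books:
            if not (book.findtext("bookid") or "").strip():
                problems.append(f"topic {topic_id}: book without bookid")
    return problems
```

An empty list means that none of the checked rules were violated; it does not guarantee that the run is valid against the DTD.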

An example submission is:
<bs-submission participant-id="25" run-id="BM25F-With-ToC-BackOfBookIndex-Streams" 
paired-run-id="BM25" task="book-retrieval" query="automatic" 
result-type="book" retrieval-type="book-specific">
 <topic-fields title="yes" description="no" narrative="no"/>
 <description>
  BM25F using 2 streams extracted from the table of contents and the 
  back-of-book index sections of books. The rest of the book content is 
  ignored. Parameters of BM25F were trained using RankNet.
 </description>
<topic topic-id="01">
  <book>
    <bookid>300A5334B2869F47</bookid>
    <rank>1</rank>
  </book>
  <book>
    <bookid>BAD598FB0A7D02E2</bookid>
    <rank>2</rank>
  </book>
  <book>...</book>
...
</topic>
<topic> ... </topic>
</bs-submission>
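A run in this format can be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree module and assumes a title-only run; the function name, its parameters and the sample description text are hypothetical and not part of the guidelines.

```python
import xml.etree.ElementTree as ET

def build_book_retrieval_run(participant_id, run_id, rankings,
                             paired_run_id="NA", query="automatic",
                             retrieval_type="non-specific",
                             description="", max_books=1000):
    """Build a bs-submission element for the Book Retrieval task.

    rankings maps a topic id to a list of book IDs, best first.
    """
    root = ET.Element("bs-submission", {
        "participant-id": str(participant_id),
        "run-id": run_id,
        "paired-run-id": paired_run_id,
        "task": "book-retrieval",
        "query": query,
        "result-type": "book",
        "retrieval-type": retrieval_type,
    })
    # Assumption of this sketch: only the topic title was used for searching.
    ET.SubElement(root, "topic-fields",
                  {"title": "yes", "description": "no", "narrative": "no"})
    ET.SubElement(root, "description").text = description
    for topic_id, book_ids in rankings.items():
        topic = ET.SubElement(root, "topic", {"topic-id": topic_id})
        # Keep at most max_books results per topic.
        for rank, book_id in enumerate(book_ids[:max_books], start=1):
            book = ET.SubElement(topic, "book")
            ET.SubElement(book, "bookid").text = book_id
            ET.SubElement(book, "rank").text = str(rank)
    return root

run = build_book_retrieval_run(
    25, "BM25", {"01": ["300A5334B2869F47", "BAD598FB0A7D02E2"]},
    description="Plain BM25 over the full book text.")
xml_text = ET.tostring(run, encoding="unicode")
```

Truncating each ranking to 1000 books inside the builder keeps the limit in one place, whatever the retrieval system returns.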

Page in Context task

Goals

The goal of this task is to investigate the application of focused retrieval approaches to a collection of digitized books. The task is similar to the INEX ad hoc track's Relevant in Context task, but using a significantly different collection and allowing for the ranking of book parts within a book.

User scenario

The scenario underlying this task is that of a user searching for information in a library of books on a given subject. The information sought may be 'hidden' in some books (i.e., it forms only a minor theme of the book) or it may be the main focus of some other books. The user expects to be pointed directly to the relevant book parts.

Task description

Following the focused retrieval paradigm, the task is to identify and rank (non-overlapping) book parts that contain relevant information and return these to the user, grouped by books. Books should be ranked by decreasing order of relevance, which may be based on best or average passage/element score or some other document score. The book parts within a book should be ranked in decreasing order of relevance. Both passage and element retrieval approaches may be used.

The task builds on the same corpus, test topics and relevance judgements (to be collected at the book and page levels) as the Book Retrieval task. Book and element/passage-level precision/recall and NDCG metrics will be used to evaluate retrieval effectiveness (subject to change). Information on aspects of efficiency will be collected from participants through an online questionnaire.

Participants may submit up to 10 runs. One automatic run (using only the topic title to search, without any human intervention) and one manual run (allowing human intervention) are compulsory. Additional manual runs, beyond the 10-run limit, are allowed and even encouraged in order to help the construction of a reliable test collection.

Each run can contain, for each topic, a maximum of 1000 books estimated relevant to the given topic, ordered by decreasing value of relevance. For each book, a ranked list of non-overlapping XML elements, passages or book page results estimated relevant should be listed in decreasing order of relevance. A minimum of one book part must be returned for each book in the ranking. A submission can only contain one type of result, i.e., only XML elements, only passages, or only pages; result types cannot be mixed.

Assumptions

Users expect to be returned a ranked list of books and for each book a ranking of (non-overlapping) relevant book parts. Users are assumed to view the ranked list of books, moving from the top of the list down, examining each rank. Inside a book, users follow the ranking of book parts and examine each rank. No browsing is considered (only the ranked book parts are visited by users).

The smallest unit of retrieval is a paragraph (which is equivalent to a section element in the OCRML files).

Relevant book parts are those that are about the user's topic of request.

Research questions

Effectiveness. Can structured document retrieval methods be successfully applied to books? Do they scale? Does structure help to improve effectiveness? Does element or passage retrieval work best for books? Should books be ranked by best passage or average score? Should systems rank book parts from the whole collection first and then group these by books, or should book parts from the same book be compared only locally and ranked only relative to each other?

Efficiency. How should indexing and retrieval methods be optimized for large collections of books, each consisting of thousands of XML elements or arbitrary passages?

Submission format

Submissions for the Page in Context task should conform to the following DTD:

<!ELEMENT bs-submission (topic-fields, description, topic+)>
<!ATTLIST bs-submission
participant-id 	CDATA 	#REQUIRED
run-id 	CDATA 	#REQUIRED
task 	(book-ad-hoc) #REQUIRED
query 	(automatic | manual) #REQUIRED
result-type	(element | passage | page) 	#REQUIRED
>
<!ELEMENT topic-fields EMPTY>
<!ATTLIST topic-fields
title 	(yes|no) #REQUIRED
description 	(yes|no) #REQUIRED
narrative 	(yes|no) #REQUIRED
>
<!ELEMENT description (#PCDATA)>
<!ELEMENT topic (book+)>
<!ATTLIST topic topic-id CDATA #REQUIRED >
<!ELEMENT book (bookid, rank?, rsv?, result+)>
<!ELEMENT result ((path|passage), rank?, rsv?)>
<!ELEMENT bookid	(#PCDATA)>
<!ELEMENT path 	(#PCDATA)>
<!ELEMENT passage EMPTY>
<!ATTLIST passage
           start 	CDATA #REQUIRED
           end 	CDATA #REQUIRED
>
<!ELEMENT rank	(#PCDATA)>
<!ELEMENT rsv 	(#PCDATA)>
Each submission must contain the following:
•@participant-id: The Participant ID number of the submitting institute.
•@run-id: A run ID (which must be unique across all submissions sent from one organization - also please use meaningful, but short names if possible).
•@task: Identification of the task, which should just be "book-ad-hoc".
•@query: Specification whether the search query was constructed automatically ("automatic") or manually ("manual") from the topic.
•@result-type: Specification of the result-type, which can be either "element", "passage" or "page". An element is an XML element of arbitrary granularity, given by its XPath (see Appendix A). A passage is an arbitrary sized passage, given by its start and end offsets. A page is an XML element of given granularity, given by its XPath. Result elements/passages must not overlap with any other retrieved element/passage.
•topic-fields: Specification of which topic fields were used for constructing the search query (i.e., title and/or description and/or narrative).
•description: A description of the retrieval approach applied to generate the run. Please add as much detail as you can, as this would help with the comparison and analysis of the results later on.

Furthermore, a run should contain the search results for each topic, conforming to the following criteria:

•topic: Contains the ranked list of books estimated relevant to the given topic, ordered by decreasing value of relevance. Only a maximum of 1000 books should be returned for each topic.
•@topic topic-id: Identifies the topic.
•book: Contains information for each book result in the ranking.
•bookid: Each book should be identified using its bookID, which is the name of the directory that contains the XML source of the book (along with the MARC metadata file).
•rank/rsv: The rank position and RSV value can be recorded for each book in the ranking. Please note, however, that the evaluation will likely rely on the actual ordering of results alone (values of the rank and rsv fields may thus be ignored).
•result: For each book, a ranked list of book part results estimated relevant to the topic should be returned.
•path/passage: Book part results may be non-overlapping XML elements, passages or book pages. XML elements and pages are identified by their XPaths, while passages are given by their start and end offsets. For information on XPath, please see Appendix A.
•@start/@end: Define the start and end of a passage within a given book, each given as an XPath with an optional character offset (see Appendix A).
•rank/rsv: For each result inside a book, its rank and/or RSV score can be recorded. Please note that the evaluation may rely on the rank order of the books and of the results inside books alone (values of the rank and rsv fields may be ignored).

An example submission may be as follows:
<bs-submission participant-id="25" 
 run-id="BM25F-Focused-PageLevelRetrieval-With-ToC-BackOfBookIndex-Streams"
 task="book-ad-hoc" query="automatic" 
 result-type="page">
 <topic-fields title="yes" description="no" narrative="no"/>
 <description>
  BM25F using 2 streams extracted from the table of contents and the 
  back-of-book index sections, indexing and retrieval only at page level, 
  no relevance propagation
 </description>
<topic topic-id="01">
  <book>
    <bookid>384D10DAEA4E34A8</bookid><rank>1</rank>
    <result><path>/document[1]/page[27]</path><rank>1</rank></result>
    <result><path>/document[1]/page[122]</path><rank>2</rank></result>
    <result><path>/document[1]/page[5]</path><rank>3</rank></result>
    ...
  </book>
  <book>
    <bookid>5AFEE130174076E3</bookid><rank>2</rank>
    <result><path>/document[1]/page[531]</path><rank>1</rank></result>
    <result><path>/document[1]/page[14]</path><rank>2</rank></result>
    ...
  </book>
  <book>...</book>
...
</topic>
<topic> ... </topic>
</bs-submission>

Structure Extraction task

Goals

The goal of this task is to test and compare automatic techniques for deriving structural information from digitized books in order to build a hyperlinked table of contents.

Motivation

Current digitization and OCR technologies produce the full text of digitized books with only minimal structure information. Pages and paragraphs are usually identified and marked up in the OCR, but more sophisticated structures, such as chapters, sections, etc., are currently not recognised.

Task description

The task is to build the table of contents for digitized books, using information from the OCR (in DjVu XML format), PDF or JPEG image files.

The track uses a sample collection of 100 digitized books of different genre and style. For each book, the OCR output (DjVu file), PDF file, and the JPEG images of each scanned page are made available.

The table of contents created by participants will be compared to a manually built ground truth (from the PDF of a book), and will be evaluated using recall/precision like measures at different structural levels (i.e., different depths in the table of contents). In addition, because the ground truth may not necessarily be optimal, the quality of the created tables of contents will also be evaluated independently: Participants will be asked to grade them on a multi-level quality scale.
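The exact matching criteria for the recall/precision-like measures are not specified here. Purely as an illustration, the sketch below assumes that a generated ToC entry counts as correct when its normalised title and its page counter both match a ground-truth entry; this matching rule and the function name are assumptions, not the official evaluation.

```python
def toc_precision_recall(predicted, truth):
    """Precision and recall of predicted ToC entries against a ground truth.

    Both arguments are iterables of (title, page) pairs. Titles are compared
    case-insensitively with whitespace collapsed (an assumed matching rule).
    """
    def normalise(title, page):
        return (" ".join(title.lower().split()), int(page))

    pred = {normalise(t, p) for t, p in predicted}
    gold = {normalise(t, p) for t, p in truth}
    hits = len(pred & gold)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

To score a single structural level, the same function could be applied to only the entries at that depth of the table of contents.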

Participants may submit up to 10 runs, each containing the table of contents for all 100 books in the test set.

Assumptions

The generated table of contents may be used by an e-book reader system and presented to users as a hyperlinked hierarchy. Users expect to see the section titles as entries and should be able to click on an entry and jump to the start of the selected section in the book.

Application in other tasks

Participants are welcome to apply their table of contents extraction techniques to the main corpus of 50,000 books and submit runs to the Book Retrieval task that exploit this additional information. Note, however, that in the case of the main corpus only the OCR text, in OCRML format, will be provided as input (no PDF or JPEG is available). Also note that the OCRML markup is rather different from the DjVu format: the basic structural elements are named differently and additional structure is also marked up in OCRML.

Participants may also enhance runs they plan to submit to the Page in Context task, as long as the resulting XPaths conform to the structure of the OCRML files. The generated table of contents may also be used and evaluated through user studies in the Active Reading task.

Research questions

Can tables of contents be extracted from the table of contents pages of the book (where available) or should they be generated more reliably from the full content of the book? Can tables of contents be extracted only from textual information or is page layout information necessary? What techniques provide reliable page number recognition and extraction?

Submission format

Submissions for the Structure Extraction task should conform to the following DTD:

<!ELEMENT bs-submission (source-files, description, book+)>
<!ATTLIST bs-submission
participant-id 	CDATA 	#REQUIRED
run-id 	CDATA 	#REQUIRED
task 	(book-toc) #REQUIRED
toc-creation 	(automatic | semi-automatic) #REQUIRED
toc-source	(book-toc | no-book-toc | full-content | other) #REQUIRED
>
<!ELEMENT source-files EMPTY>
<!ATTLIST source-files
xml 	(yes|no) #REQUIRED
pdf 	(yes|no) #REQUIRED
jpg 	(yes|no) #REQUIRED
>
<!ELEMENT description (#PCDATA)>
<!ELEMENT book (bookid, toc-entry+)>
<!ELEMENT bookid	(#PCDATA)>
<!ELEMENT toc-entry (toc-entry*)>
<!ATTLIST toc-entry
           title 	CDATA #REQUIRED
           page 	CDATA #REQUIRED
>
Each submission must contain the following:
•@participant-id: The Participant ID number of the submitting institute.
•@run-id: A run ID (which must be unique across all submissions sent from one organization - also please use meaningful, but short names if possible).
•@task: Identification of the task, which should just be "book-toc".
•@toc-creation: Specification whether the ToC was constructed fully automatically ("automatic") or with some manual aid ("semi-automatic").
•@toc-source: Specification of whether the ToC was built based only on the table of contents part of the book ("book-toc"), any other part of the book excluding the ToC pages ("no-book-toc"), or based on the full content of the book ("full-content"). If none of these applies, please use "other" and give details in the description.
•source-files: Specification of the source files used as input, i.e., the XML file (@xml="yes"), the PDF file (@pdf="yes"), and/or the JPEG files (@jpg="yes").
•description: A description of the approach used to generate the ToC. Please add as much detail as you can, as this would help with the comparison and analysis of the results later on.

Furthermore, a run should contain the table of contents for each book, conforming to the following criteria:

•book: Contains the ToC information for each book.
•bookid: Each book should be identified using its bookID, which is the name of the directory that contains the XML source of the book (along with the MARC metadata file).
•toc-entry: Contains details of each entry of the table of contents for a given book. Entries may be nested, e.g., sections in a chapter should be nested within the ToC entry of the chapter.
•@title: The title of the ToC entry (e.g., chapter title).
•@page: The page counter that corresponds to the start of the section represented by the ToC entry. The page counter starts with 1 on the first page of the book (i.e., cover page). Note that this is different from the page number that may be printed in the book itself (which may only start on the first content page and may include different formats, e.g., v, xii, 2-18, etc.).

An example submission may be as follows:
<bs-submission participant-id="25"
 run-id="ToCExtractedDirectlyFromBookToC" 
 task="book-toc" 
 toc-creation="automatic" 
 toc-source="book-toc">
<source-files xml="yes" pdf="no" jpg="no"/>
<description>
 Extraction applied directly to recognised ToC pages of the book. 
 The page numbers are then converted to page counters using a pre-built 
 page lookup table. The ToC levels are estimated based on the layout 
 indentation of a ToC entry.
</description>
<book>
  <bookid>384D10DAEA4E34A8</bookid>
  <toc-entry title="Introduction" page="7">
    <toc-entry title="What is covered?" page="8"></toc-entry>
    <toc-entry title="Recommended reading order" page="11"></toc-entry>
  </toc-entry>
  ...
</book>
<book>
  <toc-entry title="Preface" page="6"></toc-entry>
    ...
</book>
...
</bs-submission>
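Nested toc-entry elements can be built from a flat list of detected headings. The sketch below assumes that each heading comes with an estimated level (1 = top level), a title and a page counter, and uses Python's standard xml.etree.ElementTree; the function name and the input layout are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_toc(book_id, entries):
    """Build a <book> element with nested <toc-entry> children.

    entries: list of (level, title, page) tuples in reading order.
    """
    book = ET.Element("book")
    ET.SubElement(book, "bookid").text = book_id
    stack = [book]  # stack[d] is the most recent element at depth d
    for level, title, page in entries:
        # Clamp the level so an entry is nested at most one level below
        # the entry that precedes it.
        level = max(1, min(level, len(stack)))
        del stack[level:]
        entry = ET.SubElement(stack[-1], "toc-entry",
                              {"title": title, "page": str(page)})
        stack.append(entry)
    return book

book = build_toc("384D10DAEA4E34A8", [
    (1, "Introduction", 7),
    (2, "What is covered?", 8),
    (2, "Recommended reading order", 11),
])
```

The stack keeps the current chain of open entries, so a heading at level n is always attached to the most recent heading at level n-1.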

Active reading task

Goals

The main aim of the active reading task (ART) is to explore how hardware or software tools for reading e-books can provide support to users engaged with a variety of reading related activities, such as fact finding, memory tasks or learning. The goal of the investigation is to derive user requirements and consequently design recommendations for more usable tools to support active reading practices for e-books.

Motivation

Software and hardware e-readers have moved on quite quickly, with new models recently coming on the market and getting a lot of attention (e.g., Amazon's Kindle and iRex's iLiad Reader). Researchers from a number of related communities are actively involved in the study and design of e-reader tools. Progress in this area, however, suffers from the lack of common practices when it comes to conducting usability studies. Current user studies focus on specific content and user groups and follow a variety of different procedures that make comparison, reflection and better understanding of related problems difficult. ART offers an ideal arena for researchers involved in such efforts, with the crucial opportunity to access a large selection of titles, representing different genres and appealing to a variety of potential users, as well as to benefit from established methodology and guidelines for organising effective evaluation experiments.

Task description

ART is based on the extensive evaluation experience of EBONI, and adopts its evaluation framework with the aim of guiding participants in organising and running user studies whose results can then be compared.

The task is to run one or more user studies in order to test the usability of novel e-readers by following the provided EBONI based procedure and focusing on INEX content. Participants should then gather and analyse results according to the EBONI approach and submit these for overall comparison and evaluation.

The evaluation will be task-oriented. Participants will be able to tailor their own evaluation experiments, inside the EBONI framework, according to the resources available to them. In order to gather user feedback, participants may choose from a variety of methods, from low-effort online questionnaires to more time-consuming one-to-one interviews and think-aloud sessions.

Requirements for participation

Participants should have access to one or more software/hardware e-readers (already on the market or in prototype version) and be able to feed these with a subset of the book corpus. Titles will be provided according to participants' needs and objectives. For instance, a research group interested in validating a novel interface for e-books in education, involving history students, will be provided with content in this subject and asked to provide representative tasks. The main requirement is that each research group should have access to a representative sample of users pertinent to their research aims and objectives and would be willing to create and submit related tasks. Tasks may be of varying cognitive levels, from fact finding or memory tasks to learning or understanding tasks.

Participants will be asked to involve a minimum sample of 15 to 20 users, who will be asked to complete three tasks of increasing complexity and fill in a customised version of the EBONI subjective questionnaire. This usually takes no longer than half an hour in total and allows meaningful and comparable evidence to be gathered. Additional user tasks and different methods for gathering feedback (e.g., video capture) may be added optionally.

Assumptions

Usability is an essential component of the e-reading experience. As there is no such thing as a universal e-reader, equally usable by any user interested in any content, each user-content pair should be supported by a specifically designed e-reader.

Research questions

How does genre affect usage requirements of e-readers? What kinds of user interface features are best suited for a given user task and genre of books? How can an overview of a whole book best be presented to users? How can users be supported in navigating the contents of a book effectively? What techniques provide better support for allowing users to quickly assess the relevance or usefulness of a book? What role does annotation play in understanding and learning, and which forms of annotation are suitable for different tasks? How can annotation be shared in order to help learning? Does annotation increase recall in memory tasks?

Setup phase

Participation in ART includes a setup phase, during which participants are expected to:
In order to help us support participants in designing, planning and running their user studies, participants should submit their bookshelf request and their proposed tasks by October 12, 2008. Earlier submissions are encouraged if feedback or collaboration is needed.

User study phase

During the study phase, participants are expected to:

Participants are encouraged to integrate questionnaires with interviews and think-aloud sessions when possible, and to adapt questionnaires to fit their own research objectives whilst keeping within the remit of the active reading task. We also welcome any direct collaboration with participants to help shape the tasks according to real/existing research needs. Our aim is to run a comparable but individualized set of studies, all contributing to elicit user and usability issues related to e-books and e-reading.

ART crib sheet

A task crib sheet is a rich description of a user task that forms the basis of a given user study based on a particular scenario in a given context. Thus, it should provide a detailed explanation of the context and motivation of the task, and all details that form the scenario of use. It should include a detailed explanation of how the task should be successfully performed, possible paths to solutions and expected outcomes. Information recorded in the task crib sheet must be clear and precise in order to unambiguously determine whether or not users have completed a task and expected results have been achieved. Precise recording of the task is also important for scientific repeatability. The task crib sheet, hence, has the following parts: objectives, task, motivation, context, background, completion, and success.

A couple of example tasks are given below.

Example 1:
Objectives: Learn about at least two different classic approaches to children's education.
Task: You are looking for material on children's education and pedagogy with references to different approaches such as Pestalozzi, Steiner or Montessori. You need material describing their approaches in order to find out commonalities and differences.
Motivation: You want to get a better idea of the options available as you are preparing a project proposal dealing with children and e-learning.
Context: The information you will be collecting and comparing has to be expressed in a scientific way and allow for comparison. You are looking for books that either present and/or review different approaches or monographs on specific pedagogues' schools. You should concentrate on the use of the table of contents as you need books focusing on the topic and providing in-depth discussion of it.
Background: You have a good background in e-learning for adults and need to explore the children’s environment.
Completion: The task will be considered completed if at least two different approaches to children's education have been found.
Success: The task can be considered successful if the material retrieved enables the discussion of different approaches with references to e-learning as part of the project proposal.

Example 2:
Objectives: Explore the concepts of emotion and affection. You want to get an understanding of the basic concepts looking at classic literature on the subject.
Task: You are looking for material describing theory about affection and emotions in general.
Motivation:
Context: The information you will be collecting has to be expressed in a scientific way and be used to write your introduction section. You are looking for books that define or review these concepts with scientific rigor. Poems or fiction will not be considered. You should use, when available, the back-of-book index to get directly to basic definitions and build a relevant terminology.
Background: You have a good background in HCI, but want to explore affective computing.
Completion: The task will be considered completed if you provide a basic terminology relevant to the topic.
Success: The task can be considered successful if the material retrieved enables the composition of a complete terminology relevant to affective computing.

Submission procedure

An online submission tool for runs of the Book Retrieval, Page in Context and Structure Extraction tasks, and an upload area for the Active Reading task will be provided closer to the respective submission deadlines. Please note that currently there are no plans to provide online validation of submission runs, so please make sure that your runs conform to the appropriate DTD and that all XPaths (see Appendix A) are valid.

Acknowledgments

Many thanks to Marijn Koolen for helpful comments on aspects of this guide document.

Appendix A: XPaths and Passages

XPath

XML element and book page paths should be given in XPath syntax. To be more precise, only fully specified paths are allowed, as described by the following grammar:
Path ::= '/' ElementNode Path | '/' ElementNode | '/' AttributeNode
ElementNode ::= ElementName Index
AttributeNode ::= '@' AttributeName
Index ::= '[' integer ']'
Example:
<path>/document[1]/page[4]/section[2]</path>
This path identifies the XML element, which can be found if we start at the document root, select the first "document" element, then within that, select the fourth "page" element, within which we select the second "section" element.

Please note that XPath counts element nodes starting with 1 and takes into account the element type. For example, if a "page" element has a title and two sections then both the title and the first section elements would be indexed with 1 (since they are different element types). Their XPaths would be given as:
/document[1]/page[1]/title[1],
/document[1]/page[1]/section[1], and
/document[1]/page[1]/section[2].
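Since no online validation is planned, paths can be checked locally against the grammar above. The following regular-expression sketch assumes that element and attribute names start with a letter or underscore and that indexes are positive integers (the grammar itself does not spell out either); the names NAME, PATH_RE and is_valid_path are illustrative.

```python
import re

NAME = r"[A-Za-z_][\w.\-]*"            # assumed shape of ElementName / AttributeName
ELEMENT = rf"/{NAME}\[[1-9]\d*\]"      # '/' ElementName '[' integer ']', counting from 1
ATTRIBUTE = rf"/@{NAME}"               # '/' '@' AttributeName

# Path ::= '/' ElementNode Path | '/' ElementNode | '/' AttributeNode
PATH_RE = re.compile(rf"(?:{ELEMENT})*(?:{ELEMENT}|{ATTRIBUTE})")

def is_valid_path(path):
    """True if the whole string is a fully specified path."""
    return PATH_RE.fullmatch(path) is not None
```

Under these assumptions, paths containing stray spaces, a doubled leading slash, or a step without an index are all rejected.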

Passage

Passage paths are given in the same XPath syntax, but allow for an optional character-offset.
PassagePath ::= Path | Path '/text()' Index '.' Offset
Offset ::= integer
Example:
<passage start="/document[1]/page[1]/section[1]/text()[1].0" end="/document[1]/page[2]/section[3]/text()[1].876"/>
The offset can be placed anywhere in the text node starting from 0 (first character) to the node-length (last character).

When a passage starts and ends on an element boundary it can be written without the optional character offset. In this case, it is assumed that the passage starts on the first character (i.e., character 0) of the XML element given in the start attribute and ends on the last character of the XML element given in the end attribute:
<passage start="/document[1]/page[1]/section[1]" end="/document[1]/page[1]/section[1]"/>
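The value of a start or end attribute can be split into its element path, text node index and character offset. The sketch below assumes element names that start with a letter or underscore and does not handle paths ending in an attribute node; the function name is illustrative.

```python
import re

NAME = r"[A-Za-z_][\w.\-]*"  # assumed shape of element names
PASSAGE_RE = re.compile(
    rf"(?P<path>(?:/{NAME}\[\d+\])+)"                      # fully specified element path
    rf"(?:/text\(\)\[(?P<node>\d+)\]\.(?P<offset>\d+))?"   # optional text node and offset
)

def parse_passage_path(value):
    """Return (element_path, text_node_index, offset).

    The last two items are None when the optional character offset is
    omitted, i.e. when the passage boundary coincides with an element boundary.
    """
    match = PASSAGE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid passage path: {value!r}")
    if match.group("node") is None:
        return match.group("path"), None, None
    return match.group("path"), int(match.group("node")), int(match.group("offset"))
```

For the end attribute of the first example above, this yields the path /document[1]/page[2]/section[3], text node 1 and offset 876.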

XML and whitespace

XML is very flexible in its handling of whitespace, i.e., the following two documents are usually regarded as identical.
<a>
    <b/>
</a>


<a><b/></a>

However, strictly speaking, the first document contains whitespace content (newlines, tabs, spaces) which is not present in the second document. That is, the element <a> in the first document contains first a newline and some spaces, then an empty <b> element, and then again a newline.

When constructing passages, any whitespace that represents the only textual content of a text node should be ignored.
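The effect can be seen with Python's standard xml.etree.ElementTree, which exposes the text before and after child elements as the text and tail attributes. The helper below is a minimal sketch (its name is an assumption) that lists the text nodes under an element in document order and skips those consisting only of XML whitespace.

```python
import xml.etree.ElementTree as ET

XML_WHITESPACE = " \t\r\n"

def significant_text_nodes(elem):
    """Text nodes under elem in document order, skipping whitespace-only ones."""
    nodes = []
    if elem.text and elem.text.strip(XML_WHITESPACE):
        nodes.append(elem.text)
    for child in elem:
        nodes.extend(significant_text_nodes(child))
        # Text that follows the child's end tag belongs to the parent.
        if child.tail and child.tail.strip(XML_WHITESPACE):
            nodes.append(child.tail)
    return nodes

spaced = ET.fromstring("<a>\n    <b/>\n</a>")
compact = ET.fromstring("<a><b/></a>")
```

Here spaced.text and the tail of the <b> element hold the whitespace described above, while both documents yield the same (empty) list of significant text nodes.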