The course is composed of three parts:
1. Web search (D. Hawking and A. Broder)
This lecture will focus on the following topics:
- Evaluating quality of search results (test collections, side-by-side evaluation, C-TEST framework, tuning)
- The Web from an enterprise perspective (i.e. visibility of organisational websites in global search engines)
- Enterprise-scale webs
- Topic-specific search portals and information quality
- Retrieval based on external descriptions of documents
- Distributed search and personal metasearch
- Efficient generation of document summaries
- Link graphs, web communities, spam rejection
- Web-scale crawling
- Indexing and retrieval, near-duplicate detection, ad generation
- Algorithmic advertising
2. Accessing XML content: An information retrieval perspective (M. Lalmas) — [slides]
With XML as the evolving standard for structured documents, there is an increasing demand for appropriate XML access methods.
The development of approaches to access XML content has generated a wealth of issues that are being addressed by the database (DB)
and information retrieval (IR) communities. The DB community has traditionally focused on developing query languages and efficient
evaluation algorithms for highly structured content. In contrast, the IR community has traditionally focused on searching unstructured
content, and has developed various techniques for ranking query results and evaluating their effectiveness.
This lecture will concentrate on the work pursued by the IR community, where the main purpose is to provide content-oriented
access to XML documents: retrieving XML document components (the so-called XML elements) instead of whole documents in
response to users' queries gives more precise access. The lecture will introduce the major XML-related standards
and their role in information retrieval. It will cover structured text models, already investigated before XML, and their
relation to XML, as well as indexing and searching algorithms for XML. Current XML retrieval approaches will be discussed,
covering both extensions of older methods to XML and new models and methods developed specifically for content-oriented
XML retrieval. The lecture will finish with issues related to the evaluation of content-oriented XML retrieval, as carried out as part of INEX.
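To make the idea of element-level retrieval concrete, here is a minimal sketch, not taken from the lecture material. It scores every element of a small XML document against a query and returns a ranked list of element paths instead of the whole document. The sample document, the function names, and the naive length-normalised term-frequency score are all assumptions chosen for illustration; real XML retrieval systems use far more refined ranking models.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document, used only for illustration.
DOC = """<article>
  <title>XML retrieval</title>
  <sec><title>Indexing</title><p>Element indexing stores one entry per XML element.</p></sec>
  <sec><title>Evaluation</title><p>INEX evaluates content-oriented XML retrieval.</p></sec>
</article>"""


def tokenize(text):
    """Lower-case whitespace tokens with trailing punctuation stripped."""
    return [t.strip(".,").lower() for t in text.split() if t.strip(".,")]


def rank_elements(xml_string, query):
    """Return (score, path) pairs for every element that matches the query.

    The score is an assumed toy measure: the fraction of the element's
    tokens (including its descendants' text) that are query terms.
    """
    root = ET.fromstring(xml_string)
    query_terms = set(tokenize(query))
    results = []

    def walk(elem, path):
        tokens = tokenize(" ".join(elem.itertext()))
        if tokens:
            counts = Counter(tokens)
            matched = sum(counts[t] for t in query_terms)
            if matched:
                results.append((matched / len(tokens), path))
        # Build positional paths such as /article[1]/sec[2]/p[1].
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    # Highest score first; ties broken by path for a stable order.
    return sorted(results, key=lambda r: (-r[0], r[1]))


if __name__ == "__main__":
    for score, path in rank_elements(DOC, "INEX evaluation"):
        print(f"{score:.3f}  {path}")
```

With this toy score, the second section and its children rank above the whole article, which is the behaviour element retrieval aims for. Very short elements such as titles also tend to come first, which hints at why choosing the right granularity is a research question of its own.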
This lecture is based on tutorials given at the ACM SIGIR conferences in Seattle, 2006 (together with Ricardo Baeza-Yates)
and Amsterdam, 2007 (together with Sihem Amer-Yahia, Ricardo Baeza-Yates, and Mariano Consens).
3. Service Oriented Architectures (M. Little) — [slides]
Conceptually, a distributed application consists of several distinct fragments split between the original calling process
(client) and a remote (server) process responsible for executing the requested operations locally. Both the client and server
are typically designed and implemented as if the application were to execute in a traditional centralised environment.
Unfortunately, this encourages an architecture that ties data and its processing together, leading to tightly
coupled applications. Such applications can be brittle when failures occur or new services/objects need to be swapped
in to replace old services/objects.
SOA is an architectural style to achieve loose coupling among interacting software agents. A service is a unit of work
done by a service provider to achieve desired end results for a consumer. Both provider and consumer are roles played by
software agents on behalf of their owners. SOA is deliberately unprescriptive about what happens behind service endpoints:
we are ultimately only concerned with the transfer of structured data between parties, plus any meta-level information to
safeguard such transfers (e.g., by encrypting or digitally signing messages).
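The loose coupling described above can be sketched in a few lines. This is an illustrative example under stated assumptions, not material from the lecture: the `QuoteService` contract, the two provider classes, the message fields, and the shared demo key are all hypothetical. The consumer only exchanges structured, signed messages with a service endpoint and never sees what happens behind it, so one provider can be swapped for another without changing the consumer.

```python
import hashlib
import hmac
import json
from typing import Protocol

SHARED_KEY = b"demo-key"  # hypothetical key, for illustration only


def sign(payload: dict) -> dict:
    """Wrap structured data in a message carrying an HMAC signature."""
    body = json.dumps(payload, sort_keys=True)
    mac = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "signature": mac}


def verify(message: dict) -> dict:
    """Check the signature and return the structured payload."""
    expected = hmac.new(
        SHARED_KEY, message["body"].encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, message["signature"]):
        raise ValueError("signature mismatch")
    return json.loads(message["body"])


class QuoteService(Protocol):
    """The only thing a consumer knows about a provider: messages in and out."""

    def handle(self, message: dict) -> dict: ...


class FlatRateQuotes:
    """One hypothetical provider; its internals are invisible to consumers."""

    def handle(self, message: dict) -> dict:
        request = verify(message)
        return sign({"item": request["item"], "price": 10.0})


class DiscountQuotes:
    """A replacement provider honouring the same message contract."""

    def handle(self, message: dict) -> dict:
        request = verify(message)
        return sign({"item": request["item"], "price": 8.0})


def consumer(service: QuoteService, item: str) -> float:
    """Depends on the message contract only, not on a concrete provider."""
    reply = verify(service.handle(sign({"item": item})))
    return reply["price"]


if __name__ == "__main__":
    print(consumer(FlatRateQuotes(), "book"))
    print(consumer(DiscountQuotes(), "book"))
```

In a real deployment the messages would travel over a network and the signing scheme would follow an agreed standard; the in-process calls here only stand in for that transfer.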
SOA breaks the three-tier approach by inserting a new interface layer to de-couple the core business logic and database
(back-end implementation choices) from the presentation layer and other applications. SOA turns business functions into
services that can be reused and accessed through standard interfaces. This presentation will give an
overview of information processing in SOA.
Schedule
Day | Time | Session
Monday | From 15h00 | Registration
Monday | 16h30-18h45 | Web search, part I (D. Hawking)
Tuesday | 08h30-11h30 | Web search, part I (D. Hawking)
Tuesday | 16h30-18h45 | Web search, part II (A. Broder)
Wednesday | 08h30-11h30 | Web search, part II (A. Broder)
Wednesday | 16h30-18h45 | Accessing XML content (M. Lalmas)
Thursday | 08h30-11h30 | Accessing XML content (M. Lalmas)
Thursday | 16h30-18h45 | Service Oriented Architectures (M. Little)
Friday | 08h30-11h30 | Service Oriented Architectures (M. Little)
Friday | 11h30-12h00 | Wrap-up