Hierarchies are becoming ever more popular for the organization of documents, particularly on the Web (Web directories are an example of such hierarchies). Along with their widespread use comes the need for automated classification of new documents into the categories of the hierarchy. As the size of the hierarchy grows and the number of documents to be classified increases, a number of interesting problems arise. In particular, this is one of the rare situations where data sparsity remains an issue despite the vastness of the available data. The reasons for this are the simultaneous increase in the number of classes and their hierarchical organization. The latter leads to a very high imbalance between the classes at different levels of the hierarchy. Additionally, the statistical dependence between the classes poses both challenges and opportunities for learning methods.

Research on large-scale classification has so far focused on situations involving a large number of documents and/or a large number of features, with a limited number of categories. This is not the case in hierarchical category systems such as DMOZ, the International Patent Classification or Wikipedia, where, in addition to the large number of documents and features, there is a large number of categories, on the order of tens or hundreds of thousands. To approach this problem, either existing large-scale classifiers can be extended or new methods must be developed. The goal of this workshop, which follows the first edition held in conjunction with the European Conference on Information Retrieval (ECIR) in 2010, is to discuss and assess some of these strategies, covering all or part of the issues mentioned above.

Workshop Format

The workshop will last one day. All participants will be asked to prepare papers, which will be presented either orally or as posters. Submissions must be written in English, must follow the LNCS guidelines, and must not exceed 12 pages including references and figures. The program will also include one invited talk and a round-table discussion.

Submissions are solicited through an open call for papers and will undergo peer review by the programme committee. We encourage submissions on all aspects of large-scale categorization, from purely theoretical work to practical developments of large-scale categorizers.

Organisers

  • George Paliouras, NCSR "Demokritos", Athens, Greece
  • Eric Gaussier, LIG, Grenoble, France
  • Aris Kosmopoulos, NCSR "Demokritos" & AUEB, Athens, Greece
  • Ion Androutsopoulos, AUEB, Athens, Greece
  • Thierry Artières, LIP6, Paris, France
  • Patrick Gallinari, LIP6, Paris, France