The aim of the workshop is to present and discuss recent advances in machine learning approaches to text and natural language processing that capitalize on rich models of prior knowledge in these domains.


The workshop aims to present a diversity of viewpoints on prior knowledge for language and text processing, including:

  • Prior knowledge for language modeling and parsing
  • Topic modeling for document analysis and retrieval
  • Parametric and non-parametric Bayesian models in NLP
  • Graphical models embodying structural knowledge of texts
  • Complex features/kernels that incorporate linguistic knowledge; kernels built from generative models
  • Limitations of purely data-driven learning techniques for text and language applications; performance gains due to incorporation of prior knowledge
  • Typology of different forms of prior knowledge for NLP (knowledge embodied in generative Bayesian models, in MDL models, in ILP/logical models, in linguistic features, in representational frameworks, in grammatical rules…)
  • Formal principles for combining rule-based and data-based approaches to NLP


Program committee

  • Guillaume Bouchard, Xerox Research Centre Europe
  • Nicola Cancedda, Xerox Research Centre Europe
  • Hal Daumé III, University of Utah
  • Marc Dymetman, Xerox Research Centre Europe
  • Tom Griffiths, Stanford University
  • Peter Grünwald, Centrum voor Wiskunde en Informatica
  • Mark Johnson, Brown University
  • Kevin Knight, University of Southern California
  • Yee Whye Teh, University College London