News Archives

CLAGI workshop: Call for Papers

EACL 2009 workshop on Computational Linguistic Aspects of Grammatical Inference


30 or 31 March 2009
Co-located with The 12th Conference of the European Chapter of the Association for Computational Linguistics, Athens, Greece
Submission deadline: 19 December 2008


There has been growing interest over the last few years in learning grammars from natural language text (and structured or semi-structured text). The family of techniques enabling such learning is usually called “grammatical inference” or “grammar induction”.

The field of grammatical inference is often subdivided into formal grammatical inference, where researchers aim to prove efficient learnability of classes of grammars, and empirical grammatical inference, where the aim is to learn structure from data. In the latter case, the existence of an underlying grammar is regarded merely as a hypothesis, and the goal is to describe the language better through automatically learned rules.

Both formal and empirical grammatical inference have been linked with (computational) linguistics. Formal learnability of grammars has been used in discussions on how people learn language. Some people mention proofs of (non-)learnability of certain classes of grammars as arguments in the empiricist/nativist discussion. On the more practical side, empirical systems that learn grammars have been applied to natural language. Instead of proving whether classes of grammars can be learnt, the aim here is to provide practical learning systems that automatically introduce structure in language. Example fields where initial research has been done are syntactic parsing, morphological analysis of words, and bilingual modeling (or machine translation).

This workshop at EACL 2009 aims to explore the state of the art in these topics. In particular, we aim to bring formal and empirical grammatical inference researchers closer together with researchers in the field of computational linguistics.


We invite the submission of papers on original and unpublished research on all aspects of grammatical inference in relation to natural language (such as syntax, semantics, morphology, phonology, phonetics), including, but not limited to:

* Automatic grammar engineering, including, for example,
  o parser construction,
  o parameter estimation,
  o smoothing, …
* Unsupervised parsing
* Language modelling
* Transducers, for instance, for
  o morphology,
  o text to speech,
  o automatic translation,
  o transliteration,
  o spelling correction, …
* Learning syntax with semantics
* Unsupervised or semi-supervised learning of linguistic knowledge
* Learning (classes of) grammars (e.g. subclasses of the Chomsky Hierarchy) from linguistic inputs
* Comparing learning results in different frameworks (e.g. membership vs. correction queries)
* Learning linguistic structures (e.g. phonological features, lexicon) from the acoustic signal
* Grammars and finite state machines in machine translation
* Learning the settings of Chomskyan parameters
* Cognitive aspects of grammar acquisition, covering, among others,
  o developmental trajectories as studied by psycholinguists working with children,
  o characteristics of child-directed speech as they are manifested in corpora such as CHILDES, …
* (Unsupervised) Computational language acquisition (experimental or observational)


Papers should present original, completed and unpublished research and must not exceed 8 pages. All submissions are to be formatted using the EACL 2009 style files.

Papers should be submitted electronically, no later than Friday 19 December, 2008. The only accepted format for submitted papers is PDF.

The reviewing process will be blind; papers should therefore not include the authors’ names and affiliations, or any references to websites, project names, etc. that reveal the authors’ identity. Each submission will be reviewed by at least two members of the program committee. Accepted papers will be published in the workshop proceedings.

Important dates

19 December, 2008 – Deadline for paper submission
30 January, 2009 – Notification of acceptance
12 February, 2009 – Camera-ready copies due
30 or 31 March, 2009 – Computational Linguistic Aspects of Grammatical Inference workshop held at EACL-09 (exact date to be announced)

Programme Committee

Srinivas Bangalore, AT&T Labs-Research, USA
Leonor Becerra-Bonache, Yale University, USA
Rens Bod, University of Amsterdam, The Netherlands
Antal van den Bosch, Tilburg University, The Netherlands
Alexander Clark, Royal Holloway, University of London, UK
Walter Daelemans, University of Antwerp, Belgium
Shimon Edelman, Cornell University, USA
Jeroen Geertzen, University of Cambridge, UK
Jeffrey Heinz, University of Delaware, USA
Alfons Juan, Universidad Politecnica de Valencia, Spain

Frantisek Mraz, Charles University, Czech Republic
Khalil Sima’an, University of Amsterdam, The Netherlands
Richard Sproat, University of Illinois at Urbana-Champaign, USA
Willem Zuidema, University of Amsterdam, The Netherlands

Others to be confirmed

Organizing Committee

Menno van Zaanen, Tilburg University, The Netherlands
Colin de la Higuera, Université de Saint-Etienne, France


Menno van Zaanen
Department of Communication and Information Sciences
Tilburg University
The Netherlands
mvzaanen (at)

Workshop website

The Great Cosmic Challenge

Today cosmologists are challenging the world to solve a compelling statistical problem, to bring us closer to understanding the nature of the dark matter and dark energy which together make up 95 per cent of the ‘missing’ universe. The GRavitational lEnsing Accuracy Testing 2008 (GREAT08) PASCAL Challenge is being set by 38 scientists across 19 international institutions, with the aim of enticing other researchers to crack it by 30 April 2009.

“The GREAT08 PASCAL Challenge will help us answer the biggest question in cosmology today: what is the dark energy that seems to make up most of the universe? We realised that solving our image processing problem doesn’t require knowledge of astronomy, so we’re reaching out to attract novel approaches from other disciplines,” says Dr Sarah Bridle, UCL Physics and Astronomy, who is leading the challenge alongside Professor John Shawe-Taylor, Director of the UCL Centre for Computational Statistics and Machine Learning.

Twenty per cent of our universe seems to be made of dark matter, an unknown substance that is fundamentally different to the material making up our known world. Seventy-five per cent of the universe appears to be made of a completely mysterious substance dubbed dark energy. One possible explanation for these surprising observations is that Einstein’s law of gravity is wrong.

The method with the greatest potential to discover the nature of dark energy is gravitational lensing, in which the shapes of distant galaxies are distorted by the gravity of the intervening dark matter. “Streetlamps appear distorted by the glass in your bathroom window and you could use the distortions to learn about the varying thickness of the glass. In the same way, we can learn about the distribution of the dark matter by looking at the shapes of distant galaxies,” says Dr Bridle. The observed galaxy images appear distorted, and their shapes must be precisely disentangled from the observational effects of sampling, convolution and noise. The problem being set, to measure these image distortions, involves image analysis and is ideally matched to experts in statistical inference, inverse problems and computational learning, amongst other scientific fields.
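As a rough illustration of what measuring a galaxy shape means in practice, the sketch below estimates the ellipticity of a small synthetic image from its second-order brightness moments. It is not the GREAT08 method or any entrant's method: the function names are hypothetical, the "galaxy" is an assumed axis-aligned Gaussian blob, and the sketch ignores convolution, sampling effects and noise, which are precisely what makes the real problem hard.

```python
import math

def elliptical_gaussian(size, sigma_x, sigma_y):
    """Hypothetical toy galaxy: an axis-aligned Gaussian blob on a size x size grid."""
    c = (size - 1) / 2.0  # centre of the pixel grid
    return [[math.exp(-0.5 * (((x - c) / sigma_x) ** 2 + ((y - c) / sigma_y) ** 2))
             for x in range(size)]
            for y in range(size)]

def ellipticity(image):
    """Estimate (e1, e2) from unweighted second-order brightness moments."""
    total = sum(sum(row) for row in image)
    # Brightness-weighted centroid.
    xbar = sum(v * x for row in image for x, v in enumerate(row)) / total
    ybar = sum(v * y for y, row in enumerate(image) for v in row) / total
    # Quadrupole moments about the centroid.
    qxx = qyy = qxy = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            dx, dy = x - xbar, y - ybar
            qxx += v * dx * dx
            qyy += v * dy * dy
            qxy += v * dx * dy
    e1 = (qxx - qyy) / (qxx + qyy)   # elongation along the x or y axis
    e2 = 2.0 * qxy / (qxx + qyy)     # elongation along the diagonals
    return e1, e2

# A blob that is wider in x (sigma 3) than in y (sigma 2).
image = elliptical_gaussian(41, 3.0, 2.0)
e1, e2 = ellipticity(image)
print(e1, e2)
```

For this noise-free blob, e1 comes out close to (9 - 4) / (9 + 4) and e2 close to zero. On real data, blurring by the telescope and pixel noise bias such simple moment estimates, which is why the challenge calls for more careful statistical methods.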

Cosmologists are gearing up for an exciting few years interpreting the results of new experiments designed to uncover the nature of dark energy, including the ground-based Dark Energy Survey (DES) in Chile and Pan-STARRS in Hawaii, and space missions by the European Space Agency (Euclid) and by NASA and the US Department of Energy (JDEM). Methods developed to solve the GREAT08 Challenge will help the analysis of this new data.

The GREAT08 Challenge comprises 200 GB of simulated data containing 30 million galaxy images. For the main competition, participants are asked to extract 5400 numbers from 170 GB of data. The competition can be accessed via the website

The GREAT08 Challenge Handbook will shortly be published in the journal Annals of Applied Statistics (AOAS).

Further Information available at