ICGI 2016

Borja Balle

Borja Balle is currently a Lecturer in Data Science at Lancaster University. He received his PhD from Universitat Politècnica de Catalunya in 2013 and then spent two years as a postdoctoral fellow at McGill University.

His research focuses on the design and analysis of machine learning algorithms for structured data like sequences, trees, and graphs. On the theoretical side, his goal is to advance our understanding of the foundations of data science by identifying essential trade-offs between statistical and computational efficiency.

Besides machine learning, Borja also conducts research in related areas like automata theory, streaming algorithms, and data privacy. Practical applications of his research include efficient algorithms for solving large-scale problems in natural language processing and reinforcement learning.

The problem of learning automata or weighted automata dates back to the origins of computer science. This talk presents a series of recent fundamental data-dependent learning guarantees for this problem based on the notion of Rademacher complexity. These learning theory results can guide the design of new algorithms that benefit from a strong justification.

Hendrik Blockeel

Hendrik Blockeel is a professor at the Katholieke Universiteit Leuven (Belgium), and part-time associate professor at the University of Leiden (The Netherlands). His research interests include theory and algorithms for machine learning and data mining in general, with a particular focus on relational learning, graph mining, probabilistic logics, inductive knowledge bases, and applications of these techniques in the broader field of computer science, bio-informatics, and medical informatics.

Prof. Blockeel's main research results include an efficient and versatile relational decision tree learning tool that has been used in many relational learning applications; a framework for symbolic machine learning that generalizes decision tree and rule learning; and experiment databases for machine learning.

Language learning has been studied for decades. For a long time, the focus was on learning the grammatical structure of a language from sentences, or learning the semantics of sentences from examples of sentence/meaning pairs. More recently, there has been increasing interest in grounded language learning, where the language is learned by observing sentences used in a particular context, and trying to link elements of these sentences to elements of the context.

This talk is about an approach called relational grounded language learning. In this approach, the semantics of a sentence is a relational structure, and this structure is learned from sentence/context pairs in which the context is represented in a relational format. Once a model of the link between sentences and semantic structures is in place, it can be used for a variety of purposes: generating sentences describing a given scene, identifying the elements in a scene that a sentence refers to, translating a sentence from one language to another through its semantic representation, and more. The potential of this approach for all these uses has been demonstrated on some simple problems. Although the approach is clearly still in its infancy, we believe it has much potential both for helping us understand how humans learn their first language and for improving natural language processing technology.

Valentin Spitkovsky

Valentin Spitkovsky completed a doctoral dissertation in Computational Linguistics at Stanford's Artificial Intelligence Laboratory in 2014. His focus has been on unsupervised parsing and grammar induction.

Since then, he has been doing research at Google on Natural Language Processing, Data Mining and Modeling, Machine Intelligence, and Information Retrieval.

Unsupervised learning of hierarchical syntactic structure from free-form natural language text is an important and difficult problem, with implications for scientific goals, such as understanding human language acquisition, or engineering applications, including question answering, machine translation and speech recognition. As is the case with many unsupervised settings in machine learning, grammar induction usually reduces to a non-convex optimization problem. In the first part of the talk, I will review a collection of search heuristics to make expectation-maximization algorithms less sensitive to local optima. However, a deeper challenge of unsupervised learning is that even the locations of global optima of the non-convex objectives, the intrinsic metrics being optimized, are often at best loosely correlated with extrinsic metrics, such as accuracies with respect to reference parse trees. The second part of the talk will cover a suite of constraints on possible valid parses, which can be derived from unparsed surface text forms, to address this concern by helping guide language learners towards linguistically more plausible syntactic constructions.

These results will be used, in the final parts of the talk, to define a family of dependency-and-boundary parsing models and a curriculum learning strategy, co-designed to effectively induce dependency grammars in an unsupervised fashion. The models are parameterized to exploit, as much as possible, observable state, such as words at sentence boundaries, which limits the proliferation of local maxima that is ordinarily caused by the presence of latent variables. The training strategy is then to split incoming data into simpler text fragments, in a way that is consistent with the parsing constraints, thus facilitating bootstrapping by increasing numbers of visible edges. Using fragment length as a proxy, the optimization strategy gradually exposes learners to more complex data, starting from just single-word fragments, whose unique parses correspond to a globally optimal initial solution. This grammar induction pipeline attains state-of-the-art accuracies against standard evaluation sets that span a total of nineteen natural languages from several disparate families.

Experimental results presented in this talk strongly suggest that complex learning tasks like grammar induction can be coaxed to more reliably escape local optima and to discover substantially more correct substructures by pursuing learning strategies that begin with simple data and basic models and carefully progress to more complex data instances and expressive model parameterizations.

Wednesday, October 5 2016

Time	Author(s)	Title	Type
08:30	Coffee and registration
09:00	Denis Arrivault, Dominique Benielli, François Denis and Rémi Eyraud	Sp2Learn: A Toolbox for the spectral learning of weighted automata	Regular
09:30	Kristina Strother-Garcia, Jeffrey Heinz and Hyun Jin Hwangbo	Using model theory for grammatical inference: a case study from phonology	Regular
10:00	Olgierd Unold, Łukasz Culer and Agnieszka Kaczmarek	Visualizing context-free grammar induction with grammar-based classifier system	Work-in-progress
10:15	Amos Yeo, John Howroyd and Mark Bishop	Using grammar inference for classification	Work-in-progress
10:30	Coffee
11:00	Hendrik Blockeel	Relational grounded language learning	Keynote
12:30	Lunch
14:30	Roland Groz and Catherine Oriat	Inferring Non-resettable Mealy machines with n States	Regular
15:00	Adrien Boiret, Aurélien Lemay and Joachim Niehren	Learning Top-Down Tree Transducers with Regular Domain Inspection	Regular
15:30	Witold Dyrka, Francois Coste, Olgierd Unold, Łukasz Culer and Agnieszka Kaczmarek	How to Measure the Topological Quality of Protein Grammars?	Work-in-progress
15:45	Tobias Endres	Determining Syntactical Invariants Within the DOM-Structure of Ajax Applications through XML Grammar Inference	Work-in-progress
16:00	Coffee
16:30	Adrian-Horia Dediu, Joana M. Matos and Claudio Moraga	Query Learning Automata with Helpful Labels	Regular
17:00	Gaetano Pellegrino, Christian Albert Hammerschmidt and Sicco Verwer	Learning Deterministic Finite Automata from Infinite Alphabets	Regular
17:30	Michael Siebers	Inferring Languages of Multi-Dimensional Observations	Work-in-progress
17:45	End of day 1

Thursday, October 6 2016

Time	Author(s)	Title	Type
08:30	Coffee
09:00	Annie Foret and Denis Bechet	Simple K-star Categorial Dependency Grammars and their Inference	Regular
09:30	Shouhei Fukunaga, Yoshimasa Takabatake, Tomohiro I and Hiroshi Sakamoto	Online Grammar Compression for Frequent Pattern Discovery	Regular
10:00	Qin Lin and Sicco Verwer	Probabilistic Model Learning from Noisy Data	Work-in-progress
10:15	Chihiro Shibata and Ryo Yoshinaka	Towards Learning Generalized Residual Finite State Automata	Work-in-progress
10:30	Coffee
11:00	Borja Balle	Theoretical Guarantees for Learning Weighted Automata	Keynote
12:30	Lunch
14:30	Social event and conference dinner

Friday, October 7 2016

Time	Author(s)	Title	Type
08:30	Coffee
09:00	Payam Siyari and Matthias Gallé	The Generalized Smallest Grammar Problem	Regular
09:30	Alexander Clark	Testing Distributional Properties of Context-Free Grammars	Regular
10:00	Francois Coste and Mikail Demirdelen	A Refined Parsing Graph Approach to Learn Smaller Contextually Substitutable Grammars With Less Data	Work-in-progress
10:15	Tomoko Ochi, Ryo Yoshinaka and Akihiro Yamamoto	Polynomial time inference of generalization of non-cross pattern languages to term tree languages	Work-in-progress
10:30	Coffee
11:00	Valentin Spitkovsky	Grammar Induction and Parsing with Dependency-and-Boundary Models	Keynote
12:30	Lunch
14:30	SPiCe workshop
16:00	Coffee
16:30	SPiCe workshop
18:00	Closing statement

The 13th International Conference on Grammatical Inference

Scope and objective

Program chairs

Program committee

Call for Papers

Important Dates

Call for Work in Progress

Important Dates

Accepted papers

Keynote speakers

Borja Balle

Balle's keynote: Theoretical Guarantees for Learning Weighted Automata

Hendrik Blockeel

Blockeel's keynote: Relational grounded language learning

Valentin Spitkovsky

Spitkovsky's keynote: Grammar Induction and Parsing with Dependency-and-Boundary Models

Travel, Venue and Hotel Information

Conference Venue

Hotels

Visa

Register

Program

Wednesday, October 5 2016

Thursday, October 6 2016

Friday, October 7 2016