News (2007-08-11): The workshop book Large Scale Kernel Machines, edited by Léon Bottou, Olivier Chapelle, Dennis DeCoste, and Jason Weston, is now available from MIT Press.
Datasets with millions of observations can be gathered by crawling the
web, mining business databases, or connecting a cheap video tuner to a
laptop. Vastly more ambitious learning systems are theoretically
possible. The literature shows no shortage of ideas for sophisticated
statistical models. The computational cost of learning algorithms is
now the bottleneck.
During the last decade, dataset size has outgrown processor speed.
Meanwhile, machine learning algorithms have become more principled, and
also more computationally expensive.
The workshop investigates computationally efficient ways to exploit
such large datasets using kernel machines. It will show how properly
designed kernel machines can efficiently process millions of examples.
It will also debate whether kernel machines are the best way to achieve
such objectives.
With this workshop, we hope to raise the community's awareness of the
opportunities and challenges offered by large scale datasets. The
target audience includes people who want to take advantage of the
application opportunities offered by large datasets, as well as people
who are simply curious about the latest advances in this area.
Topics to be discussed:
- Fast implementations of ordinary Support Vector Machines. How can the
optimization algorithms be improved and distributed across several
computers?
- Kernel algorithms specifically designed for large scale datasets. For
instance, online kernel algorithms are less hungry for memory. Does
this improvement come for free, or does it increase the error rates?
- Methods for containing the growth of the number of support vectors.
Does the number of support vectors always grow linearly with the number
of examples, as in ordinary Support Vector Machines?
- Comparing the relative strengths of kernel and non-kernel methods on large scale datasets. Are kernel methods the best tools for such datasets?