Massive Data Analysis

On the campus of

ESIEE Paris, Ecole des Ponts ParisTech

and Université Paris-Est Marne-la-Vallée.

Cité Descartes, June 12th 2014

At ESIEE in Amphi Marcel Dassault

9h15

Arrival of attendees

9h30-9h40

Organisers

Labex Bézout

Foreword

10h30-11h00

coffee break

11h00-11h35

Criteo

Inside Criteo's Engine: Learning and predicting at scale

Criteo's goal is to display ads to the right users at the right price. To achieve that goal, we bid more than 15 billion times per day with peaks to 500,000 times per second, each bid taking less than 5ms. Once a bid is won, we must recommend the appropriate products in less than 50ms. Choosing the bid an the appropriate products is based on the predicted probability that the user will click on the ad. This talk will present the challenges of building a large-scale, extremely fast prediction system with both offline and online constraints.

11h35-12h10

Università di Genova / MIT

Early stopping regularization

Dealing with big data has emphasized the importance for machine learning algorithms to be predictive but at the same time efficient and scalable. Investigating the interplay between statistics and computations has indeed emerged as a new exciting venue for both theoretical and practical studies. A key challenge in this context is the problem of performing provably efficient and adaptive model selection (determining the best model complexity for the data at hand). Early stopping is one of the most appealing heuristics for this problem, since the computational resources required for learning are directly linked to the desired prediction properties. Regularization is achieved by iteratively passing over the data multiple times and choosing the number of passes (epochs) providing the best prediction performance. Despite being widely used in practice, the theoretical properties of early stopping have long been poorly understood. In this talk we will discuss the connection between learning, stability and ill-posedeness to illustrate recent results that provide the theoretical foundation of early stopping for several iterative learning schemes, including batch and online (incremental) approaches.

12h10-12h45

Université de Technologie de Compiègne

Learning Structure-Preserving Vector Representations of Objects from Open-Domain Relational Data

Knowledge bases such as FreeBase or DBPedia encode knowledge or facts about entities in the form of labeled relations between them, such as (Paris, Capital of, France). These are examples of open-domain relational data of unprecedented size, with tens of millions of entities, tens of thousands of possible relationships and billions of facts, which are constantly growing along all their dimensions. Such ressources offer new opportunities for long-term AI goals such as open-domain information extraction from text or question answering. In this talk, I will present algorithms that extract low-dimensional vector representations of entities from the multi-graph structure, in such a way that links between entities can be recovered from these vectors. These representations allow to perform efficient prediction based on the data, without considering its initial discrete structure. I will describe applications of these methods to link prediction and open-domain information extraction, and discuss the perspectives of this work and some open issues.

12h45-14h10

déjeuner

At ESIEE in Amphi Marcel Dassault

14h10-14h45

LIGM, Université Paris Est Marne-la-Vallée

Proximal methods: tools
for solving inverse problems on a large scale

In a wide range of application fields (inverse problems, machine learning, computer vision, data analysis, networking, etc), very large scale optimization problems need to be solved. The goal of this talk is to provide a comprehensive view of a class of optimization algorithms which have been very successful in this areas over the last years: proximal splitting algorithms. These tools allow us to provide a unifying answer to a number of classical dilemma arising when solving optimization problems: constrained vs unconstrained approaches, smooth vs nonsmooth formulations, parallel vs sequential programming,... The presentation will be mainly focused on new developments in this area aiming at increasing the convergence speed of these algorithms and making them easily implementable on multicore or distributed computing architectures. We will see that the Majorization-Minimization principle is at the core of many recent advances. Illustrations of these methods on various reconstruction and restoration examples where massive datasets have to be processed will be provided.

14h45-15h20

INRIA / ENS

Beyond stochastic gradient descent for large-scale machine learning

Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. Given n observations/iterations, the optimal convergence rates of these algorithms are O(1/sqrt(n)) for general convex functions and reaches O(1/n) for strongly-convex functions. In this talk, I will show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of O(1/n) without strong convexity assumptions, while in the practical finite-data setting, an appropriate combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems, with an iteration cost similar to stochastic gradient descent. (joint work with Nicolas Le Roux, Eric Moulines and Mark Schmidt)

15h20-15h50

coffee break

15h50-16h25

Amazon

How Amazon leverages massive data sets in retail, web services and devices

16h25-17h00

Télécom ParisTech

Distributed optimization algorithms for multi-agent systems

We consider a network of N agents (computing units) having private objective functions and seeking to find a minimum of the aggregate objective. Instances of this problem arise e.g. in machine learning applications where massive data sets are distributed over a network and processed by distinct machines. The aim is to design iterative algorithms where, at a each iteration, an agent updates a local estimate of the minimizer based on the sole knowledge of its private function and the information received from its neighbors. After giving an overview of standard distributed optimization methods, we introduce a primal-dual algorithm derived from recent works by Vu and Condat. The algorithm leads itself to a distributed implementation. Next, we provide an asynchronous version of the algorithm relying on the idea of stochastic coordinate descent.

17h00-18h30

cocktail

Laurent Najman, A3SI team, LIGM, ESIEE Paris.

Guillaume Obozinski, Imagine group, A3SI team, LIGM, Ecole des Ponts - ParisTech.

The **laboratoire d'excellence Bézout**
focusses on the extremely active research areas at the interface between mathematics and computer science.
It includes all the members of three laboratories of the PRES UPE:

- in computer science,
- the Laboratoire d'Informatique Gaspard-Monge (LIGM, unité mixte CNRS, École des Ponts ParisTech, ESIEE Paris, UPEMLV)

- in mathematics,
- the Centre d'Enseignement et de Recherche en Mathématiques et Calcul Scientifique (CERMICS, École des Ponts ParisTech)
- the Laboratoire d'Analyse et de Mathématiques Appliquées (LAMA, unité mixte CNRS, UPEC, UPEMLV)

These laboratories are located in Marne-La-Vallée and Créteil.