Note: Due to a conflict of interest with some nominated papers, the Award Chairs were not involved in the selection of the award-winning research track papers.
Best Machine Learning Paper Award
Sponsored by the Springer Machine Learning journal
Robust Domain Adaptation: Representations, Weights and Inductive Bias
Authors: Victor Bouvier; Philippe Very; Clément Chastagnol; Myriam Tami; Céline Hudelot
Abstract: Unsupervised Domain Adaptation (UDA) has attracted a lot of attention over the past ten years. The emergence of Domain Invariant Representations (IR) has drastically improved the transferability of representations from a labelled source domain to a new and unlabelled target domain. However, a potential pitfall of this approach, e.g. in the presence of label shift, has been brought to light. Some works address this issue with a relaxed version of domain invariance obtained by weighting samples, a strategy often referred to as Importance Sampling. From our point of view, the theoretical aspects of how Importance Sampling and Invariant Representations interact in UDA have not been studied in depth. In the present work, we present a bound on the target risk which incorporates both weights and invariant representations. Our theoretical analysis highlights the role of inductive bias in aligning distributions across domains. We illustrate it on standard benchmarks by proposing a new learning procedure for UDA. We observe empirically that a weak inductive bias makes adaptation more robust. The elaboration of stronger inductive biases is a promising direction for new UDA algorithms.
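The importance weighting the abstract refers to can be made concrete in the pure label-shift case, where each source sample is reweighted by w(y) = p_T(y) / p_S(y) so that source class frequencies match the target. A minimal sketch, with the function name and interface being illustrative rather than taken from the paper:

```python
from collections import Counter

def label_shift_weights(source_labels, target_label_dist):
    """Importance weights w(y) = p_T(y) / p_S(y) for label shift:
    source samples are reweighted so that the source class frequencies
    match the (assumed known) target label distribution."""
    n = len(source_labels)
    p_source = {y: c / n for y, c in Counter(source_labels).items()}
    return [target_label_dist[y] / p_source[y] for y in source_labels]
```

Weighting a source risk estimate by these values simulates drawing labels from the target distribution, which is the relaxation of strict domain invariance the abstract alludes to.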
Best Student Data Mining Paper Award
Sponsored by the Springer Data Mining and Knowledge Discovery journal
SpecGreedy: Unified Dense Subgraph Detection
Authors: Wenjie Feng, Shenghua Liu, Danai Koutra, Huawei Shen, Xueqi Cheng
Abstract: How can we effectively detect fake reviews or fraudulent connections in a website? How can we spot communities that suddenly appear, or coordinated group behaviors based on users’ interactions? And how can we efficiently find the minimum cut for a large graph? All these problems are related to finding dense subgraph patterns, which is an important primitive problem in graph data analysis with extensive applications across various domains. We focus on formulating the problem of detecting the densest subgraph in real-world large graphs, and we theoretically compare and contrast several closely related problems. Moreover, we propose a unified framework for the densest subgraph detection problem (GENDS), and devise a simple and computationally efficient algorithm, SPECGREEDY, to solve it by leveraging graph spectral properties in a greedy approach. We conduct thorough experiments on 40 real-world networks with up to 1.47 billion edges from various domains, and demonstrate that our algorithm yields up to a 58.6× speedup and achieves better- or equal-quality solutions for the densest subgraph problem than the baselines. Moreover, SPECGREEDY scales linearly with the graph size, and proves effective in applications such as finding sudden new collaborations in a large, time-evolving co-authorship network.
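SPECGREEDY's spectral machinery is developed in the paper itself; as background on the greedy component, the classic peeling heuristic for the average-degree densest subgraph (repeatedly delete a minimum-degree vertex and keep the densest intermediate subgraph) can be sketched as follows. This is the textbook 2-approximation, not the authors' algorithm:

```python
from collections import defaultdict

def greedy_peel_densest(edges):
    """Classic greedy 2-approximation for the average-degree densest
    subgraph: repeatedly remove a minimum-degree vertex and return the
    densest vertex set seen along the way."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # current edge count
    best_set, best_density = set(nodes), m / len(nodes)
    while len(nodes) > 1:
        u = min(nodes, key=lambda x: len(adj[x]))  # minimum-degree vertex
        for v in adj[u]:
            adj[v].discard(u)
        m -= len(adj[u])
        del adj[u]
        nodes.discard(u)
        density = m / len(nodes)
        if density > best_density:
            best_set, best_density = set(nodes), density
    return best_set, best_density
```

On a 4-clique with one pendant vertex attached, the heuristic peels off the pendant and returns the clique as the densest subgraph.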
Best Data Mining Paper Award
Revisiting Wedge Sampling for Budgeted Maximum Inner Product Search
Authors: Stephan Sloth Lorenzen; Ninh Pham
Abstract: Top-k maximum inner product search (MIPS) is a central task in many machine learning applications. This work extends top-k MIPS with a budgeted setting, which asks for the best approximate top-k MIPS given a limited budget of computational operations. We investigate recent advanced sampling algorithms, including wedge and diamond sampling, to solve budgeted top-k MIPS. Our contribution is twofold. First, we show that diamond sampling is essentially a combination of wedge sampling and basic sampling for top-k MIPS. Our theoretical analysis and empirical evaluation show that wedge is competitive (often superior) to diamond on approximating top-k MIPS regarding both efficiency and accuracy. Second, we propose dWedge, a very simple deterministic variant of wedge sampling for budgeted top-k MIPS. Empirically, dWedge provides significantly higher accuracy than other competitive budgeted top-k MIPS solvers while maintaining a similar speedup. In particular, dWedge returns the top-10 MIPS with at least 90% accuracy, with speedups between 20× and 180× compared to brute-force search on our large-scale data sets.
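To give a flavour of the primitive involved, here is a much-simplified randomized wedge-style sampler for non-negative data (not the paper's dWedge, which is deterministic): pick a coordinate with probability proportional to its contribution to the total inner-product mass, then an item proportional to its value in that coordinate, and rank items by vote counts.

```python
import random

def wedge_sample_topk(X, q, k, num_samples=2000, seed=0):
    """Monte-Carlo approximation of top-k maximum inner products for
    non-negative X (items x dims) and query q, via wedge sampling."""
    rng = random.Random(seed)
    n, d = len(X), len(q)
    col_sums = [sum(X[r][t] for r in range(n)) for t in range(d)]
    dim_weights = [q[t] * col_sums[t] for t in range(d)]
    counts = [0] * n
    for _ in range(num_samples):
        # Pick a dimension t with probability ~ q[t] * col_sums[t] ...
        t = rng.choices(range(d), weights=dim_weights)[0]
        # ... then an item with probability ~ X[item][t]; tally a vote.
        i = rng.choices(range(n), weights=[X[r][t] for r in range(n)])[0]
        counts[i] += 1
    return sorted(range(n), key=lambda r: -counts[r])[:k]
```

Each item's expected vote count is proportional to its inner product with q, so the most-voted items approximate the top-k MIPS answer, and the sample count plays the role of the computational budget in the abstract.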
Best Applied Data Science Paper Award
Learning to Simulate on Sparse Trajectory Data
Authors: Hua Wei; Chacha Chen; Chang Liu; Guanjie Zheng; Zhenhui Li
Abstract: Simulation of real-world traffic can be used to help validate transportation policies. A good simulator produces simulated traffic that is similar to real-world traffic, which often requires dense traffic trajectories (i.e., with a high sampling rate) to cover dynamic situations in the real world. However, in most cases, real-world trajectories are sparse, which makes simulation challenging. In this paper, we present a novel framework, ImIn-GAIL, to address the problem of learning to simulate driving behavior from sparse real-world data. The proposed architecture incorporates data interpolation into the behavior learning process of imitation learning. To the best of our knowledge, we are the first to tackle the data sparsity issue for behavior learning problems. We investigate our framework on both synthetic and real-world trajectory datasets of driving vehicles, showing that our method outperforms various baselines and state-of-the-art methods.
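ImIn-GAIL learns the interpolation jointly with the driving policy; for orientation only, the naive densification baseline such a method is measured against, linear interpolation of sparse (time, position) samples, might look like this (illustrative, not part of the paper's method):

```python
import bisect

def linear_interpolate(samples, t):
    """Naive trajectory densification: linearly interpolate a sparse,
    time-sorted list of (time, position) samples at query time t,
    clamping to the endpoints outside the observed range."""
    times = [s[0] for s in samples]
    j = bisect.bisect_left(times, t)
    if j == 0:
        return samples[0][1]
    if j == len(samples):
        return samples[-1][1]
    (t0, x0), (t1, x1) = samples[j - 1], samples[j]
    return x0 + (x1 - x0) * (t - t0) / (t1 - t0)
```

A purely geometric fill-in like this ignores driving dynamics between observations, which is precisely the gap a learned interpolation aims to close.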
Best Student Machine Learning Paper Award
A Principle of Least Action for the Training of Neural Networks
Authors: Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari
Abstract: Neural networks have been achieving high generalization performance on many tasks despite being highly over-parameterized. Since classical statistical learning theory struggles to explain this behaviour, much effort has recently been focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and having a better control over the trained models. In this work, we adopt an alternative perspective, viewing the neural network as a dynamical system displacing input particles over time. We conduct a series of experiments and, by analyzing the network’s behaviour through its displacements, we show the presence of a low kinetic energy bias in the transport map of the network, and link this bias with generalization performance. From this observation, we reformulate the learning problem as follows: find neural networks that solve the task while transporting the data as efficiently as possible. This offers a novel formulation of the learning problem which allows us to provide regularity results for the solution network, based on Optimal Transport theory. From a practical viewpoint, this allows us to propose a new learning algorithm, which automatically adapts to the complexity of the task, and leads to networks with a high generalization ability even in low data regimes.
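The "low kinetic energy" the authors observe can be illustrated on a discrete trajectory: viewing a residual network as moving a sample through states x_0, x_1, ..., x_L, its discrete kinetic energy is the sum of squared per-layer displacements. A toy sketch (names illustrative):

```python
def kinetic_energy(trajectory):
    """Discrete kinetic energy of one sample's path through the layers:
    the sum of squared displacements between consecutive states."""
    return sum(
        sum((b - a) ** 2 for a, b in zip(x_cur, x_next))
        for x_cur, x_next in zip(trajectory, trajectory[1:])
    )
```

Penalizing this quantity during training nudges the network toward transport maps that move the data as little as possible, which is the efficiency notion in the abstract.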
Best Applied Data Science Paper Runner-Up
Learning a Contextual and Topological Representation of Areas-of-Interest for On-Demand Delivery Application
Authors: Mingxuan Yue, Tianshu Sun, Fan Wu, Lixia Wu, Yinghui Xu, Cyrus Shahabi
Abstract: A good representation of urban areas is of great importance in on-demand delivery services, for example for estimated-time-of-arrival (ETA) prediction. However, existing representations learn either from sparse check-in histories or from topological geometries, and thus either lack coverage and violate geographical laws, or ignore the contextual information in the data. In this paper, we propose a novel representation learning framework that produces a unified representation of Areas-of-Interest (AOIs) from both contextual data (trajectories) and topological data (graphs). The framework first encodes trajectories and graphs into homogeneous views, and then trains a multi-view autoencoder to learn the representation of areas using a dynamic weighting strategy. Experiments on ETA prediction with real-world package delivery data confirm the effectiveness of the model.
Test of Time Award
Three naive Bayes approaches for discrimination-free classification
Part of ECML-PKDD 2010 and published at https://link.springer.com/article/10.1007/s10618-010-0190-x
Authors: Toon Calders and Sicco Verwer
Abstract: In this paper, we investigate how to modify the naive Bayes classifier in order to perform classification that is restricted to be independent with respect to a given sensitive attribute. Such independency restrictions occur naturally when the decision process leading to the labels in the data-set was biased; e.g., due to gender or racial discrimination. This setting is motivated by many cases in which there exist laws that disallow a decision that is partly based on discrimination. Naive application of machine learning techniques would result in huge fines for companies. We present three approaches for making the naive Bayes classifier discrimination-free: (i) modifying the probability of the decision being positive, (ii) training one model for every sensitive attribute value and balancing them, and (iii) adding a latent variable to the Bayesian model that represents the unbiased label and optimizing the model parameters for likelihood using expectation maximization. We present experiments for the three approaches on both artificial and real-life data.
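Approach (i) adjusts the probability of a positive decision until the sensitive groups are treated alike. A much-simplified post-processing illustration in the same spirit, using per-group thresholds to equalize positive rates (not the paper's exact procedure, which modifies the naive Bayes probabilities directly):

```python
def equalize_positive_rates(scores, groups, base_threshold=0.5):
    """Pick a per-group decision threshold so that every sensitive group
    receives (approximately) the same positive rate as the overall
    positive rate at the base threshold.
    scores: estimated P(y=1 | x) per instance; groups: sensitive values."""
    target_rate = sum(s >= base_threshold for s in scores) / len(scores)
    thresholds = {}
    for g in set(groups):
        g_scores = sorted((s for s, gg in zip(scores, groups) if gg == g),
                          reverse=True)
        n_pos = round(target_rate * len(g_scores))  # positives owed to g
        thresholds[g] = g_scores[n_pos - 1] if n_pos > 0 else float("inf")
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]
```

On scores [0.9, 0.8, 0.3] for one group and [0.6, 0.4, 0.2] for the other, a single threshold of 0.5 would give the first group two positives and the second only one; the per-group thresholds give each group two.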
Praise from the Awards Chairs: A highly-influential, visionary paper on discrimination-free or fair classification, ahead of the current trend on fairness, accountability and transparency in AI that is transforming the algorithmic decision-making process in numerous areas, including law, education, banking, health, and more.
Journal Track Reviewer Awards
- Esther Galbrun
- Georgiana Ifrim
- Claudio Lucchese
- Matteo Riondato
- Nikolaj Tatti
- Matthijs van Leeuwen
- Celine Vens
- Albrecht Zimmermann
Conference Engagement Awards
Engagement is the main challenge of a virtual conference. We are all surrounded by everyday distractions, and it is difficult to make time for the conference. As a result, fewer attendees tend to actively participate, which in turn makes engagement less rewarding, and so on. We want to break this vicious circle by incentivizing engagement!
To do this, four Engagement Awards will be announced each day, as well as once at the end for the full week (i.e. 24 awards in total!):
- The Community Engagement Award: awarded to the attendee at the top of the Whova leaderboard around 8:30 CEST.
- The Organizers’ Engagement Award: any organizer can nominate attendees for their engagement.
- The Chairs’ and Volunteers’ Engagement Award: all session chairs can nominate attendees who were particularly engaged during their session (could be speakers in the session or other attendees).
- The Nominator Award: Among all chairs and volunteers who nominated an attendee for the Chairs’ and Volunteers’ Engagement Award, one person will be randomly drawn to receive a prize. Of course, self-nominations are not allowed.
The prizes will be announced later; suffice it to say that they will give you a taste of Ghent…
(Note that in case of abuse or misuse, the above may be amended or discontinued at the organizers’ sole discretion. Only one prize will be awarded per attendee throughout the week.)