Live Session
Session 12: Evaluation
Reproducibility
Everyone's a Winner! On Hyperparameter Tuning of Recommendation Models
Faisal Shehzad (University of Klagenfurt) and Dietmar Jannach (University of Klagenfurt)
Abstract
The performance of a recommender system algorithm in terms of common offline accuracy measures often strongly depends on the chosen hyperparameters. Therefore, when comparing algorithms in offline experiments, we can obtain reliable insights regarding the effectiveness of a newly proposed algorithm only if we compare it to a number of state-of-the-art baselines that are carefully tuned for each of the considered datasets. While this fundamental principle of any area of applied machine learning is undisputed, we find that the tuning process for the baselines in the current literature is barely documented in much of today’s published research. Ultimately, in case the baselines are actually not carefully tuned, progress may remain unclear. In this paper, we showcase how every method in such an unsound comparison can be reported to be outperforming the state-of-the-art. Finally, we iterate appropriate research practices to avoid unreliable algorithm comparisons in the future.