Live Session
Hall 406 D
Paper
21 Sep
 
16:05 – 17:25
SGT
Session 12: Evaluation
Session 12: Evaluation is taking place on the RecSys Hub: https://recsyshub.org
Research

What We Evaluate When We Evaluate Recommender Systems: Understanding Recommender Systems’ Performance using Item Response Theory

Yang Liu (University of Helsinki), Alan Medlar (University of Helsinki) and Dorota Glowacka (University of Helsinki)

Abstract

Current practices in offline evaluation use rank-based metrics to measure the quality of recommendation lists. This approach has practical benefits as it centers assessment on the output of the recommender system and, therefore, measures performance from the perspective of end-users. However, this methodology neglects how recommender systems more broadly model user preferences, which is not captured by only considering the top-n recommendations. In this article, we use item response theory (IRT), a family of latent variable models used in psychometric assessment, to gain a comprehensive understanding of offline evaluation. We used IRT to jointly estimate the latent abilities of 51 recommendation algorithms and the characteristics of 3 commonly used benchmark data sets. For all data sets, the latent abilities estimated by IRT suggest that higher scores from traditional rank-based metrics do not reflect improvements in modeling user preferences. Furthermore, we show that the top-n recommendations with the most discriminatory power are biased towards lower difficulty items, leaving much room for improvement. Lastly, we highlight the role of popularity in evaluation by investigating how user engagement and item popularity influence recommendation difficulty.
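
To make the IRT idea concrete, the sketch below shows how a simple Rasch (1PL) model can jointly estimate per-algorithm "abilities" and per-item "difficulties" from a binary response matrix. This is not the authors' implementation: the response-matrix construction, the fit_rasch helper, and the 1PL choice are illustrative assumptions only, and the paper's "discriminatory power" wording suggests its model also includes item discrimination parameters (e.g., 2PL).

```python
# A minimal sketch (not the paper's code) of joint maximum-likelihood fitting
# of a Rasch (1PL) IRT model by gradient ascent.
# Assumption: R[a, i] = 1 if algorithm `a` "succeeds" on evaluation item `i`
# (e.g., places a held-out relevant item in its top-n list), else 0.
import numpy as np

def fit_rasch(R, n_iters=2000, lr=0.05):
    """Return (ability, difficulty) for a binary (n_algorithms x n_items) matrix R."""
    n_alg, n_item = R.shape
    ability = np.zeros(n_alg)      # latent ability theta_a per algorithm
    difficulty = np.zeros(n_item)  # latent difficulty b_i per item

    for _ in range(n_iters):
        # Rasch model: P(success) = sigmoid(theta_a - b_i)
        logits = ability[:, None] - difficulty[None, :]
        p = 1.0 / (1.0 + np.exp(-logits))
        resid = R - p                        # gradient of the log-likelihood
        ability += lr * resid.sum(axis=1) / n_item
        difficulty -= lr * resid.sum(axis=0) / n_alg
        difficulty -= difficulty.mean()      # fix the scale's location (identification)

    return ability, difficulty

# Toy usage: 5 hypothetical algorithms evaluated on 100 synthetic items.
rng = np.random.default_rng(0)
true_ability = np.linspace(-1, 1, 5)
true_difficulty = rng.normal(size=100)
probs = 1 / (1 + np.exp(-(true_ability[:, None] - true_difficulty[None, :])))
R = rng.binomial(1, probs)
ability, difficulty = fit_rasch(R)
print(np.round(ability, 2))  # recovered algorithm abilities (ordering should match true_ability)
```

Centering the difficulty vector each step is one common way to remove the translation indeterminacy of the latent scale; the paper may handle identification differently (e.g., via priors in a Bayesian fit).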

Join the Conversation

Head to Slido and select the paper's assigned session to join the live discussion.
