Thursday Posters
Main Track
Scalable Deep Q-Learning for Session-Based Slate Recommendation
Aayush Singha Roy (Insight Centre for Data Analytics, University College Dublin), Edoardo D’Amico (Insight Centre for Data Analytics, University College Dublin), Elias Tragos (Insight Centre for Data Analytics, University College Dublin), Aonghus Lawlor (Insight Centre for Data Analytics, University College Dublin) and Neil Hurley (Insight Centre for Data Analytics, University College Dublin).
Abstract
Reinforcement learning (RL) has demonstrated great potential to improve slate-based recommender systems by optimizing recommendations for long-term user engagement. To handle the combinatorial action space in slate recommendation, recent works decompose the Q-value of a slate into item-wise Q-values and act with an item-wise value-based policy. However, in the common case where the Q-function is a parameterized function of state and action, selecting an action requires a number of evaluations that grows linearly with the number of candidate items. While this cost may be acceptable during training, it becomes intractable at model serving time, when each evaluation of the parameterized function, such as a deep neural network, is expensive. To address this issue, we propose an actor-based policy that restricts evaluation of the Q-function to a subset of items, significantly reducing inference time and enabling practical deployment in real-world industrial settings. In our empirical evaluation, we demonstrate that our proposed approach achieves user session engagement equivalent to a value-based policy, while reducing the slate serving time by a factor of at least four.
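The core serving-time difference between the two policies described above can be sketched as follows. This is a minimal illustration, not the paper's method: the Q-network and actor are stand-in linear scorers, and the names (`q_value`, `actor_scores`, `CANDIDATES`) are hypothetical. The point is only the count of expensive Q-evaluations: the value-based policy scores every candidate item, while the actor-based policy first shortlists a small subset with a cheap actor and evaluates Q only on that subset.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ITEMS, STATE_DIM, SLATE_SIZE, CANDIDATES = 1000, 16, 5, 50

# Hypothetical item representations shared by both scorers.
item_embeddings = rng.normal(size=(NUM_ITEMS, STATE_DIM))

def q_value(state, item_ids):
    """Stand-in for an expensive item-wise Q-network Q(s, a)."""
    return item_embeddings[item_ids] @ state

def actor_scores(state):
    """Stand-in for a cheap actor that scores all items in one pass.
    Here it is a monotone transform of Q, so the shortlist is exact;
    a learned actor would only approximate the Q-ranking."""
    return item_embeddings @ (0.9 * state)

state = rng.normal(size=STATE_DIM)

# Value-based policy: evaluate Q on all NUM_ITEMS candidates.
q_all = q_value(state, np.arange(NUM_ITEMS))
slate_value_based = np.argsort(q_all)[-SLATE_SIZE:][::-1]

# Actor-based policy: shortlist CANDIDATES items with the actor,
# then evaluate Q only on that subset (50 evaluations vs 1000).
shortlist = np.argsort(actor_scores(state))[-CANDIDATES:]
q_subset = q_value(state, shortlist)
slate_actor_based = shortlist[np.argsort(q_subset)[-SLATE_SIZE:][::-1]]
```

In this toy setup the actor's ranking matches Q exactly, so both policies return the same slate while the actor-based one performs 20x fewer Q evaluations; with a learned actor the shortlist is approximate, which is the trade-off the abstract's empirical evaluation examines.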