Live Session
Friday Posters
Main Track
Can ChatGPT Make Fair Recommendation? A Fairness Evaluation Benchmark for Recommendation with Large Language Model
Jizhi Zhang (University of Science and Technology of China), Keqin Bao (University of Science and Technology of China), Yang Zhang (University of Science and Technology of China), Wenjie Wang (National University of Singapore), Fuli Feng (University of Science and Technology of China) and Xiangnan He (University of Science and Technology of China).
Abstract
The remarkable success of Large Language Models (LLMs) has given rise to a new paradigm of LLMs for recommendation (LLM4rec). However, whether LLM4rec can provide fair recommendations remains unexplored, since LLMs may encode societal biases. To avert the potential risks of deploying LLM4rec, we examine the fairness of LLM4rec with respect to users' sensitive attributes. Because LLM4rec diverges from the conventional recommendation paradigm, conventional recommendation fairness benchmarks cannot be applied directly. To explore fairness under LLM4rec, we propose a new benchmark, Fairness in Large language models for Recommendation (FairLR), which consists of carefully designed metrics and a dataset covering eight sensitive attributes in two recommendation scenarios: music and movies. Using FairLR, we evaluate ChatGPT and show that it still exhibits bias toward certain sensitive attributes when making recommendations. Our code and dataset can be found at https://anonymous.4open.science/r/FairLR-751D/.
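The abstract describes comparing recommendations across users' sensitive attributes. One plausible way such a probe could work, sketched here under stated assumptions, is to generate a recommendation list from a neutral prompt and from prompts where a sensitive attribute value is injected, then measure how much the lists diverge. The function names, prompts, and metrics below are illustrative assumptions, not FairLR's actual API or metric definitions.

```python
# Hedged sketch of a fairness probe: compare a neutral recommendation list
# against attribute-conditioned lists via Jaccard similarity. Large
# dissimilarity or spread across attribute values suggests the sensitive
# attribute influenced the recommendations. A real run would query the LLM;
# here the lists are hand-made toy data.

def jaccard(a, b):
    """Jaccard similarity of two recommendation lists (treated as sets)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def unfairness(neutral_recs, attr_recs_by_value):
    """Return (average dissimilarity to the neutral list,
    spread of similarity across attribute values)."""
    sims = {v: jaccard(neutral_recs, recs)
            for v, recs in attr_recs_by_value.items()}
    avg_sim = sum(sims.values()) / len(sims)
    spread = max(sims.values()) - min(sims.values())
    return 1.0 - avg_sim, spread

# Toy example: recommendations for a neutral prompt vs. gender-conditioned prompts.
neutral = ["songA", "songB", "songC", "songD"]
by_gender = {
    "male":   ["songA", "songB", "songC", "songD"],  # identical list
    "female": ["songA", "songE", "songF", "songG"],  # mostly different list
}
dissim, spread = unfairness(neutral, by_gender)
```

In this toy case the "male" list matches the neutral one exactly while the "female" list shares only one item, so both the dissimilarity and the spread are large, flagging a potential bias signal.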