Benchmark

The primary purpose of this dataset is to support the development of movie recommender systems. It is the first large-scale dataset in the recommender systems community that provides all types of precomputed content-based descriptors together with metadata in numerical feature format. Part of the data is used in MediaEval 2018 (task name: “Recommending Movies Using Content: Which content is key?”).

Our experiments provide baseline results that help researchers use this dataset and compare their results with other papers and experiments. We performed extensive experiments to identify the best-performing descriptor in unimodal and multimodal settings. For the multimodal setting, we used a late fusion scheme based on Borda count with a proposed linear weighting scheme, which was shown to significantly improve the performance of the hybrid approach.
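As a rough illustration of Borda-count late fusion, the sketch below merges the ranked lists produced by several descriptors into one list. The exact linear weighting used in our experiments is not spelled out here, so the per-descriptor weights, the point assignment (pool size minus position), and the function name `weighted_borda_fusion` are assumptions made for this example only.

```python
from collections import defaultdict


def weighted_borda_fusion(ranked_lists, weights=None, top_k=None):
    """Fuse ranked item lists with a weighted Borda count (illustrative only).

    ranked_lists: one list of item ids per descriptor, best item first.
    weights: one multiplier per descriptor (assumed linear weighting).
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("one weight per ranked list is required")

    # Candidate pool: every item that appears in at least one list.
    pool_size = len({item for ranking in ranked_lists for item in ranking})

    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for position, item in enumerate(ranking):
            # The top item earns pool_size points, the next pool_size - 1, ...
            # Items missing from a list earn nothing from that list.
            scores[item] += weight * (pool_size - position)

    # Highest score first; ties are broken by item id for determinism.
    fused = sorted(scores, key=lambda item: (-scores[item], item))
    return fused[:top_k] if top_k is not None else fused


# Hypothetical per-user rankings from an audio descriptor and from tags.
audio_ranking = ["m1", "m2", "m3"]
tag_ranking = ["m3", "m1", "m4"]

fused_equal = weighted_borda_fusion([audio_ranking, tag_ranking])
fused_tag_heavy = weighted_borda_fusion([audio_ranking, tag_ranking], weights=[1.0, 3.0])
```

With equal weights the item that ranks well in both lists comes first; raising the weight of one descriptor shifts the fused order toward that descriptor's ranking.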

The competing descriptors are: BLF and i-vector features for audio; AVF and AlexNet deep features for visual content; and genre labels together with user-generated tags for metadata (baseline). All experiments were carried out on a subset of the ML-20m rating dataset, obtained by randomly selecting 3000 users, each with a minimum of 50 ratings in their consumption profile. Results are reported as the average performance in a 5-fold cross-validation setup.
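The user selection and cross-validation setup can be sketched as follows. This is a minimal illustration, not the exact protocol: it assumes ratings are available as (user, movie, rating) triples and that the folds are built by splitting each user's ratings. The helper names `sample_users` and `k_fold_splits` are hypothetical.

```python
import random
from collections import defaultdict


def sample_users(ratings, n_users, min_ratings, seed=0):
    """Randomly pick up to n_users users that have at least min_ratings ratings.

    ratings: iterable of (user_id, movie_id, rating) triples.
    Returns {user_id: [(movie_id, rating), ...]} for the chosen users.
    """
    by_user = defaultdict(list)
    for user_id, movie_id, rating in ratings:
        by_user[user_id].append((movie_id, rating))
    # Sort eligible users so the sample is reproducible for a given seed.
    eligible = sorted(u for u, profile in by_user.items() if len(profile) >= min_ratings)
    rng = random.Random(seed)
    chosen = rng.sample(eligible, min(n_users, len(eligible)))
    return {u: by_user[u] for u in chosen}


def k_fold_splits(profile, n_folds=5, seed=0):
    """Yield (train, test) pairs for one user's ratings (assumed per-user split)."""
    items = list(profile)
    random.Random(seed).shuffle(items)
    folds = [items[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Toy data: u1 and u3 have 10 ratings each, u2 has only 3.
toy_ratings = [(u, f"m{i}", 4.0) for u in ("u1", "u3") for i in range(10)]
toy_ratings += [("u2", f"m{i}", 3.0) for i in range(3)]

selected = sample_users(toy_ratings, n_users=2, min_ratings=5)
splits = list(k_fold_splits(selected["u1"], n_folds=5))
```

For the setup described above, the corresponding call would use `n_users=3000` and `min_ratings=50` on the ML-20m ratings.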

Table: The best-performing descriptor or combination of descriptors with respect to mean reciprocal rank (MRR), mean average precision (MAP), and recall (R) at two cutoff values (@4 and @10).

Measure                    MRR@4      MAP@4      R@4        MRR@10     MAP@10     R@10
Best unimodal value        0.0233     0.0060     0.0052     0.0311     0.0042     0.0120
Best unimodal feature      i-vec      i-vec      i-vec      i-vec      i-vec      i-vec
Best multimodal value      0.0266     0.0072     0.0059     0.0359     0.0049     0.0139
Best multimodal feature    i-vec+tag  i-vec+tag  i-vec+tag  i-vec+tag  i-vec+tag  i-vec+tag
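For readers who want to compare their own results against the table, the three measures can be computed along the following lines. This is a sketch under common textbook definitions; in particular, normalizing average precision by min(k, number of relevant items) is an assumption and may differ from the evaluation code behind the reported numbers. All function names are illustrative.

```python
def reciprocal_rank_at_k(recommended, relevant, k):
    """1 / rank of the first relevant item within the top k, else 0."""
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(recommended, relevant, k):
    """Average of precision values at each hit within the top k.

    Normalized by min(k, |relevant|), which is one common convention.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(k, len(relevant))


def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear within the top k."""
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)


def mean_over_users(metric, recommendations, ground_truth, k):
    """Average a per-user metric over all users in ground_truth."""
    users = list(ground_truth)
    total = sum(metric(recommendations.get(u, []), ground_truth[u], k) for u in users)
    return total / len(users)


# Toy example with two users.
recommendations = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y"]}
ground_truth = {"u1": {"b", "d", "e"}, "u2": {"x"}}

mrr_at_4 = mean_over_users(reciprocal_rank_at_k, recommendations, ground_truth, 4)
map_at_4 = mean_over_users(average_precision_at_k, recommendations, ground_truth, 4)
recall_at_4 = mean_over_users(recall_at_k, recommendations, ground_truth, 4)
```

In a cross-validation setup, these per-fold means would then be averaged across the folds.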