The MVCD-7K dataset provides a stable and extensive source for devising and evaluating movie recommender systems. MVCD-7K contains audio and visual descriptors, in addition to ratings and metadata, for 6877 movie clips corresponding to 796 unique movies. Hence, each movie is associated with about 8.64 clips on average (6877 / 796). All 796 movies are linked to the ML-20M dataset, from which users’ individual ratings of the movies can be obtained.
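Because descriptors are given per clip while ML-20M ratings are given per movie, one common first step is to aggregate clip descriptors to the movie level and then join them with the ratings. The following is a minimal sketch of that idea, not the dataset's official loading code: the toy rows, the tuple layouts, and the function names `movie_level_features` and `attach_features` are all hypothetical, and mean pooling is only one possible aggregation choice. The only detail taken from ML-20M is that ratings are keyed by `userId` and `movieId`.

```python
from collections import defaultdict
from statistics import fmean

# Toy stand-ins for the real files; values and layouts are hypothetical.
# Each clip row: (clip_id, movieId, descriptor vector).
clips = [
    ("c1", 1, [0.2, 0.4]),
    ("c2", 1, [0.4, 0.8]),
    ("c3", 2, [1.0, 0.0]),
]
# Each rating row mirrors the ML-20M ratings columns: (userId, movieId, rating).
ratings = [(10, 1, 4.0), (10, 2, 2.5), (11, 1, 5.0)]


def movie_level_features(clip_rows):
    """Average the clip descriptors of each movie, dimension by dimension."""
    by_movie = defaultdict(list)
    for _clip_id, movie_id, vector in clip_rows:
        by_movie[movie_id].append(vector)
    return {
        movie_id: [fmean(dim) for dim in zip(*vectors)]
        for movie_id, vectors in by_movie.items()
    }


def attach_features(rating_rows, features):
    """Join ratings with movie-level features on movieId, dropping unmatched movies."""
    return [
        (user_id, movie_id, rating, features[movie_id])
        for user_id, movie_id, rating in rating_rows
        if movie_id in features
    ]


features = movie_level_features(clips)
training_rows = attach_features(ratings, features)
```

Each element of `training_rows` pairs one user rating with the pooled descriptor of the rated movie, which is the shape a content-based recommender would typically consume. Other pooling strategies (for example, keeping per-clip vectors or taking the median) would fit the same structure.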

The dataset therefore facilitates research on content-based recommender systems, where content refers not only to metadata but specifically to the visual and auditory characteristics of movies. The data also comes with several baseline benchmarking results for uni-modal and multi-modal recommendation systems. In addition, the rich data supports the exploration of other multimedia tasks such as popularity prediction, genre classification, or auto-tagging (also known as tag prediction).

The MVCD-7K dataset was created as a joint research effort by Yashar Deldjoo (Politecnico di Bari, Italy), Mihai Gabriel Constantin (University Politehnica of Bucharest, Romania), Bogdan Ionescu (University Politehnica of Bucharest, Romania), Markus Schedl (Johannes Kepler University Linz, Austria), and Paolo Cremonesi (Politecnico di Milano, Italy).

Download the dataset

This is a temporary download link to the folder. This website is currently under design and will be finalized on Sep 10th, 2019.

We would like to acknowledge MovieLens for providing a stable benchmark dataset of movies with individual user ratings and metadata, which enables research on movie recommendation. Please consult the MovieLens-20M web page for more details on the ratings and tags datasets.

To acknowledge the dataset, please cite one or both of the following works:

For further inquiries, feel free to contact Yashar Deldjoo through his email: .