Art to Genre through Deep Learning: A Comparative Analysis of ResNet and EfficientNet for Album Cover Image-Based Music Classification
Bernsdorff Wallstedt, Simon, January 2024
Musical genres enable listeners to differentiate between diverse styles and forms of music, serving as a practical tool for organizing and categorizing artists, albums, and songs. Album covers, whose graphic depictions reflect the vibe and tone of the music, serve as a visual intermediary between the artist and the audience. While numerous machine learning techniques combine textual, visual, and audio information in a multi-modal approach to categorize music, approaches that rely solely on visual information, specifically album cover images, and the correlation between such images and musical genres have received less attention. The following question guides this research: how do EfficientNet and ResNet compare in their ability to accurately classify album cover images into specific genres based solely on visual features? Two state-of-the-art convolutional neural networks, ResNet and EfficientNet, are used to classify a newly created dataset (the EquiGen dataset) of 60,000 album cover images into 15 distinct genres. The dataset was divided into 70% for training, 15% for validation, and 15% for testing.

The findings show that both ResNet and EfficientNet achieve better-than-random classification accuracy, indicating that visual features alone can be informative for genre classification. Some genres, namely Metal, New Age, and Rap, were classified considerably more accurately than others. EfficientNet performed slightly better than ResNet, with higher accuracy, precision, recall, and F1 scores. However, both models struggled to generalize well to unseen data and showed signs of overfitting.

This study contributes to interdisciplinary research on Music Genre Categorization (MGC), machine learning, and music.
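The split sizes, the "better-than-random" baseline, and the averaged metrics mentioned above can be made concrete with a small sketch. It assumes, hypothetically, that EquiGen is balanced (15 genres × 4,000 images), that the split is stratified by genre, and that precision, recall, and F1 are macro-averaged; the abstract confirms none of these details. The helper names `stratified_split` and `macro_prf` are illustrative and are not taken from the thesis.

```python
import random
from collections import defaultdict


def stratified_split(labels, train=0.70, val=0.15, seed=0):
    """Split sample indices per class into (train, val, test) lists.

    Assumption: stratification is illustrative; the thesis does not say
    how its 70/15/15 split was drawn.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train_idx, val_idx, test_idx = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train)
        n_val = round(len(idxs) * val)
        train_idx += idxs[:n_train]
        val_idx += idxs[n_train:n_train + n_val]
        test_idx += idxs[n_train + n_val:]  # remainder goes to test
    return train_idx, val_idx, test_idx


def macro_prf(y_true, y_pred, num_classes):
    """Unweighted mean of per-class precision, recall, and F1 (assumed macro averaging)."""
    p_sum = r_sum = f_sum = 0.0
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        p_sum += prec
        r_sum += rec
        f_sum += f1
    return p_sum / num_classes, r_sum / num_classes, f_sum / num_classes


# Hypothetical balanced labels: 15 genres x 4,000 images = 60,000 images.
NUM_GENRES, PER_GENRE = 15, 4_000
labels = [g for g in range(NUM_GENRES) for _ in range(PER_GENRE)]
tr, va, te = stratified_split(labels)
print(len(tr), len(va), len(te))  # 42000 9000 9000

# With balanced classes, uniform random guessing is correct 1/15 of the time.
chance = 1 / NUM_GENRES
print(f"chance accuracy: {chance:.4f}")  # chance accuracy: 0.0667
```

Under these assumptions, a 70/15/15 split yields 42,000 training, 9,000 validation, and 9,000 test images. Any accuracy clearly above about 6.7% beats random guessing. Stratifying keeps each genre equally represented in every subset, so per-genre results such as those for Metal, New Age, and Rap are measured on test sets of equal size.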