With the volume of digital media available today, automatic music recommendation services have proven a useful tool for consumers, helping them discover new and enjoyable music. Typically, this technology is based on collaborative filtering techniques, which use human-generated metadata as the basis for recommendations. Recently, work on content-based recommendation systems has emerged in which the audio signal itself is analyzed for relevant musical information, from which models are built that attempt to mimic human similarity judgments.
The current state of the art for content-based music recommendation uses a timbre model based on Mel-frequency cepstral coefficients (MFCCs) calculated on short segments of tracks. These feature vectors are then modeled using Gaussian mixture models (GMMs). GMM modeling of frame-based MFCCs has been shown to perform fairly well on timbre similarity tasks. However, a common problem is that of hubs, in which a relatively small number of songs falsely appear similar to many other songs, significantly decreasing the accuracy of similarity recommendations.
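To make the notion of a hub concrete, the sketch below counts how often each song appears in the nearest-neighbor lists of the other songs. This is one common way to quantify hubness, and it is assumed here for illustration rather than taken from the thesis. The function name `n_occurrences`, the toy distance matrix, and the choice of `k` are all hypothetical.

```python
def n_occurrences(dist, k):
    """Count how often each item appears in other items' k-nearest-neighbor lists.

    dist is a square matrix (list of lists) of pairwise distances.
    """
    n = len(dist)
    counts = [0] * n
    for q in range(n):
        # Rank every other item by its distance to the query, smallest first.
        others = sorted((j for j in range(n) if j != q), key=lambda j: dist[q][j])
        for j in others[:k]:
            counts[j] += 1
    return counts


# Toy symmetric distance matrix in which item 0 is close to everything.
dist = [
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 3.0, 4.0],
    [1.0, 3.0, 0.0, 5.0],
    [1.0, 4.0, 5.0, 0.0],
]
counts = n_occurrences(dist, k=1)
print(counts)
```

In this toy case item 0 is the nearest neighbor of every other item, so it behaves like a hub. The counts always sum to the number of items times `k`, so a hub's high count comes at the expense of items that rarely or never appear.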
In this thesis, we explore the origins of hubs in timbre-based modeling and propose several remedies. Specifically, we find that a process of model homogenization, in which certain components of a mixture model are systematically removed, improves performance as measured against several ground-truth similarity metrics. Extending the work of Aucouturier, we introduce several new methods of homogenization.
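As a rough illustration of what removing mixture components can look like, the sketch below drops components whose mixture weight falls below a threshold and renormalizes the remaining weights. The abstract does not specify the removal criteria, so the weight threshold, the function name `homogenize_by_weight`, and the tuple layout are assumptions made for illustration only.

```python
def homogenize_by_weight(components, min_weight):
    """Drop low-weight mixture components and renormalize the rest.

    components is a list of (weight, mean, variance) tuples; this layout
    is a hypothetical choice for the sketch, not the thesis's data format.
    """
    kept = [c for c in components if c[0] >= min_weight]
    if not kept:
        raise ValueError("threshold removes every component")
    total = sum(w for w, _, _ in kept)
    # Rescale so the surviving weights again sum to one.
    return [(w / total, mean, var) for w, mean, var in kept]


# Toy three-component model with one very small, very narrow component.
gmm = [
    (0.50, [0.0, 1.0], [1.0, 1.0]),
    (0.45, [2.0, -1.0], [0.5, 2.0]),
    (0.05, [9.0, 9.0], [0.01, 0.01]),
]
reduced = homogenize_by_weight(gmm, min_weight=0.1)
print(len(reduced), sum(w for w, _, _ in reduced))
```

Other criteria, such as thresholds on component variance, would follow the same pattern of filtering and then renormalizing.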
On a subset of the uspop data set, model homogenization improves artist R-precision by a maximum of 3.5% and agreement with user collection co-occurrence data by 7.4%. We also find differences in the effectiveness of the various homogenization methods for hub reduction, with the proposed methods providing the best results.
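For readers unfamiliar with the metric, the sketch below computes artist R-precision under its usual definition: for each query song, R is the number of other songs by the same artist, and the score is the fraction of the R nearest neighbors that share that artist. The exact evaluation protocol of the thesis is not given in the abstract, so the function name, the toy data, and the handling of artists with a single song are assumptions.

```python
def artist_r_precision(dist, artists):
    """Mean R-precision over all queries that have at least one same-artist song."""
    n = len(dist)
    scores = []
    for q in range(n):
        # R = number of other songs by the query's artist.
        r = sum(1 for j in range(n) if j != q and artists[j] == artists[q])
        if r == 0:
            continue  # assumption: skip artists with only one song
        ranked = sorted((j for j in range(n) if j != q), key=lambda j: dist[q][j])
        hits = sum(1 for j in ranked[:r] if artists[j] == artists[q])
        scores.append(hits / r)
    return sum(scores) / len(scores) if scores else 0.0


artists = ["a", "a", "b", "b"]
dist = [
    [0.0, 1.0, 2.0, 0.5],
    [1.0, 0.0, 2.0, 3.0],
    [2.0, 2.0, 0.0, 1.0],
    [0.5, 3.0, 1.0, 0.0],
]
score = artist_r_precision(dist, artists)
print(score)
```

Here songs 0 and 3 are each other's nearest neighbors despite belonging to different artists, so two of the four queries miss and the score is one half.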
Further, we extend the modeling of frame-based MFCC features by using a kernel density estimation approach to non-parametric modeling. We find that such an approach significantly reduces the number of hubs (by 2.6% of the dataset) while improving agreement with ground truth by 5% and slightly improving artist R-precision as compared with the standard parametric model.
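The non-parametric idea can be sketched as follows: instead of fitting a small number of Gaussians, every frame of a song acts as the center of its own kernel, and another song's frames are scored by their average log-density under that estimate. The isotropic Gaussian kernel, the fixed `bandwidth`, and both function names are assumptions for illustration; the abstract does not state the kernel or bandwidth selection used in the thesis.

```python
import math


def kde_log_likelihood(x, frames, bandwidth):
    """Log-density of point x under an isotropic Gaussian KDE centered on frames."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth ** 2)
    log_terms = []
    for f in frames:
        sq_dist = sum((xi - fi) ** 2 for xi, fi in zip(x, f))
        log_terms.append(log_norm - 0.5 * sq_dist / bandwidth ** 2)
    # Log-sum-exp for numerical stability, then average over the kernels.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms)) - math.log(len(frames))


def avg_log_likelihood(query_frames, model_frames, bandwidth):
    """Average log-density of one song's frames under another song's KDE."""
    total = sum(kde_log_likelihood(x, model_frames, bandwidth) for x in query_frames)
    return total / len(query_frames)


# Toy two-dimensional "feature frames" standing in for MFCC vectors.
frames_a = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
frames_b = [[5.0, 5.0], [5.1, 4.9]]
near = avg_log_likelihood(frames_a, frames_a, bandwidth=1.0)
far = avg_log_likelihood(frames_b, frames_a, bandwidth=1.0)
print(near > far)
```

A higher average log-likelihood indicates that the query frames lie closer to the modeled song's frames. In practice a symmetric measure would typically combine both directions.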
Finally, to test whether these principles hold for all musical data, we introduce an entirely new data set consisting of Indian classical music. We find that our results generalize here as well, suggesting that hubness is a general feature of timbre-based music similarity modeling and that the techniques presented to improve this modeling are effective for diverse types of music.
Identifier | oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/22572 |
Date | 01 April 2008 |
Creators | Godfrey, Mark Thomas |
Publisher | Georgia Institute of Technology |
Source Sets | Georgia Tech Electronic Thesis and Dissertation Archive |
Detected Language | English |
Type | Thesis |