Predicting Music Genre Preferences Based on Online Comments

Communication Accommodation Theory (CAT) states that individuals adapt to each other’s communicative behaviors. This adaptation is called “convergence.” In this work we explore the convergence of writing styles among users of the online music distribution platform SoundCloud.com. To evaluate our system, we created a corpus of over 38,000 comments retrieved from SoundCloud in April 2014. The corpus represents comments from 8 distinct musical genres: Classical, Electronic, Hip Hop, Jazz, Country, Metal, Folk, and World. The corpus is characterized by short comments, frequent misspellings, little sentence structure, hashtags, emoticons, and URLs. We adapt techniques used by researchers analyzing other short web-text corpora to deal with these problems. We use a supervised machine learning approach to classify the genre of comments in our corpus, and we examine the effects of different feature sets and supervised machine learning algorithms on classification accuracy. In total we ran 180 experiments in which we varied the number of genres, the feature set composition, and the machine learning algorithm. In experiments with all 8 genres we achieve up to 40% accuracy using either a Naive Bayes classifier or a C4.5-based classifier with a feature set consisting of 1262 token unigrams and bigrams. This represents a threefold improvement over chance levels.
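The approach summarized above (token unigram and bigram features fed to a Naive Bayes classifier) can be illustrated with a minimal sketch. This is not the thesis implementation: the tokenizer, the `NaiveBayes` class, the add-one smoothing, and the four toy comments are all assumptions made up for illustration, and the chance-level arithmetic assumes the 8 genres are equally likely.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(comment):
    # Hypothetical tokenizer: lowercase, keep hashtags, a few emoticons, and words.
    return re.findall(r"#\w+|[:;]-?[()dp]|\w+", comment.lower())


def features(comment):
    # Token unigrams plus adjacent-token bigrams.
    tokens = tokenize(comment)
    bigrams = [a + " " + b for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, comments, genres):
        self.class_counts = Counter(genres)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for comment, genre in zip(comments, genres):
            feats = features(comment)
            self.feature_counts[genre].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, comment):
        total = sum(self.class_counts.values())
        best_genre, best_score = None, -math.inf
        for genre, n in self.class_counts.items():
            counts = self.feature_counts[genre]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in features(comment):
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre


# Toy comments invented for this sketch; they are not from the SoundCloud corpus.
train = [
    ("sick drop, this bass is huge", "Electronic"),
    ("love the bass drop #edm", "Electronic"),
    ("beautiful violin phrasing in the second movement", "Classical"),
    ("the violin tone is beautiful", "Classical"),
]
model = NaiveBayes().fit([c for c, _ in train], [g for _, g in train])
pred_electronic = model.predict("that bass drop")
pred_classical = model.predict("beautiful violin")

# Chance level for 8 equally likely genres, and the reported 40% relative to it.
chance = 1 / 8
improvement = 0.40 / chance
```

With equally likely genres, chance is 12.5%, so 40% accuracy is about 3.2 times chance, which matches the improvement reported in the abstract.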

Identifier: oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-2333
Date: 01 June 2014
Creators: Sinclair, Andrew J
Publisher: DigitalCommons@CalPoly
Source Sets: California Polytechnic State University
Detected Language: English
Type: text
Format: application/pdf
Source: Master's Theses
