The popularity of AI audio applications is growing: they are used in chatbots, automated voice translation, virtual assistants, and text-to-speech synthesis. Audio classification is crucial today, as millions of existing audio files must be sorted and classified while new data continues to be uploaded. Within classification lies the difficult and lucrative problem of music recommendation. Research in music recommendation has trended towards collaborative approaches that rely on large amounts of user data. These approaches tend to suffer from the cold-start problem, in which new songs or users have too little interaction data, and they are costly to train. We look to recent advances in music generation to develop a content-based method that uses a joint embedding space to link text with music audio. This approach has not previously been applied to music recommendation. In this thesis, we examine the joint embedding methods used by recent AI music generation models and introduce a music recommendation system built on joint embeddings. This system can avoid the cold-start problem, reduce training costs for music recommendation, and serve as the foundation for a cost-efficient, content-based multimedia recommendation system. The current model, trained on MusicCaps, ranks the correct song within the top 50% to 80% of all songs for about 65% to 70% of tag inputs, and we expect better results with further training.
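The retrieval idea and the evaluation figure above can be illustrated with a minimal sketch. It assumes that text tags and songs have already been mapped into the same joint embedding space (the 2-D vectors below are made-up toy values, not outputs of the thesis model), that songs are ranked by cosine similarity to the tag embedding, and that "within the top p of all songs" means the correct song's rank divided by the catalogue size is at most p. The function names `cosine_similarity`, `rank_songs`, and `top_fraction_hit_rate` are hypothetical and not taken from the thesis.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_songs(text_emb: Vector, song_embs: Dict[str, Vector]) -> List[str]:
    """Return song IDs ordered from most to least similar to the text embedding."""
    return sorted(
        song_embs,
        key=lambda song_id: cosine_similarity(text_emb, song_embs[song_id]),
        reverse=True,
    )


def top_fraction_hit_rate(
    queries: List[Tuple[Vector, str]],
    song_embs: Dict[str, Vector],
    fraction: float,
) -> float:
    """Share of queries whose correct song lands in the top `fraction` of the ranking.

    A song at 1-based position k in a catalogue of n songs counts as a hit
    when k / n <= fraction (an assumed reading of "top X% of all songs").
    """
    n = len(song_embs)
    hits = 0
    for text_emb, correct_id in queries:
        position = rank_songs(text_emb, song_embs).index(correct_id) + 1
        if position / n <= fraction:
            hits += 1
    return hits / len(queries)


# Hypothetical toy embeddings standing in for a trained text/audio encoder.
songs = {
    "calm_piano": [1.0, 0.0],
    "heavy_metal": [0.0, 1.0],
    "jazz_trio": [0.7, 0.7],
}
queries = [
    ([0.9, 0.1], "calm_piano"),   # e.g. tag "soft piano"
    ([0.2, 0.9], "heavy_metal"),  # e.g. tag "loud guitars"
    ([1.0, 0.0], "jazz_trio"),    # e.g. tag "smooth jazz"; ranked 2nd of 3
]

print(rank_songs([0.9, 0.1], songs))  # ['calm_piano', 'jazz_trio', 'heavy_metal']
print(top_fraction_hit_rate(queries, songs, 0.5))  # 0.6666666666666666
```

Cosine similarity is used here because contrastively trained joint embeddings are commonly compared that way; sweeping `fraction` between 0.5 and 0.8 yields hit rates of the same form as the figures reported above, though real values depend on the trained encoder and the MusicCaps catalogue.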
Identifier | oai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:hut2024-1005 |
Date | 01 January 2024 |
Creators | Tran, Tina |
Publisher | STARS |
Source Sets | University of Central Florida |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Honors Undergraduate Theses |