1 |
1 |
Auditory-based processing of communication sounds
Walters, Thomas C. January 2011
This thesis examines the possible benefits of adapting a biologically-inspired model of human auditory processing as part of a machine-hearing system. Features were generated by an auditory model and used as input to machine-learning systems to determine the content of the sound. The features were generated using the auditory image model (AIM) and were used for speech recognition and audio search. AIM comprises processing to simulate the human cochlea, and a 'strobed temporal integration' process which generates a stabilised auditory image (SAI) from the input sound.
The communication sounds produced by humans, other animals, and many musical instruments take the form of a pulse-resonance signal: pulses excite resonances in the body, and the resonance following each pulse carries information both about the type of object producing the sound and about its size. In the case of humans, vocal tract length (VTL) determines the size properties of the resonance.
In the speech recognition experiments, an auditory filterbank was combined with a Gaussian fitting procedure to produce features which are invariant to changes in speaker VTL. These features were compared against standard mel-frequency cepstral coefficients (MFCCs) in a size-invariant syllable recognition task. The VTL-invariant representation was found to produce better results than MFCCs when the system was trained on syllables from simulated talkers with one range of VTLs and tested on those from simulated talkers with a different range of VTLs.
The image stabilisation process of strobed temporal integration was analysed. Based on the properties of the auditory filterbank being used, theoretical constraints were placed on the dynamic thresholding function used to perform strobe detection. These constraints were used to specify a simple yet robust strobe detection algorithm.
The syllable recognition system described above was then extended to produce features from profiles of the SAI and tested with the same syllable database as before. For clean speech, the performance of these features was comparable to that of the features generated from the filterbank output. However, when pink noise was added to the stimuli, performance dropped more slowly as a function of signal-to-noise ratio with the SAI-based AIM features than with either the filterbank-based features or the MFCCs, demonstrating the noise-robustness of the SAI representation.
The properties of the auditory filterbank in AIM were also analysed. Three models of the cochlea were considered: the static gammatone filterbank, the dynamic compressive gammachirp (dcGC) and the pole-zero filter cascade (PZFC). The dcGC and gammatone are standard filterbank models, whereas the PZFC is a filter cascade, which more accurately models signal propagation in the cochlea. However, while the architectures of the filterbanks differ, they have all been successfully fitted to psychophysical masking data from humans. The abilities of the filterbanks to measure pitch strength were assessed, using stimuli which evoke a weak pitch percept in humans, in order to ascertain whether there is any benefit in using the more computationally efficient PZFC.
Finally, a complete sound-effects search system using auditory features was constructed in collaboration with Google Research. Features were computed from the SAI by sampling the SAI space with boxes of different scales. Vector quantization (VQ) was used to convert this multi-scale representation into a sparse code.
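The last step described above, sampling the SAI with multi-scale boxes and vector-quantising the result, can be illustrated with a minimal Python sketch. The box geometry, patch size and per-box codebooks here are illustrative assumptions (codebooks would typically be learned beforehand, e.g. with k-means), not the configuration used in the thesis:

```python
import numpy as np

def multiscale_boxes(sai, scales=(1, 2, 4)):
    """Cut a stabilised auditory image (channels x lags) into rectangles at
    several scales and shrink each one to a small summary patch."""
    patches = []
    n_ch, n_lag = sai.shape
    for s in scales:
        h, w = n_ch // s, n_lag // s
        for i in range(0, n_ch - h + 1, h):
            for j in range(0, n_lag - w + 1, w):
                box = sai[i:i + h, j:j + w]
                patch = box[::max(h // 8, 1), ::max(w // 8, 1)]  # crude downsampling
                patches.append(patch.ravel())
    return patches

def sparse_code(patches, codebooks):
    """Vector-quantise each patch against its own codebook and concatenate the
    one-hot codes into a single long, sparse feature vector."""
    code = []
    for patch, codebook in zip(patches, codebooks):
        dists = np.linalg.norm(codebook - patch, axis=1)  # distance to each codeword
        onehot = np.zeros(len(codebook))
        onehot[np.argmin(dists)] = 1.0                    # nearest codeword wins
        code.append(onehot)
    return np.concatenate(code)
```

Each SAI frame thus becomes one long, mostly zero vector over the concatenated codebooks.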
The 'passive-aggressive model for image retrieval' (PAMIR) was used to learn the relationships between dictionary words and these auditory codewords. These auditory sparse codes were compared against sparse codes generated from MFCCs, and the best performance was found when using the auditory features.
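PAMIR is a bilinear ranking model trained with passive-aggressive updates on (text query, relevant sound, irrelevant sound) triplets. A rough sketch of one such update is given below; the variable shapes (a bag-of-words query vector q and sparse audio code vectors x) and the aggressiveness cap C are assumptions for illustration, not the exact formulation used in the thesis:

```python
import numpy as np

def pamir_update(W, q, x_pos, x_neg, C=1.0):
    """One passive-aggressive step for a bilinear ranker score(q, x) = q @ W @ x:
    leave W alone if the relevant item already wins by a margin of 1,
    otherwise take the smallest update that restores the margin."""
    margin = q @ W @ x_pos - q @ W @ x_neg
    loss = max(0.0, 1.0 - margin)                 # hinge loss on the triplet
    if loss > 0.0:
        V = np.outer(q, x_pos - x_neg)            # direction of the correction
        tau = min(C, loss / (np.linalg.norm(V) ** 2 + 1e-12))
        W += tau * V
    return W
```

Training loops this update over many sampled triplets; at query time, sounds are simply ranked by q @ W @ x.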
|
2 |
Music recommendation and discovery in the long tail
Celma Herrada, Òscar. 16 February 2009
Music consumption is biased towards a few popular artists. For instance, in 2007 only 1% of all digital tracks accounted for 80% of all sales. Similarly, 1,000 albums accounted for 50% of all album sales, and 80% of all albums sold were purchased fewer than 100 times. There is a need to assist people to filter, discover, personalise and recommend music from the huge amount of content available along the Long Tail.
Current music recommendation algorithms try to accurately predict what people demand to listen to. However, quite often these algorithms tend to recommend popular artists, or music already known to the user, which decreases the effectiveness of the recommendations. These approaches focus on improving the accuracy of the recommendations; that is, they try to make accurate predictions about what a user might listen to or buy next, independently of how useful the provided recommendations are to the user.
In this thesis we stress the importance of the user's perceived quality of the recommendations. We model the Long Tail curve of artist popularity in order to recommend potentially interesting and unknown music hidden in the tail of the popularity curve. Effective recommendation systems should promote novel and relevant material (non-obvious recommendations), taken primarily from the tail of the popularity distribution.
The main contributions of this thesis are: (i) a novel network-based approach for recommender systems, based on the analysis of the item (or user) similarity graph and the popularity of the items; (ii) a user-centric evaluation that measures the relevance and novelty of the recommendations as perceived by the user; and (iii) two prototype systems that implement the ideas derived from the theoretical work. Our findings have significant implications for recommender systems that assist users to explore the Long Tail, digging for content they might like.
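As a rough illustration of the kind of popularity-aware re-ranking argued for here, the sketch below splits a catalogue into head and tail by cumulative play-count share and blends precomputed item similarity scores with a novelty bonus. The 80% head share and the blending weight are arbitrary placeholders, and the thesis' actual Long Tail model and evaluation are considerably more involved:

```python
import numpy as np

def split_long_tail(play_counts, head_share=0.80):
    """Partition item indices into a popular head and a Long Tail by the
    cumulative share of total plays they account for."""
    order = np.argsort(play_counts)[::-1]                    # most played first
    cum_share = np.cumsum(play_counts[order]) / play_counts.sum()
    return order[cum_share <= head_share], order[cum_share > head_share]

def rerank_with_novelty(candidates, similarity, play_counts, alpha=0.7):
    """Re-rank candidate items (an index array) by blending their similarity
    scores with a novelty bonus that favours less popular items."""
    popularity = play_counts[candidates] / play_counts.max()
    score = alpha * similarity + (1.0 - alpha) * (1.0 - popularity)
    return candidates[np.argsort(score)[::-1]]
```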
|
3 |
基於電影拍攝手法之電影場景情緒探勘 / Emotion Discovery of Movie Content Based on Film Grammar
Liao, Chia Hui. Unknown Date
Movies play an important role in our daily lives, and analyzing the emotional content of movies has become a major research issue. Based on film grammar, there are many audiovisual cues in movies that are helpful for detecting the emotions of scenes. In this research, we investigate the relationship between audiovisual cues and the emotions of scenes so that the emotions of movie scenes can be annotated automatically.
First, the training scenes are labeled with emotions manually. Second, six classes of audiovisual features are extracted from all scenes: color, light, tempo, close-up, audio, and text. Finally, a graph-based approach, the Mixed Media Graph, is modified to mine the associations between the audiovisual features and the emotions of scenes. The experiments show that the accuracy reaches up to 70%.
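A Mixed Media Graph links media objects, their extracted feature nodes and annotation labels in a single graph and propagates evidence through it, typically via random walks with restart. The generic sketch below shows only that propagation step; the graph construction, node layout and restart probability are illustrative assumptions rather than the exact procedure used in this work:

```python
import numpy as np

def random_walk_with_restart(adjacency, seed, restart=0.35, iterations=100):
    """Propagate probability mass from a seed node over a graph, repeatedly
    mixing one step of the walk with a jump back to the seed."""
    col_sums = np.maximum(adjacency.sum(axis=0, keepdims=True), 1e-12)
    P = adjacency / col_sums                      # column-stochastic transition matrix
    r = np.zeros(adjacency.shape[0])
    r[seed] = 1.0                                 # restart distribution
    p = r.copy()
    for _ in range(iterations):
        p = (1.0 - restart) * (P @ p) + restart * r
    return p

# Hypothetical use: with nodes ordered as [scenes | feature nodes | emotion labels],
# start the walk at an unlabeled scene and read the probability mass that settles
# on the emotion-label nodes to choose its annotation.
```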
|
4 |
Inhaltsbasierte Analyse und Segmentierung narrativer, audiovisueller Medien / Content-based Analysis and Segmentation of Narrative, Audiovisual Media
Rickert, Markus. 26 September 2017
Audiovisual media, especially movies and TV shows, have developed over the last hundred years into major mass media. Today, large collections of audiovisual media are managed in databases and media libraries, and the content is provided to professional users as well as private consumers. A particular challenge lies in the indexing, searching and description of these multimedia assets.
The segmentation of audiovisual media, as a branch of video analysis, forms the basis for various applications in multimedia information retrieval, content browsing and video summarization. In particular, the segmentation into semantically meaningful scenes or sequences is difficult: it requires a special understanding of the cinematic style elements that were used to support the narration during the creative process of film production.
This work examines these cinematic style elements and how they can be used in algorithmic methods for analysis. For this purpose, an analysis framework was developed, together with a method for the sequence segmentation of films and videos. It can be shown that, using a multi-stage analysis process based on visual MPEG-7 descriptors, semantic relationships can be found in narrative audiovisual media which lead to an appropriate sequence segmentation.
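A single stage of the descriptor-driven grouping described here can be sketched as follows: consecutive shots are merged into one sequence as long as their visual descriptors stay close to the running group average, and a drop in similarity opens a new sequence. The L1-normalised histograms standing in for MPEG-7 colour descriptors and the fixed threshold are simplifying assumptions; the thesis combines several such stages and cues:

```python
import numpy as np

def segment_sequences(shot_descriptors, threshold=0.6):
    """Group consecutive shots into sequences: start a new sequence whenever a
    shot's descriptor similarity to the current group's centroid falls below
    the threshold. shot_descriptors: one L1-normalised vector per shot."""
    def similarity(a, b):
        return 1.0 - 0.5 * np.abs(a - b).sum()     # 1.0 = identical histograms
    boundaries = [0]                               # index of each sequence's first shot
    for i in range(1, len(shot_descriptors)):
        centroid = np.mean(shot_descriptors[boundaries[-1]:i], axis=0)
        if similarity(shot_descriptors[i], centroid) < threshold:
            boundaries.append(i)
    return boundaries
```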
|