Spelling suggestions: "subject:"speech prosodic""
1 |
Syllable fusion in Hong Kong Cantonese connected speechWong, Wai Yi Peggy 14 July 2006 (has links)
No description available.
|
2 |
Computational Affect Detection for Education and HealthCooper, David G. 01 September 2011 (has links)
Emotional intelligence has a prominent role in education, health care, and day to day interaction. With the increasing use of computer technology, computers are interacting with more and more individuals. This interaction provides an opportunity to increase knowledge about human emotion for human consumption, well-being, and improved computer adaptation. This thesis explores the efficacy of using up to four different sensors in three domains for computational affect detection. We first consider computer-based education, where a collection of four sensors is used to detect student emotions relevant to learning, such as frustration, confidence, excitement and interest while students use a computer geometry tutor. The best classier of each emotion in terms of accuracy ranges from 78% to 87.5%. We then use voice data collected in a clinical setting to differentiate both gender and culture of the speaker. We produce classifiers with accuracies between 84% and 94% for gender, and between 58% and 70% for American vs. Asian culture, and we find that classifiers for distinguishing between four cultures do not perform better than chance. Finally, we use video and audio in a health care education scenario to detect students' emotions during a clinical simulation evaluation. The video data provides classifiers with accuracies between 63% and 88% for the emotions of confident, anxious, frustrated, excited, and interested. We find the audio data to be too complex to single out the voice source of the student by automatic means. In total, this work is a step forward in the automatic computational detection of affect in realistic settings.
|
3 |
Emotion recognition from speech using prosodic featuresVäyrynen, E. (Eero) 29 April 2014 (has links)
Abstract
Emotion recognition, a key step of affective computing, is the process of decoding an embedded emotional message from human communication signals, e.g. visual, audio, and/or other physiological cues. It is well-known that speech is the main channel for human communication and thus vital in the signalling of emotion and semantic cues for the correct interpretation of contexts. In the verbal channel, the emotional content is largely conveyed as constant paralinguistic information signals, from which prosody is the most important component. The lack of evaluation of affect and emotional states in human machine interaction is, however, currently limiting the potential behaviour and user experience of technological devices.
In this thesis, speech prosody and related acoustic features of speech are used for the recognition of emotion from spoken Finnish. More specifically, methods for emotion recognition from speech relying on long-term global prosodic parameters are developed. An information fusion method is developed for short segment emotion recognition using local prosodic features and vocal source features. A framework for emotional speech data visualisation is presented for prosodic features.
Emotion recognition in Finnish comparable to the human reference is demonstrated using a small set of basic emotional categories (neutral, sad, happy, and angry). A recognition rate for Finnish was found comparable with those reported in the western language groups. Increased emotion recognition is shown for short segment emotion recognition using fusion techniques. Visualisation of emotional data congruent with the dimensional models of emotion is demonstrated utilising supervised nonlinear manifold modelling techniques. The low dimensional visualisation of emotion is shown to retain the topological structure of the emotional categories, as well as the emotional intensity of speech samples.
The thesis provides pattern recognition methods and technology for the recognition of emotion from speech using long speech samples, as well as short stressed words. The framework for the visualisation and classification of emotional speech data developed here can also be used to represent speech data from other semantic viewpoints by using alternative semantic labellings if available. / Tiivistelmä
Emootiontunnistus on affektiivisen laskennan keskeinen osa-alue. Siinä pyritään ihmisen kommunikaatioon sisältyvien emotionaalisten viestien selvittämiseen, esim. visuaalisten, auditiivisten ja/tai fysiologisten vihjeiden avulla. Puhe on ihmisten tärkein tapa kommunikoida ja on siten ensiarvoisen tärkeässä roolissa viestinnän oikean semanttisen ja emotionaalisen tulkinnan kannalta. Emotionaalinen tieto välittyy puheessa paljolti jatkuvana paralingvistisenä viestintänä, jonka tärkein komponentti on prosodia. Tämän affektiivisen ja emotionaalisen tulkinnan vajaavaisuus ihminen-kone – interaktioissa rajoittaa kuitenkin vielä nykyisellään teknologisten laitteiden toimintaa ja niiden käyttökokemusta.
Tässä väitöstyössä on käytetty puheen prosodisia ja akustisia piirteitä puhutun suomen emotionaalisen sisällön tunnistamiseksi. Työssä on kehitetty pitkien puhenäytteiden prosodisiin piirteisiin perustuvia emootiontunnistusmenetelmiä. Lyhyiden puheenpätkien emotionaalisen sisällön tunnistamiseksi on taas kehitetty informaatiofuusioon perustuva menetelmä käyttäen prosodian sekä äänilähteen laadullisten piirteiden yhdistelmää. Lisäksi on kehitetty teknologinen viitekehys emotionaalisen puheen visualisoimiseksi prosodisten piirteiden avulla.
Tutkimuksessa saavutettiin ihmisten tunnistuskykyyn verrattava automaattisen emootiontunnistuksen taso käytettäessä suppeaa perusemootioiden joukkoa (neutraali, surullinen, iloinen ja vihainen). Emootiontunnistuksen suorituskyky puhutulle suomelle havaittiin olevan verrannollinen länsieurooppalaisten kielten kanssa. Lyhyiden puheenpätkien emotionaalisen sisällön tunnistamisessa saavutettiin taas parempi suorituskyky käytettäessä fuusiomenetelmää. Emotionaalisen puheen visualisoimiseksi kehitetyllä opetettavalla epälineaarisella manifoldimallinnustekniikalla pystyttiin tuottamaan aineistolle emootion dimensionaalisen mallin kaltainen visuaalinen rakenne. Mataladimensionaalisen kuvauksen voitiin edelleen osoittaa säilyttävän sekä tutkimusaineiston emotionaalisten luokkien että emotionaalisen intensiteetin topologisia rakenteita.
Tässä väitöksessä kehitettiin hahmontunnistusmenetelmiin perustuvaa teknologiaa emotionaalisen puheen tunnistamiseksi käytettäessä sekä pitkiä että lyhyitä puhenäytteitä. Emotionaalisen aineiston visualisointiin ja luokitteluun kehitettyä teknologista kehysmenetelmää käyttäen voidaan myös esittää puheaineistoa muidenkin semanttisten rakenteiden mukaisesti.
|
4 |
The Effects of Internal and Experience-Based Factors on the Perception of Lexical Pitch Accent by Native and Nonnative Japanese ListenersGoss, Seth Joshua 19 May 2015 (has links)
No description available.
|
5 |
Vokinesis : instrument de contrôle suprasegmental de la synthèse vocale / Vokinesis : an instrument for suprasegmental control of voice synthesisDelalez, Samuel 28 November 2017 (has links)
Ce travail s'inscrit dans le domaine du contrôle performatif de la synthèse vocale, et plus particulièrement de la modification temps-réel de signaux de voix pré-enregistrés. Dans un contexte où de tels systèmes n'étaient en mesure de modifier que des paramètres de hauteur, de durée et de qualité vocale, nos travaux étaient centrés sur la question de la modification performative du rythme de la voix. Une grande partie de ce travail de thèse a été consacrée au développement de Vokinesis, un logiciel de modification performative de signaux de voix pré-enregistrés. Il a été développé selon 4 objectifs: permettre le contrôle du rythme de la voix, avoir un système modulaire, utilisable en situation de concert ainsi que pour des applications de recherche. Son développement a nécessité une réflexion sur la nature du rythme vocal et sur la façon dont il doit être contrôlé. Il est alors apparu que l'unité rythmique inter-linguistique de base pour la production du rythme vocale est de l'ordre de la syllabe, mais que les règles de syllabification sont trop variables d'un langage à l'autre pour permettre de définir un motif rythmique inter-linguistique invariant. Nous avons alors pu montrer que le séquencement précis et expressif du rythme vocal nécessite le contrôle de deux phases, qui assemblées forment un groupe rythmique: le noyau et la liaison rythmiques. Nous avons mis en place plusieurs méthodes de contrôle rythmique que nous avons testées avec différentes interfaces de contrôle. Une évaluation objective a permis de valider l'une de nos méthodes du point de vue de la précision du contrôle rythmique. De nouvelles stratégies de contrôle de la hauteur et de paramètres de qualité vocale avec une tablette graphique ont été mises en place. Une réflexion sur la pertinence de cette interface au regard de l'essor des nouvelles interfaces musicales continues nous a permis de conclure que la tablette est la mieux adaptée au contrôle expressif de l'intonation (parole), mais que les PMC (Polyphonic Multidimensional Controllers) sont mieux adaptés au contrôle de la mélodie (chant, ou autres instruments).Le développement de Vokinesis a également nécessité la mise en place de la méthode de traitement de signal VoPTiQ (Voice Pitch, Time and Quality modification), combinant une adaptation de l'algorithme RT-PSOLA et des techniques particulières de filtrage pour les modulations de qualité vocale. L'utilisation musicale de Vokinesis a été évaluée avec succès dans le cadre de représentations publiques du Chorus Digitalis, pour du chant de type variété ou musique contemporaine. L'utilisation dans un cadre de musique électro a également été explorée par l'interfaçage du logiciel de création musicale Ableton Live à Vokinesis. Les perspectives d'application sont multiples: études scientifiques (recherches en prosodie, en parole expressive, en neurosciences...), productions sonores et musicales, pédagogie des langues, thérapies vocales. / This work belongs to the field of performative control of voice synthesis, and more precisely of real-time modification of pre-recorded voice signals. In a context where such systems were only capable of modifying parameters such as pitch, duration and voice quality, our work was carried around the question of performative modification of voice rhythm. One significant part of this thesis has been devoted to the development of Vokinesis, a program for performative modification of pre-recorded voice. It has been developed under 4 goals: to allow for voice rhythm control, to obtain a modular system, usable in public performances situations as well as for research applications. To achieve this development, a reflexion about the nature of voice rhythm and how it should be controlled has been carried out. It appeared that the basic inter-linguistic rhtyhmic unit is syllable-sized, but that syllabification rules are too language-dependant to provide a invariant inter-linguistic rhythmic pattern. We showed that accurate and expressive sequencing of vocal rhythm is performed by controlling the timing of two phases, which together form a rhythmic group: the rhythmic nucleus and the rhythmic link. We developed several rhythm control methods, tested with several control interfaces. An objective evaluation showed that one of our methods allows for very accurate control of rhythm. New strategies for voice pitch and quality control with a graphic tablet have been established. A reflexion about the pertinence of graphic tablets for pitch control, regarding the rise of new continuous musical interfaces, lead us to the conclusion that they best fit intonation control (speech), but that PMC (Polyphonic Multidimensional controllers) are better for melodic control (singing, or other instruments).The development of Vokinesis also required the implementation of the VoPTiQ (Voice Pitch, Time and Quality modification) signal processing method, which combines an adaptation of the RT-PSOLA algorithm and some specific filtering techniques for voice quality modulations. The use of Vokinesis as a musical instrument has been successfully evaluated in public representations of the Chorus Digitalis ensemble, for various singing styles (from pop to contemporary music). Its use for electro music has also been explored by interfacing the Ableton Live composition environnment with Vokinesis. Application perspectives are diverse: scientific studies (research in prosody, expressive speech, neurosciences), sound and music production, language learning and teaching, speech therapies.
|
Page generated in 0.0487 seconds