
The relationship between borderline personality disorder traits and socio-cognitive functioning in the general population

Louis, Pascal 03 1900
Social difficulties are linked to borderline personality disorder (BPD). To understand them, the processing of information during social decisions must be studied. One form of social information is emotional facial expressions, which are identified less accurately by individuals with BPD than by individuals without a diagnosed mental health issue (Daros et al., 2013). In addition, one study reports that participants with BPD accepted more monetary offers, regardless of the fairness of the amount offered, than participants without a diagnosed mental health issue in a social decision-making task simulating interactions with partners (represented by happy or angry faces) dividing money with the participants (Polgár et al., 2014). The same study also reports that, for offers representing 20-30% of the divided sum, participants without a diagnosed mental health issue accepted offers from happy faces more frequently, whereas this effect was absent in participants with BPD. The aim of the present study is therefore to determine whether differences in emotion recognition ability mediate the relationship between BPD trait severity and social decisions to accept monetary offers accompanied by facial expressions (happy, angry, and neutral). Our results support the negative association of BPD traits with emotion recognition (Daros et al., 2013) and the positive association between offer acceptance and BPD traits (Polgár et al., 2014). However, no mediating effect is observed. We therefore conclude that emotion recognition and social decision-making represent independent aspects of the sociocognitive profiles associated with BPD traits.
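The mediation question tested in this abstract (do emotion-recognition differences carry the effect of BPD traits on offer acceptance?) can be sketched with a simple regression-based mediation test. The data below are simulated and the variable names are illustrative stand-ins, not the study's actual measures; the path-b estimate uses the Frisch-Waugh residualization trick so only simple regressions are needed.

```python
# Regression-based mediation sketch (Baron-Kenny style) on simulated data.
# True data-generating process has NO mediation: traits affect recognition
# and acceptance separately, mirroring the study's conclusion.
import random

random.seed(0)
n = 500
bpd = [random.gauss(0, 1) for _ in range(n)]               # trait severity
recog = [-0.4 * x + random.gauss(0, 1) for x in bpd]       # emotion recognition
accept = [0.3 * x + random.gauss(0, 1) for x in bpd]       # offer acceptance

def slope(x, y):
    """OLS slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    a0 = my - b * mx
    return [yi - (a0 + b * xi) for xi, yi in zip(x, y)]

a = slope(bpd, recog)    # path a: traits -> recognition (negative here)
c = slope(bpd, accept)   # total effect c: traits -> acceptance (positive here)
# path b (recognition -> acceptance, controlling traits), via residualization:
b = slope(residuals(bpd, recog), residuals(bpd, accept))
indirect = a * b         # mediated (indirect) effect; near zero when no mediation
```

In a real analysis the indirect effect would be tested with a bootstrap or Sobel test rather than eyeballed, but the structure of the computation is the same.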

A deep-learning-based framework for novelty-aware explainable multimodal emotion recognition with situational knowledge

Mijanur Palash (16672533) 03 August 2023
Mental health significantly impacts issues like gun violence, school shootings, and suicide. There is a strong connection between mental health and emotional states. By monitoring emotional changes over time, we can identify triggering events, detect early signs of instability, and take preventive measures. This thesis focuses on the development of a generalized and modular system for human emotion recognition and explanation based on visual information. The aim is to address the challenges of effectively utilizing the different cues (modalities) available in the data to build a reliable and trustworthy emotion recognition system. The face is one of the most important media through which we express emotion. We therefore first propose SAFER, a novel facial emotion recognition system that incorporates background and place features, together with a detailed evaluation framework demonstrating its high accuracy and generalizability. However, relying solely on facial expressions for emotion recognition can be unreliable, as faces can be covered or deceptive. To enhance the system's reliability, we introduce EMERSK, a multimodal emotion recognition system that integrates various modalities, including facial expressions, posture, gait, and scene background, in a flexible and modular manner. It employs convolutional neural networks (CNNs), Long Short-Term Memory (LSTM) networks, and denoising auto-encoders to extract features from facial images, posture, gait, and scene background. In addition to multimodal feature fusion, the system utilizes situational knowledge derived from place type and adjective-noun pairs (ANPs) extracted from the scene, as well as the spatio-temporal average distribution of emotions, to generate comprehensive explanations for the recognition outcomes. Extensive experiments on different benchmark datasets demonstrate the superiority of our approach over existing state-of-the-art methods. The system achieves improved performance in accurately recognizing and explaining human emotions. Moreover, we investigate the impact of novelty, such as face masks during the Covid-19 pandemic, on emotion recognition. The study critically examines the limitations of mainstream facial expression datasets and proposes a novel dataset specifically tailored for facial emotion recognition with masked subjects. Additionally, we propose a continuous-learning-based approach that incorporates a novelty detector working in parallel with the classifier to detect and properly handle instances of novelty. This approach ensures robustness and adaptability in the automatic emotion recognition task, even in the presence of novel factors such as face masks. This thesis contributes to the field of automatic emotion recognition by providing a generalized and modular approach that effectively combines multiple modalities, ensuring reliable and highly accurate recognition. Moreover, it generates situational knowledge that is valuable for mission-critical applications and provides comprehensive explanations of the output. The findings and insights from this research have the potential to enhance the understanding and utilization of multimodal emotion recognition systems in various real-world applications.
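The abstract describes modular fusion of per-modality cues but does not specify the fusion scheme; a minimal late-fusion sketch, with hypothetical modality weights and invented probability values, might look like:

```python
# Minimal late-fusion sketch: combine per-modality emotion probability
# vectors by a weighted average. Emotion set, weights, and probabilities
# are illustrative only, not EMERSK's actual configuration.
EMOTIONS = ["happy", "sad", "angry", "neutral"]

def fuse(modality_probs, weights):
    """Weighted average of per-modality probability vectors."""
    fused = [0.0] * len(EMOTIONS)
    total_w = sum(weights[m] for m in modality_probs)
    for m, probs in modality_probs.items():
        w = weights[m] / total_w
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused

probs = {
    "face":    [0.7, 0.1, 0.1, 0.1],
    "posture": [0.4, 0.2, 0.2, 0.2],
    "scene":   [0.5, 0.1, 0.1, 0.3],
}
weights = {"face": 0.5, "posture": 0.25, "scene": 0.25}  # hypothetical
fused = fuse(probs, weights)
label = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]  # "happy"
```

The modularity claimed in the abstract falls out naturally here: dropping a modality (e.g., a masked face) only removes one entry from `probs`, and the remaining weights renormalize.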

Influence of interpersonal abilities on social decisions and their physiological correlates

Kaltwasser, Laura 17 February 2016
The concept of interpersonal abilities refers to performance measures of social cognition, such as the abilities to perceive and remember faces and the abilities to recognize and express emotions. The aim of this dissertation was to examine the influence of interpersonal abilities on social decisions. A particular focus lay on the quantification of individual differences in brain-behavior relationships associated with processing interpersonally relevant stimuli. Study 1 added to existing evidence on brain-behavior relationships, specifically between psychometric constructs of face cognition and event-related potentials associated with different stages of face processing (encoding, perception, and memory) in a familiarity decision. Our findings confirm a substantial relationship of the N170 latency and the early repetition effect (ERE) amplitude with three established face cognition ability factors: the shorter the N170 latency and the more pronounced the ERE amplitude, the better the performance in face perception and memory and the faster the speed of face cognition. Study 2 found that the ability to recognize fearful faces, as well as general spontaneous expressiveness during social interaction, is linked to prosocial choices in several socio-economic games. Sensitivity to the distress of others and spontaneous expressiveness appear to foster reciprocal interactions with prosocial others. Study 3 confirmed the model of strong reciprocity, in that prosociality drives negative reciprocity in the ultimatum game. Using multilevel structural equation modeling to estimate brain-behavior relationships of fairness preferences, we found that strong reciprocators show a more pronounced relative feedback-negativity amplitude in response to the faces of bargaining partners. Overall, the results of this dissertation suggest that established individual differences in behavioral measures of interpersonal ability are partly due to individual differences in brain mechanisms.
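At their simplest, the brain-behavior relationships quantified here (e.g., N170 latency versus face-cognition ability in Study 1) are correlations between an ERP measure and a behavioral score. The sketch below computes a Pearson correlation on made-up values; the dissertation itself uses multilevel structural equation modeling, which this does not reproduce.

```python
# Pearson correlation between an ERP measure (e.g., N170 latency in ms)
# and a face-cognition accuracy score. All values are invented; shorter
# latency pairing with higher accuracy yields a negative r, as in Study 1.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

n170_latency = [168, 172, 175, 180, 185, 190, 196, 201]          # hypothetical
face_accuracy = [0.95, 0.93, 0.92, 0.88, 0.86, 0.84, 0.80, 0.78]  # hypothetical
r = pearson_r(n170_latency, face_accuracy)  # strongly negative for these values
```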

Multilingual Speech Emotion Recognition using pretrained models powered by Self-Supervised Learning

Luthman, Felix January 2022
Society is based on communication, for which speech is the most prevalent medium. In day-to-day interactions we talk to each other, but it is not only the words spoken that matter; the emotional delivery matters as well. Extracting emotion from speech has therefore become a topic of research in the area of speech tasks. In recent years this area as a whole has adopted a self-supervised learning approach for learning speech representations from raw audio, without the need for any supplementary labelling. These speech representations can be leveraged for solving tasks limited by the availability of annotated data, be it for low-resource languages or a general lack of data for the task itself. This thesis aims to evaluate the performance of a set of pre-trained speech models by fine-tuning them in different multilingual environments and evaluating them thereafter. The model presented here is based on wav2vec 2.0 and correctly classifies 86.58% of samples across eight languages and four emotional classes when trained on those same languages. Experiments were conducted to gauge how well a model trained on seven languages performs on the language left out, which showed a considerable degree of similarity in how different cultures express vocal emotions, and further investigation showed that as little as a few minutes of in-domain data can increase performance substantially. This is promising even for niche languages, as the amount of available data may not be as large a hurdle as one might think. That said, increasing the amount of data from minutes to hours still yields substantial improvements, albeit to a lesser degree.
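The leave-one-language-out protocol described above can be sketched as a simple split generator: for each of the eight languages, train on the remaining seven and evaluate on the one held out. The language codes below are illustrative placeholders, not necessarily the thesis's language set.

```python
# Leave-one-language-out evaluation splits. For each held-out language,
# a model would be fine-tuned on the other seven and tested on the one
# left out. Language codes are placeholders for illustration.
LANGUAGES = ["en", "de", "fr", "it", "es", "sv", "fa", "el"]

def leave_one_language_out(languages):
    """Yield (train_languages, held_out_language) pairs."""
    for held_out in languages:
        train = [lang for lang in languages if lang != held_out]
        yield train, held_out

splits = list(leave_one_language_out(LANGUAGES))
# Eight splits, each training on seven languages and testing on the eighth.
```

The "few minutes of in-domain data" experiments would then correspond to moving a small slice of the held-out language's data into the training side of each split.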

The use of metadata in sound objects

Debaecker, Jean 09 October 2012
Emotion recognition in music is an industrial and academic challenge. In the age of the multimedia content explosion, we aim to design structured sets of terms, concepts, and metadata facilitating the organization of and access to knowledge. Our research question is the following: can we have a priori knowledge of an emotion with a view to eliciting it? In other words, to what extent can we record the emotions felt while listening to a musical work as metadata, and build a formal algorithmic structure that isolates the mechanism triggering emotions? Can we know the emotion a listener will feel before a song is heard, and can that emotion be elicited after listening? Can we formalize an emotion in order to save it and share it? We give an overview of existing research and the application context, and tackle the intrinsic epistemological issues raised by indexing emotion itself: through a psychological, physiological, and philosophical approach, we set out a conceptual framework of five demonstrations arguing that emotion cannot be measured with a view to its elicitation. Having established within our theoretical framework that formally indexing emotions is impossible, it remains for us to understand the indexing mechanics nonetheless proposed by industry and academia. Through the analysis of quantitative and qualitative surveys, we propose the production of an algorithm that makes listening recommendations for musical works.
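The recommendation algorithm itself is not specified in the abstract; a minimal sketch of emotion-metadata-based recommendation, ranking songs by cosine similarity over hypothetical emotion-tag vectors, could look like the following. Song names, the three-dimensional emotion space, and all tag values are invented for illustration.

```python
# Emotion-metadata recommender sketch: rank catalog entries by cosine
# similarity between their emotion-tag vectors and a target profile.
# The (joy, sadness, tension) space and all values are invented.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

catalog = {
    "song_a": (0.9, 0.1, 0.2),
    "song_b": (0.1, 0.8, 0.3),
    "song_c": (0.7, 0.2, 0.1),
}

def recommend(target, catalog, k=2):
    """Return the k catalog entries most similar to the target profile."""
    ranked = sorted(catalog, key=lambda s: cosine(catalog[s], target), reverse=True)
    return ranked[:k]

top = recommend((1.0, 0.0, 0.1), catalog)  # joyful target profile
```

This sidesteps the thesis's central objection, of course: the hard part is not the similarity ranking but whether the tag vectors can validly encode felt emotion in the first place.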

Modelling voice quality for expressive speech synthesis

Monzo Sánchez, Carlos Manuel 14 July 2010
This thesis was conducted within the existing framework of the Grup de Recerca en Tecnologies Mèdia (GTM) research group of Enginyeria i Arquitectura La Salle, with the aim of making human-machine interaction more natural. To do this, we start from the limitations of the technology used up to now, identifying points of improvement where we could contribute solutions. Given that the naturalness of speech is closely linked to the expressivity it communicates, these improvement points focus on the ability to work with emotions, or expressive speech styles in general.

The final goal of this thesis is the generation of expressive speech styles in the field of Text-to-Speech (TTS) systems aimed at Expressive Speech Synthesis (ESS), making it possible to communicate an oral message with a certain expressivity that the listener can correctly perceive and interpret. This goal involves several intermediate aims: to know the existing parameterization options, to understand each of the parameters, to weigh the pros and cons of their use, to discover the relations between them and the expressive speech styles, and, finally, to carry out expressive speech synthesis. The synthesis process itself involves previous work in emotion recognition, which could be a complete research field in its own right, since it shows the feasibility of using the selected parameters to discriminate between styles and provides the knowledge needed to extract models that can be used during synthesis.

The search for increased naturalness has implied a better characterization of emotional or expressive speech, so we have researched parameterizations that could perform this task. These are the Voice Quality (VoQ) parameters, whose main feature is that they are able to characterize speech individually, identifying each of the factors that make it unique. The potential benefits that this kind of parameterization can bring to natural interaction are twofold: the recognition and the synthesis of expressive speech styles. The proposed VoQ parameterization is not intended to replace the prosody already in use; on the contrary, it works together with prosody to improve the results obtained so far.

Once the parameter selection is done, the modelling of VoQ is addressed, i.e., the analysis and modification methodology, so that each parameter can be extracted from the voice signal and later modified during synthesis. Variations are also proposed for the traditionally used parameters involved, adapting their definitions to the expressive speech context. From here, we work on the relations with the expressive speech styles and finally present the methodology for transforming them, by jointly modifying VoQ and prosody, for ESS in a TTS system.
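Voice quality parameters of the kind discussed here commonly include perturbation measures such as jitter (cycle-to-cycle variation of the pitch period). The sketch below computes a standard local-jitter variant from a list of pitch periods; the period values are invented, and the thesis's exact parameter set and formulas are not given in the abstract.

```python
# Local jitter sketch: mean absolute difference between consecutive pitch
# periods, relative to the mean period. Period values (seconds) are invented.
def local_jitter(periods):
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period

steady = [0.0100, 0.0101, 0.0100, 0.0099, 0.0100]     # near-constant pitch
perturbed = [0.0100, 0.0110, 0.0095, 0.0112, 0.0090]  # more cycle variation
# A rougher, more perturbed voice yields a higher jitter value.
```

For analysis, such a measure is extracted per utterance and related to expressive style; for synthesis, the modification methodology would steer it (together with prosody) toward the target style's value.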

Emotion and motion: age-related differences in recognizing virtual agent facial expressions

Smarr, Cory-Ann 05 October 2011
Technological advances will allow virtual agents to increasingly help individuals with daily activities. As such, virtual agents will interact with users of various ages and experience levels. Facial expressions are often used to facilitate social interaction between agents and humans. However, older and younger adults do not label human or virtual agent facial expressions in the same way, with older adults commonly mislabeling certain expressions. The dynamic formation of facial expression, or motion, may provide additional facial information potentially making emotions less ambiguous. This study examined how motion affects younger and older adults in recognizing various intensities of emotion displayed by a virtual agent. Contrary to the dynamic advantage found in emotion recognition for human faces, older adults had higher emotion recognition for static virtual agent faces than dynamic ones. Motion condition did not influence younger adults' emotion recognition. Younger adults had higher emotion recognition than older adults for the emotions of anger, disgust, fear, happiness, and sadness. Low intensities of expression had lower emotion recognition than medium to high expression intensities.

An investigation into the feasibility of monitoring a call centre using an emotion recognition system

Stoop, Werner 04 June 2010
In this dissertation a method for the classification of emotion in speech recordings made in a customer service call centre of a large business is presented. The problem addressed here is that customer service analysts at large businesses have to listen to large numbers of call centre recordings in order to discover customer service-related issues. Since recordings where the customer exhibits emotion are more likely to contain useful information for service improvement than “neutral” ones, being able to identify those recordings should save a lot of time for the customer service analyst. MTN South Africa agreed to provide assistance for this project. The system that has been developed for this project can interface with MTN’s call centre database, download recordings, classify them according to their emotional content, and provide feedback to the user. The system faces the additional challenge that it is required to classify emotion notwith- standing the fact that the caller may have one of several South African accents. It should also be able to function with recordings made at telephone quality sample rates. The project identifies several speech features that can be used to classify a speech recording according to its emotional content. The project uses these features to research the general methods by which the problem of emotion classification in speech can be approached. The project examines both a K-Nearest Neighbours Approach and an Artificial Neural Network- Based Approach to classify the emotion of the speaker. Research is also done with regard to classifying a recording according to the gender of the speaker using a neural network approach. The reason for this classification is that the gender of a speaker may be useful input into an emotional classifier. The project furthermore examines the problem of identifying smaller segments of speech in a recording. 
In the typical call centre conversation, a recording may start with the agent greeting the customer, the customer stating his or her problem, the agent performing an action, during which time no speech occurs, the agent reporting back to the user and the call being terminated. The approach taken by this project allows the program to isolate these different segments of speech in a recording and discard segments of the recording where no speech occurs. This project suggests and implements a practical approach to the creation of a classifier in a commercial environment through its use of a scripting language interpreter that can train a classifier in one script and use the trained classifier in another script to classify unknown recordings. The project also examines the practical issues involved in implementing an emotional clas- sifier. It addresses the downloading of recordings from the call centre, classifying the recording and presenting the results to the customer service analyst. AFRIKAANS : n Metode vir die klassifisering van emosie in spraakopnames in die oproepsentrum van ’n groot sake-onderneming word in hierdie verhandeling aangebied. Die probleem wat hierdeur aangespreek word, is dat kli¨entediens ontleders in ondernemings na groot hoeveelhede oproepsentrum opnames moet luister ten einde kli¨entediens aangeleenthede te identifiseer. Aangesien opnames waarin die kli¨ent emosie toon, heel waarskynlik nuttige inligting bevat oor diensverbetering, behoort die vermo¨e om daardie opnames te identifiseer vir die analis baie tyd te spaar. MTN Suid-Afrika het ingestem om bystand vir die projek te verleen. Die stelsel wat ontwikkel is kan opnames vanuit MTN se oproepsentrum databasis verkry, klassifiseer volgens emosionele inhoud en terugvoering aan die gebruiker verskaf. Die stelsel moet die verdere uitdaging kan oorkom om emosie te kan klassifiseer nieteenstaande die feit dat die spreker een van verskeie Suid-Afrikaanse aksente het. 
It must also be able to analyse recordings made at telephone-quality sample rates. The project identifies several speech features that can be used to classify a recording according to its emotional content. The project uses these features to investigate the general methods by which the problem of emotion classification in speech can be approached. The project uses a K-Nearest Neighbours and a neural network approach to classify the emotion of the speaker. Research was furthermore done on classifying the gender of the speaker with a neural network. The reason for this classification is that the gender of the speaker may be a useful input to an emotion classifier. The project also investigates the problem of identifying segments of speech in a recording. In a typical call centre conversation the recording may begin with the agent greeting the customer, the customer stating his or her problem, the agent performing an action without speech, the agent reporting back to the customer and the call being terminated. The approach taken by this project allows the program to isolate these different segments from the recording and to cut out the portions in which no speech occurs. The project proposes and implements a practical approach to the development of a classifier in a commercial environment by using a scripting language interpreter that can train a classifier in one script and use the trained classifier in another script to classify unknown recordings. The project also investigates the practical aspects of implementing an emotion classifier: downloading recordings from the call centre, classifying them, and presenting the results to the customer service analyst. Copyright / Dissertation (MEng)--University of Pretoria, 2010.
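The speech/non-speech segmentation step the abstract describes can be sketched with a crude frame-energy detector: the signal is cut into fixed-length frames and frames whose energy falls below a threshold are discarded as silence. This is only an illustration of the general idea, not the project's actual algorithm; the frame length and threshold are arbitrary choices here.

```python
def segment_speech(samples, frame_len=160, threshold=0.01):
    """Split a signal into fixed-length frames and return the indices of
    frames whose mean squared energy exceeds a threshold (a crude VAD)."""
    active = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold:
            active.append(i // frame_len)
    return active

# Two frames of silence followed by a louder burst:
signal = [0.0] * 320 + [0.5, -0.5] * 160
print(segment_speech(signal))  # → [2, 3]
```

A production detector would smooth the decision over neighbouring frames so that brief pauses inside an utterance are not cut out.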
/ Electrical, Electronic and Computer Engineering / unrestricted
189

Rozpoznání emočního stavu z hrané a spontánní řeči / Emotion Recognition from Acted and Spontaneous Speech

Atassi, Hicham January 2014 (has links)
This dissertation deals with the recognition of the emotional state of speakers from the speech signal. The work is divided into two main parts: the first part describes the proposed methods for recognizing emotional states from acted databases. Within this part, recognition results are presented for two different databases with different languages. The main contributions of this part are a detailed analysis of a wide range of features extracted from the speech signal, the design of new classification architectures such as "emotion pairing", and the design of a new method for mapping discrete emotional states into a two-dimensional space. The second part deals with the recognition of emotional states from a spontaneous speech database obtained from recordings of calls in real call centres. The findings from the analysis and design of recognition methods for acted speech were used to design a new system for recognizing seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The work further examines the influence of the speaker's emotional state on the accuracy of gender recognition, and proposes a system for the automatic detection of successful calls in call centres based on an analysis of the dialogue parameters between the call participants.
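The fusion of different systems mentioned above can be sketched, under assumptions, as score-level fusion: each subsystem emits per-class posterior scores, and the fused decision is their weighted average. The subsystem names, class labels and scores below are hypothetical and serve only to show the mechanism.

```python
def fuse_posteriors(system_outputs, weights=None):
    """Weighted average of per-class score dicts from several classifiers;
    returns the fused scores and the winning label."""
    if weights is None:
        weights = [1.0 / len(system_outputs)] * len(system_outputs)
    fused = {}
    for scores, w in zip(system_outputs, weights):
        for label, p in scores.items():
            fused[label] = fused.get(label, 0.0) + w * p
    return fused, max(fused, key=fused.get)

# Two hypothetical subsystems with partly conflicting opinions:
acoustic = {"anger": 0.6, "joy": 0.1, "neutral": 0.3}
prosodic = {"anger": 0.4, "joy": 0.2, "neutral": 0.4}
fused, winner = fuse_posteriors([acoustic, prosodic])
print(winner)  # → anger
```

Unequal weights would let a more reliable subsystem dominate the fused decision; the weights themselves are typically tuned on held-out data.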
190

Určování stresu z řečového signálu / Stress recognition from speech signal

Staněk, Miroslav January 2016 (has links)
The presented dissertation deals with the development of algorithms for detecting stress from the speech signal. The novelty of this work lies in two types of speech signal analysis: the use of vowel polygons and the analysis of glottal pulses. Both of these basic analyses can serve to detect stress in the speech signal, as was demonstrated by a series of experiments. The best results were achieved using the so-called Closing-To-Opening phase ratio feature within the Top-To-Bottom criterion, in combination with a suitable classifier. Stress detection based on this analysis can be described as language- and phoneme-independent, which was likewise demonstrated by the obtained results, which in some cases reach up to 95% accuracy. All experiments were carried out on a newly created Czech database containing real stress, and some experiments were also carried out on the English stress database SUSAS.
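As a hedged illustration of how a Closing-To-Opening phase ratio feature might feed a stress decision, the sketch below averages the ratio of closing-phase to opening-phase duration over a few glottal cycles and compares it with a threshold. The exact feature definition, the threshold value and its direction are assumptions for illustration, not taken from the dissertation.

```python
def cto_ratio(closing_ms, opening_ms):
    """Mean ratio of glottal closing-phase to opening-phase duration
    over the supplied glottal cycles (durations in milliseconds)."""
    ratios = [c / o for c, o in zip(closing_ms, opening_ms)]
    return sum(ratios) / len(ratios)

def is_stressed(closing_ms, opening_ms, threshold=0.8):
    """Flag an utterance as stressed when the mean CTO ratio exceeds a
    (hypothetical) threshold."""
    return cto_ratio(closing_ms, opening_ms) > threshold

# Fabricated phase durations for one utterance:
print(is_stressed([2.1, 2.0, 2.2], [2.2, 2.1, 2.3]))  # → True
```

In a real system the phase durations would come from a glottal pulse estimator applied to voiced segments, and the threshold would be fitted on a labelled stress database.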
