211 |
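RELATOS VERBAIS DE CONSUMIDORES EM AVALIAÇÕES ON-LINE: PROSPECÇÃO COMPUTACIONAL E INTERPRETAÇÕES COM BASE NO BEHAVIORAL PERSPECTIVE MODEL (BPM) [Consumers' verbal reports in online reviews: computational prospecting and interpretations based on the Behavioral Perspective Model (BPM)]
Brito, Parcilene Fernandes de. 29 June 2018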
The vast amount of information available on the Internet has enabled numerous multidisciplinary investigations aimed at understanding nuances of human consumption behavior, especially at identifying people's opinions about products and services. Under the Behavioral Perspective Model (BPM), consumer behavior can be analyzed in terms of the antecedent variables (behavior setting and consumer learning history) and the consequences (utilitarian and informational reinforcement and punishment) of the behavior. This thesis investigated consumption behavior in the context of tourism, with the BPM as theoretical support for interpreting verbal data extracted from comments available on TripAdvisor®, a tourism website. It analyzed the verbal responses of tourist-consumers engaged in evaluating online the components of tourism products (specifically, accommodations [ACO], restaurants [RES] and attractions [ATR]). The research participants were the unknown individuals who, between the beginning of February and the end of March 2017, posted on TripAdvisor® 6,438,497 comments distributed among the 100 most reviewed Brazilian tourist destinations.
Described in two studies (Study 1 [E1] and Study 2 [E2]), the thesis research aimed to: a) extract and analyze tourists' verbal information (comments) using the computational technique of Sentiment Analysis (SA); extract the number of evaluative indications of tourism product components (ACO, RES and ATR) made by tourist-consumers with different statuses as TripCollaborators (TripColaboradores); and extract the number of helpful votes (Likes) on the comments; b) describe the polarized evaluative responding attributed to the 100 evaluated tourist destinations and interpret it in light of BPM concepts. E1 resulted in the development of the SentimentALL tool, focusing on its SA module, and in the generation of the primary variables explored in E2 (n = 197). In E2, the data generated in E1 and the derived measures were explored (described in rankings and correlation analyses) and interpreted using the BPM conceptual framework. Fundamentally exploratory in character, the interpretative effort suggested fruitful lines of research and the usefulness of integrating computational and psychological knowledge.
|
212 |
Uma abordagem de redes neurais convolucionais para análise de sentimento multi-lingual [A convolutional neural network approach for multilingual sentiment analysis]
Becker, Willian Eduardo. 24 November 2017
Nowadays, the use of social media has become a daily activity in our society. The huge and uninterrupted flow of information in these spaces opens up the possibility of exploring these data in different ways. Sentiment Analysis (SA) is a task that aims to determine the polarity of a given text by relying on several Natural Language Processing techniques, with most solutions dealing with only one language at a time. However, approaches that are not restricted to a single language are better placed to extract the full knowledge and possibilities of these data. Recent Machine Learning approaches to SA, based mainly on Deep Learning neural networks, have obtained good results in this task. This work proposes three Convolutional Neural Network architectures that deal with multilingual Twitter data in four languages. The first and second proposed models require substantially fewer learnable parameters than the other baselines considered, while being more accurate than several other deep neural architectures. The third proposed model performs multitask classification, identifying both the polarity of a given sentence and its language. This model reaches an accuracy of 74.43% for SA and 98.40% for Language Identification on the four-language multilingual dataset. The results indicate that the proposed model is the best choice for both sentiment and language classification, outperforming the considered baselines.
|
213 |
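Reflexões sobre uma experiência com a produção de textos on-line: uma análise das emoções expressas por alunos de ensino fundamental [Reflections on an experience with online text production: an analysis of the emotions expressed by elementary school students]
Silva, Leandro Coimbra da. 29 February 2016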
Funding: CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; FAPERGS - Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul.
This work aims to identify how students in a 6th grade class at a public school in Novo Hamburgo, Brazil, linguistically express their perception of their first experience of producing texts in Moodle. I thus seek to reflect on the implications of this perception for the teaching and learning process aimed at digital literacy. To that end, we offered five classes in an institutional Moodle course between June and July 2014. The classes followed the program of the Brazilian Olympiad of Portuguese Language (Olimpíada Brasileira de Língua Portuguesa), and the journal posts in the virtual environment make up our research corpus. I sought to build pathways toward understanding the triad of online teaching, cyberculture and digital literacy through a theoretical foundation based on Appraisal Theory (MARTIN; WHITE, 2005), aided by the Cognitive Psychology approach (SCHERER, 2005), at the intersection between digital literacy (SOARES, 2002; COSCARELLI, 2014) and multiliteracies (ROJO, 2009; 2012), and on issues pertaining to cyberculture/cyberspace (LÉVY, 1999; LEMOS, 2000-2008).
The analysis methodology rests on the conceptual categories of Appraisal Theory and on the guiding questions raised by the triad in Rojo (2009), Soares (2002, 2003, 2010), Coscarelli and Santos (2007), Coscarelli (2014), Franciosi, Medeiros and Cola (2003), Tori (2009), Lemos (2000), Lévy (1999), Dias (1999), Freitas (2010) and Santaella (2013). The data were organized according to the concept of Appraisal Groups (AGs) (WHITELAW; GARG; ARGAMON, 2005), which are coherent groups of words that together express a particular attitude. The results show 183 adjectives distributed across 163 AGs, from which we drew the following categories of analysis for reflection: (1) class, (2) Moodle, (3) production, (4) experience and (5) evaluator. Our analysis shows that Appraisal Theory is an effective method both for evaluating the digital environment, which is consolidating itself as a place of teaching and learning in primary education, and for mapping the emotion lexicon of the age group analyzed. Moreover, it shows the possibility of elementary school students being satisfied with the process of digital literacy carried out through the virtual learning environment.
|
214 |
THREE ESSAYS ON THE APPLICATION OF MACHINE LEARNING METHODS IN ECONOMICS
Lawani, Abdelaziz. 01 January 2018
Over the last decades, economics as a field has experienced a profound transformation from theoretical work toward an emphasis on empirical research (Hamermesh, 2013). One common constraint on empirical studies is access to data, along with the quality of the data and the time span they cover. In general, applied studies rely on surveys or on administrative or private-sector data. These data are limited and rarely have universal or near-universal population coverage. The growth of the internet has made available a vast amount of digital information. These big digital data are generated through social networks, sensors, and online platforms. They account for an increasing part of economic activity, yet for economists their availability also raises many new challenges related to the techniques needed to collect, manage, and derive knowledge from them.
The data are in general unstructured, complex, and voluminous, and the traditional software used for economic research is not always effective in dealing with these types of data. Machine learning is a branch of computer science that uses statistics to deal with big data. The objective of this dissertation is to reconcile machine learning and economics. It uses three case studies to demonstrate how data freely available online can be harvested and used in economics. The dissertation uses web scraping to collect a large volume of unstructured data online. It uses machine learning methods to derive information from the unstructured data and shows how this information can be used to answer economic questions or address econometric issues.
The first essay shows how machine learning can be used to derive sentiments from reviews and, using the sentiments as a measure of quality, examines an old economic theory: price competition in oligopolistic markets. The essay confirms the economic theory that agents compete on price. It also confirms that the quality measure derived from sentiment analysis of the reviews is a valid proxy for quality and influences price. The second essay uses a random forest algorithm to show that reviews can be harnessed to predict consumers' preferences. The third essay shows how property descriptions can be used to address an old but still current problem in hedonic pricing models: omitted variable bias. Using the Least Absolute Shrinkage and Selection Operator (LASSO), it shows that pricing errors in hedonic models can be reduced by including the descriptions of the properties in the models.
|
215 |
網路評價搜尋結果的正負意見分類系統 / A sentiment classification system on search results of web opinions
黃泓彰 (Huang, Hung Chang). Unknown date
In this research, we built a system with two main functions: searching for web opinions and sentiment classification. For the search function, the system uses Google to retrieve search results for web opinions on portable smart devices (smartphones, tablets and notebook computers). The sentiment classification function then classifies the results by their opinion of the product in four ways: (1) positive, negative or neutral; (2) positive or negative; (3) positive or non-positive; (4) negative or non-negative. To build the system, we first collected articles and product names related to portable smart devices from two popular web forums, Mobile01 (mobile01.com) and PTT (ptt.cc). We then manually labeled the sentiment (positive, negative or neutral) of each article and of sentences in a subset of the articles.
We designed a two-level supervised sentiment classification experiment. At the sentence level, we trained classifiers that label sentences as positive, negative or neutral, and the best sentence classifier was then used at the document level. At the document level, the sentence-level opinions are aggregated, and supervised classifiers were trained for the four classification problems listed above. The best model from each of the four experiments was used in the system. The positive vs. negative classifier performed best, with an average F-measure of 0.87; next came the negative vs. non-negative classifier, with an F-measure of 0.83 on the negative class, and then the positive vs. non-positive classifier, with an F-measure of 0.81 on the positive class. The three-class classifier (positive, negative or neutral) performed worst, with an average F-measure of 0.77. On positive vs. negative classification, our accuracy is no worse than that of earlier studies on English text. Finally, we ran a comparison experiment using a document-level-only classification method, and the results showed that the two-level approach (sentence level first, then document level) outperforms document-level-only sentiment classification.
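The two-level idea in this abstract (label sentences first, then aggregate those labels per document) can be illustrated with a minimal Python sketch. This is not the thesis's method: the thesis trains a supervised classifier at the document level, whereas the sketch substitutes a hand-written comparison rule, and the function names and the rule itself are assumptions made for illustration only.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentence_label_features(sentence_labels):
    """Turn a document's sentence-level labels into label proportions."""
    counts = Counter(sentence_labels)
    total = len(sentence_labels)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


def document_label(sentence_labels):
    """Hypothetical stand-in for a trained document-level classifier.

    The more frequent of positive/negative wins; a tie (including a
    document with only neutral sentences) yields "neutral".
    """
    features = sentence_label_features(sentence_labels)
    if features["positive"] > features["negative"]:
        return "positive"
    if features["negative"] > features["positive"]:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    labels = ["positive", "neutral", "positive", "negative"]
    print(sentence_label_features(labels))
    print(document_label(labels))
```

In a setup closer to the one described, the proportions returned by `sentence_label_features` would presumably serve as input features to a trained document-level model rather than to a fixed rule.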
|
216 |
Attitydanalys av svenska produktomdömen – behövs språkspecifika verktyg? / Sentiment Analysis of Swedish Product Reviews – Are Language-specific Tools Necessary?
Glant, Oliver. January 2018
Sentiment analysis of Swedish data is often performed by machine-translating the data into English so that available English analysis tools can be used. This thesis compares a neural network trained on Swedish data with a corresponding one trained on English data. Two datasets were used: approximately 200,000 non-neutral Swedish product reviews from the company Prisjakt Sverige AB, one of the largest annotated datasets used for Swedish sentiment analysis, and 1,000,000 non-neutral English product reviews from Amazon.com. Both networks were evaluated on 11,638 randomly selected Swedish reviews, in the original and machine-translated into English. The test set had the same overrepresentation of positive reviews as the Swedish dataset (84% positive). The results suggest that English tools can be used with machine translation for sentiment analysis of Swedish reviews without loss of classification ability. However, the English tool required about 33% more training data to reach maximum performance. Evaluation on the unbalanced test set placed particular demands on the statistical measures used. The F1-measure proved reliable only when calculated for the underrepresented class; it then correlated strongly with the Matthews correlation coefficient, which has previously been found to be a more reliable measure. Whether this correlation holds for all class balances should be investigated further, since it would simplify comparison between studies.
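The point about evaluation measures on an unbalanced test set can be made concrete with a small Python sketch. It uses the standard definitions of the F1-measure and the Matthews correlation coefficient (MCC); the confusion-matrix counts are invented for illustration (1,000 reviews at the 84% positive rate mentioned above) and are not results from the thesis.

```python
import math


def f1_score(tp, fp, fn):
    """F1 for one target class, from its true/false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Illustrative only: 840 positive and 160 negative reviews, and a
    # trivial classifier that labels every review as positive.
    tp, fp, fn, tn = 840, 160, 0, 0

    f1_positive = f1_score(tp, fp, fn)   # majority class as target
    f1_negative = f1_score(tn, fn, fp)   # minority class as target
    print(f"F1 (positive class): {f1_positive:.3f}")
    print(f"F1 (negative class): {f1_negative:.3f}")
    print(f"MCC:                 {mcc(tp, tn, fp, fn):.3f}")
```

With these assumed counts, F1 on the majority class is about 0.913 even though the classifier has learned nothing, while F1 on the underrepresented class and the MCC are both 0. That agrees with the abstract's observation that F1 is informative here only when computed for the underrepresented class.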
|
217 |
Natural language processing in cross-media analysis
Woldemariam, Yonas Demeke. January 2018
A cross-media analysis framework is an integrated multi-modal platform in which a media resource containing different types of data, such as text, images, audio and video, is analyzed by metadata extractors working jointly to contextualize the resource. It generally provides cross-media analysis and automatic annotation, metadata publication and storage, and search and recommendation services. For online content providers, such services make it possible to semantically enhance a media resource with extracted metadata representing its hidden meanings and to make it more efficiently searchable. Natural Language Processing (NLP) infrastructure makes up a substantial part of the architecture of such frameworks. It includes text analysis components such as a parser, named entity extraction and linking, sentiment analysis and automatic speech recognition. Since NLP tools and techniques were originally designed to operate in isolation, integrating them into cross-media frameworks and analyzing textual data extracted from multimedia sources is very challenging. In particular, text extracted from audio-visual content lacks linguistic features that could provide important clues for text analysis components. Thus, there is a need to develop techniques that meet the requirements and design principles of the frameworks. In this thesis, we explore methods and models that satisfy the text and speech analysis requirements posed by cross-media analysis frameworks. The developed methods allow the frameworks to extract linguistic knowledge of various types and to predict information such as sentiment and competence. We also attempt to enhance the multilingualism of the frameworks by designing an analysis pipeline that includes speech recognition, transliteration and named entity recognition for Amharic, which also makes Amharic content on the web more efficiently accessible.
The method can potentially be extended to support other under-resourced languages.
|
218 |
Analyse des médias sociaux de santé pour évaluer la qualité de vie des patientes atteintes d’un cancer du sein / Analysis of social health media to assess the quality of life of breast cancer patients
Tapi Nzali, Mike Donald. 28 September 2017
In 2015, there were 54,000 new cases of breast cancer in France. The survival rate 5 years after diagnosis is 89%. Although modern treatments save lives, some are difficult to bear. Many clinical research projects have therefore focused on quality of life (QoL), which refers to patients' perception of their diseases and treatments. QoL is a relevant clinical evaluation criterion for assessing the advantages and disadvantages of treatments, both for the patient and for the health system. In this thesis, we focus on the stories patients tell on social media about their health, in order to better understand their perception of QoL. This new mode of communication is very popular among patients because it is associated with great freedom of speech, due in particular to the anonymity these sites provide.
The originality of this thesis is to use and extend social media mining methods for the French language. The main contributions of this work are: (1) construction of a patient/doctor vocabulary; (2) detection of the topics discussed by patients; (3) sentiment analysis of the messages posted by patients; and (4) combination of these contributions to quantify patients' discourse.
First, we used patients' texts to build a patient/doctor vocabulary specific to breast cancer, by collecting various types of non-expert expressions related to the disease and linking them to the biomedical terms used by health care professionals. We combined several methods from the literature based on linguistic and statistical approaches. To evaluate the resulting relations, we used automatic and manual validation. We then transformed the resource into a human-readable and machine-readable format by creating a SKOS ontology, which was integrated into the BioPortal platform.
Second, we used and extended methods from the literature to detect the topics discussed by patients on social media and to relate them to the functional and symptomatic dimensions of the self-administered QoL questionnaires (EORTC QLQ-C30 and EORTC QLQ-BR23). To detect the topics, we applied the unsupervised LDA model with suitable preprocessing. We then applied a customized Jaccard coefficient to automatically compute the similarity between the topics detected with LDA and the questionnaire items. In this way we identified new topics that complement those already present in the questionnaires. This work shows that data from health forums can be used to conduct a complementary study of QoL.
Third, we focused on sentiment extraction (polarity and emotions). To this end, we evaluated different methods and resources for sentiment classification in French. These experiments determined which features are useful for sentiment classification across different types of text, including texts from health forums. Finally, we used the methods proposed in this thesis to quantify the topics and sentiments identified in health social media.
Overall, this work opens promising perspectives for various social media analysis tasks in French, and in particular for studying patients' QoL from health forums.
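The step that matches detected topics to questionnaire items by a Jaccard coefficient can be sketched in Python as follows. The abstract mentions a customized coefficient whose details are not given, so this sketch uses the plain Jaccard index (size of the intersection over size of the union); the topic words and the item names and keywords below are invented placeholders, not actual LDA output or EORTC questionnaire content.

```python
def jaccard(words_a, words_b):
    """Plain Jaccard index between two collections of words."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matching_item(topic_words, items):
    """Return (item_name, score) for the item most similar to the topic."""
    scores = {name: jaccard(topic_words, words) for name, words in items.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical top words of one detected topic.
    topic = ["fatigue", "tired", "sleep", "rest"]
    # Hypothetical questionnaire items with keyword lists.
    items = {
        "fatigue": ["tired", "rest", "weak"],
        "pain": ["pain", "ache"],
    }
    print(best_matching_item(topic, items))
```

A topic whose best score falls below some chosen threshold could then be treated as a candidate new theme not covered by the questionnaires; the threshold, like the rest of this sketch, would be an assumption to tune rather than something stated in the abstract.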
|
219 |
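Tecnologias de linguagem aplicadas à percepção social: detecção de emoções em redes sociais sobre a seca em São Paulo [Language technologies applied to social perception: emotion detection in social networks about the drought in São Paulo]
Rodriguez, Nathália Ferrete. January 2016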
Advisor: Prof. Dr. Margarethe Born Steinberger-Elias / Dissertation (master's) - Universidade Federal do ABC, Programa de Pós-Graduação em Engenharia da Informação, 2016.
Social networks have become a common way for people to express themselves, especially their emotions and feelings about various subjects. Analyzing these data with Artificial Intelligence and NLP (Natural Language Processing) allows knowledge to be extracted automatically from this huge volume of information, including social sentiment on various subjects. The extracted knowledge can be used to understand and anticipate expectations regarding facts, people, products and services. The main objective of this interdisciplinary research, involving Linguistics, Computing and Social Sciences, is the automatic detection of emotions about the messages and entities mentioned in Facebook and Twitter texts in the domain of the drought that occurred in the state of São Paulo during 2013, 2014 and 2015. The proposed method for automatic emotion detection uses NLP concepts as well as statistical methods for pattern discovery. It is not restricted to the drought domain and can be studied for application in other domains. Various computational tools were used to define and apply the method.
|
220 |
IoT on Twitter: A Mixed Methods Study
Åkerlund, Mathilda. January 2017
No description available.
|