51 |
A Topics Analysis Model for Health Insurance Claims
Webb, Jared Anthony 18 October 2013
Mathematical probability has a rich theory and powerful applications. Of particular note is the Markov chain Monte Carlo (MCMC) method for sampling from high-dimensional distributions that may not admit a naive analysis. We develop the theory of the MCMC method from first principles and prove its relevance. We also define a Bayesian hierarchical model for generating data. By understanding how data are generated we may infer hidden structure about these models. We use a specific MCMC method called a Gibbs sampler to discover topic distributions in a hierarchical Bayesian model called Topics Over Time. We propose an innovative use of this model to discover disease and treatment topics in a corpus of health insurance claims data. By representing individuals as mixtures of topics, we are able to consider their future costs on an individual level rather than as part of a large collective.
|
52 |
Elucidating AI Policy Discourse : Uncovering Themes Through Latent Dirichlet Allocation
Zetterblom, Patrik January 2023
This thesis investigates the discourse contained within the policy documents examined, using the topic modeling technique Latent Dirichlet Allocation. The investigation is conducted through the theoretical lens of Systems Theory and Discourse Analysis Theory. The thesis aims to identify the core constituents, form a consensus and enrich the scientific community's understanding of how these core constituents, alongside the discourse contained within the policy documents, shape the overall landscape of AI governance in continental Europe. Before the in-depth investigation of the methods and theoretical frameworks mentioned above, an introduction gives additional insight into the background of AI and the problem formulation. The results of this study reveal 8 inferred themes. These inferred themes are then thoroughly discussed in alignment with the principles and concepts set forth by the theoretical frameworks. The thesis then provides a conclusive penultimate subchapter that encapsulates the key points and directly addresses the research question before highlighting possible future research opportunities.
|
53 |
Qui est à blâmer pour la pandémie de la COVID-19? : analyse des perceptions de la responsabilité pendant la crise et évaluation de l’Allocation de Dirichlet latente dans l’étude de questions ouvertes
Chevalier, Marianne 08 1900
The COVID-19 crisis has caused major disruptions in the lives of populations around the globe and provoked important social responses. The spread of the contagious COVID-19 virus was quickly followed by an outbreak of explanations and discourses trying to make sense of the crisis. When devastating events occur, people ask themselves what happened, why the event happened and what it means. The first goal of this paper is to track the changing dynamics of blame attribution and scapegoating as the COVID-19 pandemic unfolds. The second goal of this paper is to evaluate the relevance of LDA (Latent Dirichlet Allocation), a Bayesian generative mixture/latent class model, for analyzing open-ended survey responses. Data was collected from a representative sample of 3,617 Canadians following an intensive longitudinal research design (with 12 waves). Nine topics were identified, six of which recurred across measurement waves. Canadians mostly blame distant collectives in the early months of the pandemic, especially China and wet markets. Over time, they increasingly blame local collectives, such as individuals who do not comply with sanitary measures. This study highlights the role of geographic proximity and perceived risk in shaping public perceptions of the pandemic.
|
54 |
Cadrage en période de crise : réponses à la COVID-19 d’influenceurs de la droite radicale au Québec
El Khalil, Khaoula 07 1900
The framing done by radical right influencers and the content of their discourse remain underexplored. Such content is of serious concern when it is produced by "influencers" who would not only have social power over their many committed followers, but would also generate often virulent opposition to the authorities. Some argue that research has lacked systematic empirical studies on the topic and that the study of frame variation would be an interesting avenue for future research (Benford 1997). There is thus a pressing need to develop a rigorous understanding of how global crises can change the way some radical right influencers frame their discourse. Using original data about five radical right influencers in Quebec on the Twitter platform from January 2020 to April 2022, we first identify the predominant topics in radical right influencers' discourse. Through a thematic analysis by LDA, we confirm that seven topics dominate the discourse of radical right influencers during the COVID-19 pandemic: elites, crisis management, media, fake pandemic, conspiracy, government, and freedom. Second, we show that the COVID-19 health crisis pushed radical right influencers to change their discourse and adopt three "crisis frames" that present COVID-19 as directly related to the concepts of governance, conspiracy, and freedom.
|
55 |
以推敲可能性模式探討影響評論幫助性之因素 / Factors Affecting Review Helpfulness : An Elaboration Likelihood Model Perspective
熊耿得, Hsiung, Keng-Te Unknown Date
Online reviews are important factors in consumers’ purchase decisions. The helpfulness of reviews allows consumers to quickly identify useful reviews. The purpose of this study is to investigate the characteristics of online reviews that affect their helpfulness through the lens of the elaboration likelihood model. For the central cues, we adopt latent Dirichlet allocation to measure review breadth in addition to review length and review readability. For the peripheral cues, we use sentiment analysis based on the circumplex model to capture the effect of emotion and use the ranking of the reviewers to measure source credibility. We used a dataset collected from Amazon.com to evaluate our model. The results suggest that consumers attend to both central and peripheral cues when they read reviews. Consumers care about the length, breadth and readability of reviews associated with the central route, and the emotional effects associated with the peripheral route. Overall, the peripheral cues confirm a negativity bias: negative emotion with high arousal is more likely to raise review helpfulness, and whether a review is considered helpful is indeed affected by the reviewer's ranking. In a further analysis, we split our sample into 3 groups by reviewer ranking. We found that the top reviewers should stay neutral and avoid personal feelings to make their reviews more helpful; the middle reviewers can use more arousing words to improve their review helpfulness; the bottom reviewers must increase the strength of their emotional valence, especially negative emotion, to raise perceived review helpfulness.
|
56 |
Cluster Identification : Topic Models, Matrix Factorization And Concept Association Networks
Arun, R 07 1900
The problem of identifying clusters arising in the context of topic models and related approaches is important in the area of machine learning. The problem concerning traversals on Concept Association Networks is of great interest in the area of cognitive modelling. Cluster identification is the problem of finding the right number of clusters in a given set of points (or a dataset) in different settings, including topic models and matrix factorization algorithms. Traversals in Concept Association Networks provide useful insights into cognitive modelling and performance. First, we consider the problem of authorship attribution in stylometry and the problem of cluster identification for topic models. For the problem of authorship attribution, we show empirically that, by using stop-words as stylistic features of an author, vectors obtained from Latent Dirichlet Allocation (LDA) outperform other classifiers. Topics obtained by this method are generally abstract, and it may not be possible to identify the cohesiveness of words falling in the same topic by mere manual inspection. Hence it is difficult to determine whether the chosen number of topics is optimal. We next address this issue. We propose a new measure for topics arising out of LDA based on the divergence between the singular value distribution and the L1 norm distribution of the document-topic and topic-word matrices, respectively. It is shown that, under certain assumptions, this measure can be used to find the right number of topics. Next we consider the Non-negative Matrix Factorization (NMF) approach for clustering documents. We propose entropy-based regularization for a variant of NMF with row-stochastic constraints on the component matrices. It is shown that when topic-splitting occurs (i.e., when an extra topic is required), an existing topic vector splits into two and the divergence term in the cost function decreases whereas the entropy term increases, leading to a regularization.
Next we consider the problem of clustering in Concept Association Networks (CAN). CANs are generic graph models of relationships between abstract concepts. We propose a simple clustering algorithm which takes into account the complex network properties of CANs. The performance of the algorithm is compared with that of the graph-cut based spectral clustering algorithm. In addition, we study the properties of traversals by human participants on CANs. We obtain experimental results contrasting these traversals with those obtained from (i) random walk simulations and (ii) shortest path algorithms.
|
57 |
Inference and applications for topic models / Inférence et applications pour les modèles thématiques
Dupuy, Christophe 30 June 2017
Most current recommendation systems are based on ratings (i.e., numbers between 0 and 5) and try to suggest content (movie, restaurant...) to a user. These systems usually allow users to provide a text review for this content in addition to ratings. It is hard to extract useful information from raw text, while a rating does not contain much information on the content and the user. In this thesis, we tackle the problem of suggesting personalized readable text to users to help them make a quick decision about a piece of content. More specifically, we first build a topic model that predicts a personalized movie description from text reviews. Our model extracts distinct qualitative (i.e., conveying opinion) and descriptive topics by combining text reviews and movie ratings in a joint probabilistic model. We evaluate our model on an IMDB dataset and illustrate its performance through comparison of topics. We then study parameter inference in large-scale latent variable models, which include most topic models. We propose a unified treatment of online inference for latent variable models from a non-canonical exponential family, and draw explicit links between several previously proposed frequentist or Bayesian methods. We also propose a novel inference method for the frequentist estimation of parameters, which adapts MCMC methods to online inference of latent variable models with the proper use of local Gibbs sampling. For the specific latent Dirichlet allocation topic model, we provide an extensive set of experiments and comparisons with existing work, where our new approach outperforms all previously proposed methods. Finally, we propose a new class of determinantal point processes (DPPs) which can be manipulated for inference and parameter learning in potentially sublinear time in the number of items. This class, based on a specific low-rank factorization of the marginal kernel, is particularly suited to a subclass of continuous DPPs and DPPs defined on exponentially many items. We apply this new class to modelling text documents as sampling a DPP of sentences, and propose a conditional maximum likelihood formulation to model topic proportions, which is made possible with no approximation for our class of DPPs. We present an application to document summarization with a DPP on 2 to the power 500 items, where the summaries are composed of readable sentences.
|
58 |
SPECIES- TO COMMUNITY-LEVEL RESPONSES TO CLIMATE CHANGE IN EASTERN U.S. FORESTS
Jonathan A Knott (8797934) 12 October 2021
Climate change has dramatically altered the ecological landscape of the eastern U.S., leading to shifts in phenological events and redistribution of tree species. However, shifts in phenology and species distributions have implications for the productivity of different populations and the communities these species are a part of. Here, I utilized two studies to quantify the effects of climate change on forests of the eastern U.S. First, I used phenology observations at a common garden of 28 populations of northern red oak (Quercus rubra) across seven years to assess shifts in phenology in response to warming, identify population differences in sensitivity to warming, and correlate sensitivity to the productivity of the populations. Second, I utilized data from the USDA Forest Service’s Forest Inventory and Analysis Program to identify forest communities of the eastern U.S., assess shifts in their species compositions and spatial distributions, and determine which climate-related variables are most associated with changes at the community level. In the first study, I found that populations were shifting their spring phenology in response to warming, with the greatest sensitivity in populations from warmer, wetter climates. However, these populations with higher sensitivity did not have the highest productivity; rather, populations closer to the common garden with intermediate levels of sensitivity had the highest productivity. In the second study, I found that there were 12 regional forest communities of the eastern U.S., which varied in the amount their species composition shifted over the last three decades. Additionally, all 12 communities shifted their spatial distributions, but their shifts were not correlated with the distance and direction that climate change predicted them to shift. Finally, areas with the highest changes across all 12 communities were associated with warmer, wetter, lower temperature-variable climates generally in the southeastern U.S. Taken together, these studies provide insight into the ways in which forests are responding to climate change and have implications for the management and sustainability of forests in a continuously changing global environment.
|
59 |
A Confirmatory Analysis for Automating the Evaluation of Motivation Letters to Emulate Human Judgment
Mercado Salazar, Jorge Anibal, Rana, S M Masud January 2021
Manually reading, evaluating, and scoring motivation letters as part of the admissions process is a time-consuming and tedious task for Dalarna University's program managers. An automated scoring system would provide them with relief as well as the ability to make much faster decisions when selecting applicants for admission. The aim of this thesis was to analyse current human judgment and attempt to emulate it using machine learning techniques. We used various topic modelling methods, such as Latent Dirichlet Allocation and Non-Negative Matrix Factorization, to find the most interpretable topics, build a bridge between topics and human-defined factors, and finally evaluate model performance by predicting scoring values and measuring accuracy using logistic regression, discriminant analysis, and other classification algorithms. Although we were able to discover the meaning of almost all human factors on our own, the topic models' accuracy in predicting overall score was unexpectedly low. Setting a threshold on overall score to select applicants for admission yielded good overall accuracy, but did not yield consistently good precision or recall. During our investigation, we attempted to determine the possible causes of these unexpected results and discovered that not only are the limitations of topic modelling to blame, but human bias also plays a role.
|
60 |
Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation
SUI, ZHENHUAN 27 October 2017
No description available.
|