About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
11

Tumor Gene Expression Purification Using Infinite Mixture Topic Models

Deshwar, Amit Gulab 11 July 2013
There is significant interest in using gene expression measurements to aid in the personalization of medical treatment. The presence of significant normal tissue contamination in tumor samples makes it difficult to use tumor expression measurements to predict clinical variables and treatment response. I present a probabilistic method, TMMpure, to infer the expression profile of the cancerous tissue using a modified topic model that places a hierarchical Dirichlet process prior on the cancer profiles. I demonstrate that TMMpure is able to infer the expression profile of cancerous tissue and improves the power of expression-based predictive models for clinical variables.
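The deconvolution idea behind this line of work, that a bulk tumor measurement is a mixture of tissue-specific expression profiles, can be sketched with ordinary least squares. This is a deliberate simplification of TMMpure: here both reference profiles are assumed known, and every value is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 200

# Hypothetical reference expression profiles (all values invented).
normal = rng.gamma(2.0, 1.0, n_genes)   # normal-tissue profile
cancer = rng.gamma(2.0, 1.0, n_genes)   # cancer profile

# A bulk tumor sample measured as a mixture with 30% normal contamination.
alpha_true = 0.3
bulk = alpha_true * normal + (1 - alpha_true) * cancer

# With both reference profiles known, the mixing weights are recovered by
# least squares on bulk = [normal, cancer] @ w.
A = np.column_stack([normal, cancer])
w, *_ = np.linalg.lstsq(A, bulk, rcond=None)
print(w)  # close to [0.3, 0.7]
```

TMMpure addresses the harder setting where the cancer profile itself is unknown and must be inferred; the sketch only illustrates why the contamination fraction becomes identifiable once the profiles are constrained.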
12

Examination of novel statistical designs for phase II and phase III clinical trials

Ayanlowo, Ayanbola Olajumoke. January 2007
Thesis (Ph.D.), University of Alabama at Birmingham, 2007. Title from first page of PDF file (viewed Oct. 13, 2008). Includes bibliographical references.
13

Modèles thématiques pour la découverte non supervisée de points de vue sur le Web / Topic Models for Unsupervised Discovery of Viewpoints on the Web

Thonet, Thibaut 23 November 2017
The advent of online platforms such as weblogs and social networking sites has provided Internet users with an unprecedented means to express their opinions on a wide range of topics, including politics and commercial products. This large volume of opinionated data can be explored and exploited through text mining techniques known as opinion mining or sentiment analysis. In contrast to traditional opinion mining work, which mostly focuses on positive and negative opinions (or an intermediate in between), we study a more challenging type of opinion: viewpoints. Viewpoint mining reaches beyond polarity-based opinions (positive/negative) and enables the analysis of more subtle opinions such as political opinions. In this thesis, we propose unsupervised approaches, i.e., approaches that do not require any labeled data, based on probabilistic topic models to jointly discover the topics and viewpoints expressed in opinionated data. In our first contribution, we explore the idea of separating opinion words (specific to both a viewpoint and a topic) from topical, neutral words based on parts of speech, inspired by similar practices in the literature on traditional, polarity-based opinion mining. Our second contribution tackles viewpoints expressed by social network users; here, we study to what extent social interactions between users, in addition to their generated text content, are beneficial for identifying their viewpoints. Our contributions were evaluated and benchmarked against state-of-the-art baselines on real-world document collections.
14

Probabilistic Topic Models for Human Emotion Analysis

January 2015
While discrete emotions like joy, anger, and disgust are quite popular, continuous emotion dimensions like arousal and valence are gaining popularity within the research community due to an increase in the availability of datasets annotated with these emotions. Unlike discrete emotions, continuous emotions allow the modeling of subtle and complex affect dimensions but are difficult to predict. Dimension reduction techniques form the core of emotion recognition systems and help create a new feature space that is more helpful in predicting emotions, but these techniques do not necessarily guarantee better predictive capability, as most of them are unsupervised, especially in regression learning. Supervised dimension reduction techniques have not been explored much in the emotion recognition literature, and in this work a solution is provided through probabilistic topic models. Topic models provide a strong probabilistic framework in which to embed new learning paradigms and modalities. In this thesis, the graphical structure of latent Dirichlet allocation has been explored, and new models tuned to emotion recognition and change detection have been built. It is shown that the double mixture structure of topic models helps 1) to visualize feature patterns, and 2) to project features onto a topic simplex that is more predictive of human emotions than popular techniques like PCA and kernel PCA. Traditionally, topic models have been used on quantized features, but in this work a continuous topic model, the Dirichlet Gaussian mixture model (DGMM), is proposed. Evaluation of DGMM has shown that, when modeling videos, the performance of LDA models can be replicated even without quantizing the features. Topic models had not previously been explored in a supervised context for video analysis, so a regularized supervised topic model (RSLDA) that models video and audio features is introduced. The RSLDA learning algorithm performs dimension reduction and regularized linear regression simultaneously, and it has outperformed supervised dimension reduction techniques such as SPCA and correlation-based feature selection algorithms. Finally, two new topic models, the adaptive temporal topic model (ATTM) and SLDA for change detection (SLDACD), the first of their kind, have been developed for predicting concept drift in time series data. These models do not assume independence of consecutive frames and outperform traditional topic models in detecting local and global changes respectively. Doctoral Dissertation, Computer Science, 2015.
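The vanilla model underlying this family can be sketched with a minimal collapsed Gibbs sampler for LDA. This is plain LDA on an invented toy corpus, not the thesis's RSLDA or DGMM; the final line is the projection of each document onto the topic simplex that the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny synthetic corpus: each document is a list of word ids (invented data).
docs = [[0, 0, 1, 1, 2], [0, 1, 1, 2, 2], [3, 4, 4, 5, 5], [3, 3, 4, 5, 5]]
V, K, alpha, beta = 6, 2, 0.5, 0.1

# Random initial topic assignments and the count tables Gibbs sampling needs.
z = [[rng.integers(K) for _ in d] for d in docs]
ndk = np.zeros((len(docs), K))   # document-topic counts
nkw = np.zeros((K, V))           # topic-word counts
nk = np.zeros(K)                 # per-topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Collapsed Gibbs sweeps: resample each token's topic from its conditional.
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Each document projected onto the topic simplex (rows sum to one).
theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
print(theta)
```

Models such as RSLDA and DGMM change the generative story and the learning objective, but they share this document-on-the-simplex representation as the reduced feature space.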
15

A Quality Criteria Based Evaluation of Topic Models

Sathi, Veer Reddy, Ramanujapura, Jai Simha January 2016
Context. Software testing is the process in which a software product or system is executed in order to find bugs or issues that may otherwise degrade its performance. Testing is usually based on pre-defined test cases: sets of terms or conditions used by testers to determine whether a system under test operates as intended. In many situations, however, there are so many test cases that executing every one is practically impossible, which forces testers to prioritize the functions to be tested. This is where topic models can be exploited: they are unsupervised machine learning algorithms that explore large corpora and classify them by identifying their hidden thematic structure, so using them for test case prioritization can save considerable time and resources. Objectives. In our study, we provide an overview of the research that has been done on topic models. We want to uncover the various quality criteria, evaluation methods, and metrics that can be used to evaluate topic models. Furthermore, we compare the performance of two topic models optimized for different quality criteria on a particular interpretability task, and thereby determine which model produces the best results for that task. Methods. A systematic mapping study was performed to gain an overview of previous research on the evaluation of topic models, focusing on the quality criteria, evaluation methods, and metrics that have been used. The results of the mapping study were used to identify the most used quality criteria, and the evaluation methods related to those criteria were then used to generate two optimized topic models.
An experiment was then conducted in which the topics generated by the two models were shown to a group of 20 subjects. The task was designed to evaluate the interpretability of the generated topics, and the performance of the two models was compared using precision, recall, and F-measure. Results. Based on the results of the mapping study, Latent Dirichlet Allocation (LDA) was found to be the most widely used topic model. Two LDA topic models were created, one optimized for the quality criterion generalizability (TG) and one for interpretability (TI), using the perplexity and pointwise mutual information (PMI) measures respectively. On the selected metrics, TI performed better than TG in precision and F-measure, while the two were comparable in recall. The total run time of TI was also significantly higher than that of TG: 46 hours and 35 minutes for TI versus 3 hours and 30 minutes for TG. Conclusions. Judging by the F-measure, the interpretability topic model (TI) performs better than the generalizability topic model (TG); TI performed better in precision, while recall was comparable. However, the computational cost of creating TI is significantly higher than that of TG. We therefore conclude that the choice of optimization criterion should be based on the aim of the task the model is used for. If the task requires high interpretability and precision is important, such as prioritizing test cases based on content, then TI is the right choice, provided time is not a limiting factor. However, if the task only aims at topics that provide a basic understanding of the concepts (i.e., interpretability is not a high priority), then TG is the more suitable choice, which also makes it better suited to time-critical tasks.
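The PMI measure used for the interpretability model is typically turned into a topic coherence score by averaging pairwise PMI over a topic's top words, with probabilities estimated from document co-occurrence. A minimal sketch on an invented corpus follows; this is one common variant, not necessarily the exact formulation used in the thesis.

```python
import math
from itertools import combinations

# Toy corpus (invented); each document is represented as its set of words.
docs = [set(d.split()) for d in [
    "model topic word inference",
    "topic model corpus word",
    "bug test case software",
    "software test suite bug",
]]
N = len(docs)

def pmi_coherence(top_words, eps=1e-12):
    """Mean pairwise PMI of a topic's top words over document co-occurrence."""
    score, pairs = 0.0, 0
    for w1, w2 in combinations(top_words, 2):
        p1 = sum(w1 in d for d in docs) / N
        p2 = sum(w2 in d for d in docs) / N
        p12 = sum(w1 in d and w2 in d for d in docs) / N
        score += math.log((p12 + eps) / (p1 * p2 + eps))
        pairs += 1
    return score / pairs

coherent = pmi_coherence(["topic", "model", "word"])
mixed = pmi_coherence(["topic", "bug", "suite"])
print(coherent > mixed)  # a coherent topic scores higher
```

Higher coherence of this kind tends to track human judgments of topic interpretability, which is the rationale for optimizing TI with PMI.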
16

Sur la méthode des moments pour l'estimation des modèles à variables latentes / On the method of moments for estimation in latent linear models

Podosinnikova, Anastasia 01 December 2016
Latent linear models are powerful probabilistic tools for extracting useful latent structure from otherwise unstructured data, and they have proved useful in numerous applications such as natural language processing and computer vision. However, estimation and inference are often intractable for many latent linear models, and one has to resort to approximate methods that often come with no recovery guarantees. An alternative approach, which has become popular lately, is the method of moments. Moment-based methods often guarantee exact recovery in the idealized setting of an infinite data sample and a well-specified model, and they frequently come with theoretical guarantees even when these conditions are not exactly satisfied. In this thesis, we focus on moment matching-based estimation methods for different latent linear models. Using a close connection with independent component analysis, a well-studied tool from the signal processing literature, we introduce several semiparametric models in the topic modeling context and for multi-view models, and we develop moment matching-based methods for estimation in these models. These methods come with improved sample complexity results compared to previously proposed methods. The models are also supplied with identifiability guarantees, a necessary property to ensure their interpretability, in contrast to some other widely used models, which are unidentifiable.
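The basic mechanics of the method of moments, equating model moments with empirical moments and solving for the parameters, can be shown on a much simpler example than the thesis's semiparametric models: a gamma distribution with shape k and scale theta, where E[X] = k*theta and Var[X] = k*theta**2. All values below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k_true, theta_true = 3.0, 2.0
x = rng.gamma(k_true, theta_true, 100_000)

# Match the first two moments of Gamma(k, theta):
#   E[X] = k * theta,   Var[X] = k * theta**2
# Solving the two equations gives:
#   k = mean**2 / var,  theta = var / mean
mean, var = x.mean(), x.var()
k_hat, theta_hat = mean**2 / var, var / mean
print(k_hat, theta_hat)  # close to 3.0 and 2.0
```

The estimators are consistent: as the sample grows, the empirical moments converge to the model moments and the solved parameters converge to the truth, which is the same principle the thesis's tensor- and ICA-based methods rely on in higher order.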
17

Anchor-based Topic Modeling with Human Interpretable Results / Tolkningsbara ämnesmodeller baserade på ankarord

Andersson, Henrik January 2020
Topic models are useful tools for exploring large data sets of textual content by exposing a generative process from which the text was produced. Anchor-based topic models use the anchor word assumption to define a set of algorithms with provable guarantees that recover the underlying topics with a run time practically independent of corpus size. A number of extensions to the initial anchor word-based algorithms, and enhancements to related models, have been proposed that improve the intrinsic characteristics of the models and make them more interpretable to humans. This thesis evaluates improvements to human interpretability due to: low-dimensional word embeddings in combination with a regularized objective function, automatic topic merging using tandem anchors, and word embeddings used to synthetically increase corpus density. Results show that tandem anchors are viable vehicles for automatic topic merging, and that using word embeddings significantly improves the original anchor method across all measured metrics. Combining low-dimensional embeddings with a regularized objective carries computational downsides and yields little or no improvement on the measured metrics.
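Under the anchor word assumption, each topic has a word that occurs only in that topic, so the anchor rows of the row-normalized word co-occurrence matrix are the extreme points whose convex hull contains all other rows. The following is a toy sketch of a greedy farthest-point selection in that spirit; the matrix and the selection heuristic are invented for illustration and are not the exact algorithm referenced in the thesis.

```python
import numpy as np

# Toy row-normalized word co-occurrence matrix (invented): the first two
# rows are anchors, and the remaining rows are convex combinations of them.
anchors_true = np.array([[0.90, 0.05, 0.05],
                         [0.05, 0.90, 0.05]])
mix = np.array([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]])
Q = np.vstack([anchors_true, mix @ anchors_true])

def greedy_anchors(Q, k):
    """Pick k rows by farthest-point traversal: start from the row with the
    largest norm, then repeatedly take the row farthest from the span of the
    rows already chosen (Gram-Schmidt-style deflation)."""
    chosen = [int(np.argmax(np.linalg.norm(Q, axis=1)))]
    basis = []
    R = Q - Q[chosen[0]]          # work relative to the first chosen row
    for _ in range(k - 1):
        for b in basis:           # remove components along earlier picks
            R = R - np.outer(R @ b, b)
        norms = np.linalg.norm(R, axis=1)
        nxt = int(np.argmax(norms))
        chosen.append(nxt)
        basis.append(R[nxt] / norms[nxt])
    return sorted(chosen)

print(greedy_anchors(Q, 2))  # recovers the two anchor rows: [0, 1]
```

Because the non-anchor rows lie inside the anchors' convex hull, the extreme points found this way are exactly the anchor rows, which is what makes the subsequent topic recovery step well posed.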
18

Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities

Wang, Xuerui 01 May 2009
The abundance of data in the information age poses an immense challenge: how to perform large-scale inference to understand and utilize this overwhelming amount of information. Such techniques are of tremendous intellectual significance and practical impact. As part of this grand challenge, the goal of my Ph.D. thesis is to develop effective and efficient statistical topic models for massive text collections by incorporating extra information from other modalities in addition to the text itself. Text documents are not just text; different kinds of additional information are naturally interleaved with it. Most previous work, however, pays attention to only one modality at a time and ignores the others. In my thesis, I present a series of probabilistic topic models that show how we can bridge multiple modalities of information, in a unified fashion, for various tasks. Interestingly, joint inference over multiple modalities leads to many findings that cannot be discovered from one modality alone, as briefly illustrated below. Email is pervasive nowadays, yet much previous work in natural language processing modeled text using latent topics while ignoring social networks; social network research, on the other hand, mainly dealt with the existence of links between entities without taking into consideration the language content or topics on those links. The author-recipient-topic (ART) model, by contrast, steers the discovery of topics according to the relationships between people and learns topic distributions based on the direction-sensitive messages sent between entities. The ART model, however, does not explicitly identify groups formed by entities in the network, and previous work in social network analysis ignores the fact that different groupings arise for different topics. The group-topic (GT) model, a probabilistic generative model of entity relationships and textual attributes, simultaneously discovers groups among the entities and topics in the corresponding text. Many large datasets do not have static latent structures; they are instead dynamic. The topics-over-time (TOT) model explicitly models time as an observed continuous variable. This allows TOT to see long-range dependencies in time and helps it avoid a Markov model's risk of inappropriately dividing a topic in two when there is a brief gap in its appearance; treating time as continuous also avoids the difficulties of discretization. Most topic models, including all of the above, rely on the bag-of-words assumption, yet word order and phrases are often critical to capturing the meaning of text. The topical n-grams (TNG) model discovers topics as well as meaningful topical phrases simultaneously. In summary, we believe these models are clear evidence that we can better understand and utilize massive text collections when additional modalities are considered and modeled jointly with text.
19

Discovering interpretable topics in free-style text: diagnostics, rare topics, and topic supervision

Zheng, Ning 07 January 2008
No description available.
20

Approaches to Automatically Constructing Polarity Lexicons for Sentiment Analysis on Social Networks

Khuc, Vinh Ngoc 16 August 2012
No description available.
