Global ETD Search

1	Applying graphical models to partially observed data generating processes / Ali, Rebecca Ayesha, January 2002 Thesis (Ph. D.)--University of Washington, 2002. / Vita. Includes bibliographical references (p. 121-123). Graphical modeling (Statistics)
2	Non-decomposable discrete graphical models / Liu, Jinnan. January 2008 Thesis (Ph.D.)--York University, 2008. Graduate Programme in Mathematics and Statistics. / Typescript. Includes bibliographical references (leaves 81-83). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:NR39029
3	Causally appropriate graphical modelling for time series with applications to economics, ecology and environmental science : a thesis submitted in partial fulfilment of the requirements for the degree of Master of Science in Mathematics and Statistics in the University of Canterbury / Meurk, Carla. January 2005 Thesis (M. Sc.)--University of Canterbury, 2006. / Typescript (photocopy). Includes bibliographical references (leaves 108-110). Also available via the World Wide Web.
4	Statistical Methods for Constructing Heterogeneous Biomarker Networks / Xie, Shanghong January 2019 The theme of this dissertation is to construct heterogeneous biomarker networks using graphical models for understanding disease progression and prognosis. Biomarkers may organize into networks of connected regions. Substantial heterogeneity in networks between individuals and subgroups of individuals is observed. The strengths of network connections may vary across subjects depending on subject-specific covariates (e.g., genetic variants, age). In addition, the connectivities between biomarkers, as subject-specific network features, have been found to predict disease clinical outcomes. Thus, it is important to accurately identify biomarker network structure and estimate the strength of connections. Graphical models have been extensively used to construct complex networks. However, the estimated networks are at the population level, not accounting for subjects’ covariates. More flexible covariate-dependent graphical models are needed to capture the heterogeneity in subjects and further create new network features to improve prediction of disease clinical outcomes and stratify subjects into clinically meaningful groups. A large number of parameters are required in covariate-dependent graphical models. Regularization needs to be imposed to handle the high-dimensional parameter space. Furthermore, personalized clinical symptom networks can be constructed to investigate co-occurrence of clinical symptoms. When there are multiple biomarker modalities, the estimation of a target biomarker network can be improved by incorporating prior network information from the external modality. 
This dissertation contains four parts to achieve these goals: (1) An efficient l0-norm feature selection method based on augmented and penalized minimization to tackle the high-dimensional parameter space involved in covariate-dependent graphical models; (2) A two-stage approach to identify disease-associated biomarker network features; (3) An application to construct personalized symptom networks; (4) A node-wise biomarker graphical model to leverage the shared mechanism between multi-modality data when external modality data is available. In the first part of the dissertation, we propose a two-stage procedure that approximates l0-norm regularization as closely as possible and solves it with a highly efficient and simple computational algorithm. Advances in high-throughput technologies in genomics and imaging yield unprecedentedly large numbers of prognostic biomarkers. To accommodate the scale of biomarkers and study their association with disease outcomes, penalized regression is often used to identify important biomarkers. The ideal variable selection procedure would search for the best subset of predictors, which is equivalent to imposing an l0-penalty on the regression coefficients. Since this optimization is a non-deterministic polynomial-time hard (NP-hard) problem that does not scale with the number of biomarkers, alternative methods mostly place smooth penalties on the regression parameters, which lead to computationally feasible optimization problems. However, empirical studies and theoretical analyses show that convex approximations of the l0-norm (e.g., l1) do not outperform their l0 counterpart. Progress on l0-norm feature selection has been relatively slow, and the main methods are greedy algorithms such as stepwise regression or orthogonal matching pursuit. Penalized regression based on regularizing l0-norm remains much less explored in the literature. 
In this work, inspired by the recently popular augmenting and data splitting algorithms including alternating direction method of multipliers, we propose a two-stage procedure for l0-penalty variable selection, referred to as augmented penalized minimization-L0 (APM-L0). APM-L0 targets l0-norm as closely as possible while keeping computation tractable, efficient, and simple, which is achieved by iterating between a convex regularized regression and a simple hard-thresholding estimation. The procedure can be viewed as arising from regularized optimization with truncated l1 norm. Thus, we propose to treat regularization parameter and thresholding parameter as tuning parameters and select based on cross-validation. A one-step coordinate descent algorithm is used in the first stage to significantly improve computational efficiency. Through extensive simulation studies and real data application, we demonstrate superior performance of the proposed method in terms of selection accuracy and computational speed as compared to existing methods. The proposed APM-L0 procedure is implemented in the R-package APML0. In the second part of the dissertation, we develop a two-stage method to estimate biomarker networks that account for heterogeneity among subjects and evaluate the network’s association with disease clinical outcome. In the first stage, we propose a conditional Gaussian graphical model with mean and precision matrix depending on covariates to obtain subject- or subgroup-specific networks. In the second stage, we evaluate the clinical utility of network measures (connection strengths) estimated from the first stage. The second stage analysis provides the relative predictive power of between-region network measures on clinical impairment in the context of regional biomarkers and existing disease risk factors. 
We assess the performance of the proposed method by extensive simulation studies and application to a Huntington’s disease (HD) study to investigate the effect of HD causal gene on the rate of change in motor symptom through affecting brain subcortical and cortical grey matter atrophy connections. We show that cortical network connections and subcortical volumes, but not subcortical connections are identified to be predictive of clinical motor function deterioration. We validate these findings in an independent HD study. Lastly, highly similar patterns seen in the grey matter connections and a previous white matter connectivity study suggest a shared biological mechanism for HD and support the hypothesis that white matter loss is a direct result of neuronal loss as opposed to the loss of myelin or dysmyelination. In the third part of the dissertation, we apply the methodology to construct heterogeneous cross-sectional symptom networks. The co-occurrence of symptoms may result from the direct interactions between these symptoms and the symptoms can be treated as a system. In addition, subject-specific risk factors (e.g., genetic variants, age) can also exert external influence on the system. In this work, we develop a covariate-dependent conditional Gaussian graphical model to obtain personalized symptom networks. The strengths of network connections are modeled as a function of covariates to capture the heterogeneity among individuals and subgroups of individuals. We assess the performance of the proposed method by simulation studies and an application to a Huntington’s disease study to investigate the networks of symptoms in different domains (motor, cognitive, psychiatric) and identify the important brain imaging biomarkers associated with the connections. We show that the symptoms in the same domain interact more often with each other than across domains. We validate the findings using subjects’ measurements from follow-up visits. 
In the fourth part of the dissertation, we propose an integrative learning approach to improve the estimation of subject-specific networks of target modality when external modality data is available. The biomarker networks measured by different modalities of data (e.g., structural magnetic resonance imaging (sMRI), diffusion tensor imaging (DTI)) may share the same true underlying biological mechanism. In this work, we propose a node-wise biomarker graphical model to leverage the shared mechanism between multi-modality data to provide a more reliable estimation of the target modality network and account for the heterogeneity in networks due to differences between subjects and networks of external modality. Latent variables are introduced to represent the shared unobserved biological network and the information from the external modality is incorporated to model the distribution of the underlying biological network. An approximation approach is used to calculate the posterior expectations of latent variables to reduce time. The performance of the proposed method is demonstrated by extensive simulation studies and an application to construct gray matter brain atrophy network of Huntington’s disease by using sMRI data and DTI data. The estimated network measures are shown to be meaningful for predicting follow-up clinical outcomes in terms of patient stratification and prediction. Lastly, we conclude the dissertation with comments on limitations and extensions. Biometry Biochemical markers Prognosis Graphical modeling (Statistics)
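The first part of the dissertation describes APM-L0 as iterating between a convex regularized regression and a hard-thresholding step, with the regularization and thresholding parameters treated as tuning parameters. The Python sketch below illustrates only that general pattern, in a single pass: an l1-penalized least-squares fit by cyclic coordinate descent, followed by hard thresholding of the coefficients. It is not the algorithm of the APML0 R package. The function names, the squared-error loss, the fixed number of sweeps, and the toy data are assumptions made for illustration, and cross-validation of the two tuning parameters is omitted.

```python
def soft_threshold(value, penalty):
    """Soft-thresholding operator used by l1-penalized coordinate descent."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, response, penalty, sweeps=200):
    """Minimise (1/2n)||y - X b||^2 + penalty * ||b||_1 (no intercept).

    `rows` is a list of observations, each a list of predictor values.
    """
    n, p = len(rows), len(rows[0])
    beta = [0.0] * p
    residual = list(response)  # equals y - X b, and b starts at zero
    col_scale = [sum(row[j] ** 2 for row in rows) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(rows[i][j] * residual[i] for i in range(n)) / n
            rho += col_scale[j] * beta[j]
            updated = soft_threshold(rho, penalty) / col_scale[j]
            delta = updated - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= rows[i][j] * delta
                beta[j] = updated
    return beta


def two_stage_select(rows, response, penalty, threshold):
    """Stage 1: l1-penalized fit. Stage 2: hard-threshold the coefficients."""
    stage_one = lasso_coordinate_descent(rows, response, penalty)
    support = [j for j, b in enumerate(stage_one) if abs(b) > threshold]
    thresholded = [b if abs(b) > threshold else 0.0 for b in stage_one]
    return support, thresholded


# Hypothetical toy data: the response depends strongly on the first
# predictor and only weakly on the second.
toy_rows = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
toy_response = [2.1, 1.9, -1.9, -2.1]
selected, coefficients = two_stage_select(
    toy_rows, toy_response, penalty=0.05, threshold=0.5
)
print(selected, coefficients)
```

In this sketch the l1 penalty shrinks every coefficient, and the threshold then discards the ones that remain small, so the two parameters play distinct roles. That distinction is why the abstract treats both as tuning parameters.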
5	Maximum likelihood estimation in Gaussian AMP chain graph models and Gaussian ancestral graph models / Drton, Mathias, January 2004 Thesis (Ph. D.)--University of Washington, 2004. / Vita. Includes bibliographical references (p. 71-78).
6	Using the structure of d-connecting paths as a qualitative measure of the strength of dependence / Chaudhuri, Sanjay, January 2005 Thesis (Ph. D.)--University of Washington, 2005. / Vita. Includes bibliographical references (p. 94-95).
7	Estimativa da irradiação solar global pelo método de Angstrom-Prescott e técnicas de aprendizado de máquinas [Estimation of global solar irradiation by the Angstrom-Prescott method and machine learning techniques] / Silva, Maurício Bruno Prado da, 1988. January 2016 Advisor: João Francisco Escobedo / Committee member: Erico Tadao Teramoto / Committee member: Silvia Helena Modenese Gorla da Silva / Abstract: This work describes a comparative study of methods for estimating global solar irradiation (HG) in the daily (HGd) and monthly (HGm) partitions: the Angstrom-Prescott (A-P) technique and two machine learning (ML) techniques, Support Vector Machines (SVM) and Artificial Neural Networks (ANN). The database was measured from 1996 to 2011 at the solarimetric station in Botucatu. The statistical model (A-P) was determined by regression between atmospheric transmissivity (HG/HO) and the insolation ratio (n/N), yielding linear equations that allow HG to be estimated with high coefficients of determination. The SVM and ANN techniques were trained on the same architecture as A-P (model 1). They were also trained on three further models that add, one at a time, the variables air temperature, precipitation, and relative humidity (models 2, 3, and 4). The models were validated on a two-year database, with the years designated typical and atypical, using correlations between estimated and measured values and the statistical indicators rMBE, MBE, rRMSE, RMSE, and Willmott's d. 
The correlation coefficients (r) showed that the A-P model can estimate HG with high coefficients of determination under both validation conditions. The indicators rMBE, MBE, rRMSE, RMSE, and Willmott's d indicate that the A-P model can be used to estimate HGd with accuracy and precision. The statistical indicators obtained for the four models of the SVMd and ANNd (daily) and SVMm and ANNm (monthly) techniques show that they can be used to estimate HGd with high correlations and with precision and accuracy. By comparing the statistical indicators, the SVM4d and ANN4d networks (daily) and the SVM1m and ANN1m networks (monthly) were selected from among the models. The comparison of the statistical indicators rMBE, MBE, rRMSE, RMSE, Willmott's d, r, and R2 obtained in the validation of the A-P, SVM, and ANN models showed that: the SVM technique ... / Master's degree Solar radiation. Graphical modeling (Statistics) Machine learning.
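The Angstrom-Prescott step in the abstract above regresses atmospheric transmissivity (HG/HO) on the insolation ratio (n/N). A minimal Python sketch of that linear fit and of the validation indicators follows. The abstract gives no formulas, so the definitions used here are the commonly used ones and are an assumption: mean bias error, root mean square error, their relative forms as percentages of the mean measured value, and Willmott's index of agreement d. The function names and sample values are hypothetical and are not data from the thesis.

```python
import math


def fit_angstrom_prescott(insolation_ratio, transmissivity):
    """Least-squares fit of HG/HO = a + b * (n/N); returns (a, b)."""
    count = len(insolation_ratio)
    mean_x = sum(insolation_ratio) / count
    mean_y = sum(transmissivity) / count
    sxx = sum((x - mean_x) ** 2 for x in insolation_ratio)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(insolation_ratio, transmissivity)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def validation_indicators(estimated, measured):
    """MBE, RMSE, their relative forms (percent), and Willmott's d."""
    count = len(measured)
    mean_measured = sum(measured) / count
    errors = [e - m for e, m in zip(estimated, measured)]
    squared_error = sum(err ** 2 for err in errors)
    mbe = sum(errors) / count
    rmse = math.sqrt(squared_error / count)
    # Willmott's d compares squared error with the "potential error"
    # measured around the mean of the observations.
    potential = sum(
        (abs(e - mean_measured) + abs(m - mean_measured)) ** 2
        for e, m in zip(estimated, measured)
    )
    return {
        "MBE": mbe,
        "rMBE": 100.0 * mbe / mean_measured,
        "RMSE": rmse,
        "rRMSE": 100.0 * rmse / mean_measured,
        "d": 1.0 - squared_error / potential,
    }


# Hypothetical daily values of n/N and HG/HO, for illustration only.
ratio = [0.15, 0.35, 0.55, 0.75, 0.90]
clearness = [0.29, 0.41, 0.52, 0.63, 0.70]
a, b = fit_angstrom_prescott(ratio, clearness)
fitted = [a + b * x for x in ratio]
print(a, b, validation_indicators(fitted, clearness))
```

An estimate of HG then follows by multiplying the fitted transmissivity by the extraterrestrial irradiation HO for the same day or month.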
8	Deep Probabilistic Graphical Modeling / Dieng, Adji Bousso January 2020 Probabilistic graphical modeling (PGM) provides a framework for formulating an interpretable generative process of data and expressing uncertainty about unknowns. This makes PGM very useful for understanding phenomena underlying data and for decision making. PGM has seen great success in domains where interpretable inferences are key, e.g., marketing, medicine, neuroscience, and social science. However, PGM tends to lack flexibility, which has hindered its use when it comes to modeling large scale high-dimensional complex data and performing tasks that require flexibility (e.g., in vision and language applications). Deep learning (DL) is another framework for modeling and learning from data that has seen great empirical success in recent years. DL is very powerful and offers great flexibility, but it lacks the interpretability and calibration of PGM. This thesis develops deep probabilistic graphical modeling (DPGM). DPGM consists in leveraging DL to make PGM more flexible. DPGM brings about new methods for learning from data that exhibit the advantages of both PGM and DL. We use DL within PGM to build flexible models endowed with an interpretable latent structure. One family of models we develop extends exponential family principal component analysis (EF-PCA) using neural networks to improve predictive performance while enforcing the interpretability of the latent factors. Another model class we introduce enables accounting for long-term dependencies when modeling sequential data, which is a challenge when using purely DL or PGM approaches. This model class for sequential data was successfully applied to language modeling, unsupervised document representation learning for sentiment analysis, conversation modeling, and patient representation learning for hospital readmission prediction. Finally, DPGM successfully solves several outstanding problems of probabilistic topic models. 
Leveraging DL within PGM also brings about new algorithms for learning with complex data. For example, we develop entropy-regularized adversarial learning, a learning paradigm that deviates from the traditional maximum likelihood approach used in PGM. From the DL perspective, entropy-regularized adversarial learning provides a solution to the long-standing mode collapse problem of generative adversarial networks. Statistics Computer science Graphical modeling (Statistics) Machine learning
9	Probabilistic models for information extraction: from cascaded approach to joint approach. / CUHK electronic theses & dissertations collection January 2010 Based on these observations and analysis, we propose a joint discriminative probabilistic framework to optimize all relevant subtasks simultaneously. This framework defines a joint probability distribution for both segmentations in sequence data and relations of segments in the form of an exponential family. This model allows tight interactions between segmentations and relations of segments and it offers a natural way for IE tasks. Since exact parameter estimation and inference are prohibitively intractable, a structured variational inference algorithm is developed to perform parameter estimation approximately. For inference, we propose a strong bi-directional MH approach to find the MAP assignments for joint segmentations and relations to explore mutual benefits in both directions, such that segmentations can aid relations, and vice-versa. / Information Extraction (IE) aims at identifying specific pieces of information (data) in an unstructured or semi-structured textual document and transforming unstructured information in a corpus of documents or Web pages into a structured database. There are several representative tasks in IE: named entity recognition (NER), which aims at identifying phrases that denote types of named entities; entity relation extraction, which aims at discovering the events or relations related to the entities; and coreference resolution, which aims at determining whether two extracted mentions of entities refer to the same object. IE is useful for a wide variety of applications. / The end-to-end performance of high-level IE systems for compound tasks is often hampered by the use of cascaded frameworks. The integrated model we proposed can alleviate some of these problems, but it is only loosely coupled. 
Parameter estimation is performed independently and it only allows information to flow in one direction. In this top-down integration model, the decision of the bottom sub-model could guide the decision of the upper sub-model, but not vice-versa. Thus, deep interactions and dependencies between different tasks can hardly be well captured. / We have investigated and developed a cascaded framework in an attempt to consider entity extraction and qualitative domain knowledge based on undirected, discriminatively-trained probabilistic graphical models. This framework consists of two stages and it is the combination of statistical learning and first-order logic. As a pipeline model, the first stage is a base model and the second stage is used to validate and correct the errors made in the base model. We incorporated domain knowledge that can be well formulated into first-order logic to extract entity candidates from the base model. We have applied this framework and achieved encouraging results in Chinese NER on the People's Daily corpus. / We perform extensive experiments on three important IE tasks using real-world datasets, namely Chinese NER, entity identification and relationship extraction from Wikipedia's encyclopedic articles, and citation matching, to test our proposed models, including the bidirectional model, the integrated model, and the joint model. Experimental results show that our models significantly outperform current state-of-the-art probabilistic models, such as decoupled and joint models, illustrating the feasibility and promise of our proposed approaches. (Abstract shortened by UMI.) / We present a general, strongly-coupled, and bidirectional architecture based on discriminatively trained factor graphs for information extraction, which consists of two components---segmentation and relation. First we introduce joint factors connecting variables of relevant subtasks to capture dependencies and interactions between them. 
We then propose a strong bidirectional Markov chain Monte Carlo (MCMC) sampling inference algorithm which allows information to flow in both directions to find the approximate maximum a posteriori (MAP) solution for all subtasks. Notably, our framework is considerably simpler to implement, and outperforms previous ones. / Yu, Xiaofeng. / Adviser: Zam Wai. / Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 109-123). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. Graphical modeling (Statistics) Names, Chinese Random fields Text processing (Computer science)
10	Graphical and Bayesian analysis of unbalanced patient management data / Righter, Emily Stewart, January 2007 Project (M.S.)--Brigham Young University. Dept. of Statistics, 2007. / Includes bibliographical references (p. 60-61).
