61

Statistical methods for topology inference, denoising, and bootstrapping in networks

Kang, Xinyu 13 November 2018 (has links)
Quite often, the data we observe can be effectively represented using graphs. The underlying structure of the resulting graph, however, might contain noise and does not always hold constant across scales. With the right tools, we can address these two problems. This thesis focuses on developing such tools and provides insights into their use. Specifically, I study several problems that incorporate network data within the multi-scale framework, aiming at identifying common patterns and differences of signals over networks across different scales. Additional topics in network denoising and network bootstrapping are also discussed. The first problem we consider is connectivity changes in dynamic networks constructed from multiple time series. Multivariate time series data is often non-stationary, and it is not uncommon to expect changes in a system across multiple time scales. Motivated by these observations, we incorporate the traditional Granger-causal type of modeling within the multi-scale framework and propose a new method to detect connectivity changes and recover the dynamic network structure. The second problem we consider is how to denoise and approximate signals over a network adjacency matrix. We propose an adaptive unbalanced Haar wavelet based transformation of the network data, and show that it is efficient in approximating and denoising graph signals over a network adjacency matrix. We focus on exact decompositions of the network, the corresponding approximation theory, and denoising signals over graphs, particularly from the perspective of network compression. We also provide a real-data application on denoising EEG signals over a DTI network. The third problem we consider is network denoising and network inference. Network representations are popular for characterizing complex systems. However, errors observed in the original measurements propagate to network statistics and hence induce uncertainty in the summaries of the networks. We propose a spectral-denoising based resampling method to produce confidence intervals that propagate the inferential errors for a number of Lipschitz-continuous network statistics. We illustrate the effectiveness of the method through a series of simulation studies.
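The spectral-denoising bootstrap described above lends itself to a compact illustration. The following Python sketch is not the thesis's method: the truncation rank k, the Bernoulli resampling scheme, and the example statistic (mean degree) are all assumptions made for illustration.

```python
import numpy as np

def spectral_denoise(A, k):
    """Low-rank spectral denoising of a symmetric adjacency matrix:
    keep the k leading eigenpairs, clip the result to [0, 1]."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:k]
    P_hat = vecs[:, idx] @ np.diag(vals[idx]) @ vecs[:, idx].T
    return np.clip(P_hat, 0.0, 1.0)

def bootstrap_statistic(A, stat, k=3, B=200, rng=None):
    """Resample graphs from the denoised edge-probability matrix and
    return a percentile confidence interval for a network statistic."""
    rng = np.random.default_rng(rng)
    P_hat = spectral_denoise(A, k)
    n = A.shape[0]
    reps = []
    for _ in range(B):
        U = rng.random((n, n))
        A_b = np.triu((U < P_hat).astype(float), 1)
        A_b = A_b + A_b.T                      # symmetric resampled graph
        reps.append(stat(A_b))
    return np.percentile(reps, [2.5, 97.5])

# Example: a CI for the mean degree of a noisy Erdos-Renyi-like graph
rng = np.random.default_rng(0)
n, p = 60, 0.2
A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1); A = A + A.T
ci = bootstrap_statistic(A, lambda G: G.sum(1).mean(), k=3, B=200, rng=1)
print("95% CI for mean degree:", ci)
```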
62

Semi-supervised and active training of conditional random fields for activity recognition

Mahdaviani, Maryam 05 1900 (has links)
Automated human activity recognition has attracted increasing attention in the past decade. However, the application of machine learning and probabilistic methods to activity recognition problems has been studied only in the past couple of years. For the first time, this thesis explores the application of semi-supervised and active learning in activity recognition. We present a new and efficient semi-supervised training method for parameter estimation and feature selection in conditional random fields (CRFs), a probabilistic graphical model. In real-world applications such as activity recognition, unlabeled sensor traces are relatively easy to obtain, whereas labeled examples are expensive and tedious to collect. Furthermore, the ability to automatically select a small subset of discriminatory features from a large pool can be advantageous in terms of computational speed as well as accuracy. We introduce the semi-supervised virtual evidence boosting (sVEB) algorithm for training CRFs, a semi-supervised extension to the recently developed virtual evidence boosting (VEB) method for feature selection and parameter learning. sVEB takes advantage of the unlabeled data via minimum entropy regularization. The objective function combines the unlabeled conditional entropy with the labeled conditional pseudo-likelihood. The sVEB algorithm reduces the overall system cost as well as the human labeling cost required during training, both important considerations in building real-world inference systems. Moreover, we propose an active learning algorithm for training CRFs that is based on virtual evidence boosting and uses entropy measures. Active virtual evidence boosting (aVEB) queries the user for the most informative examples, efficiently builds up labeled training examples, and incorporates unlabeled data as in sVEB. aVEB not only reduces the computational complexity of training CRFs as in sVEB, but also outputs more accurate classification results for the same fraction of labeled data. In a set of experiments we illustrate that our algorithms, sVEB and aVEB, benefit from both the use of unlabeled data and automatic feature selection, and outperform other semi-supervised and active training approaches. The proposed methods could also be extended and employed for other classification problems in relational data. / Science, Faculty of / Computer Science, Department of / Graduate
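The sVEB objective, labeled conditional pseudo-likelihood plus unlabeled conditional entropy, can be sketched outside the CRF setting. The Python below uses a plain logistic model as a stand-in (an assumption; the thesis works with CRFs and boosting) to show minimum entropy regularization: the unlabeled entropy term pushes the decision boundary away from dense unlabeled regions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def semi_supervised_loss(w, X_lab, y_lab, X_unl, lam=0.5, eps=1e-12):
    """Labeled negative log-likelihood plus lam * conditional entropy
    of the model's predictions on the unlabeled pool."""
    p_lab = sigmoid(X_lab @ w)
    nll = -np.mean(y_lab * np.log(p_lab + eps)
                   + (1 - y_lab) * np.log(1 - p_lab + eps))
    p_unl = sigmoid(X_unl @ w)
    ent = -np.mean(p_unl * np.log(p_unl + eps)
                   + (1 - p_unl) * np.log(1 - p_unl + eps))
    return nll + lam * ent

def fit(X_lab, y_lab, X_unl, lam=0.5, lr=0.1, steps=300):
    """Crude finite-difference gradient descent, enough for a toy demo."""
    w = np.zeros(X_lab.shape[1])
    for _ in range(steps):
        g = np.zeros_like(w)
        for j in range(len(w)):
            e = np.zeros_like(w); e[j] = 1e-5
            g[j] = (semi_supervised_loss(w + e, X_lab, y_lab, X_unl, lam)
                    - semi_supervised_loss(w - e, X_lab, y_lab, X_unl, lam)) / 2e-5
        w -= lr * g
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)
w = fit(X[:20], y[:20], X[20:], lam=0.5)   # 20 labeled, 100 unlabeled
print("learned weights:", w.round(2))
```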
63

Monte Carlo integration in discrete undirected probabilistic models

Hamze, Firas 05 1900 (has links)
This thesis contains the author’s work in and contributions to the field of Monte Carlo sampling for undirected graphical models, a class of statistical models commonly used in machine learning, computer vision, and spatial statistics; the aim is to be able to use the methodology and resultant samples to estimate integrals of functions of the variables in the model. Over the course of the study, three different but related methods were proposed and have appeared as research papers. The thesis consists of an introductory chapter discussing the models considered, the problems involved, and a general outline of Monte Carlo methods. The three subsequent chapters contain versions of the published work. The second chapter, which has appeared in (Hamze and de Freitas 2004), is a presentation of new MCMC algorithms for computing the posterior distributions and expectations of the unknown variables in undirected graphical models with regular structure. For demonstration purposes, we focus on Markov Random Fields (MRFs). By partitioning an MRF into non-overlapping trees, it is possible to compute the posterior distribution of a particular tree exactly by conditioning on the remaining trees. These exact solutions allow us to construct efficient blocked and Rao-Blackwellised MCMC algorithms. We show empirically that tree sampling is considerably more efficient than other partitioned sampling schemes and the naive Gibbs sampler, even in cases where loopy belief propagation fails to converge. We prove that tree sampling exhibits lower variance than the naive Gibbs sampler and other naive partitioning schemes using the theoretical measure of maximal correlation. We also construct new information theory tools for comparing different MCMC schemes and show that, under these, tree sampling is more efficient. Although the work discussed in Chapter 2 showed promise on the class of graphs to which it was suited, there are many cases where limiting the topology is quite a handicap. The work in Chapter 3 explored an alternative methodology for approximating functions of variables representable as undirected graphical models of arbitrary connectivity with pairwise potentials, as well as for estimating the notoriously difficult partition function of the graph. The algorithm, published in (Hamze and de Freitas 2005), fits into the framework of sequential Monte Carlo methods rather than the more widely used MCMC, and relies on constructing a sequence of intermediate distributions that get closer to the desired one. While the idea of using “tempered” proposals is known, we construct a novel sequence of target distributions where, rather than dropping a global temperature parameter, we sequentially couple individual pairs of variables that are, initially, sampled exactly from a spanning tree of the variables. We present experimental results on inference and estimation of the partition function for sparse and densely connected graphs. The final contribution of this thesis, presented in Chapter 4 and also in (Hamze and de Freitas 2007), emerged from empirical observations made while trying to optimize the sequence of edges to add to a graph so as to guide the population of samples to the high-probability regions of the model.
Most important among these observations was that while several heuristic approaches, discussed in Chapter 1, certainly yielded improvements over edge sequences consisting of random choices, strategies based on forcing the particles to take large, biased random walks in the state space resulted in more efficient exploration, particularly at low temperatures. This motivated a new Monte Carlo approach to treating complex discrete distributions. The algorithm builds on the N-Fold Way, an ingenious event-driven MCMC sampler that avoids rejection moves at any specific state. The N-Fold Way can, however, get “trapped” in cycles. We surmount this problem by modifying the sampling process to produce biased state-space paths of randomly chosen length. This alteration does introduce bias, but the bias is subsequently corrected with a carefully engineered importance sampler. / Science, Faculty of / Computer Science, Department of / Graduate
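The tree-sampling idea of Chapter 2 can be sketched with the simplest possible tree, a chain: resample an entire row of an Ising grid exactly, conditioned on its neighboring rows, via forward filtering / backward sampling. This is a hedged illustration of blocked tree sampling in general, not the thesis's implementation; the Ising model, grid size, and coupling beta are assumptions.

```python
import numpy as np

def sample_row(beta, up, down, rng):
    """Exactly sample one row (a chain, i.e. a tree) of an Ising grid,
    conditioned on the rows above and below, by forward filtering /
    backward sampling."""
    n = len(up)
    states = np.array([-1.0, 1.0])
    uni = np.exp(beta * states[None, :] * (up + down)[:, None])   # (n, 2)
    pair = np.exp(beta * np.outer(states, states))                # (2, 2)
    alpha = np.zeros((n, 2))
    alpha[0] = uni[0]
    for t in range(1, n):
        alpha[t] = uni[t] * (alpha[t - 1] @ pair)
        alpha[t] /= alpha[t].sum()            # normalize for stability
    x = np.zeros(n)
    p = alpha[-1] / alpha[-1].sum()
    x[-1] = states[rng.choice(2, p=p)]
    for t in range(n - 2, -1, -1):
        p = alpha[t] * np.exp(beta * states * x[t + 1])
        p /= p.sum()
        x[t] = states[rng.choice(2, p=p)]
    return x

def tree_gibbs_sweep(X, beta, rng):
    """One blocked-Gibbs sweep: resample each row exactly given the rest."""
    n = X.shape[0]
    for i in range(n):
        up = X[i - 1] if i > 0 else np.zeros(n)
        down = X[i + 1] if i < n - 1 else np.zeros(n)
        X[i] = sample_row(beta, up, down, rng)
    return X

rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(16, 16))
for _ in range(50):
    X = tree_gibbs_sweep(X, beta=0.4, rng=rng)
print("mean magnetization:", X.mean())
```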
64

Statistical Methods for Constructing Heterogeneous Biomarker Networks

Xie, Shanghong January 2019 (has links)
The theme of this dissertation is to construct heterogeneous biomarker networks using graphical models for understanding disease progression and prognosis. Biomarkers may organize into networks of connected regions. Substantial heterogeneity in networks between individuals and subgroups of individuals is observed. The strengths of network connections may vary across subjects depending on subject-specific covariates (e.g., genetic variants, age). In addition, the connectivities between biomarkers, as subject-specific network features, have been found to predict disease clinical outcomes. Thus, it is important to accurately identify biomarker network structure and estimate the strength of connections. Graphical models have been extensively used to construct complex networks. However, the estimated networks are at the population level, not accounting for subjects’ covariates. More flexible covariate-dependent graphical models are needed to capture the heterogeneity in subjects and further create new network features to improve prediction of disease clinical outcomes and stratify subjects into clinically meaningful groups. A large number of parameters are required in covariate-dependent graphical models. Regularization needs to be imposed to handle the high-dimensional parameter space. Furthermore, personalized clinical symptom networks can be constructed to investigate co-occurrence of clinical symptoms. When there are multiple biomarker modalities, the estimation of a target biomarker network can be improved by incorporating prior network information from the external modality. This dissertation contains four parts to achieve these goals: (1) an efficient l0-norm feature selection method based on augmented and penalized minimization to tackle the high-dimensional parameter space involved in covariate-dependent graphical models; (2) a two-stage approach to identify disease-associated biomarker network features; (3) an application to construct personalized symptom networks; (4) a node-wise biomarker graphical model to leverage the shared mechanism between multi-modality data when external modality data is available. In the first part of the dissertation, we propose a two-stage procedure that approximates l0-norm regularization as closely as possible and solves it with a highly efficient and simple computational algorithm. Advances in high-throughput technologies in genomics and imaging yield unprecedentedly large numbers of prognostic biomarkers. To accommodate the scale of biomarkers and study their association with disease outcomes, penalized regression is often used to identify important biomarkers. The ideal variable selection procedure would search for the best subset of predictors, which is equivalent to imposing an l0-penalty on the regression coefficients. Since this optimization is a non-deterministic polynomial-time hard (NP-hard) problem that does not scale with the number of biomarkers, alternative methods mostly place smooth penalties on the regression parameters, which lead to computationally feasible optimization problems. However, empirical studies and theoretical analyses show that convex approximations of the l0-norm (e.g., l1) do not outperform their l0 counterpart. Progress on l0-norm feature selection has been relatively slow, with the main methods being greedy algorithms such as stepwise regression or orthogonal matching pursuit. Penalized regression based on regularizing the l0-norm remains much less explored in the literature.
In this work, inspired by the recently popular augmenting and data-splitting algorithms, including the alternating direction method of multipliers, we propose a two-stage procedure for l0-penalty variable selection, referred to as augmented penalized minimization-L0 (APM-L0). APM-L0 targets the l0-norm as closely as possible while keeping computation tractable, efficient, and simple, which is achieved by iterating between a convex regularized regression and a simple hard-thresholding estimation. The procedure can be viewed as arising from regularized optimization with a truncated l1 norm. Thus, we propose to treat the regularization parameter and the thresholding parameter as tuning parameters and select them by cross-validation. A one-step coordinate descent algorithm is used in the first stage to significantly improve computational efficiency. Through extensive simulation studies and a real data application, we demonstrate the superior performance of the proposed method in terms of selection accuracy and computational speed as compared to existing methods. The proposed APM-L0 procedure is implemented in the R package APML0. In the second part of the dissertation, we develop a two-stage method to estimate biomarker networks that accounts for heterogeneity among subjects and evaluates the networks’ association with disease clinical outcomes. In the first stage, we propose a conditional Gaussian graphical model with mean and precision matrix depending on covariates to obtain subject- or subgroup-specific networks. In the second stage, we evaluate the clinical utility of network measures (connection strengths) estimated from the first stage. The second-stage analysis provides the relative predictive power of between-region network measures on clinical impairment in the context of regional biomarkers and existing disease risk factors. We assess the performance of the proposed method by extensive simulation studies and an application to a Huntington’s disease (HD) study to investigate the effect of the HD causal gene on the rate of change in motor symptoms through its effect on subcortical and cortical grey matter atrophy connections. We show that cortical network connections and subcortical volumes, but not subcortical connections, are predictive of clinical motor function deterioration. We validate these findings in an independent HD study. Lastly, highly similar patterns seen in the grey matter connections and a previous white matter connectivity study suggest a shared biological mechanism for HD and support the hypothesis that white matter loss is a direct result of neuronal loss, as opposed to the loss of myelin or dysmyelination. In the third part of the dissertation, we apply the methodology to construct heterogeneous cross-sectional symptom networks. The co-occurrence of symptoms may result from direct interactions between these symptoms, so the symptoms can be treated as a system. In addition, subject-specific risk factors (e.g., genetic variants, age) can exert external influence on the system. In this work, we develop a covariate-dependent conditional Gaussian graphical model to obtain personalized symptom networks. The strengths of network connections are modeled as a function of covariates to capture the heterogeneity among individuals and subgroups of individuals.
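As a rough illustration of the alternation at the heart of APM-L0, here is a hedged Python sketch: it pairs one pass of coordinate-descent lasso (the convex stage) with hard thresholding to an l0 budget of k nonzeros. It is a simplification, not the APML0 package's algorithm; the penalty form, the budget k, and the toy data are assumptions.

```python
import numpy as np

def apm_l0_sketch(X, y, lam=0.1, k=5, iters=50):
    """Alternate a convex step (one pass of coordinate-descent lasso)
    with hard thresholding that keeps only the k largest coefficients."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(0)
    for _ in range(iters):
        # convex stage: one-step coordinate descent with soft thresholding
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]     # partial residual
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - lam * n, 0.0) / col_ss[j]
        # hard-thresholding stage: enforce an l0 budget of k nonzeros
        if np.count_nonzero(beta) > k:
            cutoff = np.sort(np.abs(beta))[-k]
            beta[np.abs(beta) < cutoff] = 0.0
    return beta

# Toy check: recover a 3-sparse signal
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
true = np.zeros(20); true[[1, 5, 9]] = [2.0, -1.5, 1.0]
y = X @ true + 0.1 * rng.standard_normal(100)
print("selected indices:", np.nonzero(apm_l0_sketch(X, y, lam=0.05, k=3))[0])
```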
We assess the performance of the proposed method by simulation studies and an application to a Huntington’s disease study to investigate the networks of symptoms in different domains (motor, cognitive, psychiatric) and to identify the important brain imaging biomarkers associated with the connections. We show that symptoms in the same domain interact more often with each other than across domains. We validate the findings using subjects’ measurements from follow-up visits. In the fourth part of the dissertation, we propose an integrative learning approach to improve the estimation of subject-specific networks of a target modality when external modality data is available. Biomarker networks measured by different modalities of data (e.g., structural magnetic resonance imaging (sMRI), diffusion tensor imaging (DTI)) may share the same true underlying biological mechanism. In this work, we propose a node-wise biomarker graphical model that leverages the shared mechanism between multi-modality data to provide a more reliable estimation of the target modality network and to account for the heterogeneity in networks due to differences between subjects and networks of the external modality. Latent variables are introduced to represent the shared unobserved biological network, and information from the external modality is incorporated to model the distribution of the underlying biological network. An approximation approach is used to calculate the posterior expectations of latent variables to reduce computation time. The performance of the proposed method is demonstrated by extensive simulation studies and an application to construct a grey matter brain atrophy network of Huntington’s disease using sMRI and DTI data. The estimated network measures are shown to be meaningful for predicting follow-up clinical outcomes in terms of patient stratification and prediction. Lastly, we conclude the dissertation with comments on limitations and extensions.
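The covariate-dependent construction running through parts two and three can be illustrated with a toy forward model. The Python sketch below is an assumption-laden illustration, not the dissertation's estimator: the coefficient matrices theta0 and theta1 are hypothetical, the covariate is a single scalar, and the penalized-likelihood estimation step is omitted entirely.

```python
import numpy as np

def subject_precision(theta0, theta1, z):
    """Subject-specific precision matrix whose off-diagonal entries are
    linear in a scalar covariate z (e.g., age or a genetic score)."""
    omega = theta0 + z * theta1
    omega = (omega + omega.T) / 2
    # inflate the diagonal if needed to keep the matrix positive definite
    eigmin = np.linalg.eigvalsh(omega).min()
    if eigmin <= 1e-6:
        omega += (1e-6 - eigmin) * np.eye(omega.shape[0])
    return omega

def connection_strengths(omega):
    """Partial correlations: the natural 'connection strength' read off
    a Gaussian graphical model's precision matrix."""
    d = np.sqrt(np.diag(omega))
    return -omega / np.outer(d, d) + 2 * np.eye(len(d))

p = 4
theta0 = np.eye(p)
theta1 = np.zeros((p, p))
theta1[0, 1] = theta1[1, 0] = -0.4     # edge 0-1 strengthens with z
for z in (0.0, 0.5, 1.0):
    pc = connection_strengths(subject_precision(theta0, theta1, z))
    print(f"z={z:.1f}  partial corr(0,1) = {pc[0, 1]:+.2f}")
```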
65

A Molecular Modeling Toolkit with Applications to Efficient Free Energy Computation

Tezcan, Hasan Gokhan 01 January 2010 (has links) (PDF)
In this thesis we develop a molecular modeling toolkit that models the conformation space of proteins and allows easy prototyping of algorithms on conformation-space models by extending an existing molecular modeling tool. Our toolkit creates a factor graph to represent the conformation-space model and links it with an inference framework. This enables the execution of statistical inference tasks and the implementation of algorithms that work on graphical models. As an application of our toolkit, we study the estimation of free energy changes after mutations. We show that it is possible to represent molecular dynamics trajectories using graphical models and that free energy perturbation calculations can be done efficiently on these models.
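To make the factor-graph idea concrete, here is a minimal Python sketch of a conformation-space factor graph over a hypothetical 3-residue chain with discrete rotamer states, with brute-force enumeration standing in for the inference framework. All energies, sizes, and the temperature are assumptions; the toolkit itself extends an existing modeling tool rather than enumerating states.

```python
import numpy as np
from itertools import product

# Hypothetical 3-residue chain: each residue takes one of 3 rotamer states;
# pairwise factors couple neighboring residues (energies in kcal/mol).
n_res, n_rot = 3, 3
rng = np.random.default_rng(0)
pair_energy = [rng.uniform(0, 2, size=(n_rot, n_rot)) for _ in range(n_res - 1)]

def boltzmann_marginals(kT=0.593):   # kT ~ 0.593 kcal/mol at 298 K
    """Brute-force inference on the factor graph: enumerate conformations,
    weight by exp(-E/kT), and marginalize each residue's rotamer."""
    marg = np.zeros((n_res, n_rot))
    Z = 0.0
    for conf in product(range(n_rot), repeat=n_res):
        E = sum(pair_energy[i][conf[i], conf[i + 1]] for i in range(n_res - 1))
        w = np.exp(-E / kT)
        Z += w
        for i, r in enumerate(conf):
            marg[i, r] += w
    return marg / Z, -kT * np.log(Z)   # marginals and free energy -kT ln Z

marginals, free_energy = boltzmann_marginals()
print("rotamer marginals per residue:\n", marginals.round(3))
print("free energy estimate:", round(free_energy, 3))
```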
66

Improving Password Usability with Visual Techniques

Komanduri, Saranga 13 November 2007 (has links)
No description available.
67

GRAPHICAL USER INTERFACE FOR AIR TRAFFIC CONTROL

Laskar, Pallav 09 April 2012 (has links)
No description available.
68

ActiveSPEC and ANSE Usage Environments in Orbit

Doumit, Sarjoun S. January 2000 (has links)
No description available.
69

A preference order dynamic program for underground stope design

Holguin, Stefano January 1987 (has links)
No description available.
70

Variable screening and graphical modeling for ultra-high dimensional longitudinal data

Zhang, Yafei 02 July 2019 (has links)
Ultrahigh-dimensional variable selection is of great importance in statistical research, and independence screening is a powerful tool for selecting important variables when there is a massive number of candidates. Some commonly used independence screening procedures are based on single-replicate data and are not applicable to longitudinal data. This motivates us to propose a new Sure Independence Screening (SIS) procedure to bring the dimension from ultra-high down to a relatively large scale that is similar to or smaller than the sample size. In Chapter 2, we provide two types of SIS and their iterative extensions (iterative SIS) to enhance finite-sample performance. An upper bound on the number of variables to be included is derived, and assumptions are given under which sure screening is applicable. The proposed procedures are assessed by simulations, and an application to a study on systemic lupus erythematosus illustrates their practical use. After the variable screening process, we then explore the relationships among the variables. Graphical models are commonly used to explore the association network for a set of variables, which could be genes or other objects under study. However, the graphical models currently in use are designed only for single-replicate data, rather than longitudinal data. In Chapter 3, we propose a penalized likelihood approach to identify the edges in a conditional independence graph for longitudinal data. We use pairwise coordinate descent combined with second-order cone programming to optimize the penalized likelihood and estimate the parameters. Furthermore, we extend the nodewise regression method to the longitudinal data case. Simulation and real data analysis exhibit the competitive performance of the penalized likelihood method. / Doctor of Philosophy / Longitudinal data have received a considerable amount of attention in the fields of health science studies. The information from this type of data can help with disease detection and control. In addition, a graph of factors related to the disease can be built to represent the relationships among them. In this dissertation, we develop a framework to find the important factor(s), out of thousands in longitudinal data, that are related to the disease. In addition, we develop a graphical method that shows the relationships among the important factors identified by the screening. In practice, combining these two methods identifies important factors for a disease as well as the relationships among the factors, and thus provides a deeper understanding of the disease.
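For orientation, here is a Python sketch of classic sure independence screening for i.i.d. data: rank predictors by absolute marginal correlation with the response and keep the top d. This is the standard single-replicate SIS, an assumption for illustration; the dissertation's contribution is its extension to longitudinal data, which this sketch does not capture.

```python
import numpy as np

def sis_screen(X, y, d=None):
    """Sure Independence Screening: rank predictors by absolute marginal
    correlation with the response and keep the top d."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))          # a common default reduced dimension
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    omega = np.abs(Xc.T @ yc) / n       # absolute marginal correlations
    return np.argsort(omega)[::-1][:d]

# Toy check: three true signals hidden among 5000 predictors
rng = np.random.default_rng(0)
n, p = 200, 5000
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[[3, 42, 777]] = [2.0, -1.5, 1.0]
y = X @ beta + rng.standard_normal(n)
kept = sis_screen(X, y)
print("active set recovered:", {3, 42, 777} <= set(kept))
```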
