11 |
Statistical analysis for transformation latent variable models with incomplete data. / CUHK electronic theses & dissertations collection / January 2013 (has links)
潜变量模型作为处理多元数据的一种有效的方法,在行为学、教育学、社会心理学以及医学等各个领域都受到了广泛关注。在分析潜变量模型时,大多数现有的统计方法和软件都是基于响应变量为正态分布的假设。尽管一些最近发展的方法可以处理部分的非正态数据,但在分析高度非正态的数据时依然存在问题。此外,在实际研究中还经常会遇到不完全数据,如缺失数据和删失数据。简单地忽略或错误地处理不完全数据可能会严重扭曲统计结果。在本文中,我们发展了贝叶斯惩罚样条方法,同时采用马尔科夫链蒙特卡洛方法,用以分析存有高度非正态和不完全数据的变换潜变量模型。我们在变换潜变量模型中讨论了不同类型的不完全数据,如完全随机缺失数据、随机缺失数据、不可忽略的缺失数据以及删失数据。我们还利用离差信息准则来选择正确的模型和数据缺失机制。我们通过许多模拟研究论证了我们提出的方法。此方法被应用于关于工作满意度、家庭生活、工作态度的研究,以及香港地区2型糖尿病患者心血管疾病的研究。 / Latent variable models (LVMs), as useful multivariate techniques, have attracted significant attention from various fields, including the behavioral, educational, social-psychological, and medical sciences. In the analysis of LVMs, most existing statistical methods and software have been developed under the assumption of normally distributed response variables. While some recent developments can partially address the non-normality of data, they remain problematic for highly non-normal data. Moreover, the presence of incomplete data, such as missing data and censored data, is a practical issue in substantive research. Simply ignoring or improperly handling incomplete data can seriously distort statistical inference. In this thesis, we develop a Bayesian P-spline approach, coupled with Markov chain Monte Carlo (MCMC) methods, to analyze transformation LVMs with highly non-normal and incomplete data. Different types of incomplete data, such as data missing completely at random, data missing at random, nonignorable missing data, and censored data, are discussed in the context of transformation LVMs. The deviance information criterion is proposed to conduct model comparison and to select an appropriate missingness mechanism. The empirical performance of the proposed methodologies is examined via extensive simulation studies. Applications to a study concerning people's job satisfaction, home life, and work attitude, as well as a study on cardiovascular diseases for type 2 diabetic patients in Hong Kong, are presented.
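The abstract above uses the deviance information criterion (DIC) to compare candidate models and missingness mechanisms. As a hedged, generic illustration (not the thesis's code; the toy known-variance normal model and all names below are invented), DIC can be estimated from MCMC output as the posterior mean deviance plus the effective number of parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a known-variance normal model (illustrative only).
y = rng.normal(loc=2.0, scale=1.0, size=100)
sigma = 1.0

def deviance(mu):
    # D(mu) = -2 * log-likelihood of y under N(mu, sigma^2)
    n = len(y)
    loglik = -0.5 * n * np.log(2 * np.pi * sigma**2) \
             - 0.5 * np.sum((y - mu) ** 2) / sigma**2
    return -2.0 * loglik

# Stand-in for MCMC output: draws from the conjugate posterior of mu
# under a flat prior, N(ybar, sigma^2 / n).
post_draws = rng.normal(y.mean(), sigma / np.sqrt(len(y)), size=5000)

D_bar = np.mean([deviance(mu) for mu in post_draws])   # posterior mean deviance
D_at_mean = deviance(post_draws.mean())                # deviance at posterior mean
p_D = D_bar - D_at_mean                                # effective number of parameters
DIC = D_bar + p_D                                      # lower is better

print(DIC, p_D)
```

For this one-parameter model, p_D should come out close to 1; in the thesis's setting the same quantities would be computed from the sampler's deviance trace for each candidate model or missingness mechanism, and the model with the smallest DIC retained.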
/ Detailed summary in vernacular field only. / Liu, Pengfei. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 115-127). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. / Abstract --- p.ii / Acknowledgement --- p.v / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Latent Variable Models --- p.1 / Chapter 1.2 --- Missing Data --- p.4 / Chapter 1.3 --- Censoring Data --- p.5 / Chapter 1.4 --- Penalized B-splines --- p.6 / Chapter 1.5 --- Bayesian Methods --- p.7 / Chapter 1.6 --- Outline of the Thesis --- p.8 / Chapter 2 --- Transformation Structural Equation Models --- p.9 / Chapter 2.1 --- Introduction --- p.9 / Chapter 2.2 --- Model Description --- p.11 / Chapter 2.3 --- Bayesian Estimation --- p.12 / Chapter 2.3.1 --- Bayesian P-splines --- p.12 / Chapter 2.3.2 --- Identifiability Constraints --- p.15 / Chapter 2.3.3 --- Prior Distributions --- p.16 / Chapter 2.3.4 --- Posterior Inference --- p.18 / Chapter 2.4 --- Bayesian Model Selection via DIC --- p.20 / Chapter 2.5 --- Simulation Studies --- p.23 / Chapter 2.5.1 --- Simulation 1 --- p.23 / Chapter 2.5.2 --- Simulation 2 --- p.26 / Chapter 2.5.3 --- Simulation 3 --- p.27 / Chapter 2.6 --- Conclusion --- p.28 / Chapter 3 --- Transformation SEMs with Missing Data that are Missing At Random --- p.43 / Chapter 3.1 --- Introduction --- p.43 / Chapter 3.2 --- Model Description --- p.45 / Chapter 3.3 --- Bayesian Estimation and Model Selection --- p.46 / Chapter 3.3.1 --- Modeling Transformation Functions --- p.46 / Chapter 3.3.2 --- Identifiability Constraints --- p.47 / Chapter 3.3.3 --- Prior Distributions --- p.48 / Chapter 3.3.4 --- Bayesian Estimation --- p.49 / Chapter 3.3.5 --- Model Selection via DIC --- p.52 / Chapter 3.4 --- Simulation Studies --- p.53 / Chapter 3.4.1 --- Simulation 1 --- p.54 / Chapter 3.4.2 --- 
Simulation 2 --- p.56 / Chapter 3.5 --- Conclusion --- p.57 / Chapter 4 --- Transformation SEMs with Nonignorable Missing Data --- p.65 / Chapter 4.1 --- Introduction --- p.65 / Chapter 4.2 --- Model Description --- p.67 / Chapter 4.3 --- Bayesian Inference --- p.68 / Chapter 4.3.1 --- Model Identification and Prior Distributions --- p.68 / Chapter 4.3.2 --- Posterior Inference --- p.69 / Chapter 4.4 --- Selection of Missing Mechanisms --- p.71 / Chapter 4.5 --- Simulation studies --- p.73 / Chapter 4.5.1 --- Simulation 1 --- p.73 / Chapter 4.5.2 --- Simulation 2 --- p.76 / Chapter 4.6 --- A Real Example --- p.77 / Chapter 4.7 --- Conclusion --- p.79 / Chapter 5 --- Transformation Latent Variable Models with Multivariate Censored Data --- p.86 / Chapter 5.1 --- Introduction --- p.86 / Chapter 5.2 --- Model Description --- p.88 / Chapter 5.3 --- Bayesian Inference --- p.90 / Chapter 5.3.1 --- Model Identification and Bayesian P-splines --- p.90 / Chapter 5.3.2 --- Prior Distributions --- p.91 / Chapter 5.3.3 --- Posterior Inference --- p.93 / Chapter 5.4 --- Simulation Studies --- p.96 / Chapter 5.4.1 --- Simulation 1 --- p.96 / Chapter 5.4.2 --- Simulation 2 --- p.99 / Chapter 5.5 --- A Real Example --- p.100 / Chapter 5.6 --- Conclusion --- p.103 / Chapter 6 --- Conclusion and Further Development --- p.113 / Bibliography --- p.115
|
12 |
Learning non-Gaussian factor analysis with different structures: comparative investigations on model selection and applications. / 基於多種結構的非高斯因數分析的模型選擇學習演算法比較研究及其應用 / CUHK electronic theses & dissertations collection / Ji yu duo zhong jie gou de fei Gaosi yin shu fen xi de mo xing xuan ze xue xi yan suan fa bi jiao yan jiu ji qi ying yong / January 2012 (has links)
高維資料的隱含結構挖掘是機器學習、模式識別和生物資訊學等領域中的重要問題。本論文從實踐和理論上研究了具有不同隱含結構模式的非高斯因數分析(Non-Gaussian Factor Analysis)模型。本文既從兩步法又從自動法的角度重點研究確定隱因數個數的模型選擇問題,及其在模式識別和生物資訊學上的實際應用。 / 非高斯因數分析在單高斯因數的情況下退化為傳統的因數分析(Factor Analysis)模型。我們發展了一套系統地比較模型選擇方法性能的工具,比較研究了經典的模型選擇準則(比如AIC 等),及近年來基於隨機矩陣理論的統計檢驗方法,還有貝葉斯陰陽(Bayesian Ying-Yang)和諧學習理論。同時,我們也對四個經典準則提供了一個適用於小樣本的低估因數數目傾向的相對排序的理論結果。 / 基於傳統的因數分析模型,我們還研究了參數化形式對模型選擇方法的性能的影響,一個重要的但被忽略或很少研究的問題,因為似然函數等價的參數化形式在傳統的模型選擇準則像AIC 下不會有性能差異。但是,我們通過大量的模擬資料和實際資料上的結果發現,在兩個常用的似然函數等價的因數分析參數化形式中,其中一個更加有利於在變分貝葉斯(Variational Bayes)和貝葉斯陰陽理論框架下做模型選擇。 進一步地,該兩個參數化形式被作為兩端拓展成一系列具有等價似然函數的參數化形式。實驗結果更加可靠地揭示了參數化形式的逐漸變化對模型選擇的影響。同時,實驗結果也顯示參數先驗分佈的引入可以提高模型選擇的準確度,並給出了相應的新的學習演算法。系統比較表明,不僅是兩步法還是自動法,貝葉斯陰陽學習理論都比變分貝葉斯的模型選擇的性能更佳,並且能在有利的參數化形式中獲得更大的提高。 / 二元因數分析(Binary FA)也是一種非高斯因數分析模型,它用伯努利因數去解釋隱含結構。首先,我們引入一種叫做正則對偶(canonical dual)的方法去解決在二元因數分析學習演算法中遇到的一個計算複雜度很大的二值二次規劃(Binary Quadratic Programming)問題。雖然它不能準確找到二值二次規劃的全域最優,它卻提高了整個學習演算法的計算速度和自動模型選擇的準確性。由此表明,局部嵌套的子優化問題的解不需要太精確反而能對整個學習演算法的性能更有利。然後,先驗分佈的引入進一步提高了模型選擇的性能,並且貝葉斯陰陽學習理論被系統的實驗結果證實要優於變分貝葉斯。接著,我們進一步發展了一個適用於二值資料的二元矩陣分解演算法。該演算法有理論的結果保證它的性能,並且在實際應用中,能以比其他相關演算法更優的性能從大規模的蛋白相互作用網路中檢測出蛋白功能複合物。 / 進一步,我們在一個半盲(semi-blind)的框架下研究了非高斯因數分析的演算法及其在系統生物學中的應用。非高斯因數分析模型被用於基因轉錄調控建模,並引入稀疏約束到連接矩陣,從而提出一個能有效估計轉錄因數調控信號的方法,而不需要像網路分量分析(Network Component Analysis)方法那樣預先給定轉錄因數調控基因的拓撲網路結構。特別地,借助二元因數分析,調控信號中的二元特徵能被直接捕捉。這種似開關的模式在很多生物過程的調控機制裡面起著重要作用。 / 最後,基於半盲非高斯因數分析學習演算法,我們提出了一套分析外顯子測序數據的方法,能有效地找出與疾病關聯的易感基因,提供了一個可能的方向去解決傳統的全基因組關聯分析(GWAS)方法在低頻高雜訊的外顯子測序數據上失效的問題。在一個1457 個樣本的大規模外顯子測序數據的初步結果顯示,我們的方法既能確認很多已經被認為是與疾病相關的基因,又能找到新的被重複驗證有顯著性的易感基因。相關的表達譜資料進一步顯示所找到的新基因在疾病和對照上有顯著的上下調的表達差異。 / Mining the underlying structure from high dimensional observations is of critical importance in machine learning, pattern recognition and bioinformatics. In this thesis, we, empirically or theoretically, investigate non-Gaussian Factor Analysis (NFA) models with different underlying structures. 
We focus on the problem of determining the number of latent factors of NFA, from the two-stage approach to automatic model selection, with real applications in pattern recognition and bioinformatics. / We start with a degenerate case of NFA, the conventional Factor Analysis (FA) with latent Gaussian factors. Many model selection methods have been proposed and used for FA, and it is important to examine their relative strengths and weaknesses. We develop an empirical analysis tool to facilitate a systematic comparison of the model selection performance of not only classical criteria (e.g., Akaike's information criterion, or AIC) but also recently developed methods (e.g., Kritchman & Nadler's hypothesis tests), as well as Bayesian Ying-Yang (BYY) harmony learning. We also prove a theoretical relative order of the underestimation tendencies of four classical criteria. / Then, we investigate how parameterizations affect model selection performance, an issue that has been ignored or seldom studied because traditional model selection criteria, like AIC, perform equivalently on parameterizations with equivalent likelihood functions. Focusing on two typical parameterizations of FA, we find via extensive experiments on synthetic and real data that one is better than the other under both Variational Bayes (VB) and BYY. Moreover, a family of FA parameterizations with equivalent likelihood functions is presented, where each member is indexed by an integer r, with the two known parameterizations at the two ends as r varies from zero to its upper bound. Investigations on this FA family not only confirm the significant difference between the two parameterizations in terms of model selection performance, but also provide insights into what makes a better parameterization.
With a Bayesian treatment of the new FA family, alternative VB algorithms for FA are derived, and the BYY algorithms for FA are extended to be equipped with prior distributions on the parameters. A systematic comparison shows that BYY generally outperforms VB under various scenarios, including varying simulation configurations and incrementally adding priors to parameters, as well as automatic model selection. / To describe binary latent features, we proceed to binary factor analysis (BFA), which considers Bernoulli factors. First, we introduce a canonical dual approach to tackle a difficult Binary Quadratic Programming (BQP) problem encountered as a computational bottleneck in BFA learning. Although it is not an exact BQP solver, it improves the learning speed and model selection accuracy, which indicates that some amount of error in solving the BQP, a problem nested in the hierarchy of the whole learning process, brings gains in both computational efficiency and model selection performance. The results also imply that optimization is important in learning, but learning is not just simple optimization. Second, we develop BFA algorithms under VB and BYY that incorporate Bayesian priors on the parameters to improve automatic model selection performance, and again show that BYY is superior to VB in a systematic comparison. Third, for binary observations, we propose a Bayesian Binary Matrix Factorization (BMF) algorithm under the BYY framework. The performance of the BMF algorithm is guaranteed by theoretical proofs and verified by experiments. We apply it to discovering protein complexes from protein-protein interaction (PPI) networks, an important problem in bioinformatics, where it outperforms other related methods. / Furthermore, we investigate NFA under a semi-blind learning framework. In practice, there are many scenarios in which either the system or the input, or both, are partially known.
Here, we modify Network Component Analysis (NCA) to model gene transcriptional regulation in systems biology by NFA. The previous hardcut NFA algorithm is extended here as sparse BYY-NFA by considering a priori connectivity, a priori sparsity constraints, or both. Therefore, the a priori knowledge about the connection topology of the TF-gene regulatory network required by NCA is not necessary for our NFA algorithm. The sparse BYY-NFA can be further modified into a sparse BYY-BFA algorithm, which directly models the switching patterns of latent transcription factor (TF) activities in gene regulation, e.g., whether or not a TF is activated. Mining switching patterns provides insights into the regulation mechanisms of many biological processes. / Finally, the semi-blind NFA learning is applied to identify single nucleotide polymorphisms (SNPs) that are significantly associated with a disease or a complex trait from exome sequencing data. By encoding each exon/gene (which may contain multiple SNPs) as a vector, an NFA classifier, obtained in a supervised way on a training set, is used for prediction on a testing set. Genes are selected according to the p-values of Fisher's exact test on the confusion tables collected from the prediction results. The genes selected on a real dataset from an exome sequencing project on psoriasis are partly consistent with published results, and some of them are likely novel susceptibility genes of the disease according to the validation results. / Detailed summary in vernacular field only. / Tu, Shikui. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 196-212). / Electronic reproduction. 
Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. / Abstract --- p.i / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Background --- p.1 / Chapter 1.1.1 --- Motivations --- p.1 / Chapter 1.1.2 --- Independent Factor Analysis (IFA) --- p.2 / Chapter 1.1.3 --- Learning Methods --- p.6 / Chapter 1.2 --- Related Work --- p.14 / Chapter 1.2.1 --- Learning Gaussian FA --- p.14 / Chapter 1.2.2 --- Learning NFA --- p.16 / Chapter 1.2.3 --- Learning Semi-blind NFA --- p.18 / Chapter 1.3 --- Main Contribution of the Thesis --- p.18 / Chapter 1.4 --- Thesis Organization --- p.25 / Chapter 1.5 --- Publication List --- p.27 / Chapter 2 --- FA comparative analysis --- p.31 / Chapter 2.1 --- Determining the factor number --- p.32 / Chapter 2.2 --- Model Selection Methods --- p.34 / Chapter 2.2.1 --- Two-Stage Procedure and Classical Model Selection Criteria --- p.34 / Chapter 2.2.2 --- Kritchman & Nadler's Hypothesis Test (KN) --- p.35 / Chapter 2.2.3 --- Minimax Rank Estimation (MM) --- p.37 / Chapter 2.2.4 --- Minka's Criterion (MK) for PCA --- p.38 / Chapter 2.2.5 --- Bayesian Ying-Yang (BYY) Harmony Learning --- p.39 / Chapter 2.3 --- Empirical Analysis --- p.42 / Chapter 2.3.1 --- A New Tool for Empirical Comparison --- p.42 / Chapter 2.3.2 --- Investigation On Model Selection Performance --- p.44 / Chapter 2.4 --- A Theoretic Underestimation Partial Order --- p.49 / Chapter 2.4.1 --- Events of Estimating the Hidden Dimensionality --- p.49 / Chapter 2.4.2 --- The Structural Property of the Criterion Function --- p.49 / Chapter 2.4.3 --- Experimental Justification --- p.54 / Chapter 2.5 --- Concluding Remarks --- p.58 / Chapter 3 --- FA parameterizations affect model selection --- p.70 / Chapter 3.1 --- Parameterization Issue in Model Selection --- p.71 / Chapter 3.2 --- FAr: ML-equivalent Parameterizations of FA --- p.72 / Chapter 3.3 --- 
Variational Bayes on FAr --- p.74 / Chapter 3.4 --- Bayesian Ying-Yang Harmony Learning on FAr --- p.77 / Chapter 3.5 --- Empirical Analysis --- p.82 / Chapter 3.5.1 --- Three levels of investigations --- p.82 / Chapter 3.5.2 --- FA-a vs FA-b: performances of BYY, VB, AIC, BIC, and DNLL --- p.84 / Chapter 3.5.3 --- FA-r: performances of VB versus BYY --- p.87 / Chapter 3.5.4 --- FA-a vs FA-b: automatic model selection performance of BYY and VB --- p.90 / Chapter 3.5.5 --- Classification Performance on Real World Data Sets --- p.92 / Chapter 3.6 --- Concluding remarks --- p.93 / Chapter 4 --- BFA learning versus optimization --- p.104 / Chapter 4.1 --- Binary Factor Analysis --- p.105 / Chapter 4.2 --- BYY Harmony Learning on BFA --- p.107 / Chapter 4.3 --- Empirical Analysis --- p.108 / Chapter 4.3.1 --- BIC and Variational Bayes (VB) on BFA --- p.108 / Chapter 4.3.2 --- Error in solving BQP affects model selection --- p.110 / Chapter 4.3.3 --- Priors over parameters affect model selection --- p.114 / Chapter 4.3.4 --- Comparisons among BYY, VB, and BIC --- p.115 / Chapter 4.3.5 --- Applications in recovering binary images --- p.116 / Chapter 4.4 --- Concluding Remarks --- p.117 / Chapter 5 --- BMF for PPI network analysis --- p.124 / Chapter 5.1 --- The problem of protein complex prediction --- p.125 / Chapter 5.2 --- A novel binary matrix factorization (BMF) algorithm --- p.126 / Chapter 5.3 --- Experimental Results --- p.130 / Chapter 5.3.1 --- Other methods in comparison --- p.130 / Chapter 5.3.2 --- Data sets --- p.131 / Chapter 5.3.3 --- Evaluation criteria --- p.131 / Chapter 5.3.4 --- On altered graphs by randomly adding and deleting edges --- p.132 / Chapter 5.3.5 --- On real PPI data sets --- p.137 / Chapter 5.3.6 --- On gene expression data for biclustering --- p.137 / Chapter 5.4 --- A Theoretical Analysis on BYY-BMF --- p.138 / Chapter 5.4.1 --- Main results --- p.138 / Chapter 5.4.2 --- Experimental justification --- p.140 / Chapter 5.4.3 --- Proofs --- 
p.143 / Chapter 5.5 --- Concluding Remarks --- p.147 / Chapter 6 --- Semi-blind NFA: algorithms and applications --- p.148 / Chapter 6.1 --- Determining transcription factor activity --- p.148 / Chapter 6.1.1 --- A brief review on NCA --- p.149 / Chapter 6.1.2 --- Sparse NFA --- p.150 / Chapter 6.1.3 --- Sparse BFA --- p.156 / Chapter 6.1.4 --- On Yeast cell-cycle data --- p.160 / Chapter 6.1.5 --- On E. coli carbon source transition data --- p.166 / Chapter 6.2 --- Concluding Remarks --- p.170 / Chapter 7 --- Applications on Exome Sequencing Data Analysis --- p.172 / Chapter 7.1 --- From GWAS to Exome Sequencing --- p.172 / Chapter 7.2 --- Encoding An Exon/Gene --- p.173 / Chapter 7.3 --- An NFA Classifier --- p.175 / Chapter 7.4 --- Results --- p.176 / Chapter 7.4.1 --- Simulation --- p.176 / Chapter 7.4.2 --- On a real exome sequencing data set: AHMUe --- p.177 / Chapter 7.5 --- Concluding Remarks --- p.186 / Chapter 8 --- Conclusion and Future Work --- p.187 / Chapter A --- Derivations of the learning algorithms on FA-r --- p.190 / Chapter A.1 --- The VB learning algorithm on FA-r --- p.190 / Chapter A.2 --- The BYY learning algorithm on FA-r --- p.193 / Bibliography --- p.195
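Entry 12 compares classical two-stage criteria such as AIC and BIC for choosing the number of factors in Gaussian FA. A minimal sketch of that two-stage procedure, assuming scikit-learn's `FactorAnalysis` as a stand-in estimator (the synthetic data, parameter count, and candidate range are illustrative choices, not the thesis's setup):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Synthetic data from a 3-factor model: x = Lambda z + e.
n, p, q_true = 500, 10, 3
Lambda = rng.normal(size=(p, q_true))
Z = rng.normal(size=(n, q_true))
X = Z @ Lambda.T + rng.normal(scale=0.5, size=(n, p))

def bic(model, X):
    # BIC = -2 * loglik + k * log(n); score() returns mean loglik per sample.
    n, p = X.shape
    q = model.n_components
    k = p * q + p - q * (q - 1) / 2   # free loadings (up to rotation) + uniquenesses
    return -2.0 * model.score(X) * n + k * np.log(n)

# Stage 1: fit FA for each candidate factor number; Stage 2: minimize the criterion.
scores = {}
for q in range(1, 7):
    fa = FactorAnalysis(n_components=q, random_state=0).fit(X)
    scores[q] = bic(fa, X)

q_hat = min(scores, key=scores.get)
print(q_hat)
```

Replacing the `k * np.log(n)` penalty with `2 * k` gives AIC; the thesis's comparison additionally covers hypothesis-test and BYY approaches that avoid fitting every candidate separately.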
|
13 |
Statistical Methods for Integrated Cancer Genomic Data Using a Joint Latent Variable Model / Drill, Esther. January 2018 (has links)
Inspired by the TCGA (The Cancer Genome Atlas), we explore multimodal genomic datasets with integrative methods using a joint latent variable approach. We use iCluster+, an existing clustering method for integrative data, to identify potential subtypes within TCGA sarcoma and mesothelioma tumors, and across a large cohort of 33 different TCGA cancer datasets. For classification, motivated to improve the prediction of platinum resistance in high grade serous ovarian cancer (HGSOC) treatment, we propose a novel integrative method, iClassify, to perform classification using a joint latent variable model. iClassify provides effective data integration and classification while handling heterogeneous data types, and provides a natural framework to incorporate covariate risk factors and to examine genomic driver by covariate risk factor interactions. Feature selection is performed through a thresholding parameter that combines both latent variable and feature coefficients. We demonstrate increased classification accuracy over methods that assume a homogeneous data type, such as linear discriminant analysis and penalized logistic regression, as well as improved feature selection. We apply iClassify to a TCGA cohort of HGSOC patients with three types of genomic data and platinum response data. This methodology has broad applications beyond predicting treatment outcomes and disease progression in cancer, including predicting prognosis and diagnosis in other diseases with major public health implications.
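The abstract benchmarks iClassify against classifiers that assume a homogeneous data type, namely linear discriminant analysis and penalized logistic regression. A hedged sketch of just those two baselines on synthetic concatenated features (scikit-learn assumed; the generated data are a made-up stand-in for integrated genomic platforms, not TCGA data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for concatenated multi-platform features; a joint
# latent variable model would instead model each data type separately.
X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
lasso_logit = LogisticRegression(penalty="l1", solver="liblinear",
                                 C=0.5).fit(X_tr, y_tr)

acc_lda = lda.score(X_te, y_te)
acc_logit = lasso_logit.score(X_te, y_te)
n_selected = int(np.sum(lasso_logit.coef_ != 0))  # features kept by the L1 penalty
print(acc_lda, acc_logit, n_selected)
```

The L1 penalty gives these baselines a built-in feature selection, which is the axis on which the abstract reports iClassify's thresholding approach improving.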
|
14 |
Three Contributions to Latent Variable Modeling / Liu, Xiang. January 2019 (has links)
The dissertation includes three papers that address theoretical and technical issues in latent variable models. The first paper extends the uniformly most powerful test approach for testing the person parameter in IRT to the two-parameter logistic model. In addition, an efficient branch-and-bound algorithm for computing the exact p-value is proposed. The second paper proposes a reparameterization of the log-linear cognitive diagnosis model (CDM). A Gibbs sampler is developed for posterior computation. The third paper proposes an ordered latent class model with infinitely many classes using a stochastic process prior. Furthermore, a nonparametric IRT application is also discussed.
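For the first paper's setting, the exact p-value of a test of the person parameter under the two-parameter logistic (2PL) model can, for small numbers of items, be obtained by brute-force enumeration of all response patterns rather than the paper's branch-and-bound algorithm. The item parameters and observed pattern below are invented for illustration; the weighted score Σⱼ aⱼuⱼ is the sufficient statistic for θ when the discriminations aⱼ are known:

```python
import itertools
import numpy as np

# Hypothetical 2PL item parameters (discrimination a_j, difficulty b_j).
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9, 1.3, 1.1, 0.7])
b = np.array([-1.0, -0.5, 0.0, 0.3, 0.8, 1.2, -0.2, 0.5])

def pattern_prob(u, theta):
    # P(U = u | theta) under the 2PL model.
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return float(np.prod(np.where(u == 1, p, 1.0 - p)))

def exact_pvalue(u_obs, theta0):
    # One-sided exact test of H0: theta = theta0 against larger ability,
    # using the weighted score T(u) = sum_j a_j u_j as the test statistic.
    t_obs = float(a @ u_obs)
    pval = 0.0
    for u in itertools.product([0, 1], repeat=len(a)):  # all 2^J patterns
        u = np.array(u)
        if a @ u >= t_obs - 1e-12:   # patterns at least as extreme
            pval += pattern_prob(u, theta0)
    return pval

u_obs = np.array([1, 1, 1, 1, 0, 1, 1, 1])  # a high-scoring observed pattern
print(exact_pvalue(u_obs, theta0=0.0))
```

Enumeration costs 2^J evaluations, which is exactly what a branch-and-bound scheme prunes for realistic test lengths.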
|
15 |
Latent models for cross-covariance /Wegelin, Jacob A. January 2001 (has links)
Thesis (Ph. D.)--University of Washington, 2001. / Vita. Includes bibliographical references (p. 139-145).
|
16 |
Search-based learning of latent tree models /Chen, Tao. January 2009 (has links)
Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2009. / Includes bibliographical references (p. 95-99).
|
17 |
New Advancements of Scalable Statistical Methods for Learning Latent Structures in Big Data / Zhao, Shiwen. January 2016 (has links)
Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, can be continuous measurements or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Therefore, efficient statistical approaches are crucial in this big data era. / Previous statistical methods for big data often aim to find low dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector; with this assumption, a factor model produces a low rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes Dirichlet-distributed mixture proportions of topics. This dissertation proposes several novel extensions of previous statistical methods that address challenges in big data. These novel methods are applied in multiple real world applications, including construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimating population structure from genotype data. / Dissertation
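As a small worked illustration of the low-rank covariance structure mentioned above (all dimensions invented), a factor model x = Λz + e with q latent factors implies a covariance ΛΛᵀ + Ψ whose shared part has rank q, far below the ambient dimension:

```python
import numpy as np

rng = np.random.default_rng(2)

p, q = 8, 2                       # observed dimension, number of latent factors
Lambda = rng.normal(size=(p, q))  # factor loadings
Psi = np.diag(rng.uniform(0.5, 1.0, size=p))  # diagonal unique variances

# Implied covariance of x = Lambda z + e,  z ~ N(0, I_q),  e ~ N(0, Psi):
Sigma = Lambda @ Lambda.T + Psi

# The shared-variance part has rank q; adding the diagonal Psi restores full rank.
print(np.linalg.matrix_rank(Lambda @ Lambda.T), np.linalg.matrix_rank(Sigma))
```

Estimating Σ through (Λ, Ψ) needs only pq + p parameters instead of p(p+1)/2, which is the parsimony that makes such models attractive for high dimensional biological data.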
|
18 |
Contextual Modulation of Competitive Object Candidates in Early Object Recognition / Unknown Date (has links)
Object recognition is imperfect; often incomplete processing or deprived information yields misperceptions (i.e., misidentifications) of objects. While quickly rectified and typically benign, instances of such errors can produce dangerous consequences (e.g., police shootings). Through a series of experiments, this study examined the competitive process among multiple object interpretations (candidates) during the earlier stages of the object recognition process using a lexical decision task paradigm. Participants encountered low-pass filtered objects that were previously demonstrated to evoke multiple responses: a highly frequented interpretation (“primary candidates”) and a lesser frequented interpretation (“secondary candidates”). When objects were presented without context, no facilitative effects were observed for primary candidates. However, secondary candidates demonstrated evidence of being actively suppressed. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2017. / FAU Electronic Theses and Dissertations Collection
|
19 |
Semiparametric latent variable models with Bayesian p-splines. / CUHK electronic theses & dissertations collection / January 2010 (has links)
In medical, behavioral, and social-psychological sciences, latent variable models are useful in handling variables that cannot be directly measured by a single observed variable, but instead are assessed through a number of observed variables. Traditional latent variable models are usually based on parametric assumptions about both the relations between outcome and explanatory latent variables and the error distributions. In this thesis, semiparametric models with Bayesian P-splines are developed to relax these rigid assumptions. / In the fourth part of the thesis, the methodology developed in the third part is further extended to a varying coefficient model with latent variables. The varying coefficient model is a class of flexible semiparametric models in which the effects of covariates are modeled dynamically by unspecified smooth functions. A transformation varying coefficient model can handle arbitrarily distributed dynamic data. A simulation study shows that our proposed method performs well in the analysis of this complex model. / In the last part of the thesis, we propose a finite mixture of varying coefficient models to analyze dynamic data with heterogeneity. A simulation study demonstrates that our proposed method can explore the possible existence of different groups in dynamic data, where in each group the dynamic influences of covariates on the response variables have different patterns. The proposed method is applied to a longitudinal study concerning the effectiveness of heroin treatment. Distinct patterns of heroin use and treatment effect in different patient groups are identified. / In the second part of the thesis, a latent variable model is proposed to relax the first assumption, in which unknown additive functions of latent variables in the structural equation are modeled by Bayesian P-splines. The estimation of the nonparametric functions is based on a powerful Markov chain Monte Carlo (MCMC) algorithm with a block update scheme. A simulation study shows that the proposed method can handle a much wider range of situations than traditional models. The proposed semiparametric latent variable model is applied to a study on osteoporosis prevention and control. Some interesting functional relations, which may be overlooked by traditional parametric latent variable models, are revealed. / In the third part of the thesis, a transformation model is developed to relax the second assumption, which usually assumes the normality of observed variables and random errors. In our proposed model, the nonnormal response variables are transformed to normality by unknown functions modeled with Bayesian P-splines. This semiparametric transformation model is shown to be applicable to a wide range of statistical analyses. The model is applied to a study on an intervention treatment for polydrug use in which the traditional model assumption is violated because many observed variables exhibit serious departures from normality. / Lu, Zhaohua. / Adviser: Xin-Yuan Song. / Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 119-130). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
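The Bayesian P-splines used throughout this record combine a B-spline basis with a difference penalty on adjacent coefficients. The sketch below is a frequentist penalized least-squares analogue, not the thesis's MCMC sampler: the basis is built from scratch via the Cox-de Boor recursion, and the knot count and smoothing level λ are arbitrary illustrative choices.

```python
import numpy as np

def bspline_basis(x, n_interior, degree=3):
    """Cubic B-spline design matrix via the Cox-de Boor recursion."""
    interior = np.linspace(0.0, 1.0, n_interior + 2)
    knots = np.concatenate([np.zeros(degree), interior, np.ones(degree)])
    # Degree-0 indicator functions on half-open knot spans.
    N = np.array([(x >= knots[i]) & (x < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float).T
    for d in range(1, degree + 1):
        N_new = np.zeros((len(x), N.shape[1] - 1))
        for i in range(N.shape[1] - 1):
            left = right = 0.0
            if knots[i + d] > knots[i]:
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * N[:, i]
            if knots[i + d + 1] > knots[i + 1]:
                right = ((knots[i + d + 1] - x)
                         / (knots[i + d + 1] - knots[i + 1]) * N[:, i + 1])
            N_new[:, i] = left + right
        N = N_new
    return N

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 200, endpoint=False)   # stay inside [0, 1)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

B = bspline_basis(x, n_interior=20)
k = B.shape[1]
D = np.diff(np.eye(k), n=2, axis=0)              # second-order difference penalty
lam = 1.0
# P-spline fit: minimize ||y - B beta||^2 + lam * ||D beta||^2.
beta = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fit = B @ beta
print(B.shape)
```

In the Bayesian version the penalty corresponds to a random-walk prior on the coefficients, and the smoothing level is sampled within MCMC rather than fixed in advance.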
|
20 |
On local and global influence analysis of latent variable models with ML and Bayesian approaches. / CUHK electronic theses & dissertations collection / January 2004 (has links)
Bin Lu. / "September 2004." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (p. 118-126) / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web. / Abstracts in English and Chinese.
|