  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

A comparison of latent growth models for constructs measured by multiple indicators

Leite, Walter Lana 28 August 2008
Not available / text
22

A comparison of latent growth models for constructs measured by multiple indicators

Leite, Walter Lana, Stapleton, Laura M., January 2005
Thesis (Ph. D.)--University of Texas at Austin, 2005. / Supervisor: Laura M. Stapleton. Vita. Includes bibliographical references.
23

Functional Studies and X-Ray Structure Analysis of Human Interleukin-5 Receptor Alpha and Human Interleukin-5 Complex

Patiño Gonzalez, Edwin January 2007
Interleukin-5 (IL-5) is a member of the hematopoietic class I cytokines and is specifically involved in eosinophil activation. IL-5 plays an important role in disease conditions such as allergic asthma and other hypereosinophilias, which are characterized by highly increased levels of eosinophils in peripheral blood and tissues. The IL-5 receptor is a heterodimer consisting of a binding alpha subunit (IL-5Rα) and a common beta subunit (IL-5Rβ). This IL-5Rβ subunit is shared with the IL-3 and GM-CSF receptors. IL-5Rα is required for ligand-specific binding, whereas the association of the IL-5Rβ subunit triggers intracellular signal transduction. Previous studies have described the crystallographic structure of human IL-5 (hIL-5), as well as that of the common IL-5Rβ chain (IL-5Rβc). However, no experimental structural data are yet available for the interaction of the high-affinity IL-5 receptor IL-5Rα with its ligand IL-5. Therefore, the principal objective of this thesis was to gain new insights into the basis of this important agonist-receptor interaction. In particular, data on the recombinant expression, purification and preparation of the binary complex of hIL-5 bound to the receptor ectodomain of hIL-5Rα are shown, as well as the subsequent crystal structure analysis of the binary ligand-receptor (hIL-5Rα/hIL-5) complex. Both proteins were expressed in an Escherichia coli expression system, purified to homogeneity, and crystallized. However, since the initial crystals did not show any X-ray diffraction, each step of the preparation and crystallization procedure had to be optimized in turn. Several improvements proved to be crucial for obtaining crystals suitable for structure analysis. A free cysteine residue in the N-terminal domain of the hIL-5Rα ectodomain protein was mutated to alanine to remove protein heterogeneity. In addition, hIL-5 affinity chromatography of the receptor protein proved to be absolutely crucial for crystal quality.
Additive screening using the initial crystallization condition finally yielded crystals of the binary complex, which diffracted to 2.5 Å resolution and were suitable for structure analysis. The preliminary structure data demonstrate a new receptor architecture for the IL-5Rα ligand-binding domain, which has no similarities to other class I cytokine receptor structures known so far. The complex structure demonstrates that the ligand-binding region of human IL-5Rα is dispersed over all three extracellular domains and adopts a binding topology in which the cytokine recognition motif (CRM) needs the first Fn-III domain of human IL-5Rα to bind the ligand. In a second project, a prokaryotic expression system for murine IL-5 (mIL-5) was established to allow the production of mIL-5 and an mIL-5 antagonist, which should facilitate functional studies in mice. Since the expression of mIL-5 in E. coli had not previously been successful, a fusion protein system was generated that expresses mIL-5 in high yield. Chemical cleavage with cyanogen bromide (CNBr) was used to release mIL-5 monomers, which were subsequently purified and refolded. This technique yielded an active murine IL-5 dimer, as confirmed by TF-1 cell proliferation assays. The protein was crystallized and the structure of mIL-5 was determined at 2.5 Å resolution. The molecular structure revealed a symmetrical left-handed four-helix bundle dimer similar to human IL-5. Analysis of the structure-function relationship allowed us to design specific mIL-5 antagonist molecules, which are still under examination. Taken together, these findings provide further insights into the IL-5 and IL-5R interaction, which may help to understand and depict this and other cytokine-receptor interactions of similar architecture, e.g. the IL-13 ligand-receptor system. Ultimately, this may represent another piece of the puzzle in attempts to rationally design and engineer novel IL-5-related pharmacological therapeutics.
/ Interleukin-5 (IL-5) is a member of the class I hematopoietic cytokines and plays a key role in the activation of eosinophilic granulocytes. IL-5 therefore has an important pathophysiological function in the development of diseases such as allergic asthma and other hypereosinophilias, all of which are characterized by a strongly increased number of eosinophils in peripheral blood and tissues. The IL-5 receptor is a heterodimer consisting of an alpha subunit (IL-5Rα) and a beta subunit (IL-5Rβ) shared with the IL-3 and GM-CSF receptors. IL-5Rα is necessary for specific ligand binding, whereas the complex associated with the IL-5Rβ subunit initiates intracellular signal transduction. Earlier studies had already solved the crystal structures of human IL-5 (hIL-5) and of the common IL-5Rβ chain (IL-5Rβc). However, no experimental structural data have so far been available for the interactions of the high-affinity IL-5 receptor IL-5Rα with its ligand IL-5. The main objective of this work was therefore to gain new insights into the molecular basis of the interaction between the IL-5 receptor and its agonist. Specifically, in this work I describe the recombinant expression, purification and preparation of the binary complex of hIL-5 bound to the extracellular domain of the receptor hIL-5Rα, as well as the subsequent crystallographic structure analysis of this binary ligand-receptor complex (hIL-5Rα/hIL-5). Both proteins were produced recombinantly in an Escherichia coli expression system, purified to homogeneity and then crystallized. Analyses of these first crystals did not show the desired X-ray diffraction, so the production and crystallization conditions were optimized step by step in all subsequent work.
As a result of this optimization strategy, crystals suitable for structure analysis were finally obtained. An unpaired cysteine in the N-terminal domain of the extracellular hIL-5Rα protein was replaced by alanine in order to reduce protein heterogeneity caused by cysteine oxidation products. Purification of the receptor protein by affinity chromatography was likewise decisive for achieving high crystal quality. The use of various additives in addition to the initial crystallization condition ultimately led to the formation of single crystals of the binary complex (hIL-5Rα/hIL-5) suitable for structure analysis. Their measurement yielded diffraction data with a maximum resolution of 2.5 Å. A first structure analysis clearly shows that the ligand-binding domain of the IL-5Rα receptor has a novel, previously unknown receptor architecture that bears no similarity to the class I cytokine receptor structures known so far. The structure of the complex also shows that the ligand-binding epitope of IL-5Rα is distributed over all three extracellular domains and has a topology in which, in addition to the cytokine recognition motif (CRM), the first Fn-III domain of hIL-5Rα is required to bind the ligand with high affinity. In a second project, a prokaryotic expression system for murine interleukin-5 (mIL-5) was developed to enable the production of mIL-5 and of an mIL-5 antagonist for functional studies in a mouse model. Since recombinant production of mIL-5 in E. coli had not previously been successful, a fusion protein system was developed that allows the production of large amounts of mIL-5 protein. Chemical cleavage with cyanogen bromide (CNBr) was carried out to release the monomer from the fusion protein. The mIL-5 monomer thus obtained was purified and renatured.
After refolding, the dimeric protein shows a biological activity in TF-1 cells comparable to literature values. The mIL-5 protein thus obtained was crystallized and analyzed by X-ray diffraction. In this way, diffraction data from mIL-5 crystals with a maximum resolution of 2.5 Å were obtained. Like human IL-5, the structure shows a symmetrical four-helix bundle. The structure-function analysis then made it possible to develop defined mIL-5 antagonists, which are currently still under investigation. In summary, the results presented here contribute to understanding the molecular basis of the specific binding between IL-5 and IL-5R, as well as the interactions of similar ligand-receptor types. Ultimately, it is hoped that this and similar work represents a further important piece of the puzzle in the attempt to develop new and innovative pharmacological therapeutic approaches that target IL-5, a key molecule in the pathophysiology of asthma.
24

A latent variable approach to impute missing values: with application in air pollution data.

January 1999
Wing-Yeong Lee. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 73-75). / Abstracts in English and Chinese. / Chapter 1: Introduction / 1.1 Introduction / 1.2 The observed data / 1.3 Outline of the thesis / Chapter 2: Modeling using Latent Variable / Chapter 3: Imputation Procedure / 3.1 Introduction / 3.2 Introduction to Metropolis-Hastings algorithm / 3.3 Introduction to Gibbs sampler / 3.4 Imputation step / 3.5 Initialization of the missing values by regression / 3.6 Initialization of the parameters and creating the latent variable and noises / 3.7 Simulation of Y's / 3.8 Simulation of the parameters / 3.9 Simulation of T by use of the Metropolis-Hastings algorithm / 3.10 Distribution of Vij's given all other values / 3.11 Simulation procedure of Vij's / Chapter 4: Data Analysis of the Pollutant Data / 4.1 Convergence of the process / 4.2 Data analysis / Chapter 5: Conclusion / References
25

Statistical analysis for transformation latent variable models with incomplete data. / CUHK electronic theses & dissertations collection

January 2013
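As an effective method for handling multivariate data, latent variable models have received wide attention in fields such as behavioral science, education, social psychology, and medicine. In the analysis of latent variable models, most existing statistical methods and software are based on the assumption that the response variables are normally distributed. Although some recently developed methods can handle partially non-normal data, problems remain when analyzing highly non-normal data. Moreover, incomplete data, such as missing data and censored data, are frequently encountered in practical research. Simply ignoring or mishandling incomplete data may seriously distort statistical results. In this thesis, we develop a Bayesian penalized spline approach, together with Markov chain Monte Carlo methods, to analyze transformation latent variable models with highly non-normal and incomplete data. We discuss different types of incomplete data in transformation latent variable models, such as data missing completely at random, data missing at random, nonignorable missing data, and censored data. We also use the deviance information criterion to select the correct model and missing-data mechanism. We demonstrate the proposed methods through many simulation studies. The method is applied to a study of job satisfaction, home life, and work attitude, and to a study of cardiovascular disease in patients with type 2 diabetes in Hong Kong. / Latent variable models (LVMs), as useful multivariate techniques, have attracted significant attention from various fields, including the behavioral, educational, social-psychological, and medical sciences. In the analysis of LVMs, most existing statistical methods and software have been developed under the assumption that response variables are normally distributed. While some recent developments can partially address the non-normality of data, they are still problematic in dealing with highly non-normal data. Moreover, the presence of incomplete data, such as missing data and censored data, is a practical issue in substantive research. Simply ignoring incomplete data or managing it wrongly might seriously distort statistical inference results. In this thesis, we develop a Bayesian P-spline approach, coupled with Markov chain Monte Carlo (MCMC) methods, to analyze transformation LVMs with highly non-normal and incomplete data. Different types of incomplete data, such as data missing completely at random, data missing at random, nonignorable missing data, and censored data, are discussed in the context of transformation LVMs. The deviance information criterion is proposed to conduct model comparison and select an appropriate missing mechanism. The empirical performance of the proposed methodologies is examined via many simulation studies. Applications to a study concerning people's job satisfaction, home life, and work attitude, as well as a study on cardiovascular diseases for type 2 diabetic patients in Hong Kong, are presented.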
/ Detailed summary in vernacular field only. / Liu, Pengfei. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 115-127). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. / Abstract / Acknowledgement / Chapter 1: Introduction / 1.1 Latent Variable Models / 1.2 Missing Data / 1.3 Censoring Data / 1.4 Penalized B-splines / 1.5 Bayesian Methods / 1.6 Outline of the Thesis / Chapter 2: Transformation Structural Equation Models / 2.1 Introduction / 2.2 Model Description / 2.3 Bayesian Estimation / 2.3.1 Bayesian P-splines / 2.3.2 Identifiability Constraints / 2.3.3 Prior Distributions / 2.3.4 Posterior Inference / 2.4 Bayesian Model Selection via DIC / 2.5 Simulation Studies / 2.5.1 Simulation 1 / 2.5.2 Simulation 2 / 2.5.3 Simulation 3 / 2.6 Conclusion / Chapter 3: Transformation SEMs with Missing Data that are Missing At Random / 3.1 Introduction / 3.2 Model Description / 3.3 Bayesian Estimation and Model Selection / 3.3.1 Modeling Transformation Functions / 3.3.2 Identifiability Constraints / 3.3.3 Prior Distributions / 3.3.4 Bayesian Estimation / 3.3.5 Model Selection via DIC / 3.4 Simulation Studies / 3.4.1 Simulation 1 / 3.4.2 Simulation 2 / 3.5 Conclusion / Chapter 4: Transformation SEMs with Nonignorable Missing Data / 4.1 Introduction / 4.2 Model Description / 4.3 Bayesian Inference / 4.3.1 Model Identification and Prior Distributions / 4.3.2 Posterior Inference / 4.4 Selection of Missing Mechanisms / 4.5 Simulation studies / 4.5.1 Simulation 1 / 4.5.2 Simulation 2 / 4.6 A Real Example / 4.7 Conclusion / Chapter 5: Transformation Latent Variable Models with Multivariate Censored Data / 5.1 Introduction / 5.2 Model Description / 5.3 Bayesian Inference / 5.3.1 Model Identification and Bayesian P-splines / 5.3.2 Prior Distributions / 5.3.3 Posterior Inference / 5.4 Simulation Studies / 5.4.1 Simulation 1 / 5.4.2 Simulation 2 / 5.5 A Real Example / 5.6 Conclusion / Chapter 6: Conclusion and Further Development / Bibliography
26

Learning non-Gaussian factor analysis with different structures: comparative investigations on model selection and applications. / 基於多種結構的非高斯因數分析的模型選擇學習演算法比較研究及其應用 / CUHK electronic theses & dissertations collection / Ji yu duo zhong jie gou de fei Gaosi yin shu fen xi de mo xing xuan ze xue xi yan suan fa bi jiao yan jiu ji qi ying yong

January 2012
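Mining the latent structure of high-dimensional data is an important problem in machine learning, pattern recognition, bioinformatics and related fields. This thesis studies, empirically and theoretically, non-Gaussian factor analysis (NFA) models with different latent structures. It focuses on the model selection problem of determining the number of latent factors, from the perspectives of both two-stage and automatic methods, and on practical applications in pattern recognition and bioinformatics. / With single-Gaussian factors, NFA degenerates to the conventional factor analysis (FA) model. We develop a tool for systematically comparing the performance of model selection methods, and use it to compare classical model selection criteria (such as AIC), recent statistical tests based on random matrix theory, and Bayesian Ying-Yang (BYY) harmony learning. We also provide a theoretical result, applicable to small samples, on the relative ordering of four classical criteria by their tendency to underestimate the number of factors. / Based on the conventional FA model, we also study how the parameterization affects the performance of model selection methods, an important issue that has been ignored or seldom studied because parameterizations with equivalent likelihood functions show no performance difference under traditional criteria such as AIC. However, results on extensive simulated and real data show that, of two commonly used likelihood-equivalent FA parameterizations, one is more favorable for model selection under both the Variational Bayes (VB) and BYY frameworks. Furthermore, the two parameterizations are extended, as the two ends, into a family of parameterizations with equivalent likelihood functions. The experimental results reveal more reliably how gradual changes in parameterization affect model selection. They also show that introducing prior distributions on the parameters can improve model selection accuracy, and corresponding new learning algorithms are given. A systematic comparison shows that, for both two-stage and automatic methods, BYY learning outperforms VB in model selection, and gains more under the favorable parameterization. / Binary factor analysis (BFA) is also an NFA model; it uses Bernoulli factors to explain the latent structure. First, we introduce a canonical dual approach to solve a computationally expensive Binary Quadratic Programming (BQP) problem encountered in the BFA learning algorithm. Although it cannot find the exact global optimum of the BQP, it improves both the computational speed of the overall learning algorithm and the accuracy of automatic model selection. This indicates that the solution of a locally nested optimization subproblem need not be very precise, and that some imprecision can even benefit the performance of the overall learning algorithm. Next, introducing prior distributions further improves model selection performance, and systematic experiments confirm that BYY learning is superior to VB. We then develop a binary matrix factorization algorithm for binary data. Its performance is guaranteed by theoretical results, and in practice it detects functional protein complexes from large-scale protein-protein interaction networks with better performance than other related algorithms. / Furthermore, we study NFA algorithms under a semi-blind framework and their applications in systems biology. The NFA model is used to model gene transcriptional regulation, with a sparsity constraint imposed on the connectivity matrix, yielding a method that effectively estimates the regulatory signals of transcription factors without requiring the topology of the transcription factor-gene regulatory network to be given in advance, as Network Component Analysis (NCA) does. In particular, with BFA the binary features of the regulatory signals can be captured directly. Such switch-like patterns play an important role in the regulatory mechanisms of many biological processes. / Finally, based on the semi-blind NFA learning algorithm, we propose a method for analyzing exome sequencing data that effectively identifies susceptibility genes associated with disease, offering a possible direction for addressing the failure of traditional genome-wide association study (GWAS) methods on low-frequency, high-noise exome sequencing data. Preliminary results on a large-scale exome sequencing dataset of 1457 samples show that our method both confirms many genes already considered disease-related and finds new susceptibility genes whose significance is confirmed by replication. Related expression profile data further show that the newly found genes are significantly up- or down-regulated between disease and control samples. / Mining the underlying structure from high dimensional observations is of critical importance in machine learning, pattern recognition and bioinformatics. In this thesis, we investigate, empirically or theoretically, non-Gaussian Factor Analysis (NFA) models with different underlying structures.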
We focus on the problem of determining the number of latent factors of NFA, from two-stage model selection to automatic model selection, with real applications in pattern recognition and bioinformatics. / We start with a degenerate case of NFA, the conventional Factor Analysis (FA) with latent Gaussian factors. Many model selection methods have been proposed and used for FA, and it is important to examine their relative strengths and weaknesses. We develop an empirical analysis tool to facilitate a systematic comparison of the model selection performance of classical criteria (e.g., Akaike’s information criterion, AIC for short), recently developed methods (e.g., Kritchman & Nadler’s hypothesis tests), and Bayesian Ying-Yang (BYY) harmony learning. We also prove a theoretical relative order of the underestimation tendency of four classical criteria. / Then, we investigate how parameterizations affect model selection performance, an issue that has been ignored or seldom studied because traditional model selection criteria, like AIC, perform equivalently on different parameterizations that have equivalent likelihood functions. Focusing on two typical parameterizations of FA, we find through extensive experiments on synthetic and real data that one is better than the other under both Variational Bayes (VB) and BYY. Moreover, a family of FA parameterizations with equivalent likelihood functions is presented, in which each member is indexed by an integer r, with the two known parameterizations at the two ends as r varies from zero to its upper bound. Investigations on this FA family not only confirm the significant difference between the two parameterizations in terms of model selection performance, but also provide insights into what makes a better parameterization.
With a Bayesian treatment of the new FA family, alternative VB algorithms for FA are derived, and BYY algorithms for FA are extended with prior distributions on the parameters. A systematic comparison shows that BYY generally outperforms VB under various scenarios, including varying simulation configurations, incrementally adding priors to parameters, and automatic model selection. / To describe binary latent features, we proceed to binary factor analysis (BFA), which considers Bernoulli factors. First, we introduce a canonical dual approach to tackle a difficult Binary Quadratic Programming (BQP) problem that is a computational bottleneck in BFA learning. Although it is not an exact BQP solver, it improves the learning speed and model selection accuracy, which indicates that some amount of error in solving the BQP, a problem nested in the hierarchy of the whole learning process, brings gains in both computational efficiency and model selection performance. The results also imply that optimization is important in learning, but learning is not just simple optimization. Second, we develop BFA algorithms under VB and BYY that incorporate Bayesian priors on the parameters to improve automatic model selection performance, and show in a systematic comparison that BYY is superior to VB. Third, for binary observations, we propose a Bayesian Binary Matrix Factorization (BMF) algorithm under the BYY framework. The performance of the BMF algorithm is guaranteed by theoretical proofs and verified by experiments. We apply it to discovering protein complexes from protein-protein interaction (PPI) networks, an important problem in bioinformatics, where it outperforms other related methods. / Furthermore, we investigate NFA under a semi-blind learning framework. In practice, there are many scenarios in which the system, the input, or both are partially known.
Here, we modify Network Component Analysis (NCA) to model gene transcriptional regulation in systems biology by NFA. The previous hard-cut NFA algorithm is extended here as sparse BYY-NFA by considering either or both of a priori connectivity and an a priori sparse constraint. Therefore, the a priori knowledge about the connection topology of the TF-gene regulatory network required by NCA is not necessary for our NFA algorithm. The sparse BYY-NFA can be further modified to obtain a sparse BYY-BFA algorithm, which directly models the switching patterns of latent transcription factor (TF) activities in gene regulation, e.g., whether or not a TF is activated. Mining switching patterns provides insights into the regulation mechanisms of many biological processes. / Finally, semi-blind NFA learning is applied to identify single nucleotide polymorphisms (SNPs) that are significantly associated with a disease or a complex trait from exome sequencing data. By encoding each exon/gene (which may contain multiple SNPs) as a vector, an NFA classifier, obtained in a supervised way on a training set, is used for prediction on a testing set. The genes are selected according to the p-values of Fisher’s exact test on the confusion tables collected from the prediction results. The genes selected on a real dataset from an exome sequencing project on psoriasis are consistent in part with published results, and some of them are probably novel susceptibility genes for the disease according to the validation results. / Detailed summary in vernacular field only. / Tu, Shikui. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 196-212). / Electronic reproduction.
Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. / Abstract / Acknowledgement / Chapter 1: Introduction / 1.1 Background / 1.1.1 Motivations / 1.1.2 Independent Factor Analysis (IFA) / 1.1.3 Learning Methods / 1.2 Related Work / 1.2.1 Learning Gaussian FA / 1.2.2 Learning NFA / 1.2.3 Learning Semi-blind NFA / 1.3 Main Contribution of the Thesis / 1.4 Thesis Organization / 1.5 Publication List / Chapter 2: FA comparative analysis / 2.1 Determining the factor number / 2.2 Model Selection Methods / 2.2.1 Two-Stage Procedure and Classical Model Selection Criteria / 2.2.2 Kritchman & Nadler's Hypothesis Test (KN) / 2.2.3 Minimax Rank Estimation (MM) / 2.2.4 Minka's Criterion (MK) for PCA / 2.2.5 Bayesian Ying-Yang (BYY) Harmony Learning / 2.3 Empirical Analysis / 2.3.1 A New Tool for Empirical Comparison / 2.3.2 Investigation On Model Selection Performance / 2.4 A Theoretic Underestimation Partial Order / 2.4.1 Events of Estimating the Hidden Dimensionality / 2.4.2 The Structural Property of the Criterion Function / 2.4.3 Experimental Justification / 2.5 Concluding Remarks / Chapter 3: FA parameterizations affect model selection / 3.1 Parameterization Issue in Model Selection / 3.2 FAr: ML-equivalent Parameterizations of FA / 3.3 Variational Bayes on FAr / 3.4 Bayesian Ying-Yang Harmony Learning on FAr / 3.5 Empirical Analysis / 3.5.1 Three levels of investigations / 3.5.2 FA-a vs FA-b: performances of BYY, VB, AIC, BIC, and DNLL / 3.5.3 FA-r: performances of VB versus BYY / 3.5.4 FA-a vs FA-b: automatic model selection performance of BYY and VB / 3.5.5 Classification Performance on Real World Data Sets / 3.6 Concluding remarks / Chapter 4: BFA learning versus optimization / 4.1 Binary Factor Analysis / 4.2 BYY Harmony Learning on BFA / 4.3 Empirical Analysis / 4.3.1 BIC and Variational Bayes (VB) on BFA / 4.3.2 Error in solving BQP affects model selection / 4.3.3 Priors over parameters affect model selection / 4.3.4 Comparisons among BYY, VB, and BIC / 4.3.5 Applications in recovering binary images / 4.4 Concluding Remarks / Chapter 5: BMF for PPI network analysis / 5.1 The problem of protein complex prediction / 5.2 A novel binary matrix factorization (BMF) algorithm / 5.3 Experimental Results / 5.3.1 Other methods in comparison / 5.3.2 Data sets / 5.3.3 Evaluation criteria / 5.3.4 On altered graphs by randomly adding and deleting edges / 5.3.5 On real PPI data sets / 5.3.6 On gene expression data for biclustering / 5.4 A Theoretical Analysis on BYY-BMF / 5.4.1 Main results / 5.4.2 Experimental justification / 5.4.3 Proofs / 5.5 Concluding Remarks / Chapter 6: Semi-blind NFA: algorithms and applications / 6.1 Determining transcription factor activity / 6.1.1 A brief review on NCA / 6.1.2 Sparse NFA / 6.1.3 Sparse BFA / 6.1.4 On Yeast cell-cycle data / 6.1.5 On E. coli carbon source transition data / 6.2 Concluding Remarks / Chapter 7: Applications on Exome Sequencing Data Analysis / 7.1 From GWAS to Exome Sequencing / 7.2 Encoding An Exon/Gene / 7.3 An NFA Classifier / 7.4 Results / 7.4.1 Simulation / 7.4.2 On a real exome sequencing data set: AHMUe / 7.5 Concluding Remarks / Chapter 8: Conclusion and Future Work / Appendix A: Derivations of the learning algorithms on FA-r / A.1 The VB learning algorithm on FA-r / A.2 The BYY learning algorithm on FA-r / Bibliography
27

Statistical Methods for Integrated Cancer Genomic Data Using a Joint Latent Variable Model

Drill, Esther January 2018
Inspired by The Cancer Genome Atlas (TCGA), we explore multimodal genomic datasets with integrative methods using a joint latent variable approach. We use iCluster+, an existing clustering method for integrative data, to identify potential subtypes within TCGA sarcoma and mesothelioma tumors, and across a large cohort of 33 different TCGA cancer datasets. For classification, motivated by the need to improve the prediction of platinum resistance in high grade serous ovarian cancer (HGSOC) treatment, we propose a novel integrative method, iClassify, which performs classification using a joint latent variable model. iClassify provides effective data integration and classification while handling heterogeneous data types, and offers a natural framework to incorporate covariate risk factors and examine interactions between genomic drivers and covariate risk factors. Feature selection is performed through a thresholding parameter that combines both latent variable and feature coefficients. We demonstrate increased classification accuracy over methods that assume a homogeneous data type, such as linear discriminant analysis and penalized logistic regression, as well as improved feature selection. We apply iClassify to a TCGA cohort of HGSOC patients with three types of genomic data and platinum response data. This methodology has broad applications beyond predicting treatment outcomes and disease progression in cancer, including predicting prognosis and diagnosis in other diseases with major public health implications.
28

Three Contributions to Latent Variable Modeling

Liu, Xiang January 2019 (has links)
The dissertation includes three papers that address theoretical and technical issues of latent variable models. The first paper extends the uniformly most powerful test approach for testing person parameters in IRT to two-parameter logistic models. In addition, an efficient branch-and-bound algorithm for computing the exact p-value is proposed. The second paper proposes a reparameterization of the log-linear CDM. A Gibbs sampler is developed for posterior computation. The third paper proposes an ordered latent class model with infinite classes using a stochastic process prior. A nonparametric IRT application is also discussed.
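As a rough illustration of the first paper's setting (not the dissertation's algorithm), the sketch below computes an exact one-sided p-value for a person parameter under a two-parameter logistic model by brute-force enumeration of all response patterns. It assumes known item discriminations `a` and difficulties `b`, and uses the weighted score sum of `a[j] * x[j]` as the test statistic. The function names and the enumeration strategy are hypothetical and stand in for the branch-and-bound method, which the abstract does not specify.

```python
from itertools import product
from math import exp


def p_correct(theta, a, b):
    """2PL probability of a correct response for ability theta."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def exact_p_value(responses, a, b, theta0):
    """P(T >= t_obs | theta0), with T the discrimination-weighted score.

    Brute force over all 2**n response patterns, so only practical
    for short tests; assumed here purely for illustration.
    """
    t_obs = sum(ai * x for ai, x in zip(a, responses))
    probs = [p_correct(theta0, ai, bi) for ai, bi in zip(a, b)]
    p_value = 0.0
    for pattern in product((0, 1), repeat=len(a)):
        t = sum(ai * x for ai, x in zip(a, pattern))
        if t >= t_obs - 1e-12:  # tolerance for floating-point ties
            pr = 1.0
            for x, p in zip(pattern, probs):
                pr *= p if x else 1.0 - p
            p_value += pr
    return p_value
```

The cost grows as 2^n in the number of items, which is presumably why an efficient search such as branch-and-bound is attractive for realistic test lengths.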
29

In silico analysis of C-type lectin domains’ structure and properties

Zelensky, Alex N., Alex.Zelensky@anu.edu.au January 2005 (has links)
Members of the C-type lectin domain (CTLD) superfamily are metazoan proteins functionally important in glycoprotein metabolism, mechanisms of multicellular integration and immunity. This thesis presents the results of several computational and experimental studies of the CTLD structure, function and evolution.
Core structural properties of the CTLD fold were explored in a comparative analysis of the 37 distinct CTLD structures available publicly, which demonstrate significant structural conservation despite low or undetectable sequence similarity. Pairwise structural alignments of all CTLD structures were created with three different methods (DALI, CE and LOCK) and analysed manually and using a computational algorithm developed for this purpose. The analysis revealed a set of conserved positions and interactions, which were classified based on their role in CTLD structure maintenance.
The CTLD family is large and diverse. To organize and annotate the several thousand known CTLD-containing protein sequences and integrate the information on their evolution, structure and function, a local database and a web-based interface to it were developed. The software is written in Perl, is based on bioperl, bioperl-db and Apache::ASP modules, and can be used for collaborative annotation of any collection of phylogenetically related sequences.
Several studies of CTLD genomics were performed. In one such study, carried out in collaboration with the RIKEN structural genomics centre, CTLD sequences from the Caenorhabditis elegans genome were identified and clustered into groups based on similarity. The most representative members of the groups were then selected, which if characterized structurally would tell most about the C. elegans CTLDs and provide templates for homology modelling of all C. elegans CTLD structures.
In the other whole-genome study, the CTLD family in the puffer fish Fugu rubripes was analysed using the draft genome sequence. This work extended and complemented three genome-level surveys on human, C. elegans and D. melanogaster reported previously. The study showed that the CTLD repertoire of Fugu rubripes is very similar to that of mammals, although several interesting differences exist, and that Fugu CTLD-encoding genes are selectively duplicated in a manner suggesting an ancient large-scale duplication event. Another important finding was the identification of several new CTLDcps, which had mammalian orthologues not recognized previously.
CBCP, a novel CTLD-containing protein highly conserved between fish and mammals with previously unknown domain architecture, was predicted in the Fugu study based solely on ab initio gene models from the Fugu locus and cross-species genomic DNA alignments. To test whether the prediction was correct, a full-length cDNA of the mouse CBCP was cloned, its tissue distribution characterized and untranslated regions determined by RACE. The full-length mCBCP transcript is 10 kb long, encodes a protein of 2172 amino acids and confirms the original prediction. The presence of a large N-terminal NG2 domain makes CBCP a member of a small but very interesting family of Metazoan proteins.
30

Exploratory market structure analysis. Topology-sensitive methodology.

Mazanec, Josef January 1999 (has links) (PDF)
Given the recent abundance of brand choice data from scanner panels, market researchers have neglected the measurement and analysis of perceptions. Heterogeneity of perceptions is still a largely unexplored issue in market structure and segmentation studies. Over the last decade various parametric approaches toward modelling segmented perception-preference structures, such as combined MDS and Latent Class procedures, have been introduced. These methods, however, are not tailored for qualitative data describing consumers' redundant and fuzzy perceptions of brand images. A completely different method is based on topology-sensitive vector quantization (VQ) for consumers-by-brands-by-attributes data. It maps the segment-specific perceptual structures into bubble-pie-bar charts with multiple brand positions demonstrating perceptual distinctiveness or similarity. Though the analysis proceeds without any distributional assumptions, it allows for significance testing. The application of exploratory and inferential data processing steps to the same data base is statistically sound and particularly attractive for market structure analysts. A brief outline of the VQ method is followed by a sample study with travel market data which proved to be particularly troublesome for conventional processing tools. (author's abstract) / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
