  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
331

Studies on variations on the minority game.

January 2002
Lim Sze Wah. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (leaves 71-73). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 2 --- A brief review on the basic minority game --- p.6 / Chapter 2.1 --- Model --- p.7 / Chapter 2.2 --- Results --- p.9 / Chapter 2.3 --- Discussion --- p.11 / Chapter 2.3.1 --- Bit-string Statistics and Market Efficiency --- p.11 / Chapter 2.3.2 --- Crowds and Anticrowds Effect --- p.14 / Chapter 2.3.3 --- Hamming Distance and Reduced Strategies Space --- p.15 / Chapter 3 --- A brief review on existing variations on the minority game --- p.17 / Chapter 3.1 --- Darwinism process and MG --- p.17 / Chapter 3.2 --- Evolutionary MG (EMG) --- p.17 / Chapter 3.3 --- Modified EMG (MEMG) --- p.18 / Chapter 3.4 --- MG with arbitrary cutoff --- p.18 / Chapter 3.5 --- Thermal MG (TMG) --- p.19 / Chapter 3.6 --- Three-Sided MG --- p.19 / Chapter 3.7 --- MG with variable payoffs --- p.19 / Chapter 4 --- Minority game with varying number of participants --- p.21 / Chapter 4.1 --- The modified MG --- p.22 / Chapter 4.1.1 --- Model --- p.22 / Chapter 4.1.2 --- Results --- p.23 / Chapter 4.2 --- Mixed-population --- p.33 / Chapter 4.2.1 --- Model --- p.33 / Chapter 4.2.2 --- Results --- p.33 / Chapter 4.2.3 --- Discussions --- p.37 / Chapter 5 --- Minority game considering recent performance of strategies --- p.39 / Chapter 5.1 --- The modified MG --- p.40 / Chapter 5.1.1 --- Model --- p.40 / Chapter 5.1.2 --- Results --- p.40 / Chapter 5.2 --- Mixed-population --- p.46 / Chapter 5.2.1 --- Model --- p.46 / Chapter 5.2.2 --- Results --- p.47 / Chapter 5.2.3 --- Discussions --- p.50 / Chapter 6 --- Minority game combining both modifications --- p.52 / Chapter 6.1 --- Model --- p.52 / Chapter 6.2 --- Results --- p.53 / Chapter 7 --- Conclusion --- p.68 / Bibliography --- p.71
332

Transcriptome analysis and applications based on next-generation RNA sequencing data. / CUHK electronic theses & dissertations collection

January 2012
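Next-generation cDNA sequencing, also known as "RNA-Seq", provides a new means of studying the transcriptome. As a revolutionary technique, RNA-Seq not only helps measure transcript expression levels accurately but can also discover new transcripts and reveal mechanisms of transcriptional regulation. At the same time, integrating sequencing data from several different levels, such as genome sequencing and methylome sequencing, provides a powerful tool for mining biological significance in depth. / My doctoral research focuses on the analysis of next-generation sequencing (NGS) data, especially RNA-Seq data. It consists of three main parts: development of analysis tools, data analysis, and mechanistic study. / Analyzing large volumes of sequencing data is a major challenge for next-generation sequencing. At present, in contrast to splice-aware aligners, general-purpose aligners can map tens of millions of short reads to the genome at ultrafast speed, but they have difficulty handling reads that span splice junctions (spliced reads) or reads that match multiple genomic locations (multireads). We developed a new alignment tool, ABMapper, that uses a two-seed strategy. Benchmark results show that ABMapper achieves higher accuracy and recall than the comparable tools TopHat and SpliceMap. On the other hand, spliced reads and multireads match multiple locations in the genome, and selecting the most probable location becomes a significant problem. When gene expression values are calculated, multireads and spliced reads are often assigned at random to one of their candidate locations or excluded outright. This treatment introduces bias and directly affects the accuracy of downstream analysis. To address location selection for multireads and spliced reads, we propose a maximum likelihood estimation method that uses an empirical geometric-tail (GT) distribution of intron length. This probabilistic model applies whether the splice junction lies within a read or between the two reads of a paired-end (PE) pair. Based on this model, we can better determine the most probable location of paired-end reads that have multiple matches in the genome. / The accumulation of sequencing data provides a rich resource for in-depth study of biological significance. Using RNA-Seq data and methylation sequencing data, we built a model that predicts gene expression levels from DNA methylation patterns. With this model, we found that DNA methylation predicts gene expression levels fairly accurately, with accuracy reaching 78%. We also found that DNA methylation in the gene body is more important than methylation near the promoter. Finally, from a combined dataset integrating all methylation patterns and CpG patterns, we used feature selection to choose an optimal subset. From this optimal subset we built a feature overlap network, which further reveals the cooperative regulation of gene expression by DNA methylation patterns. / Besides developing tools for RNA-Seq data analysis and data mining, we also analyzed the zebrafish transcriptome. RNA-Seq data analysis, combined with biological experiments such as fluorescence imaging and quantitative PCR, revealed the pathways and differentially expressed genes involved after calycosin treatment, and the results also demonstrated the angiogenic activity of calycosin in vivo. / In summary, this thesis describes in detail my work on next-generation sequencing data analysis, data-mining-based discovery of biological significance, and transcriptome analysis. / The recent development of next-generation RNA sequencing, termed 'RNA-Seq', has offered an opportunity to explore the RNA transcripts of the whole transcriptome. As a revolutionary method, RNA-Seq can not only measure transcript abundances precisely but also discover novel transcribed content and uncover unknown regulatory mechanisms. 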
Meanwhile, combining sequencing data from different levels, such as genome sequencing and methylome sequencing, provides a powerful tool for novel biological discovery. / My PhD study focuses on the analysis of next-generation sequencing data, especially RNA-Seq data. It comprises three parts: development of analysis tools, data analysis, and mechanistic study. / Analyzing the massive volume of data produced by next-generation sequencing (NGS) is a great challenge. Many existing general aligners (in contrast to splice-aware alignment tools) can map millions of sequencing reads onto a reference genome. However, they are designed neither for reads that span splice junctions (spliced reads) nor for reads that match multiple locations along the reference genome (multireads). Hence, we have developed an ab initio mapping method, ABMapper, that uses a two-seed strategy. Benchmark results show that ABMapper achieves higher accuracy and recall than the comparable tools TopHat and SpliceMap. On the other hand, selecting the most probable location for spliced reads and multireads becomes a significant problem. When expression levels are calculated, these reads are often randomly assigned to one of the possible locations or discarded completely, which biases downstream analyses such as differential expression analysis and alternative splicing analysis. To determine the location of spliced reads and multireads rationally, we have proposed a maximum likelihood estimation method based on a geometric-tail (GT) distribution of intron length. This probabilistic model handles splice junctions that lie between the two reads of a paired-end (PE) pair or within one or both of those reads. Based on this model, multiple alignments of reads within a PE pair can be properly resolved. / The accumulation of NGS data has provided rich resources for deeper discovery of biological significance. 
We have integrated RNA-Seq data and methylation sequencing data to build a predictive model of gene expression based on DNA methylation patterns. We found that DNA methylation predicts gene expression fairly accurately, with accuracy reaching up to 78%. We also found that the gene body is the most important region for DNA methylation in these models, even more informative than the promoter. Finally, a feature overlap network built from an optimal subset of the combined methylation patterns and CpG patterns indicates collaborative regulation of gene expression by DNA methylation patterns. / In addition to developing new algorithms to facilitate RNA-Seq data analysis, we performed transcriptome analysis on zebrafish. The analysis of differentially expressed genes and the pathways involved after calycosin treatment, combined with other experimental evidence such as fluorescence microscopy and quantitative real-time polymerase chain reaction (qPCR), demonstrates the proangiogenic effects of calycosin in vivo. / In summary, this thesis details my work on NGS data analysis, discovery of biological significance using data-mining algorithms, and transcriptome analysis. / Detailed summary in vernacular field only. / Lou, Shaoke. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 135-146). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. 
/ Abstract (in Chinese) --- p.iii / Acknowledgement --- p.v / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Bioinformatics --- p.1 / Chapter 1.2 --- Bioinformatics application --- p.1 / Chapter 1.3 --- Motivation --- p.2 / Chapter 1.4 --- Objectives --- p.3 / Chapter 1.5 --- Thesis outline --- p.3 / Chapter 2 --- Background --- p.4 / Chapter 2.1 --- Biological and biotechnology background --- p.4 / Chapter 2.1.1 --- Central dogma and biology ABC --- p.4 / Chapter 2.1.2 --- Transcription --- p.5 / Chapter 2.1.3 --- Splicing and Alternative Splicing --- p.6 / Chapter 2.1.4 --- Next-generation Sequencing --- p.10 / Chapter 2.1.5 --- RNA-Seq --- p.18 / Chapter 2.2 --- Computational background --- p.20 / Chapter 2.2.1 --- Approximate string matching and read mapping --- p.21 / Chapter 2.2.2 --- Read mapping algorithms and tools --- p.22 / Chapter 2.2.3 --- Spliced alignment tools --- p.27 / Chapter 3 --- ABMapper: a two-seed based spliced alignment tool --- p.29 / Chapter 3.1 --- Introduction --- p.29 / Chapter 3.2 --- State-of-the-art --- p.30 / Chapter 3.3 --- Problem formulation --- p.31 / Chapter 3.4 --- Methods --- p.33 / Chapter 3.5 --- Results --- p.35 / Chapter 3.5.1 --- Benchmark test --- p.35 / Chapter 3.5.2 --- Complexity analysis --- p.39 / Chapter 3.5.3 --- Comparison with other tools --- p.39 / Chapter 3.6 --- Discussion and conclusion --- p.41 / Chapter 4 --- Geometric-tail (GT) model for rational selection of RNA-Seq read location --- p.42 / Chapter 4.1 --- Introduction --- p.42 / Chapter 4.2 --- State-of-the-art --- p.44 / Chapter 4.3 --- Problem formulation --- p.44 / Chapter 4.4 --- Algorithms --- p.45 / Chapter 4.5 --- Results --- p.49 / Chapter 4.5.1 --- Workflow of GT MLE method --- p.49 / Chapter 4.5.2 --- GT distribution and insert-size distribution --- p.50 / Chapter 4.5.3 --- Multiread analysis --- p.51 / Chapter 4.5.4 --- Splice-site comparison --- p.52 / Chapter 4.6 --- Discussion and conclusion --- p.55 / Chapter 5 --- Explore relationship between methylation patterns and gene expression --- p.56 / Chapter 5.1 --- Introduction --- p.56 / Chapter 5.2 --- State-of-the-art --- p.58 / Chapter 5.3 --- Problem formulation --- p.62 / Chapter 5.4 --- Methods --- p.62 / Chapter 5.4.1 --- NGS sequencing and analysis --- p.62 / Chapter 5.4.2 --- Data preparation and transformation --- p.64 / Chapter 5.4.3 --- Random forest (RF) classification and regression --- p.65 / Chapter 5.5 --- Results --- p.68 / Chapter 5.5.1 --- Genome wide profiling of methylation --- p.68 / Chapter 5.5.2 --- Aggregation plot of methylation levels at different regions --- p.72 / Chapter 5.5.3 --- Scatterplot between methylation and gene expression --- p.75 / Chapter 5.5.4 --- Predictive model of gene expression using DNA methylation features --- p.76 / Chapter 5.5.5 --- Comb-model based on the full dataset --- p.87 / Chapter 5.6 --- Discussion and conclusion --- p.98 / Chapter 6 --- RNA-Seq data analysis and applications --- p.99 / Chapter 6.1 --- Transcriptional Profiling of Angiogenesis Activities of Calycosin in Zebrafish --- p.99 / Chapter 6.1.1 --- Introduction --- p.99 / Chapter 6.1.2 --- Background --- p.100 / Chapter 6.1.3 --- Materials and methods and ethics statement --- p.101 / Chapter 6.1.4 --- Results --- p.104 / Chapter 6.1.5 --- Conclusion --- p.108 / Chapter 6.2 --- An integrated web medicinal materials DNA database: MMDBD (Medicinal Materials DNA Barcode Database) --- p.110 / Chapter 6.2.1 --- Introduction --- p.110 / Chapter 6.2.2 --- Background --- p.110 / Chapter 6.2.3 --- Construction and content --- p.113 / Chapter 6.2.4 --- Utility and discussion --- p.116 / Chapter 6.2.5 --- Conclusion and future development --- p.119 / Chapter 7 --- Conclusion --- p.121 / Chapter 7.1 --- Conclusion --- p.121 / Chapter 7.2 --- Future work --- p.123 / Appendix --- p.124 / Chapter A1. --- Descriptive analysis of trio data --- p.124 / Chapter A2. --- Whole genome methylation level profiling --- p.125 / Chapter A3. --- Global sliding window correlation between individuals --- p.128 / Chapter A4. --- Features selected after second-run filtering --- p.133 / Bibliography --- p.135 / Chapter A. --- Publications --- p.135 / Reference --- p.135
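
The abstract of this record describes choosing the most probable location of a multiread by maximum likelihood over intron lengths. The idea can be sketched roughly as below. This is an illustration only: the plain geometric distribution, the parameter `p`, and all function names are assumptions standing in for the thesis's geometric-tail (GT) model, whose exact form is not given in this record.

```python
import math


def fit_p(observed_lengths):
    """MLE of the geometric parameter p on support {1, 2, ...}: 1 / mean length."""
    return len(observed_lengths) / sum(observed_lengths)


def geometric_log_pmf(length, p):
    """Log-probability of an intron of `length` bases under a geometric
    distribution (an assumed stand-in for the GT model)."""
    if length < 1:
        raise ValueError("intron length must be >= 1")
    return (length - 1) * math.log(1.0 - p) + math.log(p)


def most_likely_location(candidates, p):
    """Pick the candidate whose implied intron length is most probable.

    candidates: list of (location_label, implied_intron_length) pairs.
    """
    return max(candidates, key=lambda c: geometric_log_pmf(c[1], p))[0]


# Hypothetical training lengths and hypothetical candidate alignments.
p_hat = fit_p([100, 300])
best = most_likely_location([("chr1:1000", 5000), ("chr2:200", 800)], p_hat)
```

Under a pure geometric model the shortest implied intron always wins; a geometric-tail model would presumably treat short introns empirically and apply the geometric form only to the tail, so the ranking there could differ.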
333

Bayesian approach for two model-selection-related bioinformatics problems. / CUHK electronic theses & dissertations collection

January 2013
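Within the framework of Bayesian inference, the Bayesian approach can infer the parameters and structure of complex probabilistic models from data. It is widely applied in many fields and is likewise an ideal approach for bioinformatics problems. This thesis introduces new Bayesian models and computational methods to discuss and solve two bioinformatics problems related to model selection. / The first problem concerns pattern recognition in DNA sequences. Tandem repeat segments occur frequently in DNA sequences and are very important for the study of genome evolution and human disease. In this part, the thesis mainly considers the case in which an unknown number of tandem repeats of the same pattern are dispersed along a single sequence. We first build a probabilistic model for the tandem repeat segments. We then use Markov chain Monte Carlo algorithms to explore the posterior distribution and thereby infer the pattern matrix and the locations of the repeat segments. In addition, the RJMCMC algorithm is used to solve the model selection problem caused by the unknown number of repeat segments. / The other problem is the analysis of conformational transitions of biomolecules. The conformations of a biomolecule can be grouped into several distinct metastable states. Because of the inherent link between a biomolecule's function and its conformation, conformational transitions play a very important role in the biological processes of different biomolecules. Data on conformational transitions are generally obtained from molecular dynamics simulations. Based on the microstate-level transition information obtained from such simulations, we use a Bayesian approach to study the problem of aggregating microstates into a variable number of metastable states. / Through these two problems, the thesis shows that the Bayesian approach has advantages in several aspects of bioinformatics research, including accounting for the variability of biological problems, handling noisy and missing data, and solving model selection problems. / The Bayesian approach is a powerful framework for inferring the parameters and structures of complicated probabilistic models from data. It is widely applied in many areas and is also well suited to bioinformatics problems because of their typically high complexity. In this thesis, new Bayesian models and computing methods are introduced to solve two bioinformatics problems, both related to model selection. / The first problem concerns repeat pattern recognition. Tandem repeats occur frequently in DNA sequences and are important for studying genome evolution and human disease. This thesis focuses on the case in which an unknown number of tandem repeat segments of the same pattern are dispersed along a sequence. A probabilistic generative model is introduced for the tandem repeats. Markov chain Monte Carlo algorithms are used to explore the posterior distribution in order to infer both the specific pattern of the tandem repeats and the locations of the repeat segments. Furthermore, reversible jump Markov chain Monte Carlo algorithms are used to address the transdimensional model selection problem raised by the variable number of repeat segments. / The second part of this thesis deals with the conformational transitions of biomolecules. 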
Because the function of a biomolecule is inherently related to its variable conformations, which can be grouped into a set of metastable or long-lived states, conformational transitions are important in biological processes. Changes in 3D structure are generally obtained from molecular dynamics computer simulations. Based on the microstate-level conformational transitions from molecular dynamics simulation, a Bayesian approach is developed to cluster the microstates into an uncertain number of metastable states, which gives rise to a model selection problem. / With these two problems, this thesis shows that the Bayesian approach to bioinformatics problems has advantages in accounting for the inherent uncertainty in biological data, handling noisy or missing data, and dealing with the model selection problem. / Detailed summary in vernacular field only. / Liang, Tong. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 120-130). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese. 
/ Abstract --- p.i / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation --- p.1 / Chapter 1.2 --- Statistical Background --- p.2 / Chapter 1.3 --- Tandem Repeats --- p.4 / Chapter 1.4 --- Conformational Space --- p.5 / Chapter 1.5 --- Outlines --- p.7 / Chapter 2 --- Preliminaries --- p.9 / Chapter 2.1 --- Bayesian Inference --- p.9 / Chapter 2.2 --- Markov chain Monte Carlo --- p.10 / Chapter 2.2.1 --- Gibbs sampling --- p.11 / Chapter 2.2.2 --- Metropolis - Hastings algorithm --- p.12 / Chapter 2.2.3 --- Reversible Jump MCMC --- p.12 / Chapter 3 --- Detection of Dispersed Short Tandem Repeats Using Reversible Jump MCMC --- p.14 / Chapter 3.1 --- Background --- p.14 / Chapter 3.2 --- Generative Model --- p.17 / Chapter 3.3 --- Statistical inference --- p.18 / Chapter 3.3.1 --- Likelihood --- p.19 / Chapter 3.3.2 --- Prior Distributions --- p.19 / Chapter 3.3.3 --- Sampling from Posterior Distribution via RJMCMC --- p.20 / Chapter 3.3.4 --- Extra MCMC moves for better mixing --- p.26 / Chapter 3.3.5 --- The complete algorithm --- p.29 / Chapter 3.4 --- Experiments --- p.29 / Chapter 3.4.1 --- Evaluation and comparison of the two RJMCMC versions using synthetic data --- p.30 / Chapter 3.4.2 --- Comparison with existing methods using synthetic data --- p.33 / Chapter 3.4.3 --- Sensitivity to Priors --- p.43 / Chapter 3.4.4 --- Real data experiment --- p.45 / Chapter 3.5 --- Discussion --- p.50 / Chapter 4 --- A Probabilistic Clustering Algorithm for Conformational Changes of Biomolecules --- p.53 / Chapter 4.1 --- Introduction --- p.53 / Chapter 4.1.1 --- Molecular dynamic simulation --- p.54 / Chapter 4.1.2 --- Hierarchical Conformational Space --- p.55 / Chapter 4.1.3 --- Clustering Algorithms --- p.56 / Chapter 4.2 --- Generative Model --- p.58 / Chapter 4.2.1 --- Model 1: Vanilla Model --- p.59 / Chapter 4.2.2 --- Model 2: Zero-Inflated Model --- p.60 / Chapter 4.2.3 --- Model 3: Constrained Model --- p.61 / Chapter 4.2.4 
--- Model 4: Constrained and Zero-Inflated Model --- p.61 / Chapter 4.3 --- Statistical Inference for Vanilla Model --- p.62 / Chapter 4.3.1 --- Priors --- p.62 / Chapter 4.3.2 --- Posterior distribution --- p.63 / Chapter 4.3.3 --- Collapsed Gibbs for Vanilla Model with a Fixed Number of Clusters --- p.63 / Chapter 4.3.4 --- Inference on the Number of Clusters --- p.65 / Chapter 4.3.5 --- Synthetic Data Study --- p.68 / Chapter 4.4 --- Statistical Inference for Zero-Inflated Model --- p.76 / Chapter 4.4.1 --- Method 1 --- p.78 / Chapter 4.4.2 --- Method 2 --- p.81 / Chapter 4.4.3 --- Synthetic Data Study --- p.84 / Chapter 4.5 --- Statistical Inference for Constrained Model --- p.85 / Chapter 4.5.1 --- Priors --- p.85 / Chapter 4.5.2 --- Posterior Distribution --- p.86 / Chapter 4.5.3 --- Collapsed Posterior Distribution --- p.86 / Chapter 4.5.4 --- Updating for Cluster Labels K --- p.89 / Chapter 4.5.5 --- Updating for Constrained Λ from Truncated Distribution --- p.89 / Chapter 4.5.6 --- Updating the Number of Clusters --- p.91 / Chapter 4.5.7 --- Uniform Background Parameters on Λ --- p.92 / Chapter 4.6 --- Real Data Experiments --- p.93 / Chapter 4.7 --- Discussion --- p.104 / Chapter 5 --- Conclusion and Future Work --- p.107 / Chapter A --- Appendix --- p.109 / Chapter A.1 --- Post-processing for indel treatment --- p.109 / Chapter A.2 --- Consistency Score --- p.111 / Chapter A.3 --- A Proof for Collapsed Posterior distribution in Constrained Model in Chapter 4 --- p.111 / Chapter A.4 --- Estimated Transition Matrices for Alanine Dipeptide by Chodera et al. (2006) --- p.117 / Bibliography --- p.120
334

Simulation for tests on the validity of the assumption that the underlying distribution of life is exponential

Thoppil, Anjo January 2010
Typescript (photocopy). / Digitized by Kansas Correctional Industries
335

A comparison of normal theory and bootstrap confidence intervals on the parameters of nonlinear models

Elling, Mary Margaret January 2011
Typescript (photocopy). / Digitized by Kansas Correctional Industries
336

Selected Legal Applications for Bayesian Methods

Cheng, Edward K. January 2018
This dissertation offers three contexts in which Bayesian methods can address tricky problems in the legal system. Chapter 1 offers a method for attacking case publication bias, the possibility that certain legal outcomes may be more likely to be published or observed than others. It builds on ideas from multiple systems estimation (MSE), a technique traditionally used for estimating hidden populations, to detect and correct case publication bias. Chapter 2 proposes new methods for dividing attorneys' fees in complex litigation involving multiple firms. It investigates optimization and statistical approaches that use peer reports of each firm's relative contribution to estimate a "fair" or consensus division of the fees. The methods proposed have lower informational requirements than previous work and appear to be robust to collusive behavior by the firms. Chapter 3 introduces a statistical method for classifying legal cases by doctrinal area or subject matter. It proposes using a latent space approach based on case citations as an alternative to the traditional manual coding of cases, reducing subjectivity, arbitrariness, and confirmation bias in the classification process.
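
Multiple systems estimation, which Chapter 1 of this dissertation builds on, can be illustrated in its simplest two-list form. The sketch below uses the classical Lincoln-Petersen and Chapman estimators, not the dissertation's Bayesian method, which this record does not describe. It assumes the two lists sample cases independently, and the list sizes are hypothetical.

```python
def lincoln_petersen(n1, n2, overlap):
    """Estimate total population size from two lists.

    n1, n2: number of cases on each list; overlap: cases on both lists.
    Assumes the two lists capture cases independently.
    """
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    return n1 * n2 / overlap


def chapman(n1, n2, overlap):
    """Chapman's bias-corrected variant; defined even when overlap is 0."""
    return (n1 + 1) * (n2 + 1) / (overlap + 1) - 1


# Hypothetical example: 200 cases in one source, 150 in another, 50 in both.
total = lincoln_petersen(200, 150, 50)
observed = 200 + 150 - 50
hidden = total - observed  # cases estimated to appear in neither source
```

In the case-publication setting, `hidden` would correspond to outcomes that neither source records, which is the quantity a publication-bias correction needs.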
337

Bayesian variable selection for high dimensional data analysis. / CUHK electronic theses & dissertations collection

January 2010
In the practice of statistical modeling, it is often desirable to have an accurate predictive model. Modern data sets usually have a large number of predictors. For example, DNA microarray gene expression data typically have few observations and a large number of variables, so parsimony is an especially important issue. Best-subset selection is a conventional method of variable selection. Because of the large number of variables relative to the sample size and the severe collinearity among the variables, standard statistical methods for selecting relevant variables often face difficulties. / In the third part of the thesis, we propose a Bayesian stochastic search variable selection approach for multi-class classification, which can identify relevant genes by assessing sets of genes jointly. We consider a multinomial probit model with a generalized g-prior for the regression coefficients. An efficient algorithm using simulation-based MCMC methods is developed for simulating parameters from the posterior distribution. This algorithm is robust to the choice of initial value and produces posterior probabilities of relevant genes for biological interpretation. We demonstrate the performance of the approach with two well-known gene expression profiling data sets: leukemia data and lymphoma data. Compared with other classification approaches, our approach selects fewer relevant genes and obtains competitive classification accuracy. / The last part of the thesis concerns further research and presents a stochastic variable selection approach with different two-level hierarchical prior distributions. These priors can be used as a sparsity-enforcing mechanism to perform gene selection for classification. Using simulation-based MCMC methods for simulating parameters from the posterior distribution, an efficient algorithm can be developed and implemented. 
/ The second part of the thesis proposes a Bayesian stochastic variable selection approach for gene selection based on a probit regression model with a generalized singular g-prior distribution for the regression coefficients. Using simulation-based MCMC methods for simulating parameters from the posterior distribution, an efficient and dependable algorithm is implemented. It is also shown that this algorithm is robust to the choice of initial values and produces posterior probabilities of related genes for biological interpretation. The performance of the proposed approach is compared with other popular methods in gene selection and classification using the well-known colon cancer and leukemia data sets from the microarray literature. / Yang, Aijun. / Adviser: Xin-Yuan Song. / Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 89-98). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
338

Analysis of structural equation models by Bayesian computation methods.

January 1996
by Jian-Qing Shi. / Thesis (Ph.D.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 118-123). / Chapter 1 --- Introduction and overview --- p.1 / Chapter 2 --- General methodology --- p.8 / Chapter 3 --- A Bayesian approach to confirmatory factor analysis --- p.16 / Chapter 3.1 --- Confirmatory factor analysis model and its prior --- p.16 / Chapter 3.2 --- The algorithm of data augmentation --- p.19 / Chapter 3.2.1 --- Data augmentation and one-run method --- p.19 / Chapter 3.2.2 --- Rao-Blackwellized estimation --- p.22 / Chapter 3.3 --- Asymptotic properties --- p.28 / Chapter 3.3.1 --- Asymptotic normality and posterior covariance matrix --- p.28 / Chapter 3.3.2 --- Goodness-of-fit statistic --- p.31 / Chapter 4 --- Bayesian inference for structural equation models --- p.34 / Chapter 4.1 --- LISREL Model and prior information --- p.34 / Chapter 4.2 --- Algorithm and conditional distributions --- p.38 / Chapter 4.2.1 --- Data augmentation algorithm --- p.38 / Chapter 4.2.2 --- Conditional distributions --- p.39 / Chapter 4.3 --- Posterior analysis --- p.44 / Chapter 4.3.1 --- Rao-Blackwellized estimation --- p.44 / Chapter 4.3.2 --- Asymptotic properties and goodness-of-fit statistic --- p.45 / Chapter 4.4 --- Simulation study --- p.47 / Chapter 5 --- A Bayesian estimation of factor score with non-standard data --- p.52 / Chapter 5.1 --- General Bayesian approach to polytomous data --- p.52 / Chapter 5.2 --- Covariance matrix of the posterior distribution --- p.61 / Chapter 5.3 --- Data augmentation --- p.65 / Chapter 5.4 --- EM algorithm --- p.68 / Chapter 5.5 --- Analysis of censored data --- p.72 / Chapter 5.5.1 --- General Bayesian approach --- p.72 / Chapter 5.5.2 --- EM algorithm --- p.76 / Chapter 5.6 --- Analysis of truncated data --- p.78 / Chapter 6 --- Structural equation model with continuous and polytomous data --- p.82 / Chapter 6.1 --- Factor analysis model with continuous and polytomous data --- p.83 / Chapter 6.1.1 --- Model and Bayesian inference --- p.83 / Chapter 6.1.2 --- Gibbs sampler algorithm --- p.85 / Chapter 6.1.3 --- Thresholds parameters --- p.89 / Chapter 6.1.4 --- Posterior analysis --- p.92 / Chapter 6.2 --- LISREL model with continuous and polytomous data --- p.94 / Chapter 6.2.1 --- LISREL model and Bayesian inference --- p.94 / Chapter 6.2.2 --- Posterior analysis --- p.101 / Chapter 6.3 --- Simulation study --- p.103 / Chapter 7 --- Further development --- p.108 / Chapter 7.1 --- More about one-run method --- p.108 / Chapter 7.2 --- Structural equation model with censored data --- p.111 / Chapter 7.3 --- Multilevel structural equation model --- p.114 / References --- p.118 / Appendix --- p.124 / Chapter A.1 --- The derivation of conditional distribution --- p.124 / Chapter A.2 --- Generate a random variate from normal density which restricted in an interval --- p.129 / Tables --- p.132 / Figures --- p.155
339

Bayesian approach for a multigroup structural equation model with fixed covariates.

January 2003
Oi-Ping Chiu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 45-46). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 2 --- Model --- p.4 / Chapter 2.1 --- General Model --- p.4 / Chapter 2.2 --- Constraint --- p.5 / Chapter 3 --- Bayesian Estimation via Gibbs Sampler --- p.7 / Chapter 3.1 --- Conditional Distributions --- p.10 / Chapter 3.2 --- Constraint --- p.15 / Chapter 3.3 --- Bayesian Estimation --- p.16 / Chapter 4 --- Model Comparison using the Bayes Factor --- p.18 / Chapter 5 --- Simulation Study --- p.22 / Chapter 6 --- Real Example --- p.27 / Chapter 6.1 --- Model Selection --- p.29 / Chapter 6.2 --- Bayesian Estimate --- p.30 / Chapter 6.3 --- Sensitivity Analysis --- p.31 / Chapter 7 --- Discussion --- p.32 / Chapter A --- p.34 / Bibliography --- p.45
340

Security of genetic databases

Giggins, Helen January 2009
Research Doctorate - Doctor of Philosophy (PhD) / The rapid pace of growth in the field of human genetics has left researchers with many new challenges in the area of security and privacy. To encourage participation and foster trust towards research, it is important to ensure that genetic databases are adequately protected. This task is a particularly challenging one for statistical agencies due to the high prevalence of categorical data contained within statistical genetic databases. The absence of natural ordering makes the application of traditional Statistical Disclosure Control (SDC) methods less straightforward, which is why we have proposed a new noise addition technique for categorical values. The main contributions of the thesis are as follows. We provide a comprehensive analysis of the trust relationships that occur between the different stakeholders in a genetic data warehouse system. We also provide a quantifiable model of trust that allows the database manager to granulate the level of protection based on the amount of trust that exists between the stakeholders. To the best of our knowledge, this is the first time that trust has been applied in the SDC context. We propose a privacy protection framework for genetic databases which is designed to deal with the fact that genetic data warehouses typically contain a high proportion of categorical data. The framework includes the use of a clustering technique which allows for the easier application of traditional noise addition techniques for categorical values. Another important contribution of this thesis is a new similarity measure for categorical values, which aims to capture not only the direct similarity between values, but also some sense of transitive similarity. This novel measure also has possible applications in providing a way of ordering categorical values, so that more traditional SDC methods can be more easily applied to them. 
Our analysis of experimental results also points to a numerical attribute phenomenon, whereby we typically have high similarity between numerical values that are close together, and where the similarity decreases as the absolute value of the difference between numerical values increases. However, some numerical attributes appear not to behave in a strictly 'numerical' way. That is, values which are close together numerically do not always appear very similar. We also provide a novel noise addition technique for categorical values, which employs our similarity measure to partition the values in the data set. Our method - VICUS - then perturbs the original microdata file so that each value is more likely to be changed to another value in the same partition than one from a different partition. The technique helps to ensure that the perturbed microdata file retains data quality while also preserving the privacy of individual records.
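
The partition-based perturbation described in this abstract can be sketched as follows. This is a minimal illustration, not the VICUS algorithm itself: the partitions are supplied by hand rather than derived from the thesis's similarity measure, and the function name and the probabilities `p_keep` and `p_same` are hypothetical.

```python
import random


def perturb(value, partitions, p_keep, p_same, rng):
    """Return a possibly changed categorical value.

    With probability p_keep the value is unchanged; with probability p_same
    it moves to another value in its own partition; otherwise it moves to a
    value from a different partition. Choosing p_same larger than
    1 - p_keep - p_same makes a same-partition change the more likely one.
    """
    own = next((part for part in partitions if value in part), None)
    if own is None:
        raise ValueError(f"{value!r} is not in any partition")
    same = [v for v in own if v != value]
    other = [v for part in partitions if part is not own for v in part]
    r = rng.random()
    if r < p_keep or (not same and not other):
        return value
    if (r < p_keep + p_same and same) or not other:
        return rng.choice(same)
    return rng.choice(other)


# Hypothetical partitions of a categorical attribute.
partitions = [["A", "B", "C"], ["X", "Y"]]
rng = random.Random(0)
noisy = [perturb("A", partitions, 0.5, 0.4, rng) for _ in range(2000)]
```

Over many draws, most changed values stay inside the original partition, which is the property the abstract credits with preserving data quality while still masking individual records.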
