  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Cognitive Diagnostic Model, a Simulated-Based Study: Understanding Compensatory Reparameterized Unified Model (CRUM)

Galeshi, Roofia 28 November 2012
A recent trend in education has been toward formative assessments that enable teachers, parents, and administrators to help students succeed. Cognitive diagnostic modeling (CDM) has the potential to provide valuable information for stakeholders to help students identify their skill deficiencies in specific academic subjects. Cognitive diagnosis models are mainly viewed as a family of latent class confirmatory probabilistic models. These models allow the mapping of students' skill profiles/academic ability. Using complex simulation studies, the methodological issues in one of the existing cognitive models, the compensatory reparameterized unified model (CRUM) under the log-linear model family of CDM, were investigated. For practitioners to implement these models, their item parameter recovery and examinee classification need to be studied in detail. A series of complex simulated datasets was generated with the following designs: three attributes with seven items, three attributes with thirty-five items, four attributes with fifteen items, and five attributes with thirty-one items. Each dataset was generated with 50, 100, 500, 1,000, 5,000, and 10,000 examinees. The first manuscript reports on how accurately CRUM recovers item parameters and classifies examinees under a correctly specified Q-matrix and various research designs. The results suggested that test length relative to the number of attributes, together with sample size, affects item parameter recovery and examinee classification accuracy. The second manuscript reports on the sensitivity of relative fit indices in detecting misfit for over- and opposite-Q-matrix misspecifications. The relative fit indices under investigation were the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the sample-size-adjusted Bayesian information criterion (ssaBIC).
The results suggested that the CRUM can be a robust model, provided that the number of observations and the item/attribute combinations are taken into account. The findings of this dissertation fill some of the existing gaps in the methodological issues regarding the applicability and generalizability of cognitive models. They help practitioners design tests within the CDM framework in order to attain reliable and valid results. / Ph. D.
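The abstract names three relative fit indices without giving formulas. The sketch below computes them from a fitted model's maximized log-likelihood using the standard textbook definitions, which are assumed here rather than taken from the dissertation; the function name and arguments are illustrative.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC, and ssaBIC for one fitted model (lower is better).

    Assumed standard definitions, not quoted from the dissertation:
      AIC    = -2 logL + 2k
      BIC    = -2 logL + k ln(n)
      ssaBIC = -2 logL + k ln((n + 2) / 24)
    """
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "ssaBIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }
```

To compare a correctly specified Q-matrix with a misspecified one, the same function would be called once per fitted model and the values compared index by index.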
42

Clustering Response-Stressor Relationships in Ecological Studies

Gao, Feng 31 July 2008
This research is motivated by an issue frequently encountered in water quality monitoring and ecological assessment. One concern for researchers and watershed resource managers is how the biological community in a watershed is affected by human activities. The conventional single-model approach based on regression and logistic regression usually fails to adequately model the relationship between biological responses and environmental stressors, since the study samples are collected over a large spatial region and the response-stressor relationships are usually weak in this situation. In this dissertation, we propose two alternative modeling approaches that partition the whole region of study into disjoint subregions and model the response-stressor relationships within the subregions simultaneously. In our examples, these modeling approaches found stronger relationships within subregions and should help resource managers improve impairment assessment and decision making. The first approach is an adjusted Bayesian classification and regression tree (ABCART). It is based on the Bayesian classification and regression tree approach (BCART) and is modified to accommodate spatial partitions in ecological studies. The second approach is a Voronoi-diagram-based partition approach. This approach uses the Voronoi diagram technique to randomly partition the whole region into subregions with a predetermined minimum sample size. The optimal partition/cluster is selected by Monte Carlo simulation. We propose several model selection criteria for optimal partitioning and modeling according to the nature of the study, and extend the approach to multivariate analysis to find the underlying structure of response-stressor relationships. We also propose a multivariate hotspot detection approach (MHDM) to find the region where the response-stressor relationship is the strongest according to an R-square-like criterion.
Several sets of ecological data are studied in this dissertation to illustrate the implementation of the above partition modeling approaches. The findings from these studies are consistent with other studies. / Ph. D.
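The random Voronoi partitioning described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the dissertation's method: the choice of seed points drawn from the sample locations, the per-cell ordinary least squares fit, and the pooled R-squared used as the selection score are all assumptions made for the example, and every function name is hypothetical.

```python
import numpy as np


def random_voronoi_labels(coords, n_cells, rng):
    """Assign each site to the nearest of n_cells randomly chosen seed sites."""
    seeds = coords[rng.choice(len(coords), size=n_cells, replace=False)]
    d2 = ((coords[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def pooled_r2(x, y, labels):
    """R-squared of separate least-squares lines fitted within each cell."""
    sse = 0.0
    for k in np.unique(labels):
        m = labels == k
        X = np.column_stack([np.ones(m.sum()), x[m]])
        beta, *_ = np.linalg.lstsq(X, y[m], rcond=None)
        sse += ((y[m] - X @ beta) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - sse / sst


def best_partition(coords, x, y, n_cells=2, min_size=10, n_draws=200, seed=0):
    """Monte Carlo search: keep the valid random partition with the best score."""
    rng = np.random.default_rng(seed)
    best_score, best_labels = -np.inf, None
    for _ in range(n_draws):
        labels = random_voronoi_labels(coords, n_cells, rng)
        # Reject partitions that violate the minimum sample size per cell.
        if np.bincount(labels, minlength=n_cells).min() < min_size:
            continue
        score = pooled_r2(x, y, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return best_score, best_labels
```

Because each cell gets its own fit, the pooled score can never fall below that of a single regression over the whole region, so a real application would likely need a criterion that penalizes the number of cells; the abstract mentions several selection criteria but does not specify them.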
43

Mixed Model Selection Based on the Conceptual Predictive Statistic

Wenren, Cheng 05 August 2014
No description available.
44

Selection of Predictors and Estimators in Spatial Statistics

Bradley, Jonathan R. 19 September 2013
No description available.
45

Adaptive LASSO For Mixed Model Selection via Profile Log-Likelihood

Pan, Juming 18 July 2016
No description available.
46

Variable selection in the general linear model for censored data

Yu, Lili 08 March 2007
No description available.
47

Development of Numerical Estimation: Data and Models

Young, Christopher J. 21 October 2011
No description available.
48

Bayesian variable selection for linear mixed models when p is much larger than n with applications in genome wide association studies

Williams, Jacob Robert Michael 05 June 2023
Genome-wide association studies (GWAS) seek to identify single nucleotide polymorphisms (SNPs) causing phenotypic responses in individuals. Commonly, GWAS analyses use single marker association testing (SMA), which investigates the effect of one SNP at a time and selects a candidate set of SNPs using a strict multiple-testing correction. Because SNPs are not independent but strongly correlated, SMA methods lead to such high false discovery rates (FDR) that the results are difficult for wet lab scientists to use. To address this, this dissertation proposes three novel Bayesian methods: BICOSS, BGWAS, and IEB. From a Bayesian modeling point of view, SNP search can be seen as a variable selection problem in linear mixed models (LMMs) where $p$ is much larger than $n$. To deal with the $p \gg n$ issue, our three proposed methods use novel Bayesian approaches based on two steps: a screening step and a model selection step. To control false discoveries, we link the screening and model selection steps through a common probability of a null SNP. To deal with model selection, we propose novel priors that extend nonlocal priors, the Zellner g-prior, the unit information prior, and the Zellner-Siow prior to LMMs. For each method, extensive simulation studies and case studies show that these methods improve the recall of true causal SNPs and, more importantly, drastically decrease FDR. Because our Bayesian methods provide more focused and precise results, they may speed up discovery of important SNPs and significantly contribute to scientific progress in the areas of biology, agricultural productivity, and human health. / Doctor of Philosophy / Genome-wide association studies (GWAS) seek to identify locations in DNA known as single nucleotide polymorphisms (SNPs) that are the underlying cause of observable traits such as height or breast cancer.
Commonly, GWAS analyses are performed by investigating each SNP individually and seeing which SNPs are highly correlated with the response. However, as the SNPs themselves are highly correlated, investigating each one individually leads to a high number of false positives. To address this, this dissertation proposes three different advanced statistical methods: BICOSS, BGWAS, and IEB. Through extensive simulations, our methods are shown to not only drastically reduce the number of falsely detected SNPs but also increase the detection rate of true causal SNPs. Because our novel methods provide more focused and precise results, they may speed up discovery of important SNPs and significantly contribute to scientific progress in the areas of biology, agricultural productivity, and human health.
49

Classification et inférence de réseaux pour les données RNA-seq / Clustering and network inference for RNA-seq data

Gallopin, Mélina 09 December 2015
This thesis gathers methodological contributions to the statistical analysis of data from high-throughput transcriptome sequencing (RNA-seq). RNA-seq count data are discrete, and the number of samples is usually small because of the cost of sequencing. These two points are the main statistical challenges in modelling RNA-seq data. The first part of the thesis is dedicated to co-expression analysis of RNA-seq data using model-based clustering, with the aim of detecting modules of co-expressed genes. A natural model for discrete RNA-seq data is a Poisson mixture model. However, a Gaussian mixture model applied after a simple transformation of the data is a reasonable alternative. We propose to compare the alternatives for each dataset using an objective, data-driven criterion that selects the model best suited to the data. In addition, we present a model selection criterion that takes into account external biological information on the genes (gene annotations). This criterion makes it easier to obtain biologically interpretable clusters. It is not specific to RNA-seq data: it is useful in any co-expression analysis using mixture models that aims to enrich functional annotation databases.
The second part of the thesis is dedicated to network inference using graphical models. The aim of network inference is to detect dependence relationships among gene expression levels. We propose a network inference model based on Poisson distributions that takes into account the discrete nature and high inter-sample variability of RNA-seq data. However, network inference methods require a large number of samples. For the Gaussian graphical model, a competing model, we present a non-asymptotic approach for selecting relevant subsets of genes based on a block-diagonal decomposition of the covariance matrix. This method is not specific to RNA-seq data and reduces the dimension of any network inference problem based on the Gaussian graphical model.
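One common way to obtain a block-diagonal structure from a covariance matrix is to threshold its entries and take the connected components of the resulting graph. The sketch below illustrates that idea only; the fixed threshold is an assumption made for the example, the thesis's non-asymptotic procedure for choosing the decomposition is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def covariance_blocks(data, threshold):
    """Group variables into blocks of a thresholded sample covariance matrix.

    data: array of shape (n_samples, n_variables).
    Two variables are linked when |covariance| exceeds the threshold; blocks
    are the connected components of that graph. Returns one integer block
    label per variable.
    """
    cov = np.cov(data, rowvar=False)
    adjacency = np.abs(cov) > threshold
    n_vars = cov.shape[0]
    labels = -np.ones(n_vars, dtype=int)
    current = 0
    for start in range(n_vars):
        if labels[start] >= 0:
            continue  # already assigned to an earlier block
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over linked variables
            i = stack.pop()
            for j in np.flatnonzero(adjacency[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Each block can then be handed to a network inference method separately, which is how a decomposition of this kind reduces the dimension of the problem.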
50

Robust estimation of the number of components for mixtures of linear regression

Meng, Li January 1900
Master of Science / Department of Statistics / Weixin Yao / In this report, we investigate robust estimation of the number of components in mixtures of regression models using a trimmed information criterion. Compared with the traditional information criterion, the trimmed criterion is robust and insensitive to outliers. The superiority of the trimmed methods over the traditional information criterion methods is illustrated through a simulation study. A real data application is also used to illustrate the effectiveness of the trimmed model selection methods.
