About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Application of a spatially referenced water quality model to predict E. coli flux in two Texas river basins

, Deepti 15 May 2009 (has links)
Water quality models are applied to assess the various processes affecting contaminant concentrations in a watershed. SPAtially Referenced Regression On Watershed attributes (SPARROW) is a nonlinear-regression-based approach for predicting the fate and transport of contaminants in river basins. In this research, SPARROW was applied to the Guadalupe and San Antonio River Basins of Texas to assess E. coli contamination. Because SPARROW's predictions rely on measured contaminant concentrations collected at monitoring stations, the effect of the location and selection of the monitoring stations was analyzed. The results of the SPARROW application were studied in detail to evaluate the contribution from the statistically significant sources, and for verification they were compared to the 2000 Clean Water Act Section 303(d) list. Further, a methodology for maintaining monitoring records of the most contaminated areas in the watersheds was explored using a genetic algorithm. The study also analyzed the importance of the available scale and detail of the explanatory variables (sources, land-to-water delivery, and reservoir/stream attenuation factors) in predicting water quality processes, and discussed the effect of uncertainty in the monitored records on the SPARROW application. The application of SPARROW and the genetic algorithm was explored to design a monitoring network for the study area. The results of this study show that the SPARROW model can successfully predict pathogen contamination of rivers, and that SPARROW can be applied to design monitoring networks for the basins.
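The core of SPARROW is a nonlinear regression in which source loads are scaled by estimated coefficients and attenuated by land-to-water delivery and in-stream decay terms. As a rough illustration of that structure only — not the USGS SPARROW implementation, and with entirely synthetic data and made-up variable names — a least-squares fit might look like:

```python
# Toy SPARROW-style nonlinear regression: predicted station flux =
# (sum of coefficient-weighted source loads) x land-to-water delivery
# x first-order in-stream decay. All names and data are illustrative.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
n = 200                                 # synthetic monitoring stations
livestock = rng.uniform(0, 10, n)       # hypothetical E. coli source 1
wwtp = rng.uniform(0, 5, n)             # hypothetical E. coli source 2
slope = rng.uniform(0, 1, n)            # land-to-water delivery variable
t_travel = rng.uniform(0, 3, n)         # in-stream travel time (days)

def sparrow_flux(X, b1, b2, a, k):
    liv, ww, sl, tt = X
    return (b1 * liv + b2 * ww) * np.exp(-a * sl) * np.exp(-k * tt)

true_params = (2.0, 4.0, 0.8, 0.3)
flux = sparrow_flux((livestock, wwtp, slope, t_travel), *true_params)
observed = flux * rng.lognormal(0, 0.2, n)   # multiplicative sampling error

popt, _ = curve_fit(sparrow_flux, (livestock, wwtp, slope, t_travel),
                    observed, p0=(1, 1, 0.5, 0.1))
print("estimated (b1, b2, a, k):", np.round(popt, 2))
```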
12

Cross-Validation for Model Selection in Model-Based Clustering

O'Reilly, Rachel 04 September 2012 (has links)
Clustering is a technique used to partition unlabelled data into meaningful groups. This thesis focuses on the area of clustering called model-based clustering, where it is assumed that data arise from a finite number of subpopulations, each of which follows a known statistical distribution. The number of groups and the shape of each group are unknown in advance, and thus one of the most challenging aspects of clustering is selecting these features. Cross-validation is a model selection technique often used in regression and classification because it tends to choose models that predict well and are not over-fit to the data. However, it has rarely been applied in a clustering framework. Herein, cross-validation is applied to select the number of groups and the covariance structure within a family of Gaussian mixture models. Results are presented for both real and simulated data. / Ontario Graduate Scholarship Program
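A minimal sketch of the selection scheme described here, using scikit-learn as an illustrative stand-in (not the thesis code): the number of groups and a covariance structure are chosen by mean held-out log-likelihood across folds.

```python
# Cross-validated selection of the number of mixture components G and a
# covariance structure, scored by held-out log-likelihood. The candidate
# grid and synthetic data are illustrative choices.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

X, _ = make_blobs(n_samples=500, centers=3, random_state=1)

best, best_score = None, -np.inf
for G in range(1, 7):
    for cov in ("full", "diag"):
        scores = []
        for train, test in KFold(n_splits=5, shuffle=True, random_state=1).split(X):
            gm = GaussianMixture(n_components=G, covariance_type=cov,
                                 random_state=1).fit(X[train])
            scores.append(gm.score(X[test]))   # mean held-out log-likelihood
        m = np.mean(scores)
        if m > best_score:
            best, best_score = (G, cov), m

print("selected (G, covariance):", best)
```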
13

Using phylogenetics and model selection to investigate the evolution of RNA genes in genomic alignments

Allen, James January 2013 (has links)
The diversity and range of the biological functions of non-coding RNA molecules (ncRNA) have only recently been realised, and phylogenetic analysis of the RNA genes that define these molecules can provide important insights into the evolutionary pressures acting on RNA genes, and can lead to a better understanding of the structure and function of ncRNA. An appropriate dataset is fundamental to any evolutionary analysis, and because existing RNA alignments are unsuitable, I describe a software pipeline to derive RNA gene datasets from genomic alignments. RNA gene prediction software has not previously been evaluated on such sets of known RNA genes, and I find that two popular methods fail to predict the genes in approximately half of the alignments. In addition, high numbers of predictions are made in flanking regions that lack RNA genes, and these results provide motivation for subsequent phylogenetic analyses, because a better understanding of RNA gene evolution should lead to improved methods of prediction. I analyse the RNA gene alignments with a range of evolutionary models of substitution and examine which models best describe the changes evident in the alignments. The best models are expected to provide more accurate trees, and their properties can also shed light on the evolutionary processes that occur in RNA genes. Comparing DNA and RNA substitution models is non-trivial, however, because they describe changes between two different types of state, so I present a proof that allows models with different state spaces to be compared in a statistically valid manner. I find that a large proportion of RNA genes are well described by a single RNA model that includes parameters describing both nucleotides and RNA structure, highlighting the multiple levels of constraint that act on the genes. The choice of model affects the inference of a phylogenetic tree, suggesting that model selection, with RNA models, should be standard practice for analysis of RNA genes.
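For the within-state-space part of such comparisons, ranking candidate substitution models by an information criterion is routine; a generic sketch follows, with placeholder log-likelihoods and parameter counts. Comparing models over different state spaces (say, a 4-state DNA model against a 16-state base-pair RNA model) additionally requires the kind of likelihood correction this thesis proves.

```python
# Generic information-criterion ranking of substitution models.
# The (name, log-likelihood, free-parameter) triples are placeholders.
import math

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

candidates = [("JC69", -5124.3, 0), ("HKY85", -5060.1, 4), ("GTR", -5051.8, 8)]
n_sites = 1200   # alignment length (illustrative)

for name, ll, k in sorted(candidates, key=lambda m: aic(m[1], m[2])):
    print(f"{name:7s} AIC={aic(ll, k):9.1f} BIC={bic(ll, k, n_sites):9.1f}")
```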
14

Problems in generalized linear model selection and predictive evaluation for binary outcomes

Ten Eyck, Patrick 15 December 2015 (has links)
This manuscript consists of three papers, which formulate novel generalized linear model methodologies. In Chapter 1, we introduce a variant of the traditional concordance statistic associated with logistic regression. This adjusted c-statistic, as we call it, utilizes the differences in predicted probabilities as weights for each event/non-event observation pair. We highlight an extensive comparison of the adjusted and traditional c-statistics using simulations and apply these measures in a modeling application. In Chapter 2, we feature the development and investigation of three model selection criteria based on cross-validatory c-statistics: Model Misspecification Prediction Error, Fitting Sample Prediction Error, and Sum of Prediction Errors. We examine the properties of the corresponding selection criteria based on the cross-validatory analogues of the traditional and adjusted c-statistics via simulation and illustrate these criteria in a modeling application. In Chapter 3, we propose and investigate an alternate approach to pseudo-likelihood model selection in the generalized linear mixed model framework. After outlining the problem with the pseudo-likelihood model selection criteria found using the natural approach to generalized linear mixed modeling, we feature an alternate approach, implemented using a SAS macro, that obtains the pseudo-data from the full model and applies it when fitting all candidate models. We justify the propriety of the resulting pseudo-likelihood selection criteria using simulations and implement this new method in a modeling application.
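Reading the description above literally, one way to sketch the adjusted c-statistic is to weight each event/non-event pair by the difference in predicted probabilities instead of counting all pairs equally. The following is that reading, not necessarily the authors' exact definition:

```python
# Traditional vs. probability-difference-weighted concordance.
# This weighting scheme is inferred from the abstract and is illustrative.
import numpy as np

def c_statistic(p, y):
    """Traditional concordance: share of event/non-event pairs ranked correctly."""
    pe, pn = p[y == 1], p[y == 0]
    diff = pe[:, None] - pn[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def adjusted_c_statistic(p, y):
    """Concordance with each pair weighted by |p_event - p_nonevent|."""
    pe, pn = p[y == 1], p[y == 0]
    diff = pe[:, None] - pn[None, :]
    w = np.abs(diff)
    return np.sum(w * (diff > 0)) / np.sum(w)

p = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])   # predicted probabilities
y = np.array([1, 1, 0, 1, 0, 0])                # observed outcomes
print(c_statistic(p, y), adjusted_c_statistic(p, y))
```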
15

On the Model Selection in a Frailty Setting

Lundell, Jill F. 01 May 1998 (has links)
When analyzing data in a survival setting, whether of people or objects, one assumption made is that the population is homogeneous. This is not true in reality, and certain adjustments can be made in the model to account for heterogeneity. Frailty modeling is one method of dealing with some of this heterogeneity. Frailty cannot be measured directly, and hence it can be very difficult to determine which frailty model is appropriate for the data of interest. This thesis investigates three model selection methods for their effectiveness in determining which frailty distribution best describes a given set of data. The model selection methods used are the Bayes factor, neural networks, and classification trees. Results favored classification trees. Very poor results were observed with neural networks.
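One way to picture the classification-tree approach: simulate many datasets under each candidate frailty distribution, reduce each dataset to a few summary statistics, and train a tree to recognize the generating distribution. The sketch below does this for two illustrative frailty choices (gamma and log-normal) with an exponential baseline hazard; all settings are assumptions for illustration, not those of the thesis.

```python
# Train a classification tree to identify the frailty distribution that
# generated a survival dataset, from simple summary features of log-times.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

def simulate(frailty, n=200):
    z = rng.gamma(2.0, 0.5, n) if frailty == "gamma" else rng.lognormal(0, 0.7, n)
    t = rng.exponential(1.0 / z)        # exponential baseline hazard scaled by frailty
    lt = np.log(t)
    skew = ((lt - lt.mean()) ** 3).mean() / lt.std() ** 3
    return [lt.mean(), lt.std(), skew]

Xf, yf = [], []
for label in ("gamma", "lognormal"):
    for _ in range(500):
        Xf.append(simulate(label))
        yf.append(label)

Xtr, Xte, ytr, yte = train_test_split(np.array(Xf), np.array(yf), random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xtr, ytr)
print("held-out accuracy:", tree.score(Xte, yte))
```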
16

Decompressing the Mental Number Line

Young, Christopher John 28 September 2009 (has links)
No description available.
17

Outlier Detection in Gaussian Mixture Models

Clark, Katharine January 2020 (has links)
Unsupervised classification is a problem often plagued by outliers, yet there is a paucity of work on handling outliers in unsupervised classification. Mixtures of Gaussian distributions are a popular choice in model-based clustering. A single outlier can affect parameter estimation and, as such, must be accounted for. This issue is further complicated by the presence of multiple outliers. Predicting the proportion of outliers correctly is paramount, as it minimizes misclassification error. It is proved that, for a finite Gaussian mixture model, the log-likelihoods of the subset models are distributed according to a mixture of beta-type distributions. This relationship is leveraged in two ways. First, an algorithm is proposed that predicts the proportion of outliers by measuring the adherence of a set of subset log-likelihoods to a beta-type mixture reference distribution. This algorithm removes the least likely points, which are deemed outliers, until model assumptions are met. Second, a hypothesis test is developed which, at a chosen significance level, can test whether a dataset contains a single outlier. / Thesis / Master of Science (MSc)
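A simplified sketch of the trimming idea, using scikit-learn: fit the mixture, repeatedly delete the observation with the smallest mixture log-density, and stop after a fixed number of removals. The thesis's actual contribution — testing subset log-likelihoods against a beta-type mixture reference to decide how many points to remove — is omitted here and replaced by an assumed-known outlier count.

```python
# Iteratively trim the least likely point under a fitted Gaussian mixture.
# The stopping rule (beta-type reference test) is replaced by a fixed count.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(6, 1, (100, 2)),
               rng.uniform(-10, 16, (5, 2))])   # 5 planted outliers

n_remove = 5                                    # assumed known for this sketch
for _ in range(n_remove):
    gm = GaussianMixture(n_components=2, random_state=0).fit(X)
    dens = gm.score_samples(X)                  # per-point log-density
    X = np.delete(X, np.argmin(dens), axis=0)   # trim least likely point

print("remaining points:", len(X))
```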
18

Detection of Latent Heteroscedasticity and Group-Based Regression Effects in Linear Models via Bayesian Model Selection

Metzger, Thomas Anthony 22 August 2019 (has links)
Standard linear modeling approaches make potentially simplistic assumptions regarding the structure of categorical effects that may obfuscate more complex relationships governing data. For example, recent work focused on the two-way unreplicated layout has shown that hidden groupings among the levels of one categorical predictor frequently interact with the ungrouped factor. We extend the notion of a "latent grouping factor" to linear models in general. The proposed work allows researchers to determine whether an apparent grouping of the levels of a categorical predictor reveals a plausible hidden structure given the observed data. Specifically, we offer Bayesian model selection-based approaches to reveal latent group-based heteroscedasticity, regression effects, and/or interactions. Failure to account for such structures can produce misleading conclusions. Since the presence of latent group structures is frequently unknown a priori to the researcher, we use fractional Bayes factor methods and mixture g-priors to overcome the lack of prior information. We provide an R package, slgf, that implements our methodology and demonstrate its usage in practice. / Doctor of Philosophy / Statistical models are a powerful tool for describing a broad range of phenomena in our world. However, many common statistical models may make assumptions that are overly simplistic and fail to account for key trends and patterns in data. Specifically, we search for hidden structures formed by partitioning a dataset into two groups. These two groups may have distinct variability, statistical effects, or other hidden effects that are missed by conventional approaches. We illustrate the ability of our method to detect these patterns across a variety of disciplines and data layouts, and provide software for researchers to implement this approach in practice.
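The flavor of the comparison can be sketched crudely with BIC standing in for the fractional Bayes factor machinery of the dissertation (and in Python rather than the slgf R package): fit a common mean structure, then ask whether a hypothesized two-group split of the factor's levels earns its extra variance parameter.

```python
# Does a hypothesized two-group split of a factor's levels explain
# heteroscedasticity? BIC comparison as a crude stand-in; all settings
# and the grouping hypothesis are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
levels = np.repeat(np.arange(6), 20)             # factor with 6 levels
group = np.isin(levels, [4, 5])                  # hypothesized latent grouping
y = 1.0 * levels + rng.normal(0, np.where(group, 3.0, 1.0))

def gauss_loglik(resid, sigma2):
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + resid**2 / sigma2)

# Common mean structure: one mean per level (cell-means model).
resid = y - np.array([y[levels == l].mean() for l in levels])

n, k_mean = len(y), 6
ll0 = gauss_loglik(resid, np.mean(resid**2))     # M0: single variance
s2 = np.where(group, np.mean(resid[group]**2), np.mean(resid[~group]**2))
ll1 = gauss_loglik(resid, s2)                    # M1: group-specific variances

bic0 = (k_mean + 1) * np.log(n) - 2 * ll0
bic1 = (k_mean + 2) * np.log(n) - 2 * ll1
print("BIC equal variance:", round(bic0, 1), " BIC grouped variance:", round(bic1, 1))
```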
19

Unsupervised Signal Deconvolution for Multiscale Characterization of Tissue Heterogeneity

Wang, Niya 29 June 2015 (has links)
Characterizing complex tissues requires precise identification of distinctive cell types, cell-specific signatures, and subpopulation proportions. Tissue heterogeneity, arising from multiple cell types, is a major confounding factor in studying individual subpopulations and repopulation dynamics, and it cannot be resolved directly by most global molecular and genomic profiling methods. While signal deconvolution has widespread applications in many real-world problems, existing methods have significant limitations, mainly unrealistic assumptions and heuristics that lead to inaccurate or incorrect results. In this study, we formulate the signal deconvolution task as a blind source separation problem and develop novel unsupervised deconvolution methods within the Convex Analysis of Mixtures (CAM) framework for characterizing multi-scale tissue heterogeneity. We also exploratorily test the application of the Significant Intercellular Genomic Heterogeneity (SIGH) method. Unlike existing deconvolution methods, CAM can identify tissue-specific markers directly from mixed signals, a critical task, without relying on any prior knowledge. Fundamental to the success of our approach is a geometric exploitation of tissue-specific markers and signal non-negativity. Using a well-grounded mathematical framework, we have proved new theorems showing that the scatter simplex of mixed signals is a rotated and compressed version of the scatter simplex of pure signals, and that the markers resident at the vertices of the scatter simplex are the tissue-specific markers. The algorithm works by geometrically locating the vertices of the scatter simplex of measured signals and their resident markers. The minimum description length (MDL) criterion is applied to determine the number of tissue populations in the sample. Based on the CAM principle, we integrated nonnegative independent component analysis (nICA) and convex matrix factorization (CMF) methods into a CAM-nICA/CMF algorithm and applied it to multiple gene expression, methylation, and protein datasets, achieving very promising results validated against ground truth or by gene enrichment analysis. We also integrated CAM with compartment modeling (CM) to develop a multi-tissue compartment modeling (MTCM) algorithm, tested on real DCE-MRI data derived from mouse models with consistent and plausible results. In addition, we developed an open-source R-Java software package that implements various CAM-based algorithms, including an R package approved by Bioconductor specifically for tumor-stroma deconvolution. While intercellular heterogeneity is often manifested by multiple clones with distinct sequences, systematic efforts to characterize intercellular genomic heterogeneity must effectively distinguish significant genuine clonal sequences from probabilistic fake derivatives. Based on preliminary studies originally targeting immune T-cells, we tested and applied the SIGH algorithm to characterize intercellular heterogeneity directly from mixed sequencing reads. SIGH works by exploiting the statistical differences in both the sequencing error rates at different nucleobases and the read counts of fake sequences in relation to genuine clones of variable abundance. / Ph. D.
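CAM itself works geometrically, locating the vertices of the scatter simplex of mixed signals; as a very loose stand-in for the unsupervised deconvolution setting (explicitly not the CAM algorithm), one can factor synthetic mixed profiles into nonnegative signatures and mixing proportions with NMF:

```python
# Unsupervised deconvolution of synthetic mixed profiles via NMF, as a
# loose stand-in for the blind source separation setting described above.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(11)
k, genes, samples = 3, 300, 40
S = rng.gamma(2.0, 1.0, (k, genes))             # pure tissue signatures
A = rng.dirichlet(np.ones(k), samples)          # mixing proportions per sample
X = A @ S + rng.normal(0, 0.01, (samples, genes)).clip(0)

model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)                      # estimated proportions (up to scale)
H = model.components_                           # estimated signatures
print("reconstruction error:", round(model.reconstruction_err_, 3))
```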
20

Topics on Uncertainty Quantification for Model Selection

Wang, Linna January 2021 (has links)
No description available.
