  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Application of a spatially referenced water quality model to predict E. coli flux in two Texas river basins

, Deepti 15 May 2009
Water quality models are applied to assess the various processes affecting the concentrations of contaminants in a watershed. SPAtially Referenced Regression On Watershed attributes (SPARROW) is a nonlinear regression-based approach to predict the fate and transport of contaminants in river basins. In this research, SPARROW was applied to the Guadalupe and San Antonio River Basins of Texas to assess E. coli contamination. Since SPARROW relies on the measured records of contaminant concentrations collected at monitoring stations for the prediction, the effect of the locations and selection of the monitoring stations was analyzed. The results of the SPARROW application were studied in detail to evaluate the contribution from the statistically significant sources. To verify the SPARROW application, results were compared to the 303(d) list of the Clean Water Act, 2000. Further, a methodology to maintain the monitoring records of the highly contaminated areas in the watersheds was explored with the application of the genetic algorithm. In this study, the importance of the available scale and detail of explanatory variables (sources, land-water delivery and reservoir/stream attenuation factors) in predicting the water quality processes was also analyzed. The effect of uncertainty in the monitored records on the SPARROW application was discussed. The application of SPARROW and the genetic algorithm was explored to design a monitoring network for the study area. The results of this study show that the SPARROW model can be used successfully to predict the pathogen contamination of rivers. Also, SPARROW can be applied to design the monitoring network for the basins.
12

Cross-Validation for Model Selection in Model-Based Clustering

O'Reilly, Rachel 04 September 2012
Clustering is a technique used to partition unlabelled data into meaningful groups. This thesis focuses on the area of clustering called model-based clustering, where it is assumed that data arise from a finite number of subpopulations, each of which follows a known statistical distribution. The number of groups and the shape of each group are unknown in advance, and thus one of the most challenging aspects of clustering is selecting these features. Cross-validation is a model selection technique that is often used in regression and classification because it tends to choose models that predict well and are not over-fit to the data. However, it has rarely been applied in a clustering framework. Herein, cross-validation is applied to select the number of groups and covariance structure within a family of Gaussian mixture models. Results are presented for both real and simulated data. / Ontario Graduate Scholarship Program
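The idea of choosing the number of groups by cross-validation can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes one-dimensional data, selects only the number of components (not the covariance structure), scores each candidate by held-out log-likelihood, and uses hypothetical names (`fit_gmm_1d`, `cv_log_likelihood`) and simulated data.

```python
import math
import random

def component_logs(x, weights, means, variances):
    """Log of (weight * Gaussian density) at x for every mixture component."""
    return [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]

def log_density(x, weights, means, variances):
    """Log mixture density at x, using the log-sum-exp trick for stability."""
    logs = component_logs(x, weights, means, variances)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def fit_gmm_1d(xs, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative, not optimized)."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, 1e-6)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            logs = component_logs(x, weights, means, variances)
            top = max(logs)
            ps = [math.exp(l - top) for l in logs]
            total = sum(ps)
            resp.append([p / total for p in ps])
        # M-step: update weights, means and variances (with small floors).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-9)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6
            )
    return weights, means, variances

def cv_log_likelihood(xs, k, folds=5):
    """Mean held-out log-likelihood per observation under k components."""
    total = 0.0
    for f in range(folds):
        train = [x for i, x in enumerate(xs) if i % folds != f]
        held_out = [x for i, x in enumerate(xs) if i % folds == f]
        params = fit_gmm_1d(train, k)
        total += sum(log_density(x, *params) for x in held_out)
    return total / len(xs)

# Simulated data (an assumption for illustration): two well-separated groups.
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(10, 1) for _ in range(100)]
rng.shuffle(data)

scores = {k: cv_log_likelihood(data, k) for k in (1, 2, 3)}
best_k = max(scores, key=scores.get)
```

Because the score is computed on data the model was not fitted to, adding components only helps when they improve prediction, which is the property the abstract attributes to cross-validation.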
13

Using phylogenetics and model selection to investigate the evolution of RNA genes in genomic alignments

Allen, James January 2013
The diversity and range of the biological functions of non-coding RNA molecules (ncRNA) have only recently been realised, and phylogenetic analysis of the RNA genes that define these molecules can provide important insights into the evolutionary pressures acting on RNA genes, and can lead to a better understanding of the structure and function of ncRNA. An appropriate dataset is fundamental to any evolutionary analysis, and because existing RNA alignments are unsuitable, I describe a software pipeline to derive RNA gene datasets from genomic alignments. RNA gene prediction software has not previously been evaluated on such sets of known RNA genes, and I find that two popular methods fail to predict the genes in approximately half of the alignments. In addition, high numbers of predictions are made in flanking regions that lack RNA genes, and these results provide motivation for subsequent phylogenetic analyses, because a better understanding of RNA gene evolution should lead to improved methods of prediction. I analyse the RNA gene alignments with a range of evolutionary models of substitution and examine which models best describe the changes evident in the alignment. The best models are expected to provide more accurate trees, and their properties can also shed light on the evolutionary processes that occur in RNA genes. Comparing DNA and RNA substitution models is non-trivial, however, because they describe changes between two different types of state, so I present a proof that allows models with different state spaces to be compared in a statistically valid manner. I find that a large proportion of RNA genes are well described by a single RNA model that includes parameters describing both nucleotides and RNA structure, highlighting the multiple levels of constraint that act on the genes. The choice of model affects the inference of a phylogenetic tree, suggesting that model selection, with RNA models, should be standard practice for analysis of RNA genes.
14

Problems in generalized linear model selection and predictive evaluation for binary outcomes

Ten Eyck, Patrick 15 December 2015
This manuscript consists of three papers which formulate novel generalized linear model methodologies. In Chapter 1, we introduce a variant of the traditional concordance statistic that is associated with logistic regression. This adjusted c-statistic, as we call it, utilizes the differences in predicted probabilities as weights for each event/non-event observation pair. We highlight an extensive comparison of the adjusted and traditional c-statistics using simulations and apply these measures in a modeling application. In Chapter 2, we feature the development and investigation of three model selection criteria based on cross-validatory c-statistics: Model Misspecification Prediction Error, Fitting Sample Prediction Error, and Sum of Prediction Errors. We examine the properties of the corresponding selection criteria based on the cross-validatory analogues of the traditional and adjusted c-statistics via simulation and illustrate these criteria in a modeling application. In Chapter 3, we propose and investigate an alternate approach to pseudo-likelihood model selection in the generalized linear mixed model framework. After outlining the problem with the pseudo-likelihood model selection criteria found using the natural approach to generalized linear mixed modeling, we feature an alternate approach, implemented using a SAS macro, that obtains and applies the pseudo-data from the full model for fitting all candidate models. We justify the propriety of the resulting pseudo-likelihood selection criteria using simulations and implement this new method in a modeling application.
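The contrast between the traditional and a weighted concordance statistic can be sketched as follows. The traditional c-statistic is the standard proportion of concordant event/non-event pairs (ties counted as one half). The weighted variant is only an assumption about how "differences in predicted probabilities as weights" might be realised: each pair is weighted by the absolute difference of its two predicted probabilities. The abstract does not give the thesis's exact formula, and the function name and toy data are hypothetical.

```python
def c_statistic(probs, labels, weighted=False):
    """Concordance over all event/non-event pairs.

    weighted=False: traditional c-statistic (ties count 0.5).
    weighted=True:  each pair weighted by |p_event - p_nonevent|
                    (an illustrative assumption, not the thesis's formula).
    """
    events = [p for p, y in zip(probs, labels) if y == 1]
    nonevents = [p for p, y in zip(probs, labels) if y == 0]
    numerator = 0.0
    denominator = 0.0
    for p_event in events:
        for p_non in nonevents:
            weight = abs(p_event - p_non) if weighted else 1.0
            if p_event > p_non:
                score = 1.0      # concordant pair
            elif p_event == p_non:
                score = 0.5      # tied pair
            else:
                score = 0.0      # discordant pair
            numerator += weight * score
            denominator += weight
    return numerator / denominator if denominator > 0 else float("nan")

# Toy predicted probabilities and outcomes (made up for illustration).
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.7]
labels = [1, 1, 1, 0, 0, 0]

c_traditional = c_statistic(probs, labels)               # 7 of 9 pairs concordant
c_weighted = c_statistic(probs, labels, weighted=True)   # concordant weight / total weight
```

In this toy data the two discordant pairs have relatively large probability gaps, so the weighted value is lower than the traditional one; pairs that are barely separated contribute little either way.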
15

Decompressing the Mental Number Line

Young, Christopher John 28 September 2009
No description available.
16

Outlier Detection in Gaussian Mixture Models

Clark, Katharine January 2020
Unsupervised classification is a problem often plagued by outliers, yet there is a paucity of work on handling outliers in unsupervised classification. Mixtures of Gaussian distributions are a popular choice in model-based clustering. A single outlier can affect parameter estimation and, as such, must be accounted for. This issue is further complicated by the presence of multiple outliers. Predicting the proportion of outliers correctly is paramount as it minimizes misclassification error. It is proved that, for a finite Gaussian mixture model, the log-likelihoods of the subset models are distributed according to a mixture of beta-type distributions. This relationship is leveraged in two ways. First, an algorithm is proposed that predicts the proportion of outliers by measuring the adherence of a set of subset log-likelihoods to a beta-type mixture reference distribution. This algorithm removes the least likely points, which are deemed outliers, until model assumptions are met. Second, a hypothesis test is developed, which, at a chosen significance level, can test whether a dataset contains a single outlier. / Thesis / Master of Science (MSc)
17

Unsupervised Signal Deconvolution for Multiscale Characterization of Tissue Heterogeneity

Wang, Niya 29 June 2015
Characterizing complex tissues requires precise identification of distinctive cell types, cell-specific signatures, and subpopulation proportions. Tissue heterogeneity, arising from multiple cell types, is a major confounding factor in studying individual subpopulations and repopulation dynamics. Tissue heterogeneity cannot be resolved directly by most global molecular and genomic profiling methods. While signal deconvolution has widespread applications in many real-world problems, there are significant limitations associated with existing methods, mainly unrealistic assumptions and heuristics, leading to inaccurate or incorrect results. In this study, we formulate the signal deconvolution task as a blind source separation problem, and develop novel unsupervised deconvolution methods within the Convex Analysis of Mixtures (CAM) framework for characterizing multi-scale tissue heterogeneity. We also exploratorily test the application of the Significant Intercellular Genomic Heterogeneity (SIGH) method. Unlike existing deconvolution methods, CAM can identify tissue-specific markers directly from mixed signals, a critical task, without relying on any prior knowledge. Fundamental to the success of our approach is a geometric exploitation of tissue-specific markers and signal non-negativity. Using a well-grounded mathematical framework, we have proved new theorems showing that the scatter simplex of mixed signals is a rotated and compressed version of the scatter simplex of pure signals and that the resident markers at the vertices of the scatter simplex are the tissue-specific markers. The algorithm works by geometrically locating the vertices of the scatter simplex of measured signals and their resident markers. The minimum description length (MDL) criterion is applied to determine the number of tissue populations in the sample.
Based on the CAM principle, we integrated nonnegative independent component analysis (nICA) and convex matrix factorization (CMF) methods, developed the CAM-nICA/CMF algorithms, and applied them to multiple gene expression, methylation and protein datasets, achieving very promising results validated by the ground truth or gene enrichment analysis. We integrated CAM with compartment modeling (CM) and developed the multi-tissue compartment modeling (MTCM) algorithm, tested on real DCE-MRI data derived from mouse models with consistent and plausible results. We also developed an open-source R-Java software package that implements various CAM-based algorithms, including an R package approved by Bioconductor specifically for tumor-stroma deconvolution. While intercellular heterogeneity is often manifested by multiple clones with distinct sequences, systematic efforts to characterize intercellular genomic heterogeneity must effectively distinguish significant genuine clonal sequences from probabilistic fake derivatives. Based on the preliminary studies originally targeting immune T-cells, we tested and applied the SIGH algorithm to characterize intercellular heterogeneity directly from mixed sequencing reads. SIGH works by exploiting the statistical differences in both the sequencing error rates at different nucleobases and the read counts of fake sequences in relation to genuine clones of variable abundance. / Ph. D.
18

Topics on Uncertainty Quantification for Model Selection

Wang, Linna January 2021
No description available.
19

Using Helix-coil Models to Study Protein Unfolded States

Hughes, Roy Gene January 2016
An abstract of a thesis devoted to using helix-coil models to study unfolded states.
Research on polypeptide unfolded states has received much more attention in the last decade or so than it has in the past. Unfolded states are thought to be implicated in various misfolding diseases and likely play crucial roles in protein folding equilibria and folding rates. Structural characterization of unfolded states has proven to be much more difficult than the now well established practice of determining the structures of folded proteins. This is largely because many core assumptions underlying folded structure determination methods are invalid for unfolded states. This has led to a dearth of knowledge concerning the nature of unfolded state conformational distributions. While many aspects of unfolded state structure are not well known, there does exist a significant body of work stretching back half a century that has been focused on structural characterization of marginally stable polypeptide systems. This body of work represents an extensive collection of experimental data and biophysical models associated with describing helix-coil equilibria in polypeptide systems. Much of the work on unfolded states in the last decade has not been devoted specifically to the improvement of our understanding of helix-coil equilibria, which arguably is the most well characterized of the various conformational equilibria that likely contribute to unfolded state conformational distributions. This thesis seeks to provide a deeper investigation of helix-coil equilibria using modern statistical data analysis and biophysical modeling techniques. The studies contained within seek to provide deeper insights and new perspectives on what we presumably know very well about protein unfolded states.
Chapter 1 gives an overview of recent and historical work on studying protein unfolded states. The study of helix-coil equilibria is placed in the context of the general field of unfolded state research and the basics of helix-coil models are introduced.
Chapter 2 introduces the newest incarnation of a sophisticated helix-coil model. State of the art modern statistical techniques are employed to estimate the energies of various physical interactions that serve to influence helix-coil equilibria. A new Bayesian model selection approach is utilized to test many long-standing hypotheses concerning the physical nature of the helix-coil transition. Some assumptions made in previous models are shown to be invalid and the new model exhibits greatly improved predictive performance relative to its predecessor.
Chapter 3 introduces a new statistical model that can be used to interpret amide exchange measurements. As amide exchange can serve as a probe for residue-specific properties of helix-coil ensembles, the new model provides a novel and robust method to use these types of measurements to characterize helix-coil ensembles experimentally and test the position-specific predictions of helix-coil models. The statistical model is shown to perform exceedingly better than the most commonly used method for interpreting amide exchange data. The estimates of the model obtained from amide exchange measurements on an example helical peptide also show a remarkable consistency with the predictions of the helix-coil model.
Chapter 4 involves a study of helix-coil ensembles through the enumeration of helix-coil configurations. Aside from providing new insights into helix-coil ensembles, this chapter also introduces a new method by which helix-coil models can be extended to calculate new types of observables. Future work on this approach could potentially allow helix-coil models to move into use domains that were previously inaccessible and reserved for other types of unfolded state models that were introduced in chapter 1. / Dissertation
20

Aplicação do algorítmo genético no mapeamento de genes epistáticos em cruzamentos controlados / Application of genetic algorithm in the genes epistatic map in controlled crossings

Oliveira, Paulo Tadeu Meira e Silva de 22 August 2008
Genetic mapping comprises experimental and statistical procedures that seek to detect genes associated with the etiology and regulation of diseases, and to estimate the corresponding genetic effects and genomic locations. For experimental designs involving controlled crosses of animals or plants, different formulations of regression models can be adopted to identify QTLs (quantitative trait loci), including their main effects and possible interaction effects (epistasis). The difficulty in this kind of mapping is comparing models that are not necessarily nested and that involve a high-dimensional multilocus search space. In this work, we describe a general method to improve computational efficiency in the simultaneous mapping of multiple QTLs and their interaction effects. The literature has used exhaustive or conditional search methods. We propose using the genetic algorithm to search the multilocus space for epistatic loci distributed across the genome; its advantage grows with larger genomes and denser maps of molecular markers. Simulation studies show that the search based on the genetic algorithm is, in general, more efficient than a conditional search and comparable in efficiency to an exhaustive search. In formalizing the genetic algorithm, we investigate the behavior of parameters such as recombination probability, mutation probability, sample size, number of generations, number of solutions and genome size, under different objective functions: BIC (Bayesian Information Criterion), AIC (Akaike Information Criterion) and SSE, the residual sum of squares of a fitted model. The proposed methodology is also applied to a set of genotypic and phenotypic data from rats in an F2 design; briefly, that study examined blood pressure variation before and after a salt-loading experiment in an intercross (F2) progeny.
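The search described in this abstract can be sketched as a small genetic algorithm over bit strings, where each bit marks whether a locus is included in the model and the score to minimize is an information criterion. This is only an illustration under stated assumptions, not the thesis's implementation: the BIC form used is the common Gaussian regression expression up to an additive constant, the selection scheme (tournament with elitism) is a choice made here, and the toy SSE function, the "true" loci and all names are hypothetical.

```python
import math
import random

def bic_from_sse(sse, n, k):
    """BIC for a Gaussian regression model, up to an additive constant."""
    return n * math.log(sse / n) + k * math.log(n)

def genetic_search(score, n_loci, pop_size=30, generations=40,
                   p_cross=0.8, p_mut=0.05, seed=0):
    """Minimize score(bits) over bit strings of length n_loci (n_loci >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_loci)] for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=score)
        history.append(score(pop[0]))
        next_pop = [pop[0][:]]  # elitism: the current best always survives
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            a = min(rng.sample(pop, 3), key=score)
            b = min(rng.sample(pop, 3), key=score)
            child = a[:]
            if rng.random() < p_cross:           # one-point recombination
                cut = rng.randrange(1, n_loci)
                child = a[:cut] + b[cut:]
            # Bit-flip mutation, applied independently to each locus.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=score)
    history.append(score(best))
    return best, history

# Toy objective (hypothetical): loci 2 and 7 carry the effect; missing either
# one inflates the residual sum of squares, and every included locus costs a
# parameter in the BIC penalty.
TRUE_LOCI = {2, 7}
N_OBS = 100

def toy_score(bits):
    missed = sum(1 for i in TRUE_LOCI if bits[i] == 0)
    sse = 10.0 + 50.0 * missed
    return bic_from_sse(sse, N_OBS, sum(bits))

best_bits, best_history = genetic_search(toy_score, n_loci=12)
```

The keyword arguments correspond to the tuning parameters the abstract lists (recombination probability, mutation probability, number of generations, number of solutions). Swapping `bic_from_sse` for an AIC or plain SSE score changes only the objective, not the search.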
