About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Novel Statistical Models for Quantitative Shape-Gene Association Selection

Dai, Xiaotian 01 December 2017 (has links)
Prior research has reported that genetic mechanisms play a major role in the development of biological shapes. The primary goal of this dissertation is to develop novel statistical models to investigate the quantitative relationships between biological shapes and genetic variants. However, these problems can be extremely challenging for traditional statistical models for a number of reasons: 1) the biological phenotypes cannot be effectively represented by single-valued traits, while traditional regression handles only one dependent variable; and 2) in real-life genetic data, the number of candidate genes to be investigated is extremely large, and the signal-to-noise ratio among candidate genes is expected to be very low. In order to address these challenges, we propose three statistical models to handle multivariate, functional, and multilevel functional phenotypes, with applications to biological shape data using different shape descriptors. To the best of our knowledge, no statistical model has previously been developed for multilevel functional phenotypes. Even though multivariate regression has been well explored and existing approaches can be applied to genetic studies, we show through simulation examples and real data examples that the models proposed in this dissertation outperform the alternatives in variable selection and prediction. Although motivated ultimately by genetic research, the proposed models can be used as general-purpose machine learning algorithms with far-reaching applications.
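As a rough illustration of the multivariate-phenotype setting described above (not the dissertation's proposed models), the following sketch fits a multivariate-response regression with a sparsity-inducing penalty to synthetic data, using scikit-learn's MultiTaskLasso as a stand-in; the dimensions, penalty level, and data-generating process are arbitrary choices for demonstration.

```python
# Illustration only: select a few "signal" predictors when the response is a
# vector-valued phenotype, via a multivariate-response lasso. This is a
# stand-in for the setting described above, not the dissertation's models.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n, p, q = 200, 500, 5            # subjects, candidate predictors, phenotype dimensions
X = rng.normal(size=(n, p))      # genotype-like predictors
B = np.zeros((p, q))
B[:3] = rng.normal(size=(3, q))  # only the first 3 predictors carry signal
Y = X @ B + rng.normal(size=(n, q))

model = MultiTaskLasso(alpha=0.5).fit(X, Y)
selected = np.flatnonzero(np.linalg.norm(model.coef_.T, axis=1) > 1e-8)
print("selected predictors:", selected)  # ideally [0 1 2]
```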
262

Computational Topics in Lie Theory and Representation Theory

Apedaile, Thomas J. 01 May 2014 (has links)
The computer algebra system Maple contains a basic set of commands for working with Lie algebras. The purpose of this thesis was to extend the functionality of these Maple packages in a number of important areas. First, programs for defining multiplication in several types of Cayley algebras, Jordan algebras, and Clifford algebras were created to allow users to perform a variety of calculations. Second, commands were created for calculating some basic properties of finite-dimensional representations of complex semisimple Lie algebras. These commands allow one to identify a given representation as a direct sum of irreducible subrepresentations, each one identified by an invariant highest weight. Third, creating an algorithm to calculate the Lie bracket for Vinberg's symmetric construction of Freudenthal's Magic Square allowed for a uniform construction of all five exceptional Lie algebras. Maple examples and tutorials are provided to illustrate the implementation and use of the algebras now available in Maple, as well as the tools for working with Lie algebra representations.
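For readers without Maple, the following Python sketch (not the thesis's Maple code) shows one way multiplication in a Cayley algebra can be defined, via the Cayley-Dickson doubling construction; the representation of elements as nested pairs is an illustrative choice.

```python
# A Python illustration (not the thesis's Maple packages) of the Cayley-Dickson
# doubling construction: complex numbers, quaternions, and octonions (a Cayley
# algebra) are built as nested pairs, with (a, b)(c, d) = (ac - d*b, da + bc*),
# where * denotes conjugation.
def conj(x):
    if isinstance(x, tuple):
        a, b = x
        return (conj(a), neg(b))
    return x

def neg(x):
    return (neg(x[0]), neg(x[1])) if isinstance(x, tuple) else -x

def add(x, y):
    return (add(x[0], y[0]), add(x[1], y[1])) if isinstance(x, tuple) else x + y

def mul(x, y):
    if isinstance(x, tuple):
        a, b = x
        c, d = y
        return (add(mul(a, c), neg(mul(conj(d), b))),
                add(mul(d, a), mul(b, conj(c))))
    return x * y

# Quaternions arise as pairs of complex numbers (themselves pairs of reals);
# octonions would be pairs of quaternions built the same way.
i = ((0, 1), (0, 0))
j = ((0, 0), (1, 0))
print(mul(i, j))  # ((0, 0), (0, 1)), i.e. k, as expected
```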
263

Applications of Bayesian Statistics in Fluvial Bed Load Transport

Schmelter, Mark L. 01 May 2013 (has links)
Fluvial sediment transport is a process that has long been important in managing water resources. While we intuitively recognize that increased flow leads to increased sediment discharge, significant uncertainty remains in the details. Because sediment transport, and in the context of this dissertation bed load transport, is a strongly nonlinear process that is usually modeled using empirical or semi-empirical equations, there exists a large amount of uncertainty around model parameters, predictions, and model suitability. The focus of this dissertation is to develop and demonstrate a series of physically and statistically based sediment transport models that build on the scientific knowledge of the physics of sediment transport while evaluating the phenomenon in a framework that yields robust estimates of parametric, predictive, and model selection uncertainty. The success of these models permits us to attach theoretically and procedurally sound uncertainty estimates to a process that is widely acknowledged to be variable and uncertain but for which robust statistical tools to quantify this uncertainty have not, to date, been developed. This dissertation comprises four individual papers that methodically develop and prove the concept of Bayesian statistical sediment transport models. A simple pedagogical model is developed using synthetic and laboratory flume data; this model is then compared to traditional statistical approaches that are more familiar to the discipline. A single-fraction sediment transport model is developed on the Snake River to produce a probabilistic sediment budget whose results are compared to a sediment budget developed through an ad hoc uncertainty analysis. Lastly, a multi-fraction sediment transport model is developed in which multiple size fractions from laboratory flume experiments are modeled, and the results are compared to previously published theory. The results of these models demonstrate that a Bayesian approach to sediment transport has much to offer the discipline, as it is able to 1) accurately estimate model parameters, 2) quantify parametric uncertainty of the models, 3) provide a means to evaluate relative model fit between different deterministic equations, 4) provide predictive uncertainty of sediment transport, 5) propagate uncertainty from root causes into secondary and tertiary dependent functions, and 6) provide a means by which established theory can be tested.
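To make the general idea concrete, here is a minimal sketch (not any of the dissertation's models) of Bayesian parameter estimation for a simple power-law bed load rating curve, q_s = a*Q^b with lognormal error, fitted to synthetic data with a random-walk Metropolis sampler; the flat priors, known error variance, and data are illustrative assumptions.

```python
# Minimal sketch: Bayesian estimation of a power-law bed load rating curve,
# q_s = a * Q^b with lognormal error, via random-walk Metropolis on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
Q = rng.uniform(5, 50, size=60)                       # synthetic discharge
true_a, true_b, sigma = 0.02, 1.8, 0.3
qs = true_a * Q**true_b * np.exp(rng.normal(0, sigma, Q.size))

def log_post(theta):
    log_a, b = theta
    if not (-10 < log_a < 5 and 0 < b < 4):           # flat priors on a box
        return -np.inf
    resid = np.log(qs) - (log_a + b * np.log(Q))
    return -0.5 * np.sum(resid**2) / sigma**2         # sigma assumed known here

theta = np.array([np.log(0.05), 1.0])
samples, lp = [], log_post(theta)
for _ in range(20000):
    prop = theta + rng.normal(0, [0.05, 0.02])
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:          # Metropolis accept/reject
        theta, lp = prop, lp_prop
    samples.append(theta.copy())
post = np.array(samples[5000:])                       # drop burn-in
print("posterior mean a, b:", np.exp(post[:, 0].mean()), post[:, 1].mean())
```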
264

Sequential Probing With a Random Start

Miller, Joshua 01 January 2018 (has links)
Processing user requests quickly requires not only fast servers but also methods to quickly locate idle servers to process those requests. Methods of finding idle servers are analogous to open addressing in hash tables, but with the key difference that servers may return to an idle state after having been busy rather than staying busy. Probing sequences for open addressing are well studied, but algorithms for locating idle servers are less understood. We investigate sequential probing with a random start as a method for finding idle servers, especially in cases of heavy traffic. We present a procedure for finding the distribution of the number of probes required to find an idle server, using a Markov chain and ideas from enumerative combinatorics, and then present numerical simulation results in lieu of a general analytic solution.
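The following simulation sketch illustrates the setting, under the simplifying assumption that each server is busy independently with probability p at the moment it is probed; the thesis's Markov-chain analysis accounts for servers becoming busy and returning to idle over time, which this sketch ignores.

```python
# Simulation sketch of sequential probing with a random start: n servers, each
# assumed busy independently with probability p when probed (a simplification).
import random
from collections import Counter

def probes_to_find_idle(n, p, rng):
    start = rng.randrange(n)
    for k in range(n):
        if rng.random() > p:          # server (start + k) % n is idle
            return k + 1
    return None                       # every server was busy

def probe_distribution(n=100, p=0.9, trials=100_000, seed=0):
    rng = random.Random(seed)
    counts = Counter(probes_to_find_idle(n, p, rng) for _ in range(trials))
    return {k: counts[k] / trials for k in sorted(k for k in counts if k is not None)}

dist = probe_distribution()
print({k: round(v, 4) for k, v in list(dist.items())[:5]})  # roughly geometric here
```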
265

MULTIFACTOR DIMENSIONALITY REDUCTION WITH P RISK SCORES PER PERSON

Li, Ye 01 January 2018 (has links)
After reviewing Multifactor Dimensionality Reduction (MDR) and its extensions, an approach to obtain P (larger than 1) risk scores is proposed to predict the continuous outcome for each subject. We study the mean square error (MSE) of dimensionality-reduced models fitted with sets of 2 risk scores and investigate the MSE for several special cases of the covariance matrix. A methodology is proposed to select a best set of P risk scores when P is specified a priori. Simulation studies based on true models of different dimensions (larger than 3) demonstrate that the selected set of P (larger than 1) risk scores outperforms the single aggregated risk score generated in AQMDR, and illustrate that our methodology can determine a best set of P risk scores effectively. With different assumptions on the dimension of the true model, we considered the preferable set of risk scores from the best set of two risk scores and the best set of three risk scores. Further, we present a methodology to assess a set of P risk scores when P is not given a priori. Expressions for the asymptotic estimated mean square error of prediction (MSPE) are derived for 1-dimensional and 2-dimensional models. In the last main chapter, we apply the methodology for selecting a best set of risk scores with P specified a priori to Alzheimer's Disease data, obtaining a set of two risk scores and a set of three risk scores for each subject to predict measurements on biomarkers that are crucially involved in Alzheimer's Disease.
266

Smart Classifiers and Bayesian Inference for Evaluating River Sensitivity to Natural and Human Disturbances: A Data Science Approach

Underwood, Kristen 01 January 2018 (has links)
Excessive rates of channel adjustment and riverine sediment export represent societal challenges; impacts include: degraded water quality and ecological integrity, erosion hazards to infrastructure, and compromised public safety. The nonlinear nature of sediment erosion and deposition within a watershed and the variable patterns in riverine sediment export over a defined timeframe of interest are governed by many interrelated factors, including geology, climate and hydrology, vegetation, and land use. Human disturbances to the landscape and river networks have further altered these patterns of water and sediment routing. An enhanced understanding of river sediment sources and dynamics is important for stakeholders, and will become more critical under a nonstationary climate, as sediment yields are expected to increase in regions of the world that will experience increased frequency, persistence, and intensity of storm events. Practical tools are needed to predict sediment erosion, transport and deposition and to characterize sediment sources within a reasonable measure of uncertainty. Water resource scientists and engineers use multidimensional data sets of varying types and quality to answer management-related questions, and the temporal and spatial resolution of these data are growing exponentially with the advent of automated samplers and in situ sensors (i.e., “big data”). Data-driven statistics and classifiers have great utility for representing system complexity and can often be more readily implemented in an adaptive management context than process-based models. Parametric statistics are often of limited efficacy when applied to data of varying quality, mixed types (continuous, ordinal, nominal), censored or sparse data, or when model residuals do not conform to Gaussian distributions. Data-driven machine-learning algorithms and Bayesian statistics have advantages over Frequentist approaches for data reduction and visualization; they allow for non-normal distribution of residuals and greater robustness to outliers. This research applied machine-learning classifiers and Bayesian statistical techniques to multidimensional data sets to characterize sediment source and flux at basin, catchment, and reach scales. These data-driven tools enabled better understanding of: (1) basin-scale spatial variability in concentration-discharge patterns of instream suspended sediment and nutrients; (2) catchment-scale sourcing of suspended sediments; and (3) reach-scale sediment process domains. The developed tools have broad management application and provide insights into landscape drivers of channel dynamics and riverine solute and sediment export.
267

On Some Ridge Regression Estimators for Logistic Regression Models

Williams, Ulyana P 28 March 2018 (has links)
The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As performance criteria, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study was executed to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of independent variables, and log odds ratio were varied in the design of the experiment. Simulation results show that, under certain conditions, the ridge regression estimators outperform the maximum likelihood estimator. Moreover, an empirical data analysis supports the main findings of this study. This thesis proposes and recommends several good ridge regression estimators of the logistic regression model for practitioners in the health, physical, and social sciences.
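A sketch of the kind of Monte Carlo comparison described above: correlated predictors are simulated, maximum-likelihood and ridge-penalized logistic regressions are fitted, and the mean square error of the coefficient estimates is compared. The particular ridge estimators studied in the thesis are not reproduced; a generic L2-penalized scikit-learn fit stands in, and all settings are illustrative.

```python
# Monte Carlo sketch: coefficient MSE of ML vs ridge-penalized logistic
# regression under highly correlated predictors (illustrative settings only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
beta = np.array([1.0, -1.0, 0.5, 0.5])
rho, n, reps = 0.95, 100, 200
cov = rho ** np.abs(np.subtract.outer(range(4), range(4)))   # AR(1)-type correlation

mse = {"mle": 0.0, "ridge": 0.0}
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(4), cov, size=n)
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))
    for name, C in [("mle", 1e6), ("ridge", 1.0)]:           # large C ~ no penalty
        fit = LogisticRegression(C=C, penalty="l2", fit_intercept=False, max_iter=1000)
        fit.fit(X, y)
        mse[name] += np.mean((fit.coef_.ravel() - beta) ** 2) / reps

print(mse)   # ridge typically has lower coefficient MSE under high correlation
```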
268

Spatial multivariate design in the plane and on stream networks

Li, Jie 01 December 2009 (has links)
In environmental studies, measurements of interest are often taken on multiple variables. The results of spatial data analyses can be substantially affected by the spatial configuration of the sites where measurements are taken. Hence, optimal designs, which yield data that guarantee efficient statistical inferences, need to be studied. We study optimal designs on two large classes of spatial regions with respect to three design criteria: prediction, covariance parameter estimation, and empirical prediction. The first class of regions includes those in the plane, where Euclidean distance is used. The performance of the optimal designs is compared to that of randomly chosen designs. Optimal designs for a small example and a relatively large example are obtained. For the small example, complete enumeration of all possible designs is computationally feasible. For the large example, the computational difficulty of searching for the optimal spatial sampling design is overcome by a simulated annealing algorithm. The second class of spatial regions includes streams and rivers, where distance is defined along the stream network. A moving average construction is used to establish valid covariance and cross-covariance models using stream distance. Optimal designs for small and large examples are obtained, and an application of our methodology to a real stream network is included. We discuss the impact of asymmetry in the cross-covariance function on the spatial multivariate design. We also study the relationship between the multivariate optimal design and the univariate optimal design when the multivariate design is restricted to be completely collocated, and we discuss the efficiency lost when only designs in the collocated class are considered.
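As a simplified, univariate analogue of the design search described above (a sketch, not the dissertation's code), the following uses simulated annealing to choose k sites from a candidate grid so as to minimize the average simple-kriging prediction variance under an assumed exponential covariance with known parameters; the grid, design size, covariance, and cooling schedule are all illustrative choices.

```python
# Sketch: simulated annealing search for a spatial sampling design that
# minimizes average simple-kriging variance on a candidate grid.
import numpy as np

rng = np.random.default_rng(3)
grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
k, phi = 8, 3.0                                      # design size, range parameter

def cov(a, b):
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-d / phi)                          # exponential correlation

def avg_pred_var(idx):
    S = grid[idx]
    K = cov(S, S) + 1e-8 * np.eye(len(idx))          # jitter for numerical stability
    c = cov(grid, S)
    # simple-kriging variance 1 - c0' K^{-1} c0 at every grid point, averaged
    return np.mean(1.0 - np.sum(c @ np.linalg.inv(K) * c, axis=1))

design = list(rng.choice(len(grid), size=k, replace=False))
score, T = avg_pred_var(design), 1.0
for _ in range(2000):
    cand = design.copy()
    cand[rng.integers(k)] = rng.integers(len(grid))  # move one site at random
    if len(set(cand)) < k:
        continue
    new = avg_pred_var(cand)
    if new < score or rng.random() < np.exp((score - new) / T):
        design, score = cand, new
    T *= 0.999                                       # cool the temperature

print("avg kriging variance of selected design:", round(score, 4))
```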
269

Penalized methods and algorithms for high-dimensional regression in the presence of heterogeneity

Yi, Congrui 01 December 2016 (has links)
In fields such as statistics, economics, and biology, heterogeneity is an important topic concerning the validity of data inference and the discovery of hidden patterns. This thesis focuses on penalized methods for regression analysis in the presence of heterogeneity in a potentially high-dimensional setting. Two possible strategies for dealing with heterogeneity are: robust regression methods that provide heterogeneity-resistant coefficient estimation, and direct detection of heterogeneity while simultaneously estimating coefficients accurately. We consider the first strategy for two robust regression methods, Huber loss regression and quantile regression with Lasso or Elastic-Net penalties, which have been studied theoretically but lack efficient algorithms. We propose a new algorithm, Semismooth Newton Coordinate Descent, to solve them. The algorithm is a novel combination of the Semismooth Newton Algorithm and Coordinate Descent that applies to penalized optimization problems with both a nonsmooth loss and a nonsmooth penalty. We prove its convergence properties and show its computational efficiency through numerical studies. We also propose a nonconvex penalized regression method, Heterogeneity Discovery Regression (HDR), as a realization of the second idea. We establish theoretical results that guarantee statistical precision for any local optimum of the objective function with high probability. We also compare the numerical performance of HDR with competitors, including Huber loss regression, quantile regression, and least squares, through simulation studies and a real data example. In these experiments, HDR is able to detect heterogeneity accurately and also largely outperforms the competitors in terms of coefficient estimation and variable selection.
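The sketch below is not the proposed Semismooth Newton Coordinate Descent; it solves the same class of problem, Lasso-penalized Huber regression, with a much simpler proximal-gradient (ISTA) iteration, included only to make the objective concrete. The step size, penalty level, and synthetic heavy-tailed data are illustrative assumptions.

```python
# Sketch: Lasso-penalized Huber regression solved by proximal gradient (ISTA),
# a simpler alternative to the Semismooth Newton Coordinate Descent algorithm.
import numpy as np

def huber_grad(r, delta):
    # derivative of the Huber loss with respect to the residual r
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def huber_lasso(X, y, lam, delta=1.0, n_iter=500):
    n, p = X.shape
    beta = np.zeros(p)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)      # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ huber_grad(X @ beta - y, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
beta_true = np.r_[2.0, -1.5, np.zeros(18)]
y = X @ beta_true + rng.standard_t(df=2, size=100)    # heavy-tailed noise
print(np.round(huber_lasso(X, y, lam=0.2), 2))
```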
270

On pricing barrier options and exotic variations

Wang, Xiao 01 May 2018 (has links)
Barrier options have become increasingly popular financial instruments due to their lower costs and the ability to more closely match speculating or hedging needs. In addition, barrier options play a significant role in modeling and managing risks in insurance and finance, as well as in refining insurance products such as variable annuities and equity-indexed annuities. Motivated by these immediate applications arising from actuarial and financial contexts, the thesis studies the pricing of barrier options and some exotic variations, assuming that the underlying asset price follows the Black-Scholes model or a jump-diffusion process. Barrier options have already been well treated in the classical Black-Scholes framework. The first part of the thesis aims to develop a new valuation approach based on the technique of exponential stopping and/or path counting of Brownian motions. We allow the option's boundaries to vary exponentially in time with different rates, and manage to express our pricing formulas as combinations of the prices of certain binary options. These expressions are shown to be extremely convenient for further pricing of exotic variations, including sequential barrier options, immediate rebate options, multi-asset barrier options, and window barrier options. Many known results are reproduced and new explicit formulas are derived, from which we can better understand the impact of various sophisticated barrier structures on option values. We also consider jump-diffusion models, where it becomes difficult, if not impossible, to obtain the barrier option value in analytical form for exponentially curved boundaries. Our model assumes that the logarithm of the underlying asset price is a Brownian motion plus an independent compound Poisson process. It is quite common to assign a particular distribution (such as a normal or double exponential distribution) to the jump size if one wants to pursue closed-form solutions, whereas our method permits any distribution for the jump size as long as it belongs to the exponential family. The formulas derived in the thesis are explicit in the sense that they can be efficiently implemented through Monte Carlo simulations, from which we achieve a good balance between solution tractability and model complexity.
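As a numerical companion to the jump-diffusion setting described above (a sanity-check sketch, not the thesis's formulas), the following Monte Carlo routine prices a discretely monitored down-and-out call when the log-price is a Brownian motion plus a compound Poisson process with normal jump sizes; all parameter values are arbitrary, and the continuous barrier treated analytically in the thesis is only approximated by the daily monitoring grid.

```python
# Monte Carlo sketch: discretely monitored down-and-out call under a
# Merton-style jump-diffusion (normal jump sizes), with risk-neutral drift.
import numpy as np

def down_and_out_call_mc(S0=100, K=100, B=85, r=0.03, sigma=0.2,
                         lam=0.5, mu_j=-0.05, sig_j=0.1,
                         T=1.0, steps=252, paths=100_000, seed=5):
    rng = np.random.default_rng(seed)
    dt = T / steps
    kappa = np.exp(mu_j + 0.5 * sig_j**2) - 1         # jump compensator E[e^J] - 1
    drift = (r - 0.5 * sigma**2 - lam * kappa) * dt
    logS = np.full(paths, np.log(S0))
    alive = np.ones(paths, dtype=bool)
    for _ in range(steps):
        dW = rng.normal(0, np.sqrt(dt), paths)
        nj = rng.poisson(lam * dt, paths)             # number of jumps in this step
        jump = np.where(nj > 0,
                        rng.normal(mu_j * nj, sig_j * np.sqrt(np.maximum(nj, 1))),
                        0.0)
        logS += drift + sigma * dW + jump
        alive &= np.exp(logS) > B                     # knocked out once barrier is breached
    payoff = np.where(alive, np.maximum(np.exp(logS) - K, 0.0), 0.0)
    return np.exp(-r * T) * payoff.mean()

print(round(down_and_out_call_mc(), 3))
```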
