371

Evaluation of Some Statistical Methods for the Identification of Differentially Expressed Genes

Haddon, Andrew L 24 March 2015 (has links)
Microarray platforms have been in use for many years, and although newer technologies are rising in laboratories, microarrays remain prevalent. Many methods have been proposed and refined for analyzing microarray data to identify differentially expressed (DE) genes, yet the most popular ones, such as Significance Analysis of Microarrays (SAM), samroc, fold change, and rank product, are far from perfect. Which method is most powerful depends on the characteristics of the sample and the distribution of the gene expressions. SAM and samroc are the most widely practiced methods, but their power decreases when the data are skewed. Building on the fact that the median is a better measure of central tendency than the mean for skewed data, this thesis modifies the test statistics of the SAM and fold change methods. The study shows that the median-modified fold change method improves the power of identifying DE genes in many cases when the data follow a lognormal distribution.
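As a hedged illustration of the median-based idea, the sketch below computes a fold change from per-gene medians instead of means; the thesis's exact modification of the SAM statistic is not reproduced here, and the expression matrices, cutoff, and spiked genes are all hypothetical.

```python
# Illustrative sketch: median-based fold change for DE gene screening.
# All data and the 2-fold cutoff are hypothetical, not from the thesis.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical expression matrices: genes x samples for two conditions.
control = rng.lognormal(mean=2.0, sigma=0.8, size=(1000, 5))
treated = rng.lognormal(mean=2.0, sigma=0.8, size=(1000, 5))
treated[:50] *= 4.0  # spike in 50 truly DE genes

def median_fold_change(a, b):
    """Fold change from per-gene medians rather than means, which is
    more robust when expression values are skewed (e.g., lognormal)."""
    return np.median(b, axis=1) / np.median(a, axis=1)

mfc = median_fold_change(control, treated)
de_genes = np.where((mfc > 2.0) | (mfc < 0.5))[0]  # common 2-fold cutoff
print(f"{de_genes.size} genes flagged as differentially expressed")
```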
372

Maximum Likelihood Estimation of Parameters in Exponential Power Distribution with Upper Record Values

Zhi, Tianchen 27 March 2017 (has links)
The exponential power (EP) distribution is an important lifetime distribution used in survival analysis and related to the asymmetrical EP distribution. Many researchers have discussed statistical inference for the parameters of the EP distribution from i.i.d. random samples. Sometimes, however, the available data contain only record values, or it is more convenient for researchers to collect record values. We address this problem by estimating the two parameters of the EP distribution via maximum likelihood from upper record values. In a simulation study, we use the bias and MSE of the estimators to assess the efficiency of the proposed estimation method. We then discuss prediction of the next upper record value from the known upper record values. The study concludes that the MLEs of the EP distribution parameters based on upper record values perform satisfactorily, and that prediction of the next upper record value also performs well.
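A minimal sketch of the estimation step, assuming the Smith-Bain parameterization of the EP lifetime distribution, F(x) = 1 - exp(1 - exp((λx)^α)); the thesis may use a different form, and the record values below are hypothetical. For upper records r_1 < ... < r_n, the joint density is f(r_n) times the product over i < n of f(r_i)/(1 - F(r_i)).

```python
# Hedged sketch of MLE from upper record values for the EP distribution,
# assuming F(x) = 1 - exp(1 - exp((lam*x)**a)); record values are made up.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, records):
    """Negative log of f(r_n) * prod_{i<n} f(r_i) / (1 - F(r_i))."""
    a, lam = theta
    if a <= 0 or lam <= 0:
        return np.inf
    r = np.asarray(records)
    z = (lam * r) ** a
    # log f(x) = log(a) + a*log(lam) + (a-1)*log(x) + z + 1 - exp(z)
    log_f = np.log(a) + a * np.log(lam) + (a - 1) * np.log(r) + z + (1 - np.exp(z))
    log_surv = 1 - np.exp(z)  # log(1 - F) under this parameterization
    return -(log_f.sum() - log_surv[:-1].sum())

records = [0.31, 0.55, 0.78, 1.02, 1.31]  # hypothetical upper record values
fit = minimize(neg_log_lik, x0=[1.0, 1.0], args=(records,), method="Nelder-Mead")
print("MLE (alpha, lambda):", fit.x)
```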
373

A Statistical Analysis of Hurricanes in the Atlantic Basin and Sinkholes in Florida

D'andrea, Joy Marie 04 April 2016 (has links)
Beaches provide a natural barrier between the ocean and inland communities, ecosystems, and resources. These environments move and change in response to winds, waves, and currents, and when a hurricane occurs the changes can be large and possibly catastrophic. High waves and storm surge act together to erode beaches and inundate low-lying lands, putting inland communities at risk. Thousands of buoys in the Atlantic Basin record and update data that help predict climate conditions in the state of Florida. The data compiled into the larger data set used here came from two sources: hurricane data for 1992–2014 from the Unisys Weather site (Atlantic Basin hurricane data, last 40 years) and buoy data from the National Data Buoy Center. Using various statistical methods, we analyze the probability of a storm being present given conditions at the buoy, and determine that probability categorically. Four types of sinkholes exist in Florida: collapse sinkholes, solution sinkholes, alluvial sinkholes, and raveling sinkholes. Sinkholes occur in Florida because of the different soil types prevalent in certain areas. The sinkhole data in this study came from the Florida Department of Environmental Protection's Subsidence Incident Reports and comprise 926 records with 15 variables. We present a statistical analysis of the relationship between a sinkhole's length and width, determine the average diameter of a sinkhole, discuss how sinkhole size depends on soil type, and identify the most probable circumstances under which a sinkhole occurs. The dissertation contains five research chapters. Chapter 2 introduces Exploratory Factor Analysis and Non-Response Analysis in the context of analyzing hurricanes. Chapter 3 addresses hurricanes that formed in the Atlantic Basin from 1992 to 2014, including the probability of a storm being present (also categorically). Chapter 4 presents a study of sinkholes in Florida, and Chapter 5 continues that discussion with a focus on the time to event between sinkhole occurrences. The final chapter, Chapter 6, concludes with future work and projects that can build on the foundations of this dissertation.
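One hedged way to frame "probability of a storm being present given conditions at the buoy" is a logistic regression, sketched below; the predictor names (wind speed, wave height, pressure) and all numbers are synthetic stand-ins, not the Unisys or buoy data.

```python
# Illustrative sketch: P(storm present | buoy conditions) via logistic
# regression on synthetic data; variables and coefficients are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
wind = rng.gamma(shape=4.0, scale=3.0, size=n)      # wind speed, m/s
wave = rng.gamma(shape=2.0, scale=1.0, size=n)      # wave height, m
pressure = rng.normal(1010, 8, size=n)              # sea-level pressure, hPa
logit = 0.15 * wind + 0.8 * wave - 0.05 * (pressure - 1010) - 4.0
storm = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([wind, wave, pressure])
model = LogisticRegression(max_iter=1000).fit(X, storm)
# Estimated P(storm | wind=20 m/s, wave=5 m, pressure=995 hPa)
print(model.predict_proba([[20.0, 5.0, 995.0]])[0, 1])
```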
374

Production of Biodiesel from Soybean Oil Using Supercritical Methanol

Deshpande, Shriyash Rajendra 10 March 2016 (has links)
The slow yet steady expansion of global economies has led to increased demand for energy and fuel, which will eventually lead to a shortage of fossil fuel resources. Consequently, researchers have been investigating alternative fuels such as biodiesel. Biodiesel refers to the monoalkyl esters that can be derived from a wide range of sources, including vegetable oils, animal fats, algae lipids, and waste greases. Currently, biodiesel is largely produced by the conventional route, using an acid, a base, or an enzyme catalyst. Drawbacks associated with this route result in higher production costs and longer processing times. Conversely, supercritical transesterification offers several advantages over conventional transesterification, such as faster reaction rates, a catalyst-free reaction, fewer product purification steps, and higher yields. This work focused on the supercritical transesterification of cooking oil, soybean oil in particular. The experimental investigation was conducted using methanol at supercritical conditions that were milder in pressure than those reported in the literature. A batch setup was designed, built, and used to carry out the supercritical transesterification reactions, and the biodiesel content was analyzed by gas chromatography-mass spectrometry to calculate reaction yields. A methyl ester yield of 90% was achieved within 10 minutes of reaction time, and a maximum yield of 97% was achieved in 50 minutes. Two key factors, temperature and molar ratio, were studied using analysis of variance and linear regression, and their significance for the biodiesel yield was determined. The kinetics of the reaction were investigated, and the rate constants, activation energy, and pre-exponential factor were estimated.
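For the kinetic step, the activation energy and pre-exponential factor are typically recovered from an Arrhenius fit, ln k = ln A - Ea/(RT). The sketch below shows the standard linear fit on illustrative numbers, not the thesis's measured rate constants.

```python
# Illustrative Arrhenius fit: ln k = ln A - Ea/(R*T); numbers are made up.
import numpy as np

R = 8.314  # gas constant, J/(mol*K)
T = np.array([523.0, 548.0, 573.0, 598.0])   # reaction temperatures, K
k = np.array([0.021, 0.048, 0.101, 0.198])   # hypothetical rate constants, 1/min

# Linear regression of ln k on 1/T: slope = -Ea/R, intercept = ln A.
slope, intercept = np.polyfit(1.0 / T, np.log(k), deg=1)
Ea = -slope * R          # activation energy, J/mol
A = np.exp(intercept)    # pre-exponential factor, 1/min
print(f"Ea = {Ea / 1000:.1f} kJ/mol, A = {A:.3e} 1/min")
```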
375

Ensemble Learning Method on Machine Maintenance Data

Zhao, Xiaochuang 05 November 2015 (has links)
In industry, many companies are facing an explosion of big data. With this much information stored, companies want to make sense of the data and use it for better decision making, especially for prediction. Much money can be saved and substantial revenue generated with the power of big data. When building statistical learning models for prediction, companies aim for models that are both efficient and highly accurate. After a learning model has been deployed to production, new data keep arriving and the model must be updated accordingly; the model that performs best today will not necessarily perform best tomorrow, which makes it hard to decide which algorithm to use. This paper introduces a method that ensembles the outputs of two different classification algorithms as inputs to another learning model to increase the final prediction power. The dataset used is NASA's Turbofan Engine Degradation data, with 49 numeric features (X) and a binary response Y, where 0 indicates the engine is working properly and 1 indicates engine failure. The model's purpose is to predict whether the engine will pass or fail. The dataset is divided into a training set and a testing set. First, the training set is used to build support vector machine (SVM) and neural network models. Second, the trained SVM and neural network models take the X of the training set as input to predict Y1 and Y2. Then Y1 and Y2 serve as inputs to build the penalized logistic regression model, which is the ensemble model. Finally, the same steps are applied to the testing set to obtain the final prediction. Model accuracy is measured by overall classification accuracy, and the results show that the ensemble model achieves 92% accuracy. The prediction accuracies of the SVM, neural network, and ensemble models are compared to show that the ensemble model successfully captures the power of the two individual learning models.
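A sketch of the stacking pipeline the abstract describes, with synthetic data standing in for NASA's Turbofan Engine Degradation set; hyperparameters and the data generator are illustrative, not the paper's.

```python
# Sketch of the described stacking scheme: SVM and neural-network predicted
# probabilities (Y1, Y2) feed a penalized logistic regression ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in: 49 features, binary pass/fail response.
X, y = make_classification(n_samples=2000, n_features=49, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

svm = SVC(probability=True, random_state=0).fit(X_tr, y_tr)
nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X_tr, y_tr)

def meta_features(X_):
    # Y1, Y2 in the abstract: the two base models' predicted probabilities.
    return np.column_stack([svm.predict_proba(X_)[:, 1],
                            nn.predict_proba(X_)[:, 1]])

# L2-penalized logistic regression as the ensemble (stacking) model.
ensemble = LogisticRegression(penalty="l2", C=1.0).fit(meta_features(X_tr), y_tr)
print("ensemble accuracy:",
      accuracy_score(y_te, ensemble.predict(meta_features(X_te))))
```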
376

A Comparison of Methods for Generating Bivariate Non-normally Distributed Random Variables

Stewart, Jaimee E. 01 January 2009 (has links)
Many distributions of multivariate data in the real world follow a non-normal model, with distributions that are skewed and/or heavy tailed. In studies requiring multivariate non-normal distributions, it is important that simulations of those variables produce data close to the desired parameters while also being fast and easy to perform. Three algorithms for generating multivariate non-normal distributions are reviewed for accuracy, speed, and simplicity: the Fleishman Power Method, the Fifth-Order Polynomial Transformation Method, and the Generalized Lambda Distribution Method. Simulations were run to compare the three methods on how well they generate bivariate distributions with the desired means, variances, skewnesses, kurtoses, and correlation, on the simplicity of the algorithms, and on how quickly the desired distributions were computed.
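As a hedged sketch of the first method, the Fleishman power method maps a standard normal Z through Y = a + bZ + cZ^2 + dZ^3, with (b, c, d) solving Fleishman's moment equations for the target skewness and excess kurtosis and a = -c keeping the mean at zero. The bivariate version additionally adjusts the correlation of the underlying normals (omitted here), and the targets below are illustrative.

```python
# Sketch of the Fleishman power method: solve the moment equations, then
# transform standard normals. Targets (skew, excess kurtosis) are made up.
import numpy as np
from scipy.optimize import fsolve

def fleishman_equations(coef, skew, ekurt):
    b, c, d = coef
    return [
        b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1,                        # unit variance
        2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew,                  # target skewness
        24*(b*d + c**2*(1 + b**2 + 28*b*d)
            + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - ekurt,    # target excess kurtosis
    ]

skew, ekurt = 0.75, 0.8  # illustrative targets
b, c, d = fsolve(fleishman_equations, x0=[1.0, 0.1, 0.0], args=(skew, ekurt))
a = -c

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)
y = a + b*z + c*z**2 + d*z**3  # non-normal variates with the target moments
print("sample skewness ~", float(((y - y.mean())**3).mean() / y.std()**3))
```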
377

Regularized multivariate stochastic regression

Chen, Kun 01 July 2011 (has links)
In many high-dimensional problems, the dependence structure among the variables can be quite complex. An appropriate use of regularization techniques coupled with other classical statistical methods can often improve estimation and prediction accuracy and facilitate model interpretation by seeking a parsimonious model representation that involves only the subset of relevant variables. We propose two regularized stochastic regression approaches for efficiently estimating certain sparse dependence structures in the data. We first consider a multivariate regression setting in which the large number of responses and predictors may be associated through only a few channels/pathways, each of which may involve only a few responses and predictors. We propose a regularized reduced-rank regression approach in which model estimation and rank determination are conducted simultaneously and the resulting regularized estimator of the coefficient matrix admits a sparse singular value decomposition (SVD). Second, we consider model selection for subset autoregressive moving-average (ARMA) modelling, for which automatic selection methods do not directly apply because the innovation process is latent. We propose to identify the optimal subset ARMA model by fitting a penalized regression, e.g. the adaptive lasso, of the time series on its lags and the lags of the residuals from a long autoregression fitted to the time-series data, where the residuals serve as proxies for the innovations. Computational algorithms and regularization parameter selection methods for both proposed approaches are developed, and their properties are explored both theoretically and by simulation. Under mild regularity conditions, the proposed methods are shown to be selection consistent, asymptotically normal, and to enjoy the oracle properties. We apply the proposed approaches to several applications across disciplines, including cancer genetics, ecology, and macroeconomics.
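A minimal sketch of the subset-ARMA idea on a toy series: residuals from a long autoregression proxy the latent innovations, and a penalized regression of the series on its own lags and the residual lags selects the subset model. A plain lasso stands in for the thesis's adaptive lasso, and the series and tuning constant are illustrative.

```python
# Sketch: long-AR residuals as innovation proxies, then lasso over lags.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p_long, max_lag = 600, 20, 5
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):  # toy ARMA(1,1): y_t = 0.6 y_{t-1} + e_t + 0.4 e_{t-1}
    y[t] = 0.6 * y[t - 1] + e[t] + 0.4 * e[t - 1]

def lag_matrix(x, lags):
    """Columns are x lagged by 1..lags, aligned with x[lags:]."""
    return np.column_stack([x[lags - k: len(x) - k] for k in range(1, lags + 1)])

# Step 1: fit a long autoregression; its residuals proxy the innovations.
X_long = lag_matrix(y, p_long)
ar = LinearRegression().fit(X_long, y[p_long:])
resid = y[p_long:] - ar.predict(X_long)

# Step 2: lasso of the series on its lags and the residual lags.
yy = y[p_long:]
X = np.hstack([lag_matrix(yy, max_lag), lag_matrix(resid, max_lag)])
fit = Lasso(alpha=0.05).fit(X, yy[max_lag:])
print("AR coefs:", fit.coef_[:max_lag].round(2))   # ~0.6 at lag 1
print("MA coefs:", fit.coef_[max_lag:].round(2))   # ~0.4 at lag 1
```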
378

Spectral classification of high-dimensional time series

Zhang, Fuli 01 August 2018 (has links)
In this era of big data, multivariate time series (MTS) are prevalent in diverse domains and often high dimensional. However, there have been limited studies on building capable MTS classifiers via classical machine learning methods that can cope with the double curse of dimensionality arising from high variable dimension and long series (large sample size). In this thesis, we propose two approaches to multiclass classification with high-dimensional MTS. Both leverage the dynamics of an MTS captured by non-parametric modeling of its spectral density function. In the first approach, we introduce the reduced-rank spectral classifier (RRSC), which utilizes low-rank estimation and some new discrimination functions. We illustrate the efficacy of the RRSC with both simulations and real applications. For binary classification, we establish the consistency of the RRSC and provide an asymptotic formula for the misclassification error rate under some regularity conditions. The second approach develops the random projection ensemble classifier for time series (RPECTS). This method first reduces dimension in the time domain by projecting the time-series variables into a low-dimensional space, and then measures, via a novel base classifier, the disparity between the data and the candidate generating processes in the projected space. We assess the classification performance of the new approaches by simulation, compare them with existing methods on real applications, and describe two R packages that implement the aforementioned methods.
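A hedged sketch of the random-projection idea behind RPECTS: project the variables into a few random low-dimensional subspaces, fit a base classifier on spectral features in each, and aggregate the votes. The thesis's disparity-based base classifier is replaced here by logistic regression on log-periodogram features, and the fit is evaluated in-sample for brevity, so this is a structural illustration only.

```python
# Structural sketch of a random-projection ensemble for MTS classification.
# Base classifier and features are stand-ins, not the thesis's method.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_series, p, T, d, n_proj = 200, 30, 128, 3, 25

# Toy MTS data: class-1 series carry extra power at one frequency.
X = rng.standard_normal((n_series, p, T))
y = rng.integers(0, 2, n_series)
t = np.arange(T)
X[y == 1, 0, :] += 1.5 * np.sin(2 * np.pi * 10 * t / T)

def spectral_features(series_proj):
    # Log-periodogram of each projected coordinate, flattened per series.
    pg = np.abs(np.fft.rfft(series_proj, axis=-1)) ** 2 / series_proj.shape[-1]
    return np.log(pg + 1e-12).reshape(series_proj.shape[0], -1)

votes = np.zeros(n_series)
for _ in range(n_proj):
    P = rng.standard_normal((d, p)) / np.sqrt(p)   # random projection p -> d
    Z = np.einsum("dp,npt->ndt", P, X)             # project the variables
    clf = LogisticRegression(max_iter=1000).fit(spectral_features(Z), y)
    votes += clf.predict(spectral_features(Z))     # in-sample, for brevity
print("ensemble accuracy:", np.mean((votes / n_proj > 0.5) == y))
```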
379

Non-parametric inference of risk measures

Ahn, Jae Youn 01 May 2012 (has links)
Responding to the changes in the insurance environment of the past decade, insurance regulators globally have been revamping valuation and capital regulations. This thesis concerns the design and analysis of the statistical inference procedures used to implement these new and upcoming regulations, and their analysis in a more general setting, to lend further insight into their performance in practical situations. The quantitative measure of risk used in these regulations is the risk measure known as the Tail Value-at-Risk (T-VaR). In implementing them, insurance companies often have to estimate the T-VaR of product portfolios from the output of a simulation of their cash flows. The distributions of the underlying economic variables are either estimated or prescribed by regulation. In this situation, the computational complexity of estimating the T-VaR arises from the complexity of determining the portfolio cash flows for a given realization of the economic variables. A technique that has proved promising in such settings is importance sampling. While the asymptotic behavior of the natural non-parametric estimator of the T-VaR under importance sampling has been conjectured, the literature has lacked a rigorous result. The main goal of the first part of the thesis is to give a precise weak convergence result describing the asymptotic behavior of this estimator under importance sampling. Our method also establishes such a result for the natural non-parametric estimator of the Value-at-Risk, another popular risk measure, under weaker assumptions than those used in the literature. We also report on a simulation study conducted to examine the quality of these asymptotic approximations in small samples. The Haezendonck-Goovaerts class of risk measures corresponds to a premium principle that is a multiplicative analog of the zero utility principle and is thus of significant academic interest. From a practical point of view, our interest in this class arose primarily from the fact that the T-VaR is, in a sense, a minimal member of the class, so a study of the natural non-parametric estimator for these risk measures lends further insight into statistical inference for the T-VaR. Analysis of the asymptotic behavior of the generalized estimator has proved elusive, largely because, unlike the T-VaR, it lacks a closed-form expression. Our main goal in the second part of this thesis is to study the asymptotic behavior of this estimator. To conduct a simulation study, we needed an efficient algorithm to compute the Haezendonck-Goovaerts risk measures with precise error bounds. The lack of such an algorithm has been noted in the literature and has impeded the quality of simulation results. In this part we also design and analyze an algorithm for computing these risk measures, and in the process derive some fundamental bounds on the solutions of the optimization problem underlying them. We have implemented our algorithm in the R software environment and include its source code in the Appendix.
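A minimal sketch of the natural non-parametric estimator: the empirical T-VaR at level α averages losses at or beyond the empirical α-quantile, and under importance sampling the weighted empirical distribution takes its place, with weights given by the likelihood ratios. The data and level are illustrative; none of the thesis's asymptotic analysis is reproduced.

```python
# Sketch of the empirical T-VaR estimator, plain and importance-sampled.
import numpy as np

def tvar(losses, alpha=0.99):
    """Empirical T-VaR: mean of losses at or beyond the alpha-quantile."""
    var = np.quantile(losses, alpha)
    return var, losses[losses >= var].mean()

def tvar_is(samples, weights, alpha=0.99):
    """Same estimator under importance sampling: 'weights' are likelihood
    ratios dP/dQ, and the weighted empirical cdf replaces the ordinary one."""
    order = np.argsort(samples)
    s, w = samples[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    var = s[np.searchsorted(cdf, alpha)]
    tail = s >= var
    return var, np.average(s[tail], weights=w[tail])

rng = np.random.default_rng(0)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # illustrative losses
print("plain:", tvar(losses))
print("IS   :", tvar_is(losses, np.ones_like(losses)))  # unit weights ~ plain case
```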
380

Wellness Paradigms in Predicting Stress and Burnout Among Beginning Expatriate Teachers

Proctor, Kimala 01 January 2019 (has links)
Research indicates that the current teacher shortage is in part due to stress and burnout. One topic that has not been examined is the relationship, among beginning expatriate English medium teachers (EMTs) with 5 or fewer years of teaching experience in the United Arab Emirates, between the use of individualized, self-initiated wellness paradigms and stress, job burnout, and intent to leave the teaching profession. The transactional model of stress and coping, Maslach's multidimensional theory of burnout, and the health promotion model were used to evaluate the moderating effects of the EMTs' burnout and stress levels on their wellness and intent to leave. In a quantitative, correlational design, multiple linear and moderated multiple regression were used to analyze data from a sample of 165 EMTs employed in schools in the United Arab Emirates. Results indicated that spiritual growth was a significant, negative predictor of intent to leave; EMTs' burnout and stress levels did not moderate the relationship between spiritual growth and intent to leave. There was also a significant, positive relationship between emotional exhaustion, personal accomplishment, and intent to leave. These results can foster positive social change by raising awareness of the stress and burnout that EMTs experience and by proposing that administrators, stakeholders, and school district personnel provide coping mechanisms for teachers to deal with stress, burnout, and intent to leave.
