Global ETD Search

1	INFERENCE AFTER VARIABLE SELECTION. Pelawa Watagoda, Lasanthi Chathurika Ranasinghe. 01 August 2017. This thesis presents inference for the multiple linear regression model Y = beta_1 x_1 + ... + beta_p x_p + e after model or variable selection. This includes prediction intervals for a future value Y_f of the response variable and hypothesis testing with the bootstrap. With n the sample size, most results require n/p to be large. However, the prediction intervals developed here may increase in average length only slowly as p increases for fixed n if the model is sparse, that is, if only k predictors have nonzero coefficients beta_i and n/k is large. Keywords: Bootstrap; Forward Selection; Lasso; Prediction Interval; Relaxed Lasso; Ridge Regression
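The first abstract pairs variable selection with a prediction interval for Y_f. The sketch below is a minimal illustration under stated assumptions, not the thesis's estimator. It runs greedy forward selection to a fixed model size k, which is an assumed tuning choice, and refits OLS with an intercept. It then forms the interval from empirical residual quantiles scaled by the heuristic factor sqrt(n / (n - d)), with d = k + 1. The names `forward_select` and `residual_pi` are hypothetical.

```python
import numpy as np


def forward_select(X, y, k):
    """Greedy forward selection: k times, add the predictor that most reduces the OLS RSS."""
    n, p = X.shape
    chosen = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in chosen:
                continue
            A = np.column_stack([np.ones(n), X[:, chosen + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
    return chosen


def residual_pi(X, y, x_new, k, alpha=0.05):
    """Point prediction and (1 - alpha) interval for Y_f at x_new after forward selection."""
    n = X.shape[0]
    cols = forward_select(X, y, k)
    A = np.column_stack([np.ones(n), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    d = k + 1  # number of fitted coefficients, intercept included
    # Fitted residuals understate the spread of future errors; inflate them by the
    # heuristic factor sqrt(n / (n - d)) (an assumption, not the thesis's correction).
    resid = (y - A @ coef) * np.sqrt(n / (n - d))
    lo, hi = np.quantile(resid, [alpha / 2, 1 - alpha / 2])
    y_hat = coef[0] + x_new[cols] @ coef[1:]
    return cols, y_hat, (y_hat + lo, y_hat + hi)
```

The interval width is the same for every x_new because it comes from one set of residuals. Only its center moves with the prediction.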
2	Novel Methods of Biomarker Discovery and Predictive Modeling using Random Forest. January 2017. Random forest (RF) is a popular and powerful technique that can be used for classification, regression, and unsupervised clustering. In its original form, introduced by Leo Breiman, RF is a predictive model that generates predictions for new observations. Recent research has proposed several RF-based methods for feature selection and for generating prediction intervals, but these methods are limited in applicability and accuracy. In this dissertation, RF is applied to build a predictive model for a complex dataset. It is also used as the basis for two novel methods, one for biomarker discovery and one for generating prediction intervals. First, a biodosimetry model is developed using RF to determine absorbed radiation dose from gene expression measured in blood samples of potentially exposed individuals. To improve its prediction accuracy, day-specific models were built to handle the day interaction effect, and a nested modeling technique was proposed. The nested models can fit these complex data, which have large variability and non-linear relationships. Second, a panel of biomarkers was selected using a data-driven feature selection method together with hand-picking based on prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF (GRRF). It incorporates domain knowledge as a penalty term that regulates the selection of candidate features in RF. This adds flexibility to data-driven feature selection and can improve model interpretability. Know-GRRF showed significant improvement in cross-species prediction when cross-species correlation was used to guide biomarker selection. On simulated datasets, the method is also competitive with existing methods when intrinsic data characteristics are used in place of domain knowledge.
Lastly, a novel non-parametric method, RFerr, was developed to generate prediction intervals using RF regression. The method is applicable to any predictive model. It was shown to have better coverage and precision than existing methods on the real-world radiation dataset as well as on benchmark and simulated datasets. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2017. Keywords: Biostatistics; feature selection; prediction interval; predictive modeling; random forest
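RFerr itself is not described here in enough detail to reproduce. As a hedged point of comparison, the sketch below shows split conformal prediction. This is a different, well-known, model-agnostic way to turn any point predictor into a prediction interval. It fits the model on one half of the data and takes a finite-sample quantile of absolute residuals on the other half. It then widens each new prediction by that amount. A cubic polynomial fit stands in for the random forest so that the example needs only NumPy. `split_conformal_pi`, `fit_cubic`, and `predict_cubic` are hypothetical names.

```python
import numpy as np


def split_conformal_pi(fit, predict, X, y, X_new, alpha=0.1, seed=0):
    """Model-agnostic (1 - alpha) prediction intervals via split conformal prediction.

    fit(X, y) -> model and predict(model, X) -> predictions are supplied by the caller,
    so a random forest or any other regressor can be plugged in.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train, cal = idx[:half], idx[half:]
    model = fit(X[train], y[train])
    # Conformity scores: absolute residuals on the held-out calibration half.
    scores = np.sort(np.abs(y[cal] - predict(model, X[cal])))
    m = len(cal)
    rank = int(np.ceil((m + 1) * (1 - alpha)))  # finite-sample corrected rank
    q = np.inf if rank > m else scores[rank - 1]
    pred = predict(model, X_new)
    return pred - q, pred + q


# Stand-in regressor (assumption): a cubic polynomial instead of a random forest.
def fit_cubic(X, y):
    return np.polyfit(X, y, 3)


def predict_cubic(coef, X):
    return np.polyval(coef, X)
```

The coverage guarantee is marginal, averaged over new points. A misspecified stand-in model like the cubic still gets valid coverage, only with wider intervals.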
3	Meta-uncertainty and resilience with applications in intelligence analysis. Schenk, Jason Robert. 07 January 2008. No description available. Keywords: Latent Dirichlet Allocation; prediction interval; stress-strain; consequence-likelihood
4	Comparison of Prediction Intervals for the Gumbel Distribution. Fang, Lin. 06 1900. We consider the problem of obtaining a prediction interval, at a specified confidence level, that contains k future observations from the Gumbel distribution, based on an observed sample from the same distribution. An existing method due to Hahn, originally developed for the normal distribution, is adapted to the Gumbel case. Hahn's prediction intervals are equivalent to Bayesian predictive intervals for the normal. Motivated by this, we develop Bayesian predictive intervals for the Gumbel both when the scale parameter b is known and when it is unknown. We then compare Hahn's and the Bayesian intervals. The Bayesian intervals perform better when b is known, while the two perform about the same when b is unknown. Next, we consider the maximum of Hahn's and the Bayesian lower prediction limits, which is shown to be a better predictor when b is unknown. All comparisons are based on Monte Carlo simulations. Finally, the results are applied to Ontario Power Generation data on feeder thicknesses. / Thesis / Master of Science (MSc)
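To make the Gumbel setting concrete, here is a minimal Monte Carlo sketch for the simplest case the abstract mentions, with scale b known. It is neither Hahn's method nor the Bayesian interval from the thesis. It uses the closed-form location MLE with known b. It also relies on the fact that (minimum of the k future values minus the MLE) / b is a pivot whose distribution does not depend on the location. Its quantile can therefore be simulated once with location 0 and scale 1. The sketch assumes the maximum-type Gumbel that NumPy's `Generator.gumbel` samples. The function names are hypothetical.

```python
import numpy as np


def gumbel_location_mle(x, b):
    """Closed-form MLE of the Gumbel (maximum) location when the scale b is known.

    Solves sum(exp(-(x_i - mu) / b)) = n, i.e. mu = -b * log(mean(exp(-x / b))).
    Works along the last axis, so it also accepts a batch of samples.
    """
    return -b * np.log(np.mean(np.exp(-np.asarray(x) / b), axis=-1))


def pivot_quantile(n, k=1, conf=0.95, n_sim=100_000, seed=0):
    """Simulate the (1 - conf) quantile of min(Z_f1..Z_fk) - mu_hat(Z_1..Z_n),
    with all Z standard Gumbel; this pivot does not depend on the true location."""
    rng = np.random.default_rng(seed)
    z = rng.gumbel(size=(n_sim, n))
    z_future = rng.gumbel(size=(n_sim, k))
    pivot = z_future.min(axis=1) - gumbel_location_mle(z, 1.0)
    return np.quantile(pivot, 1 - conf)


def gumbel_lower_prediction_limit(x, b, k=1, conf=0.95):
    """Lower limit L such that P(all k future observations >= L) is about conf."""
    return gumbel_location_mle(x, b) + b * pivot_quantile(len(x), k, conf)
```

When b is unknown the same idea needs a pivot that also standardizes by an estimated scale. The thesis addresses that case differently.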
5	Inference in Constrained Linear Regression. Chen, Xinyu. 27 April 2017. Regression analysis constitutes an important part of statistical inference and has many applications. In some applications, there is strong reason to believe that the regression function changes monotonically with some or all of the predictor variables in a region of interest. Deriving analyses under such constraints is a substantial task. In this work, a restricted prediction interval for the mean of the regression function is constructed when two predictors are present. I use a modified likelihood ratio test (LRT) to construct the prediction intervals. Keywords: Least favorable distribution; Restricted prediction interval; Chi-bar-square distribution; Likelihood ratio test
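The chi-bar-square distribution in the keywords is the null law of likelihood ratio statistics under inequality constraints. As a small illustration, this sketch assumes the simplest textbook case rather than the thesis's regression setting. Take Z ~ N(theta, I_2) and test H0: theta = 0 against theta >= 0 componentwise. The restricted MLE is max(Z_i, 0), and the LRT statistic is the sum of squared positive parts. Its null distribution is a mixture of chi2_0, chi2_1, and chi2_2 with weights 1/4, 1/2, and 1/4. The function names are hypothetical.

```python
import math


def orthant_lrt(z):
    """LRT statistic for H0: theta = 0 vs theta >= 0 when z ~ N(theta, I).

    The restricted MLE projects z onto the nonnegative orthant: max(z_i, 0).
    """
    return sum(max(zi, 0.0) ** 2 for zi in z)


def chibar2_pvalue_2d(t):
    """P(chi-bar-square >= t) for two independent unit-variance components.

    Mixture weights: 1/4 on chi2_0 (a point mass at 0), 1/2 on chi2_1, 1/4 on chi2_2.
    """
    if t <= 0:
        return 1.0
    sf1 = math.erfc(math.sqrt(t / 2.0))  # P(chi2_1 >= t)
    sf2 = math.exp(-t / 2.0)             # P(chi2_2 >= t)
    return 0.5 * sf1 + 0.25 * sf2
```

In this two-component case the 5% critical value is roughly 4.23. It is smaller than the unconstrained chi2_2 value of about 5.99, which is why respecting the constraint gains power.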
6	Regressão linear com medidas censuradas / Linear regression with censored data. Taga, Marcel Frederico de Lima. 07 November 2008. We consider a simple linear regression model in which both the response and the explanatory variable are interval censored. To motivate the problem we use data from an audiometric study designed to evaluate whether behavioral thresholds can be predicted from physiological (electrophysiological) thresholds. We develop prediction intervals for the response variable and study the behavior of the maximum likelihood estimators under the proposed model. We then compare their performance with that of estimators obtained under the ordinary simple linear regression model, which ignores the censoring. Keywords: interval censoring; prediction interval; linear regression
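Interval censoring enters the model through the likelihood. An observation known only to lie in (l, u] contributes Phi((u - mu) / sigma) - Phi((l - mu) / sigma) instead of a density. The sketch below writes that log-likelihood for a normal simple linear regression. It makes the simplifying assumption that only the response is censored, whereas the thesis also censors the explanatory variable. The function could be passed to any numerical optimizer. The function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF; math.erf handles z = +/-inf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_censored_loglik(beta0, beta1, sigma, x, lower, upper):
    """Log-likelihood of Y = beta0 + beta1 * x + e, e ~ N(0, sigma^2),
    when each Y_i is only known to lie in (lower_i, upper_i].
    Use -math.inf / math.inf for open-ended intervals."""
    ll = 0.0
    for xi, lo, hi in zip(x, lower, upper):
        mu = beta0 + beta1 * xi
        p = norm_cdf((hi - mu) / sigma) - norm_cdf((lo - mu) / sigma)
        ll += math.log(max(p, 1e-300))  # guard against log(0) far in the tails
    return ll
```

Ignoring the censoring would instead treat, for example, interval midpoints as exact values. That is the ordinary regression comparison the abstract describes.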
7	The prediction of bus arrival time using Automatic Vehicle Location Systems data. Jeong, Ran Hee. 17 February 2005. An Advanced Traveler Information System (ATIS) is one component of Intelligent Transportation Systems (ITS), and a major component of ATIS is travel time information. Providing timely and accurate transit travel time information is important because it attracts additional ridership and increases the satisfaction of transit users. The cost of ITS electronics and components has decreased, and ITS deployment is growing nationwide. Many transit agencies have adopted Automatic Vehicle Location (AVL) systems, which are part of ITS, to track their transit vehicles in real time. The need for a model or technique to predict transit travel time from AVL data is therefore increasing. Although some research on this topic has been conducted, more is required. The objectives of this research were 1) to develop and apply a model to predict bus arrival time using AVL data, and 2) to identify the prediction interval of bus arrival time and the probability of a bus being on time. The travel time prediction model explicitly included dwell times, schedule adherence by time period, and traffic congestion, which are critical for predicting accurate bus arrival times. The test bed was a bus route running in downtown Houston, Texas. A historical-data-based model, regression models, and artificial neural network (ANN) models were developed to predict bus arrival time. The ANN models performed considerably better than either the historical-data-based models or the multiple linear regression models. It was hypothesized that the ANN was able to identify the complex non-linear relationship between travel time and the independent variables, which led to its superior results.
Because variability in travel time (both waiting and on-board) is extremely important for transit choices, it is also useful to extend the model to provide prediction intervals in addition to travel time estimates. Prediction intervals of bus arrival time were therefore calculated for the ANN models. Because ANN models are non-parametric, conventional techniques for prediction intervals cannot be used. Instead, the bootstrap, a computer-intensive resampling method, was used to obtain prediction intervals of bus arrival time. On-time performance is very important for transit operators who want to provide quality service to passengers, and measuring it requires the probability of a bus being on time. In addition to the prediction interval of bus arrival time, the probability that a given bus is on time was therefore calculated. The probability density function of schedule adherence appeared to follow either a gamma or a normal distribution. A chi-squared goodness-of-fit test was used to determine which fits best. The normal distribution estimates schedule adherence well. With it, the probability of a bus being on time, ahead of schedule, or behind schedule can be estimated. Keywords: bus arrival time; prediction model; Automatic Vehicle Location (AVL) Systems; GPS; Neural Network model; prediction interval
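Once schedule adherence is modeled as normal, the on-time, early, and late probabilities are just differences of normal CDF values. The sketch below assumes adherence is defined as actual minus scheduled arrival time in minutes, so positive values mean late. It also uses a hypothetical on-time window. The mean, standard deviation, and window bounds are illustrative inputs, not values from the thesis.

```python
from statistics import NormalDist


def on_time_probabilities(mean_adherence, sd_adherence, early_limit, late_limit):
    """Return (P(ahead of schedule), P(on time), P(behind schedule)).

    Adherence = actual minus scheduled arrival time (minutes), modeled as Normal.
    A bus is ahead of schedule if adherence < early_limit, behind if > late_limit.
    """
    dist = NormalDist(mu=mean_adherence, sigma=sd_adherence)
    p_early = dist.cdf(early_limit)
    p_late = 1.0 - dist.cdf(late_limit)
    return p_early, 1.0 - p_early - p_late, p_late
```

For example, with a hypothetical window of one minute early to five minutes late, a call would look like `on_time_probabilities(1.5, 2.0, -1.0, 5.0)`. In practice the mean and standard deviation would be estimated from AVL schedule-adherence records.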
8	Regressão linear com medidas censuradas / Linear regression with censored data. Marcel Frederico de Lima Taga. 07 November 2008. We consider a simple linear regression model in which both the response and the explanatory variable are interval censored. To motivate the problem we use data from an audiometric study designed to evaluate whether behavioral thresholds can be predicted from physiological (electrophysiological) thresholds. We develop prediction intervals for the response variable and study the behavior of the maximum likelihood estimators under the proposed model. We then compare their performance with that of estimators obtained under the ordinary simple linear regression model, which ignores the censoring. Keywords: interval censoring; prediction interval; linear regression
9	Parameter Estimation and Prediction Interval Construction for Location-Scale Models with Nuclear Applications. Wei, Xingli. January 2014. This thesis presents simple, efficient algorithms to estimate distribution parameters and to construct prediction intervals for location-scale families. Specifically, we study two scenarios. The first is a frequentist method for a general location-scale family, later extended to a 3-parameter distribution. The second is a Bayesian method for the Gumbel distribution. At the end of the thesis, a generalized bootstrap resampling scheme is proposed to construct prediction intervals for data with an unknown distribution. Our estimator construction begins with the equivariance principle and then uses the unbiasedness principle. The resulting location and scale estimates have closed forms. They are functions of the sample mean, sample standard deviation, and sample size, as well as the mean and variance of the corresponding standard distribution. Next, we extend this result to estimate a 3-parameter distribution using what we call a mixed method. Its central idea is to estimate the location and scale parameters as functions of the shape parameter. The sample mean is a popular estimator of the population mean. However, its mean squared error (MSE) is often large when the sample size is small or the scale parameter is greater than the location parameter. To reduce the MSE of our location estimator, we introduce an adaptive estimator, illustrated with the power Gumbel distribution. The frequentist approach is often criticized for failing to account for the uncertainty of an unknown parameter, whereas a Bayesian approach incorporates that uncertainty. The Bayesian analysis of Gumbel data must be carried out numerically because an explicit form is hard to obtain. We tackle this by approximating the exponential sum of Gumbel random variables.
Next, we provide two efficient methods to construct prediction intervals. The first is a Monte Carlo method for a general location-scale family, based on our parameter estimates. The second is the Gibbs sampler, a special case of Markov chain Monte Carlo. We derive the predictive distribution using the approximation to the exponential sum of Gumbel random variables. Finally, we present a new generalized bootstrap and show that Efron's bootstrap resampling is a special case of the new resampling scheme. Our result overcomes the bootstrap's "inability to draw samples outside the range of the original dataset." We apply it to constructing prediction intervals, and simulations show that the generalized bootstrap outperforms the standard bootstrap when the sample size is small. The last contribution of this thesis is an improved GRS method, used in nuclear engineering, for constructing non-parametric tolerance intervals for percentiles of an unknown distribution. Our result shows that the required sample size can be reduced by a factor of almost two when the distribution is symmetric. The confidence level is computed for a number of distributions and compared with the results of applying the generalized bootstrap, which approximates the confidence level very well. / Dissertation / Doctor of Philosophy (PhD). Keywords: Parameter estimation; Prediction interval; Mixed method; Generalized bootstrap; Markov Chain Monte Carlo; Bayesian method
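Non-parametric tolerance intervals of the kind the GRS method uses rest on Wilks' formula for order statistics. The sketch below computes the classical, unimproved Wilks sample size as a baseline. It is not the thesis's reduced-sample-size variant. Suppose an interval is formed by discarding m order statistics. The confidence that it covers at least a fraction gamma of the population is 1 - sum over j < m of C(n, j) (1 - gamma)^j gamma^(n - j). Here m = r for a one-sided limit at the r-th largest value, and m = 2r for a two-sided interval that trims r values from each end. The function names are hypothetical.

```python
from math import comb


def wilks_confidence(n, coverage, m):
    """Confidence that a nonparametric tolerance interval from n samples covers at
    least `coverage` of the population, where m order statistics are discarded
    (m = r one-sided, m = 2r two-sided)."""
    g = coverage
    tail = sum(comb(n, j) * (1 - g) ** j * g ** (n - j) for j in range(m))
    return 1.0 - tail


def wilks_sample_size(coverage=0.95, confidence=0.95, order=1, two_sided=False):
    """Smallest n whose order-statistic tolerance interval reaches `confidence`."""
    m = 2 * order if two_sided else order
    n = m
    while wilks_confidence(n, coverage, m) < confidence:
        n += 1
    return n
```

For the common 95%/95% criterion this gives the familiar 59 samples for a first-order one-sided limit and 93 for a first-order two-sided interval. A symmetric-distribution refinement like the one in the abstract aims to lower such counts.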
10	Deep Quantile Regression for Unsupervised Anomaly Detection in Time-Series. Tambuwal, Ahmad I., Neagu, Daniel. 18 November 2021. Time-series anomaly detection is receiving increasing research interest given the growing number of data-rich application domains. Recent additions to anomaly detection methods in the research literature include deep neural networks (DNNs, e.g., RNNs, CNNs, and autoencoders). The nature and performance of these algorithms in sequence analysis enable them to learn hierarchical discriminative features and the temporal nature of time series. However, their performance suffers from the usual assumption of a Gaussian distribution on the prediction error. That error is then either ranked or thresholded to label data instances as anomalous or not. In many applications, though, an exact parametric distribution is not directly relevant. This can lead to faulty decisions from false anomaly predictions caused by high variation in data interpretation. Outputs are expected to carry a level of confidence. Implementations therefore need a prediction interval (PI) that quantifies the uncertainty associated with the DNN point forecasts. Such an interval supports better-informed decisions and mitigates false anomaly alerts. Prior work has reduced false anomaly alerts by using quantile regression to identify anomalies, but it is limited to using the quantile interval to identify uncertainties in the data. In this paper, an improved time-series anomaly detection method called deep quantile regression anomaly detection (DQR-AD) is proposed. The proposed method goes further. It uses the quantile interval (QI) as an anomaly score and compares it with a threshold to identify anomalous points in time-series data.
Tests of the proposed method on publicly available anomaly benchmark datasets show that it outperforms other methods that assume a Gaussian distribution on the prediction or reconstruction cost when detecting anomalies. This shows that our method is potentially less sensitive to the data distribution than existing approaches. / Petroleum Technology Development Fund (PTDF) PhD Scholarship, Nigeria (Award Number: PTDF/ ED/PHD/IAT/884/16). Keywords: Time-series; Anomaly detection; Prediction interval; Deep neural networks; Long short-term memory; Quantile regression
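Quantile regression networks such as DQR-AD are typically trained with the pinball (quantile) loss, whose minimizer is the tau-quantile of the target. The sketch below shows that loss. As one literal reading of "quantile interval as anomaly score", it also flags time steps whose predicted interval width q_high - q_low exceeds a threshold. The threshold and the width-based score are assumptions, and the network that produces the quantile forecasts is omitted. The function names are hypothetical.

```python
import numpy as np


def pinball_loss(y, q_pred, tau):
    """Mean quantile (pinball) loss for predicting the tau-quantile of y."""
    diff = np.asarray(y, dtype=float) - np.asarray(q_pred, dtype=float)
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


def flag_by_quantile_interval(q_low, q_high, threshold):
    """Score each time step by its quantile-interval width and flag wide ones.

    Assumption: score = q_high - q_low; a step is anomalous if score > threshold.
    """
    score = np.asarray(q_high, dtype=float) - np.asarray(q_low, dtype=float)
    return score, score > threshold
```

Because the pinball loss makes no Gaussian assumption on the errors, the predicted quantiles adapt to skewed or heavy-tailed noise. This is the distribution-insensitivity the abstract emphasizes.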
