Global ETD Search

41	Some problems in multiple regression. Cairns, Malcolm Bernard January 1972 (has links) No description available. Regression analysis
42	Supervised ridge regression in high dimensional linear regression. / 高維線性回歸的監督嶺回歸 / CUHK electronic theses & dissertations collection / Gao wei xian xing hui gui de jian du ling hui gui January 2013 (has links) In statistical learning, we usually have many features with which to determine the behavior of some response. For example, in gene testing problems we have many genes as features, and their relationship to a certain disease needs to be determined. Without specific knowledge available, the simplest and most fundamental way to model this kind of problem is a linear model. There are many existing methods for solving linear regression, such as conventional ordinary least squares, ridge regression and the LASSO (least absolute shrinkage and selection operator). Let N denote the number of samples and p the number of predictors. In ordinary settings where we have enough samples (N > p), ordinary linear regression methods such as ridge regression will usually give reasonable predictions for future values of the response. In modern statistical learning, we quite often meet high-dimensional problems (N << p), such as document classification problems and microarray data testing problems.
In high-dimensional problems it is generally quite difficult to identify the relationship between the predictors and the response without further assumptions. Although many predictors are available, in many real problems most of them are spurious, meaning that they are not directly related to the response. For example, in microarray data testing problems, millions of genes may be available for prediction, but only a few hundred are actually related to the target disease. Conventional linear regression techniques such as the LASSO and ridge regression both have limitations in high-dimensional problems. The LASSO is one of the state-of-the-art techniques for sparsity recovery, but when applied to high-dimensional problems its performance degrades considerably in the presence of measurement noise, which results in high-variance predictions and large prediction error. Ridge regression, on the other hand, is more robust to additive measurement noise, but has the obvious limitation that it cannot separate true predictors from spurious ones. Since, as noted above, a large number of the predictors may be spurious in many high-dimensional problems, ridge regression's inability to separate spurious and true predictors results in poor interpretability of the model as well as poor prediction performance. The new technique that I propose in this thesis aims to overcome the limitations of these two methods, resulting in more accurate and stable prediction in high-dimensional linear regression problems with significant measurement noise. The idea is simple: instead of doing a single-step regression, we divide the regression procedure into two steps.
In the first step we try to distinguish the seemingly relevant predictors from those that are obviously spurious by calculating the univariate correlations between the predictors and the response. We then discard those predictors that have very small or zero correlation with the response, which leaves a reduced predictor set. In the second step we perform a ridge regression of the response on the reduced predictor set; the result of this ridge regression is our desired output. The thesis is organized as follows. First, I review the literature on the linear regression problem, introduce ridge regression and the LASSO in detail, and explain their limitations in high-dimensional problems more precisely. Then I introduce my new method, called supervised ridge regression, show why it should dominate ridge regression and the LASSO in high-dimensional problems, and present simulation results to strengthen the argument. Finally, I conclude with the possible limitations of my method and point out possible directions for further investigation. / Detailed summary in vernacular field only. / Zhu, Xiangchen. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 68-69). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese.
Chapter 1. --- BASICS ABOUT LINEAR REGRESSION / Chapter 1.1 --- Introduction / Chapter 1.2 --- Linear Regression and Least Squares / Chapter 1.2.1 --- Standard Notations / Chapter 1.2.2 --- Least Squares and Its Geometric Meaning
Chapter 2. --- PENALIZED LINEAR REGRESSION / Chapter 2.1 --- Introduction / Chapter 2.2 --- Deficiency of the Ordinary Least Squares Estimate / Chapter 2.3 --- Ridge Regression / Chapter 2.3.1 --- Introduction to Ridge Regression / Chapter 2.3.2 --- Expected Prediction Error And Noise Variance Decomposition of Ridge Regression / Chapter 2.3.3 --- Shrinkage effects on different principal components by ridge regression / Chapter 2.4 --- The LASSO / Chapter 2.4.1 --- Introduction to the LASSO / Chapter 2.4.2 --- The Variable Selection Ability and Geometry of LASSO / Chapter 2.4.3 --- Coordinate Descent Algorithm to solve for the LASSO
Chapter 3. --- LINEAR REGRESSION IN HIGH-DIMENSIONAL PROBLEMS / Chapter 3.1 --- Introduction / Chapter 3.2 --- Spurious Predictors and Model Notations for High-dimensional Linear Regression / Chapter 3.3 --- Ridge and LASSO in High-dimensional Linear Regression
Chapter 4. --- THE SUPERVISED RIDGE REGRESSION / Chapter 4.1 --- Introduction / Chapter 4.2 --- Definition of Supervised Ridge Regression / Chapter 4.3 --- An Underlying Latent Model / Chapter 4.4 --- Ridge LASSO and Supervised Ridge Regression / Chapter 4.4.1 --- LASSO vs SRR / Chapter 4.4.2 --- Ridge regression vs SRR
Chapter 5. --- TESTING AND SIMULATION / Chapter 5.1 --- A Simulation Example / Chapter 5.2 --- More Experiments / Chapter 5.2.1 --- Correlated Spurious and True Predictors / Chapter 5.2.2 --- Insufficient Amount of Data Samples / Chapter 5.2.3 --- Low Dimensional Problem
Chapter 6. --- CONCLUSIONS AND DISCUSSIONS / Chapter 6.1 --- Conclusions / Chapter 6.2 --- References and Related Works
Regression analysis. Ridge regression (Statistics)
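The two-step procedure described in the abstract above (screen predictors by their univariate correlation with the response, then run ridge regression on the survivors) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function name `supervised_ridge`, the fixed cut-off `corr_threshold`, the penalty `ridge_lambda`, and the simulated data are all hypothetical, and the abstract does not say how the thesis chooses its threshold or penalty.

```python
import numpy as np

def supervised_ridge(X, y, corr_threshold=0.3, ridge_lambda=1.0):
    """Hypothetical sketch: correlation screening followed by ridge regression."""
    Xc = X - X.mean(axis=0)          # center predictors
    yc = y - y.mean()                # center response
    # Step 1: absolute Pearson correlation of each predictor with the response.
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    keep = np.flatnonzero(corr >= corr_threshold)   # assumed fixed cut-off
    # Step 2: ridge regression restricted to the retained predictors.
    beta = np.zeros(X.shape[1])
    if keep.size > 0:
        Xk = Xc[:, keep]
        gram = Xk.T @ Xk + ridge_lambda * np.eye(keep.size)
        beta[keep] = np.linalg.solve(gram, Xk.T @ yc)
    intercept = y.mean() - X.mean(axis=0) @ beta
    return beta, intercept, keep

# Simulated high-dimensional example (assumed setup): 5 true predictors, 495 spurious.
rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:5] = 3.0
y = X @ true_beta + rng.standard_normal(n)

beta_hat, intercept, keep = supervised_ridge(X, y)
print(len(keep), "of", p, "predictors retained")
```

Discarded predictors receive a coefficient of exactly zero, so the fitted model is sparse in the original predictor space while the retained coefficients are shrunk by the ridge penalty.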
43	Ridge regression : biased estimation based on ill-conditioned data Bulmahn, Barbara J. January 1979 (has links) Multiple linear regression is a widely used statistical method. Its application, especially in the sciences, social sciences, and economics, assists administrators in evaluating programs and planners in predicting future situations. The method is so common that most institutions have in their computer operation some standard programs to deal with the calculations. These traditional approaches use the method of least squares and yield an unbiased estimate of the parameters. The general linear model used is $Y = X\beta + e$, where $E(e) = 0$, $E(ee') = \sigma^2 I_n$, and $X$ is $n \times p$ and of full rank. The least squares estimate of the unknown parameter vector $\beta$ is then given by $\hat{\beta} = (X'X)^{-1}X'Y$. This approach, however, often produces unsatisfactory (or even inaccurate) results if the data vectors are ill-conditioned. Such ill-conditioning is a result of non-orthogonal data vectors and inter-correlation of response variables that are unfortunately quite common in all fields. In recent years it has become obvious that for these applications the unbiased estimate is not necessarily the best overall in terms of mean square error. A biased estimate may actually be of more value in analysis and prediction. Ridge estimators are biased estimators that have proved useful in these cases. In their basic form, $\hat{\beta}(k) = [X'X + kI]^{-1}X'Y$, they differ from the least squares estimator in that they have a small positive constant added to the diagonal elements of the $X'X$ matrix. This thesis will first deal with the situations in which the least squares approach is not adequate and the cases where the ridge estimate contributes to a usable solution. The significant work which has been done in the field will be surveyed and the main problem of determining an appropriate constant k for the ridge estimate will be considered. Ridge regression (Statistics) Regression analysis.
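As a small numerical illustration of the ridge estimator in the abstract above, the sketch below compares least squares (k = 0) with a ridge fit on two nearly collinear predictors. The data, the value k = 0.1, and the function name `ridge_estimate` are assumptions chosen for illustration; the abstract leaves the choice of k as the main open problem.

```python
import numpy as np

def ridge_estimate(X, y, k):
    """Solve (X'X + kI) beta = X'y; k = 0 gives the least squares estimate."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Ill-conditioned design (assumed example): the second column nearly copies the first.
rng = np.random.default_rng(0)
n = 50
x1 = rng.standard_normal(n)
x2 = x1 + 1e-3 * rng.standard_normal(n)
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.standard_normal(n)

k = 0.1                                    # hypothetical ridge constant
beta_ols = ridge_estimate(X, y, 0.0)
beta_ridge = ridge_estimate(X, y, k)

# Adding k to the diagonal lowers the condition number of the matrix being inverted.
cond_ols = np.linalg.cond(X.T @ X)
cond_ridge = np.linalg.cond(X.T @ X + k * np.eye(2))

print("least squares:", beta_ols, "condition number:", cond_ols)
print("ridge:        ", beta_ridge, "condition number:", cond_ridge)
```

The ridge coefficients have a smaller norm than the least squares coefficients, which is the bias the abstract refers to; in exchange, the poorly determined direction of the design no longer inflates the estimate.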
44	Ridge regression, a remedy for imprecise estimate Alagheband, B. M. D. January 1981 (has links) No description available. Ridge regression (Statistics) Regression analysis.
45	An investigation of methods of ridge regression Galpin, Jacqueline Suzanne. January 1978 (has links) Thesis (M.S.)--University of South Africa. / Includes bibliographical references (leaves 200-202).
46	Ridge regression, a remedy for imprecise estimate Alagheband, Bijan M. D. January 1981 (has links) No description available. Regression analysis. Ridge regression (Statistics)
47	Regression då data utgörs av urval av ranger [Regression when the data consist of a sample of ranks] Widman, Linnea January 2012 (has links) The performance of alpine skiers is measured by the so-called FIS ranking. We investigate some methods for analyzing data in which the response consists of ranks such as these. When the response data are a sample of ranks, there is no obvious method of analysis. We examine the differences between linear, logistic and ordinal logistic regression fits when analyzing data of this type. The bootstrap is used to construct confidence intervals. For our data sets, the methods give similar results when it comes to finding important explanatory variables. Based on this study, we therefore see no reason to use the more advanced models. Ranks Linear regression Logistic regression Ordinal logistic regression Bootstrap
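The abstract above mentions bootstrap confidence intervals for regression fits. A minimal sketch of one common variant, the case-resampling percentile bootstrap for a simple linear regression slope, might look like this. The simulated data, the function name `bootstrap_slope_ci`, and the choice of the percentile method are assumptions; the abstract does not say which bootstrap scheme the thesis uses.

```python
import numpy as np

def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope of a simple linear regression."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample cases with replacement
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    lo, hi = np.quantile(slopes, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Assumed toy data: a linear trend with noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=60)
y = 2.0 + 0.5 * x + rng.standard_normal(60)

slope_hat = np.polyfit(x, y, 1)[0]
lo, hi = bootstrap_slope_ci(x, y)
print(f"slope = {slope_hat:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```

An explanatory variable whose interval excludes zero would be flagged as important under this scheme; the same resampling loop applies to logistic or ordinal logistic fits by swapping the model inside the loop.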
48	A Comparison of Three Criteria Employed in the Selection of Regression Models Using Simulated and Real Data Graham, D. Scott 12 1900 (has links) Researchers who make predictions from educational data are interested in choosing the best regression model possible. Many criteria have been devised for choosing a full or restricted model, and also for selecting the best subset from an all-possible-subsets regression. The relative practical usefulness of three of the criteria used in selecting a regression model was compared in this study: (a) Mallows' C_p, (b) Amemiya's prediction criterion, and (c) Hagerty and Srinivasan's method involving predictive power. Target correlation matrices with 10,000 cases were simulated so that the matrices had varying effect sizes. The amount of power for each matrix was calculated after one or two predictors were dropped from the full regression model, for sample sizes ranging from n = 25 to n = 150. Also, the null case, when one predictor was uncorrelated with the other predictors, was considered. In addition, comparisons for regression models selected using C_p and the prediction criterion were performed using data from the National Educational Longitudinal Study of 1988. regression models educational data Regression analysis.
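Of the three criteria named in the abstract above, Mallows' C_p has a simple closed form: C_p = SSE_p / s² − n + 2p, where SSE_p is the residual sum of squares of a submodel with p parameters and s² is the residual mean square of the full model. The sketch below computes it with NumPy; the simulated data and the helper names `sse` and `mallows_cp` are hypothetical, and the other two criteria are not reproduced here.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of the least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def mallows_cp(X_full, y, cols):
    """Mallows' C_p for the submodel using the listed columns of X_full."""
    n, p_full = X_full.shape
    sigma2 = sse(X_full, y) / (n - p_full)       # full-model residual mean square
    return sse(X_full[:, cols], y) / sigma2 - n + 2 * len(cols)

# Assumed toy data: intercept plus four predictors, only the first two matter.
rng = np.random.default_rng(0)
n = 80
Z = rng.standard_normal((n, 4))
X = np.column_stack([np.ones(n), Z])
y = 1.0 + 2.0 * Z[:, 0] - 1.5 * Z[:, 1] + rng.standard_normal(n)

cp_full = mallows_cp(X, y, [0, 1, 2, 3, 4])      # equals 5 by construction
cp_sub = mallows_cp(X, y, [0, 1, 2])             # intercept + two true predictors
print("C_p full:", cp_full, "C_p submodel:", cp_sub)
```

Submodels whose C_p is close to their parameter count are usually read as having little bias; the full model always scores exactly its own parameter count, so it serves only as the reference.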
49	Robust linear regression Bai, Xue January 1900 (has links) Master of Science / Department of Statistics / Weixin Yao / In practice, when applying a statistical method it often occurs that some observations deviate from the usual model assumptions. Least-squares (LS) estimators are very sensitive to outliers. Even a single atypical value may have a large effect on the regression parameter estimates. The goal of robust regression is to develop methods that are resistant to the possibility that one or several unknown outliers may occur anywhere in the data. In this paper, we review various robust regression methods, including the M-estimate, LMS estimate, LTS estimate, S-estimate, τ-estimate, MM-estimate, GM-estimate, and REWLS estimate. Finally, we compare these robust estimates on the basis of their robustness and efficiency through a simulation study. A real data set application is also provided to compare the robust estimates with the traditional least-squares estimator. Linear regression model Robust regression Statistics (0463)
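To illustrate the sensitivity of least squares to outliers described in the abstract above, the sketch below fits one of the listed methods, an M-estimate, using the Huber weight function and iteratively reweighted least squares. The tuning constant c = 1.345, the MAD-based scale, the function name `huber_irls`, and the contaminated toy data are all assumptions made for this example, not details taken from the report.

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients via iteratively reweighted LS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start from least squares
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust scale: normalized median absolute deviation of the residuals.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid) / scale
        w = c / np.maximum(u, c)                     # Huber weights: 1 if u <= c, else c/u
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Assumed toy data: y = 1 + 2x + noise, with 10% of responses shifted upward by 50.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([1.0, 2.0])
y = X @ true_beta + 0.5 * rng.standard_normal(n)
y[:10] += 50.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = huber_irls(X, y)
print("least squares:", beta_ols)
print("Huber M-estimate:", beta_huber)
```

The Huber weights shrink the influence of observations with large standardized residuals, so the contaminated responses pull the fit far less than they pull the least squares line. Outliers in the predictors (leverage points) are not handled by this weight function, which is one reason the abstract lists several other estimators.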
50	Maximum Likelihood Estimation of Logistic Sinusoidal Regression Models Weng, Yu 12 1900 (has links) We consider the problem of maximum likelihood estimation of logistic sinusoidal regression models and develop some asymptotic theory, including the consistency and joint rates of convergence of the maximum likelihood estimators. The key techniques build upon a synthesis of the results of Walker and of Song and Li for the widely studied sinusoidal regression model, and on a connection to a result of Radchenko. Monte Carlo simulations are also presented to demonstrate the finite-sample performance of the estimators. Maximum likelihood estimation logistic regression sinusoidal regression
