241 |
Fisher and logistic discriminant function estimation in the presence of collinearity
O'Donnell, Robert P. (Robert Paul) 27 September 1990
The relative merits of the Fisher linear discriminant function
(Efron, 1975) and logistic regression procedure (Press and Wilson,
1978; McLachlan and Byth, 1979), applied to the two group
discrimination problem under conditions of multivariate normality and
common covariance, have been debated. In related research, DiPillo
(1976, 1977, 1979) has argued that a biased Fisher linear
discriminant function is preferable when one or more collinearities
exist among the classifying variables.
This paper proposes a generalized ridge logistic regression
(GRL) estimator as a logistic analog to DiPillo's biased alternative
estimator. Ridge and Principal Component logistic estimators
proposed by Schaefer et al. (1984) for conventional logistic
regression are shown to be special cases of this generalized ridge
logistic estimator.
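
As a rough illustration of what a ridge-type logistic estimator does under collinearity, the sketch below fits a logistic regression with a penalty k on the squared slope coefficients, using Newton steps. It is the common penalized-likelihood form of ridge logistic regression, which is an assumption here; it is not the GRL estimator of this thesis and may differ from the estimators of Schaefer et al. (1984). The function names, the simulated data and the values of k are all hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_logistic(X, y, k=1.0, n_iter=100, tol=1e-10):
    """Penalized-likelihood ridge logistic fit (assumed form, intercept unpenalized).

    Returns [intercept, slope_1, ..., slope_p].
    """
    rows = [[1.0] + list(r) for r in X]
    d = len(rows[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for r, yi in zip(rows, y):
            eta = sum(bj * rj for bj, rj in zip(beta, r))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for a in range(d):
                grad[a] += r[a] * (yi - mu)
                for c in range(d):
                    hess[a][c] += w * r[a] * r[c]
        # Ridge penalty on the slopes only: subtract k*beta from the score
        # and add k to the diagonal of the information matrix.
        for a in range(1, d):
            grad[a] -= k * beta[a]
            hess[a][a] += k
        step = solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def make_collinear_data(n=200, seed=0):
    """Hypothetical two-group data with two nearly collinear predictors."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = x1 + 0.05 * rng.gauss(0.0, 1.0)  # x2 is almost a copy of x1
        p = 1.0 / (1.0 + math.exp(-(0.5 + x1)))
        X.append([x1, x2])
        y.append(1.0 if rng.random() < p else 0.0)
    return X, y
```

Calling `ridge_logistic(X, y, k=1.0)` and `ridge_logistic(X, y, k=100.0)` on data from `make_collinear_data()` shows the slopes shrinking toward zero as k grows, which is the bias-for-stability trade the abstract describes.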
Two Fisher estimators (Linear Discriminant Function (LDF) and
Biased Linear Discriminant Function (BLDF)) and three logistic
estimators (Linear Logistic Regression (LLR), Ridge Logistic
Regression (RLR) and Principal Component Logistic Regression (PCLR))
are compared in a Monte Carlo simulation under varying conditions of
distance between populations, training set size and degree of
collinearity. A new approach to the selection of the ridge parameter
in the BLDF method is proposed and evaluated.
The results of the simulation indicate that two of the biased
estimators (BLDF, RLR) produce smaller MSE values and are more stable
estimators (smaller standard deviations) than their unbiased
counterparts. But the improved performance for MSE does not
translate into equivalent improvement in error rates. The expected
actual error rates are only marginally smaller for the biased
estimators. The results suggest that small training set size, rather
than strong collinearity, may produce the greatest classification
advantage for the biased estimators.
The unbiased estimators (LDF, LLR) produce smaller average apparent
error rates. The relative advantage of the Fisher
estimators over the logistic estimators is maintained. But, given
that the comparison is made under conditions most favorable to the
Fisher estimators, the absolute advantage of the Fisher estimators is
small. The new ridge parameter selection method for the BLDF
estimator performs as well as, but no better than, the method used by
DiPillo.
The PCLR estimator shows performance comparable to the other
estimators when there is a high level of collinearity. However, the
estimator gives up a significant degree of performance in conditions
where collinearity is not a problem. / Graduation date: 1991
|
242 |
Diagnostic tools for overdispersion in generalized linear models
Ganio-Gibbons, Lisa M. 18 August 1989
Data in the form of counts or proportions often exhibit more
variability than that predicted by a Poisson or binomial
distribution. Many different models have been proposed to account
for extra-Poisson or extra-binomial variation. A simple model
includes a single heterogeneity factor (dispersion parameter) in the
variance. Other models that allow the dispersion parameter to vary
between groups or according to a continuous covariate also exist but
require a more complicated analysis. This thesis is concerned with
(1) understanding the consequences of using an oversimplified model
for overdispersion, (2) presenting diagnostic tools for detecting the
dependence of overdispersion on covariates in regression settings for
counts and proportions and (3) presenting diagnostic tools for
distinguishing between some commonly used models for overdispersed
data.
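
The simple single-factor model mentioned above can be made concrete with a short sketch. Assuming the usual quasi-Poisson form Var(Y) = phi * mu, a common estimate of phi is the Pearson chi-square statistic divided by the residual degrees of freedom. This is a generic textbook estimate, not the score-test diagnostic developed in the thesis; the function name and the example counts are hypothetical.

```python
def pearson_dispersion(y, mu, n_params):
    """Estimate phi in Var(Y) = phi * mu from the Pearson residuals.

    y        : observed counts
    mu       : fitted means from a Poisson-type model (all positive)
    n_params : number of regression parameters used to obtain mu
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / df


# Hypothetical counts fitted with an intercept-only model (mu = sample mean).
counts = [0, 1, 9, 2, 14, 0, 7, 3]
mean = sum(counts) / len(counts)
phi = pearson_dispersion(counts, [mean] * len(counts), n_params=1)
```

A value of `phi` near 1 is consistent with Poisson variation, while a value well above 1 points to overdispersion. A single number like this cannot reveal whether the dispersion depends on covariates, which is the question the diagnostics in this thesis address.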
The double exponential family of distributions is used as a
foundation for this work. A double binomial or double Poisson
density is constructed from a binomial or Poisson density and an
additional dispersion parameter. This provides a completely
parametric framework for modeling overdispersed counts and
proportions.
The first issue above is addressed by exploring the properties
of maximum likelihood estimates obtained from incorrectly specified
likelihoods. The diagnostic tools are based on a score test in the
double exponential family. An attractive feature of this test is
that it can be computed from the components of the deviance in the
standard generalized linear model fit. A graphical display is
suggested by the score test. For the normal linear model, which is a
special case of the double exponential family, the diagnostics reduce
to those for heteroscedasticity presented by Cook and Weisberg
(1983). / Graduation date: 1990
|
243 |
Design of a Software Application for Visualization of GPS and Vehicle Data
Arslan, Recep Sinan Jr January 2009
I present an application for visualizing GPS data together with linear correlations and models. A collection of data for each vehicle is used to compute correlations; deviating correlations can indicate a faulty vehicle. The correlation values for each vehicle are computed with linear regression algorithms using up to 4 signals in the data (with a varied time window), and the model parameters are displayed in a window next to the GPS map. Multiple measurements (multiple drive routes and multiple model parameters) are displayed at the same time, allowing tracking over time and comparison of different vehicles. The whole technique is demonstrated on three data sets, which the user selects in the first frame. The results are displayed with a Java application and Google Maps.
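
The idea of tracking correlations over a time window can be sketched as follows. This is a minimal illustration that computes the Pearson correlation between two signals in consecutive, non-overlapping windows; the thesis itself uses linear regression on up to 4 signals, so the pairwise form, the function names and the window handling are all assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def windowed_correlations(signal_a, signal_b, window):
    """Correlation of two vehicle signals in consecutive non-overlapping windows."""
    if len(signal_a) != len(signal_b):
        raise ValueError("signals must have the same length")
    return [
        pearson_r(signal_a[i:i + window], signal_b[i:i + window])
        for i in range(0, len(signal_a) - window + 1, window)
    ]
```

A window whose correlation departs sharply from the values seen on other routes or other vehicles would be a candidate for closer inspection.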
|
244 |
Vaccinering mot H1N1 : En studie av vad som påverkade svenska individers vaccinationsbeslut 2009
Altersved, Sofia, Mäkelä, Elin January 2012
The swine flu (H1N1) erupted in 2009, spread quickly across the world, and developed into a pandemic that posed a great threat to people's health. It was soon discovered that the H1N1 virus differed in character from the seasonal flu, since it especially affected younger individuals and the consequences of the disease were expected to be more severe. In Sweden it was decided to provide vaccination against the H1N1 virus free of charge, and the Swedish vaccination rate became relatively high compared to other countries. This thesis studies which factors affected the Swedish population's decision to take the flu shot against the H1N1 virus in 2009. This is done through a statistical study using a logistic regression analysis conducted on secondary data. The results show that the probability of vaccination against H1N1 increases if the individual is over 60 years old, and increases with growing income. The results also show that women have a higher vaccination propensity than men. In contrast, there is no association between vaccination against H1N1 and health level or education level. As the results were not entirely consistent with theories and previous studies, it can be concluded that it is difficult to determine how different factors actually affected individuals' vaccination decisions against H1N1. Possibly, this depends on the specific and extreme circumstances surrounding H1N1. Therefore, it may be difficult to predict how individuals will behave in the case of future pandemics.
|
245 |
Factors Affect the Employment of Youth in China
Li, Xiaoxue January 2009
Today's young people are better educated than ever but face a poor employment situation. At the beginning of this paper, I describe the situation both in the world and in China, revealing the poor employment situation of youth. I then introduce the systems related to youth employment in China and the measures the government has taken to help graduates find jobs. The purpose of this paper is to analyze the employment of young people in China, especially among those with medium and high levels of education, and to find which factors contribute to it and how. Using logistic regression in STATA, I find that the main factors are gender, age, living area, political status, major, and educational level. The results reveal that discrimination and the gap between rural and urban areas are severe issues in China. Finally, I give some suggestions to both society and individuals for improving youth employment.
|
246 |
Accounting for the effects of rehabilitation actions on the reliability of flexible pavements: performance modeling and optimization
Deshpande, Vighnesh Prakash 15 May 2009
A performance model and a reliability-based optimization model for flexible pavements
that account for the effects of rehabilitation actions are developed. The developed
performance model can be effectively implemented in all the applications that require
the reliability (performance) of pavements, before and after the rehabilitation actions.
The response surface methodology in conjunction with Monte Carlo simulation is used
to evaluate pavement fragilities. To provide more flexibility, the parametric regression
model that expresses fragilities in terms of decision variables is developed. Developed
fragilities are used as performance measures in a reliability-based optimization model.
Three decision policies for rehabilitation actions are formulated and evaluated using a
genetic algorithm. The multi-objective genetic algorithm is used for obtaining optimal
trade-off between performance and cost.
To illustrate the developed model, a numerical study is presented. The developed
performance model describes well the behavior of flexible pavement before as well as
after rehabilitation actions. The sensitivity measures suggest that the reliability of
flexible pavements before and after rehabilitation actions can effectively be improved by providing an asphalt layer as thick as possible in the initial design and improving the
subgrade stiffness. The importance measures suggest that the asphalt layer modulus at
the time of rehabilitation actions represents the principal uncertainty for the performance
after rehabilitation actions. Statistical validation of the developed response model shows
that the response surface methodology can be efficiently used to describe pavement
responses. The results for parametric regression model indicate that the developed
regression models are able to express the fragilities in terms of decision variables.
Numerical illustration for optimization shows that the cost minimization and reliability
maximization formulations can be efficiently used in determining optimal rehabilitation
policies. Pareto optimal solutions obtained from multi-objective genetic algorithm can be
used to obtain trade-off between cost and performance and avoid possible conflict
between two decision policies.
|
247 |
Modeling and characterization of potato quality by active thermography
Sun, Chih-Chen 15 May 2009
This research focuses on characterizing a potato with extra sugar content and identifying the location and depth of the extra sugar content using the active thermography imaging technique. Extra sugar content in potatoes is an important problem for potato growers and potato chip manufacturers. Extra sugar content could result in diseases or wounds in the potato tuber. In general, potato tubers with low sugar content are considered to be of higher quality.
The inspection system and the general methodologies for characterizing extra sugar content are presented in this study. The average heating rate obtained from the thermal image analysis is the major factor in the characterization procedures. Using the average heating rate, the probability that a potato has extra sugar content can be predicted with a logistic regression model. In addition, neural networks are used to identify potatoes with extra sugar content. The correct rate for identifying a potato with extra sugar content can reach 85%. The location of the extra sugar content can also be found using the logistic regression model. Results show that the overall correct rate for predicting the location of extra sugar content at a resolution of 20 by 20 pixels is 91%. In predicting the depth of extra sugar content, depths exceeding 2/3 inch are not detectable by analyzing thermal images. The depth of extra sugar content can be discriminated in 0.3 inch increments with a high rate of accuracy (87.5%).
|
248 |
Statistical Relationships of the Tropical Rainfall Measurement Mission (TRMM) Precipitation and Large-scale Flow
Borg, Kyle 2010 May 1900
The relationship between precipitation and large-scale flow is important to understand and characterize in the climate system. We examine statistical relationships
between the Tropical Rainfall Measurement Mission (TRMM) 3B42 gridded precipitation and large-scale
flow variables in the Tropics for 2000–2007. These variables
include NCEP/NCAR Re-analysis sea surface temperatures (SSTs), vertical temperature profiles, omega, and moist static energy, as well as Atmospheric Infrared Sounder
(AIRS) vertical temperatures and QuikSCAT surface divergence. We perform correlation analysis, empirical orthogonal function analysis, and logistic regression analysis
on monthly, pentad, daily and near-instantaneous time scales. Logistic regression
analysis is able to incorporate the non-linear nature of precipitation in the
relationship. Flow variables are interpolated to the 0.25 degree TRMM 3B42 grid and examined
separately for each month to offset the effects of the seasonal cycle.
January correlations of NCEP/NCAR Re-analysis SSTs and TRMM 3B42 precipitation have a coherent area of positive correlations in the Western and Central
Tropical Pacific on all time scales. These areas correspond with the South Pacific
Convergence Zone (SPCZ) and the Inter Tropical Convergence Zone (ITCZ). 500mb
omega is negatively correlated with TRMM 3B42 precipitation across the Tropics on
all time scales. QuikSCAT divergence correlations with precipitation have a band of weak and noisy correlations along the ITCZ on monthly time scales in January. Moist
static energy, calculated from NCEP/NCAR Re-analysis has a large area of negative
correlations with precipitation in the Central Tropical Pacific on all four time scales.
The first few Empirical Orthogonal Functions (EOFs) of vertical temperature
profiles in the Tropical Pacific have similar structure on monthly, pentad, and daily
timescales. Logistic regression fit coefficients are large for SST and precipitation in
four regions located across the Tropical Pacific. These areas show clear thresholded
behavior. Logistic regression results for other variables and precipitation are less
clear. The results from SST and precipitation logistic regression analysis indicate the
potential usefulness of logistic regression as a non-linear statistic relating precipitation
and certain flow variables.
|
249 |
The Research in Key Factors of Credit Risk for Mortgage
Hsu, Chao-Yi 06 July 2004
How well credit risk is controlled has great influence on the value of Mortgage-Backed Securities (MBS), but there isn't any appraiser in Taiwan to supervise and assess the institutions that issue these securities. To earn the trust of the public, these institutions must be able to keep credit risk at an acceptable level; only then will people be willing to invest in these MBS.
This research uses data on 20,576 cases (17,425 normal cases and 3,151 default cases) from a domestic bank, Bank P, and constructs a logistic regression model for the empirical analysis. With correct prediction rates of 96.7% for normal cases, 85.4% for default cases, and 95% overall, we find that the borrower's age, occupation, the type of collateral, the use of collateral, the loan purpose, the year of the loan, the line of credit, the category of interest, the interest rate, the source of the case and the branch office can serve as key factors for banks to refer to in credit risk appraisal.
In this study, we find that the interest rate is the key factor for default, followed by occupation. Two other factors, the category of interest and the source of the case, which are not commonly discussed in related studies, are confirmed as significant influences on credit risk. The other important finding is that the loan conditions and the characteristics of the collateral have a greater impact on credit risk than the personal characteristics of the borrower.
This research provides a reference for financial institutions on credit evaluation and offers a workable model for credit control. For previously issued MBS, this research also provides an academic basis for future adaptation.
|
250 |
none
Wu, Shin-Hwa 11 July 2005
none
|