1.
Bias correction of bounded location errors in binary data. Walker, Nelson B. January 1900.
Master of Science / Department of Statistics / Trevor Hefley / Binary regression models for spatial data are commonly used in disciplines such as epidemiology and ecology. Many spatially referenced binary data sets suffer from location error, which occurs when the recorded location of an observation differs from its true location. When location error occurs, values of the covariates associated with the true spatial locations of the observations cannot be obtained. We show how a change of support (COS) can be applied to regression models for binary data to provide bias-corrected coefficient estimates when the true values of the covariates are unavailable, but the unknown locations of the observations are contained within non-overlapping polygons of any geometry. The COS accommodates spatial and non-spatial covariates and preserves the convenient interpretation of methods such as logistic and probit regression. Using a simulation experiment, we compare binary regression models with a COS to naive approaches that ignore location error. We illustrate the flexibility of the COS by modeling individual-level disease risk in a population using a binary data set where the locations of the observations are unknown but contained within administrative units. Our simulation experiment and data illustration corroborate that conventional regression models for binary data which ignore location error are unreliable, but that the COS can be used to eliminate bias while preserving model choice.
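For a concrete picture of the change-of-support idea described above, the following is a minimal Python sketch, assuming the unknown true location is uniformly distributed over grid points inside each polygon, so the likelihood averages the success probability over the polygon. All names and the simulated data are illustrative, not the thesis's code or estimator.

```python
# Hedged sketch of a change-of-support (COS) binary regression likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)

# Each of n observations has m candidate grid points inside its polygon,
# each grid point carrying a covariate value from a simulated spatial surface.
n, m = 200, 25
grid_x = rng.normal(size=(n, m))             # covariate values at grid points in each polygon
true_idx = rng.integers(0, m, size=n)        # unknown true location of each observation
beta_true = np.array([-0.5, 1.2])
p_true = expit(beta_true[0] + beta_true[1] * grid_x[np.arange(n), true_idx])
y = rng.binomial(1, p_true)

def neg_loglik_cos(beta):
    """Marginal likelihood: average the success probability over each polygon."""
    p_grid = expit(beta[0] + beta[1] * grid_x)               # n x m probabilities
    p_marg = np.clip(p_grid.mean(axis=1), 1e-12, 1 - 1e-12)  # integrate out the location
    return -np.sum(y * np.log(p_marg) + (1 - y) * np.log(1 - p_marg))

def neg_loglik_naive(beta):
    """Naive fit that ignores location error, using the polygon-average covariate."""
    p = np.clip(expit(beta[0] + beta[1] * grid_x.mean(axis=1)), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_cos = minimize(neg_loglik_cos, x0=np.zeros(2)).x
beta_naive = minimize(neg_loglik_naive, x0=np.zeros(2)).x
print("COS estimate:", beta_cos, "naive estimate:", beta_naive)
```

Comparing `beta_cos` with `beta_naive` on simulated data mirrors the kind of bias comparison the abstract describes for its simulation experiment.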
2.
Study On Crash Characteristics And Injury Severity At Roadway Work Zones. Wang, Qing. 26 March 2009.
In the USA, despite recent efforts to improve work zone safety, the number of crashes and fatalities at work zones has increased continuously over the past several years. To address the existing safety problems, a clear understanding of the characteristics of work zone crashes is necessary. This thesis summarizes a research study focusing on work zone traffic crash analysis to investigate the characteristics of work zone crashes and to identify the factors contributing to injury severity at work zones. These factors included roadway design, environmental conditions, traffic conditions, and vehicle/driver features. In particular, special population groups, divided into older, middle-aged, and young drivers, were examined. The study was based on historical crash data from the State of Florida, extracted from the Florida CAR (Crash Analysis Reporting) system. Descriptive statistics were used to identify the characteristics of crashes at work zones. An injury severity prediction model was then developed using ordered probit regression to investigate the impacts of various factors on injury severity at work zones. From the model, it can be concluded that several factors, including curved road sections, alcohol/drug involvement, high speeds, angle crashes, and very young or old drivers, increase the probability of severe injuries. Based on the magnitudes of the variable coefficients, the maximum posted speed has the greatest impact on injury severity, which suggests that restricting driving speed is a principal countermeasure for improving work zone safety.
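An ordered probit severity model of the kind described above can be sketched with statsmodels; the predictors (posted speed, curve, alcohol, driver age group) and the synthetic data below are illustrative assumptions rather than the thesis's actual specification.

```python
# Hedged sketch of an ordered probit injury-severity model on synthetic data.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "posted_speed": rng.integers(25, 75, n).astype(float),
    "curve": rng.binomial(1, 0.3, n),
    "alcohol": rng.binomial(1, 0.1, n),
    "older_driver": rng.binomial(1, 0.2, n),
})
# Latent injury propensity and a three-level ordered outcome.
latent = (0.04 * df["posted_speed"] + 0.5 * df["curve"] + 0.8 * df["alcohol"]
          + 0.4 * df["older_driver"] + rng.normal(size=n))
df["severity"] = pd.cut(latent, bins=[-np.inf, 2.0, 3.0, np.inf],
                        labels=["no injury", "injury", "severe/fatal"])

model = OrderedModel(df["severity"],
                     df[["posted_speed", "curve", "alcohol", "older_driver"]],
                     distr="probit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```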
3.
Intangible assets and bankruptcy prediction: An examination of an adjusted Skogsvik model for Swedish SMEs. Holst, Sofia; Bossen, Ebba. January 2024.
This study aims to adapt Skogsvik's 1987 bankruptcy prediction model for Swedish small and medium-sized enterprises (SMEs). The study also focuses on how new accounting standards for intangible assets affect the model. By including a ratio that reflects the relationship between intangible assets and total assets (IA) in the adjusted model, the study enables a comparison with Skogsvik's model, in order to evaluate the ratio's relevance for predicting bankruptcy risk for SMEs and to examine the adjusted model's predictive ability. The study also investigates which of the variables in Skogsvik's model are effective for predicting bankruptcy for SMEs, contributing to research on accounting-based bankruptcy prediction models for SMEs within the framework of Swedish accounting practice. The model is estimated with a probit regression and then evaluated with two validation tests, the Receiver Operating Characteristic (ROC) and the Cumulative Accuracy Profile (CAP). The analysis covers data from 544 Swedish companies, of which 81 went bankrupt. The results show that the adjusted model, including the IA ratio, has a higher predictive ability for SMEs than Skogsvik's model under today's accounting standards for intangible assets. The study also confirms that the ratios ROA (return on total assets), ETA (equity-to-assets ratio), and IA (intangible assets relative to total assets) are effective indicators for predicting bankruptcy risk for Swedish SMEs today.
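The probit-plus-validation workflow described above can be sketched as follows. The simulated ratios (ROA, ETA, IA) and sample sizes mirror the abstract, but the data and coefficients are illustrative, and the CAP-based accuracy ratio is computed here via the standard identity AR = 2*AUC - 1, stated as an assumption rather than the thesis's procedure.

```python
# Hedged sketch: probit bankruptcy model with ROC/AUC and a CAP-based accuracy ratio.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 544                                   # number of firms mentioned in the abstract
X = pd.DataFrame({
    "ROA": rng.normal(0.05, 0.15, n),     # return on total assets
    "ETA": rng.uniform(0.05, 0.60, n),    # equity-to-assets ratio
    "IA":  rng.uniform(0.00, 0.40, n),    # intangible assets / total assets
})
latent = -1.2 - 4.0 * X["ROA"] - 2.0 * X["ETA"] + 1.0 * X["IA"] + rng.normal(size=n)
y = (latent > np.quantile(latent, 1 - 81 / 544)).astype(int)   # roughly 81 bankruptcies

probit = sm.Probit(y, sm.add_constant(X)).fit(disp=False)
score = probit.predict(sm.add_constant(X))

auc = roc_auc_score(y, score)
accuracy_ratio = 2 * auc - 1              # CAP accuracy ratio via the AR = 2*AUC - 1 identity
print(probit.params)
print(f"AUC = {auc:.3f}, accuracy ratio = {accuracy_ratio:.3f}")
```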
4.
Bayesian Hierarchical Latent Model for Gene Set Analysis. Chao, Yi. 13 May 2009.
A pathway is a predefined set of genes that serves a particular cellular or physiological function. Ranking pathways relevant to a particular phenotype can help researchers focus on a few sets of genes in pathways. In this thesis, a Bayesian hierarchical latent model was proposed using a generalized linear random effects model. The advantage of the approach is that it can easily incorporate prior knowledge when the sample size is small and the number of genes is large. For the covariance matrix of a set of random variables, two Gaussian random processes were considered to construct the dependencies among genes in a pathway: one based on the polynomial kernel and the other based on the Gaussian kernel. These two kernels were then compared with a constant covariance matrix for the random effects using a ratio based on the joint posterior distribution under each model. For mixture models, log-likelihood values were computed at different values of the mixture proportion and compared among mixtures of the selected kernels and a point-mass density (or constant covariance matrix). The approach was applied to a data set (Mootha et al., 2003) containing the expression profiles of type II diabetes, where the motivation was to identify pathways that can discriminate between normal patients and patients with type II diabetes. / Master of Science
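The two covariance constructions mentioned above (a polynomial kernel and a Gaussian kernel over gene expression profiles, plus a constant-covariance alternative) can be sketched as follows; the hyperparameters and data are illustrative assumptions, not the thesis's settings.

```python
# Minimal sketch of kernel-based covariance matrices for gene-level random effects.
import numpy as np

rng = np.random.default_rng(7)
G, S = 20, 30                        # genes in the pathway, samples
expr = rng.normal(size=(G, S))       # expression profile of each gene

def polynomial_kernel(X, degree=2, c=1.0):
    return (X @ X.T + c) ** degree

def gaussian_kernel(X, length_scale=5.0):
    sq_dist = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-sq_dist / (2 * length_scale**2))

K_poly = polynomial_kernel(expr)
K_rbf = gaussian_kernel(expr)
K_const = np.full((G, G), 0.5) + 0.5 * np.eye(G)   # constant-covariance alternative

# Each kernel (plus a small jitter term for numerical stability) could serve as the
# covariance of the gene-level random effects in the hierarchical latent model.
for name, K in [("poly", K_poly), ("rbf", K_rbf), ("const", K_const)]:
    jittered = K + 1e-6 * np.eye(G)
    print(name, "min eigenvalue:", np.linalg.eigvalsh(jittered).min())
```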
5.
User Adoption of Big Data Analytics in the Public Sector. Akintola, Abayomi Rasheed. January 2019.
The goal of this thesis was to investigate the factors that influence the adoption of big data analytics by public sector employees, based on an adapted Unified Theory of Acceptance and Use of Technology (UTAUT) model. A mixed method of survey and interviews was used to collect data from employees of a Canadian provincial government ministry. The results show that performance expectancy and facilitating conditions have significant positive effects on the intention to adopt big data analytics, while effort expectancy has a significant negative effect; social influence does not have a significant effect on adoption intention. In terms of moderating variables, the results show that gender moderates the effects of effort expectancy, social influence, and facilitating conditions; data experience moderates the effects of performance expectancy, effort expectancy, and facilitating conditions; and leadership moderates the effect of social influence. The moderating effects of age on performance expectancy and effort expectancy are significant only for employees in the 40 to 49 age group, while the moderating effect of age on social influence is significant for employees aged 40 and over. Based on the results, implications for public sector organizations planning to implement big data analytics were discussed and suggestions for further research were made. This research contributes to existing studies on the user adoption of big data analytics.
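Moderation effects of this kind are commonly examined with interaction terms in a regression; a hedged sketch on synthetic survey-style data follows, with column names that are illustrative rather than taken from the study's instrument.

```python
# Hedged sketch of a moderated-regression analysis of UTAUT-style constructs.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 120
df = pd.DataFrame({
    "performance_expectancy": rng.normal(4, 1, n),
    "effort_expectancy": rng.normal(4, 1, n),
    "social_influence": rng.normal(3.5, 1, n),
    "facilitating_conditions": rng.normal(4, 1, n),
    "gender": rng.integers(0, 2, n),
    "data_experience": rng.integers(0, 2, n),
})
df["intention"] = (0.5 * df.performance_expectancy - 0.3 * df.effort_expectancy
                   + 0.4 * df.facilitating_conditions
                   + 0.2 * df.gender * df.effort_expectancy + rng.normal(0, 1, n))

# Interaction terms (e.g. gender x effort expectancy) capture the moderation effects.
model = smf.ols("intention ~ performance_expectancy + effort_expectancy + social_influence"
                " + facilitating_conditions + gender:effort_expectancy"
                " + data_experience:performance_expectancy", data=df).fit()
print(model.summary())
```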
6.
Bayesian models for DNA microarray data analysis. Lee, Kyeong Eun. 29 August 2005.
Selection of significant genes via expression patterns is important in a microarray problem. Owing to the small sample size and large number of variables (genes), the selection process can be unstable. This research proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables in a regression setting and use a Bayesian mixture prior to perform the variable selection. Due to the binary nature of the data, the posterior distributions of the parameters are not in explicit form, and we need to use a combination of truncated sampling and Markov chain Monte Carlo (MCMC) based computation techniques to simulate the posterior distributions. The Bayesian model is flexible enough to identify the significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays. In particular, the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify the set of significant genes to classify BRCA1 versus others. Microarray data can also be used in survival models. We address the issue of how to reduce the dimension in model building by selecting significant genes as well as assessing the estimated survival curves. Additionally, we consider the well-known Weibull regression and semiparametric proportional hazards (PH) models for survival analysis. With microarray data, we need to consider the case where the number of covariates p exceeds the number of samples n. Specifically, for a given vector of response values, which are times to event (death or censored times), and p gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the responsible genes, which control the survival time. This approach enables us to estimate the survival curve when n << p. In our approach, rather than fixing the number of selected genes, we assign a prior distribution to this number. The approach creates additional flexibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov chain Monte Carlo (MCMC) method. We demonstrate the use of the methodology with (a) diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and (b) breast carcinoma data. Lastly, we propose a mixture of Dirichlet process models using the discrete wavelet transform for curve clustering. In order to characterize these time-course gene expressions, we consider them as trajectory functions of time and gene-specific parameters and obtain their wavelet coefficients by a discrete wavelet transform. We then build cluster curves using a mixture of Dirichlet process priors.
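The combination of latent-variable (truncated-normal) sampling, a Bayesian mixture prior, and MCMC described above can be illustrated with a compact Gibbs sampler for a probit model with a continuous spike-and-slab prior; this is a generic sketch of the technique under stated assumptions, not the thesis's exact model or priors.

```python
# Hedged sketch: Albert-Chib probit data augmentation with a two-component
# (spike-and-slab) normal mixture prior on the coefficients.
import numpy as np
from scipy.stats import truncnorm, norm

rng = np.random.default_rng(0)
n, p = 60, 30                                   # small n, larger p, as in microarrays
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -1.2, 1.0]
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

v0, v1, pi_incl = 0.01, 4.0, 0.1                # spike var, slab var, prior inclusion prob
beta = np.zeros(p)
gamma = np.zeros(p)
keep_gamma = np.zeros(p)
n_iter, burn = 2000, 500

for it in range(n_iter):
    # 1) latent z_i | beta, y_i ~ truncated normal (positive if y=1, negative if y=0)
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # 2) beta | z, gamma ~ multivariate normal (conjugate update)
    d = np.where(gamma == 1, v1, v0)
    V = np.linalg.inv(X.T @ X + np.diag(1.0 / d))
    beta = rng.multivariate_normal(V @ X.T @ z, V)
    # 3) gamma_j | beta_j ~ Bernoulli (slab vs spike density ratio)
    w1 = pi_incl * norm.pdf(beta, 0, np.sqrt(v1))
    w0 = (1 - pi_incl) * norm.pdf(beta, 0, np.sqrt(v0))
    gamma = rng.binomial(1, w1 / (w1 + w0))
    if it >= burn:
        keep_gamma += gamma

print("posterior inclusion probabilities:", np.round(keep_gamma / (n_iter - burn), 2))
```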
7.
Credit Scoring Methods And Accuracy Ratio. Iscanoglu, Aysegul. 01 August 2005.
Credit scoring with the help of classification techniques makes it possible to take quick and easy lending decisions. However, no definite consensus has been reached on the best method for credit scoring or on the conditions under which each method performs best. Although a huge range of classification techniques has been used in this area, logistic regression has been seen as an important tool and is used very widely in studies. This study aims to examine the accuracy and bias properties of parameter estimation in logistic regression using Monte Carlo simulations in four respects: the dimension of the data sets, their length, the percentage of defaults included in the data, and the effect of the variables on estimation. Moreover, an application of some important statistical and non-statistical methods to Turkish credit default data is provided, and the accuracies of the methods are compared for the Turkish market. Finally, the results of the best method are rated using the receiver operating characteristic curve.
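A Monte Carlo experiment of the kind described, measuring the bias of logistic-regression coefficient estimates as the sample size and the percentage of defaults vary, can be sketched as follows; the settings, coefficients, and replication counts are illustrative assumptions.

```python
# Hedged sketch of a Monte Carlo bias study for logistic-regression estimates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
beta_true = np.array([1.0, -0.8])               # slope coefficients

def simulate_bias(n, default_rate, n_rep=200):
    est = []
    for _ in range(n_rep):
        X = rng.normal(size=(n, 2))
        # choose the intercept so the average default probability is roughly default_rate
        intercept = np.log(default_rate / (1 - default_rate))
        p = 1 / (1 + np.exp(-(intercept + X @ beta_true)))
        y = rng.binomial(1, p)
        if y.sum() in (0, n):                    # skip degenerate samples
            continue
        try:
            fit = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
        except Exception:                        # e.g. perfect separation in small samples
            continue
        est.append(fit.params[1:])               # slopes only
    return np.mean(est, axis=0) - beta_true      # average bias

for n in (100, 500, 2000):
    for rate in (0.05, 0.20):
        print(f"n={n:4d} default%={rate:.0%} bias={simulate_bias(n, rate)}")
```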
8.
Bankruptcy prediction models on Swedish companies. Charraud, Jocelyn; Garcia Saez, Adrian. January 2021.
Bankruptcies have been a sensitive topic all around the world for over 50 years. From their research, the authors have found that only a few bankruptcy studies have been conducted in Sweden, and even fewer on the topic of bankruptcy prediction models. This thesis investigates the performance of the Altman, Ohlson, and Zmijewski bankruptcy prediction models on all Swedish companies during the years 2017 and 2018. The study intends to shed light on some of the most famous bankruptcy prediction models and to explore the predictive abilities and usability of these three models in Sweden. The second purpose of this study is to create two models from the most significant variables of the three models studied and to test their predictive power, with the aim of creating two models designed for Swedish companies. We identified a research gap in Sweden, where bankruptcy prediction models have been rather unexplored, especially these three models. Furthermore, we identified a second research gap regarding the time period of the research: only a few studies on bankruptcy prediction models have been conducted after the financial crisis of 2007/08. We conducted a quantitative study in order to achieve the purpose of the study. The data used were secondary data gathered from the Serrano database. The research followed an abductive approach with a positivist paradigm and studied all active Swedish companies between the years 2017 and 2018. Finally, this work contributes to the current field of knowledge through the analysis of the models' results on Swedish companies, using the liquidity theory, the solvency and insolvency theory, the pecking order theory, the profitability theory, the cash flow theory, and the contagion effect. The results aligned with the liquidity theory, the solvency and insolvency theory, and the profitability theory. Moreover, we found that the Altman model has the lowest performance of the three models, followed by the Ohlson model, which shows mixed results depending on the statistical analysis; the Zmijewski model has the best performance of the three. The performance and prediction power of the two new models were significantly higher than those of the three models studied.
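As a concrete reference point for one of the three models compared above, the following sketch computes the original Altman (1968) Z-score with coefficients and cut-offs as commonly cited in the literature; the Ohlson and Zmijewski scores would be computed analogously from the coefficients published in their original papers. The input figures are illustrative.

```python
# Hedged sketch of the original Altman Z-score with commonly cited coefficients.
from dataclasses import dataclass

@dataclass
class Financials:
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_equity: float
    sales: float
    total_assets: float
    total_liabilities: float

def altman_z(f: Financials) -> float:
    x1 = f.working_capital / f.total_assets
    x2 = f.retained_earnings / f.total_assets
    x3 = f.ebit / f.total_assets
    x4 = f.market_value_equity / f.total_liabilities
    x5 = f.sales / f.total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

def classify(z: float) -> str:
    # Commonly cited zones: distress below 1.81, grey zone up to 2.99, safe above.
    if z < 1.81:
        return "distress zone"
    if z <= 2.99:
        return "grey zone"
    return "safe zone"

firm = Financials(working_capital=50, retained_earnings=120, ebit=40,
                  market_value_equity=300, sales=500, total_assets=600,
                  total_liabilities=250)
z = altman_z(firm)
print(f"Z = {z:.2f} -> {classify(z)}")
```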
9.
A study of promotion and attrition of mid-grade officers in the U.S. Marine Corps: are assignments a key factor? Morgan, Jerry R. 03 1900.
Approved for public release; distribution is unlimited / This study analyzes the relationship between selection to major in the Marine Corps and the survival of mid-grade officers to the promotion point of major, by investigating the effects of billet assignments. Specifically, this study looks at the influence of the percentage of time spent in the Fleet Marine Forces (FMF), the percentage of time spent in primary military occupational specialty (PMOS) billet assignments, and the effect of having served in combat, recruiting, security forces, joint, and drill field duties. Models were formulated using the groundwork established in previous promotion, retention, and attrition studies, and assignment variables were then introduced to the models. To account for officers' choice of continued service vice forced attrition, the sample was restricted to officers who had attained five years of service. Probit regression was used to find the influence of career assignments on the probability of selection; Heckman's correction was used to control for self-selection bias; and Cox proportional-hazards regression was used, utilizing the same assignment factors, to find the influence of assignments on the likelihood of attrition. The findings indicated that FMF and PMOS ratios above 60 percent had a negative effect on promotion and retention. They also indicated that time spent outside the PMOS, in "B" billets, had a positive effect on retention. In a time of budgetary constraints, this information may assist personnel planners as an alternative to the pecuniary measures used to maintain and shape the force. / Major, United States Marine Corps
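The three-step pipeline described above (a probit selection model, a Heckman-style correction via the inverse Mills ratio, and a Cox proportional-hazards model for attrition) can be sketched on synthetic data as follows; the variable names and the placement of the correction term are illustrative assumptions, not the study's exact specification.

```python
# Hedged sketch: probit selection, inverse Mills ratio, and Cox PH attrition model.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(11)
n = 800
df = pd.DataFrame({
    "fmf_ratio": rng.uniform(0, 1, n),     # share of career time in the Fleet Marine Forces
    "pmos_ratio": rng.uniform(0, 1, n),    # share of career time in PMOS billets
    "b_billet": rng.binomial(1, 0.3, n),   # served outside the PMOS in a "B" billet
})

# Stage 1: probit model for selection to major.
z = 0.8 - 1.0 * (df["fmf_ratio"] > 0.6) - 0.8 * (df["pmos_ratio"] > 0.6) + rng.normal(size=n)
df["selected"] = (z > 0).astype(int)
X1 = sm.add_constant(df[["fmf_ratio", "pmos_ratio", "b_billet"]])
probit = sm.Probit(df["selected"], X1).fit(disp=False)

# Stage 2: inverse Mills ratio from the probit index (Heckman-style correction term).
xb = X1 @ probit.params
df["imr"] = norm.pdf(xb) / norm.cdf(xb)

# Stage 3: Cox proportional-hazards model for time to attrition after the five-year mark.
df["months"] = rng.exponential(scale=60 / (1 + df["b_billet"]), size=n)
df["attrited"] = rng.binomial(1, 0.7, n)   # 1 = attrition observed, 0 = censored
cox = sm.PHReg(df["months"], df[["fmf_ratio", "pmos_ratio", "b_billet", "imr"]],
               status=df["attrited"]).fit()
print(probit.params)
print(cox.summary())
```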