Global ETD Search

1	A Framework to Interpret Nonstandard Log-Linear Models Mair, Patrick January 2007 The formulation of log-linear models within the framework of Generalized Linear Models offers new possibilities in modeling categorical data. The resulting models are not restricted to the analysis of contingency tables in terms of ordinary hierarchical interactions; such models form the family of nonstandard log-linear models. A problem that can arise is an ambiguous interpretation of the parameters. In the current paper, this problem is solved by examining the effects coded in the design matrix and determining the numerical contribution of single effects. Based on these results, stepwise approaches are proposed to achieve parsimonious models. In addition, testing strategies are presented for testing such (possibly non-nested) models against each other. As a result, a complete interpretation framework is developed for examining nonstandard log-linear models in depth.
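In a log-linear model written as a GLM, log μ = Xβ, so each cell's fitted log-count is a sum of design-matrix entries times parameters. The sketch below illustrates one simple reading of "the numerical contribution of single effects": splitting each cell's linear predictor into the additive terms x_ij β_j. It assumes the design matrix and parameter estimates are already available; the 2x2 table, the "diag" effect, and the parameter values are hypothetical and are not the paper's exact procedure.

```python
import math

def effect_contributions(X, beta, names):
    """Split each cell's linear predictor log(mu_i) = sum_j X[i][j] * beta[j]
    into the additive contribution of each coded effect."""
    rows = []
    for x_row in X:
        rows.append({name: x * b for name, x, b in zip(names, x_row, beta)})
    return rows

# Hypothetical 2x2 table: intercept, two main effects, and a nonstandard
# "diag" effect that is 1 only for cells on the main diagonal.
names = ["const", "row", "col", "diag"]
X = [
    [1, 0, 0, 1],  # cell (1,1)
    [1, 0, 1, 0],  # cell (1,2)
    [1, 1, 0, 0],  # cell (2,1)
    [1, 1, 1, 1],  # cell (2,2)
]
beta = [2.0, 0.5, -0.3, 0.8]  # illustrative parameter values only

for cell, contrib in zip(["11", "12", "21", "22"], effect_contributions(X, beta, names)):
    eta = sum(contrib.values())  # linear predictor = log fitted count
    print(cell, contrib, "fitted count:", round(math.exp(eta), 3))
```

Because the contributions are additive on the log scale, the effect of a single coded term (here "diag") can be read off directly, even when the design matrix does not follow a hierarchical coding.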
2	Logistic Regression Analysis to Determine the Significant Factors Associated with Substance Abuse in School-Aged Children Maxwell, Kori Lloyd Hugh 17 April 2009 Substance abuse is the overindulgence in, and dependence on, a drug or chemical, leading to detrimental effects on the individual’s health and on the welfare of those around him or her. Logistic regression analysis is an important tool for analyzing the relationship between explanatory variables and nominal response variables. The objective of this study is to use this statistical method to determine the factors that are significant contributors to the use or abuse of substances in school-aged children, and to determine what measures can be implemented to minimize their effect. Logistic regression models were built for the three main types of substances considered in this study (tobacco, alcohol, and drugs), which facilitated the identification of the significant factors that appear to influence their use in children. Ordinal regression Logistic regression Residual plots Factor analysis Principal component analysis Stepwise selection Mathematics
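Stepwise selection for logistic regression can be sketched as greedy forward selection. The abstract does not state the entry criterion, so this minimal sketch assumes AIC and fits each candidate model by Newton-Raphson maximum likelihood in pure Python. The data, the predictor names x1 and x2, and the helper functions are hypothetical; a real analysis would normally use a statistics package.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton-Raphson; X includes an intercept column.
    Returns (coefficients, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # X^T W X (observed information)
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik

def forward_stepwise(columns, y):
    """Add the predictor that lowers AIC most; stop when none lowers it."""
    n = len(y)

    def aic(names):
        X = [[1.0] + [columns[c][i] for c in names] for i in range(n)]
        _, loglik = fit_logistic(X, y)
        return 2 * (len(names) + 1) - 2 * loglik

    selected, best = [], aic([])
    while True:
        scores = {c: aic(selected + [c]) for c in columns if c not in selected}
        if not scores:
            break
        cand = min(scores, key=scores.get)
        if scores[cand] >= best:
            break
        selected.append(cand)
        best = scores[cand]
    return selected, best

# Hypothetical data: x1 is related to y; x2 is balanced noise.
cells = [(0, 0, 6), (0, 1, 2), (1, 0, 2), (1, 1, 6)]  # (x1, y, count)
x1, x2, y = [], [], []
for a, outcome, count in cells:
    for i in range(count):
        x1.append(float(a))
        x2.append(1.0 if i % 2 == 0 else -1.0)
        y.append(outcome)

selected, aic_value = forward_stepwise({"x1": x1, "x2": x2}, y)
print(selected)  # ['x1']
```

Forward selection is only one variant; backward elimination or bidirectional stepping, or p-value thresholds instead of AIC, would change which models are visited.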
4	An Analysis of Wind Power Plant Site Prospecting in the Central United States Carlos, Mark E. 01 December 2010 Rapid deployment of terrestrial wind power plants (WPPs) depends on accurate identification of areas suitable for WPPs. Efficient WPP site prospecting not only decreases installation lead time but also reduces site selection expenses and provides faster reductions in greenhouse gas emissions. Combining conventional predictor variables, such as wind strength and proximity to transmission lines, with nonconventional socioeconomic and demographic predictor variables will result in improved identification of suitable counties for WPPs and therefore accelerate the site prospecting phase of wind power plant deployment. Existing and under-construction American terrestrial WPPs located in the 12 windiest states (230 as of June 2009), together with 178 potential county-level predictor variables, are analyzed using logistic regression with stepwise selection and a random-sampling validation methodology to identify influential predictor variables. In addition to the wind resource and proximity to electricity transmission lines, the existence of a Renewable Portfolio Standard, the population density within a 200-mile radius of the county center, median home values, and farmland area in the county are the four strongest nonconventional predictors (Hosmer-Lemeshow chi-square = 9.1250, N = 1009, df = 8, p = 0.3319, -2 log-likelihood = 619.521). Evaluation of the final model using multiple statistics, including the Heidke skill score (0.2647), confirms the model's overall predictive skill. The model identifies 238 suitable counties in the twelve-state region that do not have WPPs (~73% validated overall accuracy) and eliminates 654 counties that are not classified as suitable for WPPs. These 238 counties are ideal candidates for further exploration of WPP development and possible transmission line construction.
The results of this study will therefore allow faster integration of renewable energy sources and limit climate change impacts from increasing atmospheric greenhouse gas concentrations. bagging GIS logistic regression stepwise selection wind energy wind power plant
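The Heidke skill score measures how much better a yes/no classification is than the agreement expected by chance: 1 is perfect, 0 is no better than chance. A minimal sketch, assuming the usual 2x2 layout of hits, false alarms, misses, and correct negatives; the counts in the usage line are hypothetical and are not taken from the thesis.

```python
def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """Heidke skill score for a 2x2 table of predicted vs. observed outcomes."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    n = a + b + c + d
    # Number of correct classifications expected by chance from the marginals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n
    return (a + d - expected) / (n - expected)

# Hypothetical counts: predicted suitable vs. actually hosting a WPP.
print(round(heidke_skill_score(2, 2, 1, 5), 4))  # 0.3478
```

The same value follows from the equivalent form 2(ad - bc) / [(a + c)(c + d) + (a + b)(b + d)], here 16/46.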
5	Financial Risk Profiling using Logistic Regression Emfevid, Lovisa, Nyquist, Hampus January 2018 As automation in the financial services industry continues to advance, online investment advice has emerged as an exciting new field. Vital to the accuracy of such a service is determining an individual investor’s ability to bear financial risk. To do so, the statistical method of logistic regression is used. The aim of this thesis is to identify factors that are significant in determining the financial risk profile of a retail investor. In other words, the study seeks to map the relationship between several socioeconomic and psychometric variables in order to develop a predictive model able to determine the risk profile. The analysis is based on survey data from respondents living in Sweden. The main findings are that variables such as income, consumption rate, experience of a financial bear market, and various psychometric variables are significant in determining a financial risk profile. Logistic regression principal component analysis stepwise selection cross Computational Mathematics
6	Analysis of Factors Affecting Motorcycle-Motor Vehicle Crash Characteristics Zhu, Di 26 August 2014 No description available. Civil Engineering Transportation Motorcycle Influential Factors Traffic Crash Logistic Regression Stepwise Selection Traffic Safety
7	Three Essays on the Evolution of the Determinants of Educational Attainment and its Consequences Arafat, Md Yasin 07 February 2019 The dissertation focuses on the different determinants of education, their effects on educational outcomes, and the overall effect of education on lifetime consequences. The first chapter focuses on the inequality of educational opportunity across different demographic factors. This chapter employs a broader set of social factors to provide fresh insights into inequality in the USA relative to the extant literature. The chapter uses polynomial trends for the effects of social factors to identify long-term trends in the determinants of differences in attainment of each of four achievements (high school graduation, some college, college graduation, and post-college work) across different endogenous social groups. Using Panel Study of Income Dynamics (PSID) data for 1968-2013, we show how inequality of educational opportunity and its determinants have evolved over the years. The chapter uses a machine-learning process and a logistic regression model to identify inequality of opportunity. The second chapter examines the age distribution of graduates across cohorts from 1940 until 1990. Using the PSID data, it explores the first and second moments of the age at graduation from high school and college across the US. Because of deficiencies in the data, a large part of the chapter is devoted to data preparation. The chapter provides a unique method of extracting individuals’ graduating age from both high school and college. The results show a large dispersion across the full sample. When the data are truncated to a standard length, however, both the dispersion and the moments are much smaller. The chapter concludes that, over time, people tend to attain education at a younger age.
The third chapter investigates trends in the contributions of different factors to income, starting from the 1910 cohort. Following Mincer (1974), a wave of papers studied how various factors contribute to individuals’ earnings. This paper contributes to that literature in three ways: (i) using the PSID data, it computes the actual working experience of individuals, (ii) unlike existing papers, it studies cohorts born in 1910 or later, and (iii) it adds two variables (technological progress and the occupation in which individuals start their careers) to an extended Mincerian equation. The results re-emphasize the importance of education in lifetime earnings. The results also show that while some determinants of income have become more important over the years, others have not changed much in importance. / PHD / The reason for choosing the theme ‘Evolution of the Determinants of Educational Attainment and its Consequences’ was to investigate the different determinants of education, their effects on educational outcomes, and the overall effect of education on lifetime consequences. Education is considered one of the tools to eradicate poverty. Yet countries with high educational coverage continue to suffer from poverty, one reason being greater inequality of opportunity. In the first chapter, entitled ‘Inequality in Educational Opportunity in the United States’, inequality of opportunity in education is illustrated. Much inequality stems from differences in educational attainment. A lack of educational attainment puts an individual behind in the career race even before the race has started. While individuals are responsible for some of the differences in educational attainment, factors outside their control play substantial roles. The inequality that arises from these factors is known as inequality of opportunity.
This paper focuses on inequality of educational opportunity across socioeconomic background, race, and sex. The factors analyzed for their contributions to inequality of educational opportunity are father’s education, father’s occupation, mother’s education, and the economic status of the individual’s family. The results show that inequality of opportunity has declined consistently for high school completion. Inequality of opportunity (IO) in obtaining some college education declined for the bottom two social groups and remained persistent for the relatively more advantaged group. For college and post-college education, IO is much lower and has, in general, remained persistent across the social strata. Although females were behind males, given equal opportunity, regardless of race and socioeconomic status in the early and mid-twentieth century, the situation reversed in the late twentieth century. In terms of educational disparity among races, African Americans trail their White counterparts in all years. The second chapter, ‘First and Second Moments of the Age Distributions of Graduates’, looks into the age characteristics (mean and variance) of graduating from high school and college across cohorts from the 1940s to the 1990s. The idea for the paper came largely from the first chapter of the dissertation, as we assumed that a lack of opportunity at an earlier age could delay educational attainment. The paper aims to find the average age of graduation over the years. In the process, it puts forward a method to extract the age of graduation from the Panel Study of Income Dynamics (PSID) data, since the database does not readily provide this information. The chapter concludes that, over time, people tend to attain education at a much younger age.
Titled ‘Factors Affecting Income: Education, Experience, and Beyond’, the third chapter investigates the contribution of different factors (education, experience, parental endowments, and labor market conditions) to income using the PSID data and compares recent patterns with the past. This paper focuses on the trend in the rate of return to different factors of income across two cohorts (those born between 1910 and 1950, and those born after 1950) while identifying changes in the returns to the same education level over time. The paper aims to find out how the contributions of the different factors of earnings have changed in the USA over the years. The paper also examines the role of technological progress in reducing earnings gaps across social groups. The results re-emphasize the importance of education in lifetime earnings. Experience has become a more important factor of income over the years. The chapter also suggests that an individual’s income is a monotonic function of socioeconomic endowments and that better endowments resulted in higher returns. Lastly, the chapter finds that technological investment is progressive. Inequality of Opportunity Education Social Factors Stepwise Selection Logistic Regression Model Machine-Learning Process Age Distribution Graduation Income Decomposition
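For reference, the standard Mincer (1974) earnings function regresses log earnings on years of schooling and a quadratic in work experience. The block below sketches one way an extended version with the two added variables could look; the symbols T_i (a technological-progress measure) and O_i (indicators for the occupation in which a career starts), and their simple linear entry, are assumptions for illustration, not the dissertation's exact specification.

```latex
\ln y_i = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2
          + \gamma\, T_i + \boldsymbol{\delta}^{\top} \mathbf{O}_i + \varepsilon_i
```

Here y_i is earnings, S_i years of schooling, X_i actual work experience (computed from the PSID in the dissertation), and β_1 is the usual approximate return to an additional year of schooling.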
8	Evaluation of machine learning methods for direct marketing within the FMCG trade Sundström, Ebba, Goodbrand Skagerlind, Valentin January 2020 Companies within the FMCG trade often use database marketing to customize offers to each customer, thereby strengthening customer relationships and increasing sales. For a long time, logistic regression has been a common method for building machine learning models that predict which offers will be redeemed by which customers. This study evaluates a machine learning model based on logistic regression and stepwise selection, using customer data from one of Sweden’s larger companies within the FMCG trade. The model is compared with a model based on the elastic net, a regularized regression method. The models are tested on five different products from the company’s assortment and are based on about fifty variables describing customers’ sociodemographic characteristics and purchasing history in the company’s stores. They are evaluated using a confusion matrix and their Accuracy, Balanced Accuracy, Precision, Recall, and F1-score. The model is also evaluated from the perspectives of business value, customer relationships, and sustainability. The study found that the logistic regression model with stepwise selection had an average Precision of 23 percent. Using the elastic net increased Precision by an average of 7 percentage points across all models. This may be because some of the parameters in the stepwise selection model took on excessively large values, and because stepwise selection chose variables that were not optimal for predicting customer behavior. It was also noted that customers generally seemed satisfied with the offers they received, but were dissatisfied if they felt misunderstood by the company. Machine Learning Prediction Direct Marketing Database Marketing Logistic Regression Elastic Net Stepwise Selection Customer Behavior Computer and Information Sciences
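The evaluation metrics named above all derive from the four cells of a binary confusion matrix. A minimal sketch, assuming label 1 means "offer redeemed"; the label vectors are hypothetical. Precision is the share of predicted redemptions that were actual redemptions, so a Precision of 23 percent means that roughly 23 of every 100 customers predicted to redeem an offer actually did.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for binary labels (1 = redeemed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (recall + specificity) / 2,  # mean of the per-class recalls
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels: 3 actual redemptions among 10 customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With imbalanced classes, as is typical when few offers are redeemed, Accuracy is dominated by the large negative class, which is why Balanced Accuracy and Precision are more informative here.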
9	Covariate Model Building in Nonlinear Mixed Effects Models Ribbing, Jakob January 2007 Population pharmacokinetic-pharmacodynamic (PK-PD) models can be fitted using nonlinear mixed effects modelling (NONMEM). This is an efficient way of learning about drugs and diseases from data collected in clinical trials. Identifying covariates that explain differences between patients is important for discovering patient subpopulations at risk of sub-therapeutic or toxic effects and for individualizing treatment. Stepwise covariate modelling (SCM) is commonly used to this end. The aim of this thesis work was to evaluate SCM and to develop alternative approaches. A further aim was to develop a mechanistic PK-PD model describing fasting plasma glucose, fasting insulin, insulin sensitivity, and beta-cell mass.
The lasso is a penalized estimation method that performs covariate selection simultaneously with shrinkage estimation. The lasso was implemented within NONMEM as an alternative to SCM and is discussed in comparison with that method. Further, various ways of incorporating information and propagating knowledge from previous studies into an analysis were investigated. To compare the different approaches, investigations were made under varying, replicated conditions. In the course of the investigations, more than one million NONMEM analyses were performed on simulated data. Because of selection bias, SCM performed poorly when analysing small datasets or rare subgroups. In these situations, the lasso method in NONMEM performed better, was faster, and additionally validated the covariate model. Alternatively, the performance of SCM can be improved by propagating knowledge or incorporating information from previously analysed studies and by population optimal design.
A model was also developed on a physiological/mechanistic basis to fit data from three phase II/III studies on the investigational drug tesaglitazar. This model described fasting glucose and insulin levels well, despite heterogeneous patient groups ranging from non-diabetic insulin-resistant subjects to patients with advanced diabetes. The model predictions of beta-cell mass and insulin sensitivity were in good agreement with values in the literature. Pharmacokinetics/Pharmacotherapy Pharmacokinetics Pharmacodynamics Modeling Covariate selection Stepwise selection Covariate analysis Methodology Model validation Model evaluation Type-2 diabetes Beta-cell function Meta analysis Cross-validation Pharmacometrics ED optimization
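The thesis's NONMEM implementation is not reproduced here. As a generic illustration of how the lasso performs covariate selection and shrinkage at the same time, this minimal sketch uses cyclic coordinate descent with soft-thresholding for an ordinary linear model, not a nonlinear mixed effects model. The centred data, the penalty value, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.
    Assumes centred y and centred columns of X (no intercept term)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b for the starting value b = 0
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(x * x for x in col) / n
            # Correlation of column j with the partial residual (b_j added back).
            rho = sum(x * (r + x * b[j]) for x, r in zip(col, resid)) / n
            new_bj = soft_threshold(rho, lam) / scale
            delta = new_bj - b[j]
            resid = [r - x * delta for r, x in zip(resid, col)]  # keep residual in sync
            b[j] = new_bj
    return b

# Hypothetical centred data: covariate 1 has a strong effect, covariate 2 a weak one.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.2, 2.8, -2.8, -3.2]
print(lasso_coordinate_descent(X, y, lam=0.5))  # approximately [2.5, 0.0]
```

With lam = 0 the result is the least-squares fit (about 3.0 and 0.2); with lam = 0.5 the strong coefficient is shrunk to 2.5 and the weak covariate is dropped exactly, which is the combined selection-and-shrinkage behaviour described above.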
