341

Credit Scoring Methods And Accuracy Ratio

Iscanoglu, Aysegul 01 August 2005 (has links) (PDF)
Credit scoring with classification techniques enables quick and straightforward lending decisions. However, no definite consensus has been reached on the best method for credit scoring or on the conditions under which each method performs best. Although a wide range of classification techniques has been used in this area, logistic regression has been regarded as an important tool and is used very widely in studies. This study examines the accuracy and bias of parameter estimation in logistic regression using Monte Carlo simulations, in four respects: the dimension of the data sets, their length, the percentage of defaults included in the data, and the effect of the variables on estimation. Moreover, several important statistical and non-statistical methods are applied to Turkish credit default data, and their accuracies are compared for the Turkish market. Finally, the results of the best method are rated using the receiver operating characteristic curve.
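A minimal sketch of this kind of evaluation, assuming scikit-learn and synthetic data in place of the Turkish credit default data: fit a logistic regression scorecard and compute the accuracy ratio from the ROC curve (for a binary default indicator, the accuracy ratio equals 2*AUC - 1).

```python
# Hedged sketch (synthetic data, scikit-learn): score with logistic regression,
# then compute the accuracy ratio as 2*AUC - 1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))                       # borrower features (invented)
logits = 0.8 * X[:, 0] - 1.2 * X[:, 1] - 2.0      # assumed true scoring rule
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))    # 1 = default

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
pd_hat = model.predict_proba(X_te)[:, 1]          # estimated default probabilities

auc = roc_auc_score(y_te, pd_hat)
print(f"AUC = {auc:.3f}, accuracy ratio = {2 * auc - 1:.3f}")
```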
342

Non-global regression modelling

Huang, Yunkai 21 June 2016 (has links)
In this dissertation, a new non-global regression model - the partial linear threshold regression model (PLTRM) - is proposed. Various issues related to the PLTRM are discussed. In the first main section of the dissertation (Chapter 2), we define what is meant by the term “non-global regression model”, and we provide a brief review of the current literature associated with such models. In particular, we focus on their advantages and disadvantages in terms of their statistical properties. Because there are some weaknesses in the existing non-global regression models, we propose the PLTRM. The PLTRM combines non-parametric modelling with the traditional threshold regression models (TRMs), and hence can be thought of as an extension of the latter models. We verify the performance of the PLTRM through a series of Monte Carlo simulation experiments. These experiments use a simulated data set that exhibits partial linear and partial nonlinear characteristics, and the PLTRM outperforms several competing parametric and non-parametric models in terms of the Mean Squared Error (MSE) of the within-sample fit. In the second main section of this dissertation (Chapter 3), we propose a method of estimation for the PLTRM. This requires estimating the parameters of the parametric part of the model; estimating the threshold; and fitting the non-parametric component of the model. An “unbalanced penalized least squares” approach is used. This involves using restricted penalized regression spline and smoothing spline techniques for the non-parametric component of the model; the least squares method for the linear parametric part of the model; together with a search procedure to estimate the threshold value. This estimation procedure is discussed for three mutually exclusive situations, which are classified according to the way in which the two components of the PLTRM “join” at the threshold. Bootstrap sampling distributions of the estimators are provided using the parametric bootstrap technique. The various estimators appear to have good sampling properties in most of the situations that are considered. Inference issues such as hypothesis testing and confidence interval construction for the PLTRM are also investigated. In the third main section of the dissertation (Chapter 4), we illustrate the usefulness of the PLTRM, and the application of the proposed estimation methods, by modelling various real-world data sets. These examples demonstrate both the good statistical performance, and the great application potential, of the PLTRM.
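The following sketch illustrates only the threshold-search idea behind such a model, not the thesis's unbalanced penalized least squares estimator: for each candidate threshold, fit a linear piece on one side and a smoothing spline on the other, and keep the threshold with the smallest total squared error (NumPy and SciPy; all data are simulated).

```python
# Simplified threshold search for a partial linear / partial nonlinear fit.
# Illustrative only; the PLTRM estimator in the thesis is more elaborate.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 400))
# linear below x=5, nonlinear above, plus noise
y = np.where(x < 5, 1.0 + 0.5 * x, np.sin(x) + 3.5) + rng.normal(0, 0.2, x.size)

def total_sse(t):
    left, right = x < t, x >= t
    # linear fit below the candidate threshold
    beta = np.polyfit(x[left], y[left], 1)
    sse = np.sum((y[left] - np.polyval(beta, x[left])) ** 2)
    # smoothing spline above it
    spl = UnivariateSpline(x[right], y[right], s=x[right].size * 0.05)
    return sse + np.sum((y[right] - spl(x[right])) ** 2)

candidates = np.linspace(2, 8, 61)
t_hat = candidates[np.argmin([total_sse(t) for t in candidates])]
print(f"estimated threshold: {t_hat:.2f}")
```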
343

Klassificering av köp på betalda sökannonser / Classification of purchases in paid search advertising

Åkesson, Lisa, Henningsson, Denise January 2016 (has links)
The data consultancy company Knowit AB has a client who advertises on Google AdWords. This thesis focuses mainly on finding which settings in AdWords generate purchases of the client's product. If a setting frequently contributes to clicks but rarely to purchases of the product, the setting is not profitable. The target variable in this thesis is binary and indicates whether a click on the advertisement led to a purchase of the product or not. Since the target variable's distribution was skewed, the sampling technique SMOTE was used to create more observations in the minority class. The classification methods researched and presented in this thesis are logistic regression, neural networks and decision trees. The results showed that all four settings examined have a significant effect on the probability of purchase. First, if a desktop or laptop computer was used to search on Google, the likelihood that a click leads to a purchase is substantially higher than if a mobile or tablet was used. Second, an “exact match” setting for the keywords gives the highest probability of purchase and a “broad match” gives the lowest. Third, purchase rates are also affected by the day of the week: Sunday has the highest probability of purchase, while Saturday and Tuesday have the lowest. Finally, an advertisement's average position affects the likelihood of the product being purchased: the higher the value of the average position, the higher the likelihood of purchase.
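A minimal sketch of the pipeline described above, assuming scikit-learn and the imbalanced-learn package, with invented synthetic click data: oversample the rare purchase class with SMOTE, then fit a logistic regression.

```python
# Sketch assuming scikit-learn and imbalanced-learn; the features and their
# encoding are invented (real use would one-hot encode the categorical ones).
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 10_000
X = np.column_stack([
    rng.integers(0, 3, n),      # device: 0=desktop, 1=mobile, 2=tablet
    rng.integers(0, 3, n),      # match type: 0=exact, 1=phrase, 2=broad
    rng.integers(0, 7, n),      # weekday
    rng.uniform(1, 5, n),       # average ad position
])
p = 1 / (1 + np.exp(-(-4.0 + 0.8 * (X[:, 0] == 0) - 0.5 * X[:, 1])))
y = rng.binomial(1, p)          # 1 = click led to a purchase (rare class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # balance classes
clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print("test AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```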
344

The Applications of Regression Analysis in Auditing and Computer Systems

Hubbard, Larry D. 05 1900 (has links)
This thesis describes regression analysis and shows how it can be used in account auditing and in computer system performance analysis. The study first introduces regression analysis techniques and statistics. Then, the use of regression analysis in auditing to detect "out of line" accounts and to determine audit sample size is discussed. These applications led to the concept of using regression analysis to predict job completion times in a computer system. The feasibility of this application of regression analysis was tested by constructing a predictive model to estimate job completion times using a computer system simulator. The predictive model's performance for the various job streams simulated shows that job completion time prediction is a feasible application for regression analysis.
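A toy version of the final application, with invented workload features: regress job completion time on job characteristics by ordinary least squares and predict a new job's completion time (NumPy only).

```python
# Hedged sketch: OLS prediction of job completion times from simulated
# workload data; the feature names are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
n = 200
cpu_seconds = rng.uniform(1, 100, n)                 # CPU time requested
io_requests = rng.uniform(10, 1000, n)               # number of I/O requests
completion = 5 + 1.8 * cpu_seconds + 0.02 * io_requests + rng.normal(0, 4, n)

# ordinary least squares via numpy's least-squares solver
X = np.column_stack([np.ones(n), cpu_seconds, io_requests])
beta, *_ = np.linalg.lstsq(X, completion, rcond=None)

new_job = np.array([1.0, 50.0, 400.0])               # intercept, cpu, io
print(f"predicted completion time: {new_job @ beta:.1f}")
```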
345

Nonparametric statistical inference for dependent censored data

El Ghouch, Anouar 05 October 2007 (has links)
A frequent problem that appears in practical survival data analysis is censoring. A censored observation occurs when observation of the event time (duration or survival time) is prevented by the occurrence of an earlier competing event (censoring time). Censoring may be due to different causes, for example the loss of subjects under study, the end of the follow-up period, drop-out, the termination of the study, or the limited sensitivity of a measurement instrument. The literature on censored data focuses on the i.i.d. case. However, in many real applications the data are collected sequentially in time or space, and the assumption of independence then does not hold. Here we give only some typical examples from the literature involving correlated data subject to censoring. In clinical trials it frequently happens that patients from the same hospital have correlated survival times due to unmeasured variables such as the quality of the hospital equipment. Censored correlated data are also a common problem in environmental and spatial (geographical or ecological) statistics. In fact, due to the process used in the data sampling procedure, e.g. the analytical equipment, only measurements which exceed some thresholds, for example the method detection limits or the instrumental detection limits, can be included in the data analysis. Many other examples can be found in fields like econometrics and financial statistics: observations on the duration of unemployment, for example, may be right censored and are typically correlated. When the data are not independent and are subject to censoring, estimation and inference become more challenging mathematical problems with a wide area of applications. In this context, we propose some new and flexible tools based on a nonparametric approach. More precisely, allowing dependence between individuals, our main contribution concerns the following aspects. First, we develop more suitable confidence intervals for a general class of functionals of a survival distribution via the empirical likelihood method. Secondly, we study the problem of conditional mean estimation using the local linear technique. Thirdly, we develop and study a new estimator of the conditional quantile function, also based on the local linear method. For each proposed method, asymptotic results such as consistency and asymptotic normality are derived, and the finite sample performance is evaluated in a simulation study.
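As a point of reference, here is a bare-bones local linear estimator of a conditional mean, the building block behind the second contribution; the censoring weights and dependence corrections developed in the thesis are omitted (NumPy, simulated data).

```python
# Local linear estimation of m(x) = E[Y|X=x] with a Gaussian kernel.
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear fit at x0 with bandwidth h; returns the intercept m(x0)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])  # local design matrix
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]

rng = np.random.default_rng(4)
x = rng.uniform(0, 2 * np.pi, 300)
y = np.sin(x) + rng.normal(0, 0.3, x.size)
grid = np.linspace(0.5, 5.5, 5)
print([round(local_linear(g, x, y, h=0.4), 2) for g in grid])
```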
347

Inkrementell responsanalys : Vilka kunder bör väljas vid riktad marknadsföring? / Incremental response analysis : Which customers should be selected in direct marketing?

Karlsson, Jonas, Karlsson, Roger January 2013 (has links)
If customers respond differently to a campaign, it is worthwhile to find those customers who respond most positively and direct the campaign towards them. This can be done by using so-called incremental response analysis, where respondents from a campaign are compared with respondents from a control group. Customers with the highest increased response from the campaign will be selected, which may increase the company's return. Incremental response analysis is applied to the mobile operator Tre's historical data. The thesis investigates which method best explains the incremental response, that is, which method best identifies those of Tre's customers who give the highest incremental response, and which characteristics are important. The analysis is based on various classification methods such as logistic regression, Lasso regression and decision trees. RMSE, the root mean square error of the deviation between observed and predicted incremental response, is used to measure the prediction error. The classification methods are evaluated with the Hosmer-Lemeshow test and AUC (Area Under the Curve). Bayesian logistic regression is also used to examine the uncertainty in the parameter estimates. The Lasso regression performs best compared to the decision tree, the ordinary logistic regression and the Bayesian logistic regression in terms of predicted incremental response. Variables that significantly affect the incremental response according to the Lasso regression are age and how long the customer has had their subscription.
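A hedged sketch of one standard way to estimate incremental response, the two-model approach: fit separate response models on the campaign and control groups and score customers by the difference in predicted probabilities (scikit-learn, simulated data; the thesis's exact modeling choices may differ).

```python
# Two-model uplift sketch: treatment model minus control model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 8000
X = np.column_stack([rng.uniform(18, 80, n),        # age
                     rng.uniform(0, 120, n)])       # months as subscriber
treated = rng.binomial(1, 0.5, n)                   # 1 = received campaign
base = 1 / (1 + np.exp(-(-2 + 0.02 * X[:, 1])))
uplift = 0.10 * (X[:, 0] < 40)                      # younger customers react
y = rng.binomial(1, np.clip(base + treated * uplift, 0, 1))

m_t = LogisticRegression().fit(X[treated == 1], y[treated == 1])
m_c = LogisticRegression().fit(X[treated == 0], y[treated == 0])
incremental = m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]
top = np.argsort(-incremental)[:1000]               # customers to target
print("mean predicted uplift in targeted group:", incremental[top].mean().round(3))
```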
348

Regression approach to software reliability models

Mostafa, Abdelelah M 01 June 2006 (has links)
Many software reliability growth models have been analyzed for measuring the growth of software reliability. In this dissertation, regression methods are explored to study software reliability models. First, two parametric linear models are proposed and analyzed: the simple linear regression and a transformed linear regression corresponding to a power law process. Some software failure data sets do not follow the linear pattern. Analysis of popular real-life data showed that these contain outliers and leverage values. Linear regression methods based on least squares are sensitive to outliers and leverage values. Even though the parametric regression methods give good results in terms of error measurement criteria, these results may not be accurate due to violation of the parametric assumptions. To overcome these difficulties, nonparametric regression methods based on ranks are proposed as alternative techniques to build software reliability models. In particular, monotone regression and rank regression methods are used to evaluate the predictive capability of the models. These models are applied to real-life data sets from various projects as well as to diverse simulated data sets. Both the monotone and the rank regression methods are robust procedures that are less sensitive to outliers and leverage values. In particular, the regression approach explains predictive properties of the mean time to failure for modeling the patterns of software failure times. In order to decide on model preference and to assess the predictive accuracy of the mean-time-between-failures estimates for the given data sets, the following error measurement criteria are used: the mean square error, mean absolute difference, mean magnitude of relative error, mean magnitude of error relative to the estimate, median of the absolute residuals, and a measure of dispersion. The methods proposed in this dissertation, when applied to real software failure data, give less error in terms of all the measurement criteria compared to other popular methods from the literature. Experimental results show that the regression approach offers a very promising technique in software reliability growth modeling and prediction.
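As an illustration of the monotone regression idea on simulated failure data, the sketch below uses scikit-learn's isotonic regression as a stand-in implementation; the dissertation's own estimators and data are not reproduced here.

```python
# Monotone (isotonic) fit of mean time between failures against failure index,
# robust to a couple of injected outliers; all data simulated.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(6)
failure_index = np.arange(1, 51)                    # 1st, 2nd, ... failure
# power-law-like growth of time between failures, with noise
mtbf = 2.0 * failure_index ** 0.6 + rng.normal(0, 2, 50)
mtbf[[10, 30]] += 25                                # injected outliers

iso = IsotonicRegression(increasing=True)
mtbf_fit = iso.fit_transform(failure_index, mtbf)   # monotone fitted MTBF
print("fitted MTBF at failures 10, 25, 50:", mtbf_fit[[9, 24, 49]].round(1))
```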
349

Modely s proměnlivými koeficienty / Varying coefficient models

Sekera, Michal January 2017 (has links)
The aim of this thesis is to provide an overview of varying coefficient models, a class of regression models that allow the coefficients to vary as functions of random variables. This concept is described for independent samples, longitudinal data, and time series. Estimation methods include polynomial spline, smoothing spline, and local polynomial methods for models of a linear form, and the local maximum likelihood method for models of a generalized linear form. The statistical properties focus on the consistency and asymptotic distribution of the estimators. The numerical study compares the finite sample performance of the estimators of the coefficient functions.
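A compact sketch of local least squares estimation for a varying coefficient model y = a(u) + b(u)x + noise: at each point u0, fit a kernel-weighted linear regression in x (NumPy, simulated data).

```python
# Local estimation of varying coefficients a(u), b(u) with a Gaussian kernel.
import numpy as np

rng = np.random.default_rng(7)
n = 500
u = rng.uniform(0, 1, n)                            # index variable
x = rng.normal(size=n)
y = np.sin(2 * np.pi * u) * x + u + rng.normal(0, 0.2, n)  # b(u)=sin(2*pi*u)

def varying_coef(u0, h=0.08):
    w = np.exp(-0.5 * ((u - u0) / h) ** 2)          # kernel weights in u
    X = np.column_stack([np.ones(n), x])
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # [a(u0), b(u0)]

for u0 in (0.25, 0.5, 0.75):
    a_hat, b_hat = varying_coef(u0)
    print(f"u={u0}: b_hat={b_hat:.2f}, true b={np.sin(2 * np.pi * u0):.2f}")
```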
350

Regression and time estimation in the manufacturing industry

Bjernulf, Walter January 2023 (has links)
In this thesis, an analysis is performed on operation times for different-sized products at a manufacturing company. The thesis introduces and summarises most of the theory needed to perform regression and covers a worked example in which three different regression models are learned, evaluated and analysed. Conformal prediction, currently a hot topic in machine learning, is also introduced and used in the worked example.
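A minimal split conformal prediction sketch of the kind mentioned above, assuming scikit-learn and invented operation-time data: calibrate an interval width from absolute residuals on a held-out split.

```python
# Split conformal prediction: residual quantile on a calibration set gives a
# distribution-free prediction interval for a new product's operation time.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 1000
size = rng.uniform(1, 10, n)                        # product size (invented)
op_time = 3 + 2.5 * size + rng.normal(0, 1.5, n)    # operation time

idx = rng.permutation(n)
train, calib = idx[:600], idx[600:]
model = LinearRegression().fit(size[train, None], op_time[train])

resid = np.abs(op_time[calib] - model.predict(size[calib, None]))
alpha = 0.1                                          # 90% coverage target
q = np.quantile(resid, np.ceil((1 - alpha) * (len(calib) + 1)) / len(calib))

pred = model.predict(np.array([[6.0]]))[0]
print(f"90% conformal interval: [{pred - q:.1f}, {pred + q:.1f}]")
```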
