Global ETD Search

701	P-SGLD: Stochastic Gradient Langevin Dynamics with control variates. Bruzzone, Andrea. January 2017. Year after year, the amount of data that we generate keeps increasing. At first, the main challenge was to find a way to store this huge quantity of information. Nowadays, with the increasing availability of storage facilities, that problem is solved, but it leaves us with a new one: finding tools that allow us to learn from these large data sets. In this thesis, a framework for Bayesian learning that scales to large data sets is studied. We present the Stochastic Gradient Langevin Dynamics (SGLD) framework and show that in some cases its approximation of the posterior distribution is quite poor. One reason may be that SGLD estimates the gradient of the log-likelihood with high variability due to naïve sampling. Our approach combines accurate proxies for the gradient of the log-likelihood with SGLD. We show that it converges to the correct posterior distribution better than standard SGLD, since accurate proxies dramatically reduce the variance of the gradient estimator. Moreover, we demonstrate that this approach is more efficient than the standard Markov Chain Monte Carlo (MCMC) method and that it outperforms other variance-reduction techniques proposed in the literature, such as the SAGA-LD algorithm. SAGA-LD also uses control variates to improve SGLD, which makes the comparison with our approach straightforward. We apply the method to the logistic regression model. Keywords: Big Data; Bayesian Inference; MCMC; SGLD; Estimated Gradient; Logistic Regression; Probability Theory and Statistics (Sannolikhetsteori och statistik)
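The control-variate idea in this abstract can be sketched in a few lines. This is a generic illustration, not the thesis's P-SGLD code: the 1-D logistic regression model, the Gaussian prior, the reference point `theta_hat` (for example, an approximate posterior mode) and all function names are assumptions made for the example. The estimator replaces the naïve minibatch gradient with the full-data gradient at `theta_hat` plus a minibatch correction, which has low variance when `theta` is close to `theta_hat`.

```python
import math
import random


def grad_log_lik(theta, x, y):
    """Gradient of one observation's log-likelihood (1-D logistic regression)."""
    p = 1.0 / (1.0 + math.exp(-theta * x))
    return (y - p) * x


def cv_gradient(theta, theta_hat, full_grad_hat, data, batch):
    """Control-variate estimate of the full-data log-likelihood gradient.

    full_grad_hat is the exact full-data gradient at the fixed point theta_hat;
    batch is a list of indices into data.
    """
    n = len(data)
    correction = sum(
        grad_log_lik(theta, *data[i]) - grad_log_lik(theta_hat, *data[i])
        for i in batch
    )
    return full_grad_hat + (n / len(batch)) * correction


def sgld_cv_step(theta, theta_hat, full_grad_hat, data, batch_size, step, rng,
                 prior_var=10.0):
    """One SGLD update using the control-variate gradient (assumed N(0, prior_var) prior)."""
    batch = [rng.randrange(len(data)) for _ in range(batch_size)]
    grad_prior = -theta / prior_var
    grad = grad_prior + cv_gradient(theta, theta_hat, full_grad_hat, data, batch)
    noise = rng.gauss(0.0, math.sqrt(step))  # injected Langevin noise, variance = step
    return theta + 0.5 * step * grad + noise
```

When `theta` equals `theta_hat` the correction term vanishes and the estimator returns the exact full-data gradient, which is the source of the variance reduction near the reference point.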
702	Employing mHealth Applications for the Self-Assessment of Selected Eye Functions and Prediction of Chronic Major Eye Diseases among the Aging Population. Abdualiyeva, Gulnara. 24 May 2019. In the era of advanced mHealth (mobile health) use in ophthalmology, there is a scientific call for regulating the validity and reliability of eye-related apps. To achieve a positive health outcome and to enhance mobile-application-guided diagnosis in joint decision-making between eye specialists and individuals, the aging population should be provided with a reliable and valid tool for assessing their eye status outside the physician's office. This interdisciplinary study aims to determine, through hypothesis testing, the validity and reliability of a limited set of five mHealth apps (mHAs) and, through binary logistic regression, the ability of the investigated apps to exclude the four major eye diseases in this demographic population. The study showed that 189 aging adults (45-86 years old) who completed the mHAs’ tests were able to produce reliable results for selected eye function tests with four out of five mHAs measuring visual acuity, contrast sensitivity, red desaturation, visual field and Amsler grid, in comparison with the “gold standard”, a comprehensive eye examination. In addition, a subset of the participants was surveyed to assess the Quality of Experience of the mobile apps. Understanding the current reliability of existing eye-related mHAs will lead to the creation of an ideal mobile application self-assessment protocol that predicts the timely need for clinical assessment and treatment of age-related macular degeneration, diabetic retinopathy, glaucoma and cataract.
Detecting the level of eye function impairment with mHAs is cost-effective and can contribute to research methodology in eye disease prediction by expanding the system of clear criteria created specifically for mobile applications, providing significant value in preventive ophthalmology. Keywords: mHealth applications; Reliability; Validity; Age-related eye diseases; Hypothesis testing; Binary logistic regression
703	Stratified Multilevel Logistic Regression Modeling for Risk Factors of Adolescent Obesity in Tennessee. Zheng, Shimin; Strasser, Sheryl; Holt, Nicole; Quinn, Megan; Liu, Ying; Morrell, Casey. 21 February 2018. Background: US adolescent obesity rates have quadrupled over the past 3 decades. Research examining complex factors associated with obesity is limited. Objectives: The purpose of this study was to utilize a representative sample of students (grades 6-8) in Tennessee to determine the co-occurrence of risk behaviors with adolescent obesity prevalence and to analyze variations by strata. Methods: The 2010 youth risk behavior survey dataset was used to examine associations of obesity with variables related to sample demographics, risk and protective behaviors, and region. Hierarchical logistic regression analyses stratified by demographics and region were conducted to evaluate variation in obesity risk occurring on three hierarchical levels: class, school and district. Results: The sample consisted of 60715 subjects. The overall obesity rate was 22%. Obesity prevalence was high among males, non-white students, and those who had ever smoked, and it was positively correlated with age. Across three state regions, race, gender, and specific behaviors (smoking, weight misperception, disordered eating, 3+ hours of TV viewing, and no sports team participation) persisted as significant predictors of adolescent obesity, although variations by region and demographics were observed. Multilevel analyses indicate that < 1%, 0-1.97% and 4.03-13.06% of the variation in obesity was associated with district, school and class differences, respectively, when stratifying the sample by demographic characteristics or region. Conclusions: Uniform school-based prevention efforts targeting adolescent obesity risk may have limited impact if they fail to respond to geographical and demographic nuances that hierarchical modeling can detect.
Study results reveal that stratified hierarchical analytic approaches to examining adolescent obesity risk have tremendous potential to elucidate significant prevention insights. Keywords: adolescent obesity; risk behavior; stratified hierarchical logistic regression; Biostatistics and Epidemiology; Public Health
704	Omnichannel path to purchase: Viability of Bayesian Network as Market Attribution Models. Dikshit, Anubhav. January 2020. Market attribution is the problem of interpreting the influence of advertisements on the user’s decision process. Market attribution is a hard problem, and it is a significant factor behind Google’s revenue. There are broadly two types of attribution models: data-driven and heuristic. This thesis focuses on the data-driven attribution model, explores the viability of using Bayesian Networks as market attribution models, and benchmarks their performance against a logistic regression. The data used in this thesis was preprocessed using an undersampling technique. Furthermore, multiple techniques and algorithms to learn and train Bayesian Networks are explored and evaluated. For the given dataset, it was found that a Bayesian Network can be used for market attribution modeling and that its performance is better than the baseline logistic model. Keywords: Market Attribution Model; Bayesian Network; Logistic Regression; Data driven marketing; Probability Theory and Statistics (Sannolikhetsteori och statistik)
705	Two-Stage Logistic Regression Models for Improved Credit Scoring / Två-stegs logistiska regressioner för förbättrad credit scoring. Lund, Anton. January 2015. This thesis investigated two-stage regularized logistic regressions applied to the credit scoring problem. Credit scoring refers to the practice of estimating the probability that a customer will default if given credit. The data was supplied by Klarna AB and contains a larger number of observations than many other research papers on credit scoring. In this thesis, a two-stage regression refers to two regressions in sequence, where some kind of information from the first regression is used in the second regression to improve the overall performance. In the best performing models, the first stage was trained on alternative labels, namely payment status at earlier dates than the conventional one. The predictions were then used as input to, or to segment, the second stage. This gave a gini increase of approximately 0.01. Using simpler segmentation methods, such as conventional score cutoffs or distance to a decision boundary, to segment the population did not improve performance. Keywords: Machine Learning; Credit Scoring; Two-stage Logistic Regressions; Computer Sciences (Datavetenskap (datalogi))
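To put the reported gini increase of about 0.01 in context, the sketch below computes a Gini coefficient from model scores. The abstract does not define its gini measure; this example assumes the convention common in credit scoring, Gini = 2 * AUC - 1, and the function names and toy data are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini coefficient under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0


# Toy example: label 1 marks a default, a higher score means higher predicted risk.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_gini = gini(toy_scores, toy_labels)
```

Under this convention, a gini increase of 0.01 corresponds to an AUC increase of 0.005.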
706	Geostatistical techniques for predicting bird species occurrences. Shahiruzzaman, Mohammad; Rauf, Adnan. January 2011. Habitat loss and fragmentation are major threats to biodiversity. Geostatistical methods, especially kriging, are widely used in ecology. Bird count data often fail to show a normal distribution over an area, which is required for most kriging methods. Hence, choosing an interpolation method without understanding the implications may lead to biased results. The United Kingdom’s Exprodat Consulting Ltd had set up an Exploratory Spatial Data Analysis (ESDA) workflow for optimising the interpolation of petroleum datasets. This workflow was applied in this study to predict occurrences of the capercaillie bird species over the whole of Sweden. No trend was found in the dataset, and the dataset was not spatially auto-correlated. A completely regularized spline surface model was created with RMSE 1.336. Medium to high occurrences (8-16) were found over two very small areas, within Västerbottens county and Västra Götlands county. Low occurrences (1-3) were found all over Sweden. Urban areas like Stockholm city and Malmö city had low occurrences. A kriging prediction surface was also created, with RMSE 1.314, to compare the results. There were no prediction values from 5 to 16 in the kriging surface. In-depth studies were carried out on three selected areas. They showed that the results of local kriging surfaces did not match the results of the global surface. Uncertainty in GIS may exist at any level. A low RMSE value does not always mean a good result. Hence, performing ESDA before choosing an interpolation method is an effective approach, and a field investigation after the results are obtained could make them more valid. Regression analysis is also widely used in ecology, and several different methods are available. Ordinary Least Squares was the first method tested on the bird count data set.
The adjusted R-squared value was 0.008616, which indicated that the explanatory variables pine, spruce, roads, urban areas and wetlands explained only about 0.8% of the dependent variable, bird counts. It was also found that there was no linear relationship between the dependent and explanatory variables. Logistic regression was the next step, as it is also able to work with nonlinear data. The Spatial Data Modeller (SDM) tool was used to perform logistic regression in ArcGIS 9.3. The initial results of the logistic regression were unexpected, so focal statistics was performed on all the independent variables. Logistic regression with these new independent variables generated meaningful results. This time the probability of occurrence of birds had a weak positive relationship with all the independent variables. The coefficients of pine, spruce, roads, urban areas and wetlands were found to be 0.39, 0.23, 0.13, 0.24 and 0.14 respectively. Pine and spruce are natural attractors for birds, so these results were quite acceptable, but the overall model performance remained poor. The positive coefficients for roads, urban areas and wetlands may well be due to redundancy in these datasets or observer bias in bird species reporting. IDRISI Andes produced almost the same results when logistic regression was performed with the same dependent and independent variables. The IDRISI Andes output contained the pseudo R-square value, found to be 0.0416. This was also an indication of bias in the dataset. The in-depth studies of the three selected areas also showed that logistic regression with focal statistics gave better results than logistic regression without focal statistics, but the overall performance remained poor. Given its limitations, the SDM tool is best suited to performing logistic regression on small-scale datasets.
A comparison of results between the two geostatistical methods, interpolation and regression, shows similarity at discrete places; an unbiased dataset might have resulted in a better comparison of the two methods. Keywords: ESDA; Kriging; Spline; OLS; SDM; Logistic regression; Computer and Information Sciences (Data- och informationsvetenskap)
707	Factors Affecting the Preference of Buying Hybrid and Electric Vehicles. Zhao, Zhenyu. January 2021. Electric vehicles are regarded as an important solution for emission reduction, but their adoption is still a problem in many countries. Using survey data containing demographic and attitude factors of respondents, this paper proposes two classification models, logistic regression and random forest, with Multiple Correspondence Analysis (MCA) as an intermediate step, to identify the factors affecting willingness to purchase electric vehicles. The analysis shows that adding MCA enhances the explanatory power at a low cost in prediction performance, and the results reveal that characteristics such as frequency of using modern transport services, car-sharing subscription, place of residence, and mode of frequent trips have a significant impact on EV purchases. Keywords: Electric Vehicles; Multiple Correspondence Analysis; Logistic regression; Random forest; Probability Theory and Statistics (Sannolikhetsteori och statistik)
708	CAN STATISTICAL MODELS BEAT BENCHMARK PREDICTIONS BASED ON RANKINGS IN TENNIS? Svensson, William. January 2021. The aim of this thesis is to beat a benchmark prediction accuracy of 64.58 percent based on player rankings on the ATP tour in tennis. Under this benchmark, the player with the better rank in a tennis match is predicted to be the winner. Three statistical models are used: logistic regression, random forest and XGBoost. The data cover the years 2000-2010 and contain over 60 000 observations with 49 variables each. After the data was prepared, new variables were created, and the differences between the two players were computed, all three statistical models outperformed the benchmark prediction. All three models had an accuracy of around 66 percent, with logistic regression performing best at 66.45 percent. The most important variables overall for the models are the total win rate on different surfaces, the total win rate, and rank. Keywords: Logistic Regression; Random Forest; XGBoost; ATP tour; Probability Theory and Statistics (Sannolikhetsteori och statistik)
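The ranking benchmark described in this abstract amounts to predicting that the better-ranked player wins every match. A minimal sketch, assuming each match is stored as a hypothetical `(winner_rank, loser_rank)` pair in which a lower number is a better rank (the thesis's actual data layout is not given, and equal ranks are counted as misses here):

```python
def ranking_benchmark_accuracy(matches):
    """Share of matches won by the better-ranked (lower-numbered) player."""
    correct = sum(1 for winner_rank, loser_rank in matches
                  if winner_rank < loser_rank)
    return correct / len(matches)


# Toy data, not taken from the thesis: three of four matches go to the favourite.
toy_matches = [(1, 5), (10, 3), (2, 8), (7, 20)]
toy_accuracy = ranking_benchmark_accuracy(toy_matches)
```

Any fitted model is then compared against this accuracy on the same set of matches.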
709	Analýza příčin a povahy etnických konfliktů / Analysis of the Causes and Nature of Ethnic Conflicts. Kohout, Jan. January 2015. The aim of this thesis is to analyze the factors responsible for the onset of ethnic conflicts, along with selected characteristics of those conflicts. By comparison with non-ethnic conflicts, it was determined whether there are any differences in the onset mechanisms of these two types of conflict, and thus whether there is room for an explanatory role of ethnicity as a cause of ethnic conflicts. The selection of examined factors is consistent with the relevant literature and existing analyses and reflects the context of contemporary conflict research. The influence of the male unemployment rate, the level of the Human Development Index (HDI) and its inequality-adjusted version, human rights, and conflicts in neighbouring countries on the onset of conflict is tested by statistical methods in component analyses. The intensity of ethnic and non-ethnic conflicts, war years and HDI are also compared. The comparative style of the research helps to understand the true nature of the causes of intrastate conflicts and indicates that there is no difference between the two types. The empirical character of this thesis is also the reason for assessing it within the context of other quantitative studies of conflict, comparing the results, and defining the proper level of analysis for reaching tangible contributions.
710	Model Driven Logistics Integration Engineering. Kunkel, Robert. January 2011. The logistics services sector is characterized by division of labor and by short-, medium- and long-term cooperation. Fourth Party Logistics (4PL) providers in particular constantly face the task of integrating different logistics service providers, and thus their information systems, ad hoc and without media breaks into cross-company information flows. This contribution presents several logistics-specific integration variants, a model-driven integration approach, and a solution concept based on the Logistik Service Engineering & Management (LSEM) platform. Keywords: info:eu-repo/classification/ddc/330; ddc:330; Business Informatics, Logistics (Wirtschaftsinformatik, Logistik); Information Systems, Logistic
