Global ETD Search

171	Regression då data utgörs av urval av ranger / Regression when data consist of a sample of ranks Widman, Linnea January 2012 (has links) Alpine skiers' performance is measured by the FIS ranking. We investigate some methods for analyzing data in which the response consists of ranks like these. When the response data are a sample of ranks, there is no obvious method of analysis. We examine the differences between linear, logistic and ordinal logistic regression fits for data of this type. The bootstrap is used to construct confidence intervals. For our data, the methods give similar results when it comes to finding important explanatory variables. Based on this study, we therefore see no reason to use the more advanced models. Ranks Linear regression Logistic regression Ordinal logistic regression Bootstrap
172	Inkrementell responsanalys : Vilka kunder bör väljas vid riktad marknadsföring? / Incremental response analysis : Which customers should be selected in direct marketing? Karlsson, Jonas, Karlsson, Roger January 2013 (has links) If customers respond differently to a campaign, it is worthwhile to find the customers who respond most positively and direct the campaign towards them. This can be done with so-called incremental response analysis, in which respondents from a campaign are compared with respondents from a control group. Customers with the highest increase in response from the campaign are selected, which may increase the company's return. Incremental response analysis is applied to historical data from the mobile operator Tre. The thesis investigates which method best explains the incremental response, that is, which method best identifies the Tre customers who give the highest incremental response, and which characteristics are important. The analysis is based on various classification methods such as logistic regression, Lasso regression and decision trees. RMSE, the root mean square error of the deviation between observed and predicted incremental response, is used to measure the incremental response prediction error. The classification methods are evaluated with the Hosmer-Lemeshow test and AUC (Area Under the Curve). Bayesian logistic regression is also used to examine the uncertainty in the parameter estimates. In terms of predicted incremental response, Lasso regression performs best, ahead of the decision tree, ordinary logistic regression and Bayesian logistic regression. According to the Lasso regression, the variables that significantly affect the incremental response are age and how long the customer has had their subscription. Incremental response modeling uplift modeling database marketing Net information value Lasso regression Bayesian logistic regression decision trees logistic regression
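As a rough illustration of the quantities named in entry 172, the sketch below computes an incremental response as the difference between the campaign group's response rate and the control group's, and the RMSE between observed and predicted incremental responses. The function names and the per-segment framing are assumptions made for this example; the thesis's own models (logistic regression, Lasso, decision trees) are not reproduced here.

```python
import math

def response_rate(outcomes):
    """Share of customers who responded (outcomes are 0/1 flags)."""
    return sum(outcomes) / len(outcomes)

def incremental_response(treated, control):
    """Campaign-group response rate minus control-group response rate."""
    return response_rate(treated) - response_rate(control)

def rmse(observed, predicted):
    """Root mean square error between observed and predicted incremental responses."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical segment: 2 of 4 campaign customers respond, 1 of 4 controls respond.
uplift = incremental_response([1, 1, 0, 0], [1, 0, 0, 0])
```

In practice the observed values would come from customer segments (for example, deciles of predicted uplift), since an incremental response cannot be observed for a single customer.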
173	Smart task logging : Prediction of tasks for timesheets with machine learning Bengtsson, Emil, Mattsson, Emil January 2018 (has links) Every day, most people use applications and services that utilise machine learning in some way, without even knowing it. Examples include Google's search engine, Netflix's recommendations, and Spotify's music tips. For machine learning to work it needs data, and often a large amount of it. Roughly 2.5 quintillion bytes of data are created every day in the modern information society. This huge amount of data can be utilised to make applications and systems smarter and more automated. Time logging systems today are usually not smart, since users of these systems still must enter data manually. This bachelor thesis explores the possibility of applying machine learning to task logging systems to make them smarter and more automated. The machine learning algorithm used to predict the user's task is multiclass logistic regression, which predicts a categorical outcome. When a small amount of training data was used in the machine learning process, the task predictions had a success rate of about 91%. Computer science machine learning multiclass logistic regression multinomial logistic regression Scala JavaScript web application training data Computer Systems Datorsystem
174	Detection of erroneous payments utilizing supervised and unsupervised data mining techniques Yanik, Todd E. 09 1900 (has links) Approved for public release; distribution is unlimited. / In this thesis we develop a procedure for detecting erroneous payments in the Defense Finance and Accounting Service, Internal Review's (DFAS IR) Knowledge Base Of Erroneous Payments (KBOEP), with the use of supervised (Logistic Regression) and unsupervised (Classification and Regression Trees (C & RT)) modeling algorithms. S-Plus software was used to construct a supervised model of vendor payment data using Logistic Regression, along with the Hosmer-Lemeshow Test for testing the predictive ability of the model. The Clementine Data Mining software was used to construct both supervised and unsupervised models of vendor payment data using the Logistic Regression and C & RT algorithms. The Logistic Regression algorithm in Clementine generated a model with predictive probabilities, which were compared against those of the C & RT algorithm. In addition to comparing the predictive probabilities, Receiver Operating Characteristic (ROC) curves were generated for both models to determine which model provided the best results for a Coincidence Matrix's True Positive, True Negative, False Positive and False Negative Fractions. The best modeling technique was C & RT, which was given to DFAS IR to assist in reducing the manual record selection process currently being used. A recommended ruleset was provided, along with a detailed explanation of the algorithm selection process. / Lieutenant Commander, United States Navy Data mining Logistic regression analysis Regression analysis Data Mining Erroneous Payments Logistic Regression Hosmer Lemeshow Test Classification and Regression Trees Receiver Operator Characteristic curves supervised and unsupervised modeling
175	High-dimensional classification and attribute-based forecasting Lo, Shin-Lian 27 August 2010 (has links) This thesis consists of two parts. The first part focuses on high-dimensional classification problems in microarray experiments. The second part deals with forecasting problems with a large number of categories in predictors. Classification problems in microarray experiments refer to discriminating subjects with different biologic phenotypes or known tumor subtypes as well as to predicting the clinical outcomes or the prognostic stages of subjects. One important characteristic of microarray data is that the number of genes is much larger than the sample size. The penalized logistic regression method is known for simultaneous variable selection and classification. However, the performance of this method declines as the number of variables increases. With this concern, in the first study, we propose a new classification approach that employs the penalized logistic regression method iteratively with a controlled size of gene subsets to maintain variable selection consistency and classification accuracy. The second study is motivated by a modern microarray experiment that includes two layers of replicates. In this new experimental setting, most existing classification methods, including penalized logistic regression, cannot be applied directly because the assumption of independent observations is violated. To solve this problem, we propose a new classification method that incorporates random effects into penalized logistic regression so that the heterogeneity among different experimental subjects and the correlations from repeated measurements can be taken into account. An efficient hybrid algorithm is introduced to tackle computational challenges in estimation and integration.
Applications to a breast cancer study show that the proposed classification method obtains smaller models with higher prediction accuracy than the method based on the assumption of independent observations. The second part of this thesis develops a new forecasting approach for large-scale datasets associated with a large number of predictor categories and with predictor structures. The new approach, beyond conventional tree-based methods, incorporates a general linear model and hierarchical splits to make trees more comprehensive, efficient, and interpretable. Through an empirical study in the air cargo industry and a simulation study containing several different settings, the new approach produces higher forecasting accuracy and higher computational efficiency than existing tree-based methods. Classification Microarray experiments Tree-based methods Variable selection Penalized logistic regression Forecasting Computational biology Bioinformatics Pattern recognition systems DNA microarrays Classification Logistic regression analysis
176	A computational approach to discovering p53 binding sites in the human genome Lim, Ji-Hyun January 2013 (has links) The tumour suppressor p53 protein plays a central role in the DNA damage response/checkpoint pathways leading to DNA repair, cell cycle arrest, apoptosis and senescence. The activation of p53-mediated pathways is primarily facilitated by the binding of tetrameric p53 to two 'half-sites', each consisting of a decameric p53 response element (RE). Functional REs are directly adjacent or separated by a small number of 1-13 'spacer' base pairs (bp). The p53 RE is detected by exact or inexact matches to the palindromic sequence represented by the regular expression [AG][AG][AG]C[AT][TA]G[TC][TC][TC] or a position weight matrix (PWM). The use of matrix-based and regular expression pattern-matching techniques, however, leads to an overwhelming number of false positives. A more specific model, which combines multiple factors known to influence p53-dependent transcription, is required for accurate detection of the binding sites. In this thesis, we present a logistic regression based model which integrates sequence information and epigenetic information to predict human p53 binding sites. Sequence information includes the PWM score and the spacer length between the two half-sites of the observed binding site. To integrate epigenetic information, we analyzed the surrounding region of the binding site for the presence of mono- and trimethylation patterns of histone H3 lysine 4 (H3K4). Our model showed a high level of performance on both a high-resolution data set of functional p53 binding sites from the experimental literature (ChIP data) and the whole human genome. Comparing our model with a simpler sequence-only model, we demonstrated that the prediction accuracy of the sequence-only model could be improved by incorporating epigenetic information, such as the two histone modification marks H3K4me1 and H3K4me3. 610
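The regular-expression step described in entry 176 can be sketched as follows. This is a minimal illustration, assuming exact matches only and a spacer of 0 to 13 bp between the two half-sites; the function name `find_candidate_sites` is hypothetical, and the thesis's actual model (PWM scores plus H3K4 methylation features in a logistic regression) is not reproduced.

```python
import re

# Half-site pattern quoted in the abstract.
HALF_SITE = "[AG][AG][AG]C[AT][TA]G[TC][TC][TC]"

# Two half-sites, directly adjacent or separated by up to 13 spacer bases.
# The lookahead lets candidate sites overlap; the lazy quantifier reports
# the shortest spacer for each start position.
FULL_SITE = re.compile(f"(?=({HALF_SITE})([ACGT]{{0,13}}?)({HALF_SITE}))")

def find_candidate_sites(sequence):
    """Return (start, spacer_length, matched_text) for each exact-match candidate."""
    hits = []
    for m in FULL_SITE.finditer(sequence.upper()):
        first, spacer, second = m.group(1), m.group(2), m.group(3)
        hits.append((m.start(), len(spacer), first + spacer + second))
    return hits

# Hypothetical input: two adjacent half-sites flanked by a few extra bases.
print(find_candidate_sites("TTAGACATGTCCAGGCTTGCCTAA"))
```

As the abstract notes, a scan like this on its own yields many false positives on a whole genome, which is why the thesis combines the sequence score and spacer length with epigenetic information.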
177	Influence of Regional-Level Institutional Factors on Firm-Level Innovation in an Emerging Economy - India Yadati Narasimhulu, Supriya 09 June 2020 (has links) This thesis examines how regional-level factors combined with firm-level factors influence innovation in an emerging economy – India. Past literature has shown that differences in both country contexts and firm-level factors influence innovation. The bulk of this literature has tended to focus on developed economies. The handful of studies that have considered contextual differences have studied these at the country level or within regional blocs such as regions of Europe or Africa. There is a paucity of research that investigates how differences in state-level factors within a single country combine with firm-level factors to influence innovation within firms. Therefore, it is an open question whether the findings derived from developed economies and country-level studies apply equally to emerging economies, particularly at the state level within a single country. Thus, there is a gap in the literature regarding our understanding of the impact of combined state- and firm-level factors on innovation within a single country. This thesis aims to contribute to a better understanding of how state- and firm-level factors drive innovation in India, an emerging economy. India is selected because it is a fast-growing emerging economy that is increasingly being integrated into the globalized world economy; understanding how these factors influence innovation in an emerging economy would therefore complement the literature that focuses on developed countries. Moreover, India is a huge country with substantial variety in resources, capabilities, and institutions (both formal and informal), as well as ethnic, religious, and cultural diversity.
Contextually, these state-level differences are quite different from regions in the developed world, where institutional differences tend to be relatively consistent (less variety). Thus, the insights generated from this study of the Indian context complement prior research by identifying the state and firm factors that combine to drive firm-level innovation. This study also extends the innovation literature by focussing on state-level differences within a single emerging economy, for which there is limited research. The findings could also have practical managerial and policy implications. From a policy perspective, policymakers in India can get a deeper understanding of the relevant factors that influence firm-level innovation so that they can direct policy and resources to promote innovation in their respective states. From a managerial perspective, managers can also get a better understanding of the strategies and investments they should pursue to enhance innovation within their firms. This study is based on data gathered from various sources including the World Bank Enterprise Survey and several sources from within India (Indiastat.com, NCAER State Investment Potential Index, India Innovation Index). The World Bank Enterprise Survey provides firm-level data, while state-level data were obtained from the other reputable sources in India. The data were analyzed using logistic regression and multi-level modeling. Because firms are nested within states, the micro and macro levels can be modeled simultaneously to assess the relevance of the regional context. The results of this study show that regional factors such as regulatory quality, corruption, and rule of law barriers negatively influence innovation in firms that invest in internal R&D to promote innovation. The results also show that regions that devote a higher proportion of their gross domestic product to innovation achieve higher levels of innovation.
Further, regions that have higher levels of human capital stock (more skilled workers) and export technology tend to be more innovative. At the firm level, firms that invest in both internal and external R&D and those that have highly experienced managers are more innovative than their peers. These results suggest that governments and policymakers can increase the innovative activities of firms by providing a highly skilled labor force, investing heavily in R&D, and reducing corruption, regulatory quality, and rule of law barriers. For firm-level managers, this study indicates that higher levels of managerial capability and greater investments in both internal and external R&D can enhance the technical and innovative capabilities (absorptive capacity) of their firms. This may result in a competitive advantage through increased innovation. India innovation institutional regional firm world bank enterprise survey emerging economy product innovation logistic regression multi-level mixed-effects logistic regression
178	Klasifikace vozidel na základě odezvy indukčních senzorů / Vehicle classification using inductive loops sensors Halachkin, Aliaksei January 2017 (has links) This project is dedicated to the problem of vehicle classification using inductive loop sensors. We created a dataset that contains more than 11000 labeled inductive loop signatures collected at different times and from different parts of the world. Multiple classification methods and their optimizations were applied to vehicle classification. The final model, which combines K-nearest neighbors and logistic regression, achieves 94% accuracy on a classification scheme with 9 classes. The vehicle classifier was implemented in C++.
179	How Housing Instability Occurs: Evidence from Panel Study of Income Dynamics Kang, Seungbeom 27 August 2019 (has links) No description available. Urban Planning
180	Modeling Success Factors for Start-ups in Western Europe through a Statistical Learning Approach / Modellering av framgångsfaktorer för startups i Västeuropa genom statistisk inlärning Kamal, Adib, Sabani, Kenan January 2021 (has links) The purpose of this thesis was to use a quantitative method to expand on previous research in the field of start-up success prediction. This was accomplished by including more criteria in the study than had been done previously, which was made possible by the Crunchbase database, the largest available information source for start-ups. Furthermore, the data used in this thesis was limited to Western European start-ups in order to study how restricting the data to a certain geographical region affects the prediction models, which to our knowledge has not been done before in this type of research. The quantitative method used was machine learning; specifically, the three machine learning predictors used in this thesis were Logistic Regression, Random Forest and K-Nearest Neighbor (KNN). All three models proposed and evaluated have a better prediction accuracy than guessing the outcome at random. When tested on data previously unknown to the model, Random Forest produced the best results, correctly classifying successful and failed companies with 79 percent accuracy. Logistic regression and K-Nearest Neighbor (KNN) followed with accuracies of 65 percent and 59 percent, respectively. Machine learning KNN Random Forest Logistic Regression Start-up Success Economics and Business Other Engineering and Technologies
