Global ETD Search

761	High-Dimensional Classification Models with Applications to Email Targeting / Högdimensionella klassificeringsmetoder med tillämpning på målgruppsinriktning för e-mejl Pettersson, Anders January 2015 Email communication is valuable for any modern company, since it offers an easy means of spreading important information, advertising new products, features or offers, and much more. Being able to identify which customers would be interested in certain information would make it possible to significantly improve a company's email communication, and so avoid customers starting to ignore messages and the creation of unnecessary badwill. This thesis focuses on targeting customers by applying statistical learning methods to historical data provided by the music streaming company Spotify. An important aspect was the high dimensionality of the data, which placed certain demands on the applied methods. A binary classification model was created, where the target was whether a customer will open the email or not. Two approaches were used to target the customers: logistic regression, both with and without regularization, and a random forest classifier, chosen for their ability to handle the high dimensionality of the data. The performance of the suggested models was then evaluated on both a training set and a test set using statistical validation methods such as cross-validation, ROC curves and lift charts. The models were studied under both large-sample and high-dimensional scenarios. The high-dimensional scenario is when the number of observations, N, is of the same order as the number of features, p, and the large-sample scenario is when N ≫ p. Lasso-based variable selection was performed for both scenarios to study the informative value of the features. This study demonstrates that it is possible to greatly improve the opening rate of emails by targeting users, even in the high-dimensional scenario. 
The results show that increasing the amount of training data more than a thousandfold improves performance only marginally. Instead, efficient customer targeting can be achieved by using a few highly informative variables selected by Lasso regularization. / Companies can use email to easily spread important information, advertise new products or offers and much more, but too many emails can cause customers to lose interest in the content, generate badwill and make future communication impossible. Being able to distinguish which customers are interested in the specific content would be an opportunity to significantly improve a company's use of email as a communication channel. This study focuses on distinguishing customers using statistical learning applied to historical data provided by the music streaming company Spotify. A binary classification model was chosen, where the response variable described whether or not the customer opened the email. Two different methods were used to try to identify the customers likely to open the emails: logistic regression, both with and without regularization, and a random forest classifier, chosen for their ability to handle high-dimensional data. The methods were then evaluated on both a training set and a test set using several statistical validation methods, such as cross-validation and ROC curves. The models were studied under both large-sample and high-dimensional scenarios, where the high-dimensional scenario is when the number of observations, N, is of similar size to the number of explanatory variables, p, and the large-sample scenario is when N ≫ p. Lasso-based variable selection was performed for both scenarios to study the informative value of the explanatory variables. 
This study shows that it is possible to significantly improve the opening rate of emails by selecting customers, even when only small amounts of data are used. The results show that an enormous increase in the number of training observations will improve the models' ability to distinguish customers only marginally. Statistical learning logistic regression random forest classifier customer relationship management customer targeting. Statistisk inlärning logistisk regression random forest klassificerare kundrelationshantering kundinriktning. Mathematical Analysis Matematisk analys
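Entry 761 evaluates email targeting with lift charts, among other methods. As a rough illustration of what a single point on such a chart measures, the sketch below computes the lift for the top-scored fraction of customers: the open rate in that group divided by the overall open rate. The function name `lift_at` and the toy scores and labels are hypothetical and are not taken from the thesis.

```python
def lift_at(scores, opened, fraction):
    """Open rate among the top-scored `fraction` of customers,
    divided by the overall open rate (lift > 1 means targeting helps)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(scores)
    k = max(1, round(n * fraction))  # number of customers targeted
    # Rank customers from highest to lowest predicted open probability.
    ranked = sorted(zip(scores, opened), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(opened) / n
    return top_rate / base_rate


# Hypothetical model scores and observed opens (1 = opened, 0 = not opened).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
opened = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for fraction in (0.2, 0.5, 1.0):
    print(fraction, lift_at(scores, opened, fraction))
```

Targeting everyone always gives a lift of 1, so the curve is read from the left: the higher the lift at small fractions, the better the model concentrates likely openers at the top of the ranking.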
762	Prediction of Credit Risk using Machine Learning Models Isaac, Philip January 2022 This thesis investigates different machine learning (ML) models and their performance in order to find the best-performing model for predicting credit risk at a specific company. Since granting credit to corporate customers is part of this company's core business, managing credit risk is of high importance. The company currently has only one credit risk measurement, which is obtained from an external company, and the goal is to find a model that outperforms this measurement. The study considers two ML models, Logistic Regression (LR) and eXtreme Gradient Boosting. The thesis shows that both methods perform better than the external risk measurement and that the LR method achieves the best overall performance. One of the most important analyses in this thesis was handling the dataset and finding the best-suited combination of features for the ML models to use. Credit Risk Credit Risk Scorecard Machine Learning Artificial Intelligence AI Logistic Regression eXtreme Gradient Boosting ROC-AUC Binning Cross-Validation Correlation Computer Sciences Datavetenskap (datalogi)
763	Bankruptcy prediction models on Swedish companies. Charraud, Jocelyn, Garcia Saez, Adrian January 2021 Bankruptcies have been a sensitive topic all around the world for over 50 years. From their research, the authors have found that only a few bankruptcy studies have been conducted in Sweden, and even fewer on the topic of bankruptcy prediction models. This thesis investigates the performance of the Altman, Ohlson and Zmijewski bankruptcy prediction models on all Swedish companies during the years 2017 and 2018. The study intends to shed light on some of the most famous bankruptcy prediction models by exploring the predictive abilities and usability of those three models in Sweden. The second purpose of this study is to create two models designed for Swedish companies from the most significant variables of the three models studied, and to test their predictive power. We identified a research gap concerning Sweden, where bankruptcy prediction models, and these three models in particular, have been rather unexplored. Furthermore, we identified a second research gap regarding the time period: only a few studies of bankruptcy prediction models have been conducted after the financial crisis of 2007/08. We conducted a quantitative study in order to achieve the purpose of the study. The data used was secondary data gathered from the Serrano database. This research followed an abductive approach with a positivist paradigm and studied all active Swedish companies between the years 2017 and 2018. Finally, it contributes to the current field of knowledge on the topic through the analysis of the results of the models on Swedish companies, using the liquidity theory, solvency and insolvency theory, the pecking order theory, the profitability theory, the cash flow theory, and the contagion effect. 
The results aligned with the liquidity theory, the solvency and insolvency theory and the profitability theory. Moreover, this research found that the Altman model has the lowest performance of the three models, followed by the Ohlson model, which shows mixed results depending on the statistical analysis. The Zmijewski model has the best performance of the three. The performance and predictive power of the two new models were significantly higher than those of the three models studied. Bankruptcy prediction models Ohlson model Altman model Zmijewski model Sweden logistic regression probit regression AUC Youden index Liu index Business Administration Företagsekonomi
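Entry 763 lists the Youden index among its keywords. As a hedged illustration of that statistic, the sketch below picks the classification cut-off that maximises J = sensitivity + specificity − 1 over a set of predicted bankruptcy scores. The function name, the toy scores and the labels are hypothetical; nothing here reproduces the thesis's data or its exact procedure.

```python
def youden_best_threshold(scores, bankrupt):
    """Return (threshold, J) maximising Youden's J = sensitivity + specificity - 1.
    A company is predicted bankrupt when its score is >= threshold."""
    positives = sum(bankrupt)
    negatives = len(bankrupt) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, bankrupt) if s >= threshold and y == 1)
        true_neg = sum(1 for s, y in zip(scores, bankrupt) if s < threshold and y == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical model scores and outcomes (1 = went bankrupt, 0 = survived).
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
bankrupt = [0, 0, 1, 0, 1, 1]

threshold, j = youden_best_threshold(scores, bankrupt)
print(threshold, j)
```

J ranges from 0 for a cut-off no better than chance to 1 for perfect separation, which makes it a convenient single number for comparing cut-offs across models.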
764	Employment Status and Professional Integration of IMGs in Ontario Jablonski, Jan O. D. January 2012 This study investigated international medical graduates (IMGs), registered between January 1, 2007 and April 14, 2011, at the Access Centre for Internationally Educated Health Professionals in Ontario. By way of logistic regression in a cross-sectional design, it was found that permanent residents who were recent immigrants had lesser chances of being employed full-time at registration (baseline). By way of survival analysis in a cohort design, it was found that younger IMGs who have been in Canada less than 5 years and who have taken the Medical Council of Canada Evaluating Exam (MCCEE) have the greatest chances of securing residency positions in Canada or the US, whereas IMGs from Eastern Europe, South Asia and Africa have lesser chances. It was revealed that registered IMGs are a vulnerable population, and certain groups may be disadvantaged due to underlying characteristics. These groups can be targeted for specific interventions. International Medical Graduates Residency Postgraduate medical education Logistic regression Survival analysis Policy and program implications Access Centre
765	Effects of COVID-19 on temporal urban diversity : A quantitative study using mobile phone data as a proxy for human mobility patterns Sjöblom, Feliks January 2021 The present paper examines possible changes in temporal urban diversity caused by the COVID-19 pandemic in the Stockholm and Uppsala metropolitan areas. In addition to general changes in diversity, potential differences in diversity levels at locations with varying socioeconomic characteristics are examined. The diversity levels are calculated from mobile phone data and defined by the inflow and distribution of individuals to locations. The time frame involves eight study dates and extends from January to April 2020. The paper reaches the following conclusions. (1) Diversity levels display a general decline during the pandemic, with one exception: the Easter holidays. (2) Individuals residing in areas with high proportions of highly educated individuals or visible minorities experience a decrease in diversity, whereas the opposite is true for areas with high proportions of low-income earners or senior citizens. (3) The increase in diversity in the two last-mentioned areas, which are located in remote parts of the metropolitan area, coincides with decreasing levels of diversity in the central parts of the metropolitan area. It is possible that changes in diversity levels in these areas can be explained by changes in general behavioural trends, e.g. incentives to avoid crowded city center areas. COVID-19 urban diversity human mobility mobile phone data logistic regression COVID-19 urban mångfald mänsklig mobilitet mobiltelefondata logistisk regression Human Geography Kulturgeografi
766	Národnostní skupiny v prostoru bývalého Sovětského svazu / Ethnic groups in the former Soviet Union space Tkáčová, Kateřina January 2012 The topic of this diploma thesis is ethnic groups in the space of the former Soviet Union in the period 1994-2006 and their involvement in ethnic conflicts. The aim of this thesis is to identify the key parameters driving these ethnic groups towards armed conflict as a response to their needs, interests and living conditions. Key assumptions of this thesis are derived from quantitative as well as qualitative studies. Important characteristics of ethnic groups are also included in the analysis of possible causes of ethnic conflicts. The theoretical discussion shows three main factors which can make ethnic groups more prone to conflict: permanent exclusion, strong identity and, lastly, dissimilarity of an ethnic group. The influence of these factors is tested using descriptive statistics, odds ratios, correlation and logistic regression. The statistical results show that strong identity as well as discrimination against ethnic groups increase the probability of ethnic conflicts.
767	Factors Associated with Crash Severities in Built-up Areas Along Rural Highways of Nevada: A Case Study of 11 Towns Shrestha, Pramen P., Shrestha, Joseph 01 February 2017 In 2014, 32,675 deaths were recorded in vehicle crashes within the United States. Of these fatalities, 51% occurred on rural highways compared to 49% on urban highways. No specific crash data are available for the built-up areas along rural highways. Because of the high number of fatalities on rural highways, it is important to identify the factors that cause vehicle crashes. The main objective of this study is to determine the factors associated with the severity of crashes that occurred in built-up areas along the rural highways of Nevada. Those factors could aid in making informed decisions when setting up speed zones in these built-up areas. Using descriptive statistics and a binary logistic regression model, 337 crashes that occurred in 11 towns along the rural highways from 2002 to 2010 were analyzed. The results showed that more crashes occurred during favorable driving conditions, e.g., 87% of crashes on dry roads and 70% of crashes in clear weather. The binary logistic regression model showed that crashes that occurred from midnight until 4 a.m. had a 58.3% probability of being injury crashes rather than property-damage-only crashes, when other factors were kept at their mean values. Crashes on weekdays were three times more likely to be injury crashes than those that occurred on weekends. When other factors were kept at their mean values, crashes involving motorcycles had an 80.2% probability of being injury crashes. Speeding was found to be 17 times more responsible for injury crashes than mechanical defects of the vehicle. As a result of this study, the Nevada Department of Transportation can now take various steps to improve public safety, including steps to reduce speeding and encourage the use of helmets by motorcycle riders. 
Binary logistic regression model Crash severity Nevada department of transportation Rural highway Speed-zone guideline Construction Engineering and Management
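Entry 767 reports probabilities "when other factors were kept at their mean values" and relative likelihoods such as "three times more likely". As a hedged sketch of how such figures are typically read off a fitted binary logistic regression, the code below converts a linear predictor into a probability and a coefficient into an odds ratio. The function names are hypothetical, the numeric inputs are back-calculated only to illustrate the arithmetic, and treating "three times more likely" as an odds ratio of 3 is an assumption rather than something the abstract states.

```python
import math


def injury_probability(intercept, coefficients, covariates):
    """Predicted probability from a binary logistic regression:
    p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(coefficient)


# Illustrative only: a log-odds value chosen so the probability equals 58.3%.
log_odds_for_583 = math.log(0.583 / 0.417)
print(injury_probability(log_odds_for_583, [], []))

# Assumption: an odds ratio of 3 corresponds to a coefficient of ln(3).
print(odds_ratio(math.log(3)))
```

Holding the other covariates at their means simply fixes the remaining terms of the linear predictor, so the reported probability depends on those mean values as well as on the coefficient of interest.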
768	A Logistic regression analysis model for predicting the success of computer networking projects in Zimbabwe Masamha, Tavengwa 02 1900 Information and communication technology (ICT) greatly influences today’s business processes, be it in the public or private sector. Everything that is done in business requires ICT in one way or another. Research in ICTs is therefore critical. Much research has been and is still being carried out on projects that develop or enhance ICT, but the success rate of these projects is still very low. The extensive coverage of ICTs implies that if the success rate is that low, many resources are being wasted in failed projects; therefore, more research is needed to improve the success rate. Previous research has focussed on factors which are critical for the success of ICT projects, assuming that all ICT projects are the same. As a result, the literature is full of different suggestions and guidelines on the factors critical to ICT projects’ success. This scenario brings challenges to project managers, who end up using their own personal judgement to select which factors to consider for the project at hand. The end result is the high failure rate of ICT projects, since there is a very high chance of applying the same critical success factors (CSFs) to different types of ICT projects. This research answered the question: which factors are critical to the success of computer networking projects in Zimbabwe, and how could these factors be used to build a model that predicts the success of such projects in advance? The literature reviewed indicated that most CSFs were not focused on specific types of ICT projects and hence were generalised. No literature was found on ICT projects’ CSFs in Zimbabwe. Moreover, no CSFs were found for computer networking projects as a specific instance of ICT projects. No model existed that predicts computer networking projects’ success. 
This study addressed the gaps by developing a CSF framework for ICT projects in Zimbabwe, determining CSFs for computer networking projects in Zimbabwe and developing a logistic regression analysis model to predict computer networking projects’ success in Zimbabwe. Data was collected in Zimbabwe using a unique three-stage process comprising meta-synthesis analysis, a questionnaire and interviews. The study was motivated by the fact that most available research focused on CSFs for general ICT projects and that no research was found on CSFs influencing projects in computer networking. Meta-synthesis analysis was therefore conducted on the literature in order to identify the CSFs given there. The approach was appropriate since the researcher had noticed that there were extensive ICT projects’ CSFs and that no such research had been carried out in Zimbabwe. These CSFs formed the basis for determining, by questionnaire, the ICT project CSFs for Zimbabwe in particular. Project practitioners’ viewpoints were sought through questionnaires. Once CSFs for ICT projects in Zimbabwe were determined, they formed the basis for determining the unique critical success factors for computer networking projects in Zimbabwe. Interviews were used to obtain further information that the questionnaires would have left out. The interview questions were set to clarify unclear or conflicting responses from the questionnaire and to provide in-depth insights into the factors critical to computer networking projects in Zimbabwe. The data, i.e. the critical success factors for computer networking projects, guided the development of the logistic regression analysis model for predicting computer networking projects’ success in Zimbabwe. Questionnaire data was analysed using SPSS Version 23.0. Factor analysis and principal component analysis were some of the techniques used in the analysis. Interview data was analysed using NVivo Version 10.0. 
From the results it was deduced that factors critical to ICT project management in Zimbabwe were closely related to those found in the literature. The only apparent difference was that CSFs for ICT projects in Zimbabwe were more specific, thereby enhancing their applicability. Computer networking projects had fewer CSFs than general ICT projects. In addition, CSFs for general ICT projects were different from those critical to computer networking projects in Zimbabwe. The development of a comprehensive set of general ICT projects’ CSFs was the first contribution of this study. This was achieved through meta-synthesis analysis. The other contribution was the development of a CSF framework for ICT projects specific to Zimbabwe and for computer networking projects in Zimbabwe. The major contribution was the development of the logistic regression analysis model that predicts computer networking projects’ success in Zimbabwe. These contributions will provide literature on ICT project management in Zimbabwe, which will subsequently help ICT project managers to concentrate on specific factors. The developed prediction model can be used by project managers to determine the possible success or failure of ICT projects, thereby possibly reducing wastage of resources. / School of Computing Computer networking project Critical success factor Logistic regression analysis model Project success Prediction Meta-synthesis ICT project Success criteria Principal component analysis Factor analysis
769	Relationship Between Active Learning Methodologies and Community College Students' STEM Course Grades Lesko, Cherish Christina 01 January 2017 Active learning methodologies (ALM) are associated with student success, but little research on this topic has been pursued at the community college level. At a local community college, students in science, technology, engineering, and math (STEM) courses exhibited lower than average grades. The purpose of this study was to examine whether the use of ALM predicted STEM course grades while controlling for academic discipline, course level, and class size. The theoretical framework was Vygotsky's social constructivism. Descriptive statistics and multinomial logistic regression were performed on data collected through an anonymous survey of 74 instructors of 272 courses during the 2016 fall semester. Results indicated that students were more likely to achieve passing grades when instructors employed in-class, highly structured activities, and writing-based ALM, and were less likely to achieve passing grades when instructors employed project-based or online ALM. The odds ratios indicated strong positive effects (greater likelihoods of receiving As, Bs, or Cs in comparison to the grade of F) for writing-based ALM (39.1-43.3%, 95% CI [10.7-80.3%]), highly structured activities (16.4-22.2%, 95% CI [1.8-33.7%]), and in-class ALM (5.0-9.0%, 95% CI [0.6-13.8%]). Project-based and online ALM showed negative effects (lower likelihoods of receiving As, Bs, or Cs in comparison to the grade of F) with odds ratios of 15.7-20.9%, 95% CI [9.7-30.6%] and 16.1-20.4%, 95% CI [5.9-25.2%] respectively. A white paper was developed with recommendations for faculty development, computer skills assessment and training, and active research on writing-based ALM. 
Improving student grades and STEM course completion rates could lead to higher graduation rates and lower college costs for at-risk students by reducing course repetition and time to degree completion. active learning methods community college course completion multinomial logistic regression STEM education Higher Education Administration Higher Education and Teaching Science and Mathematics Education
770	Using Natural Language Processing and Machine Learning for Analyzing Clinical Notes in Sickle Cell Disease Patients Khizra, Shufa January 2018 No description available. Computer Science Sickle Cell Disease SCD cTAKES Natural Language Processing NLP Logistic Regression Random Forest Support Vector Machines Multinomial Naive Bayes
