  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
551

A Statistical Analysis of Medical Data for Breast Cancer and Chronic Kidney Disease

Yang, Kaolee 05 May 2020 (has links)
No description available.
552

Machine Learning Based Sentiment Classification of Text, with Application to Equity Research Reports / Maskininlärningsbaserad sentimentklassificering av text, med tillämpning på aktieanalysrapporte

Blomkvist, Oscar January 2019 (has links)
In this thesis, we analyse the sentiment in equity research reports written by analysts at Skandinaviska Enskilda Banken (SEB). We describe established statistical and machine learning methods for classifying the sentiment of text documents as positive or negative. Of particular interest is a form of recurrent neural network known as long short-term memory (LSTM). We investigate two different labelling regimes for generating training data from the reports. Benchmark classification accuracies are obtained using logistic regression models. Finally, two different word embedding models and bidirectional LSTMs of varying network size are implemented and compared to the benchmark results. We find that logistic regression works well for one of the labelling approaches, and that the best LSTM models outperform it slightly.
553

Predicting Risk Level in Life Insurance Application : Comparing Accuracy of Logistic Regression, Decision Tree, Random Forest and Linear Support Vector Classifiers

Karthik Reddy, Pulagam, Veerababu, Sutapalli January 2023 (has links)
Background: Over the last decade, the life insurance industry has grown significantly. Every life insurance application carries some level of risk, which determines the premium the insurer charges. Evaluating this level of risk is time-consuming, and it is currently hard for the insurance industry to process millions of life insurance applications. One potential approach is to use machine learning to establish a framework for evaluating the level of risk associated with a life insurance application.
Objectives: The aim of this thesis is to perform two comparison studies. The first study compares the accuracy of the logistic regression classifier, decision tree classifier, random forest classifier and linear support vector classifier for evaluating the level of risk associated with a life insurance application. The second study identifies the impact of changes in the dataset on the accuracy of these selected classification models.
Methods: An experimentation methodology was chosen to attain the aim of the thesis and address its research questions. The experimentation involved comparing four ML algorithms, namely the LRC, DTC, RFC and Linear SVC. These algorithms were trained, validated and tested on two datasets. A new dataset was created by replacing the "BMI" variable with the "Life Expectancy" variable. The four selected ML algorithms were compared based on their performance metrics, which included accuracy, precision, recall and F1-score.
Results: Among the four selected machine learning algorithms, the random forest classifier attained the highest accuracy, with 53.79% and 52.80% on the unmodified and modified datasets respectively. Hence, it was the most accurate algorithm for predicting risk level in life insurance applications. The second-best algorithm was the decision tree classifier, with 51.12% and 50.79% on the unmodified and modified datasets. The selected models attained higher accuracies when they were trained, validated and tested with the unmodified dataset.
Conclusions: The random forest classifier scored the highest accuracy among the four selected algorithms on both the unmodified and modified datasets. The selected models attained higher accuracies when they were trained, validated and tested with the unmodified dataset than with the modified dataset. Therefore, the unmodified dataset is more suitable for predicting risk level in life insurance applications.
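The four performance metrics named in the abstract above can be illustrated with a short sketch. This is not the authors' code: the function name `macro_report`, the choice of macro averaging across risk levels, and the toy labels are all assumptions made for illustration, since the abstract does not say how the per-class scores were averaged.

```python
from collections import Counter


def macro_report(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1.

    Hypothetical helper: macro averaging is an assumption, not something
    the abstract specifies.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]
    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with three made-up risk levels (1, 2, 3).
report = macro_report([1, 1, 2, 2, 3, 3], [1, 2, 2, 2, 3, 1])
```

Comparing classifiers then amounts to calling the same function on each model's predictions for the same test split.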
554

Probability of Default Machine Learning Modeling : A Stress Testing Evaluation

Andersson, Tobias, Mentes, Mattias January 2023 (has links)
This thesis aims to assist in the development of machine learning models tailored for stress testing. The main objective is to create models that can predict loan defaults while considering the impact of macroeconomic stress. Nordea can then use these models as a basis for further development of machine learning models for stress testing. The research begins with an analysis of historical loan data, encompassing diverse customer and macroeconomic variables that influence loan default rates. Using machine learning algorithms, feature selection methods, data imbalance management and model training techniques, a set of predictive models is constructed. These models aim to capture the intricate relationships between the identified variables and loan defaults, ensuring their suitability for stress testing purposes. The subsequent phase of the research subjects the developed models to simulated adverse economic conditions. By evaluating the models' performance under various stressed scenarios, their ability to predict defaults is assessed. This stress testing process allows us to analyse the models' capability to incorporate a stressed scenario in their predictions. The thesis concludes with an evaluation of the developed machine learning models and their ability to identify defaulted loans in a stressed macroeconomy. By creating these models specifically tailored for stress testing loans, we provide a basis for further development within the area of stress testing modeling.
555

Failure Probability and Lifetime Estimation for Industrial Robots : A Logistic Regression and Lifetime Analysis Approach

Fahlbeck Carlsson, Erik, Herbert, Martin January 2023 (has links)
The ability to handle and process data for information extraction is becoming increasingly important. Using data extracted from the business to improve productivity is seen as an important part of developing business processes. In this thesis, industrial robots and their survival times are analyzed. The work predicts the probability that a specific robot will fail during a specified time period. In addition, survival analysis is conducted in which the median lifetime and conditional median lifetime of industrial robots are estimated. Two approaches are used: logistic regression and survival analysis. A logistic regression model is built to predict the probability that different industrial robots break during a specified time period. The logistic model achieves an accuracy of 0.694, with even higher accuracy for high- and low-risk robots. The survival analysis uses a Cox PH model to check the validity of the proportional hazards assumption, and then a parametric model with a Weibull distribution is fitted. The parametric survival model is used to estimate the median lifetime and the remaining median lifetime of the robots. The estimated probabilities and lifetimes can be used as an indication of which robots are at risk of failure.
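As a rough illustration of the lifetime estimates mentioned in the abstract above, the sketch below computes the median and the conditional (remaining) median lifetime under a Weibull survival function S(t) = exp(-(t/scale)^shape). The function names and parameter values are hypothetical, and the thesis's own parametrisation and covariates are not given in the abstract, so this is only the textbook closed form.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability of surviving past time t under a Weibull model."""
    return math.exp(-((t / scale) ** shape))


def weibull_median(shape, scale):
    """Time t at which S(t) = 0.5."""
    return scale * math.log(2) ** (1 / shape)


def conditional_median_remaining(age, shape, scale):
    """Median remaining lifetime given survival up to `age`.

    Solves S(age + m) / S(age) = 0.5 for m.
    """
    total = scale * ((age / scale) ** shape + math.log(2)) ** (1 / shape)
    return total - age


# Hypothetical parameters: shape > 1 means the failure rate rises with age.
median = weibull_median(shape=1.5, scale=10.0)
remaining = conditional_median_remaining(age=5.0, shape=1.5, scale=10.0)
```

With shape equal to 1 the model reduces to the exponential distribution, where the remaining median lifetime does not depend on age; this gives a convenient sanity check for the formulas.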
556

Detecting Fraud in Affiliate Marketing: Comparative Analysis of Supervised Machine Learning Algorithms

Ahlqvist, Oskar January 2023 (has links)
Affiliate marketing has become a rapidly growing part of the digital marketing sector. However, fraud in affiliate marketing poses a serious threat to the trust and financial stability of the involved parties. This thesis investigates the performance of three supervised machine learning algorithms (random forest, logistic regression, and support vector machine) in detecting fraud in affiliate marketing. The objective is to answer the following main research question by answering two sub-questions: How much can Random Forest, Logistic Regression, and Support Vector Machine contribute to the detection of fraud in affiliate marketing? 1. How can the models be compared in an experiment? 2. How can they be optimized and applied within an affiliate marketing framework? To answer these questions, a dataset of transaction logs is analyzed in collaboration with an affiliate network company. The machine learning experiment employs k-fold cross-validation and the Area Under the ROC Curve (AUC-ROC) performance metric to evaluate the effectiveness of the classifiers in distinguishing fraudulent from non-fraudulent transactions. The results indicate that the random forest classifier performs best of the three models, achieving the highest mean AUC of 0.7172. Furthermore, feature importance analysis demonstrates that each feature category had a different impact on the performance of the models. The models compute different feature importances, meaning that some features had greater influence on specific models. By fine-tuning and optimizing the hyperparameters for each model, it is possible to enhance their performance. Despite certain limitations, such as time constraints, data availability, and security restrictions, this study highlights the potential of supervised machine learning algorithms. Random forest in particular showed how it could be used to improve fraud detection capabilities in affiliate marketing. The insights contribute to closing the knowledge gap in comparing the effectiveness of various classification methods and practical applications for fraud detection.
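The evaluation scheme named in the abstract above, k-fold cross-validation scored by AUC-ROC, can be sketched in plain Python. The helper names `roc_auc` and `k_fold_indices` are hypothetical, the folds here are contiguous and unshuffled for simplicity, and the pairwise AUC computation is a standard equivalent of the area under the ROC curve rather than anything taken from the thesis.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    `labels` holds 1 (fraudulent) or 0 (non-fraudulent); ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# Toy scores from an imaginary classifier.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
folds = list(k_fold_indices(5, 2))
```

A mean AUC such as the one reported would be obtained by fitting a model on each training fold, scoring the matching test fold with `roc_auc`, and averaging the k values.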
557

Vegetation Dynamics of an Old-growth Mixed Mesophytic Forest in Southeastern Ohio, USA

Murphy, Stephen J. January 2012 (has links)
No description available.
558

Models and Graphics in the Analysis of Categorical Variables: The Case of the Youth Tobacco Survey.

Hosler, Deborah Susan 16 August 2002 (has links) (PDF)
Youth Tobacco Surveys have been conducted in several U.S. states in recent years in order to design policies aimed at reducing tobacco use among young people. Some primary analysis of those surveys has been done, but few analyses include modeling, and the study of independence has been addressed mainly in the bivariate context. In this work, contemporary methods that have appeared relatively recently in categorical data analysis will be examined, including logistic and log-linear modeling as well as graphical displays and correspondence analysis. These methods will be applied to data from the 2000 Tennessee Youth Tobacco Survey. The objective is to demonstrate that methods of multivariate categorical data analysis can provide fresh insight into the behavior of adolescents with respect to tobacco use. The ultimate purpose of this work is to recommend methodology that goes beyond what is currently published.
559

Evaluation of the decision-making process for credit decisions at Preem AB / Utvärdering av beslutsprocessen för kreditbeslut på Preem AB

Holgersson, Annie, Döös, Theresa January 2022 (has links)
The purpose of this bachelor thesis in mathematical statistics was to evaluate the decision-making process at the credit department at Preem AB. The study used a logistic regression model to find a relationship between the probability of a credit application being accepted and a number of quantitative and categorical factors about the applicant. These factors were taken from the applicant's financial statement and annual report as well as from risk-level data given to Preem AB by Upplysningscentralen. This data set was used to develop and train the logistic regression model with the aim of evaluating which factors have the biggest impact on the decisions made after an application goes to manual review at the credit department. The model was evaluated and refined using different methods for variable selection and model evaluation. The study found that no statistically significant model could be created, and concluded that either there must exist further factors not covered by this study that affect a decision, or the decisions are taken randomly. Further research can therefore study which factors, such as the financial security offered and the credit controllers' level of knowledge regarding industry and financial statements, affect the outcome of the manual review of a credit application.
560

Analys och modellering av sannolikheterna för utfallen i en fotbollsmatch utifrån matchstatistik / Analysis and modeling of the probabilities of the outcomes in a football match based on match statistics

Wikblad, Filip, Hansson, Oskar January 2022 (has links)
This study aims to find the model that best predicts the outcome of a football match (1, X, 2: home win, draw, away win) from match statistics. The data are compiled from the three highest football divisions in England from 2005 onwards. Multinomial logistic regression is used to model the response variable from the explanatory variables. Best subset regression is used to examine all combinations of variables, and the models are compared by Akaike Information Criterion (AIC). The best model is chosen based on the regression results together with a multicollinearity analysis. The results show both expected and unexpected effects, which provide a foundation for future studies. Areas for improvement include more explanatory variables, comparison with bookmakers' odds and tests on new test data. The model can be applied in sports betting, where it can be used to value multi bets and live odds.
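The two ingredients named in the abstract above, multinomial logistic probabilities for the three outcomes and model comparison by AIC, can be sketched as follows. The function names, the example scores and the parameter counts are hypothetical; the abstract does not list the fitted variables or coefficients, so only the generic formulas are shown (softmax over per-outcome linear scores, and AIC = 2k - 2 ln L).

```python
import math


def outcome_probabilities(linear_scores):
    """Softmax over per-outcome linear scores, ordered as (1, X, 2)."""
    top = max(linear_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in linear_scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(prob_rows, outcomes):
    """Sum of log-probabilities assigned to the observed outcome indices."""
    return sum(math.log(row[o]) for row, o in zip(prob_rows, outcomes))


def aic(log_lik, n_parameters):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_parameters - 2 * log_lik


# Two imaginary matches; outcome index 0 = home win, 1 = draw, 2 = away win.
probs = [outcome_probabilities([0.8, 0.1, -0.4]),
         outcome_probabilities([-0.2, 0.0, 0.5])]
score = aic(log_likelihood(probs, [0, 2]), n_parameters=6)
```

In a best subset search, each candidate set of explanatory variables would be fitted, scored this way, and the candidates with the lowest AIC kept for the multicollinearity check.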
