  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Building Information Extraction and Refinement from VHR Satellite Imagery using Deep Learning Techniques

Bittner, Ksenia 26 March 2020 (has links)
Building information extraction and reconstruction from satellite images is an essential task for many applications related to 3D city modeling, planning, disaster management, navigation, and decision-making. Building information can be obtained and interpreted from several kinds of data, like terrestrial measurements, airplane surveys, and space-borne imagery. However, the latter acquisition method outperforms the others in terms of cost and worldwide coverage: space-borne platforms can provide imagery of remote places, which are inaccessible to other missions, at any time. Because the manual interpretation of high-resolution satellite images is tedious and time-consuming, their automatic analysis continues to be an intense field of research. At times, however, it is difficult to understand complex scenes with dense placement of buildings, where parts of buildings may be occluded by vegetation or other surrounding constructions, making their extraction or reconstruction even more difficult. Incorporating several data sources representing different modalities may alleviate this problem. The goal of this dissertation is to integrate multiple high-resolution remote sensing data sources for automatic satellite imagery interpretation, with emphasis on building information extraction and refinement; the associated challenges are addressed as follows: Building footprint extraction from Very High-Resolution (VHR) satellite images is an important but highly challenging task, due to the large diversity of building appearances and the relatively low spatial resolution of satellite data compared to airborne data. Many algorithms are built on spectral-based or appearance-based criteria from single or fused data sources to perform the building footprint extraction. The input features for these algorithms are usually manually extracted, which limits their accuracy. 
Based on the advantages of recently developed Fully Convolutional Networks (FCNs), i.e., the automatic extraction of relevant features and dense classification of images, an end-to-end framework is proposed which effectively combines the spectral and height information from red, green, and blue (RGB), pan-chromatic (PAN), and normalized Digital Surface Model (nDSM) image data and automatically generates a full-resolution binary building mask. The proposed architecture consists of three parallel networks merged at a late stage, which helps in propagating fine detailed information from earlier layers to higher levels, in order to produce an output with high-quality building outlines. The performance of the model is examined on new unseen data to demonstrate its generalization capacity. The availability of detailed Digital Surface Models (DSMs), generated by dense matching and representing the elevation surface of the Earth, can improve the analysis and interpretation of complex urban scenarios. The generation of DSMs from VHR optical stereo satellite imagery leads to high-resolution DSMs which often suffer from mismatches, missing values, or blunders, resulting in coarse building shape representation. To overcome these problems, a methodology based on conditional Generative Adversarial Networks (cGANs) is developed for generating a good-quality, Level of Detail (LoD) 2-like DSM with enhanced 3D object shapes directly from the low-quality photogrammetric half-meter-resolution satellite DSM input. Various deep learning applications benefit from multi-task learning with multiple regression and classification objectives by taking advantage of the similarities between individual tasks. Therefore, the influence of multi-task learning on important remote sensing applications, such as realistic elevation model generation and roof type classification from stereo half-meter-resolution satellite DSMs, is demonstrated in this work. 
Recently published deep learning architectures for both tasks are investigated, and a new end-to-end cGAN-based network is developed which combines the models that provide the best results for their individual tasks. To benefit from information provided by multiple data sources, a different cGAN-based workflow is proposed, where the generative part consists of two encoders and a common decoder which blends the intensity and height information within one network for the DSM refinement task. The inputs to the introduced network are single-channel photogrammetric DSMs with continuous values and pan-chromatic half-meter-resolution satellite images. Information fusion from different modalities helps in propagating fine details, completes inaccurate or missing 3D information about building forms, and improves the building boundaries, making them more rectilinear. Lastly, an additional comparison between the proposed methodologies for DSM enhancement is made to discuss and verify the most beneficial workflow and the applicability of the resulting DSMs for different remote sensing approaches.
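The late-stage fusion of parallel streams described in this abstract can be illustrated in miniature. The sketch below is a toy stand-in, not the dissertation's FCN: each "stream" is reduced to a single linear filter over one modality (PAN, grayscale RGB, nDSM), the stream weights and threshold are invented, and the merged response is thresholded into a binary building mask.

```python
import numpy as np

# Toy stand-in for the three-stream late-fusion architecture: per-modality
# "networks" are reduced to single linear filters, merged late, then
# thresholded into a binary building mask. Weights and threshold are invented.

def stream(image, weight):
    """Stand-in for a per-modality FCN branch: one linear feature map."""
    return weight * image

def late_fusion_mask(pan, rgb_gray, ndsm, threshold=1.5):
    """Merge the three feature maps and threshold per pixel."""
    fused = stream(pan, 0.5) + stream(rgb_gray, 0.5) + stream(ndsm, 1.0)
    return (fused > threshold).astype(np.uint8)

# A 4x4 toy scene: uniform spectral response, one elevated 2x2 "building".
pan = np.ones((4, 4))
rgb_gray = np.ones((4, 4))
ndsm = np.zeros((4, 4))
ndsm[:2, :2] = 2.0  # elevated block in the top-left corner

mask = late_fusion_mask(pan, rgb_gray, ndsm)
```

Only the elevated pixels exceed the threshold, so the mask marks the 2x2 block; the real network, of course, learns its fusion weights end to end rather than using fixed ones.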
42

Incorporating speaker’s role in classification of text-based dialogues

Stålhandske, Therese January 2020 (has links)
Dialogues are an interesting type of document, as they contain a speaker-role feature not found in other types of text. Previous work has incorporated speaker-role dependency in text generation, but little has been done in the realm of text classification. In this thesis, we incorporate speaker-role dependency into a classification model by creating speaker-dependent word representations and simulating a conversation within neural networks. The results show a significant improvement in the performance of binary classification of dialogues when speaker-role information is incorporated. Further, by extracting attention weights from the model, we gain insight into how the speaker's role affects the interpretation of utterances, giving an intuitive explanation of our model.
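One way to picture the speaker-dependent word representations mentioned above is to give the same surface word a different vocabulary entry per speaker role. The sketch below is a hypothetical lookup-table illustration, not the thesis's learned model; the role names and tagging scheme are invented.

```python
# Hypothetical illustration of speaker-dependent word representations:
# the same surface word gets a distinct vocabulary entry per speaker role,
# so a downstream model can treat "help" said by an agent differently
# from "help" said by a customer.

ROLES = ("agent", "customer")

def role_token(word, role):
    """Tag a word with the role of its speaker."""
    assert role in ROLES
    return f"{word}@{role}"

def build_role_vocab(dialogue):
    """dialogue: list of (role, utterance) pairs -> set of role-tagged words."""
    vocab = set()
    for role, utterance in dialogue:
        for word in utterance.lower().split():
            vocab.add(role_token(word, role))
    return vocab

dialogue = [("agent", "how can I help"), ("customer", "I need help")]
vocab = build_role_vocab(dialogue)
```

The word "help" now has two representations, one per role, which is the property a role-aware classification model can exploit.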
43

Time Series Analysis and Binary Classification in a Car-Sharing Service : Application of data-driven methods for analysing trends, seasonality, residuals and prediction of user demand / Tidsserieanalys och binär klassificering i en bildelningstjänst : Applicering av datadrivna metoder för att analysera trender, säsongsvariationer, residualer samt predicering av användares efterfrågan

Uhr, Aksel January 2023 (has links)
Researchers have estimated a 20-percentage-point increase, between 2011 and 2050, in the share of the world's population residing in urban areas. The growth of denser cities results in both opportunities and challenges, two of which concern sustainability and mobility. With advances in technology, smart mobility and car-sharing have emerged as part of the solution. Research has estimated that car-sharing reduces toxic emissions and car ownership, thus decreasing the need for private cars to some extent. Despite being a possible solution to the future's mobility challenges in urban areas, car-sharing providers suffer from profitability issues. To keep assisting society in the transformation to sustainable mobility alternatives, profitability needs to be reached. Two central challenges on the way to profitability are user segmentation and demand forecasting. This study focuses on the latter problem, and the aim is to understand the demand for different car types and car-sharing users' individual demand. Quantitative research was conducted; namely, time series analysis and binary classification were selected to answer the research questions. It was concluded that there are trend, seasonality, and residual patterns in the time series capturing bookings per car type per week, although the patterns were not extensive. Subsequently, a random forest was trained on a data set built with moving-average feature engineering and consisting of the weekly bookings of users with at least 33 journeys during an observation period of 66 weeks (N = 1335705). The final model predicted who is likely to use the service in the upcoming week, in an attempt to predict individual demand. In terms of metrics, the random forest achieved a score of 0.89 in accuracy (both classes), 0.91 in precision (positive class), 0.73 in recall (positive class), and 0.82 in F1-score (positive class). 
We therefore concluded that a machine learning model can predict weekly individual demand fairly well. Future research involves further feature engineering and mapping the predictions to business actions.
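The moving-average feature engineering and the next-week usage label described in the abstract can be sketched as follows; the window length and toy booking counts below are invented for illustration.

```python
def moving_average(series, window):
    """Trailing moving average; positions before a full window stay None."""
    out = [None] * len(series)
    for i in range(window - 1, len(series)):
        out[i] = sum(series[i - window + 1 : i + 1]) / window
    return out

# Toy weekly booking counts for one user.
weekly_bookings = [0, 2, 1, 3, 0, 0, 4]

# Feature: 3-week trailing average of bookings.
ma3 = moving_average(weekly_bookings, window=3)

# Target: will the user book anything next week? (1 = yes, 0 = no)
will_use_next_week = [1 if b > 0 else 0 for b in weekly_bookings[1:]]
```

Pairing each week's features with the following week's usage indicator yields exactly the kind of supervised data set a random forest can be trained on.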
44

Machine Learning based Predictive Data Analytics for Embedded Test Systems

Al Hanash, Fayad January 2023 (has links)
Organizations gather enormous amounts of data and analyze them to extract insights that help them make better decisions. Predictive data analytics is a crucial subfield within data analytics that makes accurate predictions by extracting insights from data using machine learning algorithms. This thesis applies supervised learning algorithms to perform predictive data analytics in an embedded test system at the Nordic Engineering Partner company. Predictive maintenance is a concept often used in manufacturing industries that refers to predicting asset failures before they occur. The machine learning algorithms used in this thesis are support vector machines, multi-layer perceptrons, random forests, and gradient boosting. Both binary and multi-class classifiers have been fitted, and cross-validation, sampling techniques, and a confusion matrix have been used to measure their performance accurately. In addition to accuracy, the recall, precision, F1, kappa, MCC, and ROC-AUC measures are used as well. The fitted prediction models achieve high accuracy.
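All of the evaluation measures listed above derive from the confusion matrix. A minimal sketch of that derivation, with invented counts purely to exercise the formulas:

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 and MCC from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Matthews correlation coefficient: balanced even for skewed classes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Invented confusion-matrix counts, not results from the thesis.
m = binary_metrics(tp=40, fp=5, fn=10, tn=45)
```

The ROC-AUC is the one measure in the list that cannot be read off a single confusion matrix, since it integrates over all decision thresholds.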
45

Employee Churn Prediction in Healthcare Industry using Supervised Machine Learning / Förutsägelse av Personalavgång inom Sjukvården med hjälp av Övervakad Maskininlärning

Gentek, Anna January 2022 (has links)
Given that employees are among the most valuable assets of any organization, losing an employee has a detrimental impact on several aspects of business activity. Loss of competence, deteriorated productivity, and increased hiring costs are just a small fraction of the consequences associated with high employee churn. To deal with this issue, organizations in many industries rely on machine learning and predictive analytics to model, predict, and understand the causes of employee churn so that appropriate proactive retention strategies can be applied. However, to date, the problem of excessive churn prevalent in the healthcare industry has not been addressed. To fill this research gap, this study investigates the applicability of a machine learning-based employee churn prediction model for a Swedish healthcare organization. We start by extracting relevant features from real employee data, followed by a comprehensive feature analysis using the Recursive Feature Elimination (RFE) method. A wide range of prediction models, including traditional classifiers such as Random Forest, Support Vector Machine, and Logistic Regression, are then implemented. In addition, we explore the performance of an ensemble machine learning model, XGBoost, and neural networks, specifically an Artificial Neural Network (ANN). The results show the superiority of an SVM model, with a recall of 94.8% and a ROC-AUC of 91.1%. Additionally, to understand and identify the main churn contributors, model-agnostic interpretability methods are examined and applied on top of the predictions. The analysis shows that wellness contribution, employment rate, number of vacation days, and number of sick days are strong indicators of churn among healthcare employees.
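Recursive Feature Elimination, used above for the feature analysis, repeatedly drops the least important feature until the desired number remains. Below is a stand-in sketch of that loop (the thesis presumably uses scikit-learn's RFE, which refits the model to get fresh importances after each drop; the scores and the "shoe_size" dummy here are invented):

```python
# Stand-in for Recursive Feature Elimination: rank features by an importance
# score and drop the weakest, one per round, until n_keep remain. The real
# method re-derives the importances from a refit model after every drop.

def rfe(features, importance, n_keep):
    kept = list(features)
    while len(kept) > n_keep:
        weakest = min(kept, key=lambda f: importance[f])
        kept.remove(weakest)
    return kept

# Invented scores; the first four names echo the churn indicators reported
# in the abstract, while "shoe_size" is a deliberately irrelevant dummy.
importance = {"wellness_contribution": 0.9, "employment_rate": 0.7,
              "vacation_days": 0.6, "sick_days": 0.8, "shoe_size": 0.05}
selected = rfe(list(importance), importance, n_keep=4)
```

The irrelevant dummy is eliminated first, leaving the four informative features for the downstream classifiers.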
46

Data-driven decision support in digital retailing

Sweidan, Dirar January 2023 (has links)
In the digital era and with the advent of artificial intelligence, digital retailing has emerged as a notable shift in commerce. It empowers e-tailers with data-driven insights and predictive models to navigate a variety of challenges, driving informed decision-making and strategy formulation. While predictive models in general are fundamental for making data-driven decisions, this thesis focuses on binary classifiers, studied through two real-world problems marked by their particular properties. Specifically, when binary decisions are made based on predictions, relying solely on predicted class labels is insufficient because of variations in classification accuracy. Furthermore, different mistakes carry different costs, which affects the utility of the prediction outcomes. To confront these challenges, probabilistic predictions, often unexplored or uncalibrated, are a promising alternative to class labels. Therefore, machine learning modelling and calibration techniques are explored, employing benchmark data sets alongside empirical studies grounded in industrial contexts. These studies analyse predictions and their associated probabilities across diverse data segments and settings. As a proof of concept, the thesis found that some algorithms inherently produce calibrated probabilities, while others demonstrate reliability once their probabilities are calibrated. In both cases, the thesis concludes that utilising the top predictions with the highest probabilities increases precision and minimises false positives. In addition, adopting well-calibrated probabilities is a powerful alternative to mere class labels. Consequently, by transforming probabilities into reliable confidence values through classification with a rejection option, a pathway emerges wherein confident and reliable predictions take centre stage in decision-making. 
This enables e-tailers to form distinct strategies based on these predictions and optimise their utility. This thesis highlights the value of calibrated models and probabilistic prediction and emphasises their significance in enhancing decision-making. The findings have practical implications for e-tailers leveraging data-driven decision support. Future research should focus on producing an automated system that prioritises high, well-calibrated probability predictions while discarding others, and that optimises utility based on the costs and gains associated with the different prediction outcomes, to enhance decision support for e-tailers. / The current thesis is part of the industrial graduate school in digital retailing (INSiDR) at the University of Borås and is funded by the Swedish Knowledge Foundation.
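Classification with a rejection option, the mechanism highlighted above, can be sketched in a few lines; the confidence threshold of 0.8 is an invented illustration, not a value from the thesis.

```python
def predict_with_rejection(probability, threshold=0.8):
    """Act only on confident, calibrated probabilities; otherwise abstain.

    probability: calibrated P(positive class); threshold: invented cut-off.
    """
    if probability >= threshold:
        return 1            # confident positive
    if probability <= 1 - threshold:
        return 0            # confident negative
    return "reject"         # too uncertain: defer the decision

decisions = [predict_with_rejection(p) for p in (0.95, 0.55, 0.10)]
```

Only the confident predictions drive action; the rejected middle band is handled by other strategies, which is how utility is protected against costly mistakes.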
47

La reconnaissance automatique des brins complémentaires : leçons concernant les habiletés des algorithmes d'apprentissage automatique en repliement des acides ribonucléiques

Chasles, Simon 07 1900 (has links)
Ribonucleic acid (RNA) is a molecule involved in many cellular functions, such as translation and the regulation of gene expression. The recent success of RNA vaccines demonstrates the role RNA can play in the development of therapeutic treatments. 
The function of an RNA depends on its sequence and structure, which determine which chemical groups (and in what ways those chemical groups) can interact with other molecules. However, only a few RNA structures are known, due to the high cost and low throughput of experimental methods such as nuclear magnetic resonance and X-ray crystallography. As a result, computational methods are constantly being refined to determine the structure of an RNA accurately from its sequence. Given the growth of datasets and the constant progress of deep learning, many neural network architectures have been proposed to solve the RNA folding problem. However, the nature of current datasets and of RNA folding mechanisms hinders the application of statistical learning to RNA structure prediction. Here, we cover the main challenges encountered when solving the RNA folding problem by machine learning. With an emphasis on overfitting, a problem that affects too many of the methods published so far, we formulate a fundamental RNA problem to study the behaviour of a variety of algorithms when confronted with various statistical contexts.
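The "complementary strands" task named in the title has a crisp rule-based core: two strands pair antiparallel via the Watson-Crick pairs (A-U, G-C) and the G-U wobble. A minimal reference check of that rule, against which a learning algorithm's behaviour could be compared:

```python
# Rule-based check of antiparallel strand complementarity using the canonical
# Watson-Crick pairs plus the G-U wobble pair.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def are_complementary(strand1, strand2):
    """True if every position of strand1 pairs with the antiparallel strand2."""
    if len(strand1) != len(strand2):
        return False
    # Antiparallel: pair the 5' end of one strand with the 3' end of the other.
    return all((a, b) in PAIRS for a, b in zip(strand1, reversed(strand2)))

ok = are_complementary("GAUC", "GAUC")  # GAUC paired against its reverse, CUAG
```

The interest of the thesis lies precisely in whether statistical learners recover such a rule from data or merely memorise the training sequences.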
48

Realization of Model-Driven Engineering for Big Data: A Baseball Analytics Use Case

Koseler, Kaan Tamer 27 April 2018 (has links)
No description available.
49

[en] PORTFOLIO SELECTION USING ROBUST OPTIMIZATION AND SUPPORT VECTOR MACHINE (SVM) / [pt] SELEÇÃO DE PORTFÓLIO USANDO OTIMIZAÇÃO ROBUSTA E MÁQUINAS DE SUPORTE VETORIAL

ROBERTO PEREIRA GARCIA JUNIOR 26 October 2021 (has links)
[en] The difficulty of predicting the movement of financial assets is the subject of study by several authors. In order to obtain gains, it is necessary to estimate the direction (rise or fall) and the magnitude of the return on the asset to be bought or sold. The purpose of this work is to develop a mathematical optimization model with binary variables capable of predicting upward and downward movements of financial assets, and to use a portfolio optimization model to evaluate the results obtained. The prediction model is based on the Support Vector Machine (SVM), with modifications to the regularization of the traditional model. Robust optimization is used for portfolio management. Robust optimization techniques are increasingly applied in portfolio management, since they can deal with the uncertainties introduced in the estimation of parameters. Notably, the developed model is data-driven, i.e., predictions are made using nonlinear signals based on past historical price/return data without any human intervention. As prices depend on many factors, a single set of parameters can be expected to describe the dynamics of financial asset prices only over a small interval of days. To capture this change in dynamics more accurately, the model parameters are estimated over a moving window. To test the accuracy of the models and the gains obtained, a case study was made using six financial assets from the currency, fixed income, equity, and commodity classes. The data cover the period from 01/01/2004 to 05/30/2018, totaling 3623 daily quotations. Considering transaction costs and the out-of-sample results obtained in the analyzed period, the investment portfolio developed in this work shows higher returns than traditional indexes with limited risk.
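The moving-window estimation scheme described above — refit the parameters on a short trailing window, then predict the next observation — can be sketched as a split generator; the window length and toy prices below are invented.

```python
def moving_windows(prices, window):
    """Yield (training_window, next_observation) pairs for rolling refits."""
    for start in range(len(prices) - window):
        yield prices[start : start + window], prices[start + window]

# Toy daily quotes; in the thesis each training window would feed an SVM
# refit, and the next observation would score the resulting prediction.
prices = [100, 101, 99, 102, 103, 101]
splits = list(moving_windows(prices, window=3))
```

Each refit only ever sees the trailing window, which is what lets the model track slowly drifting price dynamics while keeping every evaluation strictly out-of-sample.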
50

Early Warning System of Students Failing a Course : A Binary Classification Modelling Approach at Upper Secondary School Level / Förebyggande varningssystem av elever med icke godkänt betyg : Genom applicering av binär klassificeringsmodell inom gymnasieskolan

Karlsson, Niklas, Lundell, Albin January 2022 (has links)
Only 70% of Swedish students graduate from upper secondary school within the given time frame. Earlier research has shown that unfinished degrees disadvantage the individual student, policy makers, and society. A first step toward preventing dropouts is to identify students who are about to fail courses. Thus, the purpose is to identify tendencies in whether a student will pass a course or not. In addition, the thesis accounts for the development of an Early Warning System that signals which students need additional support from a professional teacher. The Random Forest algorithm was used as a binary classification model of a failing grade against a passing grade. The data in the study are samples of approximately 700 students from an upper secondary school within the Stockholm municipality. The chosen method originates from a Design Science Research methodology that allows the stakeholders to be involved in the process. The results showed that the most dominant indicators for correct classification were absence, previous grades, and a mathematics diagnosis. Furthermore, variables from the Learning Management System were predominant indicators when the system was also utilised by teachers. The prediction accuracy of the algorithm indicates a positive tendency toward classifying correctly. On the other hand, the small number of data points casts doubt on whether an Early Warning System can be applied in its current state. Thus, one conclusion is that further studies need to increase the number of data points; suggestions to address the problem are mentioned in the Discussion. Moreover, the results are analysed together with a review of the potential Early Warning System from a didactic perspective, and the ethical aspects of the thesis are discussed thoroughly.
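To make the indicator ranking above concrete, here is a hypothetical weighted-score warning rule combining the three reported top indicators. The weights and cut-off are invented, and the thesis itself trains a Random Forest rather than a hand-weighted rule; this is only a sketch of how such indicators could flag a student.

```python
# Hypothetical early-warning rule over the reported top indicators
# (absence, previous grades, mathematics diagnosis). All inputs are assumed
# normalised to [0, 1]; weights and cut-off are invented for illustration.

def warning_flag(absence_rate, previous_grade, diagnosis_score,
                 weights=(0.5, 0.3, 0.2), cutoff=0.5):
    """True if the weighted risk score exceeds the cut-off.

    Higher absence raises risk; higher grade/diagnosis scores lower it.
    """
    w_abs, w_grade, w_diag = weights
    risk = (w_abs * absence_rate
            + w_grade * (1 - previous_grade)
            + w_diag * (1 - diagnosis_score))
    return risk > cutoff

at_risk = warning_flag(absence_rate=0.8, previous_grade=0.2, diagnosis_score=0.3)
```

A trained classifier effectively learns such weightings (and their interactions) from data instead of having them fixed by hand.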
