41 |
Incorporating speaker’s role in classification of text-based dialogues
Stålhandske, Therese, January 2020
Dialogues are an interesting type of document, as they contain a speaker role feature not found in other types of texts. Previous work has incorporated a speaker role dependency in text generation, but little has been done in the realm of text classification. In this thesis, we incorporate speaker role dependency in a classification model by creating different speaker-dependent word representations and simulating a conversation within neural networks. The results show a significant improvement in the performance of the binary classification of dialogues when speaker role information is incorporated. Further, by extracting attention weights from the model, we gain insight into how the speaker’s role affects the interpretation of utterances, giving an intuitive explanation of our model. / [translated from Swedish] Conversations are a special type of text, as they contain information about speakers that is not found in other types of documents. Previous work has included a speaker's role in text generation, but little has been done in text classification. In this work, we introduce the participants' roles into a classification model. This is done by creating word representations that depend on the participant in the conversation and by simulating a conversation within a neural network. The results show a significant improvement in the performance of binary classification of dialogues when the speaker's role is included. Furthermore, by extracting the attention weights, we gain a better overview of how a speaker's role affects the interpretation of utterances, which in turn gives a more intuitive explanation of our model.
|
42 |
Time Series Analysis and Binary Classification in a Car-Sharing Service : Application of data-driven methods for analysing trends, seasonality, residuals and prediction of user demand / Tidsseriaanalys och binär klassificering i en bildelningstjänst : Applicering av datadrivna metoder för att analysera trender, säsongsvaritoner, residuals samt predicering av användares efterfrågan
Uhr, Aksel, January 2023
Researchers have estimated a 20-percentage point increase in the world’s population residing in urban areas between 2011 and 2050. The increase in denser cities results in opportunities and challenges. Two of the challenges concern sustainability and mobility. With the advancement in technology, smart mobility and car-sharing have emerged as a part of the solution. Research has estimated that car-sharing reduces toxic emissions and reduces car ownership, thus decreasing the need for private cars to some extent. Despite being a possible solution to the future’s mobility challenges in urban areas, car-sharing providers suffer from profitability issues. To keep assisting society in the transformation to sustainable mobility alternatives in the future, profitability needs to be reached. Two central challenges to address to reach profitability are user segmentation and demand forecasting. This study focuses on the latter problem, and the aim is to understand the demand for different car types and car-sharing users’ individual demands. Quantitative research was conducted: time series analysis and binary classification were selected to answer the research questions. It was concluded that there are trend, seasonality and residual patterns in the time series capturing bookings per car type per week. However, the patterns were not extensive. Subsequently, a random forest was trained on a data set utilizing moving average feature engineering and consisting of weekly bookings of users having at least 33 journeys during an observation period of 66 weeks (N = 1335705). The final model predicted who is likely to use the service in the upcoming week in an attempt to predict individual demand. In terms of metrics, the random forest achieved a score of .89 in accuracy (both classes), .91 in precision (positive class), .73 in recall (positive class) and .82 in F1-score (positive class).
We therefore concluded that a machine learning model can predict weekly individual demand fairly well. Future research involves further feature engineering and mapping the predictions to business actions. / [translated from Swedish] Researchers have estimated that the share of the world's population living in urban areas will increase by 20 percentage points. The growth of denser cities brings both opportunities and challenges. Two of the challenges concern sustainability and mobility. With technological advances, so-called smart mobility and car-sharing have become part of the solution. Other research has shown that car-sharing reduces emissions of harmful substances and reduces car ownership, which to some extent reduces the need for private cars. Although it is a possible solution to future mobility challenges and needs in urban areas, car-sharing services suffer from profitability problems. To keep contributing to society's transition to sustainable mobility alternatives, profitability must be reached. Two central challenges in achieving profitability are user segmentation and demand forecasting. This study focuses on the latter problem. The aim of the study is to understand the demand for different types of cars and the individual demand of car-sharing users. Quantitative research was conducted, namely time series analysis and binary classification, to answer the study's research questions. After statistical time series tests had been carried out, it was found that there are trends, seasonal variations and residual patterns in the time series describing bookings per car type per week. These patterns were, however, not extensive. Subsequently, a random forest was trained on a data set using moving averages. This data set consisted of weekly bookings from users who had at least 33 journeys during an observation period of 66 weeks (N = 1335705). The final model predicted who was likely to use the service in the coming week, in an attempt to forecast individual demand. In terms of metrics, the model achieved 0.89 in accuracy (both classes), 0.91 in precision (positive class), 0.73 in recall (positive class) and 0.82 in F1-score (positive class). We therefore concluded that a machine learning model can predict weekly individual demand relatively well with respect to its final use. Future research involves further data selection, as well as mapping the forecast to business actions.
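The moving average feature engineering mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the window length or the exact feature set, so the four-week window, the function name `moving_average_features` and the booking counts below are assumptions made purely for illustration, not the author's pipeline.

```python
from typing import List, Tuple


def moving_average_features(
    weekly_bookings: List[int], window: int = 4
) -> List[Tuple[float, int]]:
    """Build (feature, label) pairs for one user.

    feature: mean bookings over the `window` weeks ending at week t
    label:   1 if the user booked at least once in week t + 1, else 0
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    # Stop one week before the end, because the label needs week t + 1.
    for t in range(window - 1, len(weekly_bookings) - 1):
        recent = weekly_bookings[t - window + 1 : t + 1]
        feature = sum(recent) / window
        label = 1 if weekly_bookings[t + 1] > 0 else 0
        pairs.append((feature, label))
    return pairs


# Hypothetical booking counts for one user over eight weeks.
history = [1, 0, 2, 1, 0, 0, 3, 1]
pairs = moving_average_features(history, window=4)
# pairs == [(1.0, 0), (0.75, 0), (0.75, 1), (1.0, 1)]
```

Rows of this kind, pooled over all users, are what a binary classifier such as a random forest would be trained on to predict usage in the upcoming week.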
|
43 |
Machine Learning based Predictive Data Analytics for Embedded Test Systems
Al Hanash, Fayad, January 2023
Organizations gather enormous amounts of data and analyze them to extract insights that help them make better decisions. Predictive data analytics is a crucial subfield of data analytics that aims to make accurate predictions by extracting insights from data with machine learning algorithms. This thesis applies supervised learning algorithms to perform predictive data analytics in the Embedded Test System at the Nordic Engineering Partner company. Predictive maintenance is a concept often used in manufacturing industries that refers to predicting asset failures before they occur. The machine learning algorithms used in this thesis are support vector machines, multi-layer perceptrons, random forests, and gradient boosting. Both binary and multi-class classifiers are fitted, and cross-validation, sampling techniques, and a confusion matrix are used to measure their performance accurately. In addition to accuracy, recall, precision, F1, kappa, MCC, and ROC AUC measurements are used as well. The fitted prediction models achieve high accuracy.
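Several of the measures listed in this abstract can be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions of Cohen's kappa and the Matthews correlation coefficient (MCC); the function name `binary_metrics` and the counts are hypothetical and are not taken from the thesis. It assumes every row and column of the confusion matrix is non-empty, since otherwise some ratios are undefined.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute common binary classification metrics from confusion counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical confusion counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

With these balanced counts, accuracy, precision, recall and F1 all come out near 0.8, while kappa and MCC come out near 0.6, which shows how the chance-corrected measures can be noticeably lower than plain accuracy.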
|
44 |
Employee Churn Prediction in Healthcare Industry using Supervised Machine Learning / Förutsägelse av Personalavgång inom Sjukvården med hjälp av Övervakad Maskininlärning
Gentek, Anna, January 2022
Given that employees are one of the most valuable assets of any organization, losing an employee has a detrimental impact on several aspects of business activities. Loss of competence, deteriorated productivity and increased hiring costs are just a small fraction of the consequences associated with high employee churn. To deal with this issue, organizations within many industries rely on machine learning and predictive analytics to model, predict and understand the cause of employee churn so that appropriate proactive retention strategies can be applied. However, to date, the problem of excessive churn prevalent in the healthcare industry has not been addressed. To fill this research gap, this study investigates the applicability of a machine learning-based employee churn prediction model for a Swedish healthcare organization. We start by extracting relevant features from real employee data, followed by a comprehensive feature analysis using the Recursive Feature Elimination (RFE) method. A wide range of prediction models, including traditional classifiers such as Random Forest, Support Vector Machine and Logistic Regression, are then implemented. In addition, we explore the performance of an ensemble machine learning model, XGBoost, and of neural networks, specifically an Artificial Neural Network (ANN). The results of this study show the superiority of an SVM model, with a recall of 94.8% and a ROC-AUC score of 91.1%. Additionally, to understand and identify the main churn contributors, model-agnostic interpretability methods are examined and applied on top of the predictions. The analysis has shown that wellness contribution, employment rate, number of vacation days and number of sick days are strong indicators of churn among healthcare employees. / [translated from Swedish] It is often said that employees are an organization's most valuable asset. Losing an employee therefore often has a detrimental impact on several aspects of business operations. These include loss of competence, deteriorated productivity and increased hiring costs, which cover only a fraction of the consequences associated with excessive employee turnover. To manage and understand high employee turnover, many businesses and organizations have begun to use machine learning and statistical analysis, analysing, among other things, behavioural data in order to predict employee turnover and to proactively create a better work environment in which employees choose to stay. Although healthcare is a sector marked by high employee turnover, there are currently no studies that address this evident problem from a machine learning standpoint. This study investigates the applicability of machine learning models for modelling and predicting employee turnover in a Swedish healthcare organization. Starting from relevant variables in real employee data, we apply Recursive Feature Elimination (RFE) as the primary analysis method. In the next step we apply several prediction models, including traditional classifiers such as Random Forest, Support Vector Machine and Logistic Regression. The study also evaluates how relevant neural networks, or more specifically Artificial Neural Networks (ANN), are for predicting employee turnover. Finally, we evaluate the precision of an ensemble machine learning model, Extreme Gradient Boost. The results show that SVM is a superior model, with 94.8% accuracy. The results also make it possible to identify the variables that contribute most to employee turnover. Our analysis shows that the variables wellness allowance, employment rate, number of vacation days and sick leave are strongly correlated with employee turnover in healthcare.
|
45 |
Data-driven decision support in digital retailing
Sweidan, Dirar, January 2023
In the digital era and with the advent of artificial intelligence, digital retailing has emerged as a notable shift in commerce. It empowers e-tailers with data-driven insights and predictive models to navigate a variety of challenges, driving informed decision-making and strategic formulation. While predictive models are fundamental for making data-driven decisions, this thesis spotlights binary classifiers as a central focus. These classifiers reveal the complexities of two real-world problems, marked by their particular properties. Specifically, when binary decisions are made based on predictions, relying solely on predicted class labels is insufficient because of the variations in classification accuracy. Furthermore, prediction outcomes have different costs associated with making different mistakes, which impacts the utility. To confront these challenges, probabilistic predictions, often unexplored or uncalibrated, are a promising alternative to class labels. Therefore, machine learning modelling and calibration techniques are explored, employing benchmark data sets alongside empirical studies grounded in industrial contexts. These studies analyse predictions and their associated probabilities across diverse data segments and settings. The thesis found, as a proof of concept, that specific algorithms inherently possess calibration while others, with calibrated probabilities, demonstrate reliability. In both cases, the thesis concludes that utilising top predictions with the highest probabilities increases the precision level and minimises the false positives. In addition, adopting well-calibrated probabilities is a powerful alternative to mere class labels. Consequently, by transforming probabilities into reliable confidence values through classification with a rejection option, a pathway emerges wherein confident and reliable predictions take centre stage in decision-making.
This enables e-tailers to form distinct strategies based on these predictions and optimise their utility. This thesis highlights the value of calibrated models and probabilistic prediction and emphasises their significance in enhancing decision-making. The findings have practical implications for e-tailers leveraging data-driven decision support. Future research should focus on producing an automated system that prioritises high and well-calibrated probability predictions while discarding others and optimising utilities based on the costs and gains associated with the different prediction outcomes to enhance decision support for e-tailers. / The current thesis is a part of the industrial graduate school in digital retailing (INSiDR) at the University of Borås and funded by the Swedish Knowledge Foundation.
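Classification with a rejection option, as described in this abstract, can be sketched as follows. The threshold of 0.8, the function names and the probabilities are assumptions chosen for illustration; the sketch also assumes the probabilities are already well calibrated, which the thesis treats as a separate step.

```python
from typing import List, Optional


def classify_with_rejection(
    probabilities: List[float], threshold: float = 0.8
) -> List[Optional[int]]:
    """Predict 1 or 0 only when the probability is confident enough.

    Returns None for rejected instances, i.e. those whose positive-class
    probability lies strictly between 1 - threshold and threshold.
    """
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0.5, 1.0]")
    decisions: List[Optional[int]] = []
    for p in probabilities:
        if p >= threshold:
            decisions.append(1)
        elif p <= 1 - threshold:
            decisions.append(0)
        else:
            decisions.append(None)  # not confident enough: reject
    return decisions


def precision_of_accepted(
    decisions: List[Optional[int]], labels: List[int]
) -> float:
    """Precision over the instances that were accepted as positive."""
    accepted = [y for d, y in zip(decisions, labels) if d == 1]
    return sum(accepted) / len(accepted)


# Hypothetical calibrated probabilities and true labels.
probs = [0.95, 0.85, 0.60, 0.45, 0.10]
labels = [1, 1, 0, 1, 0]

confident = classify_with_rejection(probs, threshold=0.8)
baseline = classify_with_rejection(probs, threshold=0.5)  # no rejection
```

In this toy data, keeping only the confident predictions raises precision on the positive class from 2/3 to 1.0 at the price of leaving two instances undecided, which mirrors the trade-off between coverage and false positives that the abstract describes.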
|
46 |
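La reconnaissance automatique des brins complémentaires : leçons concernant les habiletés des algorithmes d'apprentissage automatique en repliement des acides ribonucléiques [Automatic recognition of complementary strands: lessons concerning the abilities of machine learning algorithms in ribonucleic acid folding]
Chasles, Simon, 07 1900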
[translated from French] Ribonucleic acid (RNA) is a molecule involved in many cellular functions, such as genetic translation and the regulation of gene expression. The recent successes of RNA vaccines attest to the role RNA can play in the development of therapeutic treatments. Knowledge of an RNA's function depends on its sequence and its structure, which determine which chemical groups can interact with other molecules, and in what ways. However, known structures are rare because of the cost and inefficiency of experimental methods such as nuclear magnetic resonance and X-ray crystallography. Consequently, computational methods are continually being refined in order to adequately determine the structure of an RNA from its sequence. Given the growth of datasets and the steady progress of deep learning, many neural network architectures have been proposed to solve the RNA folding problem. However, current datasets and the nature of RNA folding mechanisms pose significant obstacles to the application of statistical learning to RNA structure prediction. This master's thesis is intended as a survey of the main challenges inherent in solving the RNA folding problem by machine learning. It formulates a fundamental task in order to study the behaviour of a multitude of algorithms when confronted with various statistical contexts, all with the aim of avoiding overfitting, a problem that affects too large a proportion of the methods published so far. / Ribonucleic acid (RNA) is a molecule involved in many cellular functions, such as translation and regulation of gene expression. The recent success of RNA vaccines demonstrates the role RNA can play in the development of therapeutic treatments. The function of an RNA depends on its sequence and structure, which determine which chemical groups can interact with other molecules, and in what ways. However, only a few RNA structures are known, due to the high cost and low throughput of experimental methods such as nuclear magnetic resonance and X-ray crystallography. As a result, computational methods are constantly being refined to accurately determine the structure of an RNA from its sequence. Given the growth of datasets and the constant progress of deep learning, many neural network architectures have been proposed to solve the RNA folding problem. However, the nature of current datasets and RNA folding mechanisms hinders the application of statistical learning to RNA structure prediction. Here, we cover the main challenges one can encounter when solving the RNA folding problem by machine learning. With an emphasis on overfitting, a problem that affects too many of the methods published so far, we formulate a fundamental RNA problem to study the behaviour of a variety of algorithms when confronted with various statistical contexts.
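The title refers to recognising complementary strands, but the abstract does not define the task precisely. As a rough sketch of what a ground-truth label for such a binary task could look like, the function below checks whether two RNA strands are exact reverse complements under Watson-Crick pairing only. Ignoring G-U wobble pairs, the function name `are_complementary` and the example sequences are all assumptions made for illustration.

```python
# Watson-Crick base pairs for RNA (wobble pairs deliberately left out).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def are_complementary(strand_a: str, strand_b: str) -> bool:
    """Return True if strand_b is the reverse complement of strand_a.

    Both strands are read 5' to 3', so strand_b is traversed in reverse
    to line up each base with its antiparallel partner.
    """
    if len(strand_a) != len(strand_b):
        return False
    return all(
        PAIR.get(a) == b for a, b in zip(strand_a, reversed(strand_b))
    )


# Hypothetical sequences for illustration.
positive_example = are_complementary("GGAUC", "GAUCC")  # True
negative_example = are_complementary("GGAUC", "GAUCA")  # False
```

A rule this simple makes it easy to generate labelled pairs under controlled statistical conditions, which is the kind of setting in which overfitting behaviour can be studied.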
|
47 |
Realization of Model-Driven Engineering for Big Data: A Baseball Analytics Use Case
Koseler, Kaan Tamer, 27 April 2018
No description available.
|
48 |
[en] PORTFOLIO SELECTION USING ROBUST OPTIMIZATION AND SUPPORT VECTOR MACHINE (SVM) / [pt] SELEÇÃO DE PORTFÓLIO USANDO OTIMIZAÇÃO ROBUSTA E MÁQUINAS DE SUPORTE VETORIAL
ROBERTO PEREIRA GARCIA JUNIOR, 26 October 2021
[pt, translated to English] The difficulty of predicting the movement of
financial assets has been studied by several authors. To obtain gains, it is
necessary to estimate the direction (rise or fall) and the magnitude of the
return on the asset one intends to buy or sell. This work proposes to develop
a mathematical optimization model with binary variables capable of predicting
upward and downward movements of financial assets, and to use a portfolio
optimization model to evaluate the results obtained. The prediction model
will be based on the Support Vector Machine (SVM), in which we modify the
regularization of the traditional model. Robust optimization will be used for
portfolio management. Optimization techniques are increasingly applied in
portfolio management because they can deal with the problems of the
uncertainties introduced in the estimation of the parameters. It is worth
noting that the model developed is data-driven, i.e., the predictions are made
using nonlinear signals based on past historical return/price data, without
any kind of human intervention.
Since prices depend on many factors, a set of parameters can be expected to
describe the dynamics of financial asset prices only for a short interval of
days. To capture this change in dynamics more precisely, the model parameters
are estimated in a moving window.
To test the accuracy of the models and the gains obtained, a case study was
carried out using 6 financial assets from the currency, fixed income, variable
income and commodity classes. The data cover the period from 01/01/2004 to
30/05/2018, for a total of 3623 daily quotes. Considering transaction costs
and the out-of-sample results obtained in the period analysed, the investment
portfolio developed in this work shows results superior to those of the
traditional indices, with limited risk. / [en] The difficulty of predicting
the movement of financial assets is the
subject of study by several authors. In order to obtain gains, it is necessary
to estimate the direction (rise or fall) and the magnitude of the return on
the asset that one intends to buy or sell. The purpose of this
work is to develop a mathematical optimization model with binary variables
capable of predicting up and down movements of financial assets and to use
a portfolio optimization model to evaluate the results obtained. The prediction
model will be based on the Support Vector Machine (SVM),
in which we will modify the regularization of the traditional
model. Robust optimization will be used for portfolio management.
Robust optimization techniques are being increasingly applied in portfolio
management, since they are able to deal with the problems of the uncertainties
introduced in the estimation of the parameters. It is noteworthy that
the developed model is data-driven, i.e., the predictions are made using
nonlinear signals based on past historical price/return data without any
human intervention.
As prices depend on many factors, it is to be expected that a set of
parameters can only describe the dynamics of the prices of financial assets
for a small interval of days. In order to capture this change in dynamics
more accurately, the model parameters are estimated in a moving window.
To test the accuracy of the models and the gains obtained, a case study
was made using 6 financial assets from the currency, fixed income, variable
income and commodity classes. The data cover the period from 01/01/2004
to 05/30/2018, for a total of 3623 daily quotations. Considering the
transaction costs and out-of-sample results obtained in the analyzed period,
the investment portfolio developed in this work shows
better results than the traditional indexes, with limited risk.
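The moving-window estimation described in this abstract can be sketched as a generator of training and out-of-sample index ranges. The abstract does not give the window lengths, so `train_size`, `test_size` and the function name `moving_windows` are hypothetical, and the model fitting itself (the modified SVM) is left out.

```python
from typing import Iterator, Tuple


def moving_windows(
    n_obs: int, train_size: int, test_size: int = 1
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that slide forward in time.

    Each model is estimated on `train_size` consecutive observations and
    evaluated on the `test_size` observations that immediately follow,
    so no future data leaks into the estimation.
    """
    if train_size < 1 or test_size < 1:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide the window past the days just tested


# Ten hypothetical daily observations, six for fitting, two for testing.
splits = [
    (list(train), list(test))
    for train, test in moving_windows(n_obs=10, train_size=6, test_size=2)
]
# splits == [([0, 1, 2, 3, 4, 5], [6, 7]), ([2, 3, 4, 5, 6, 7], [8, 9])]
```

Re-estimating on each window lets the parameters follow changes in price dynamics, at the cost of fitting many small models.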
|
49 |
Early Warning System of Students Failing a Course : A Binary Classification Modelling Approach at Upper Secondary School Level / Förebyggande Varningssystem av elever med icke godkänt betyg : Genom applicering av binär klassificeringsmodell inom gymnasieskolan
Karlsson, Niklas; Lundell, Albin, January 2022
Only 70% of Swedish students graduate from upper secondary school within the given time frame. Earlier research has shown that unfinished degrees disadvantage the individual student, policy makers and society. A first step in preventing dropouts is to identify students who are about to fail courses. Thus, the purpose is to identify tendencies indicating whether a student will pass or fail a course. In addition, the thesis accounts for the development of an Early Warning System to be applied to signal which students need additional support from a professional teacher. The Random Forest algorithm was used as a binary classification model of a failed grade against a passing grade. The data in the study are samples of approximately 700 students from an upper secondary school within the Stockholm municipality. The chosen method originates from a Design Science Research Methodology that allows the stakeholders to be involved in the process. The results showed that the most dominant indicators for classifying correctly were Absence, Previous grades and Mathematics diagnosis. Furthermore, variables from the Learning Management System were predominant indicators when the system was also used by teachers. The prediction accuracy of the algorithm indicates a positive tendency for classifying correctly. On the other hand, the small number of data points raises doubt about whether an Early Warning System can be applied in its current state. Thus, one conclusion is that further studies need to increase the number of data points. Suggestions to address the problem are mentioned in the Discussion. Moreover, the results are analysed together with a review of the potential Early Warning System from a didactic perspective. Furthermore, the ethical aspects of the thesis are discussed thoroughly. / [translated from Swedish] Only 70% of Swedish upper secondary students graduate within the given time frame. Earlier research has shown that an unfinished upper secondary education disadvantages both the student and society at large. A first step towards preventing students from dropping out of upper secondary school is to indicate which students are heading towards a failing grade in courses. The purpose of the report is therefore to identify which trends best indicate whether a student will pass a course or not. In addition, the report describes the development of an early warning system that can be applied to signal which students need additional support from the teacher and the school. The algorithm used was Random Forest, which functions as a binary classification model of a failing grade against a passing grade. The data used in the study are data points for approximately 700 students from an upper secondary school in the Stockholm area. The chosen method is based on a Design Science Research methodology, which allows stakeholders to be involved in the process. The results showed that the most important variables were absence, previous grades and results from Stockholmsprovet (a municipal mathematics diagnostic test). Furthermore, variables from the learning platform were an important indicator when the learning platform was used by the teacher. The accuracy of the algorithm indicated a positive trend towards correct classification. On the other hand, it is doubtful whether the early warning system can be used in its current state, since the amount of data used to train the algorithm was small. One conclusion is therefore that further studies need to increase the number of data points used. Suggestions for addressing the problem are given in the Discussion. In addition, the results are analysed together with an evaluation of the system from a didactic perspective. Furthermore, the ethical aspects of the report are discussed throughout.
|
50 |
Further development and optimisation of the CNN-classification algorithm of Alfrödull for more accurate aerial image detection of decentralised solar energy systems : A study on how the performance of neural networks can be improved through additional training data, image preprocessing, class balancing and sliding window classification
Lindvall, Erik, January 2024
The global use of solar power is growing at an unprecedented rate, making the need to accurately track the energy generation of decentralised solar energy systems (SES) more and more relevant. The purpose of this thesis is to further develop a binary image classifier for the simulation system framework known as Alfrödull, which will be used to detect and segment SES from aerial images to simulate the energy generation within a given Swedish municipality on an hourly basis. This project focuses on improving the Alfrödull classifier through four different analyses. The first examines how additional training data from publicly available datasets affects the model performance. The second examines how the model can be improved through the use of various image pre-processing techniques. The third examines how the model can be improved by balancing the training datasets to make up for the low number of positive images, as well as by utilising model ensembles for joint classification. Finally, the fourth analysis employs a sliding window approach to classify overlapping image tiles. The results show that having training data that is a good representation of the environment the model will be used in is crucial, that the use of image augmentation policies can significantly improve model performance, that compensating for class imbalance as well as utilising ensemble methods positively impacts model performance, and that a sliding window approach to classifying overlapping images significantly decreases the number of missed SES at the cost of clusters of falsely classified negative images (false positives). In conclusion, this thesis serves as an important stepping stone in the practical implementation of the Alfrödull framework, showcasing the key aspects of making a well-performing binary image classifier of SES in Sweden.
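The sliding window approach over overlapping image tiles can be illustrated by generating the tile coordinates. The tile size, stride and image dimensions below are assumptions, as is the function name `sliding_window_tiles`; the abstract does not state the values used in Alfrödull. The sketch assumes the image is at least as large as one tile in both directions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def sliding_window_tiles(
    width: int, height: int, tile: int, stride: int
) -> List[Box]:
    """Return boxes of square tiles that cover the whole image.

    A stride smaller than the tile size makes neighbouring tiles overlap,
    so an object cut by one tile border appears whole in another tile.
    """
    if stride < 1:
        raise ValueError("stride must be positive")
    if width < tile or height < tile:
        raise ValueError("image must be at least one tile in size")

    def starts(length: int) -> List[int]:
        positions = list(range(0, length - tile + 1, stride))
        if positions[-1] != length - tile:
            positions.append(length - tile)  # last tile flush with the edge
        return positions

    return [
        (x, y, x + tile, y + tile)
        for y in starts(height)
        for x in starts(width)
    ]


# Hypothetical 500 x 256 pixel image, 256-pixel tiles, 50% nominal overlap.
boxes = sliding_window_tiles(width=500, height=256, tile=256, stride=128)
# boxes == [(0, 0, 256, 256), (128, 0, 384, 256), (244, 0, 500, 256)]
```

Each box would be cropped and passed to the binary classifier. Because one solar installation can fall inside several overlapping tiles, fewer installations are missed, but a single misleading region can also trigger several neighbouring false positives, consistent with the clusters reported in the abstract.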
|