121

[pt] ALGORITMOS DE APROXIMAÇÃO PARA ÁRVORES DE DECISÃO / [en] APPROXIMATION ALGORITHMS FOR DECISION TREES

ALINE MEDEIROS SAETTLER 13 December 2021
Decision tree construction is a central problem in several areas of computer science, for example, database theory and computational learning. This problem can be viewed as the problem of evaluating a discrete function, where checking the value of each variable of the function incurs a cost, and the points where the function is defined are associated with a probability distribution. The goal is to evaluate the function while minimizing the cost spent (in the worst case or in expectation). In this thesis, we present four contributions related to this problem. The first is an algorithm that achieves an O(log(n)) approximation with respect to both the expected and the worst-case costs. The second is a procedure that combines two trees, one with worst-case cost W and another with expected cost E, and produces a tree with worst-case cost at most (1 + p)W and expected cost at most (1/(1 - e^(-p)))E, where p is a given parameter. We also prove that this is a sharp characterization of the best attainable trade-off, showing that there are infinitely many instances for which we cannot obtain a decision tree with both worst-case cost smaller than (1 + p)OPTW(I) and expected cost smaller than (1/(1 - e^(-p)))OPTE(I), where OPTW(I) (resp. OPTE(I)) denotes the cost of the decision tree that minimizes the worst-case cost (resp. expected cost) for an instance I of the problem.
The third contribution is an O(log(n)) approximation algorithm for minimizing the worst-case cost in a variant of the problem where the cost of reading a variable depends on its value. Our final contribution is a randomized rounding algorithm that, given an instance of the problem (with an additional integer k > 0) and a parameter 0 < ε < 1/2, builds an oblivious decision tree with cost at most (3/(1 - 2ε))ln(n)OPT(I) that makes at most (k/ε) errors, where OPT(I) denotes the cost of the oblivious decision tree with minimum cost among all oblivious decision trees for instance I that make at most k classification errors.
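The trade-off stated for the second contribution can be tabulated numerically. The sketch below assumes that the expected-cost factor reads 1/(1 - e^(-p)) with e Euler's number (the exponent is flattened in this record); the function name `tradeoff_factors` is illustrative and does not come from the thesis.

```python
import math


def tradeoff_factors(p: float) -> tuple[float, float]:
    """Return (worst-case factor, expected-cost factor) for a parameter p > 0.

    Assumed reading of the abstract: worst-case cost <= (1 + p) * W and
    expected cost <= E / (1 - e^(-p)).
    """
    if p <= 0:
        raise ValueError("p must be positive")
    worst = 1.0 + p
    expected = 1.0 / (1.0 - math.exp(-p))
    return worst, expected


# A small p keeps the worst case near W but inflates the expected cost;
# a large p does the opposite.
for p in (0.1, 0.5, 1.0, 2.0, 5.0):
    worst, expected = tradeoff_factors(p)
    print(f"p={p:<4} worst <= {worst:.3f} * W   expected <= {expected:.3f} * E")
```

Under this reading, the expected-cost factor approaches 1 as p grows while the worst-case factor grows linearly, which is the tension the abstract describes.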
122

Co-Location Decision Tree for Enhancing Decision-Making of Pavement Maintenance and Rehabilitation

Zhou, Guoqing 02 March 2011
A pavement management system (PMS) is a valuable tool and one of the critical elements of the highway transportation infrastructure. Pavement data is collected, updated, and exchanged frequently and continuously because of rapidly deteriorating road conditions, increased traffic loads, and shrinking funds, so large pavement databases accumulate quickly; knowledge-based expert systems (KBESs) have therefore been developed to solve various transportation problems. This dissertation presents the theory and algorithm for a new decision tree induction method, called the co-location-based decision tree (CL-DT). The method is intended to enhance the decision-making of pavement maintenance personnel and their rehabilitation strategies. The idea stems from shortcomings of traditional decision tree induction algorithms when applied to pavement treatment strategies. The proposed algorithm utilizes the co-location (co-occurrence) characteristics of spatial attribute data in the pavement database. With it, one distinct event occurrence can be associated with two or more attribute values that occur simultaneously in the spatial and temporal domains. The dissertation describes the proposed CL-DT algorithm and the steps for realizing it. First, it describes the co-location mining algorithm in detail, including spatial attribute data selection in pavement databases, the determination of candidate co-locations, the determination of table instances of candidate co-locations, pruning of the non-prevalent co-locations, and induction of co-location rules. In this step, a hybrid constraint is developed, consisting of a spatial geometric distance constraint condition and a distinct event-type constraint condition.
The spatial geometric distance constraint condition is a neighborhood relationship-based spatial join of table instances for many prevalent co-locations with one prevalent co-location, and the distinct event-type constraint condition is a Euclidean distance between a set of attributes and its corresponding cluster center of attributes. The dissertation also develops a spatial feature pruning method using a multi-resolution pruning criterion. The cross-correlation criterion of spatial features is used to remove the non-prevalent co-locations from the candidate prevalent co-location set under a given threshold. The dissertation then develops the co-location decision tree (CL-DT) algorithm, which includes non-spatial attribute data selection in the pavement management database, co-location algorithm modeling, node merging criteria, and co-location decision tree induction. In this step, co-location mining rules are used to guide decision tree generation and to induce decision rules. For each step, the dissertation gives detailed flowcharts and outlines, such as the flowchart of co-location decision tree induction, the co-location/co-occurrence decision tree (CL-DT) algorithm, and an outline of the steps of the SFS (Sequential Feature Selection) algorithm. Finally, a pavement database covering four counties, provided by NCDOT (North Carolina Department of Transportation), was used to verify and test the proposed method. The rehabilitation treatments proposed by NCDOT, by the traditional DT induction algorithm, and by the proposed new method are compared.
Findings and conclusions include: (1) traditional DT technology can make consistent decisions for road maintenance and rehabilitation strategy under the same road conditions, i.e., with less interference from human factors; (2) traditional DT technology can increase the speed of decision-making because it automatically generates a decision tree and rules once the expert knowledge is given, which saves time and expense for the PMS; (3) integration of the DT and GIS can provide the PMS with the capabilities of graphically displaying treatment decisions, visualizing the attribute and non-attribute data, and linking data and information to geographical coordinates. However, traditional DT induction methods are not as intelligent as one might expect, so post-processing and refinement are necessary. Moreover, traditional DT induction methods for pavement M&R strategies use only non-spatial attribute data. This dissertation demonstrates that spatial data is very useful for improving decision-making for pavement treatment strategies. In addition, the decision trees are based on knowledge acquired from pavement management engineers for strategy selection, so different decision trees can be built if the requirements change. / Ph. D.
123

Understanding matrix-assisted continuous co-crystallization using a data mining approach in Quality by Design (QbD)

Chabalenge, Billy, Korde, Sachin A., Kelly, Adrian L., Neagu, Daniel, Paradkar, Anant R 27 July 2020
The present study demonstrates the application of decision tree algorithms to the co-crystallization process. Fifty-four (54) batches of carbamazepine-salicylic acid co-crystals embedded in poly(ethylene oxide) were manufactured via hot melt extrusion and characterized by powder X-ray diffraction, differential scanning calorimetry, and near-infrared spectroscopy. This dataset was then analyzed in WEKA, an open-source machine learning software package, to study the effect of processing temperature, screw speed, screw configuration, and poly(ethylene oxide) concentration on the percentage of co-crystal conversion. The decision trees obtained provided statistically meaningful and easy-to-interpret rules, demonstrating the potential of the method for making rational decisions during the development of co-crystallization processes. / Commonwealth Scholarship Commission in the UK (ZMCS-2018-783) and Engineering and Physical Sciences Research Council (EPSRC EP/J003360/1 and EP/L027011/1)
124

Implementation of the Security-Dependability Adaptive Voting Scheme

Thomas, Michael Kyle 01 June 2011
As the world moves further into the 21st century, worldwide electricity demand continues to grow rapidly. The power systems that supply this growing demand continue to be pushed closer to their limits. When those limits are exceeded, system blackouts occur that have massive societal and economic impact. Power system protection relays make up a piece of these limits and can be important factors in preventing or causing a system blackout. The purpose of this thesis is to present a working implementation of an adaptive protection scheme known as the adaptive voting scheme, used to alter the security/dependability balance of protection schemes. It is argued that, as power system conditions change, the ability of protection relays to adjust the security/dependability balance based on those conditions can allow relays to play a part in preventing power system catastrophes. It is shown that the adaptive voting scheme can be implemented on existing protection technology given Wide Area Measurements (WAMs) provided by Phasor Measurement Units (PMUs). The proposed implementation allows numerous existing protection practices to be used without changing their basic operation. / Master of Science
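The abstract does not spell out the voting logic, so the following is only a rough sketch of how a security/dependability switch could look. It assumes a set of redundant relays whose individual trip outputs are combined, and a bias flag that some external assessment (for instance, one driven by wide-area measurements) would set. The names `Bias` and `trip_decision` and the majority rule are assumptions for illustration, not details taken from the thesis.

```python
from enum import Enum


class Bias(Enum):
    DEPENDABILITY = "dependability"  # assumed rule: trip if any relay asserts
    SECURITY = "security"            # assumed rule: trip only on a majority vote


def trip_decision(relay_votes: list[bool], bias: Bias) -> bool:
    """Combine individual relay trip outputs according to the chosen bias."""
    if not relay_votes:
        raise ValueError("at least one relay vote is required")
    asserted = sum(relay_votes)  # number of relays calling for a trip
    if bias is Bias.DEPENDABILITY:
        return asserted >= 1
    return asserted > len(relay_votes) / 2


# One relay out of three asserts a trip (possibly a false operation).
votes = [True, False, False]
print(trip_decision(votes, Bias.DEPENDABILITY))  # True
print(trip_decision(votes, Bias.SECURITY))       # False
```

In this sketch the individual relays are left untouched and only the way their outputs are combined changes, which matches the abstract's point that existing protection practices keep their basic operation.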
125

Data driven driving evaluation : A supervised machine learning approach for classification of high frequency triaxial acceleration

Lundberg, Henrik January 2024
The ability to navigate a continuously changing business landscape has been a success factor in keeping Scania competitive. Digitalization has enabled data to be collected from various sources, and the ability to embrace the possibilities that come with it and turn them into an advantage is crucial for ensuring that Scania drives the changing industry. Today, Scania is good at collecting and analyzing data, but there is room for improvement in using the data for data-driven decision-making. This study investigates the possibility of learning more about users' driving behavior through data-driven driving evaluation. This is done with a machine learning approach in which a CNN-GRU neural network with an XGBoost classifier is created to classify triaxial acceleration data as normal or aggressive driving behavior. The findings show that this model architecture has a classification accuracy of 87.80%, and the result is discussed with respect to method implementation, quality of data, hyperparameter tuning, and future studies.
126

Tratamento de imprecisão na geração de árvores de decisão

Lopes, Mariana Vieira Ribeiro 03 March 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Inductive Decision Trees (DT) are mechanisms based on the symbolic paradigm of machine learning whose main characteristics are easy interpretability and low computational cost. Although they are widely used, DTs are limited to representing problems whose variables are discrete or continuous. For some problems, however, the variables are not well represented in either form. To address this, Fuzzy Decision Trees (FDT) were developed; they add to the interpretability of Inductive Decision Trees the ability to deal with fuzzy variables, which adequately represent imprecise knowledge. This text presents the work developed during the master's degree, whose main result is a new algorithm for inducing fuzzy decision trees. Its fuzzification of continuous attributes is performed during tree induction and is inspired by C4.5's partitioning method for continuous attributes. The proposed algorithm was tested with 20 datasets from the UCI repository (LICHMAN, 2013).
It was compared with three other algorithms that take different approaches to the classification problem: C4.5, which induces an Inductive Decision Tree; FURIA, which induces a Rule-based Fuzzy System and does not follow a tree structure; and FuzzyDT, which induces a Fuzzy Decision Tree in which the continuous attributes are fuzzified before tree induction. The results of the experiments are presented and discussed in Chapter 4.
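For readers unfamiliar with how a C4.5-style learner partitions a continuous attribute, the sketch below searches candidate cut points for the one with the highest information gain. It is a simplification under stated assumptions: it uses midpoints between adjacent values and plain information gain, whereas C4.5 itself applies further refinements, and it is not the algorithm proposed in this dissertation. How the dissertation turns such cut points into fuzzy sets is not described in the abstract and is not shown. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split `value <= threshold`.

    Candidate thresholds are midpoints between adjacent distinct values.
    Returns (None, 0.0) when no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point between equal values
        t = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Toy data: the classes separate cleanly between 2.0 and 3.0.
print(best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))  # (2.5, 1.0)
```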
127

Carbon Intensity Estimation of Publicly Traded Companies / Uppskattning av koldioxidintensitet hos börsnoterade bolag

Ribberheim, Olle January 2021
The purpose of this master's thesis is to develop a model to estimate the carbon intensity, i.e., the carbon emissions relative to economic activity, of publicly traded companies that do not report their carbon emissions. The core of the thesis is to develop and compare different statistical and machine learning methods and models with regard to accuracy, robustness, and explanatory value when estimating carbon intensity. Both discrete variables, such as the region and sector the company operates in, and continuous variables, such as revenue and capital expenditures, are used in the estimation. Six methods were compared: two statistically derived methods and four machine learning methods. The thesis consists of three parts: data preparation, model implementation, and model comparison. The comparison indicates that the boosted decision tree is both the most accurate and the most robust model. Lastly, the strengths and weaknesses of the methodology are discussed, as well as the suitability and legitimacy of the boosted decision tree for estimating carbon intensity.
128

Automatic Analysis of Peer Feedback using Machine Learning and Explainable Artificial Intelligence / Automatisk analys av Peer feedback med hjälp av maskininlärning och förklarig artificiell Intelligence

Huang, Kevin January 2023
Peer assessment is a process in which learners evaluate and provide feedback on one another’s performance, and it is critical to the student learning process. Earlier research has shown that it can improve student learning outcomes in various settings, including engineering education, where collaborative teaching and learning activities are common. Peer assessment activities in computer-supported collaborative learning (CSCL) settings are becoming increasingly common. When digital technologies are used for these activities, much student data (e.g., peer feedback text entries) is generated automatically. These large data sets can be analyzed (e.g., through computational methods) and used to improve our understanding of how students regulate their learning in CSCL settings, in order to improve their conditions for learning, for example by providing timely feedback. Yet coding these large volumes of student text data is very time- and resource-consuming, so there is a need to automate the process. In this regard, recent developments in machine learning could prove beneficial. To understand how the affordances of machine learning technologies can be harnessed to classify student text data, this thesis examines the application of five models to a data set containing peer feedback from 231 students in a large technical university course. The models evaluated on the data set are the traditional models Multi Layer Perceptron (MLP) and Decision Tree, and the transformer-based models BERT, RoBERTa, and DistilBERT. Cohen’s κ, accuracy, and F1-score were used as metrics to evaluate each model’s performance. The data was preprocessed by removing stopwords, and it was then examined whether removing them improved the performance of the models. The results showed that this preprocessing improved the performance of the Decision Tree only, while performance decreased for all other models.
RoBERTa was the best-performing model on the data set on all metrics used. Explainable artificial intelligence (XAI) was therefore applied to RoBERTa, and it was found that the words considered stopwords made a difference in the prediction.
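Of the three metrics named in the abstract, Cohen's κ is the least familiar: it corrects raw agreement for the agreement expected by chance. The sketch below computes it from two label sequences, for example human codes and model predictions. The labels and the function name `cohens_kappa` are made up for illustration and are not the coding scheme used in the thesis.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labels independently with their own label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical feedback codes assigned by a human coder and by a model.
human = ["praise", "praise", "critique", "critique", "praise", "critique"]
model = ["praise", "critique", "critique", "critique", "praise", "praise"]
print(round(cohens_kappa(human, model), 3))  # 0.333
```

Here the raw agreement is 4/6, but half of that is expected by chance with two balanced labels, so κ is only about 0.33.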
129

Consensus Algorithms in Blockchain : A survey to create decision trees for blockchain applications / Konsensusalgoritmer i Blockchain : En undersökning för att skapa beslutsträd för blockchain-applikationer

Zhu, Xinlin January 2023
Blockchain is a decentralized database that is distributed across a computer network. To enable a smooth decision-making process without any central authority, different blockchain applications use their own consensus algorithms. The problem is that a new blockchain application has limited aid in deciding which algorithm it should implement. Selecting a consensus algorithm is crucial because reaching consensus is the fundamental issue of a decentralized system. Different algorithms are designed with their own advantages and limitations, which makes it difficult to navigate a list of consensus algorithms. This thesis attempts to contribute to solving this problem by surveying the consensus algorithms used in the blockchain applications of 15 existing cryptocurrencies and then producing a decision tree as an aid for algorithm selection. The top 5 algorithms from each of the categories Proof of Work (PoW), Proof of Stake (PoS), and Hybrid Proof of Work + Proof of Stake (PoW + PoS) are selected. The research method is qualitative. The study shows that different consensus algorithms often share some properties, but they are usually built to solve the issues of another algorithm, which means they also have their own distinctive advantages. The decision tree therefore reveals how these algorithms are logically connected and which key properties blockchain consensus algorithms possess. Based on the results of this thesis, further research can include more algorithms in order to make the decision tree more comprehensive. These algorithms can also be implemented in similar network setups to test their claimed properties experimentally. The decision tree can be sent to industry for further feedback.
130

Categorization of Swedish e-mails using Supervised Machine Learning / Kategorisering av svenska e-postmeddelanden med användning av övervakad maskininlärning

Mann, Anna, Höft, Olivia January 2021
Society today is becoming more digitalized, and sending e-mails is a common way of communicating. Currently, the company Auranest has a filtering method for categorizing e-mails, but the method is a few years old. The filter sorts out e-mails that are valuable for jobseekers, in which employers may make contact. The company wants to know whether the categorization can be performed with a different method and improved. The degree project investigates whether the categorization can be performed with higher accuracy using machine learning. Three supervised machine learning algorithms, Naïve Bayes, Support Vector Machine (SVM), and Decision Tree, were examined, and the algorithm with the highest results was compared with Auranest's existing filter. Accuracy, precision, recall, and F1 score were used to determine which machine learning algorithm achieved the highest results, both among the three and in comparison with Auranest's filter. The results showed that the supervised machine learning algorithm SVM achieved the best results in all metrics. The comparison between Auranest's existing filter and SVM showed that SVM performed better in all calculated metrics, with an accuracy of 99.5% for SVM and 93.03% for Auranest's filter. The comparative results showed that accuracy was the only metric on which the two received similar results. For the other metrics, there was a noticeable difference.
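The observation that accuracy can look similar while the other metrics differ is typical when one class is rare. The sketch below computes the four metrics named in the abstract for a binary task; the labels, the toy data, and the function name `binary_metrics` are hypothetical and do not reproduce the study's data or results.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 with `positive` as the class of interest."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equally long, non-empty label sequences")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Ten e-mails, only two of which are valuable ("job"); the filter misses one.
y_true = ["job", "job"] + ["other"] * 8
y_pred = ["job", "other"] + ["other"] * 8
print(binary_metrics(y_true, y_pred, positive="job"))
```

In this toy case accuracy is 0.9 even though half of the valuable e-mails are missed (recall 0.5), which is why precision, recall, and F1 are reported alongside accuracy.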
