Global ETD Search

141	Application of numerical weather prediction with machine learning techniques to improve middle latitude rapid cyclogenesis forecasting Snyder, Colin Matthew 13 August 2024 (has links) (PDF) The goal of this study was first to determine the baseline skill of the Global Forecast System (GFS) in forecasting borderline bomb events (non-bomb: 0.75-0.95, bomb: 1.0-1.25), and second to determine whether machine learning (ML) techniques used as a post-processor can improve the forecasts. This was accomplished by using the Tempest Extreme cyclone tracking software and ERA5 analysis to develop a case list covering October to March for the years 2008-2021. Based on the case list, GFS 24-hour forecasts of atmospheric base state variables in 10-degree by 10-degree subdomains centered on the cyclone were compressed using S-mode Principal Component Analysis. A genetic algorithm was then used to determine the best predictors. These predictors were used to train a logistic regression, serving as a baseline for ML skill, and a Support Vector Machine (SVM) model. Both the logistic regression and the SVM provided an improved bias over the GFS baseline, but only the logistic regression improved skill.
142	Sélection de modèle par chemin de régularisation pour les machines à vecteurs support à coût quadratique / Model selection using regularization path for quadratic cost support vector machines Bonidal, Rémi 19 June 2013 (has links) La sélection de modèle est un thème majeur de l'apprentissage statistique. Dans ce manuscrit, nous introduisons des méthodes de sélection de modèle dédiées à des SVM bi-classes et multi-classes. Ces machines ont pour point commun d'être à coût quadratique, c'est-à-dire que le terme empirique de la fonction objectif de leur problème d'apprentissage est une forme quadratique. Pour les SVM, la sélection de modèle consiste à déterminer la valeur optimale du coefficient de régularisation et à choisir un noyau approprié (ou les valeurs de ses paramètres). Les méthodes que nous proposons combinent des techniques de parcours du chemin de régularisation avec de nouveaux critères de sélection. La thèse s'articule autour de trois contributions principales. La première est une méthode de sélection de modèle par parcours du chemin de régularisation dédiée à la l2-SVM. Nous introduisons à cette occasion de nouvelles approximations de l'erreur en généralisation. Notre deuxième contribution principale est une extension de la première au cas multi-classe, plus précisément à la M-SVM². Cette étude nous a conduits à introduire une nouvelle M-SVM, la M-SVM des moindres carrés. Nous présentons également de nouveaux critères de sélection de modèle pour la M-SVM de Lee, Lin et Wahba à marge dure (et donc la M-SVM²) : un majorant de l'erreur de validation croisée leave-one-out et des approximations de cette erreur. La troisième contribution principale porte sur l'optimisation des valeurs des paramètres du noyau. Notre méthode se fonde sur le principe de maximisation de l'alignement noyau/cible, dans sa version centrée. Elle l'étend à travers l'introduction d'un terme de régularisation. 
Les évaluations expérimentales de l'ensemble des méthodes développées s'appuient sur des benchmarks fréquemment utilisés dans la littérature, des jeux de données jouet et des jeux de données associés à des problèmes du monde réel / Model selection is of major interest in statistical learning. In this document, we introduce model selection methods for bi-class and multi-class support vector machines. We focus on quadratic loss machines, i.e., machines for which the empirical term of the objective function of the learning problem is a quadratic form. For SVMs, model selection consists in finding the optimal value of the regularization coefficient and choosing an appropriate kernel (or the values of its parameters). The proposed methods use path-following techniques in combination with new model selection criteria. This document is structured around three main contributions. The first one is a method performing model selection through the use of the regularization path for the l2-SVM. In this framework, we introduce new approximations of the generalization error. The second main contribution is the extension of the first one to the multi-category setting, more precisely the M-SVM². This study led us to derive a new M-SVM, the least squares M-SVM. Additionally, we present new model selection criteria for the M-SVM introduced by Lee, Lin and Wahba (and thus the M-SVM²). The third main contribution deals with the optimization of the values of the kernel parameters. Our method makes use of the principle of kernel-target alignment with centered kernels. It extends it through the introduction of a regularization term. Experimental validation of these methods was performed on classical benchmark data, toy data and real-world data Apprentissage Discrimination Machine à vecteurs support (SVM) Sélection de modèle Chemin de régularisation Machine learning Classification Support Vector Machine (SVM) Model selection Regularization path 006.31
143	Método para detecção de anomalias em tráfego de redes Real Time Ethernet aplicado em PROFINET e em SERCOS III / Method for detecting traffic anomalies of Real Time Ethernet networks applied to PROFINET and SERCOS III Sestito, Guilherme Serpa 24 October 2018 (has links) Esta tese propõe uma metodologia de detecção de anomalias por meio da otimização da extração, seleção e classificação de características relacionadas ao tráfego de redes Real Time Ethernet (RTE). Em resumo, dois classificadores são treinados usando características que são extraídas do tráfego por meio da técnica de janela deslizante e posteriormente selecionadas de acordo com sua correlação com o evento a ser classificado. O número de características relevantes pode variar de acordo com os indicadores de desempenho de cada classificador. Reduzindo a dimensionalidade do evento a ser classificado com o menor número de características possíveis que o represente, são garantidos a redução do esforço computacional, ganho de tempo, dentre outros benefícios. Posteriormente, os classificadores são comparados em função dos indicadores de desempenho: acurácia, taxa de falsos positivos, taxa de falsos negativos, tempo de processamento e erro relativo. A metodologia proposta foi utilizada para identificar quatro diferentes eventos (três anomalias e o estado normal de operação) em redes PROFINET reais e com configurações distintas entre si; também foi aplicada em três eventos (duas anomalias e o estado normal de operação) em redes SERCOS III. O desempenho de cada classificador é analisado em suas particularidades e comparados com pesquisas correlatas. Por fim, é explorada a possibilidade de aplicação da metodologia proposta para outros protocolos baseados em RTE. / This thesis proposes an anomaly detection methodology by optimizing extraction, selection and classification of characteristics related to Real Time Ethernet (RTE) network traffic. 
In summary, two classifiers are trained using features that are extracted from network traffic through the sliding window technique and then selected according to their correlation with the event being classified. The number of relevant features can vary according to the performance indicators of each classifier. Representing the event to be classified with the smallest possible number of features reduces computational effort and processing time, among other benefits. The classifiers are then compared according to the following performance indicators: accuracy, false positive rate, false negative rate, processing time and relative error. The proposed methodology was used to identify four different events (three anomalies and normal operation) in real PROFINET networks with different configurations. It was also applied to three events (two anomalies and normal operation) in SERCOS III networks. The performance of each classifier is analyzed individually and compared with related research. Finally, the possibility of applying the proposed methodology to other RTE-based protocols is explored. Real Time Ethernet Support Vector Machine Artificial Neural Networks Extração de características Feature Extraction Feature Selection Optimization Otimização PROFINET PROFINET Real Time Ethernet Redes Neurais Artificiais Seleção de características SERCOS III Support Vector Machine
144	System för att upptäcka Phishing : Klassificering av mejl Karlsson, Nicklas January 2008 (has links) Denna rapport tar en titt på phishing-problemet, något som många har råkat ut för med bland annat de falska Nordea eller eBay mejl som på senaste tiden har dykt upp i våra inkorgar, och ett eventuellt sätt att minska phishingens effekt. Fokus i rapporten ligger på klassificering av mejl och den huvudsakliga frågeställningen är: ”Är det, med hög träffsäkerhet, möjligt att med hjälp av ett klassificeringsverktyg sortera ut mejl som har med phishing att göra från övrig skräppost.” Det visade sig svårare än väntat att hitta phishing mejl att använda i klassificeringen. I de klassificeringar som genomfördes visade det sig att både metoden Naive Bayes och med Support Vector Machine kan hitta upp till 100 % av phishing mejlen. Rapporten presenterar arbetsgången, teori om phishing och resultaten efter genomförda klassificeringstest. / This report takes a look at the phishing problem, something that many have come across, for example through the fake Nordea or eBay e-mails that have lately shown up in our inboxes, and at a possible way to reduce the effect of phishing. The focus of the report is the classification of e-mails, and the main question is: “Is it possible, with high accuracy, to use a classification tool to separate phishing e-mails from other spam e-mails?” It was more difficult than expected to find phishing e-mails to use in the classification. The classifications that were performed showed that both Naive Bayes and Support Vector Machine can find up to 100 % of the phishing e-mails. 
The report presents the work process, background on phishing and the results of the classification tests. Phishing spam classification Naive Bayes Support Vector Machine RainBow Cygwin Anti-Phishing Working Group spam filer Phishing nätfiske spam skräppost klassificering Naive Bayes Support Vector Machine RainBow Cygwin Anti-Phishing Working Group spamfiler Computer science Datalogi
145	Forecasting Mid-Term Electricity Market Clearing Price Using Support Vector Machines 2014 May 1900 (has links) In a deregulated electricity market, offering the appropriate amount of electricity at the right time with the right bidding price is of paramount importance. Forecasting the electricity market clearing price (MCP) means predicting the future electricity price based on given forecasts of electricity demand, temperature, sunshine, fuel cost, precipitation and other related factors. Many techniques are currently available for short-term electricity MCP forecasting, but very little has been done in the area of mid-term electricity MCP forecasting. Mid-term electricity MCP forecasting covers a time frame from one month to six months. It is essential for mid-term planning and decision making, such as generation plant expansion and maintenance scheduling, reallocation of resources, bilateral contracts and hedging strategies. Six mid-term electricity MCP forecasting models are proposed and compared in this thesis: 1) a single support vector machine (SVM) forecasting model, 2) a single least squares support vector machine (LSSVM) forecasting model, 3) a hybrid SVM and auto-regression moving average with external input (ARMAX) forecasting model, 4) a hybrid LSSVM and ARMAX forecasting model, 5) a multiple SVM forecasting model and 6) a multiple LSSVM forecasting model. PJM interconnection data are used to test the proposed models. Cross-validation was used to optimize the control parameters and the selection of training data for the six proposed models. Three evaluation measures, mean absolute error (MAE), mean absolute percentage error (MAPE) and mean square root error (MSRE), are used to analyze the forecasting accuracy of the system. 
According to the experimental results, the multiple SVM forecasting model performed best among the six proposed forecasting models. The proposed multiple SVM based mid-term electricity MCP forecasting model contains a data classification module and a price forecasting module. The data classification module first pre-processes the input data into corresponding price zones, and the forecasting module then forecasts the electricity price using four SVMs designed in parallel. Compared with the other five forecasting models proposed in this thesis, this model yields the greatest improvement in forecasting accuracy for both peak prices and the overall system. Classification Deregulated electric market Electricity market clearing price Electricity price forecasting PJM Support vector machine (SVM) Peak price
146	Twittersentimentanalys : Jämförelse av klassificeringsmodeller tränade på olika datamängder. / Twitter Sentiment Analysis : Comparison of classification models trained on different data sets. Bandgren, Johannes, Selberg, Johan January 2018 (has links) Twitter är en av de populäraste mikrobloggarna, som används för att uttrycka tankar och åsikter om olika ämnen. Ett område som har dragit till sig mycket intresse under de senaste åren är twittersentimentanalys. Twittersentimentanalys handlar om att bedöma vad för sentiment ett inlägg på Twitter uttrycker, om det uttrycker någonting positivt eller negativt. Olika metoder kan användas för att utföra twittersentimentanalys, där vissa lämpar sig bättre än andra. De vanligaste metoderna för twittersentimentanalys använder maskininlärning. Syftet med denna studie är att utvärdera tre stycken klassificeringsalgoritmer inom maskininlärning och hur märkningen av en datamängd påverkar en klassificeringsmodells förmåga att märka ett twitterinlägg korrekt för twittersentimentanalys. Naive Bayes, Support Vector Machine och Convolutional Neural Network är klassificeringsalgoritmerna som har utvärderats. För varje klassificeringsalgoritm har två klassificeringsmodeller tagits fram, som har tränats och testats på två separata datamängder: Stanford Twitter Sentiment och SemEval. Det som skiljer de två datamängderna åt, utöver innehållet i twitterinläggen, är märkningsmetoden och mängden twitterinlägg. Utvärderingen har gjorts utefter vilken prestanda de framtagna klassificeringsmodellerna uppnår på respektive datamängd, hur lång tid de tar att träna och hur invecklade de var att implementera. Resultaten av studien visar att samtliga modeller som tränades och testades på SemEval uppnådde en högre prestanda än de som tränades och testades på Stanford Twitter Sentiment. Klassificeringsmodellerna som var framtagna med Convolutional Neural Network uppnådde bäst resultat över båda datamängderna. 
Dock är ett Convolutional Neural Network mer invecklad att implementera och träningstiden är betydligt längre än Naive Bayes och Support Vector Machine. / Twitter is one of the most popular microblogs, which is used to express thoughts and opinions on different topics. An area that has attracted much interest in recent years is Twitter sentiment analysis. Twitter sentiment analysis is about assessing what sentiment a Twitter post expresses, whether it expresses something positive or negative. Different methods can be used to perform Twitter sentiment analysis. The most common methods of Twitter sentiment analysis use machine learning. The purpose of this study is to evaluate three classification algorithms in machine learning and how the labeling of a data set affects a classification model's ability to classify a Twitter post correctly for Twitter sentiment analysis. Naive Bayes, Support Vector Machine and Convolutional Neural Network are the classification algorithms that have been evaluated. For each classification algorithm, two classification models have been trained and tested on two separate data sets: Stanford Twitter Sentiment and SemEval. What separates the two data sets, in addition to the content of the Twitter posts, is the labeling method and the number of Twitter posts. The evaluation has been done according to the performance of the classification models on the respective data sets, training time and how complicated they were to implement. The results show that all models trained and tested on SemEval achieved a higher performance than those trained and tested on Stanford Twitter Sentiment. The Convolutional Neural Network models achieved the best results over both data sets. However, a Convolutional Neural Network is more complicated to implement and the training time is significantly longer than Naive Bayes and Support Vector Machine. 
Twitter sentiment analysis machine learning Naive Bayes Support Vector Machine Convolutional Neural Network SemEval Stanford Twitter Sentiment pre-processing. Twittersentimentanalys maskininlärning Naive Bayes Support Vector Machine Convolutional Neural Network SemEval Stanford Twitter Sentiment databearbetning. Engineering and Technology Teknik och teknologier
147	System för att upptäcka Phishing : Klassificering av mejl Karlsson, Nicklas January 2008 (has links) Denna rapport tar en titt på phishing-problemet, något som många har råkat ut för med bland annat de falska Nordea eller eBay mejl som på senaste tiden har dykt upp i våra inkorgar, och ett eventuellt sätt att minska phishingens effekt. Fokus i rapporten ligger på klassificering av mejl och den huvudsakliga frågeställningen är: ”Är det, med hög träffsäkerhet, möjligt att med hjälp av ett klassificeringsverktyg sortera ut mejl som har med phishing att göra från övrig skräppost.” Det visade sig svårare än väntat att hitta phishing mejl att använda i klassificeringen. I de klassificeringar som genomfördes visade det sig att både metoden Naive Bayes och med Support Vector Machine kan hitta upp till 100 % av phishing mejlen. Rapporten pressenterar arbetsgången, teori om phishing och resultaten efter genomförda klassificeringstest. / This report takes a look at the phishing problem, something that many have come across with for example the fake Nordea or eBay e-mails that lately have shown up in our e-mail inboxes, and a possible way to reduce the effect of phishing. The focus in the report lies on classification of e-mails and the main question is: “Is it, with high accuracy, possible with a classification tool to sort phishing e-mails from other spam e-mails.” It was more difficult than expected to find phishing e-mails to use in the classification. The classifications that were made showed that it was possible to find up to 100 % of the phishing e-mails with both Naive Bayes and with Support Vector Machine. The report presents the work done, facts about phishing and the results of the classification tests made. 
Phishing spam classification Naive Bayes Support Vector Machine RainBow Cygwin Anti-Phishing Working Group spam filer Phishing nätfiske spam skräppost klassificering Naive Bayes Support Vector Machine RainBow Cygwin Anti-Phishing Working Group spamfiler Computer Sciences Datavetenskap (datalogi)
148	Machine learning spatial appliquée aux images multivariées et multimodales / Spatial machine learning applied to multivariate and multimodal images Franchi, Gianni 21 September 2016 (has links) Cette thèse porte sur la statistique spatiale multivariée et l’apprentissage appliqués aux images hyperspectrales et multimodales. Les thèmes suivants sont abordés : Fusion d'images : Le microscope électronique à balayage (MEB) permet d'acquérir des images à partir d'un échantillon donné en utilisant différentes modalités. Le but de ces études est d'analyser l’intérêt de la fusion de l'information pour améliorer les images acquises par MEB. Nous avons mis en œuvre différentes techniques de fusion de l'information des images, basées en particulier sur la théorie de la régression spatiale. Ces solutions ont été testées sur quelques jeux de données réelles et simulées. Classification spatiale des pixels d’images multivariées : Nous avons proposé une nouvelle approche pour la classification de pixels d’images multi/hyper-spectrales. Le but de cette technique est de représenter et de décrire de façon efficace les caractéristiques spatiales / spectrales de ces images. Ces descripteurs multi-échelle profond visent à représenter le contenu de l'image tout en tenant compte des invariances liées à la texture et à ses transformations géométriques. Réduction spatiale de dimensionnalité : Nous proposons une technique pour extraire l'espace des fonctions en utilisant l'analyse en composante morphologiques. Ainsi, pour ajouter de l'information spatiale et structurelle, nous avons utilisé les opérateurs de morphologie mathématique. / This thesis focuses on multivariate spatial statistics and machine learning applied to hyperspectral and multimodal images in remote sensing and scanning electron microscopy (SEM). The following topics are considered: Fusion of images: SEM allows us to acquire images from a given sample using different modalities. 
The purpose of these studies is to analyze the value of information fusion for improving multimodal SEM image acquisition. We have modeled and implemented various techniques for fusing image information, based in particular on spatial regression theory. They have been assessed on various datasets. Spatial classification of multivariate image pixels: We have proposed a novel approach for pixel classification in multi/hyper-spectral images. The aim of this technique is to represent and efficiently describe the spatial/spectral features of multivariate images. These multi-scale deep descriptors aim at representing the content of the image while considering invariances related to the texture and to its geometric transformations. Spatial dimensionality reduction: We have developed a technique to extract a feature space using morphological principal component analysis. To take spatial and structural information into account, we used mathematical morphology operators. Traitement de l'image Machine Learning Méthodes à noyaux Morphologie mathématique Analyse en composantes principales Support vector machine Apprentissage profond Transformée de scattering Krigeage Image processing Machine learning Kernel methods Mathematical morphology Principal component analysis Support vector machine Deep learning Scattering transform Kriging 621.367
149	Evaluation automatique des états émotionnels et dépressifs : vers un système de prévention des risques psychosociaux / Automatic evaluation of emotional and depressive states : towards a prevention system for psychosocial risks Cholet, Stéphane 17 June 2019 (has links) Les risques psychosociaux sont un enjeu de santé publique majeur, en particulier à cause des troubles qu'ils peuvent engendrer : stress, changements d'humeurs, burn-out, etc. Bien que le diagnostic de ces troubles doive être réalisé par un professionnel, l'Affective Computing peut apporter une contribution en améliorant la compréhension des phénomènes. L'Affective Computing (ou Informatique Affective) est un domaine pluridisciplinaire, faisant intervenir des concepts d'Intelligence Artificielle, de psychologie et de psychiatrie, notamment. Dans ce travail de recherche, on s'intéresse à deux éléments pouvant faire l'objet de troubles : l'état émotionnel et l'état dépressif des individus. Le concept d'émotion couvre un très large champ de définitions et de modélisations, pour la plupart issues de travaux en psychiatrie ou en psychologie. C'est le cas, par exemple, du circumplex de Russell, qui définit une émotion comme étant la combinaison de deux dimensions affectives, nommées valence et arousal. La valence dénote le caractère triste ou joyeux d'un individu, alors que l'arousal qualifie son caractère passif ou actif. L'évaluation automatique des états émotionnels a suscité, dans la dernière décennie, un regain d'intérêt notable. Des méthodes issues de l'Intelligence Artificielle permettent d'atteindre des performances intéressantes, à partir de données capturées de manière non-invasive, comme des vidéos. Cependant, il demeure un aspect peu étudié : celui des intensités émotionnelles, et de la possibilité de les reconnaître. 
Dans cette thèse, nous avons exploré cet aspect au moyen de méthodes de visualisation et de classification pour montrer que l'usage de classes d'intensités émotionnelles, plutôt que de valeurs continues, bénéficie à la fois à la reconnaissance automatique et à l'interprétation des états. Le concept de dépression connaît un cadre plus strict, dans la mesure où c'est une maladie reconnue en tant que telle. Elle atteint les individus sans distinction d'âge, de genre ou de métier, mais varie en intensité ou en nature des symptômes. Pour cette raison, son étude tant au niveau de la détection que du suivi, présente un intérêt majeur pour la prévention des risques psychosociaux. Toutefois, son diagnostic est rendu difficile par le caractère parfois anodin des symptômes et par la démarche souvent délicate de consulter un spécialiste. L'échelle de Beck et le score associé permettent, au moyen d'un questionnaire, d'évaluer la sévérité de l'état dépressif d'un individu. Le système que nous avons développé est capable de reconnaître automatiquement le score dépressif d'un individu à partir de vidéos. Il comprend, d'une part, un descripteur visuel spatio-temporel bas niveau qui quantifie les micro et les macro-mouvements faciaux et, d'autre part, des méthodes neuronales issues des sciences cognitives. Sa rapidité autorise des applications de reconnaissance des états dépressifs en temps réel, et ses performances sont intéressantes au regard de l'état de l'art. La fusion des modalités visuelles et auditives a également fait l'objet d'une étude, qui montre que l'utilisation de ces deux canaux sensoriels bénéficie à la reconnaissance des états dépressifs. Au-delà des performances et de son originalité, l'un des points forts de ce travail de thèse est l'interprétabilité des méthodes. 
En effet, dans un contexte pluridisciplinaire tel que celui posé par l'Affective Computing, l'amélioration des connaissances et la compréhension des phénomènes étudiés sont des aspects majeurs que les méthodes informatiques sous forme de "boîte noire" ont souvent du mal à appréhender. / Psychosocial risks are a major public health issue, in particular because of the disorders they can trigger: stress, mood swings, burn-out, etc. Although these disorders must be diagnosed by a healthcare professional, Affective Computing can make a contribution by improving the understanding of the phenomena. Affective Computing is a multidisciplinary field involving concepts from Artificial Intelligence, psychology and psychiatry, among others. In this research, we are interested in two elements that can be subject to disorders: the emotional state and the depressive state of individuals. The concept of emotion covers a wide range of definitions and models, most of which are based on work in psychiatry or psychology. One example is Russell's circumplex, which defines an emotion as the combination of two affective dimensions, called valence and arousal. Valence denotes an individual's sad or joyful character, while arousal denotes their passive or active character. The automatic evaluation of emotional states has attracted a notable revival of interest in the last decade. Methods from Artificial Intelligence make it possible to achieve interesting performance from data captured in a non-invasive manner, such as videos. However, one aspect has received little study: emotional intensities and the possibility of recognizing them. In this thesis, we have explored this aspect using visualization and classification methods to show that the use of emotional intensity classes, rather than continuous values, benefits both automatic recognition and the interpretation of states. The concept of depression has a stricter framework, as it is a recognized disease in its own right. 
It affects individuals regardless of age, gender or occupation, but varies in intensity and in the nature of its symptoms. For this reason, its study, in terms of both detection and monitoring, is of major interest for the prevention of psychosocial risks. However, its diagnosis is made difficult by the sometimes innocuous nature of the symptoms and by the often delicate step of consulting a specialist. The Beck scale and its associated score make it possible, by means of a questionnaire, to evaluate the severity of an individual's depressive state. The system we have developed is able to automatically recognize an individual's depressive score from videos. It includes, on the one hand, a low-level visual spatio-temporal descriptor that quantifies micro and macro facial movements and, on the other hand, neural methods from the cognitive sciences. Its speed allows applications for real-time recognition of depressive states, and its performance is interesting with regard to the state of the art. The fusion of visual and auditory modalities has also been studied, showing that the use of these two sensory channels benefits the recognition of depressive states. Beyond performance and originality, one of the strong points of this thesis is the interpretability of the methods. Indeed, in a multidisciplinary context such as that of Affective Computing, improving knowledge and understanding of the studied phenomena are key aspects that computer methods implemented as "black boxes" often struggle to address. Intelligence artificielle Informatique affective Réseaux de neurones Support vector machine Classification Mémoire associative hétérogène Réseaux de neurones incrémentaux Artificial intelligence Affective computing Neural networks Support vector machine Classification Bidirectional associative memory Incremental neural networks 004 006.3
150	Categorization of Swedish e-mails using Supervised Machine Learning / Kategorisering av svenska e-postmeddelanden med användning av övervakad maskininlärning Mann, Anna, Höft, Olivia January 2021 (has links) Society today is becoming more digitalized, and sending e-mails is a common way of communicating. The company Auranest currently has a filtering method for categorizing e-mails, but the method is a few years old. The filter sorts out e-mails that are valuable to jobseekers, in which employers may be making contact. The company wants to know whether the categorization can be performed with a different method and improved. The degree project aims to investigate whether the categorization can be performed with higher accuracy using machine learning. Three supervised machine learning algorithms, Naïve Bayes, Support Vector Machine (SVM), and Decision Tree, have been examined, and the algorithm with the highest results has been compared with Auranest's existing filter. Accuracy, Precision, Recall, and F1 score have been used to determine which machine learning algorithm achieved the highest results, both among the three algorithms and in comparison with Auranest's filter. The results showed that the supervised machine learning algorithm SVM achieved the best results in all metrics. The comparison between Auranest's existing filter and SVM showed that SVM performed better in all calculated metrics, where the accuracy was 99.5% for SVM and 93.03% for Auranest’s filter. The comparative results showed that accuracy was the only factor that received similar results. For the other metrics, there was a noticeable difference. / Dagens samhälle blir alltmer digitaliserat och ett vanligt kommunikationssätt är att skicka e-postmeddelanden. I dagsläget har företaget Auranest ett filter för att kategorisera e-postmeddelanden men filtret är några år gammalt. 
Användningsområdet för filtret är att sortera ut värdefulla e-postmeddelanden för arbetssökande, där kontakt kan ske från arbetsgivare. 
Företaget vill veta ifall kategoriseringen kan göras med en annan metod samt förbättras. Målet med examensarbetet är att undersöka ifall filtreringen kan göras med högre träffsäkerhet med hjälp av maskininlärning. Tre övervakade maskininlärningsalgoritmer, Naïve Bayes, Support Vector Machine (SVM) och Decision Tree, har granskats och algoritmen med de högsta resultaten har jämförts med Auranests befintliga filter. Träffsäkerhet, precision, känslighet och F1-poäng har använts för att avgöra vilken maskininlärningsalgoritm som gav högst resultat sinsemellan samt i jämförelse med Auranests filter. Resultatet påvisade att den övervakade maskininlärningsmetoden SVM åstadkom de främsta resultaten i samtliga mätvärden. Jämförelsen mellan Auranests befintliga filter och SVM visade att SVM presterade bättre i alla kalkylerade mätvärden, där träffsäkerheten visade 99,5% för SVM och 93,03% för Auranests filter. De jämförande resultaten visade att träffsäkerheten var den enda faktorn som gav liknande resultat. För de övriga mätvärdena var det en märkbar skillnad. Classification categorization e-mails preprocessing TF-IDF machine learning supervised learning Naïve Bayes Support Vector Machine Decision Tree Klassificering kategorisering e-postmeddelanden förbehandling av data TF-IDF maskininlärning övervakad inlärning Naïve Bayes Support Vector Machine Decision Tree Computer Sciences Datavetenskap (datalogi)
