Global ETD Search

1	Skräppost eller skinka? : En jämförande studie av övervakade maskininlärningsalgoritmer för spam och ham e-mailklassifikation / Spam or ham? : A comparative study of monitored machine learning algorithms for spam and ham e-mail classification. Bergens, Simon, Frykengård, Pontus January 2019 (has links) Spam messages in the form of e-mail is a growing problem in today's businesses. It is a problem that costs time and resources to counteract. Research into this has been done to produce techniques and tools aimed at addressing the growing number on incoming spam e-mails. The research on different algorithms and their ability to classify e-mail messages needs an update since both tools and spam e-mails have become more advanced. In this study, three different machine learning algorithms have been evaluated based on their ability to correctly classify e-mails as legitimate or spam. These algorithms are naive Bayes, support vector machine and decision tree. The algorithms are tested in an experiment with the Enron spam dataset and are then compared against each other in their performance. The result of the experiment was that support vector machine is the algorithm that correctly classified most of the data points. Even though support vector machine has the largest percentage of correctly classified data points, other algorithms can be useful from a business perspective depending on the task and context. Maskininlärning Spam e-mail Textklassificering Spam e-mailklassificering Övrig annan teknik
2	Exponerade hatkommentarer : En studie av svensk hatkommentarsklassificering Johansson, Kim January 2016 (has links) I detta arbete presenteras hatfulla kommentarer på internet som ett sam- hällsproblem som vi bör göra något åt. Webbplatsen Exponerat.net presenteras som en källa till hatfulla kommentarer. Med hjälp av ett förenklande antagande om att de kommentarer som finns på Exponerat kan utgöra en god representation för hatfulla kommentarer på internet konstruerar vi en klassificerare. Klassificeraren utvärderas i två steg; det ena med hjälp av tiofaldig korsvalidering och det andra manuellt. Klassificeraren uppvisar acceptabla precision/recall-värden i det första utvärderingssteget men faller kort i det manuella. Arbetet avslutas med en diskussion om rimligheten i det förenklande antagandet att använda en enda källa. / Hate speech on the internet is a serious issue. This study asks the question: "Is it possible to use machine learning to do something about it?". By using crawled comments from the blog Exponerat.net as a representation of “hate” and comments from the blog Feber.se as “not-hate” we try to construct a classifier. Evaluation in done in two steps; one using 10-fold cross validation and one using manual evaluation methods. The classifier produces an acceptable result in the first step but falls short in the second. The study ends with discussions about if it is even possible to train a classifier using only one source of data. exponerat.net hatkommentarer SVM textklassificering blogg näthat
3	Comparative Study of the Combined Performance of Learning Algorithms and Preprocessing Techniques for Text Classification Grancharova, Mila, Jangefalk, Michaela January 2018 (has links) With the development in the area of machine learning, society has become more dependent on applications that build on machine learning techniques. Despite this, there are extensive classification tasks which are still performed by humans. This is time costly and often results in errors. One application in machine learning is text classification which has been researched a lot the past twenty years. Text classification tasks can be automated through the machine learning technique supervised learning which can lead to increased performance compared to manual classification. When handling text data, the data often has to be preprocessed in different ways to assure a good classification. Preprocessing techniques have been shown to increase performance of text classification through supervised learning. Different processing techniques affect the performance differently depending on the choice of learning algorithm and characteristics of the data set. This thesis investigates how classification accuracy is affected by different learning algorithms and different preprocessing techniques for a specific customer feedback data set. The researched algorithms are Naïve Bayes, Support Vector Machine and Decision Tree. The research is done by experiments with dependency on algorithm and combinations of preprocessing techniques. The results show that spelling correction and removing stop words increase the accuracy for all classifiers while stemming lowers the accuracy for all classifiers. Furthermore, Decision Tree was most positively affected by preprocessing while Support Vector Machine was most negatively affected. A deeper study on why the preprocessing techniques affected the algorithms in such a way is recommended for future work. / I och med utvecklingen inom området maskininlärning har samhället blivit mer beroende av applikationer som bygger på maskininlärningstekniker. Trots detta finns omfattande klassificeringsuppgifter som fortfarande utförs av människor. Detta är tidskrävande och resulterar ofta i olika typer av fel. En uppgift inom maskininlärning är textklassificering som har forskats mycket i de senaste tjugo åren. Textklassificering kan automatiseras genom övervakad maskininlärningsteknik vilket kan leda till effektiviseringar jämfört med manuell klassificering. Ofta måste textdata förbehandlas på olika sätt för att säkerställa en god klassificering. Förbehandlingstekniker har visat sig öka textklassificeringens prestanda genom övervakad inlärning. Olika förbetningstekniker påverkar prestandan olika beroende på valet av inlärningsalgoritm och egenskaper hos datamängden. Denna avhandling undersöker hur klassificeringsnoggrannheten påverkas av olika inlärningsalgoritmer och olika förbehandlingstekniker för en specifik datamängd som utgörs av kunddata. De undersökta algoritmerna är naïve Bayes, supportvektormaskin och beslutsträd. Undersökningen görs genom experiment med beroende av algoritm och kombinationer av förbehandlingstekniker. Resultaten visar att stavningskorrektion och borttagning av stoppord ökar noggrannheten för alla klassificerare medan stämming sänker noggrannheten för alla. Decision Tree var dessutom mest positivt påverkad av de olika förbehandlingsmetoderna medan Support Vector Machine påverkades mest negativt. En djupare studie om varför förbehandlingsresultaten påverkat algoritmerna på ett sådant sätt rekommenderas för framtida arbete. Computer and Information Sciences Data- och informationsvetenskap
4	Comparing Text Classification Libraries in Scala and Python : A comparison of precision and recall Garamvölgyi, Filip, Henning Bruce, August January 2021 (has links) In today’s internet era, more text than ever is being uploaded online. The text comes in many forms, such as social media posts, business reviews, and many more. For various reasons, there is an interest in analyzing the uploaded text. For instance, an airline business could ask their customers to review the service they have received. The feedback would be collected by asking the customer to leave a review and a score. A common scenario is a review with a good score that contains negative aspects. It is preferable to avoid a situation where the entirety of the review is regarded as positive because of the score if there are negative aspects mentioned. A solution to this would be to analyze each sentence of a review and classify it by negative, neutral or, positive depending on how the sentence is perceived. With the amount of text uploaded today, it is not feasible to manually analyze text. To automatically classify text by a set of criteria is called text classification. The process of specifically classifying text by how it is perceived is a subcategory of text classification known as sentiment analysis. Positive, neutral and, negative would be the sentiments to classify. The most popular frameworks associated with the implementation of sentiment analyzers are developed in the programming language Python. However, over the years, text classification has had an increase in popularity. The increase in popularity has caused new frameworks to be developed in new programming languages. Scala is one of the programming languages that has had new frameworks developed to work with sentiment analysis. However, in comparison to Python, it has fewer available resources. Python has more available libraries to work with, available documentation, and community support online. There are even fewer resources regarding sentiment analysis in a less common language such as Swedish. The problem is no one has compared a sentiment analyzer for Swedish text implemented using Scala and compared it to Python. The purpose of this thesis is to compare recall and precision of a sentiment analyzer implemented in Scala to Python. The goal of this thesis is to increase the knowledge regarding the state of text classification for less common natural languages in Scala. To conduct the study, a qualitative approach with the support of quantitative data was used. Two kinds of sentiment analyzers were implemented in Scala and Python. The first classified text as either positive or negative (binary sentiment analysis), the second sentiment analyzer would also classify text as neutral (multiclass sentiment analysis). To perform the comparative study, the implemented analyzers would perform classification on text with known sentiments. The quality of the classifications was measured using their F1-score. The results showed that Python had better recall and quality for both tasks. In the binary task, there was not as large of a difference between the two implementations. The resources from Python were more specialized for Swedish and did not seem to be as affected by the small dataset used as the resources in Scala. Scala had an F1-score of 0.78 for binary sentiment analysis and 0.65 for multiclass sentiment analysis. Python had an F1-score of 0.83 for binary sentiment analysis and 0.78 for multiclass sentiment analysis. / I dagens internetera laddas mer text upp än någonsin online. Texten finns i många former, till exempel inlägg på sociala medier, företagsrecensioner och många fler. Av olika skäl finns det ett intresse av att analysera den uppladdade texten. Till exempel kan ett flygbolag be sina kunder att lämna omdömen om tjänsten de nyttjat. Feedbacken samlas in genom att be kunden lämna ett omdöme och ett betyg. Ett vanligt scenario är en recension med ett bra betyg som innehåller negativa aspekter. Det är att föredra att undvika en situation där hela recensionen anses vara positiv på grund av poängen, om det nämnts negativa aspekter. En lösning på detta skulle vara att analysera varje mening i en recension och klassificera den som negativ, neutral eller positiv beroende på hur meningen uppfattas. Med den mängd text som laddas upp idag är det inte möjligt att manuellt analysera text. Att automatiskt klassificera text efter en uppsättning kriterier kallas textklassificering. Processen att specifikt klassificera text efter hur den uppfattas är en underkategori av textklassificering som kallas sentimentanalys. Positivt, neutralt och negativt skulle vara sentiment att klassificera. De mest populära ramverken för implementering av sentimentanalysatorer utvecklas i programmeringsspråket Python. Men genom åren har textklassificering ökat i popularitet. Ökningen i popularitet har gjort att nya ramverk utvecklats för nya programmeringsspråk. Scala är ett av programmeringsspråken som har utvecklat nya ramverk för att arbeta med sentimentanalys. I jämförelse med Python har den dock mindre tillgängliga resurser. Python har mer bibliotek, dokumentation och mer stöd online. Det finns ännu färre resurser när det gäller sentimentanalyser på ett mindre vanligt språk som svenska. Problemet är att ingen har jämfört en sentimentanalysator för svensk text implementerad med Scala och jämfört den med Python. Syftet med denna avhandling är att jämföra precision och recall på en sentimentanalysator implementerad i Scala med Python. Målet med denna avhandling är att öka kunskapen om tillståndet för textklassificering för mindre vanliga naturliga språk i Scala. För att genomföra studien användes ett kvalitativt tillvägagångssätt med stöd av kvantitativa data. Två typer av sentimentanalysatorer implementerades i Scala och Python. Den första klassificerade texten som antingen positiv eller negativ (binär sentimentanalys), den andra sentimentanalysatorn skulle också klassificera text som neutral (sentimentanalys i flera klasser). För att utföra den jämförande studien skulle de implementerade analysatorerna utföra klassificering på text med kända sentiment. Klassificeringarnas kvalitet mättes med deras F1-poäng. Resultaten visade att Python hade bättre precision och recall för båda uppgifterna. I den binära uppgiften var det inte lika stor skillnad mellan de två implementeringarna. Resurserna från Python var mer specialiserade för svenska och verkade inte påverkas lika mycket av den lilla dataset som används som resurserna i Scala. Scala hade ett F1-poäng på 0,78 för binär sentimentanalys och 0,65 för sentimentanalys i flera klasser. Python hade ett F1-poäng på 0,83 för binär sentimentanalys och 0,78 för sentimentanalys i flera klasser. LaBSE Spark NLP NLP Text classification Scala LaBSE Spark NLP NLP Textklassificering Scala Computer and Information Sciences Data- och informationsvetenskap
5	Extending a Text Classifier to Multiple Languages / Utöka en textklassificeringsmodell till flera språk Byström, Albin January 2021 (has links) This thesis explores the possibility to extend monolingual and bilingual text classifiers to multiple languages. Two different language models are explored, language aligned word embeddings and a transformer model. The goal was to take a classifier based on Swedish and English samples and extend it to Danish, German, and Finnish samples. The result shows that extending a text classifier by word embeddings alignment or by finetuning a multilingual transformer model is possible but with varying accuracy depending on the language. / Denna avhandling undersöker möjligheten att utvidga enspråkiga och tvåspråkiga textklassificatorer till flera språk. Två olika språkmodeller utforskas, justeras ordinbäddningar och en transformatormodell. Målet var att ta en klassificerare baserad på svenska och engelska texter och utvidga den till danska, tyska och finska texter. Resultatet visar att det är möjligt att utöka en textklassificering med ordinbäddning eller genom att finjustera en flerspråkig transformatormodell, men träffsäkerheten varierar beroende på språk. Natural language processing Multilingual Transformer Word embeddings Text classification Språkteknologi Flerspråkig Transformator Ordinbäddningar Textklassificering Computer and Information Sciences Data- och informationsvetenskap
6	Deep Learning för klassificering av kundsupport-ärenden Jonsson, Max January 2020 (has links) Företag och organisationer som tillhandahåller kundsupport via e-post kommer över tid att samla på sig stora mängder textuella data. Tack vare kontinuerliga framsteg inom Machine Learning ökar ständigt möjligheterna att dra nytta av tidigare insamlat data för att effektivisera organisationens framtida supporthantering. Syftet med denna studie är att analysera och utvärdera hur Deep Learning kan användas för att automatisera processen att klassificera supportärenden. Studien baseras på ett svenskt företags domän där klassificeringarna sker inom företagets fördefinierade kategorier. För att bygga upp ett dataset extraherades supportärenden inkomna via e-post (par av rubrik och meddelande) från företagets supportdatabas, där samtliga ärenden tillhörde en av nio distinkta kategorier. Utvärderingen gjordes genom att analysera skillnaderna i systemets uppmätta precision då olika metoder för datastädning användes, samt då de neurala nätverken byggdes upp med olika arkitekturer. En avgränsning gjordes att endast undersöka olika typer av Convolutional Neural Networks (CNN) samt Recurrent Neural Networks (RNN) i form av både enkel- och dubbelriktade Long Short Time Memory (LSTM) celler. Resultaten från denna studie visar ingen ökning i precision för någon av de undersökta datastädningsmetoderna. Dock visar resultaten att en begränsning av den använda ordlistan heller inte genererar någon negativ effekt. En begränsning av ordlistan kan fortfarande vara användbar för att minimera andra effekter så som exempelvis träningstiden, och eventuellt även minska risken för överanpassning. Av de undersökta nätverksarkitekturerna presterade CNN bättre än RNN på det använda datasetet. Den mest gynnsamma nätverksarkitekturen var ett nätverk med en konvolution per pipeline som för två olika test-set genererade precisioner på 79,3 respektive 75,4 procent. Resultaten visar också att några kategorier är svårare för nätverket att klassificera än andra, eftersom dessa inte är tillräckligt distinkta från resterande kategorier i datasetet. / Companies and organizations providing customer support via email will over time grow a big corpus of text documents. With advances made in Machine Learning the possibilities to use this data to improve the customer support efficiency is steadily increasing. The aim of this study is to analyze and evaluate the use of Deep Learning methods for automizing the process of classifying support errands. This study is based on a Swedish company’s domain where the classification was made within the company’s predefined categories. A dataset was built by obtaining email support errands (subject and body pairs) from the company’s support database. The dataset consisted of data belonging to one of nine separate categories. The evaluation was done by analyzing the alteration in classification accuracy when using different methods for data cleaning and by using different network architectures. A delimitation was set to only examine the effects by using different combinations of Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in the shape of both unidirectional and bidirectional Long Short Time Memory (LSTM) cells. The results of this study show no increase in classification accuracy by any of the examined data cleaning methods. However, a feature reduction of the used vocabulary is proven to neither have any negative impact on the accuracy. A feature reduction might still be beneficial to minimize other side effects such as the time required to train a network, and possibly to help prevent overfitting. Among the examined network architectures CNN were proven to outperform RNN on the used dataset. The most accurate network architecture was a single convolutional network which on two different test sets reached classification rates of 79,3 and 75,4 percent respectively. The results also show some categories to be harder to classify than others, due to them not being distinct enough towards the rest of the categories in the dataset. Natural Language Processing Text Classification Convolutional Neural Network Long Short Time Memory Naturlig språkbehandling Textklassificering Convolutional Neural Network Long Short Time Memory Computer Sciences Datavetenskap (datalogi)
7	Klassificering av kvitton med hjälp av maskininlärning Enerstrand, Simon January 2019 (has links) Maskininlärning nyttjas inom fler och fler områden. Det har potential att ersätta många repetitiva arbetsuppgifter, eller åtminstone förenkla dem. Dokumenthantering inom ekonomisystem är ett område maskininlärning kan hjälpa till med. Det behövs ofta mycket manuell input i olika fält genom att avläsa fakturor eller kvitton. Målet med projektet är att skapa en applikation som nyttjar maskininlärning åt företaget Centsoft AB. Applikationen ska ta emot OCR-tolkad textmassa från en bild på ett kvitto och sedan, med hög säkerhet, kunna avgöra vilken kategori kvittot tillhör. Den här rapporten syftar till att visa utvecklingen av maskininlärningsmodellen i applikationen. Rapporten svarar på frågeställningen: ”Hur kan kvitton klassificeras med hjälp av maskininlärning?”.Undersökningsmetoden fallstudie och projektmetoden MoSCoW tillämpas i projektet. Projektet tar även hänsyn till åtagandetriangeln. Maskininlärningsramverk används för att utvärdera den upptränade modellen. Den tränade modellen klarar av att, med hög säkerhet, tolka kvitton den inte stött på tidigare. För att få en meningsfull tolkning måste kvitton ha i avsikt att tillhöra någon av de åtta tränade kategorierna.Valet av metoder passade bra till projektet för att besvara frågeställningen. Applikationen kan utvecklas vidare och implementeras i fakturahanteringssystemet. Genomförandet av projektet ger kunskap att arbeta med maskininlärningslösningar. Tekniken kan i framtiden appliceras på flera områden. / Machine learning is used in more and more areas. It has the potential to replace many repetitive tasks, or at least simplify them. Document management within financial systems is an area machine learning can help with. A lot of manual input is often needed in different fields by reading invoices or receipts. The goal of the project is to create an application that uses machine learning for the company Centsoft AB. The application should receive OCR-interpreted texts from an image of a receipt and then, with high certainty, be able to determine which category the receipt belongs to. This report aims to show the development of the machine learning model in the application. The report answers the question: "How can receipts be classified using machine learning?".The methodology case study and the research method MoSCoW will be applied during the project. The project also considers the triangle method described by Eklund. Machine learning frameworks are used to evaluate the trained model. The trained model can, with high certainty, interpret receipts it has not encountered before. In order to get a meaningful interpretation, receipts must have the intention of belonging to one of the eight trained categories.The choice of methods suited the project well to answer the question. The application can be further developed and be implemented in the invoice management system. The implementation of the project gives knowledge about how to work with machine learning solutions. In the future, the technology can be applied in several areas. Computer and Information Sciences Data- och informationsvetenskap
8	Automatic Classification of Conditions for Grants in Appropriation Directions of Government Agencies Wallerö, Emma January 2022 (has links) This study explores the possibilities of classifying language as governing or not. The ground premise is to examine how detecting and quantifying governing conditions from thousands of financial grants in appropriation directions can be performed automatically, as well as creating a data set to perform machine learning for this text classification task. In this study, automatic classification is performed along with an annotation process extracting and labelling data. Automatic classification can be performed by using a variety of data, methods and tasks. The classification task aims to mainly divide conditions into being governing of the conducting of the specific agency or not. The data consists of text from the specific chapter in the appropriation directions regarding financial grants. The text is split into sentences, keeping only sentences longer than 15 words. An iterative annotation process is then performed in order to receive labelled conditions, involving three expert annotators for the final data set, and laymen annotations for initial experiments. Given the data extracted from the annotation process, SVM, BiLSTM and KB-BERT classifiers are trained and evaluated. All models are evaluated using no context information, with bullet points as an exception, where a previous, generally descriptive sentence is included. Apart from this default input representation type, context regarding preceding sentence along with the target sentence, as well as adding specific agency to the target sentence are evaluated as alternative data representation types. The final inter-annotator agreement was not optimal with Cohen’s Kappa scores that can be interpreted as representing moderate agreement. By using majority vote for the test set, the non-optimal agreement was somewhat prevented for this specific set. The best performing model all input representation types considered was the KB-BERT using no context information, receiving an F1-score on 0.81 and an accuracy score on 0.89 on the test set. All models gave a better performance for sentences classed as governing, which might be partially due to the final annotated data sets being skewed. Possible future studies include further iterative annotation and working towards a clear and as objective definition of how a governing condition can be defined, as well as exploring the possibilities of using data augmentation to counteract the uneven distribution of classes in the final data sets. Text classification annotation SVM BiLSTM Transformers BERT Textklassificering annotering annotation SVM BiLSTM Transformers BERT
9	Multilabel text classification of public procurements using deep learning intent detection / Textklassificering av offentliga upphandlingar med djupa artificiella neuronnät och avsåtsdetektering Suta, Adin January 2019 (has links) Textual data is one of the most widespread forms of data and the amount of such data available in the world increases at a rapid rate. Text can be understood as either a sequence of characters or words, where the latter approach is the most common. With the breakthroughs within the area of applied artificial intelligence in recent years, more and more tasks are aided by automatic processing of text in various applications. The models introduced in the following sections rely on deep-learning sequence-processing in order to process and text to produce a regression algorithm for classification of what the text input refers to. We investigate and compare the performance of several model architectures along with different hyperparameters. The data set was provided by e-Avrop, a Swedish company which hosts a web platform for posting and bidding of public procurements. It consists of titles and descriptions of Swedish public procurements posted on the website of e-Avrop, along with the respective category/categories of each text. When the texts are described by several categories (multi label case) we suggest a deep learning sequence-processing regression algorithm, where a set of deep learning classifiers are used. Each model uses one of the several labels in the multi label case, along with the text input to produce a set of text - label observation pairs. The goal becomes to investigate whether these classifiers can carry out different levels of intent, an intent which should theoretically be imposed by the different training data sets used by each of the individual deep learning classifiers. / Data i form av text är en av de mest utbredda formerna av data och mängden tillgänglig textdata runt om i världen ökar i snabb takt. Text kan tolkas som en följd av bokstäver eller ord, där tolkning av text i form av ordföljder är absolut vanligast. Genombrott inom artificiell intelligens under de senaste åren har medfört att fler och fler arbetsuppgifter med koppling till text assisteras av automatisk textbearbetning. Modellerna som introduceras i denna uppsats är baserade på djupa artificiella neuronnät med sekventiell bearbetning av textdata, som med hjälp av regression förutspår tillhörande ämnesområde för den inmatade texten. Flera modeller och tillhörande hyperparametrar utreds och jämförs enligt prestanda. Datamängden som använts är tillhandahållet av e-Avrop, ett svenskt företag som erbjuder en webbtjänst för offentliggörande och budgivning av offentliga upphandlingar. Datamängden består av titlar, beskrivningar samt tillhörande ämneskategorier för offentliga upphandlingar inom Sverige, tagna från e-Avrops webtjänst. När texterna är märkta med ett flertal kategorier, föreslås en algoritm baserad på ett djupt artificiellt neuronnät med sekventiell bearbetning, där en mängd klassificeringsmodeller används. Varje sådan modell använder en av de märkta kategorierna tillsammans med den tillhörande texten, som skapar en mängd av text - kategori par. Målet är att utreda huruvida dessa klassificerare kan uppvisa olika former av uppsåt som teoretiskt sett borde vara medfört från de olika datamängderna modellerna mottagit. Natural language processing text classification deep learning applied mathematics recurrent neural network word embedding Maskininlärning textklassificering artificiella neruonnät tillämpad matematik Probability Theory and Statistics Sannolikhetsteori och statistik
10	ML enhanced interpretation of failed test result Pechetti, Hiranmayi January 2023 (has links) This master thesis addresses the problem of classifying test failures in Ericsson AB’s BAIT test framework, specifically distinguishing between environment faults and product faults. The project aims to automate the initial defect classification process, reducing manual work and facilitating faster debugging. The significance of this problem lies in the potential time and cost savings it offers to Ericsson and other companies utilizing similar test frameworks. By automating the classification of test failures, developers can quickly identify the root cause of an issue and take appropriate action, leading to improved efficiency and productivity. To solve this problem, the thesis employs machine learning techniques. A dataset of test logs is utilized to evaluate the performance of six classification models: logistic regression, support vector machines, k-nearest neighbors, naive Bayes, decision trees, and XGBoost. Precision and macro F1 scores are used as evaluation metrics to assess the models’ performance. The results demonstrate that all models perform well in classifying test failures, achieving high precision values and macro F1 scores. The decision tree and XGBoost models exhibit perfect precision scores for product faults, while the naive Bayes model achieves the highest macro F1 score. These findings highlight the effectiveness of machine learning in accurately distinguishing between environment faults and product faults within the Bait framework. Developers and organizations can benefit from the automated defect classification system, reducing manual effort and expediting the debugging process. The successful application of machine learning in this context opens up opportunities for further research and development in automated defect classification algorithms. / Detta examensarbete tar upp problemet med att klassificera testfel i Ericsson AB:s BAIT-testramverk, där man specifikt skiljer mellan miljöfel och produktfel. Projektet syftar till att automatisera den initiala defekten klassificeringsprocessen, vilket minskar manuellt arbete och underlättar snabbare felsökning. Betydelsen av detta problem ligger i de potentiella tids- och kostnadsbesparingarna det erbjuder till Ericsson och andra företag som använder liknande testramar. Förbi automatisera klassificeringen av testfel, kan utvecklare snabbt identifiera grundorsaken till ett problem och vidta lämpliga åtgärder, vilket leder till förbättrad effektivitet och produktivitet. För att lösa detta problem använder avhandlingen maskininlärningstekniker. A datauppsättning av testloggar används för att utvärdera prestandan för sex klassificeringar modeller: logistisk regression, stödvektormaskiner, k-närmaste grannar, naiva Bayes, beslutsträd och XGBoost. Precision och makro F1 poäng används som utvärderingsmått för att bedöma modellernas prestanda. Resultaten visar att alla modeller presterar bra i klassificeringstest misslyckanden, uppnå höga precisionsvärden och makro F1-poäng. Beslutet tree- och XGBoost-modeller uppvisar perfekta precision-spoäng för produktfel, medan den naiva Bayes-modellen uppnår högsta makro F1-poäng. Dessa resultat belyser effektiviteten av maskininlärning när det gäller att exakt särskilja mellan miljöfel och produktfel inom Bait-ramverket. Utvecklare och organisationer kan dra nytta av den automatiska defektklassificeringen system, vilket minskar manuell ansträngning och påskyndar felsöknings-processen. De framgångsrik tillämpning av maskininlärning i detta sammanhang öppnar möjligheter för vidare forskning och utveckling inom automatiserade defektklassificeringsalgoritmer. Data Parsing Machine Learning Log file Analysis Text Classification Supervised Classification Dataanalys maskininlärning loggfilsanalys textklassificering Övervakad klassificering Computer and Information Sciences Data- och informationsvetenskap

Search results