Global ETD Search

21	Machine Learning Algorithms to Predict Cost Account Codes in an ERP System : An Exploratory Case Study Wirdemo, Alexander January 2023
This study aimed to investigate how Machine Learning (ML) algorithms can be used to predict the cost account code to use when handling invoices in an Enterprise Resource Planning (ERP) system commonly found in the Swedish public sector. This involved determining which of the tested algorithms performs best and what criteria must be met for it to do so. Previous studies on ML and its use in invoice classification have focused on either the accounts payable side or the accounts receivable side of the balance sheet. The studies have used a variety of methods, some involving not only common ML algorithms such as Random forest, Naïve Bayes, Decision tree, Support Vector Machine, Logistic regression, Neural network or k-nearest Neighbor but also other classifiers such as rule classifiers and naïve classifiers. The general conclusion from previous studies is that several algorithms can classify invoices with a satisfactory accuracy score and that Random forest, Naïve Bayes and Neural network have shown the most promising results. The study was performed as an exploratory case study. The case company was a small municipality where the finance clerks handle received invoices through an ERP system. The accounting step of invoice handling involves selecting the proper cost account code before submitting the invoice for review and approval. The data used was invoice summaries holding the organization number, bankgiro, postgiro and account code used. The algorithms selected for the task were the supervised learning algorithms Random forest and Naïve Bayes and the instance-based algorithm k-Nearest Neighbor (k-NN). The findings indicated that ML could be used to predict which cost account code to use by providing a pre-filled suggestion when the clerk opens the invoice.
Among the algorithms tested, Random forest performed the best with 78% accuracy (Naïve Bayes and k-NN performed at 69% and 70% accuracy, respectively). One reason for this is Random forest’s ability to handle several input variables, generate an unbiased estimate of the generalization error, and give information about the relationship between the variables and the classification. However, a high level of support is needed for the algorithm to perform at its best; 335 occurrences is a guiding number in this case.
Artificial Intelligence Machine Learning ERP invoice automation RPA Random forest Naïve Bayes k-Nearest Neighbor Information Systems
22	Maskininlärning för dokumentklassificering av finansiella dokument med fokus på fakturor / Machine Learning for Document Classification of Financial Documents with Focus on Invoices Khalid Saeed, Nawar January 2022
Automated document classification is an essential technique that aims to process and manage documents in digital forms. Many companies strive for a text classification methodology that can solve a plethora of problems. One of these problems is classifying and organizing a massive amount of documents based on a set of predefined categories. This thesis aims to help Medius, a company that works with invoice workflow, to classify their documents into invoices and non-invoices. This has been accomplished by implementing and evaluating various machine learning classification methods in terms of their accuracy and efficiency for the task of financial document classification, where only invoices are of interest. Furthermore, the necessary pre-processing steps for achieving good performance are considered when evaluating the mentioned classification methods. In this study, two document representation methods, Term Frequency Inverse Document Frequency (TF-IDF) and Doc2Vec, were used to represent the documents as fixed-length vectors. The representation aims to reduce the complexity of the documents and make them easier to handle. In addition, three classification methods have been used to automate the document classification process for invoices. These methods were Logistic Regression, Multinomial Naïve Bayes and Support Vector Machine. The results from this thesis indicate that all classification methods that used TF-IDF to represent the documents as vectors give high performance and accuracy. The accuracy of all three classification methods is over 90%, which was the requirement for this study to be considered successful. Moreover, Logistic Regression appears to cope with this task most easily, since it classifies the documents more efficiently than the other methods. A test on real data flowing into Medius’ invoice workflow shows that Logistic Regression is able to correctly classify almost 96% of the documents. In conclusion, Logistic Regression together with TF-IDF is determined to be the most appropriate of the tested methods overall. Doc2Vec, by contrast, fails to provide good results because the data set is not adapted and sufficient for the method to work well.
Document classification Text classification Invoices NLP TF-IDF Doc2vec Machine Learning Logistic Regression Multinomial Naïve Bayes Support Vector Machine Computer Sciences
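As a rough illustration of the TF-IDF representation mentioned in the abstract above, the following Python sketch weights each term by its frequency within a document and its rarity across documents. It is not the thesis's implementation: the function name `tfidf`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting and the three sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one sparse {term: weight} dict per document.

    Assumed weighting (illustrative only): relative term frequency
    multiplied by log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample documents, not data from the thesis.
docs = ["invoice total due", "meeting notes", "invoice number"]
vectors = tfidf(docs)
```

In this sketch a term such as "total", which occurs in only one document, receives a larger weight than "invoice", which occurs in two, and a term present in every document receives a weight of zero. A classifier such as Logistic Regression would then be trained on these weights after mapping them onto a fixed vocabulary order.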
23	Bimodal adaptive hypermedia and interactive multimedia : a web-based learning environment based on Kolb's theory of learning style Salehian, Bahram January 2003
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
Intelligent tutoring systems Adaptive hypermedia Interactive multimedia Learning by doing "SoftPhone" "Naïve Bayes Classifier" Virtual environment
24	Sentiment Analysis of Twitter Data Using Machine Learning and Deep Learning Methods Manda, Kundan Reddy January 2019
Background: Twitter, Facebook, WordPress, etc. act as major sources of information exchange in today's world. Tweets on Twitter mainly express public opinion on a product, event or topic and thus contain large volumes of unprocessed data. Synthesis and analysis of this data is very important, but difficult due to the size of the dataset. Sentiment analysis is chosen as an apt method to analyse this data, as it does not go through all the tweets but rather relates to the sentiments of these tweets in terms of positive, negative and neutral opinions. Sentiment analysis is normally performed in three ways: the machine learning-based approach, the sentiment lexicon-based approach, and the hybrid approach. The machine learning-based approach uses machine learning and deep learning algorithms to analyse the data, whereas the sentiment lexicon-based approach uses lexicons containing vocabularies of positive and negative words. The hybrid approach combines the machine learning and sentiment lexicon approaches for classification. Objectives: The primary objectives of this research are (1) to identify the algorithms and metrics for evaluating the performance of machine learning classifiers, and (2) to compare the metrics of the identified algorithms, taking into account how the size of the dataset affects the performance of the best-suited algorithm for sentiment analysis. Method: The method chosen to address the research questions is an experiment, through which the identified algorithms are evaluated with the selected metrics. Results: The identified machine learning algorithms are Naïve Bayes, Random Forest and XGBoost, and the deep learning algorithm is CNN-LSTM. The algorithms are evaluated and compared with respect to the metrics precision, accuracy, F1 score and recall. The CNN-LSTM model is best suited for sentiment analysis on Twitter data with respect to the selected size of the dataset. Conclusion: Through the analysis of the results, the aim of this research, identifying the best-suited algorithm for sentiment analysis on Twitter data with respect to the selected dataset, is achieved. The CNN-LSTM model has the highest accuracy, 88%, among the selected algorithms for the sentiment analysis of Twitter data with respect to the selected dataset.
Machine Learning Sentiment Analysis Twitter data Deep Learning Naïve Bayes Twitter Sentiment Analysis Computer Sciences
25	Approximations of Bayes Classifiers for Statistical Learning of Clusters Ekdahl, Magnus January 2006
It is rarely possible to use an optimal classifier. Often the classifier used for a specific problem is an approximation of the optimal classifier. Methods are presented for evaluating the performance of an approximation in the model class of Bayesian Networks. Specifically, for the approximation of class conditional independence, a bound for the performance is sharpened.
The class conditional independence approximation is connected to the minimum description length principle (MDL), which is connected to Jeffreys’ prior through commonly used assumptions. One algorithm for unsupervised classification is presented and compared against other unsupervised classifiers on three data sets.
Report code: LiU-TEK-LIC 2006:11.
Pattern Recognition Stochastic Complexity Naïve Bayes Bayesian Network Classification Clustering Chow-Liu trees Mathematical statistics
26	Uso potencial de ferramentas de classificação de texto como assinaturas de comportamentos suicidas : um estudo de prova de conceito usando os escritos pessoais de Virginia Woolf / Potential use of text classification tools as signatures of suicidal behavior : a proof-of-concept study using the personal writings of Virginia Woolf Berni, Gabriela de Ávila January 2018
The present study analyzes the content of Virginia Woolf’s diaries and letters to assess whether a text classification algorithm could identify a written pattern related to the two months prior to Virginia Woolf’s suicide. This is a text classification study. We compared 46 text entries from the two months prior to Virginia Woolf’s suicide with 54 texts randomly selected from Virginia Woolf’s work during another period of her life. Letters and diaries were included, while books, novels, short stories, and article fragments were excluded. The data was analyzed using a Naïve-Bayes machine-learning algorithm. The model showed a balanced accuracy of 80.45%, sensitivity of 69%, and specificity of 91%. The Kappa statistic was 0.6, indicating good agreement, and the p value of the model was 0.003. The Area Under the ROC curve was 0.80. The present study was the first to analyze the feasibility of a machine learning model coupled with text data in order to identify written patterns associated with suicidal behavior in the diaries and letters of a novelist. Our text signature was able to identify the period of two months preceding suicide with high accuracy.
Woolf, Virginia, 1882-1941 Manuscripts Bipolar disorder Machine learning Suicide Naïve-Bayes
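The relationship between the sensitivity, specificity and balanced accuracy quoted in the abstract above can be sketched in Python as follows. This is an illustrative sketch only, not the authors' analysis code: the function names and the confusion-matrix argument names are assumptions, and no confusion-matrix counts are given in the abstract.

```python
def sensitivity(tp, fn):
    """True positive rate: share of positive cases correctly identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of negative cases correctly identified."""
    return tn / (tn + fp)


def balanced_accuracy(sens, spec):
    """Arithmetic mean of sensitivity and specificity."""
    return (sens + spec) / 2


# Rounded figures quoted in the abstract: sensitivity 69%, specificity 91%.
from_rounded = balanced_accuracy(0.69, 0.91)
```

Averaging the rounded figures gives approximately 0.80, which is in line with the reported balanced accuracy of 80.45%; the small difference presumably arises because the reported value was computed from unrounded sensitivity and specificity.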
28	Machine learning in logistics : Increasing the performance of machine learning algorithms on two specific logistic problems / Maskininlärning i logistik : Öka prestandan av maskininlärningsalgoritmer på två specifika logistikproblem. Lind Nilsson, Rasmus January 2017
Data Ductus, a multinational IT-consulting company, wants to develop an AI that monitors a logistics system and looks for errors. Once sufficiently trained, this AI will suggest a correction or automatically correct issues that arise. This project presents how one works with machine learning problems and provides a deeper insight into how cross-validation and regularisation, among other techniques, are used to improve the performance of machine learning algorithms on the defined problem. Three techniques are tested and evaluated in our logistics system on three different machine learning algorithms, namely Naïve Bayes, Logistic Regression and Random Forest. The evaluation of the algorithms leads us to conclude that Random Forest, using cross-validated parameters, gives the best performance on our specific problems, with the other two falling behind in each tested category. It became clear to us that cross-validation is a simple yet powerful tool for increasing the performance of machine learning algorithms.
Machine learning confusion matrix performance random forest naïve bayes logistic regression cross-validation regularisation Computer Sciences
29	Evaluation of selected data mining algorithms implemented in Medical Decision Support Systems Aftarczuk, Kamila January 2007
The goal of this master’s thesis is to identify and evaluate data mining algorithms which are commonly implemented in modern Medical Decision Support Systems (MDSS). These systems are used in various healthcare units all over the world. These institutions store large amounts of medical data, which may contain relevant medical information hidden in various patterns buried among the records. Within the research, several popular MDSSs are analyzed in order to determine the most common data mining algorithms they use. Three algorithms have been identified: Naïve Bayes, Multilayer Perceptron and C4.5. Prior to the analyses themselves, the algorithms are calibrated: several configurations are tested in order to determine the best settings for the algorithms. Afterwards, a final comparison ranks the algorithms with respect to their performance. The evaluation is based on a set of performance metrics. The analyses are conducted in WEKA on five UCI medical datasets: breast cancer, hepatitis, heart disease, dermatology disease, diabetes. The analyses have shown that it is very difficult to name a single data mining algorithm as the most suitable for medical data. The results obtained for the algorithms were very similar. However, the final evaluation of the outcomes singled out Naïve Bayes as the best classifier for the given domain. It was followed by the Multilayer Perceptron and C4.5.
Naïve Bayes Multilayer Perceptron C4.5 medical data mining medical decision support Computer Sciences Software Engineering
