Global ETD Search

31	An implementation analysis of a machine learning algorithm on eye-tracking data in order to detect early signs of dementia Lindberg, Jennifer, Siren, Henrik January 2020 This study investigates whether a machine learning algorithm can be applied to eye-tracking data to detect early signs of Alzheimer’s disease, which is a type of dementia. Early signs of Alzheimer’s are characterized by mild cognitive impairment, and patients with mild cognitive impairment fixate more when reading. The eye-tracking data was gathered in trials, conducted by specialist doctors at a hospital, in which 24 patients read a text. The data is pre-processed by extracting features such as fixations and the difficulty level of specific passages in the text. The features are then fed to a naïve Bayes machine learning algorithm, evaluated with so-called leave-one-out cross-validation, under two separate conditions: using both fixation features and text-difficulty features, and using fixation features only. Both conditions achieved the same result, an accuracy of 64%. The conclusion drawn was that, even though the number of data samples (patients) was small, the machine learning algorithm could to some extent predict whether a patient was at an early stage of Alzheimer’s disease, based on eye-tracking data. Additionally, the implementation is analyzed through a stakeholder analysis, a SWOT analysis and from an innovation perspective. / Denna studie syftar till att undersöka huruvida det är möjligt att använda en maskininlärningsalgoritm på eye-tracking data för att upptäcka tidiga tecken på Alzheimer’s sjukdom, vilket är en typ av demens. Tidiga tecken på Alzheimer’s karaktäriseras av mild kognitiv nedsättning. Vidare fixerar patienter med en mild kognitiv nedsättning mer när de läser. 
Eye-tracking data samlas in i undersökningar genomförda av specialistläkare på ett sjukhus, där 24 patienter läser en text. Därefter förbehandlas datan genom att extrahera olika features, såsom fixeringar och svårighetsnivåer på specifika avsnitt i texten. Efter detta appliceras features i en naïve Bayes maskininlärningsalgoritm som implementerar så kallad leave-one- out cross validation under två separata fall; användande av enbart fixerings features samt användandet av både fixerings features och features för svårighetsgrad för olika avsnitt i texten. Slutligen erhölls samma resultat i båda fallen – med en accuracy på 64%. Därav drogs slutsatsen att även om mängden data (antalet patienter) var liten, kunde maskininlärningsalgoritmen till viss del förutse om en patient var i ett tidigt stadie av Alzheimer’s sjukdom eller inte, baserat på eye-tracking data. Dessutom analyseras implementationen vidare med användning av en intressentanalys, en SWOT-analys och från ett innovationsperspektiv. Dementia Eye-tracking Machine Learning naïve Bayes. Computer and Information Sciences Data- och informationsvetenskap
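The evaluation scheme named in record 31, naïve Bayes with leave-one-out cross-validation, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the Gaussian likelihood, the function names and the tiny two-feature dataset are all assumptions made for the example, and the numbers are invented rather than taken from the study.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))

def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)

# Invented toy features (e.g. fixation count, mean fixation duration), not study data.
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2],
     [5.0, 6.0], [5.2, 5.9], [4.9, 6.1], [5.1, 6.2]]
y = ["control"] * 4 + ["mci"] * 4
accuracy = leave_one_out_accuracy(X, y)
```

With only a couple of dozen samples, as in the study, each fold trains on all but one patient, which is why leave-one-out is a common choice for very small datasets.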
32	Uma investigação empírica e comparativa da aplicação de RNAs ao problema de mineração de opiniões e análise de sentimentos Moraes, Rodrigo de 26 March 2013 (has links) Submitted by Silvana Teresinha Dornelles Studzinski (sstudzinski) on 2015-05-04T17:25:43Z No. of bitstreams: 1 Rodrigo Morais.pdf: 5083865 bytes, checksum: 69563cc7178422ac20ff08fe38ee97de (MD5) / Made available in DSpace on 2015-05-04T17:25:43Z (GMT). No. of bitstreams: 1 Rodrigo Morais.pdf: 5083865 bytes, checksum: 69563cc7178422ac20ff08fe38ee97de (MD5) Previous issue date: 2013 / Nenhuma / A área de Mineração de Opiniões e Análise de Sentimentos surgiu da necessidade de processamento automatizado de informações textuais referentes a opiniões postadas na web. Como principal motivação está o constante crescimento do volume desse tipo de informação, proporcionado pelas tecnologia trazidas pela Web 2.0, que torna inviável o acompanhamento e análise dessas opiniões úteis tanto para usuários com pretensão de compra de novos produtos quanto para empresas para a identificação de demanda de mercado. Atualmente, a maioria dos estudos em Mineração de Opiniões e Análise de Sentimentos que fazem o uso de mineração de dados se voltam para o desenvolvimentos de técnicas que procuram uma melhor representação do conhecimento e acabam utilizando técnicas de classificação comumente aplicadas, não explorando outras que apresentam bons resultados em outros problemas. Sendo assim, este trabalho tem como objetivo uma investigação empírica e comparativa da aplicação do modelo clássico de Redes Neurais Artificiais (RNAs), o multilayer perceptron , no problema de Mineração de Opiniões e Análise de Sentimentos. Para isso, bases de dados de opiniões são definidas e técnicas de representação de conhecimento textual são aplicadas sobre essas objetivando uma igual representação dos textos para os classificadores através de unigramas. 
A partir dessa reresentação, os classificadores Support Vector Machines (SVM), Naïve Bayes (NB) e RNAs são aplicados considerandos três diferentes contextos de base de dados: (i) bases de dados balanceadas, (ii) bases com diferentes níveis de desbalanceamento e (iii) bases em que a técnica para o tratamento do desbalanceamento undersampling randômico é aplicada. A investigação do contexto desbalanceado e de outros originados dele se mostra relevante uma vez que bases de opiniões disponíveis na web normalmente apresentam mais opiniões positivas do que negativas. Para a avaliação dos classificadores são utilizadas métricas tanto para a mensuração de desempenho de classificação quanto para a de tempo de execução. Os resultados obtidos sobre o contexto balanceado indicam que as RNAs conseguem superar significativamente os resultados dos demais classificadores e, apesar de apresentarem um grande custo computacional para treinamento, proporcionam tempos de classificação significantemente inferiores aos do classificador que apresentou os resultados de classificação mais próximos aos dos resultados das RNAs. Já para o contexto desbalanceado, as RNAs se mostram sensíveis ao aumento de ruído na representação dos dados e ao aumento do desbalanceamento, se destacando nestes experimentos, o classificador NB. Com a aplicação de undersampling as RNAs conseguem ser equivalentes aos demais classificadores apresentando resultados competitivos. Porém, podem não ser o classificador mais adequado de se adotar nesse contexto quando considerados os tempos de treinamento e classificação, e também a diferença pouco expressiva de acerto de classificação. / The area of Opinion Mining and Sentiment Analysis emerges from the need for automated processing of textual information about reviews posted in the web. 
The main motivation of this area is the constant growth in the volume of such information, enabled by the technologies brought by Web 2.0, which makes it infeasible to monitor and analyze these reviews manually, even though they are useful both to users who want to purchase new products and to companies seeking to identify market demand. Currently, most studies in Opinion Mining and Sentiment Analysis that make use of data mining focus on developing techniques for better knowledge representation and end up using commonly applied classification techniques, without exploring other classifiers that work well in other problems. Thus, this work presents a comparative empirical investigation of the application of the classical Artificial Neural Network (ANN) model, the multilayer perceptron, to the Opinion Mining and Sentiment Analysis problem. For this, review datasets are defined and textual knowledge representation techniques are applied to them, aiming at an identical unigram-based representation of the texts for all classifiers. From this representation, the classifiers Support Vector Machines (SVM), Naïve Bayes (NB) and ANN are applied in three data contexts: (i) balanced datasets, (ii) datasets with different levels of imbalance and (iii) datasets to which random undersampling is applied to handle the imbalance. Investigating the unbalanced context, and others derived from it, is relevant because opinion datasets available on the web ordinarily contain more positive opinions than negative ones. The classifiers are evaluated with metrics for both classification performance and run time. The results obtained in the balanced context indicate that ANN significantly outperformed the other classifiers and that, although it has a large computational cost in the training phase, the ANN classifier provides classification times (real-time) significantly lower than those of the classifier whose results came closest to those of ANN. 
In the unbalanced context, ANN proved sensitive to increased noise in the data representation and to increasing imbalance, while the NB classifier stood out. With undersampling applied, the ANN classifier is equivalent to the other classifiers, attaining competitive results. However, it may not be the most appropriate classifier for this context when the training and classification times and its small advantage in classification accuracy are considered. Aprendizado de máquina Classificadores Redes neurais artificiais Análise de sentimentos Mineração de opiniões Support vector machines Naïve bayes Sentiment analysis Opinion mining Machine learning Classifiers Artificial neural networks Support vector machines Naïve bayes
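Random undersampling, the imbalance treatment mentioned in record 32, can be illustrated with a short sketch. The function name, the fixed seed and the placeholder reviews are assumptions made for the example; the thesis does not state how its undersampling was implemented.

```python
import random
from collections import Counter, defaultdict

def random_undersample(samples, labels, seed=0):
    """Randomly drop examples so that every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        # Sample without replacement so no example is duplicated.
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced

# Placeholder reviews with an 8:2 positive-to-negative imbalance (invented data).
reviews = [f"review {i}" for i in range(10)]
labels = ["positive"] * 8 + ["negative"] * 2
balanced = random_undersample(reviews, labels)
class_counts = Counter(label for _, label in balanced)
```

The trade-off is that balance is bought by discarding majority-class examples, so the classifier trains on less data.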
33	Apport des Systèmes Multi-Agent et de la logique floue pour l'assistance au tuteur dans une communauté d'apprentissage en ligne / Contribution of Multi-Agent Systems and Fuzzy logic to support tutors in Learning Communities Chaabi, Youness 11 July 2016 La place importante du tutorat dans la réussite d'un dispositif de formation en ligne a ouvert un nouvel axe de recherche dans le domaine des EIAH (Environnements Informatiques pour l'Apprentissage Humain). Nos travaux se situent plus particulièrement dans le champ de recherches des ACAO. Dans un contexte collaboratif, le tutorat et les outils « d'awareness » constituent des solutions admises pour faire face à l'isolement qui très souvent, mène à l'abandon de l'apprenant. Ainsi, du fait des difficultés rencontrées par le tuteur pour assurer un encadrement et un suivi appropriés à partir des traces de communication (en quantités conséquentes) laissées par les apprenants, nous proposons une approche multi-agents pour analyser les conversations textuelles asynchrones entre apprenants. Ces interactions sont révélatrices de comportements sociaux-animateur, indépendant, etc... qu'il nous paraît important de pouvoir repérer lors d'une pédagogie de projet pour permettre aux apprenants de situer leurs travaux par rapport aux autres apprenants et situer leur groupe par rapport aux autres groupes d'une part, et d'autre part permettre au tuteur d'accompagner les apprenants dans leur processus d'apprentissage, repérer et soutenir les individus en difficulté pour leur éviter l'abandon. Ces indicateurs seront déduits à partir des grands volumes d'échanges textuels entre apprenants. L'approche a été ensuite testée sur une situation réelle, qui a montré une parfaite concordance entre les résultats observés par des tuteurs humains et ceux déterminés automatiquement par notre système. / The growing importance of online training has put emphasis on the role of remote tutoring. 
A whole new area of research, dedicated to environments for human learning (EHL), is emerging. Our work belongs to this field and focuses more specifically on the monitoring of learners. Instrumenting and observing learners' activities by exploiting interaction traces in the EHL, and developing indicators from them, can help tutors monitor learners' activities and support them in their collaborative learning process. Indeed, in a learning situation, the teacher needs to observe the behavior of learners to form an idea of their involvement, preferences and learning styles, and so adapt the proposed activities. As part of the automatic analysis of collaborative learners' activities, we describe a multi-agent approach for supporting learning activities in a Virtual Learning Environment context. To assist teachers who monitor learning processes, viewed as a specific type of collaboration, the proposed system estimates a behavioral (sociological) profile for each student. This estimate is based on automatic analysis of the students' asynchronous textual conversations. The resulting profiles are presented to the teacher and may assist the teacher during tutoring tasks. The system was tested with students of the "software quality" master's program at Ibn Tofail University. The results obtained show that the proposed approach is effective and gives satisfactory results. Apprentissage collectif Analyse de texte Acte langage Système Multi-Agents Naïve bayésienne Profils sociaux Logique floue CSCL Pédagogie de projet Collaborative Learning Conversation Analysis Fuzzy Logic Multi-Agent System Naïve Bayes Social Profiles 006
34	Categorization of Swedish e-mails using Supervised Machine Learning / Kategorisering av svenska e-postmeddelanden med användning av övervakad maskininlärning Mann, Anna, Höft, Olivia January 2021 Society today is becoming more digitalized, and e-mail is a common means of communication. Currently, the company Auranest has a filtering method for categorizing e-mails, but the method is a few years old. The filter identifies e-mails that are valuable to jobseekers, through which employers can make contact. The company wants to know whether the categorization can be performed with a different method and improved. The degree project investigates whether the categorization can be performed with higher accuracy using machine learning. Three supervised machine learning algorithms, Naïve Bayes, Support Vector Machine (SVM), and Decision Tree, have been examined, and the algorithm with the highest results has been compared with Auranest's existing filter. Accuracy, Precision, Recall, and F1 score have been used to determine which machine learning algorithm achieved the highest results, both among the algorithms and in comparison with Auranest's filter. The results showed that the supervised machine learning algorithm SVM achieved the best results in all metrics. The comparison between Auranest's existing filter and SVM showed that SVM performed better in all calculated metrics, with an accuracy of 99.5% for SVM and 93.03% for Auranest’s filter. Accuracy was the only metric on which the two gave similar results; for the other metrics there was a noticeable difference. / Dagens samhälle blir alltmer digitaliserat och ett vanligt kommunikationssätt är att skicka e-postmeddelanden. I dagsläget har företaget Auranest ett filter för att kategorisera e-postmeddelanden men filtret är några år gammalt. Användningsområdet för filtret är att sortera ut värdefulla e-postmeddelanden för arbetssökande, där kontakt kan ske från arbetsgivare. 
Företaget vill veta ifall kategoriseringen kan göras med en annan metod samt förbättras. Målet med examensarbetet är att undersöka ifall filtreringen kan göras med högre träffsäkerhet med hjälp av maskininlärning. Tre övervakade maskininlärningsalgoritmer, Naïve Bayes, Support Vector Machine (SVM) och Decision Tree, har granskats och algoritmen med de högsta resultaten har jämförts med Auranests befintliga filter. Träffsäkerhet, precision, känslighet och F1-poäng har använts för att avgöra vilken maskininlärningsalgoritm som gav högst resultat sinsemellan samt i jämförelse med Auranests filter. Resultatet påvisade att den övervakade maskininlärningsmetoden SVM åstadkom de främsta resultaten i samtliga mätvärden. Jämförelsen mellan Auranests befintliga filter och SVM visade att SVM presterade bättre i alla kalkylerade mätvärden, där träffsäkerheten visade 99,5% för SVM och 93,03% för Auranests filter. De jämförande resultaten visade att träffsäkerheten var den enda faktorn som gav liknande resultat. För de övriga mätvärdena var det en märkbar skillnad. Classification categorization e-mails preprocessing TF-IDF machine learning supervised learning Naïve Bayes Support Vector Machine Decision Tree Klassificering kategorisering e-postmeddelanden förbehandling av data TF-IDF maskininlärning övervakad inlärning Naïve Bayes Support Vector Machine Decision Tree Computer Sciences Datavetenskap (datalogi)
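The four metrics used in record 34 (accuracy, precision, recall and F1 score) follow directly from the counts of true and false positives and negatives. The sketch below shows the standard definitions for a binary task; the function name and the label lists are invented for illustration and are unrelated to Auranest's data.

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Invented labels: "job" marks an e-mail considered valuable to a jobseeker.
y_true = ["job", "job", "job", "other", "other", "other", "other", "other"]
y_pred = ["job", "job", "other", "job", "other", "other", "other", "other"]
metrics = binary_metrics(y_true, y_pred, positive="job")
```

Because accuracy also counts true negatives, it can look similar for two filters even when precision and recall on the minority class differ noticeably, which is consistent with the pattern the abstract reports.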
35	Ärendehantering genom maskininlärning Bennheden, Daniel January 2023 Det här examensarbetet undersöker hur artificiell intelligens kan användas för att automatiskt kategorisera felanmälan som behandlas i ett ärendehanteringssystem genom att använda maskininlärning och tekniker som text mining. Studien utgår från Design Science Research Methodology och Peffers sex steg för designmetodologi som utöver design även berör kravställning och utvärdering av funktion. Maskininlärningsmodellerna som tagits fram tränades på historiska data från ärendehanteringssystem Infracontrol Online med fyra typer av olika algoritmer, Naive Bayes, Support Vector Machine, Neural Network och Random Forest. En webapplikation togs fram för att demonstrera hur en av de maskininlärningsmodeller som tränats fungerar och kan användas för att kategorisera text. Olika användare av systemet har därefter haft möjlighet att testa funktionen och utvärdera hur den fungerar genom att markera när kategoriseringen av textprompter träffar rätt respektive fel. Resultatet visar på att det är möjligt att lösa uppgiften med hjälp av maskininlärning. En avgörande del av utvecklingsarbetet för att göra modellen användbar var urvalet av data som användes för att träna modellen. Olika kunder som använder systemet, använder det på olika sätt, vilket gjorde det fördelaktigt att separera dem och träna modeller för olika kunder individuellt. En källa till inkonsistenta resultat är hur organisationer förändrar sina processer och ärendehantering över tid och problemet hanterades genom att begränsa hur långt tillbaka i tiden modellen hämtar data för träning. Dessa två strategier för att hantera problem har nackdelen att den mängd historiska data som finns tillgänglig att träna modellen på minskar, men resultaten visar inte någon tydlig nackdel för de maskininlärningsmodeller som tränats på mindre datamängder utan även de har en godtagbar träffsäkerhet. 
/ This thesis investigates how artificial intelligence can be used to automatically categorize fault reports that are processed in a case management system by using machine learning and techniques such as text mining. The study is based on Design Science Research Methodology and Peffers' six steps of design methodology, which in addition to design of an artifact concerns requirements and evaluation. The machine learning models that were developed were trained on historical data from the case management system Infracontrol Online, using four types of algorithms, Naive Bayes, Support Vector Machine, Neural Network, and Random Forest. A web application was developed to demonstrate how one of the machine learning models trained works and can be used to categorize text. Regular users of the system have then had the opportunity to test the performance of the model and evaluate how it works by marking where it categorizes text prompts correctly. The results show that it is possible to solve the task using machine learning. A crucial part of the development was the selection of data used to train the model. Different customers using the system use it in different ways, which made it advantageous to separate them and train models for different customers independently. Another source of inconsistent results is how organizations change their processes and thus case management over time. This issue was addressed by limiting how far back in time the model retrieves data for training. The two strategies for solving the issues mentioned have the disadvantage that the amount of historical data available for training decreases, but the results do not show any clear disadvantage for the machine learning models trained on smaller data sets. 
They perform well and tests show an acceptable level of accuracy for their predictions. Artificial Intelligence machine learning case management RPA Random forest Naïve Bayes support vector machine neural network Artificiell Intelligens maskinlärning ärendehantering RPA Random forest Naïve Bayes support vector machine neurala nätverk. Information Systems
36	Machine Learning Algorithms to Predict Cost Account Codes in an ERP System : An Exploratory Case Study Wirdemo, Alexander January 2023 This study aimed to investigate how Machine Learning (ML) algorithms can be used to predict the cost account code to be used when handling invoices in an Enterprise Resource Planning (ERP) system commonly found in the Swedish public sector. This involved determining which of the tested algorithms performs best and what criteria need to be met for it to do so. Previous studies on ML and its use in invoice classification have focused on either the accounts payable side or the accounts receivable side of the balance sheet. The studies have used a variety of methods, some involving not only common ML algorithms such as Random forest, Naïve Bayes, Decision tree, Support Vector Machine, Logistic regression, Neural network or k-nearest Neighbor but also other classifiers such as rule classifiers and naïve classifiers. The general conclusion from previous studies is that several algorithms can classify invoices with a satisfactory accuracy score and that Random forest, Naïve Bayes and Neural network have shown the most promising results. The study was performed as an exploratory case study. The case organization was a small municipality where the finance clerks handle received invoices through an ERP system. The accounting step of invoice handling involves selecting the proper cost account code before submitting the invoice for review and approval. The data used was invoice summaries holding the organization number, bankgiro, postgiro and account code used. The algorithms selected for the task were the supervised learning algorithms Random forest and Naïve Bayes and the instance-based algorithm k-Nearest Neighbor (k-NN). The findings indicated that ML could be used to predict which cost account code to use by providing a pre-filled suggestion when the clerk opens the invoice. 
Among the algorithms tested, Random forest performed the best with 78% accuracy (Naïve Bayes and k-NN performed at 69% and 70% accuracy, respectively). One reason for this is Random forest’s ability to handle several input variables, generate an unbiased estimate of the generalization error, and its ability to give information about the relationship between the variables and classification. However, a high level of support is needed in order to get the algorithm to perform at its best, where 335 occurrences is a guiding number in this case. / Syftet med denna studie var att undersöka hur Machine Learning (ML) algoritmer kan användas för att förutsäga vilken kontokod som ska användas vid hantering av fakturor i ett affärssystem som är vanligt förekommande i svensk offentlig sektor. Detta innebar att undersöka vilken av de testade algoritmerna som presterar bäst och vilka kriterier som måste uppfyllas för att prestera bäst. Tidigare studier om ML och dess användning vid fakturaklassificering har fokuserat på antingen balansräkningens leverantörsreskontra (leverantörsskulder) eller kundreskontrasidan (kundfordringar) i balansräkningen. Studierna har använt olika metoder, några involverar inte bara vanliga ML-algoritmer som Random forest, Naive Bayes, beslutsträd, Support Vector Machine, Logistisk regression, Neuralt nätverk eller k-nearest Neighbour, utan även andra klassificerare som regelklassificerare och naiva klassificerare. Den generella slutsatsen från tidigare studier är att det finns flera algoritmer som kan klassificera fakturor med en tillfredsställande noggrannhet, och att Random forest, Naive Bayes och neurala nätverk har visat de mest lovande resultaten. Studien utfördes som en explorativ fallstudie. Fallföretaget var en mindre kommun där ekonomiassistenter hanterar inkommande fakturor genom ett affärssystem. Bokföringssteget för fakturahantering innebär att användaren väljer rätt kostnadskontokod innan fakturan skickas för granskning och godkännande. 
Uppgifterna som användes var fakturasammandrag med organisationsnummer, bankgiro, postgiro och kontokod. Algoritmerna som valdes för uppgiften var de övervakade inlärningsalgoritmerna Random forest och Naive Bayes och den instansbaserade algoritmen k-Nearest Neighbour. Resultaten tyder på att ML skulle kunna användas för att förutsäga vilken kostnadskod som ska användas genom att ge ett förifyllt förslag när expediten öppnar fakturan. Bland de testade algoritmerna presterade Random forest bäst med 78 % noggrannhet (Naïve Bayes och k-Nearest Neighbour presterade med 69 % respektive 70 % noggrannhet). En förklaring till detta är Random forests förmåga att hantera flera indatavariabler, generera en opartisk skattning av generaliseringsfelet och dess förmåga att ge information om sambandet mellan variablerna och klassificeringen. Det krävs dock en högt antal dataobservationer för att få algoritmen att prestera som bäst, där 335 förekomster är ett minimum i detta fall. Artificial Intelligence Machine Learning ERP invoice automation RPA Random forest Naïve Bayes k-Nearest Neighbor Artificiell Intelligens maskinlärning ERP fakturaautomation RPA Random forest Naïve Bayes k-Nearest Neighbor Information Systems
37	Maskininlärning för dokumentklassificering av finansiella dokument med fokus på fakturor / Machine Learning for Document Classification of Financial Documents with Focus on Invoices Khalid Saeed, Nawar January 2022 Automatiserad dokumentklassificering är en process eller metod som syftar till att bearbeta och hantera dokument i digitala former. Många företag strävar efter en textklassificeringsmetodik som kan lösa olika problem. Ett av dessa problem är att klassificera och organisera ett stort antal dokument baserat på en uppsättning av fördefinierade kategorier. Detta examensarbete syftar till att hjälpa Medius, vilket är ett företag som arbetar med fakturaarbetsflöde, att klassificera dokumenten som behandlas i deras fakturaarbetsflöde till fakturor och icke-fakturor. Detta har åstadkommits genom att implementera och utvärdera olika klassificeringsmetoder för maskininlärning med avseende på deras noggrannhet och effektivitet för att klassificera finansiella dokument, där endast fakturor är av intresse. I denna avhandling har två dokumentrepresentationsmetoder "Term Frequency Inverse Document Frequency (TF-IDF) och Doc2Vec" använts för att representera dokumenten som vektorer. Representationen syftar till att minska komplexiteten i dokumenten och göra de lättare att hantera. Dessutom har tre klassificeringsmetoder använts för att automatisera dokumentklassificeringsprocessen för fakturor. Dessa metoder var Logistic Regression, Multinomial Naïve Bayes och Support Vector Machine. Resultaten från denna avhandling visade att alla klassificeringsmetoder som använde TF-IDF, för att representera dokumenten som vektorer, gav goda resultat i form av prestanda och noggrannhet. Noggrannheten för alla tre klassificeringsmetoderna var över 90%, vilket var kravet för att denna studie skulle anses vara lyckad. Dessutom verkade Logistic Regression att ha det lättare att klassificera dokumenten jämfört med andra metoder. 
Ett test på riktiga data "dokument" som flödar in i Medius fakturaarbetsflöde visade att Logistic Regression lyckades att korrekt klassificera nästan 96% av dokumenten. Avslutningsvis, fastställdes Logistic Regression tillsammans med TF-IDF som de övergripande och mest lämpliga metoderna att klara av problemet om dokumentklassificering. Dessvärre, kunde Doc2Vec inte ge ett bra resultat p.g.a. datamängden inte var anpassad och tillräcklig för att metoden skulle fungera bra. / Automated document classification is an essential technique that aims to process and manage documents in digital forms. Many companies strive for a text classification methodology that can solve a plethora of problems. One of these problems is classifying and organizing a massive amount of documents based on a set of predefined categories. This thesis aims to help Medius, a company that works with invoice workflow, to classify their documents into invoices and non-invoices. This has been accomplished by implementing and evaluating various machine learning classification methods in terms of their accuracy and efficiency for the task of financial document classification, where only invoices are of interest. Furthermore, the necessary pre-processing steps for achieving good performance are considered when evaluating the mentioned classification methods. In this study, two document representation methods, Term Frequency Inverse Document Frequency (TF-IDF) and Doc2Vec, were used to represent the documents as fixed-length vectors. The representation aims to reduce the complexity of the documents and make them easier to handle. In addition, three classification methods have been used to automate the document classification process for invoices. These methods were Logistic Regression, Multinomial Naïve Bayes and Support Vector Machine. The results from this thesis indicate that all classification methods that used TF-IDF to represent the documents as vectors give high performance and accuracy. 
The accuracy of all three classification methods is over 90%, which is the prerequisite for the success of this study. Moreover, Logistic Regression appears to cope with this task very easily, since it classifies the documents more efficiently compared to the other methods. A test on real data flowing into Medius’ invoice workflow shows that Logistic Regression is able to correctly classify up to 96% of the data. In conclusion, Logistic Regression together with TF-IDF is determined to be the most appropriate of the tested methods overall. Doc2Vec, in contrast, fails to provide a good result because the data set is neither adapted to nor sufficient for the method to work well. Document classification Text classification Invoices NLP TF-IDF Doc2vec Machine Learning Logistic Regression Multinomial Naïve Bayes Support Vector Machine. Dokumentklassificering Textklassificering Fakturor NLP TF-IDF Doc2vec Maskininlärning Logistic Regression Multinomial Naïve Bayes Support Vector Machine. Computer Sciences Datavetenskap (datalogi)
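The TF-IDF representation named in record 37 weights each term by how often it occurs in a document and how rare it is across the collection. The sketch below uses one common textbook variant (relative term frequency multiplied by the natural log of the inverse document frequency); the thesis does not say which variant or library it used, and the whitespace tokenizer, function name and sample documents are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Invented sample documents, not taken from the Medius workflow.
docs = [
    "invoice number total amount due",
    "meeting agenda for monday",
    "invoice total due on receipt",
]
vectors = tf_idf(docs)
```

In this variant a term that appears in every document receives weight zero, so the vectors emphasize terms that help tell documents apart; those vectors are what a classifier such as logistic regression would then consume.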
38	Bimodal adaptive hypermedia and interactive multimedia: a web-based learning environment based on Kolb's theory of learning style Salehian, Bahram January 2003 Thesis digitized by the Direction des bibliothèques de l'Université de Montréal. Intelligent tutoring systems Adaptive hypermedia Interactive multimedia Learning by doing "SoftPhone" "Naïve Bayes Classifier" Virtual environment
39	Décryptage des mécanismes de signalisation précoce de la costimulation dans l'activation des lymphocytes T naïfs / Deciphering the mechanisms of TCR and CD28 early signaling pathway cooperation required for naïve T cell activation Xia, Fan 26 November 2014 In this work, we aimed to determine the relationship between, and the specific contributions of, the TCR and CD28 early signaling pathways in naïve CD4+ T cell activation. Our data showed that in naïve CD4+ T cells, TCR stimulation significantly increased the two-dimensional (2D) binding of CD28 to its B7 ligands, and that this increase depended on both the cytoplasmic tail of CD28 and the activity of src kinases. Our biochemical analysis then demonstrated that TCR engagement with its ligand, pMHC, strongly enhanced the CD28 tyrosine phosphorylation triggered by B7. Moreover, conjoint stimulation of TCR and CD28 markedly augmented the activation of proximal signaling molecules such as Vav-1 and PLCγ-1 compared with stimulation of either receptor alone. We next examined calcium ion (Ca2+) mobilization. We found that in naïve CD4+ T cells, ligand engagement of TCR or CD28 alone was able to trigger a rise in the fluctuating cytosolic free Ca2+ level. Unexpectedly, these rises predominantly involved two different types of calcium channels: Cav and CRAC channels. Conjoint stimulation of both TCR and CD28 increased the average amplitude of the calcium response. Through time series analysis, our data revealed that conjoint, but not separate, TCR and CD28 stimulation in naïve CD4+ T cells significantly increased the dwell time of the fluctuating cytosolic free Ca2+ relative to that found in unstimulated cells. The increase in cytosolic free Ca2+ dwell time therefore uniquely characterized the calcium response triggered by TCR and CD28, and presumably corresponded to a fundamental feature of highly efficient T cell activation. TCR CD28 Costimulation Calcium mobilization Naïve T cell
40	Sentiment Analysis of Twitter Data Using Machine Learning and Deep Learning Methods Manda, Kundan Reddy January 2019 Background: Twitter, Facebook, WordPress, etc. act as major sources of information exchange in today's world. Tweets on Twitter mainly express public opinion on a product, event or topic, and thus constitute large volumes of unprocessed data. Synthesizing and analysing this data is important but difficult because of the size of the dataset. Sentiment analysis is chosen as an apt method because, rather than going through every tweet individually, it relates the tweets to positive, negative and neutral sentiments. Sentiment analysis is normally performed in three ways: the machine learning-based approach, the sentiment lexicon-based approach, and the hybrid approach. The machine learning-based approach uses machine learning and deep learning algorithms to analyse the data, whereas the sentiment lexicon-based approach uses lexicons that contain a vocabulary of positive and negative words. The hybrid approach combines the machine learning and sentiment lexicon approaches for classification. Objectives: The primary objectives of this research are to identify the algorithms and metrics for evaluating the performance of machine learning classifiers, and to compare the metrics of the identified algorithms in relation to the size of the dataset, which affects the performance of the best-suited algorithm for sentiment analysis. Method: The method chosen to address the research questions is an experiment, through which the identified algorithms are evaluated with the selected metrics. Results: The identified machine learning algorithms are Naïve Bayes, Random Forest and XGBoost, and the deep learning algorithm is CNN-LSTM. The algorithms are evaluated and compared with respect to precision, accuracy, F1 score and recall. The CNN-LSTM model is best suited for sentiment analysis on Twitter data with respect to the selected size of the dataset. Conclusion: Through the analysis of the results, the aim of this research, identifying the best-suited algorithm for sentiment analysis on Twitter data with respect to the selected dataset, is achieved. The CNN-LSTM model achieves the highest accuracy, 88%, among the selected algorithms for sentiment analysis of Twitter data with respect to the selected dataset. Machine Learning Sentiment Analysis Twitter data Deep Learning Naïve Bayes Twitter Sentiment Analysis Computer Sciences Datavetenskap (datalogi)
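As an illustration of the four metrics named in the abstract above (accuracy, precision, recall and F1 score), the following is a minimal sketch in plain Python. It is not the thesis's code: the function name `classification_metrics`, the choice of macro-averaging over the three sentiment classes, and the six example labels are all assumptions made up for this example, and the input is assumed to be non-empty.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Hypothetical helper for illustration; assumes y_true is non-empty
    and both lists have the same length.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,  # macro average over classes
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up labels, not data from the thesis.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Macro-averaging weights each sentiment class equally; whether the thesis used macro, micro or weighted averaging is not stated in the abstract.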
