  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Diagnóstico de falhas incipientes em linhas de transmissão / Diagnosis of incipient failures in transmission lines

SILVA, Paula Renatha Nunes da 26 October 2018
Currently, the operation of the electric power transmission system is overloaded by the large volume of information coming from a wide variety of monitoring systems, which must be analyzed to keep the system within acceptable operating conditions according to the standards of the Brazilian Electricity Sector. In this context, this work proposes an online fault diagnosis system for transmission lines based on the analysis of leakage current monitoring for multiple incipient faults. The system is composed of modules that adapt autonomously to the improvements carried out on the transmission line (TL). The work specifically addresses the diagnosis module, in which the features of the harmonic spectrum of the faulty leakage current are extracted and, subsequently, the most prominent fault in a multi-event scenario is identified. To extract the features of the faulty leakage current signals, analytical redundancy was used, which, from data obtained in the laboratory and in the field, served to determine the normal behavior of the TL and to build the model of the TL under normal operation and with the anomaly.
Given the leakage current under normal operation and under fault, these signals are characterized using algorithms suited to the features identified in the state of the art on the topic and in the field and laboratory data. After choosing the extraction algorithm with the best performance for multiple faults, classifiers are proposed to determine the most prominent fault in the TL. The classifier design took into account that the system must adapt to the changes that occur in the TL, incorporating knowledge about the system, since it is highly dynamic.
52

Um estudo sobre a extração de características e a classificação de imagens invariantes à rotação extraídas de um sensor industrial 3D / A study on the extraction of characteristics and the classification of invariant images through the rotation of an 3D industrial sensor

Rodrigo Dalvit Carvalho da Silva 08 May 2014
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / In this work, the problem of recognizing objects in images extracted from a 3D industrial sensor is discussed. We focus on 9 feature extractors (seven based on invariant moments: Hu, Zernike, Legendre, Fourier-Mellin, Tchebichef, Bessel-Fourier and Gaussian-Hermite; one based on the Hough transform; and one based on independent component analysis) and 4 classifiers (Naive Bayes, k-Nearest Neighbors, Support Vector Machine and Artificial Neural Network, Multi-Layer Perceptron).
To choose the best feature extractor, their performance was compared in terms of classification accuracy and extraction time, using the k-nearest neighbors classifier with Euclidean distance. The feature extractor based on Zernike moments achieved the best accuracy, 98.00%, with a relatively low feature extraction time, 0.3910 seconds. The features it generated were then presented to different classification heuristics. Among the tested classifiers, k-nearest neighbors achieved the highest average accuracy, 98.00%, with a relatively low average classification time, 0.0040 seconds, making it the most suitable classifier for the application studied.
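The k-nearest neighbors step with Euclidean distance mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `knn_predict`, the toy 2-D vectors standing in for moment-based descriptors, and the class names are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    # Euclidean distance from the query to every training vector.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Keep the k closest neighbours and let them vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2-D feature vectors (hypothetical stand-ins for image descriptors).
train_features = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.10),
                  (0.90, 0.80), (0.85, 0.95), (0.80, 0.90)]
train_labels = ["bolt", "bolt", "bolt", "nut", "nut", "nut"]

prediction = knn_predict(train_features, train_labels, (0.12, 0.22), k=3)
```

With real descriptors the vectors would be longer, and the features would usually be scaled first so that no single dimension dominates the distance.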
53

System för att upptäcka Phishing : Klassificering av mejl

Karlsson, Nicklas January 2008
This report takes a look at the phishing problem, which many have encountered through, for example, the fake Nordea or eBay e-mails that have lately shown up in our inboxes, and at a possible way to reduce the effect of phishing. The focus of the report is on classification of e-mails, and the main question is: “Is it possible, with high accuracy, to use a classification tool to sort phishing e-mails from other spam e-mails?” It was more difficult than expected to find phishing e-mails to use in the classification. The classifications that were carried out showed that both Naive Bayes and Support Vector Machine can find up to 100 % of the phishing e-mails. The report presents the work process, theory about phishing and the results of the classification tests.
54

Twittersentimentanalys : Jämförelse av klassificeringsmodeller tränade på olika datamängder. / Twitter Sentiment Analysis : Comparison of classification models trained on different data sets.

Bandgren, Johannes, Selberg, Johan January 2018
Twitter is one of the most popular microblogs, used to express thoughts and opinions on different topics. An area that has attracted much interest in recent years is Twitter sentiment analysis. Twitter sentiment analysis is about assessing what sentiment a Twitter post expresses, whether it expresses something positive or negative. Different methods can be used to perform Twitter sentiment analysis, some better suited than others. The most common methods use machine learning.
The purpose of this study is to evaluate three classification algorithms in machine learning and how the labeling of a data set affects a classification model's ability to classify a Twitter post correctly for Twitter sentiment analysis. Naive Bayes, Support Vector Machine and Convolutional Neural Network are the classification algorithms that have been evaluated. For each classification algorithm, two classification models have been trained and tested on two separate data sets: Stanford Twitter Sentiment and SemEval. What separates the two data sets, in addition to the content of the Twitter posts, is the labeling method and the number of Twitter posts. The evaluation considers the performance of the classification models on the respective data sets, their training time and how complicated they were to implement.
The results show that all models trained and tested on SemEval achieved higher performance than those trained and tested on Stanford Twitter Sentiment. The Convolutional Neural Network models achieved the best results on both data sets. However, a Convolutional Neural Network is more complicated to implement, and its training time is significantly longer than that of Naive Bayes and Support Vector Machine.
55

System för att upptäcka Phishing : Klassificering av mejl

Karlsson, Nicklas January 2008
This report takes a look at the phishing problem, which many have encountered through, for example, the fake Nordea or eBay e-mails that have lately shown up in our inboxes, and at a possible way to reduce the effect of phishing. The focus of the report is on classification of e-mails, and the main question is: “Is it possible, with high accuracy, to use a classification tool to sort phishing e-mails from other spam e-mails?” It was more difficult than expected to find phishing e-mails to use in the classification. The classifications that were carried out showed that both Naive Bayes and Support Vector Machine can find up to 100 % of the phishing e-mails. The report presents the work process, theory about phishing and the results of the classification tests.
56

A contribution to topological learning and its application in Social Networks / Une contribution à l'apprentissage topologique et son application dans les réseaux sociaux

Ezzeddine, Diala 01 October 2014
Supervised Learning is a popular field of Machine Learning that has made steady progress in recent years. In particular, many methods and procedures have been developed to solve the classification problem. Most classical methods in Supervised Learning use density estimation of the data to construct their classifiers.
In this dissertation, we show that the topology of data can be a good alternative in constructing classifiers. We propose using topological graphs such as Gabriel Graphs (GG) and Relative Neighborhood Graphs (RNG), which capture the topology of data through its neighborhood structure and do not depend on density. To apply this concept, we create a new method called Random Neighborhood Classification (RNC). In this method, we use topological graphs to construct classifiers and then apply Ensemble Methods (EM) to extract all relevant information from the data. EM, well known in Machine Learning, generate many classifiers from data and then aggregate them into one. Aggregate classifiers have been shown to be very efficient in many studies because they leverage the information from each generated classifier. We first compare RNC to other known classification methods using data from the UCI Irvine repository. We find that RNC works very well compared to very efficient methods such as Random Forests (RF) and Support Vector Machines (SVM). Most of the time, it ranks in the top three methods in efficiency. This result encouraged us to study the efficiency of RNC on real data such as tweets.
Twitter, a microblogging social network, is especially useful for mining opinion on current affairs and on topics spanning the range of human interest, including politics. Mining political opinion from Twitter poses particular challenges, such as the versatility of authors when they express their political views, which motivate this study. We define a new attribute, called a couple, that is very helpful in studying the opinion of tweets. A couple is an author who talks about a politician. We propose a new procedure that focuses on identifying the opinion of tweets using couples. We think that focusing on a couple's opinion as expressed across several tweets can overcome the problems of analysing each tweet individually. This approach helps avoid the versatility, language ambiguity and many other artifacts that are easy for a human being to understand but not for a machine.
We use classical Machine Learning techniques such as KNN and Random Forests (RF), as well as our method RNC. We proceed in two steps. First, we build a reference set of classified couples using Naive Bayes. We also apply a second method, a sampling plan procedure, as an alternative to the Naive method, to compare and evaluate its results. Second, we evaluate the performance of this approach using proximity measures in order to use RNC, RF and KNN. The experiments are based on real tweets from the 2012 French presidential election. The results show that this approach works well and that RNC performs very well in classifying opinion in tweets.
Topological Learning seems to be a very interesting field to study, in particular for addressing the classification problem. Many concepts for extracting information from topological graphs remain to be analysed, such as those described by Aupetit, M. in his work (2005). Our work shows that Topological Learning can be an effective way to address the classification problem.
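As a rough illustration of the topological graphs named in this abstract, the sketch below builds a Gabriel graph by brute force, using the standard definition: two points are neighbours when no third point lies strictly inside the ball whose diameter is the segment joining them. It does not reproduce the RNC method, which the abstract does not describe in detail; the function names and the sample points are assumptions.

```python
from itertools import combinations

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def gabriel_graph(points):
    """Return the set of index pairs (i, j), i < j, that are Gabriel neighbours."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = squared_distance(points[i], points[j])
        # A third point k lies strictly inside the ball with diameter (i, j)
        # exactly when d(i,k)^2 + d(j,k)^2 < d(i,j)^2.
        blocked = any(
            squared_distance(points[i], points[k])
            + squared_distance(points[j], points[k]) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.add((i, j))
    return edges

# Made-up points: point 2 sits between the others and blocks the long edges.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1), (1.0, 3.0)]
edges = gabriel_graph(points)
```

The edge test depends only on which points fall between a pair, not on how many points crowd a region, which is the density-independence the abstract refers to. The brute-force loop is cubic in the number of points, so it suits small examples only.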
57

Classification Performance Between Machine Learning and Traditional Programming in Java

Alassadi, Abdulrahman, Ivanauskas, Tadas January 2019
This study compares the performance of two Java applications built with two different programming approaches: machine learning and traditional programming. A case where both approaches can be applied is a classification problem with numeric values. The data is a heart disease dataset, chosen because heart disease is the leading cause of death in the USA. A performance analysis of both applications is carried out to establish the differences on four main points: the development time of each application; the code complexity and time complexity of the implemented algorithms; the classification accuracy; and the resource consumption of each application. The machine learning Java application is built with the WEKA library, using its NaiveBayes class to build the model and evaluate its accuracy. The traditional programming Java application is built with the help of a cardiologist, as an expert in the problem domain, to identify the values that indicate injury. The findings are that the traditional programming application performed better in development time, code complexity, and resource consumption. It scored a classification accuracy of 80.2%, while the Naive Bayes algorithm in the machine learning application scored 85.51%, but at the expense of high resource consumption and execution time.
58

Improving search results with machine learning : Classifying multi-source data with supervised machine learning to improve search results

Stakovska, Meri January 2018
Sony’s Support Application team wanted an experiment conducted to determine whether Machine Learning is suitable for improving the quantity and quality of search results in the in-application search tool. By improving the quantity and quality of the results, the team wanted to improve the customer’s journey. A supervised machine learning model was created to classify articles into four categories: Wi-Fi & Connectivity, Apps & Settings, System & Performance, and Battery Power & Charging. The same model was used to create a service that assigned each search term to one of the four categories. The classified articles and the classified search terms were used to complement the existing search tool. The baseline for the experiment was the result of the search tool without classification. The results show that the number of articles returned did increase, but, mainly because the categories were so broad, the search results were of low quality.
59

A bayesian network system for tinnitus diagnostics

Jangholi, Narges January 2014
Advisor: Prof. Dr. Peter M. E. Claessens / Dissertation (master's) - Universidade Federal do ABC, Programa de Pós-Graduação em Neurociência e Cognição, 2014. / Tinnitus is a common hearing disorder, often debilitating to varying degrees. Given that tinnitus is a multifaceted condition, with symptoms that are often psychological and subjective, and with many different possible causes, its diagnosis is not trivial. For example, tinnitus can be objective and measurable, or subjective and produced by neural factors that may be located more peripherally or more centrally. This Master's project proposes the development of a medical expert system to assist clinicians in indicating treatment for tinnitus. The study focuses on three types of treatment for tinnitus, namely Diet, Medication and Hearing Aid, as well as on their combinations, for supervised categorization. Naïve Bayes networks were used to relate a variety of test results and elements of the anamnesis to treatment indications by clinicians. Because treatments are not mutually exclusive, the categorization needs to take multi-label cases into account, that is, the possibility of several simultaneous treatment indications. To map the posterior probabilities of the different treatment indications to a multi-label classification, the difference between posterior probabilities was used as the criterion. This strategy was evaluated and its performance compared to a simpler single-label mapping strategy. The results show that the accuracy of the multi-label approach is higher than that of the single-label adjustment. The system thus provides a satisfactory first step in the development of a more encompassing, integrated and dynamic medical support system.
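The abstract says that the difference between posterior probabilities was used to decide multi-label cases but does not state the rule. One plausible reading, sketched here purely as an assumption, is to keep every treatment whose posterior lies within a fixed gap of the highest one. The function name `select_treatments`, the `max_gap` threshold and the posterior values are hypothetical.

```python
def select_treatments(posteriors, max_gap=0.15):
    """Return every label whose posterior is within `max_gap` of the best one.

    `max_gap` is a hypothetical threshold, not a value taken from the thesis.
    """
    best = max(posteriors.values())
    return sorted(label for label, p in posteriors.items() if best - p <= max_gap)

# Made-up posteriors for one patient, as a Naive Bayes model might output them.
posteriors = {"Diet": 0.46, "Medication": 0.40, "Hearing Aid": 0.14}

multi = select_treatments(posteriors)               # multi-label decision
single = select_treatments(posteriors, max_gap=0.0)  # single-label baseline
```

Setting `max_gap` to zero reduces the rule to the usual single-label choice of the most probable treatment, which gives a direct point of comparison between the two mappings.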
60

Automated classification of bibliographic data using SVM and Naive Bayes

Nordström, Jesper January 2018
Classification of scientific bibliographic data is an important and increasingly time-consuming task in a “publish or perish” paradigm where the number of scientific publications is steadily growing. Apart from being resource-intensive, manual classification has also been shown to be performed with a fairly high degree of inconsistency. Since many bibliographic databases contain a large number of already classified records, supervised machine learning for automated classification might be a way to handle the increasing volume of published scientific articles. In this study, automated classification of bibliographic data based on two machine learning methods, Naive Bayes and Support Vector Machine (SVM), was evaluated. The data used in the study were collected from the Swedish research database SwePub, and the features used for training the classifiers were based on abstracts and titles in the bibliographic records. The accuracy achieved ranged from a lowest score of 0.54 to a highest score of 0.84. The classifiers based on Support Vector Machine consistently received higher scores than those based on Naive Bayes. Classification at the second level of the hierarchical classification system clearly resulted in lower scores than classification at the first level. Using abstracts as the basis for feature extraction yielded better results overall than using titles, although the differences were very small.
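To make the Naive Bayes side of this comparison concrete, here is a minimal multinomial Naive Bayes text classifier with add-one smoothing. It is a sketch under stated assumptions, not the study's pipeline: the titles, the two category names and the whitespace tokenizer are invented for illustration, and a real run would use SwePub records and a proper feature extraction step.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary

def predict(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, doc_count in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(doc_count / total_docs)  # log prior
        for token in text.lower().split():
            if token in vocabulary:  # ignore words never seen in training
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up titles and hypothetical first-level categories.
titles = [
    "protein folding in yeast cells",
    "gene expression in plant cells",
    "monetary policy and inflation",
    "labour markets and wage inflation",
]
fields = ["Natural Sciences", "Natural Sciences", "Social Sciences", "Social Sciences"]

model = train_naive_bayes(titles, fields)
prediction = predict(model, "inflation and wage policy")
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, which matters more for abstracts than for short titles.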
