Global ETD Search

11	Reconhecimento automático de defeitos de fabricação em painéis TFT-LCD através de inspeção de imagem / Automatic recognition of manufacturing defects in TFT-LCD panels through image inspection
SILVA, Antonio Carlos de Castro da. 15 January 2016.
Submitted by Fabio Sobreira Campos da Costa (fabio.sobreira@ufpe.br) on 2016-09-12. Made available in DSpace on 2016-09-12. Previous issue date: 2016-01-15.
The early detection of defects in the components used on manufacturing assembly lines is crucial to the quality of the final product. This work presents a platform for automatically detecting manufacturing defects in TFT-LCD (Thin Film Transistor-Liquid Crystal Display) panels by image inspection. The platform is camera-based, and the panel under inspection is placed in a closed chamber to shield it from ambient light. The inspection steps are image acquisition by the cameras, definition of the region of interest (frame detection), feature extraction, image analysis, defect classification, and the decision to approve or reject the panel. Features are extracted from both the RGB and the grayscale images. For each RGB component, the pixel intensity is analyzed and the variance is calculated; a panel is rejected if it shows a variation of 5% relative to the reference values. Classification is performed with the Naive Bayes algorithm. The results show an accuracy of 94.23% in defect detection. Incorporating the platform into Samsung's mass production line in Manaus is under study.
Keywords: TFT-LCD; platform; image recognition; automatic detection; Naive Bayes classifier
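The 5% rejection rule mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the representation of an image as a list of (R, G, B) tuples, and the reading of "variation of 5%" as a relative deviation of each channel's variance from a reference value are all hypothetical.

```python
from statistics import pvariance

def channel_variances(pixels):
    """Return the population variance of each RGB channel.

    `pixels` is a list of (r, g, b) tuples (an assumed representation).
    """
    return tuple(pvariance(channel) for channel in zip(*pixels))

def reject_panel(pixels, reference, tolerance=0.05):
    """Return True if any channel variance deviates from its reference
    value by more than `tolerance` (relative deviation, assumed 5%)."""
    for value, ref in zip(channel_variances(pixels), reference):
        if abs(value - ref) > tolerance * abs(ref):
            return True
    return False
```

In a real inspection pipeline the reference variances would presumably come from panels known to be defect-free; the abstract does not say how they were obtained.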
12	[en] A STUDY OF MULTILABEL TEXT CLASSIFICATION ALGORITHMS USING NAIVE-BAYES / [pt] UM ESTUDO DE ALGORITMOS PARA CLASSIFICAÇÃO AUTOMÁTICA DE TEXTOS UTILIZANDO NAIVE-BAYES
DAVID STEINBRUCH. 12 March 2007.
The amount of electronic information has been growing rapidly, driven mainly by the ease of publication and dissemination that the Internet provides. This information therefore needs to be organized so that it is easier to retrieve. Many works have proposed to address this problem through automatic text classification that assigns several labels to each text (multilabel classification). However, those works transform the problem into binary classification subproblems, which assumes that the categories are independent. Moreover, they rely on thresholds that are very specific to the training set used and so generalize poorly. This dissertation proposes two automatic text classification algorithms based on the multinomial naive Bayes algorithm, and their use in an online text classification environment with user relevance feedback. To test the efficiency of the proposed algorithms, experiments were performed on the Reuters 21578 news collection and on the Ohsumed medical document collection.
Keywords: MACHINE LEARNING; INTERNET; TEXT CATEGORIZATION; TEXT CLASSIFICATION; MULTILABEL; NAIVE-BAYES
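For readers unfamiliar with the base model named in this abstract, a standard single-label multinomial naive Bayes text classifier can be sketched as below. This is not either of the two algorithms the dissertation proposes (the abstract does not describe them); the function names, the add-one smoothing, and the choice to skip words unseen in training are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Collect counts from `docs`, a list of (tokens, label) pairs."""
    class_docs = Counter()              # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def log_scores(model, tokens):
    """Return log P(label) + sum of log P(token | label) for each label."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return scores

def classify(model, tokens):
    """Return the single label with the highest log score."""
    scores = log_scores(model, tokens)
    return max(scores, key=scores.get)
```

A multilabel variant has to decide how many of the top-scoring labels to return, which is where the thresholds criticized in the abstract usually enter.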
13	SELEÇÃO DE ATRIBUTOS EM IMAGENS COLETADAS SOB CONDIÇÕES DE ILUMINAÇÃO NÃO CONTROLADA E SUA INFLUÊNCIA NO DESEMPENHO DE CLASSIFICADORES NAIVE BAYES PARA IDENTIFICAÇÃO DE OBJETOS EM ESTUFAS AGRÍCOLAS / ATTRIBUTE SELECTION IN IMAGES COLLECTED UNDER UNCONTROLLED LIGHTING CONDITIONS AND ITS INFLUENCE ON THE PERFORMANCE OF NAIVE BAYES CLASSIFIERS FOR OBJECT IDENTIFICATION IN AGRICULTURAL GREENHOUSES
Gaspareto, Marinaldo José. 10 September 2013.
Made available in DSpace on 2017-07-21. Previous issue date: 2013-09-10.
One problem in implementing navigation systems for autonomous mobile robots is detecting the objects of interest and the obstacles in the environment. This study considers the detection of the walls and low walls of agricultural greenhouses in digital images acquired without illumination control. The proposed approach employs digital image processing and digital classification techniques to detect the object of interest. The classifier developed is of the Naive Bayes type. Two important issues when employing classification methods in computer vision are the accuracy of the classifier and its computational time complexity. The selection of the descriptor attributes that make up a classifier has a great impact on both factors: in general, the fewer attributes required, the lower the computational cost. With this in mind, the study compared the performance of two feature selection methods based on principal component analysis, named B2 and B4, in two scenarios. In the first, feature selection was performed on the data extracted from all images. In the second, selection was performed on images grouped by similarity. The attributes selected in each approach were then used to build Naive Bayes classifiers with 12, 17, 22 and 27 input variables. The results indicate that grouping images is useful when (a) the distance from the center of the group to the center of the original database exceeds a threshold and (b) the correlation between the descriptor variables and the target variable is greater in the group than in the complete data set.
Keywords: Greenhouses; Autonomous navigation; Attribute selection; Naive Bayes classifiers
14	Seleção entre estratégias de geração automática de dados de teste por meio de métricas estáticas de softwares orientados a objetos / Selection between whole test generation strategies by analysing object oriented software static metrics
Ramos, Gustavo da Mota. 09 October 2018.
Software products of varying complexity are created daily in response to complex and varied demands under tight deadlines. High quality is expected of them, yet as products become more complex the quality level may fall short when the time available for testing does not keep pace with the complexity. Software testing and the automatic generation of test data aim to deliver high-quality products through low-cost, rapid test activities. In this context, developers depend on automatic test generation strategies and, above all, on selecting the technique best suited to achieving the highest possible code coverage. This matters because each test data generation technique has particularities and problems that make it work better for certain types of software. Given this scenario, the present work proposes selecting the appropriate technique for each class of a software system based on its characteristics, expressed through object-oriented software metrics, using the Naive Bayes classification algorithm. A literature review of two generation algorithms, random search and genetic search, was carried out to understand their advantages and disadvantages in both implementation and execution. The CK metrics were also studied to understand how they can best describe the characteristics of a class. The knowledge acquired made it possible to collect test generation data for each class, such as code coverage and generation time for each technique, along with the CK metrics, and then to analyze these data together and run the classification algorithm. The results showed that a reduced, selected set of CK metrics is more efficient and better describes the characteristics of a class than the complete set. They also showed that the CK metrics have little or no influence on the generation time of the test data or on the random search algorithm. However, the CK metrics showed a moderate correlation with, and influence on, the selection of the genetic algorithm, and thus contribute to its selection by the Naive Bayes algorithm.
Keywords: Genetic algorithm; Code coverage; Test generation; CK metrics; Naive Bayes; Software testing; Test data generation
16	Applicering av maskininlärning för att predicera utfall av Kickstarter-projekt / Application of machine learning to predict outcome of Kickstarter-projects
Lidén, Rickard; In, Gabriel. January 2021.
Crowdfunding has become a popular way to raise money for a project in the modern, digitalized world, and Kickstarter is one of the leading crowdfunding websites. Predicting the success or failure of a Kickstarter project with machine learning could therefore be of great interest to entrepreneurs. The purpose of this study is to compare the predictive ability of four algorithms on two Kickstarter datasets. One dataset covers the years 2020-2021 and the other 2016-2021. The algorithms compared are KNN, Naive Bayes, MLP and Random Forest. Of these four, KNN and Random Forest produced the best models. KNN performed best on the 2020-2021 dataset, with 77.0% accuracy, and Random Forest performed best on the 2016-2021 dataset, with 76.8% accuracy. The thesis is written in Swedish.
Keywords: Crowdfunding; Machine learning; Random Forest; Multilayer Perceptron; KNN; Naive Bayes
Subject: Computer and Information Sciences
17	Sentimentanalys av svenskt aktieforum för att förutspå aktierörelse / Sentiment analysis of Swedish stock trading forum for predicting stock market movement
Ouadria, Michel Sebastian; Ciobanu, Ann-Stephanie. January 2020.
This study examines whether stock movement can be predicted on a daily basis through sentiment analysis of posts in a Swedish stock trading forum. Sentiment analysis is used to find subjectivity, in the form of emotions (sentiment), in text. Text data was extracted from the forum to predict the movement of the related stock. All data was aggregated over a fixed period of two years. The study trained three machine learning models on the text data and stock data. The results showed no clear correlation between sentiment and stock movement, and did not match the results reported in previous work in the field. The highest accuracy achieved by the models was 64%.
Keywords: Sentiment analysis; Stock market; Machine learning; Support Vector Machine; Naive Bayes; Extreme Gradient Boosting
Subject: Computer and Information Sciences
18	Anomaly-based intrusion detection using Tree Augmented Naive Bayes Classifier
Wester, Philip. January 2021.
With the rise of information technology and growing dependence on these systems, it becomes increasingly important to keep them secure. Intrusion detection systems (IDS) are one of several fundamental technologies that can increase the security of a system. One of the bigger challenges for an IDS is detecting types of intrusion that have not been encountered before, so-called unknown intrusions. These are generally detected with methods collectively called anomaly detection methods. In this thesis I evaluate the performance of the Tree Augmented Naive Bayes Classifier (TAN) algorithm as an intrusion detection classifier. More specifically, I wrote a TAN program from scratch in Python and tested it on two data sets containing data traffic. The thesis aims to create a better understanding of how TAN works and to evaluate whether it is a reasonable algorithm for intrusion detection. The results show that TAN performs at an acceptable level, with reasonably high accuracy. The results also highlight the importance of the smoothing operator included in the standard version of TAN.
Keywords: Intrusion detection; Anomaly detection; Tree Augmented Naive Bayes; Machine learning; Network-based intrusion detection
Subject: Computer and Information Sciences
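The abstract does not say which smoothing operator is meant. One scheme commonly attributed to the original TAN formulation blends the conditional frequency estimate with the marginal frequency, giving more weight to the conditional estimate as more training instances match the parent configuration. The sketch below illustrates that idea only; the function name, the argument layout, and the default constant `n0 = 5` are assumptions, and the thesis's own code may differ.

```python
def smoothed_conditional(count_x_and_parents, count_parents,
                         count_x, n_total, n0=5.0):
    """Blend the estimate of P(x | parents) with the marginal P(x).

    count_x_and_parents: instances with this value and parent configuration
    count_parents:       instances with this parent configuration
    count_x:             instances with this value (any parents)
    n_total:             total number of training instances
    n0:                  assumed smoothing strength (hypothetical default)
    """
    marginal = count_x / n_total
    if count_parents == 0:
        # No evidence for this parent configuration: use the marginal.
        return marginal
    conditional = count_x_and_parents / count_parents
    # The weight on the conditional estimate grows with the evidence.
    alpha = count_parents / (count_parents + n0)
    return alpha * conditional + (1.0 - alpha) * marginal
```

Without smoothing of this kind, a value never observed together with a given parent configuration gets probability zero, which then zeroes out the whole product for that class.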
19	Federated Online Learning with Streaming Data for Intrusion Detection Systems: Comparing Federated and Centralized Learning Methods in Online and Offline Settings
Arvidsson, Victor. January 2024.
Background. With increased pressure from both regulatory bodies and end users, interest in privacy-preserving machine learning methods has increased among companies and researchers in the last few years. One of the main research areas here is federated learning. Further, given the current situation in the world, interest in cybersecurity is at an all-time high, and intrusion detection systems are one component of interest. For anomaly-based intrusion detection systems that use machine learning, it is desirable that they adapt automatically over time as network patterns change, which makes online learning highly relevant for this application. Previous research has studied offline federated intrusion detection systems, but very little work has been done on online federated learning for intrusion detection systems.
Objectives. The objective of this thesis is to evaluate the performance of online federated machine learning methods for intrusion detection systems. The thesis also studies the performance relationship between offline and online models for both centralized and federated learning, in order to draw conclusions about whether results can be extrapolated between the different types of models.
Methods. This thesis uses a quasi-experiment to evaluate two models, Naive Bayes and Semi-supervised Federated Learning on Evolving Data Streams (SFLEDS), on three datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017. For each model, four variants are implemented: centralized offline, centralized online, federated offline and federated online. In the federated setting, the models are evaluated with 20, 30, and 40 clients.
Results. The results show that the best-performing model in general is the federated online SFLEDS. They also highlight an important problem with using imbalanced datasets without proper care for data preprocessing and model design. Finally, the results show that no general relationship between offline and online models holds in both the centralized and federated settings in terms of prediction performance.
Conclusions. The main conclusion of the thesis is that online federated learning has a lot of potential for intrusion detection systems, but more research is required to find the models and parameters that give satisfactory performance.
Keywords: Naive Bayes; SFLEDS; Semi-Supervised Learning; Cybersecurity; Centralized; Federated Learning
Subject: Computer Sciences
20	Bayesian classification of DNA barcodes
Anderson, Michael P. January 1900.
Doctor of Philosophy / Department of Statistics / Suzanne Dubnicka
DNA barcodes are short strands of nucleotide bases taken from the cytochrome c oxidase subunit 1 (COI) of the mitochondrial DNA (mtDNA). A single barcode may have the form C C G G C A T A G T A G G C A C T G . . . and typically ranges in length from 255 to around 700 nucleotide bases. Unlike nuclear DNA (nDNA), mtDNA remains largely unchanged as it is passed from mother to offspring. It has been proposed that these barcodes may be used as a method of differentiating between biological species (Hebert, Ratnasingham, and deWaard 2003). While this proposal is sharply debated among some taxonomists (Will and Rubinoff 2004), it has gained momentum and attention from biologists. One issue at the heart of the controversy is the use of genetic distance measures as a tool for species differentiation. Current methods of species classification rely on distance measures that depend heavily on both evolutionary model assumptions and a clearly defined "gap" between intra- and interspecies variation (Meyer and Paulay 2005). We point out the limitations of such distance measures and propose a character-based method of species classification that applies Bayes' rule to overcome these deficiencies. The proposed method is shown to provide accurate species-level classification. The proposed methods also provide answers to important questions not addressable with current methods.
Keywords: DNA Barcodes; Bayesian Classification; Species Discovery; Naive Bayes Classifier; Sequential Analysis; High-dimensional Data
Subject: Statistics (0463)
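As a rough illustration of applying Bayes' rule to character data, the posterior probability of species $s$ given an aligned barcode $x = (x_1, \dots, x_L)$ can be written as below. The conditional independence of positions given the species (the naive Bayes factorization suggested by this entry's keywords) and all symbols are assumptions made for this sketch; the abstract does not spell out the model actually used.

```latex
P(s \mid x) \;=\;
\frac{P(s)\,\prod_{j=1}^{L} P(x_j \mid s)}
     {\sum_{s'} P(s')\,\prod_{j=1}^{L} P(x_j \mid s')},
\qquad x_j \in \{\mathrm{A}, \mathrm{C}, \mathrm{G}, \mathrm{T}\}.
```

Under this sketch, a barcode would be assigned to the species with the largest posterior, with no genetic distance measure or inter/intraspecies "gap" involved.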

Search results