  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Operação de busca exata aos K-vizinhos mais próximos reversos em espaços métricos / Answering exact reverse k-nearest neighbors queries in metric space

Willian Dener de Oliveira 19 March 2010
Data stored in large databases is becoming ever more complex, which calls for new classes of query operators. One class attracting growing interest is that of similarity queries, the best known being the similarity range query (R_q) and the k-nearest neighbor query (kNN). A kNN query retrieves the k stored elements nearest (most similar) to a given reference element. Another important similarity query is the reverse k-nearest neighbor query (RkNN), useful both for queries posed directly by the analyst and for queries that are part of more complex analysis processes. An RkNN query retrieves every stored element that has the query reference element as one of its own k nearest neighbors. Because RkNN is computationally expensive, most existing solutions either answer the query only over Euclidean multidimensional spaces (which define cardinal and topological operations in addition to the Euclidean distance between pairs of elements) or retrieve only approximate answers, in which false negatives can occur.
Several applications, such as the analysis of scientific, medical, engineering or financial data, require efficient and exact answers to RkNN queries over data represented in metric spaces, where nothing is available besides the stored elements and a metric comparison function between pairs of them. For such applications, neither a Euclidean metric nor multidimensional data can be assumed. In this work, we propose new pruning rules based on the law of cosines, and the RkNN-MG algorithm, which uses them to answer RkNN queries exactly, faster than existing approaches, for any value of k, and over both static and dynamic datasets. The new pruning rules assume that the data set lies in a metric space that can be isometrically embedded into a Euclidean space, and they use metric geometry properties valid in that space to combine pruning based on the law of cosines with the traditional pruning based on the triangle inequality. The experiments show that the new pruning rules are always more efficient than the traditional pruning rules based solely on the triangle inequality. For high-dimensional datasets, or for metric datasets with high fractal dimension, the performance improvement is smaller than for lower-dimensional datasets, but it is never worse. Thus, the results confirm that our pruning rules are an efficient alternative for solving RkNN queries in general.
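To make the RkNN definition concrete, here is a minimal brute-force sketch in Python. It is not the RkNN-MG algorithm and applies none of the pruning rules described in the abstract; it only assumes a list of stored elements and a caller-supplied metric `dist`. The function name `rknn_brute_force` and the sample data are illustrative assumptions.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def rknn_brute_force(
    data: Sequence[T],
    query: T,
    k: int,
    dist: Callable[[T, T], float],
) -> List[T]:
    """Return every stored element that has `query` among its k nearest neighbors.

    Hypothetical helper for illustration: it compares all pairs, so it needs
    O(n^2) distance computations and no index structure.
    """
    result = []
    for i, obj in enumerate(data):
        d_query = dist(obj, query)
        # Count stored elements (other than obj itself) strictly closer to obj
        # than the query is.
        closer = sum(
            1
            for j, other in enumerate(data)
            if j != i and d_query > dist(obj, other)
        )
        # If fewer than k elements are closer, the query ranks among
        # obj's k nearest neighbors.
        if k > closer:
            result.append(obj)
    return result


if __name__ == "__main__":
    points = [0, 1, 2, 10]
    print(rknn_brute_force(points, 3, 1, lambda a, b: abs(a - b)))
```

With `points = [0, 1, 2, 10]`, `k = 1` and the absolute difference as the metric, the query element 3 is returned for 2 and for 10. Element 10 is far from the query, yet the query is still its nearest neighbor, which is why an RkNN answer cannot be obtained from a plain kNN search around the query.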
202

Utilização de métodos de machine learning para identificação de instrumentos musicais de sopro pelo timbre

Veras, Ricardo da Costa January 2018
Advisor: Prof. Dr. Ricardo Suyama / Dissertation (master's) - Universidade Federal do ABC, Programa de Pós-Graduação em Engenharia da Informação, Santo André, 2018. / In general, pattern classification for signal processing has been studied and used to interpret many kinds of information, which appear as images, audio, geophysical data, electrical impulses, among others. In this work we study machine learning techniques applied to the problem of identifying musical instruments, aiming at an automatic timbre recognition system. The techniques were applied specifically to five woodwind instruments (clarinet, bassoon, flute, oboe and saxophone). The techniques used were kNN (with k = 3) and SVM (in a non-linear configuration), together with several audio features, such as MFCC (Mel-Frequency Cepstral Coefficients), ZCR (Zero Crossing Rate) and entropy, among others, which served as the data source for the training and testing processes. We chose instruments whose timbres are close to one another in order to observe how a classifier behaves under these specific conditions. We also observed how the techniques behave with audio not seen during training, with excerpts containing a mixture of elements (generating interference for each classifier model) that could skew the results, and with mixtures of elements that belong to the observed classes and add up in the same audio. The results indicate that the selected features carry relevant information about the timbre of each of the evaluated instruments (as observed in the results for solos), although the accuracy obtained for some of the instruments was lower than expected (as observed in the results for duets).
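As a rough illustration of two ingredients named in the abstract, the sketch below computes a zero crossing rate for a frame of samples and classifies a feature vector with kNN using k = 3. It is not the dissertation's pipeline: the function names, the one-dimensional feature vectors and the labels are assumptions made for this example, and MFCC and entropy features are left out.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    if 2 > len(frame):
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def knn_classify(
    train: List[Tuple[Sequence[float], str]],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Majority vote among the k training vectors closest to `query`."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical one-feature training set (values are made up).
    train = [
        ((0.10,), "flute"),
        ((0.12,), "flute"),
        ((0.50,), "oboe"),
        ((0.55,), "oboe"),
        ((0.60,), "oboe"),
    ]
    print(zero_crossing_rate([1.0, -1.0, -1.0, 1.0]))
    print(knn_classify(train, (0.11,)))
```

In practice each audio excerpt would be described by a longer feature vector, and features on different scales would usually be normalized before distances are computed.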
203

Clusters (k) Identification without Triangle Inequality : A newly modelled theory / Clustering(k) without Triangle Inequality : A newly modelled theory

Narreddy, Naga Sambu Reddy, Durgun, Tuğrul January 2012
Cluster analysis organizes data that are sufficiently similar into meaningful and useful groups (clusters). For example, cluster analysis can be applied to find groups of genes and proteins that are similar, to retrieve information from the World Wide Web, and to identify locations that are prone to earthquakes. The study of clustering has therefore become very important in several fields, including psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning and data mining [1] [2]. Cluster analysis is one of the most widely used techniques in data mining. Depending on the complexity and amount of data in a system, a variety of cluster analysis algorithms can be used. K-means clustering is one of the most popular and widely used among the top ten algorithms in data mining [3]. Like other clustering algorithms, it is not a silver bullet. K-means clustering requires prior analysis and knowledge to determine the number of clusters and their centroids. Recent studies show a new approach to K-means clustering that does not require any prior knowledge to determine the number of clusters [4]. In this thesis, we propose a new clustering procedure to solve the central problem of identifying the number of clusters (k) by imitating the desired number of clusters with proper properties. The proposed algorithm is validated by investigating different characteristics of the analyzed data with the modified theory, and by analyzing the efficiency of the parameters and their relationships. The parameters in this theory include the selection of embryo-size (m), significance level (α), distributions (d), and training set (n), in the identification of clusters (k).
204

Applying a Molecular Genetics Approach to Shark Conservation and Management: Assessment of DNA Barcoding in Hammerhead Sharks and Global Population Genetic Structuring in the Gray Reef Shark, Carcharhinus amblyrhynchos.

Horn, Rebekah L. 01 February 2010
Chapter 1: DNA barcoding based on the mitochondrial cytochrome c oxidase subunit I (COI) gene sequence is emerging as a useful tool for identifying unknown, whole or partial organisms to species level. However, the application of only a single mitochondrial marker for robust species identification has also come under some criticism due to the possibility of erroneous identifications resulting from species hybridizations and/or the potential presence of nuclear-mitochondrial pseudogenes. The addition of a complementary nuclear DNA barcode has therefore been widely recommended to overcome these potential COI gene limitations, especially in wildlife law enforcement applications where greater confidence in the identifications is essential. In this study, we examined the comparative nucleotide sequence divergence and utility of the mitochondrial COI gene (N=182 animals) and nuclear ribosomal internal transcribed spacer 2 (ITS2) locus (N=190 animals) in the 8 known and 1 proposed cryptic species of globally widespread hammerhead sharks (family Sphyrnidae). Since hammerhead sharks are under intense fishing pressure for their valuable fins, with some species potentially set to receive CITES listing, tools for monitoring their fishery landings and tracking trade in their body parts are necessary to achieve effective management and conservation outcomes. Our results demonstrate that both COI and ITS2 loci function robustly as stand-alone barcodes for hammerhead shark species identification. Phylogenetic analyses of both loci independently and together accurately place each hammerhead species together in reciprocally monophyletic groups with strong bootstrap support. The two barcodes differed notably in levels of intraspecific divergence, with average intraspecific K2P distance an order of magnitude lower in the ITS2 (0.297% for COI and 0.0967% for ITS2). The COI barcode also showed phylogeographic separation in Sphyrna zygaena, S. lewini and S. tiburo, potentially providing a useful option for assigning unknown specimens (e.g. market fins) to a broad geographic origin. We suggest that COI supplemented by ITS2 DNA barcoding can be used in an integrated and robust approach for species assignment of unknown hammerhead sharks and their body parts in fisheries and international trade.
Chapter 2: The gray reef shark (Carcharhinus amblyrhynchos) is an Indo-Pacific, coral reef associated species that likely plays an important role as apex predator in maintaining the integrity of coral reef ecosystems. Populations of this shark have declined substantially in some parts of its range due to over-fishing, with recent estimates suggesting a 17% decline per year on the Great Barrier Reef (GBR). Currently, there is no information on the population structure or genetic status of gray reef sharks to aid in their management and conservation. We assessed the genetic population structure and genetic diversity of this species by using complete mitochondrial control region sequences and 15 nuclear microsatellite markers. Gray reef shark samples (n=305) were obtained from 10 locations across the species’ known longitudinal Indo-Pacific range: western Indian Ocean (Madagascar), eastern Indian Ocean (Cocos [Keeling] Islands, Andaman Sea, Indonesia, and western Australia), central Pacific (Hawaii, Palmyra Atoll, and Fanning Atoll), and southwestern Pacific (eastern Australia – Great Barrier Reef). The mitochondrial and nuclear marker data were concordant in most cases with population-based analysis showing significant overall structure (FST = 0.27906 (pST = 0.071 ± 0.02), and significant pairwise genetic differentiation between nearly all of the putative populations sampled (i.e., 9 of the 10 for mitochondrial and 8 of the 10 for nuclear markers). Individual-based analysis of microsatellite genotypes identified at least 5 populations. The concordant mitochondrial and nuclear marker results are consistent with a scenario of very low to no appreciable connectivity (gene flow) among most of the sampled locations, suggesting that natural repopulation of overfished regions by sharks from distant reefs is unlikely. The results also indicate that conservation of genetic diversity in gray reef sharks will require management measures on relatively local scales. Our findings of extensive genetic structuring suggest that a high level of genetic isolation is also likely to be the case in unsampled populations of this species.
205

Algoritmo kNN para previsão de dados temporais: funções de previsão e critérios de seleção de vizinhos próximos aplicados a variáveis ambientais em limnologia / Time series prediction using a KNN-based algorithm prediction functions and nearest neighbor selection criteria applied to limnological data

Carlos Andres Ferrero 04 March 2009
Treating data that contains sequential information is an important problem that arises during the data mining process. Time series constitute a popular class of sequential data, where records are indexed by time. The k-Nearest Neighbor - Time Series Prediction (kNN-TSP) method is an approximator for time series prediction problems. The main advantage of this approximator is its simplicity, and it is often used in nonlinear time series analysis and for the prediction of seasonal time series. Although kNN-TSP often finds the best fit for nearly periodic time series forecasting, some questions about how to determine its parameters remain open. In this work, we focus on two of these parameters: the selection of the nearest neighbours and the prediction function. To this end, we propose a simple approach to selecting the nearest neighbours, in which time is indirectly taken into account by the similarity measure, and a prediction function that is not disturbed by the presence of patterns at different levels of the time series. Both parameters were empirically evaluated on several artificial time series, including chaotic time series, as well as on real time series of environmental variables from the Itaipu reservoir, made available by Itaipu Binacional. Three of the most correlated limnological variables were considered in the experiments on the real time series: water temperature, air temperature and dissolved oxygen. Correlation analyses were also carried out to verify whether the predicted values maintain a correlation similar to that of the original ones. The results show that both proposals, the nearest-neighbour selection criterion and the prediction function, are promising.
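The general idea of kNN-based time series prediction can be sketched as follows: take the most recent window of the series as the query, find the k past windows most similar to it, and combine the values that followed those windows. The code below is a baseline sketch under stated assumptions (Euclidean similarity, plain mean of the successors, hypothetical function name `knn_tsp_forecast`); the neighbour selection criterion and the prediction function proposed in the thesis are not described in the abstract and are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def knn_tsp_forecast(series: Sequence[float], window: int, k: int) -> float:
    """Forecast the next value from the successors of the k most similar past windows."""
    if window >= len(series) or 1 > window or 1 > k:
        raise ValueError("need 1 <= window < len(series) and k >= 1")
    query = series[-window:]
    candidates: List[Tuple[float, float]] = []
    # Every candidate window must have a known successor, so the last start
    # index is len(series) - window - 1 (this also excludes the query itself).
    for start in range(len(series) - window):
        past = series[start:start + window]
        successor = series[start + window]
        candidates.append((math.dist(past, query), successor))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    # Baseline prediction function: unweighted mean of the successors.
    return sum(s for _, s in nearest) / len(nearest)


if __name__ == "__main__":
    periodic = [1, 2, 3, 1, 2, 3, 1, 2]
    print(knn_tsp_forecast(periodic, window=2, k=2))
```

On the periodic toy series above, the two windows equal to the query `[1, 2]` are both followed by 3, so the forecast is 3.0. A plain mean like this is sensitive to neighbours that sit at a different level of the series, which is the kind of situation the abstract says its proposed prediction function is meant to handle.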
206

Quelques problèmes de coloration du graphe / Some coloring problems of graphs

Xu, Renyu 27 May 2017
A $k$-total-coloring of a graph $G$ is a coloring of $V(G)\cup E(G)$ using the colors $\{1,2,\dots,k\}$ such that no two adjacent or incident elements receive the same color. The total chromatic number $\chi''(G)$ is the smallest integer $k$ such that $G$ has a $k$-total-coloring. In chapter 2, we study total coloring of planar graphs and obtain three results: (1) Let $G$ be a planar graph with maximum degree $\Delta\geq 8$. If every two chordal 6-cycles are not adjacent in $G$, then $\chi''(G)=\Delta+1$. (2) Let $G$ be a planar graph with maximum degree $\Delta\geq 8$. If any 7-cycle of $G$ contains at most two chords, then $\chi''(G)=\Delta+1$. (3) Let $G$ be a planar graph without intersecting chordal 5-cycles, that is, every vertex is incident with at most one chordal 5-cycle. If $\Delta\geq 7$, then $\chi''(G)=\Delta+1$.
A mapping $L$ is said to be an assignment for a graph $G$ if it assigns a list $L(x)$ of colors to each $x\in V(G)\cup E(G)$. If it is possible to color $G$ so that every vertex gets a color from its list and no two adjacent vertices receive the same color, then we say that $G$ is $L$-colorable. A graph $G$ is $k$-choosable if $G$ is $L$-colorable for any assignment $L$ for $G$ satisfying $|L(x)|\geq k$ for every $x\in V(G)\cup E(G)$. We prove that if every 5-cycle of $G$ is not simultaneously adjacent to 3-cycles and 4-cycles, then $G$ is 4-choosable. In chapter 3, we prove that if every 5-cycle of $G$ is not adjacent to 4-cycles, then $\chi'_l(G)=\Delta$ and $\chi''_l(G)=\Delta+1$ if $\Delta(G)\geq 8$, and $\chi'_l(G)\leq\Delta+1$ and $\chi''_l(G)\leq\Delta+2$ if $\Delta(G)\geq 6$.
In chapter 4, we give the definition of neighbor sum distinguishing total coloring. Let $f(v)$ denote the sum of the color of a vertex $v$ and the colors of all edges incident with $v$. A total $k$-neighbor sum distinguishing coloring of $G$ is a total $k$-coloring of $G$ such that $f(u)\neq f(v)$ for each edge $uv\in E(G)$. The smallest such $k$ is called the neighbor sum distinguishing total chromatic number, denoted by $\chi''_{\sum}(G)$. Pilsniak and Wozniak conjectured that $\chi''_{\sum}(G)\leq\Delta(G)+3$ holds for any graph $G$ with maximum degree $\Delta(G)$. We prove that if a graph $G$ with maximum degree $\Delta(G)$ can be embedded in a surface $\Sigma$ of Euler characteristic $\chi(\Sigma)\geq 0$, then $\chi''_{\sum}(G)\leq\max\{\Delta(G)+2, 16\}$.
Lastly, we study the linear $L$-choosable arboricity of graphs. A linear forest is a graph in which each component is a path. The linear arboricity $la(G)$ of a graph $G$ is the minimum number of linear forests in $G$ whose union is the set of all edges of $G$. A list assignment $L$ to the edges of $G$ is the assignment of a set $L(e)\subseteq N$ of colors to every edge $e$ of $G$, where $N$ is the set of positive integers. If $G$ has a coloring $\varphi(e)$ such that $\varphi(e)\in L(e)$ for every edge $e$ and $(V(G),\varphi^{-1}(i))$ is a linear forest for any $i\in C_{\varphi}$, where $C_{\varphi}=\{\varphi(e)\mid e\in E(G)\}$, then we say that $G$ is linear $L$-colorable and $\varphi$ is a linear $L$-coloring of $G$. We say that $G$ is linear $k$-choosable if it is linear $L$-colorable for every list assignment $L$ satisfying $|L(e)|\geq k$ for all edges $e$. The list linear arboricity $la_{list}(G)$ of a graph $G$ is the minimum number $k$ for which $G$ is linear $k$-list colorable. It is obvious that $la(G)\leq la_{list}(G)$. In chapter 5, we prove that if $G$ is a planar graph such that every 7-cycle of $G$ contains at most two chords, then $G$ is linear $\left\lceil\frac{\Delta+1}{2}\right\rceil$-choosable if $\Delta(G)\geq 6$, and $G$ is linear $\left\lceil\frac{\Delta}{2}\right\rceil$-choosable if $\Delta(G)\geq 11$.
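The definitions of a total coloring and of the neighbor sums f(v) can be checked mechanically on small graphs. The sketch below is an illustration only: the function names and the dictionary-based graph encoding are assumptions made here, not notation from the thesis, and the code verifies a given coloring rather than searching for one.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def is_total_coloring(
    edges: List[Edge],
    vertex_color: Dict[Hashable, int],
    edge_color: Dict[Edge, int],
) -> bool:
    """Check that no two adjacent or incident elements share a color."""
    for u, v in edges:
        c = edge_color[(u, v)]
        if vertex_color[u] == vertex_color[v]:  # adjacent vertices
            return False
        if c == vertex_color[u] or c == vertex_color[v]:  # edge vs. endpoint
            return False
    for e1, e2 in combinations(edges, 2):  # edges sharing an endpoint
        if set(e1) & set(e2) and edge_color[e1] == edge_color[e2]:
            return False
    return True


def neighbor_sums(
    edges: List[Edge],
    vertex_color: Dict[Hashable, int],
    edge_color: Dict[Edge, int],
) -> Dict[Hashable, int]:
    """f(v): color of v plus the colors of all edges incident with v."""
    f = dict(vertex_color)
    for u, v in edges:
        f[u] += edge_color[(u, v)]
        f[v] += edge_color[(u, v)]
    return f


def is_neighbor_sum_distinguishing(edges, vertex_color, edge_color) -> bool:
    """Total coloring in which f(u) != f(v) for every edge uv."""
    if not is_total_coloring(edges, vertex_color, edge_color):
        return False
    f = neighbor_sums(edges, vertex_color, edge_color)
    return all(f[u] != f[v] for u, v in edges)


if __name__ == "__main__":
    # Triangle with a proper total 3-coloring whose neighbor sums all equal 6.
    tri_edges = [("a", "b"), ("b", "c"), ("a", "c")]
    tri_vc = {"a": 1, "b": 2, "c": 3}
    tri_ec = {("a", "b"): 3, ("b", "c"): 1, ("a", "c"): 2}
    print(is_total_coloring(tri_edges, tri_vc, tri_ec))
    print(is_neighbor_sum_distinguishing(tri_edges, tri_vc, tri_ec))
```

The triangle example shows why the neighbor sum distinguishing condition is strictly stronger: the coloring is a valid total 3-coloring, but every vertex has neighbor sum 6, so it is not neighbor sum distinguishing.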
207

Učení založené na instancích / Instance based learning

Martikán, Miroslav January 2009
This thesis focuses on instance-based learning algorithms. The main goal is to create an application for educational purposes. Instance-based learning (IBL) algorithms, nearest neighbor algorithms and kd-trees are described theoretically. The practical part covers the development of a tutorial application. The application can generate data, classify them with the nearest neighbor algorithm, and test the IB1, IB2 and IB3 algorithms.
208

Detekce dynamických síťových aplikací / Detection of Dynamic Network Applications

Burián, Pavel January 2013
This thesis deals with the detection of dynamic network applications. It describes some of the existing protocols and methods for identifying them from IP flows and packet contents. It presents the design of a detection system based on the automatic creation of regular expressions and describes its implementation. It presents the regular expressions created for the BitTorrent and eDonkey protocols and compares their quality with that of the L7-filter solution.
209

Image classification of pediatric pneumonia : A comparative study of supervised statistical learning techniques

Rönnefall, Jacob, Wendel, Jakob January 2022
A child dies of pneumonia every 39 seconds, and progress in preventing deaths caused by pneumonia has been considerably slower than for other infectious diseases. Meanwhile, the traditional method of manually diagnosing patients has reached its performance ceiling. With a machine learning classification algorithm that helps screen for pneumonia in x-ray images, combined with the expertise of a physician, the identification and diagnosis of pediatric pneumonia should be both quicker and more accurate. In this study, four different supervised machine learning algorithms were used to train models, which were then tested and evaluated to see which could predict most accurately whether the patient in an x-ray image has pneumonia: logistic regression, k-nearest-neighbor (KNN), support vector machine (SVM), and neural network (NN). The results show that KNN has the highest sensitivity, while NN adapts best to new data, being neither under- nor overfit. SVM had the highest balanced accuracy on both training and test data, but a proportionally large difference between the in-sample and out-of-sample error. In conclusion, relatively high performance can be achieved when classifying x-ray images of pneumonia even with limited resources.
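For readers unfamiliar with the two metrics named in the abstract, the sketch below computes sensitivity and balanced accuracy for a binary classifier using their standard definitions. The function name and the toy labels are assumptions for illustration; they are not results from the study.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Sensitivity, specificity and balanced accuracy for binary labels.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # share of true pneumonia cases detected
    specificity = tn / (tn + fp)  # share of healthy cases correctly cleared
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


if __name__ == "__main__":
    # Made-up labels: 1 = pneumonia, 0 = normal.
    print(binary_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))
```

Balanced accuracy averages the per-class recall, so it is less flattering than plain accuracy when one class, such as pneumonia cases, dominates the data set.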
210

Comparing Julia and Python : An investigation of the performance on image processing with deep neural networks and classification

Axillus, Viktor January 2020
Python is the most popular language for prototyping and developing machine learning algorithms. Because Python is an interpreted language, it suffers a significant performance loss compared to compiled languages. Julia is a newly developed language that tries to bridge the gap between high-performance but cumbersome languages such as C++ and highly abstracted but typically slow languages such as Python. However, over the years, the Python community has developed many tools that address its performance problems. This raises the question of whether choosing one language over the other makes any significant performance difference. This thesis compares the performance, in terms of execution time, of the two languages in the machine learning domain, more specifically image processing with GPU-accelerated deep neural networks and classification with k-nearest neighbor on the MNIST and EMNIST datasets. Python with Keras and TensorFlow is compared against Julia with Flux for GPU-accelerated neural networks. For classification, Python with Scikit-learn is compared against Julia with NearestNeighbors.jl. The results indicate that Julia has a performance edge for GPU-accelerated deep neural networks, outperforming Python by roughly 1.25x to 1.5x. For classification with k-nearest neighbor the results were more varied, with Julia outperforming Python in 5 out of 8 measurements. However, there are some threats to validity, and additional research that includes all the frameworks available for the two languages is needed to provide a more conclusive and generalized answer.
