Global ETD Search

201	Utilização de métodos de machine learning para identificação de instrumentos musicais de sopro pelo timbre [Use of machine learning methods to identify wind instruments by timbre] Veras, Ricardo da Costa January 2018 Advisor: Prof. Dr. Ricardo Suyama / Dissertation (master's) - Universidade Federal do ABC, Programa de Pós-Graduação em Engenharia da Informação, Santo André, 2018. / Pattern classification for signal processing has been studied and used to interpret many kinds of information, including images, audio, geophysical data and electrical impulses. In this work we study machine learning techniques applied to the problem of identifying musical instruments, aiming at an automatic timbre recognition system. The techniques were applied specifically to five woodwind instruments (clarinet, bassoon, flute, oboe and saxophone). The classifiers used were kNN (with k = 3) and SVM (in a non-linear configuration), and several audio features were studied, such as MFCC (Mel-Frequency Cepstral Coefficients), ZCR (Zero Crossing Rate) and entropy, which served as the data source for training and testing. We chose instruments whose timbres are close to one another in order to examine how a classifier behaves under these specific conditions. We also observed how the techniques behave with audio not seen during training, with excerpts containing a mixture of elements (which interfere with each classifier model and could skew the results), and with mixtures of instruments from the observed classes sounding together in the same audio. 
The results indicate that the selected features carry relevant information about the timbre of each instrument evaluated (as observed in the results for solos), although the accuracy obtained for some instruments was lower than expected (as observed in the results for duets). 
SINAIS TIMBRE CLASSIFICAÇÃO ÁUDIO k-VIZINHOS MAIS PRÓXIMOS MÁQUINA DE VETORES DE SUPORTE SIGNALS CLASSIFICATION k-NEAREST NEIGHBOR SUPPORT VECTOR MACHINES
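As a rough illustration of the kNN (k = 3) classifier mentioned in entry 201, the sketch below labels a feature vector by majority vote among its three nearest training vectors. The two-dimensional feature values and the name `knn_predict` are made up for illustration; the thesis works with MFCC, ZCR, entropy and other audio features, and its actual implementation is not described in the abstract.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of (feature_vector, label) pairs; query: a feature vector.
    """
    # Sort training points by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest points.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors (not taken from the thesis).
train = [
    ((0.10, 0.20), "flute"),
    ((0.15, 0.25), "flute"),
    ((0.20, 0.10), "flute"),
    ((0.80, 0.90), "oboe"),
    ((0.85, 0.80), "oboe"),
    ((0.90, 0.95), "oboe"),
]
print(knn_predict(train, (0.12, 0.22)))
```

An odd k such as 3 avoids tied votes when only two classes are close to the query; with five instrument classes ties remain possible and would need an explicit tie-breaking rule.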
202	Clusters (k) Identification without Triangle Inequality : A newly modelled theory / Clustering(k) without Triangle Inequality : A newly modelled theory Narreddy, Naga Sambu Reddy, Durgun, Tuğrul January 2012 Cluster analysis organizes data that are sufficiently similar into meaningful and useful groups (clusters). For example, cluster analysis can be used to find groups of similar genes and proteins, to retrieve information from the World Wide Web, and to identify locations that are prone to earthquakes. The study of clustering has therefore become very important in several fields, including psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning and data mining [1] [2]. Cluster analysis is one of the most widely used techniques in data mining. Depending on the complexity and amount of data in a system, a variety of cluster analysis algorithms can be used. K-means clustering is one of the most popular and widely used among the ten algorithms in data mining [3]. Like other clustering algorithms, it is not a silver bullet: K-means clustering requires prior analysis and knowledge before the number of clusters and their centroids can be determined. Recent studies show a new approach to K-means clustering that does not require any prior knowledge to determine the number of clusters [4]. In this thesis, we propose a new clustering procedure to solve the central problem of identifying the number of clusters (k) by imitating the desired number of clusters with proper properties. The proposed algorithm is validated by investigating different characteristics of the analyzed data with the modified theory and by analyzing the efficiency of the parameters and their relationships. The parameters in this theory include the embryo size (m), the significance level (α), the distributions (d), and the training set (n) used in the identification of clusters (k). 
K-means clustering modifying K-means clustering nearest neighbor clustering general clustering procedure Kolmogorov-Smirnov test parameters descriptions Computer and Information Sciences Data- och informationsvetenskap
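The remark in entry 202 that standard K-means needs the number of clusters and starting centroids in advance can be seen in a minimal sketch of the textbook (Lloyd's) algorithm. This is not the procedure proposed in the thesis; the function name `kmeans`, the sample points and the starting centroids are illustrative assumptions.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; the number of clusters is len(centroids)."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical data: two well-separated groups, so k = 2 is supplied by hand.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
```

The caller has to choose both k and the initial centroids here, which is exactly the prior knowledge the abstract says the proposed procedure tries to avoid.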
203	Applying a Molecular Genetics Approach to Shark Conservation and Management: Assessment of DNA Barcoding in Hammerhead Sharks and Global Population Genetic Structuring in the Gray Reef Shark, Carcharhinus amblyrhynchos. Horn, Rebekah L. 01 February 2010 Chapter 1 DNA barcoding based on the mitochondrial cytochrome c oxidase subunit I (COI) gene sequence is emerging as a useful tool for identifying unknown, whole or partial organisms to species level. However, the application of only a single mitochondrial marker for robust species identification has also come under some criticism due to the possibility of erroneous identifications resulting from species hybridizations and/or the potential presence of nuclear-mitochondrial pseudogenes. The addition of a complementary nuclear DNA barcode has therefore been widely recommended to overcome these potential COI gene limitations, especially in wildlife law enforcement applications where greater confidence in the identifications is essential. In this study, we examined the comparative nucleotide sequence divergence and utility of the mitochondrial COI gene (N=182 animals) and nuclear ribosomal internal transcribed spacer 2 (ITS2) locus (N=190 animals) in the 8 known and 1 proposed cryptic species of globally widespread hammerhead sharks (family Sphyrnidae). Since hammerhead sharks are under intense fishing pressure for their valuable fins, with some species potentially set to receive CITES listing, tools for monitoring their fishery landings and tracking trade in their body parts are necessary to achieve effective management and conservation outcomes. Our results demonstrate that both COI and ITS2 loci function robustly as stand-alone barcodes for hammerhead shark species identification. Phylogenetic analyses of both loci, independently and together, accurately place each hammerhead species in reciprocally monophyletic groups with strong bootstrap support. 
The two barcodes differed notably in levels of intraspecific divergence, with average intraspecific K2P distance an order of magnitude lower in the ITS2 (0.297% for COI and 0.0967% for ITS2). The COI barcode also showed phylogeographic separation in Sphyrna zygaena, S. lewini and S. tiburo, potentially providing a useful option for assigning unknown specimens (e.g. market fins) to a broad geographic origin. We suggest that COI supplemented by ITS2 DNA barcoding can be used in an integrated and robust approach for species assignment of unknown hammerhead sharks and their body parts in fisheries and international trade. Chapter 2 The gray reef shark (Carcharhinus amblyrhynchos) is an Indo-Pacific, coral reef associated species that likely plays an important role as apex predator in maintaining the integrity of coral reef ecosystems. Populations of this shark have declined substantially in some parts of its range due to over-fishing, with recent estimates suggesting a 17% decline per year on the Great Barrier Reef (GBR). Currently, there is no information on the population structure or genetic status of gray reef sharks to aid in their management and conservation. We assessed the genetic population structure and genetic diversity of this species by using complete mitochondrial control region sequences and 15 nuclear microsatellite markers. Gray reef shark samples (n=305) were obtained from 10 locations across the species’ known longitudinal Indo-Pacific range: western Indian Ocean (Madagascar), eastern Indian Ocean (Cocos [Keeling] Islands, Andaman Sea, Indonesia, and western Australia), central Pacific (Hawaii, Palmyra Atoll, and Fanning Atoll), and southwestern Pacific (eastern Australia – Great Barrier Reef). 
The mitochondrial and nuclear marker data were concordant in most cases, with population-based analysis showing significant overall structure (FST = 0.27906 (pST = 0.071 ± 0.02)) and significant pairwise genetic differentiation between nearly all of the putative populations sampled (i.e., 9 of the 10 for mitochondrial and 8 of the 10 for nuclear markers). Individual-based analysis of microsatellite genotypes identified at least 5 populations. The concordant mitochondrial and nuclear marker results are consistent with a scenario of very low to no appreciable connectivity (gene flow) among most of the sampled locations, suggesting that natural repopulation of overfished regions by sharks from distant reefs is unlikely. The results also indicate that conservation of genetic diversity in gray reef sharks will require management measures on relatively local scales. Our findings of extensive genetic structuring suggest that a high level of genetic isolation is also likely to be the case in unsampled populations of this species. 
barcoding hammerhead shark ITS2 neighbor-joining fin trade Gray reef shark control region microsatellites population connectivity Marine Biology
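Entry 203 reports intraspecific divergence as K2P (Kimura 2-parameter) distances. As a minimal sketch of how such a distance is computed for one pair of aligned sequences, the function below uses the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function name and the toy sequences are illustrative assumptions, and the code assumes gap-free sequences containing only A, C, G and T.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned, gap-free DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Toy 10-bp sequences differing by a single A->G transition (made-up data).
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

An average intraspecific distance like the ones quoted in the abstract would be the mean of this value over all pairs of sequences within a species.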
204	Algoritmo kNN para previsão de dados temporais: funções de previsão e critérios de seleção de vizinhos próximos aplicados a variáveis ambientais em limnologia / Time series prediction using a KNN-based algorithm prediction functions and nearest neighbor selection criteria applied to limnological data Carlos Andres Ferrero 04 March 2009 The analysis of data containing sequential information is a problem of growing interest because of the large amount of information generated by, among other sources, monitoring processes. Time series are one of the most common types of sequential data and consist of observations recorded over time. The k-Nearest Neighbor - Time Series Prediction (kNN-TSP) algorithm is a method for time series prediction. Its main advantages are its simplicity and its applicability to nonlinear time series analysis and to the prediction of seasonal behavior. However, although kNN-TSP often finds the best predictions for nearly periodic time series, several questions about how to determine its parameters remain open. This work focuses on two of these parameters: the selection of the nearest neighbors and the prediction function. To this end, we propose a simple approach to selecting nearest neighbors that takes both similarity and temporal distance into account, so that the most similar and most recent patterns are selected. We also propose a prediction function that maintains good performance in the presence of patterns at different levels of the time series. 
Both parameters were evaluated empirically on several artificial time series, including chaotic ones, as well as on real time series of environmental variables from the Itaipu reservoir, made available by Itaipu Binacional. Three strongly correlated limnological variables were considered in the prediction experiments: water temperature, air temperature and dissolved oxygen. A correlation analysis was carried out to verify whether the predicted values preserve the correlation among the original variables. The results show that both proposals, the nearest neighbor selection criterion and the prediction function, are promising. 
Aprendizado de máquina Dados ambientais Funções de previsão Limnologia Previsão de dados temporais Seleção de vizinhos próximos Environmental data Limnology Machine learning Nearest neighbor selection Prediction functions Time series prediction
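The general idea behind kNN-based time series prediction in entry 204 can be sketched as follows: take the most recent window of the series as the query, find the k most similar earlier windows, and combine the values that followed them. The sketch uses Euclidean distance and a plain mean, which is only a textbook baseline; it does not implement the neighbor selection criterion or the prediction function proposed in the thesis, and the function name, `window` and `k` values are illustrative assumptions.

```python
import math

def knn_tsp_forecast(series, window=3, k=2):
    """Forecast the next value as the mean successor of the k most similar past windows."""
    query = series[-window:]
    candidates = []
    # Every earlier window that has a known next value is a candidate neighbor.
    for start in range(len(series) - window):
        past = series[start:start + window]
        successor = series[start + window]
        candidates.append((math.dist(past, query), successor))
    if not candidates:
        raise ValueError("series too short for the chosen window")
    # Keep the k candidates closest to the query window.
    nearest = sorted(candidates, key=lambda c: c[0])[:k]
    return sum(successor for _, successor in nearest) / len(nearest)

# Made-up periodic series: the pattern 1, 2, 3 repeats, so 3 should follow 1, 2.
series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
print(knn_tsp_forecast(series))
```

A plain mean of successors reproduces absolute levels, so it can struggle when similar shapes occur at different levels of the series, which is the situation the prediction function in the abstract is meant to handle.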
205	Quelques problèmes de coloration du graphe / Some coloring problems of graphs Xu, Renyu 27 May 2017 A $k$-total-coloring of a graph $G$ is a coloring of $V(G)\cup E(G)$ using the colors $1,2,\ldots,k$ such that no two adjacent or incident elements receive the same color. The total chromatic number $\chi''(G)$ is the smallest integer $k$ such that $G$ has a $k$-total-coloring. In chapter 2, we study total coloring of planar graphs and obtain three results: (1) Let $G$ be a planar graph with maximum degree $\Delta\geq 8$. If no two chordal 6-cycles are adjacent in $G$, then $\chi''(G)=\Delta+1$. (2) Let $G$ be a planar graph with maximum degree $\Delta\geq 8$. If any 7-cycle of $G$ contains at most two chords, then $\chi''(G)=\Delta+1$. (3) Let $G$ be a planar graph without intersecting chordal 5-cycles, that is, every vertex is incident with at most one chordal 5-cycle. If $\Delta\geq 7$, then $\chi''(G)=\Delta+1$.
A mapping $L$ is said to be an assignment for a graph $G$ if it assigns a list $L(x)$ of colors to each $x\in V(G)\cup E(G)$. If it is possible to color $G$ so that every vertex gets a color from its list and no two adjacent vertices receive the same color, then we say that $G$ is $L$-colorable. A graph $G$ is $k$-choosable if $G$ is $L$-colorable for any assignment $L$ for $G$ satisfying $|L(x)|\geq k$ for every $x\in V(G)\cup E(G)$. We prove that if every 5-cycle of $G$ is not simultaneously adjacent to 3-cycles and 4-cycles, then $G$ is 4-choosable. In chapter 3, we prove that if no 5-cycle of $G$ is adjacent to a 4-cycle, then $\chi'_l(G)=\Delta$ and $\chi''_l(G)=\Delta+1$ if $\Delta(G)\geq 8$, and $\chi'_l(G)\leq\Delta+1$ and $\chi''_l(G)\leq\Delta+2$ if $\Delta(G)\geq 6$.
In chapter 4, we give the definition of neighbor sum distinguishing total coloring and review the progress and conjectures concerning this type of coloring. Let $f(v)$ denote the sum of the color of a vertex $v$ and the colors of all edges incident with $v$. A total $k$-neighbor sum distinguishing coloring of $G$ is a total $k$-coloring of $G$ such that $f(u)\neq f(v)$ for each edge $uv\in E(G)$. The smallest such $k$ is called the neighbor sum distinguishing total chromatic number, denoted by $\chi''_{\sum}(G)$. Pilsniak and Wozniak conjectured that $\chi''_{\sum}(G)\leq\Delta(G)+3$ for any graph $G$ with maximum degree $\Delta(G)$. We prove that if a graph $G$ with maximum degree $\Delta(G)$ can be embedded in a surface $\Sigma$ of Euler characteristic $\chi(\Sigma)\geq 0$, then $\chi''_{\sum}(G)\leq\max\{\Delta(G)+2, 16\}$.
Lastly, we study the linear $L$-choosable arboricity of graphs. A linear forest is a graph in which each component is a path. The linear arboricity $la(G)$ of a graph $G$ is the minimum number of linear forests in $G$ whose union is the set of all edges of $G$. A list assignment $L$ to the edges of $G$ is the assignment of a set $L(e)\subseteq N$ of colors to every edge $e$ of $G$, where $N$ is the set of positive integers. If $G$ has a coloring $\varphi(e)$ such that $\varphi(e)\in L(e)$ for every edge $e$ and $(V(G),\varphi^{-1}(i))$ is a linear forest for any $i\in C_{\varphi}$, where $C_{\varphi}=\left\{\varphi(e)\mid e\in E(G)\right\}$, then we say that $G$ is linear $L$-colorable and $\varphi$ is a linear $L$-coloring of $G$. We say that $G$ is linear $k$-choosable if it is linear $L$-colorable for every list assignment $L$ satisfying $|L(e)|\geq k$ for all edges $e$. The list linear arboricity $la_{list}(G)$ of a graph $G$ is the minimum number $k$ for which $G$ is linear $k$-list colorable. It is obvious that $la(G)\leq la_{list}(G)$. In chapter 5, we prove that if $G$ is a planar graph such that every 7-cycle of $G$ contains at most two chords, then $G$ is linear $\left\lceil\frac{\Delta+1}{2}\right\rceil$-choosable if $\Delta(G)\geq 6$, and $G$ is linear $\left\lceil\frac{\Delta}{2}\right\rceil$-choosable if $\Delta(G)\geq 11$. 
Coloration totale Coloration par liste Arboricité linéaire L-Déterminable Total coloring List coloring Neighbor sum distinguish coloring
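The definitions in entry 205 can be made concrete with a small checker. The sketch below tests whether a given assignment of colors to vertices and edges is a proper total coloring, and whether it is also neighbor sum distinguishing, using the sum f(v) of the color of v and the colors of its incident edges. The function names and the example graphs are illustrative assumptions; the code only verifies a coloring and does not search for one.

```python
def is_proper_total_coloring(edges, vertex_color, edge_color):
    """Check adjacent vertices, incident edges and vertex/edge incidences for color clashes."""
    for u, v in edges:
        e = edge_color[(u, v)]
        # Adjacent vertices must differ, and an edge must differ from both endpoints.
        if vertex_color[u] == vertex_color[v] or e in (vertex_color[u], vertex_color[v]):
            return False
    for i, (a, b) in enumerate(edges):
        for c, d in edges[i + 1:]:
            # Two edges sharing an endpoint must receive different colors.
            if {a, b} & {c, d} and edge_color[(a, b)] == edge_color[(c, d)]:
                return False
    return True

def neighbor_sums(edges, vertex_color, edge_color):
    """f(v) = color of v plus the colors of all edges incident with v."""
    f = dict(vertex_color)
    for u, v in edges:
        f[u] += edge_color[(u, v)]
        f[v] += edge_color[(u, v)]
    return f

def is_neighbor_sum_distinguishing(edges, vertex_color, edge_color):
    """Proper total coloring in which f(u) != f(v) for every edge uv."""
    f = neighbor_sums(edges, vertex_color, edge_color)
    return is_proper_total_coloring(edges, vertex_color, edge_color) and all(
        f[u] != f[v] for u, v in edges
    )

# Triangle with 3 colors: each edge takes the color of the opposite vertex.
tri_edges = [("a", "b"), ("b", "c"), ("a", "c")]
tri_vertices = {"a": 1, "b": 2, "c": 3}
tri_edge_colors = {("a", "b"): 3, ("b", "c"): 1, ("a", "c"): 2}
print(is_proper_total_coloring(tri_edges, tri_vertices, tri_edge_colors))
print(is_neighbor_sum_distinguishing(tri_edges, tri_vertices, tri_edge_colors))
```

In the triangle example the coloring is a proper total coloring with Δ + 1 = 3 colors, yet every vertex has the same sum 6, which illustrates why neighbor sum distinguishing colorings can need more colors than ordinary total colorings.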
206	Učení založené na instancích / Instance based learning Martikán, Miroslav January 2009 This thesis focuses on instance-based learning algorithms. Its main goal is to create an application for educational purposes. Instance-based learning (IBL) algorithms, nearest neighbor algorithms and kd-trees are described theoretically. The practical part covers the development of a tutorial application. The application can generate data, classify them with the nearest neighbor algorithm, and test the IB1, IB2 and IB3 algorithms.
207	Detekce dynamických síťových aplikací / Detection of Dynamic Network Applications Burián, Pavel January 2013 This thesis deals with the detection of dynamic network applications. It describes some existing protocols and methods for identifying them from IP flows and packet contents. It proposes a design for a detection system based on the automatic creation of regular expressions and describes its implementation. It presents the regular expressions created for the BitTorrent and eDonkey protocols and compares their quality with the L7-filter solution.
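To give a flavor of the payload-based detection described in entry 207, the sketch below matches the start of a packet payload against a regular expression for the BitTorrent peer handshake, which begins with the byte 0x13 followed by the string "BitTorrent protocol". The pattern is written by hand for illustration; it is neither one of the automatically generated expressions from the thesis nor the L7-filter pattern, and the function name is an assumption.

```python
import re

# The BitTorrent peer handshake starts with the length byte 0x13 (19)
# followed by the 19-character string "BitTorrent protocol".
BITTORRENT_HANDSHAKE = re.compile(rb"^\x13BitTorrent protocol")

def looks_like_bittorrent(payload: bytes) -> bool:
    """Return True if the payload starts with the handshake prefix."""
    return BITTORRENT_HANDSHAKE.match(payload) is not None

# Synthetic handshake: prefix, 8 reserved bytes, 20-byte info hash, 20-byte peer id.
handshake = b"\x13BitTorrent protocol" + bytes(8) + b"A" * 20 + b"B" * 20
print(looks_like_bittorrent(handshake))
print(looks_like_bittorrent(b"GET / HTTP/1.1\r\n"))
```

A fixed prefix like this only catches unencrypted handshakes, which is one reason a system that derives expressions automatically from observed traffic can be attractive.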
208	Image classification of pediatric pneumonia : A comparative study of supervised statistical learning techniques Rönnefall, Jacob, Wendel, Jakob January 2022 A child dies of pneumonia every 39 seconds, and progress in preventing deaths caused by pneumonia has been considerably slower than for other infectious diseases. Meanwhile, the traditional method of manually diagnosing patients has reached its performance ceiling. With a machine learning classification algorithm supporting the screening of pneumonia from x-ray images, combined with the expertise of a physician, the identification and diagnosis of pediatric pneumonia should be both quicker and more accurate. In this study, four supervised machine learning algorithms were trained, tested, and evaluated to see which model most accurately predicts whether a patient in an x-ray image has pneumonia: logistic regression, k-nearest-neighbor (KNN), support vector machine (SVM), and neural network (NN). The results show that KNN has the highest sensitivity, while NN adapts best to new data, being neither under- nor overfit. SVM had the highest balanced accuracy on both training and test data, but a proportionally large difference between the in-sample and out-of-sample error. In conclusion, relatively high performance can be achieved when classifying x-ray images of pneumonia even with limited resources. 
Machine learning Algorithm Logistic regression K-nearest-neighbor Support vector machine Neural network Sensitivity Specificity ROC Accuracy Probability Theory and Statistics Sannolikhetsteori och statistik
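The metrics used to compare the models in entry 208 follow directly from the confusion matrix. As a minimal sketch, the function below computes sensitivity, specificity and balanced accuracy for a binary classifier; the function name and the label vectors are made up for illustration and are not data from the thesis.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and balanced accuracy for binary labels.

    Assumes both classes occur in y_true; otherwise a division by zero is raised.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # sick patient correctly flagged
            else:
                fn += 1  # sick patient missed
        else:
            if pred == positive:
                fp += 1  # healthy patient wrongly flagged
            else:
                tn += 1  # healthy patient correctly cleared
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Hypothetical labels: 1 = pneumonia, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Balanced accuracy averages the two per-class rates, so it is less flattering than plain accuracy when one class dominates the data set.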
209	Comparing Julia and Python : An investigation of the performance on image processing with deep neural networks and classification Axillus, Viktor January 2020 Python is the most popular language for prototyping and developing machine learning algorithms. Python is an interpreted language, which causes a significant performance loss compared to compiled languages. Julia is a newly developed language that tries to bridge the gap between high-performance but cumbersome languages such as C++ and highly abstracted but typically slow languages such as Python. Over the years, however, the Python community has developed many tools that address its performance problems. This raises the question of whether choosing one language over the other makes any significant difference in performance. This thesis compares the performance, in terms of execution time, of the two languages in the machine learning domain, more specifically image processing with GPU-accelerated deep neural networks and classification with k-nearest neighbor on the MNIST and EMNIST datasets. Python with Keras and TensorFlow is compared against Julia with Flux for GPU-accelerated neural networks. For classification, Python with Scikit-learn is compared against Julia with NearestNeighbors.jl. The results indicate that Julia has a performance edge for GPU-accelerated deep neural networks, outperforming Python by roughly 1.25x to 1.5x. For classification with k-nearest neighbor the results were more varied, with Julia outperforming Python in 5 out of 8 measurements. However, there are some threats to validity, and additional research covering all the frameworks available for the two languages is needed to provide a more conclusive and general answer. 
julia python performance comparison machine learning image processing GPU GPU-acceleration neural networks autoencoder classification knn k-nearest neighbor Software Engineering Programvaruteknik
210	Techniky pro zarovnávání skupin biologických sekvencí / Techniques for Multiple Sequence Alignments Hrazdil, Jiří January 2009 This thesis summarizes ways of representing biological sequences and the file formats used for sequence exchange and storage. The next part deals with techniques for pairwise sequence alignment, followed by the extension of these techniques to the problem of multiple sequence alignment. Additional methods are introduced that are suboptimal but can compute results in reasonable time. The practical part of the thesis consists of implementing a multiple sequence alignment application in the Java programming language.
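Pairwise alignment, the starting point mentioned in entry 210, is classically solved by dynamic programming. The sketch below computes the optimal global alignment score with the Needleman-Wunsch recurrence; the abstract does not say which pairwise method the thesis uses, so this is a representative technique rather than the author's implementation, and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two symbols, gap in b, or gap in a.
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

print(needleman_wunsch_score("GATTACA", "GATACA"))
```

The table has (len(a) + 1) x (len(b) + 1) cells, and extending the same recurrence to many sequences at once grows exponentially with their number, which is why the faster suboptimal methods mentioned in the abstract are used for multiple alignment.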
