Global ETD Search

131	Evaluation of Supervised Machine Learning Algorithms for Detecting Anomalies in Vehicle’s Off-Board Sensor Data Wahab, Nor-Ul January 2018 A diesel particulate filter (DPF) is designed to physically remove diesel particulate matter, or soot, from the exhaust gas of a diesel engine. Replacing the DPF too often wastes resources, while waiting until it is fully used is risky and very costly, so what is the optimal time or mileage at which to change the DPF? Answering this question is very difficult without knowing when the DPF was changed in a vehicle. We seek the answer with supervised machine learning algorithms for detecting anomalies in vehicles' off-board sensor data (operational data of vehicles). A filter change is treated as an anomaly because it is rare compared to normal data. Non-sequential machine learning algorithms for anomaly detection, namely one-class support vector machine (OC-SVM), k-nearest neighbor (K-NN), and random forest (RF), are applied for the first time to a DPF dataset. The dataset is unbalanced, and accuracy is found to be a misleading performance measure for the algorithms. Precision, recall, and F1-score are found to be good performance measures for machine learning algorithms when the data is unbalanced. RF gave the highest F1-score (0.55), compared with K-NN (0.52) and OC-SVM (0.51). This means that RF performs better than K-NN and OC-SVM, but after further investigation it is concluded that the results are not satisfactory. A sequential approach, which was not tried, could yield better results. Anomaly detection; rule-based; one class support vector machine; k-nearest neighbor; random forest; confusion matrix; accuracy; precision; recall; F1-score; Social Sciences Interdisciplinary
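As an illustration of why accuracy misleads on unbalanced data, here is a minimal Python sketch. It is not code from the thesis: the function name `binary_metrics` and the toy labels (95 normal records, 5 filter changes) are assumptions made only for this example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: 95 normal records (0) and 5 filter changes (1).
y_true = [0] * 95 + [1] * 5
# A useless detector that never flags an anomaly.
always_normal = [0] * 100

metrics = binary_metrics(y_true, always_normal)
print(metrics)  # accuracy is 0.95 although no anomaly is found (F1 is 0.0)
```

The detector that never reports a filter change still scores 95% accuracy, while its F1-score is zero, which is the kind of gap the abstract describes.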
132	Alterações na legislação brasileira de manejo florestal e seus efeitos na distribuição espacial e polinização de espécies madeireiras amazônicas / Changes in Brazilian forest management legislation and their effects on spatial distribution and pollination of Amazonian timber species Vanessa Erler Sontag 29 August 2017 (has links) Conhecer o comportamento espacial e demográfico e a dinâmica genética das espécies madeireiras e manter uma distância entre as árvores que permita sua reprodução é essencial para o desenvolvimento de procedimentos de manejo que visem a conservação das espécies e garantia de estoques futuros de madeira. No entanto, quando uma área é explorada para fins madeireiros, as árvores remanescentes podem não ficar a uma distância viável a polinização. A legislação brasileira atual limita a exploração de espécies com baixa densidade de ocorrência e define alguns critérios para a escolha das árvores remanescentes, porém, eles levam em consideração apenas o número de indivíduos e não os fatores ecológicos e genéticos das espécies além de serem os mesmos aplicados a toda Amazônia. O objetivo deste trabalho foi analisar o comportamento espacial de três espécies madeireiras, a Manilkara huberi, a Hymenaea courbaril e o Handroanthus serratifolius, em quatro áreas de estudo na Amazônia brasileira a partir de inventários de empresas florestais e verificar a implicação das últimas mudanças ocorridas na legislação no processo de polinização dessas espécies. O trabalho foi dividido em duas partes. A primeira verificou se essas três espécies possuem o mesmo padrão espacial em diferentes regiões da Amazônia e discutiu a questão da raridade presente na legislação. Foi calculada a densidade e a matriz do vizinho mais próximo para todos os indivíduos antes do corte das três espécies em cada área de estudo e as distâncias plotadas em um gráfico quantil-quantil. 
Os resultados mostraram que a Manilkara huberi é uma espécie que pode ser encontrada em alta ou baixa densidade e em agregados ou não dependendo da região de ocorrência, diferente do Handroanthus serratifolius que apresenta uma densidade e padrão de distribuição semelhante independente da região de ocorrência. A Hymenaea courbaril permeia entre essas duas situações. Notou-se uma semelhança na distribuição das espécies entre as áreas próximas. A segunda parte analisou as consequências da alteração da legislação na distância entre as árvores remanescentes das três espécies e verificou se essa distância era viável para o processo de polinização. Foi simulado o corte a partir de cenários legislativos, em que apenas o diâmetro mínimo de corte (DMC) foi alterado. Os resultados mostraram que houve uma diminuição na distância entre árvores. A diminuição favoreceu o processo de polinização visto que os polinizadores precisam percorrer menores distâncias na busca por alimento. A legislação tem tomado um caminho mais conservativo, porém há muito o que ser desenvolvido, visto que cada espécie possui sua própria ecologia reprodutiva mas são manejadas da mesma forma. / The information about the spatial and demographic behavior and the genetic dynamic of timber species and maintaining a distance between trees that allows their reproduction is essential for the development of management procedures to conserve species and guarantee future wood stocks. However, when an area is harvested for timber purposes, the remaining trees may not stay at a feasible distance for pollination. Current Brazilian legislation limits the exploitation of low-density species and defines some criteria for choosing the remaining trees. However, they take into account only the number of individuals and not the ecology and genetic aspects of the species. Besides, the same criteria are applied to the entire Amazon. 
The aim of this study was to analyze the spatial behavior of three timber species, Manilkara huberi, Hymenaea courbaril and Handroanthus serratifolius, in four study areas in the Brazilian Amazon, based on forest company inventories, and to verify the implications of the latest changes in Brazilian legislation for the pollination process of these species. The study was divided into two parts. The first verified whether these three species have the same spatial pattern in different regions of the Amazon and discussed the rarity issue in the legislation. The density and the nearest neighbor distance matrix were calculated for all individuals of the three species before cutting in each study area, and the distances were plotted on a quantile-quantile plot. The results showed that Manilkara huberi can be found at high or low density, and aggregated or not, depending on the region of occurrence. In contrast, Handroanthus serratifolius populations present similar densities and distribution patterns regardless of the region of occurrence. Hymenaea courbaril falls between these two situations. A similarity in the distribution of the species was observed between nearby areas. The second part of this work analyzed the consequences of changes in Brazilian forest management legislation for the distance between the remaining trees of the three species and verified whether this distance was feasible for the pollination process. Cutting was simulated under two legislative scenarios, in which only the minimum cutting diameter (MCD) was changed. The results showed a decrease in the distance between trees due to the increase in the density of remaining individuals. The decrease in distance favored the pollination process, since pollinators need to travel shorter distances in search of food. 
Brazilian forest legislation has taken a more conservative path, but there is still much to be developed, since each species has its own reproductive ecology, yet all are managed in the same way. Amazônia; Distância do vizinho mais próximo; Exploração de impacto reduzido; Manejo florestal sustentável; Amazon; Nearest neighbor distance; Reduced impact logging; Sustainable forest management
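The nearest neighbor distance comparison described in this abstract can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the coordinates, diameters and function names are hypothetical, and the cut rule keeps every tree below the minimum cutting diameter while ignoring the other legal criteria for choosing remaining trees.

```python
import math


def nearest_neighbor_distances(points):
    """For each (x, y) point, return the distance to its closest other point.

    Brute force, O(n^2); needs at least two points.
    """
    distances = []
    for i, (xi, yi) in enumerate(points):
        best = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(best)
    return distances


def remaining_after_cut(trees, min_cut_diameter):
    """Keep the trees whose diameter is below the minimum cutting diameter."""
    return [(x, y) for x, y, diameter in trees if diameter < min_cut_diameter]


# Hypothetical inventory: (x in m, y in m, diameter in cm).
trees = [(0, 0, 40), (30, 40, 70), (60, 80, 45), (200, 0, 90), (60, 0, 55)]

for mcd in (50, 60):  # two made-up scenarios differing only in the MCD
    left = remaining_after_cut(trees, mcd)
    print(mcd, nearest_neighbor_distances(left))
```

With these made-up values, the scenario with the larger diameter limit leaves more trees standing, so each remaining tree has a closer neighbor.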
133	Operação de busca exata aos K-vizinhos mais próximos reversos em espaços métricos / Answering exact reverse k-nerarest neighbors queries in metric space Willian Dener de Oliveira 19 March 2010 (has links) A complexidade dos dados armazenados em grandes bases de dados aumenta cada vez mais, criando a necessidade de novas operações de consulta. Uma classe de operações que tem apresentado interesse crescente são as chamadas Consultas por Similaridade, sendo as mais conhecidas as consultas por Abrangência (\'R IND. q\') e por k-Vizinhos mais Proximos (kNN), sendo que esta ultima obtem quais são os k elementos armazenados mais similares a um dado elemento de referência. Outra consulta que é interessante tanto para consultas diretas quanto como parte de operações de análises mais complexas e a operação de consulta aos k-Vizinhos mais Próximos Reversos (RkNN). Seu objetivo e obter todos os elementos armazenados que têm um dado elemento de referência como um dos seus k elementos mais similares. Devido a complexidade de execução da operação de RkNN, a grande maioria das soluções existentes restringem-se a dados representados em espaços multidimensionais euclidianos (nos quais estão denidas tambem operações cardinais e topológicas, além de se considerar a similaridade como sendo a distância Euclidiana entre dois elementos), ou então obtém apenas respostas aproximadas, sujeitas a existência de falsos negativos. Várias aplicações de análise de dados científicos, médicos, de engenharia, financeiros, etc. requerem soluções eficientes para o problema da operação de RkNN sobre dados representados em espaços métricos, onde os elementos não podem ser considerados estar em um espaço nem Euclidiano nem multidimensional. Num espaço métrico, além dos próprios elementos armazenados existe apenas uma função de comparação métrica entre pares de objetos. 
Neste trabalho, são propostas novas podas de espaço de busca e o algoritmo RkNN-MG que utiliza essas novas podas para solucionar o problema de consultas RkNN exatas em espaços métricos sem limitações. Toda a proposta supõe que o conjunto de dados esta em um espaço métrico imerso isometricamente em espaço euclidiano e utiliza propriedades da geometria métrica válida neste espaço para realizar podas eficientes por lei dos cossenos combinada com as podas tradicionais por desigualdade triangular. Os experimentos demonstram comparativamente que as novas podas são mais eficientes que as tradicionais podas por desigualdade triangular, tendo desempenhos equivalente quando comparadas em conjuntos de alta dimensionalidade ou com dimensão fractal alta. Assim, os resultados confirmam as novas podas propostas como soluções alternativas eficientes para o problema de consultas RkNN / Data stored in large databases present an ever increasing complexity, pressing for the development of new classes of query operators. One such class, which is attracting increasing interest, is the so-called similarity queries, of which the most common are the similarity range queries (R_q) and the k-nearest neighbor queries (kNN). A k-nearest neighbor query aims at retrieving the k stored elements nearest (or most similar) to a given reference element. Another important similarity query is the reverse k-nearest neighbor (RkNN) query, useful both for queries posed directly by the analyst and for queries that are part of more complex analysis processes. The objective of a reverse k-nearest neighbor query is to obtain the stored elements that have the query reference element as one of their k nearest neighbors. 
As the RkNN operation is computationally expensive, most existing solutions only solve the query over Euclidean multidimensional spaces (as these spaces also define cardinal and topological operations besides the Euclidean distance between pairs of elements) or retrieve only approximate answers, where false negatives can occur. Several applications, such as the analysis of scientific, medical, engineering or financial data, require efficient and exact answers to RkNN queries over data that is frequently represented in metric spaces, that is, spaces where no property other than the similarity measure exists. Therefore, for applications handling metric data, the assumption of a Euclidean metric or even of multidimensional data cannot be used. In this work, we propose new pruning rules based on the law of cosines, and the RkNN-MG algorithm, which uses them to solve RkNN queries in a way that is exact, faster than the existing approaches, not limited to any value of k, and applicable to both static and dynamic datasets. The new pruning rules assume that the data set is in a metric space that can be embedded into a Euclidean space, and use metric geometry properties valid in this space to perform effective pruning based on the law of cosines combined with the traditional pruning based on the triangle inequality property. The experiments show that the new pruning rules are always more efficient than the traditional pruning rules based solely on the triangle inequality. They also show that for high-dimensionality datasets, or for metric datasets with high fractal dimensionality, the performance improvement is smaller than for lower-dimensionality datasets, but it is never worse. 
Thus, the results confirm that our pruning rules are an efficient alternative for solving RkNN queries in general. Consulta por similaridade; Espaço númerico; Indexação; RkNN; Vizinhos mais próximos reversos; Access method; Metric space; Reverse k-nearest neighbor; RkNN; Similarity query
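To make the RkNN definition concrete, the following Python sketch answers an exact reverse k-nearest neighbor query by brute force, using nothing but a distance function. It is not the RkNN-MG algorithm and applies no pruning; the function name, the tie-breaking rule and the toy data are assumptions made for this example.

```python
def reverse_knn(data, query, k, dist):
    """Exact RkNN by brute force: O(n^2) distance computations.

    Returns the stored elements that have `query` among their k nearest
    neighbors. Only the metric `dist` is used, so no coordinates are needed.
    Ties are resolved in favor of the query (an assumption of this sketch).
    """
    result = []
    for i, obj in enumerate(data):
        d_query = dist(obj, query)
        # Count the stored elements strictly closer to obj than the query is.
        closer = sum(
            1
            for j, other in enumerate(data)
            if j != i and dist(obj, other) < d_query
        )
        if closer < k:
            result.append(obj)
    return result


# Hypothetical one-dimensional data with the absolute difference as metric.
points = [0, 1, 5, 6, 20]
metric = lambda a, b: abs(a - b)

print(reverse_knn(points, 4, 1, metric))  # [5]
print(reverse_knn(points, 4, 2, metric))  # [0, 1, 5, 6]
```

Pruning rules such as those based on the triangle inequality aim to avoid most of the distance computations this baseline performs.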
134	Utilização de métodos de machine learning para identificação de instrumentos musicais de sopro pelo timbre Veras, Ricardo da Costa January 2018 (has links) Orientador: Prof. Dr. Ricardo Suyama / Dissertação (mestrado) - Universidade Federal do ABC, Programa de Pós-Graduação em Engenharia da Informação, Santo André, 2018. / De forma geral a Classificação de Padrões voltada a Processamento de Sinais vem sendo estudada e utilizada para a interpretação de informações diversas, que se manifestam em forma de imagens, áudios, dados geofísicos, impulsos elétricos, entre outros. Neste trabalho são estudadas técnicas de Machine Learning aplicadas ao problema de identificação de instrumentos musicais, buscando obter um sistema automático de reconhecimento de timbres. Essas técnicas foram utilizadas especificamente com cinco instrumentos da categoria de Sopro de Madeira (o Clarinete, o Fagote, a Flauta, o Oboé e o Sax). As técnicas utilizadas foram o kNN (com k = 3) e o SVM (numa configuração não linear), assim como foram estudadas algumas características (features) dos áudios, tais como o MFCC (do inglês Mel-Frequency Cepstral Coefficients), o ZCR (do inglês Zero Crossing Rate), a entropia, entre outros, sendo fonte de dados para os processos de treinamento e de teste. Procurou-se estudar instrumentos nos quais se observa uma aproximação nos timbres, e com isso verificar como é o comportamento de um sistema classificador nessas condições específicas. Observou-se também o comportamento dessas técnicas com áudios desconhecidos do treinamento, assim como com trechos em que há uma mistura de elementos (gerando interferências para cada modelo classificador) que poderiam desviar os resultados, ou com misturas de elementos que fazem parte das classes observadas, e que se somam num mesmo áudio. 
Os resultados indicam que as características selecionadas possuem informações relevantes a respeito do timbre de cada um dos instrumentos avaliados (como observou-se em relação aos solos), embora a acurácia obtida para alguns dos instrumentos tenha sido abaixo do esperado (como observou-se em relação aos duetos). / In general, Pattern Classification for Signal Processing has been studied and used to interpret diverse information that appears in many forms, such as images, audio, geophysical data and electrical impulses, among others. In this project we study Machine Learning techniques applied to the problem of identifying musical instruments, aiming to obtain an automatic timbre recognition system. These techniques were used specifically with five instruments of the Woodwind category (Clarinet, Bassoon, Flute, Oboe and Sax). The techniques used were kNN (with k = 3) and SVM (in a non-linear configuration), and several audio features were studied, such as MFCC (Mel-Frequency Cepstral Coefficients), ZCR (Zero Crossing Rate) and entropy, among others, as the data source for the training and testing processes. We chose to study instruments whose timbres are close to one another, in order to verify how a classifier system behaves under these specific conditions. We also observed the behavior of these techniques with audio not seen during training, as well as with excerpts containing a mixture of elements (generating interference for each classifier model) that could skew the results, or with mixtures of elements from the observed classes added together in the same audio. The results indicate that the selected features carry relevant information about the timbre of each of the evaluated instruments (as observed in the solo results), although the accuracy obtained for some of the instruments was lower than expected (as observed in the duet results). 
SINAIS; TIMBRE; CLASSIFICAÇÃO; ÁUDIO; k-VIZINHOS MAIS PRÓXIMOS; MÁQUINA DE VETORES DE SUPORTE; SIGNALS; CLASSIFICATION; k-NEAREST NEIGHBOR; SUPPORT VECTOR MACHINES
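Two ingredients named in this abstract, the zero crossing rate and kNN with k = 3, can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline of the dissertation: the feature vectors and labels below are invented, and a real system would compute features such as MFCC from audio frames.

```python
import math
from collections import Counter


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def knn_predict(train, query, k=3):
    """Majority vote among the k training items closest to `query`.

    `train` is a list of (feature_vector, label) pairs; Euclidean distance.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors, e.g. (ZCR, entropy).
train = [
    ((0.10, 0.20), "flute"), ((0.12, 0.25), "flute"), ((0.11, 0.22), "flute"),
    ((0.40, 0.80), "oboe"), ((0.42, 0.85), "oboe"), ((0.41, 0.90), "oboe"),
]

print(zero_crossing_rate([1, -1, 1, -1]))  # 1.0
print(knn_predict(train, (0.39, 0.82)))    # oboe
```

When two instruments have close timbres, their feature vectors overlap and the vote among the three neighbors becomes less reliable, which is the situation the study sets out to examine.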
135	Clusters (k) Identification without Triangle Inequality : A newly modelled theory / Clustering(k) without Triangle Inequality : A newly modelled theory Narreddy, Naga Sambu Reddy, Durgun, Tuğrul January 2012 Cluster analysis divides data into meaningful, useful groups (clusters) of sufficiently similar items. For example, cluster analysis can be applied to find groups of genes and proteins that are similar, to retrieve information from the World Wide Web, and to identify locations that are prone to earthquakes. The study of clustering has therefore become very important in several fields, including psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning and data mining [1] [2]. Cluster analysis is one of the most widely used techniques in the area of data mining. Depending on the complexity and amount of data in a system, a variety of cluster analysis algorithms can be used. K-means clustering is one of the most popular and widely used among the ten algorithms in data mining [3]. Like other clustering algorithms, it is not a silver bullet. K-means clustering requires prior analysis and knowledge to determine the number of clusters and their centroids. Recent studies show a new approach to K-means clustering which does not require any prior knowledge for determining the number of clusters [4]. In this thesis, we propose a new clustering procedure to solve the central problem of identifying the number of clusters (k) by imitating the desired number of clusters with proper properties. The proposed algorithm is validated by investigating different characteristics of the analyzed data with the modified theory and by analyzing the efficiency of the parameters and their relationships. The parameters in this theory include the selection of embryo-size (m), significance level (α), distributions (d), and training set (n), in the identification of clusters (k). 
K-means clustering modifying K-means clustering nearest neighbor clustering general clustering procedure Kolmogorov Simonov-test parameters descriptions Computer and Information Sciences Data- och informationsvetenskap
136	Algoritmo kNN para previsão de dados temporais: funções de previsão e critérios de seleção de vizinhos próximos aplicados a variáveis ambientais em limnologia / Time series prediction using a KNN-based algorithm prediction functions and nearest neighbor selection criteria applied to limnological data Carlos Andres Ferrero 04 March 2009 (has links) A análise de dados contendo informações sequenciais é um problema de crescente interesse devido à grande quantidade de informação que é gerada, entre outros, em processos de monitoramento. As séries temporais são um dos tipos mais comuns de dados sequenciais e consistem em observações ao longo do tempo. O algoritmo k-Nearest Neighbor - Time Series Prediction kNN-TSP é um método de previsão de dados temporais. A principal vantagem do algoritmo é a sua simplicidade, e a sua aplicabilidade na análise de séries temporais não-lineares e na previsão de comportamentos sazonais. Entretanto, ainda que ele frequentemente encontre as melhores previsões para séries temporais parcialmente periódicas, várias questões relacionadas com a determinação de seus parâmetros continuam em aberto. Este trabalho, foca-se em dois desses parâmetros, relacionados com a seleção de vizinhos mais próximos e a função de previsão. Para isso, é proposta uma abordagem simples para selecionar vizinhos mais próximos que considera a similaridade e a distância temporal de modo a selecionar os padrões mais similares e mais recentes. Também é proposta uma função de previsão que tem a propriedade de manter bom desempenho na presença de padrões em níveis diferentes da série temporal. Esses parâmetros foram avaliados empiricamente utilizando várias séries temporais, inclusive caóticas, bem como séries temporais reais referentes a variáveis ambientais do reservatório de Itaipu, disponibilizadas pela Itaipu Binacional. 
Três variáveis limnológicas fortemente correlacionadas são consideradas nos experimentos de previsão: temperatura da água, temperatura do ar e oxigênio dissolvido. Uma análise de correlação é realizada para verificar se os dados previstos mantem a correlação das variáveis. Os resultados mostram que, o critério de seleção de vizinhos próximos e a função de previsão, propostos neste trabalho, são promissores / Treating data that contains sequential information is an important problem that arises during the data mining process. Time series constitute a popular class of sequential data, where records are indexed by time. The k-Nearest Neighbor - Time Series Prediction (kNN-TSP) method is an approximator for time series prediction problems. The main advantage of this approximator is its simplicity, and it is often used in nonlinear time series analysis for the prediction of seasonal time series. Although kNN-TSP often finds the best fit for nearly periodic time series forecasting, some problems related to how to determine its parameters still remain. In this work, we focus on two of these parameters: the determination of the nearest neighbours and the prediction function. To this end, we propose a simple approach to select the nearest neighbours, where time is indirectly taken into account by the similarity measure, and a prediction function which is not disturbed by the presence of patterns at different levels of the time series. Both parameters were empirically evaluated on several artificial time series, including chaotic time series, as well as on real time series related to several environmental variables from the Itaipu reservoir, made available by Itaipu Binacional. Three of the most correlated limnological variables were considered in the experiments carried out on the real time series: water temperature, air temperature and dissolved oxygen. 
Correlation analyses were also carried out to verify whether the predicted values maintain a correlation similar to that of the original ones. Results show that both proposals, the one related to the determination of the nearest neighbours and the one related to the prediction function, are promising. Aprendizado de máquina; Dados ambientais; Funções de previsão; Limnologia; Previsão de dados temporais; Seleção de vizinhos próximos; Environmental data; Limnology; Machine learning; Nearest neighbor selection; Prediction functions; Time series prediction
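The general idea behind kNN-based time series prediction can be sketched as below. This is a generic baseline under stated assumptions, not the neighbor selection criterion or the prediction function proposed in the thesis: neighbors are chosen by plain Euclidean distance between windows, and the forecast is the unweighted mean of the values that followed them. The function name and the toy series are hypothetical.

```python
import math


def knn_tsp_forecast(series, window, k):
    """Forecast the next value from the k past windows most similar
    to the last `window` observations."""
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    query = series[-window:]
    candidates = []
    # Every earlier window whose following value is already known.
    for start in range(len(series) - window):
        past = series[start:start + window]
        successor = series[start + window]
        candidates.append((math.dist(past, query), successor))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    # Prediction function of this sketch: plain mean of the successors.
    return sum(successor for _, successor in nearest) / len(nearest)


# Hypothetical, perfectly periodic series.
series = [1, 2, 3] * 4
print(knn_tsp_forecast(series, window=3, k=2))  # 1.0
```

A plain mean of successors can be thrown off when similar shapes occur at different levels of the series, which is one of the issues the proposed prediction function is meant to handle.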
137	Učení založené na instancích / Instance based learning Martikán, Miroslav January 2009 This thesis focuses on instance-based learning algorithms. Its main goal is to create an application for educational purposes. Instance-based learning (IBL) algorithms, nearest neighbor algorithms and kd-trees are described theoretically. The practical part covers the development of a tutorial application. The application can generate data, classify them with the nearest neighbor algorithm, and test the IB1, IB2 and IB3 algorithms.
138	Image classification of pediatric pneumonia : A comparative study of supervised statistical learning techniques Rönnefall, Jacob, Wendel, Jakob January 2022 A child dies of pneumonia every 39 seconds, and progress in preventing deaths caused by pneumonia has been considerably slower than for other infectious diseases. Meanwhile, the traditional method of manually diagnosing patients has reached its performance ceiling. With a machine learning classification algorithm supporting the screening of pneumonia from x-ray images, combined with the expertise of a physician, the identification and diagnosis of pediatric pneumonia should be both quicker and more accurate. In this study, four different types of supervised machine learning algorithms have been trained, tested, and evaluated to see which model could predict most accurately whether a patient in an x-ray image has pneumonia or not. The four models included in this study have been trained by four different supervised machine learning algorithms: logistic regression, k-nearest-neighbor (KNN), support vector machine (SVM), and neural network (NN). The results show that KNN has the highest sensitivity, while NN adapts to new data the best by being neither under- nor overfit. SVM had the highest balanced accuracy on both train and test data but a proportionally high difference between the in-sample and out-of-sample error. In conclusion, relatively high performance can be achieved when classifying x-ray images of pneumonia even with limited resources. Machine learning; Algorithm; Logistic regression; K-nearest-neighbor; Support vector machine; Neural network; Sensitivity; Specificity; ROC; Accuracy; Probability Theory and Statistics; Sannolikhetsteori och statistik
139	Comparing Julia and Python : An investigation of the performance on image processing with deep neural networks and classification Axillus, Viktor January 2020 Python is the most popular language for prototyping and developing machine learning algorithms. Python is an interpreted language, which causes a significant performance loss compared to compiled languages. Julia is a newly developed language that tries to bridge the gap between high-performance but cumbersome languages such as C++ and highly abstracted but typically slow languages such as Python. However, over the years, the Python community has developed many tools that address its performance problems. This raises the question of whether choosing one language over the other makes any significant performance difference. This thesis compares the performance, in terms of execution time, of the two languages in the machine learning domain, more specifically in image processing with GPU-accelerated deep neural networks and classification with k-nearest neighbor on the MNIST and EMNIST datasets. Python with Keras and TensorFlow is compared against Julia with Flux for GPU-accelerated neural networks. For classification, Python with scikit-learn is compared against Julia with NearestNeighbors.jl. The results indicate that Julia has a performance edge for GPU-accelerated deep neural networks, outperforming Python by roughly 1.25x to 1.5x. For classification with k-nearest neighbor the results were more varied, with Julia outperforming Python in 5 out of 8 measurements. However, there are some threats to validity, and additional research covering all the frameworks available for the two languages is needed to provide a more conclusive and generalized answer. 
julia python performance comparison machine learning image processing GPU GPU-acceleration neural networks autoencoder classification knn k-nearest neighbor Software Engineering Programvaruteknik
140	Using native mass spectrometry to study the role of homo-oligomeric proteins in gene regulation by using TRAP as a model protein system Holmquist, Melody L. 06 November 2020 No description available. Biochemistry; Biophysics; Biology; trp RNA binding attenuation protein; TRAP; Anti-TRAP; native mass spectrometry; surface-induced dissociation; SID; homo-oligomers; homo-oligomeric ring proteins; thermodynamics; nearest-neighbor model
