41 |
Operação de busca exata aos K-vizinhos mais próximos reversos em espaços métricos / Answering exact reverse k-nerarest neighbors queries in metric spaceOliveira, Willian Dener de 19 March 2010 (has links)
A complexidade dos dados armazenados em grandes bases de dados aumenta cada vez mais, criando a necessidade de novas operações de consulta. Uma classe de operações que tem apresentado interesse crescente são as chamadas Consultas por Similaridade, sendo as mais conhecidas as consultas por Abrangência (\'R IND. q\') e por k-Vizinhos mais Proximos (kNN), sendo que esta ultima obtem quais são os k elementos armazenados mais similares a um dado elemento de referência. Outra consulta que é interessante tanto para consultas diretas quanto como parte de operações de análises mais complexas e a operação de consulta aos k-Vizinhos mais Próximos Reversos (RkNN). Seu objetivo e obter todos os elementos armazenados que têm um dado elemento de referência como um dos seus k elementos mais similares. Devido a complexidade de execução da operação de RkNN, a grande maioria das soluções existentes restringem-se a dados representados em espaços multidimensionais euclidianos (nos quais estão denidas tambem operações cardinais e topológicas, além de se considerar a similaridade como sendo a distância Euclidiana entre dois elementos), ou então obtém apenas respostas aproximadas, sujeitas a existência de falsos negativos. Várias aplicações de análise de dados científicos, médicos, de engenharia, financeiros, etc. requerem soluções eficientes para o problema da operação de RkNN sobre dados representados em espaços métricos, onde os elementos não podem ser considerados estar em um espaço nem Euclidiano nem multidimensional. Num espaço métrico, além dos próprios elementos armazenados existe apenas uma função de comparação métrica entre pares de objetos. Neste trabalho, são propostas novas podas de espaço de busca e o algoritmo RkNN-MG que utiliza essas novas podas para solucionar o problema de consultas RkNN exatas em espaços métricos sem limitações. Toda a proposta supõe que o conjunto de dados esta em um espaço métrico imerso isometricamente em espaço euclidiano e utiliza propriedades da geometria métrica válida neste espaço para realizar podas eficientes por lei dos cossenos combinada com as podas tradicionais por desigualdade triangular. Os experimentos demonstram comparativamente que as novas podas são mais eficientes que as tradicionais podas por desigualdade triangular, tendo desempenhos equivalente quando comparadas em conjuntos de alta dimensionalidade ou com dimensão fractal alta. Assim, os resultados confirmam as novas podas propostas como soluções alternativas eficientes para o problema de consultas RkNN / Data stored in large databases present an ever increasing complexity, pressing for the development of new classes of query operators. One such class, which is enticing an increasing interest, is the so-called Similarity Queries, where the most common are the similarity range queries (\'R IND. q\') and the k-nearest neighbor queries (kNN). A k-nearest neighbor query aims at retrieving the k stored elements nearer (or more similar) to a given reference element. Another important similarity query is the reverse k-nearest neighbor (RkNN), useful both for queries posed directly by the analyst and for queries that are part of more complex analysis processes. The objective of a reverse k-nearest neighbor queries is obtaining the stored elements that has the query reference element as one of their k-nearest neighbors. As the RkNN operation is a rather expensive operation, from the computational standpoint, most existing solutions only solve the query when applied over Euclidean multidimensional spaces (as these spaces also define cardinal and topological operations besides the Euclidean distance between pairs of elements) or retrieve only approximate answers, where false negatives can occur. Several applications, like the analysis of scientific, medical, engineering or financial data, require efficient and exact answers for the RkNN queries over data which is frequently represented in metric spaces, that is where no other property besides the similarity measure exists. Therefore, for applications handling metrical data, the assumption of Euclidean metric or even multidimensional data cannot be used. In this work, we propose new pruning rules based on the law of cosines, and the RkNN-MG algorithm, which uses them to solve RkNN queries in a way that is exact, faster than the existing approaches, that is not limited for any value of k, and that can be applied both over static and over dynamic datasets. The new pruning rules assume that the data set is in a metric space that can be embedded into an Euclidean space and use metric geometry properties valid in this space to perform effective pruning based on the law of cosines combined with the traditional pruning based on the triangle inequality property. The experiments show that the new pruning rules are alkways more efficient than the traditional pruning rules based solely on the triangle inequality. The experiments show that for high high dimensionality datasets, or for metric datasets with high fractal dimensionality, the performance improvement is smaller than for for lower dimensioinality datasets, but it\'s never worse. Thus, the results confirm that the our pruning rules are efficient alternative to solve RkNN queries in general
|
42 |
Simple, Faster Kinetic Data StructuresRahmati, Zahed 28 August 2014 (has links)
Proximity problems and point set embeddability problems are fundamental and well-studied in computational geometry and graph drawing. Examples of such problems that are of particular interest to us in this dissertation include: finding the closest pair among a set P of points, finding the k-nearest neighbors to each point p in P, answering reverse k-nearest neighbor queries, computing the Yao graph, the Semi-Yao graph and the Euclidean minimum spanning tree of P, and mapping the vertices of a planar graph to a set P of points without inducing edge crossings.
In this dissertation, we consider so-called kinetic version of these problems, that is, the points are allowed to move continuously along known trajectories, which are subject to change. We design a set of data structures and a mechanism to efficiently update the data structures. These updates occur at critical, discrete times. Also, a query may arrive at any time. We want to answer queries quickly without solving problems from scratch, so we maintain solutions continuously. We present new techniques for giving kinetic solutions with better performance for some these problems, and we provide the first kinetic results for others. In particular, we provide:
• A simple kinetic data structure (KDS) to maintain all the nearest neighbors and the closest pair. Our deterministic kinetic approach for maintenance of all the nearest neighbors improves the previous randomized kinetic algorithm.
• An exact KDS for maintenance of the Euclidean minimum spanning tree, which improves the previous KDS.
• The first KDS's for maintenance of the Yao graph and the Semi-Yao graph.
• The first KDS to consider maintaining plane graphs on moving points.
• The first KDS for maintenance of all the k-nearest neighbors, for any k ≥ 1.
• The first KDS to answer the reverse k-nearest neighbor queries, for any k ≥ 1 in any fixed dimension, on a set of moving points. / Graduate
|
43 |
Evaluation of Supervised Machine LearningAlgorithms for Detecting Anomalies in Vehicle’s Off-Board Sensor DataWahab, Nor-Ul January 2018 (has links)
A diesel particulate filter (DPF) is designed to physically remove diesel particulate matter or soot from the exhaust gas of a diesel engine. Frequently replacing DPF is a waste of resource and waiting for full utilization is risky and very costly, so, what is the optimal time/milage to change DPF? Answering this question is very difficult without knowing when the DPF is changed in a vehicle. We are finding the answer with supervised machine learning algorithms for detecting anomalies in vehicles off-board sensor data (operational data of vehicles). Filter change is considered an anomaly because it is rare as compared to normal data. Non-sequential machine learning algorithms for anomaly detection like oneclass support vector machine (OC-SVM), k-nearest neighbor (K-NN), and random forest (RF) are applied for the first time on DPF dataset. The dataset is unbalanced, and accuracy is found misleading as a performance measure for the algorithms. Precision, recall, and F1-score are found good measure for the performance of the machine learning algorithms when the data is unbalanced. RF gave highest F1-score of 0.55 than K-NN (0.52) and OCSVM (0.51). It means that RF perform better than K-NN and OC-SVM but after further investigation it is concluded that the results are not satisfactory. However, a sequential approach should have been tried which could yield better result.
|
44 |
Operação de busca exata aos K-vizinhos mais próximos reversos em espaços métricos / Answering exact reverse k-nerarest neighbors queries in metric spaceWillian Dener de Oliveira 19 March 2010 (has links)
A complexidade dos dados armazenados em grandes bases de dados aumenta cada vez mais, criando a necessidade de novas operações de consulta. Uma classe de operações que tem apresentado interesse crescente são as chamadas Consultas por Similaridade, sendo as mais conhecidas as consultas por Abrangência (\'R IND. q\') e por k-Vizinhos mais Proximos (kNN), sendo que esta ultima obtem quais são os k elementos armazenados mais similares a um dado elemento de referência. Outra consulta que é interessante tanto para consultas diretas quanto como parte de operações de análises mais complexas e a operação de consulta aos k-Vizinhos mais Próximos Reversos (RkNN). Seu objetivo e obter todos os elementos armazenados que têm um dado elemento de referência como um dos seus k elementos mais similares. Devido a complexidade de execução da operação de RkNN, a grande maioria das soluções existentes restringem-se a dados representados em espaços multidimensionais euclidianos (nos quais estão denidas tambem operações cardinais e topológicas, além de se considerar a similaridade como sendo a distância Euclidiana entre dois elementos), ou então obtém apenas respostas aproximadas, sujeitas a existência de falsos negativos. Várias aplicações de análise de dados científicos, médicos, de engenharia, financeiros, etc. requerem soluções eficientes para o problema da operação de RkNN sobre dados representados em espaços métricos, onde os elementos não podem ser considerados estar em um espaço nem Euclidiano nem multidimensional. Num espaço métrico, além dos próprios elementos armazenados existe apenas uma função de comparação métrica entre pares de objetos. Neste trabalho, são propostas novas podas de espaço de busca e o algoritmo RkNN-MG que utiliza essas novas podas para solucionar o problema de consultas RkNN exatas em espaços métricos sem limitações. Toda a proposta supõe que o conjunto de dados esta em um espaço métrico imerso isometricamente em espaço euclidiano e utiliza propriedades da geometria métrica válida neste espaço para realizar podas eficientes por lei dos cossenos combinada com as podas tradicionais por desigualdade triangular. Os experimentos demonstram comparativamente que as novas podas são mais eficientes que as tradicionais podas por desigualdade triangular, tendo desempenhos equivalente quando comparadas em conjuntos de alta dimensionalidade ou com dimensão fractal alta. Assim, os resultados confirmam as novas podas propostas como soluções alternativas eficientes para o problema de consultas RkNN / Data stored in large databases present an ever increasing complexity, pressing for the development of new classes of query operators. One such class, which is enticing an increasing interest, is the so-called Similarity Queries, where the most common are the similarity range queries (\'R IND. q\') and the k-nearest neighbor queries (kNN). A k-nearest neighbor query aims at retrieving the k stored elements nearer (or more similar) to a given reference element. Another important similarity query is the reverse k-nearest neighbor (RkNN), useful both for queries posed directly by the analyst and for queries that are part of more complex analysis processes. The objective of a reverse k-nearest neighbor queries is obtaining the stored elements that has the query reference element as one of their k-nearest neighbors. As the RkNN operation is a rather expensive operation, from the computational standpoint, most existing solutions only solve the query when applied over Euclidean multidimensional spaces (as these spaces also define cardinal and topological operations besides the Euclidean distance between pairs of elements) or retrieve only approximate answers, where false negatives can occur. Several applications, like the analysis of scientific, medical, engineering or financial data, require efficient and exact answers for the RkNN queries over data which is frequently represented in metric spaces, that is where no other property besides the similarity measure exists. Therefore, for applications handling metrical data, the assumption of Euclidean metric or even multidimensional data cannot be used. In this work, we propose new pruning rules based on the law of cosines, and the RkNN-MG algorithm, which uses them to solve RkNN queries in a way that is exact, faster than the existing approaches, that is not limited for any value of k, and that can be applied both over static and over dynamic datasets. The new pruning rules assume that the data set is in a metric space that can be embedded into an Euclidean space and use metric geometry properties valid in this space to perform effective pruning based on the law of cosines combined with the traditional pruning based on the triangle inequality property. The experiments show that the new pruning rules are alkways more efficient than the traditional pruning rules based solely on the triangle inequality. The experiments show that for high high dimensionality datasets, or for metric datasets with high fractal dimensionality, the performance improvement is smaller than for for lower dimensioinality datasets, but it\'s never worse. Thus, the results confirm that the our pruning rules are efficient alternative to solve RkNN queries in general
|
45 |
Utilização de métodos de machine learning para identificação de instrumentos musicais de sopro pelo timbreVeras, Ricardo da Costa January 2018 (has links)
Orientador: Prof. Dr. Ricardo Suyama / Dissertação (mestrado) - Universidade Federal do ABC, Programa de Pós-Graduação em Engenharia da Informação, Santo André, 2018. / De forma geral a Classificação de Padrões voltada a Processamento de Sinais
vem sendo estudada e utilizada para a interpretação de informações diversas, que se
manifestam em forma de imagens, áudios, dados geofísicos, impulsos elétricos, entre
outros. Neste trabalho são estudadas técnicas de Machine Learning aplicadas ao problema
de identificação de instrumentos musicais, buscando obter um sistema automático de
reconhecimento de timbres. Essas técnicas foram utilizadas especificamente com cinco
instrumentos da categoria de Sopro de Madeira (o Clarinete, o Fagote, a Flauta, o Oboé e
o Sax). As técnicas utilizadas foram o kNN (com k = 3) e o SVM (numa configuração
não linear), assim como foram estudadas algumas características (features) dos áudios,
tais como o MFCC (do inglês Mel-Frequency Cepstral Coefficients), o ZCR (do inglês Zero
Crossing Rate), a entropia, entre outros, sendo fonte de dados para os processos de
treinamento e de teste. Procurou-se estudar instrumentos nos quais se observa uma
aproximação nos timbres, e com isso verificar como é o comportamento de um sistema
classificador nessas condições específicas. Observou-se também o comportamento dessas
técnicas com áudios desconhecidos do treinamento, assim como com trechos em que há
uma mistura de elementos (gerando interferências para cada modelo classificador) que
poderiam desviar os resultados, ou com misturas de elementos que fazem parte das
classes observadas, e que se somam num mesmo áudio. Os resultados indicam que as
características selecionadas possuem informações relevantes a respeito do timbre de
cada um dos instrumentos avaliados (como observou-se em relação aos solos), embora
a acurácia obtida para alguns dos instrumentos tenha sido abaixo do esperado (como
observou-se em relação aos duetos). / In general, Pattern Classification for Signal Processing has been studied and
used for the interpretation of several information, which are manifested in many ways,
like: images, audios, geophysical data, electrical impulses, among others. In this project
we study techniques of Machine Learning applied to the problem of identification
of musical instruments, aiming to obtain an automatic system of timbres recognition.
These techniques were used specifically with five instruments of Woodwind category
(Clarinet, Bassoon, Flute, Oboe and Sax). The techniques used were the kNN (with
k = 3) and the SVM (in a non-linear configuration), as well as some audio features, such
as MFCC (Mel-Frequency Cepstral Coefficients), ZCR (Zero Crossing Rate), entropy,
among others, used as data source for the training and testing processes. We tried to
study instruments in which an approximation in the timbres is observed, and to verify
in this case how is the behavior of a classifier system in these specific conditions. It was
also observed the behavior of these techniques with audios unknown to the training, as
well as with sections in which there is a mixture of elements (generating interferences
for each classifier model) that could deviate the results, or with mixtures of elements
that are part of the observed classes, and added in a same audio. The results indicate
that the selected characteristics have relevant information regarding the timbre of each
one of evaluated instruments (as observed on the solos results), although the accuracy
obtained for some of the instruments was lower than expected (as observed on the duets
results).
|
46 |
Image classification of pediatric pneumonia : A comparative study of supervised statistical learning techniquesRönnefall, Jacob, Wendel, Jakob January 2022 (has links)
A child dies of pneumonia every 39 seconds, and the process of preventing deaths caused by pneumonia has been considerably slower compared to other infectious diseases. Meanwhile, the traditional method of manually diagnosing patients has reached its ceiling on performance. With the support of a machine learning classification algorithm to help with the screening of pneumonia from x-ray images combined with the expertise of a physician, the identification and diagnosis of pediatric pneumonia should be both quicker and more accurate. In this study, four different types of supervised machine learning algorithms have been trained, tested, and evaluated to see which model could predict most accurately whether a patient in an x-ray image has pneumonia or not. The four models included in this study have been trained by four different supervised machine learning algorithms: logistic regression, k-nearest-neighbor, support vector machine, and neural network. The results show that KNN has the highest sensitivity, NN adapts to new data the best by not being under- or overfit. SVM had the highest balanced accuracy on both train and test data but a proportionally high difference between the in- and out-sample error. In conclusion, relatively high performance can be achieved when classifying x-ray images of pneumonia even with limited resources.
|
47 |
Comparing Julia and Python : An investigation of the performance on image processing with deep neural networks and classificationAxillus, Viktor January 2020 (has links)
Python is the most popular language when it comes to prototyping and developing machine learning algorithms. Python is an interpreted language that causes it to have a significant performance loss compared to compiled languages. Julia is a newly developed language that tries to bridge the gap between high performance but cumbersome languages such as C++ and highly abstracted but typically slow languages such as Python. However, over the years, the Python community have developed a lot of tools that addresses its performance problems. This raises the question if choosing one language over the other has any significant performance difference. This thesis compares the performance, in terms of execution time, of the two languages in the machine learning domain. More specifically, image processing with GPU-accelerated deep neural networks and classification with k-nearest neighbor on the MNIST and EMNIST dataset. Python with Keras and Tensorflow is compared against Julia with Flux for GPU-accelerated neural networks. For classification Python with Scikit-learn is compared against Julia with Nearestneighbors.jl. The results point in the direction that Julia has a performance edge in regards to GPU-accelerated deep neural networks. With Julia outperforming Python by roughly 1.25x − 1.5x. For classification with k-nearest neighbor the results were a bit more varied with Julia outperforming Python in 5 out of 8 different measurements. However, there exists some validity threats and additional research is needed that includes all different frameworks available for the languages in order to provide a more conclusive and generalized answer.
|
48 |
Evaluation of Three Machine Learning Algorithms for the Automatic Classification of EMG Patterns in Gait DisordersFricke, Christopher, Alizadeh, Jalal, Zakhary, Nahrin, Woost, Timo B., Bogdan, Martin, Classen, Joseph 27 March 2023 (has links)
Gait disorders are common in neurodegenerative diseases and distinguishing between
seemingly similar kinematic patterns associated with different pathological entities is a
challenge even for the experienced clinician. Ultimately, muscle activity underlies the
generation of kinematic patterns. Therefore, one possible way to address this problem
may be to differentiate gait disorders by analyzing intrinsic features of muscle activations
patterns. Here, we examined whether it is possible to differentiate electromyography
(EMG) gait patterns of healthy subjects and patients with different gait disorders using
machine learning techniques. Nineteen healthy volunteers (9 male, 10 female, age 28.2
± 6.2 years) and 18 patients with gait disorders (10 male, 8 female, age 66.2 ± 14.7
years) resulting from different neurological diseases walked down a hallway 10 times at
a convenient pace while their muscle activity was recorded via surface EMG electrodes
attached to 5 muscles of each leg (10 channels in total). Gait disorders were classified
as predominantly hypokinetic (n = 12) or ataxic (n = 6) gait by two experienced raters
based on video recordings. Three different classification methods (Convolutional Neural
Network—CNN, Support Vector Machine—SVM, K-Nearest Neighbors—KNN) were
used to automatically classify EMG patterns according to the underlying gait disorder
and differentiate patients and healthy participants. Using a leave-one-out approach for
training and evaluating the classifiers, the automatic classification of normal and abnormal
EMG patterns during gait (2 classes: “healthy” and “patient”) was possible with a high
degree of accuracy using CNN (accuracy 91.9%), but not SVM (accuracy 67.6%) or
KNN (accuracy 48.7%). For classification of hypokinetic vs. ataxic vs. normal gait (3
classes) best results were again obtained for CNN (accuracy 83.8%) while SVM and
KNN performed worse (accuracy SVM 51.4%, KNN 32.4%). These results suggest that
machine learning methods are useful for distinguishing individuals with gait disorders
from healthy controls and may help classification with respect to the underlying disorder
even when classifiers are trained on comparably small cohorts. In our study, CNN
achieved higher accuracy than SVM and KNN and may constitute a promising method
for further investigation.
|
49 |
Classifying Portable Electronic Devices using Device Specifications : A Comparison of Machine Learning TechniquesWesterholm, Ludvig January 2024 (has links)
In this project, we explored the usage of machine learning in classifying portable electronic devices. The primary objective was to identify devices such as laptops, smartphones, and tablets, based on their physical and technical specification. These specifications, sourced from the Pricerunner price comparison website, contain height, Wi-Fi standard, and screen resolution. We aggregated this information into a dataset and split it into a training set and a testing set. To achieve the classification of devices, we trained four popular machine learning models: Random Forest (RF), Logistic Regression (LR), k-Nearest Neighbor (kNN), and Fully Connected Network (FCN). We then compared the performance of these models. The evaluation metrics used to compare performance included precision, recall, F1-score, accuracy, and training time. The RF model achieved the highest overall accuracy of 95.4% on the original dataset. The FCN, applied to a dataset processed with standardization followed by Principal Component Analysis (PCA), reached an accuracy of 92.7%, the best within this specific subset. LR excelled in a few class-specific metrics, while kNN performed notably well relative to its training time. The RF model was the clear winner on the original dataset, while the kNN model was a strong contender on the PCA-processed dataset due to its significantly faster training time compared to the FCN. In conclusion, the RF was the best-performing model on the original dataset, the FCN showed impressive results on the standardized and PCA-processed dataset, and the kNN model, with its highest macro precision and rapid training time, also demonstrated competitive performance.
|
50 |
Delving into gene-set multiplex networks facilitated by a k-nearest neighbor-based measure of similarity / k-最近傍法に基づく類似性尺度による、遺伝子セットの多重ネットワーク解析Zheng, Cheng 25 March 2024 (has links)
京都大学 / 新制・課程博士 / 博士(医学) / 甲第25192号 / 医博第5078号 / 新制||医||1072(附属図書館) / 京都大学大学院医学研究科医学専攻 / (主査)教授 村川 泰裕, 教授 斎藤 通紀, 教授 李 聖林 / 学位規則第4条第1項該当 / Doctor of Agricultural Science / Kyoto University / DFAM
|
Page generated in 0.0322 seconds