61 |
Doppler Radar Data Processing And Classification
Aygar, Alper 01 September 2008
This thesis studies how to improve the performance of automatic recognition of Doppler radar targets. The radar used in this study is a ground-surveillance Doppler radar. The target types are car, truck, bus, tank, helicopter, moving man and running man. The input to this thesis consists of real Doppler radar signals that were normalized and preprocessed into TRP (Target Recognition Pattern) vectors in the doctoral thesis by Erdogan (2002). TRP vectors are Doppler radar target signals normalized and homogenized with respect to target speed, target aspect angle and target range. Some target classes show repetitions in time in their TRPs, and the thesis studies how these repetitions can be used to improve target type classification performance. K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms are used for Doppler radar target classification, and the results are evaluated. Before classification, PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), NMF (Nonnegative Matrix Factorization) and ICA (Independent Component Analysis) are implemented and applied to the normalized Doppler radar signals for efficient feature extraction and dimension reduction. These techniques transform the input vectors, which are the normalized Doppler radar signals, into another space. The thesis studies the effects of these feature extraction algorithms and of the use of repetitions in Doppler radar target signals on classification performance.
|
62 |
中文詞彙集的來源與權重對中文裁判書分類成效的影響 / Exploring the Influences of Lexical Sources and Term Weights on the Classification of Chinese Judgment Documents
鄭人豪, Cheng, Jen-Hao Unknown Date
國外法學資訊系統已研究多年,嘗試利用科技幫助提昇司法審判的效率。重要的議題包括輔助判決,法律文件分類,或是相似案件搜尋等。本研究將針對中文裁判書的分類做進一步談討。
在文件特徵表示方面,我們以有序詞組來表達中文裁判書,我們嘗試比較採用不同的詞彙來源對於分類效果的影響。實驗中我們分別採用一般通用的電子詞典建立一般詞組;以及以演算法取出法學專業詞彙集建立專業詞組。並依tf-idf(term frequency – inverse document frequency)的概念,設計兩種詞組權重tpf-idf(term pair frequency – inverse document frequency)以及tpf-icf(term pair frequency – inverse category frequency),來計算特徵詞組權重。
在文件分類演算法方面,我們實作以相似度為基礎的k最近鄰居法作為系統分類機制,藉由裁判書的案由欄位,將案例分為七種類別,分別為竊盜、搶奪、強盜、贓物、傷害、恐嚇以及賭博。並藉由觀察案例資料庫的相似度分佈,以找出恰當的參數,進一步得到較佳的分類正確率與較低的拒絕率。
我們並依照自省式學習法的精神,建立權重調整的機制。企圖藉由自省式學習法提昇分類效果,以及找出對分類有影響的詞組。而我們以案例資料庫的相似度差異值以及距離差異值,分析調整前後案例資料庫的變化,藉以觀察自省式學習法的效果。 / Legal information systems for non-Chinese languages have been studied intensively for many years. Topics under discussion include judgment assistance, legal document classification, and similar-case search. This thesis studies the classification of Chinese judgment documents.
I use phrases as the indices for documents. I attempt to compare the influences of different lexical sources for segmenting Chinese text. One of the lexical sources is a general machine-readable dictionary, Hownet, and the other is the set of terms algorithmically extracted from legal documents. Based on the concept of tf-idf (term frequency-inverse document frequency), I design two kinds of phrase weights: tpf-idf (term pair frequency-inverse document frequency) and tpf-icf (term pair frequency-inverse category frequency).
In the experiments, I use the k-nearest neighbor method to classify Chinese judgment documents into seven categories based on their prosecution reasons: larceny (竊盜), robbery (搶奪), robbery by threatening or disabling the victims (強盜), receiving stolen property (贓物), causing bodily harm (傷害), intimidation (恐嚇), and gambling (賭博). To achieve high accuracy with low rejection rates, I observe and discuss the distribution of similarity among the training documents to select appropriate parameters. In addition, I conduct a set of analogous experiments for classifying documents based on the cited legal articles for gambling cases.
To improve classification performance, I apply the introspective learning technique to adjust the weights of phrases. I observe the intra-cluster similarity and inter-cluster similarity to evaluate the effects of weight adjustment in experiments that classify documents based on their prosecution reasons and cited articles.
|
63 |
Classification of uncertain data in the framework of belief functions : nearest-neighbor-based and rule-based approaches / Classification des données incertaines dans le cadre des fonctions de croyance : la métode des k plus proches voisins et la méthode à base de règles
Jiao, Lianmeng 26 October 2015
Dans de nombreux problèmes de classification, les données sont intrinsèquement incertaines. Les données d’apprentissage disponibles peuvent être imprécises, incomplètes, ou même peu fiables. En outre, des connaissances spécialisées partielles qui caractérisent le problème de classification peuvent également être disponibles. Ces différents types d’incertitude posent de grands défis pour la conception de classifieurs. La théorie des fonctions de croyance fournit un cadre rigoureux et élégant pour la représentation et la combinaison d’une grande variété d’informations incertaines. Dans cette thèse, nous utilisons cette théorie pour résoudre les problèmes de classification des données incertaines sur la base de deux approches courantes, à savoir, la méthode des k plus proches voisins (kNN) et la méthode à base de règles.Pour la méthode kNN, une préoccupation est que les données d’apprentissage imprécises dans les régions où les classes de chevauchent peuvent affecter ses performances de manière importante. Une méthode d’édition a été développée dans le cadre de la théorie des fonctions de croyance pour modéliser l’information imprécise apportée par les échantillons dans les régions qui se chevauchent. Une autre considération est que, parfois, seul un ensemble de données d’apprentissage incomplet est disponible, auquel cas les performances de la méthode kNN se dégradent considérablement. Motivé par ce problème, nous avons développé une méthode de fusion efficace pour combiner un ensemble de classifieurs kNN couplés utilisant des métriques couplées apprises localement. Pour la méthode à base de règles, afin d’améliorer sa performance dans les applications complexes, nous étendons la méthode traditionnelle dans le cadre des fonctions de croyance. Nous développons un système de classification fondé sur des règles de croyance pour traiter des informations incertains dans les problèmes de classification complexes. 
En outre, dans certaines applications, en plus de données d’apprentissage, des connaissances expertes peuvent également être disponibles. Nous avons donc développé un système de classification hybride fondé sur des règles de croyance permettant d’utiliser ces deux types d’information pour la classification. / In many classification problems, data are inherently uncertain. The available training data might be imprecise, incomplete, or even unreliable. In addition, partial expert knowledge characterizing the classification problem may be available. These different types of uncertainty pose great challenges to classifier design. The theory of belief functions provides a well-founded and elegant framework for representing and combining a large variety of uncertain information. In this thesis, we use this theory to address uncertain data classification problems with two popular approaches: the k-nearest neighbor (kNN) rule and rule-based classification systems. For the kNN rule, one concern is that imprecise training data in overlapping class regions may greatly affect its performance. An evidential editing version of the kNN rule was developed, based on the theory of belief functions, to model the imprecise information carried by samples in overlapping regions. Another consideration is that sometimes only an incomplete training data set is available, in which case the performance of the kNN rule degrades dramatically. Motivated by this problem, we designed an evidential fusion scheme for combining a group of pairwise kNN classifiers built on locally learned pairwise distance metrics. For rule-based classification systems, in order to improve their performance in complex applications, we extended the traditional fuzzy rule-based classification system within the framework of belief functions and developed a belief rule-based classification system to handle uncertain information in complex classification problems.
Further, considering that in some applications, apart from training data collected by sensors, partial expert knowledge can also be available, a hybrid belief rule-based classification system was developed to make use of these two types of information jointly for classification.
|
64 |
Bank Customer Churn Prediction : A comparison between classification and evaluation methods
Tandan, Isabelle, Goteman, Erika January 2020
This study aims to assess which supervised statistical learning method (random forest, logistic regression or K-nearest neighbor) is best at predicting bank customer churn. Additionally, the study evaluates which cross-validation approach, k-fold cross-validation or leave-one-out cross-validation, yields the most reliable results. Predicting customer churn has grown in popularity because new technology, regulation and changed demand have increased competition among banks. Banks therefore have all the more reason to recognize the importance of maintaining their customer base. The study finds that an unrestricted random forest model estimated using k-fold cross-validation is preferable in terms of performance measurements, computational efficiency and theory. Although k-fold cross-validation and leave-one-out cross-validation yield similar results, k-fold cross-validation is preferable because of its computational advantages. For future research, methods that generate models with both good interpretability and high predictive power would be beneficial, in order to combine knowledge of which customers end their engagement with an understanding of why. It would also be interesting to analyze at which dataset size leave-one-out cross-validation and k-fold cross-validation yield the same results.
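The computational difference between the two cross-validation schemes compared in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `k_fold_splits` and `leave_one_out_splits` are hypothetical, and the folds are taken as contiguous index blocks without shuffling, which is an assumption made for brevity.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_out_splits(n_samples: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Leave-one-out is the special case of k-fold with k equal to n_samples."""
    return k_fold_splits(n_samples, n_samples)


if __name__ == "__main__":
    n = 1000
    # Each split means one model fit, so the split count drives the cost.
    print("10-fold model fits:", sum(1 for _ in k_fold_splits(n, 10)))
    print("leave-one-out model fits:", sum(1 for _ in leave_one_out_splits(n)))
```

Because every split requires fitting the model once, leave-one-out needs as many fits as there are samples, while k-fold needs only k.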
|
65 |
Detekce fibrilace síní v krátkodobých EKG záznamech / Detection of atrial fibrillation in short-term ECG
Ambrožová, Monika January 2019
Atrial fibrillation is diagnosed in 1-2% of the population. In the coming decades, a significant increase in the number of patients with this arrhythmia is expected, owing to the aging of the population and the higher incidence of some diseases that are considered risk factors for atrial fibrillation. The aim of this work is to describe the problem of atrial fibrillation and the methods that allow its detection in the ECG record. The first part of the work covers the theory of cardiac physiology and atrial fibrillation, together with a basic description of atrial fibrillation detection. The practical part describes software for the detection of atrial fibrillation provided by the BTL company. Furthermore, an atrial fibrillation detector is designed. Several parameters were selected to detect the variation of RR intervals: standard deviation, coefficients of skewness and kurtosis, coefficient of variation, root mean square of the successive differences, normalized absolute deviation, normalized absolute difference, median absolute deviation and entropy. Three different classification models were used: support vector machine (SVM), k-nearest neighbor (KNN) and discriminant analysis classification. The SVM classification model achieved the best results (sensitivity: 67.1%; specificity: 97.0%; F-measure: 66.8%; accuracy: 92.9%).
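A few of the RR-interval variability parameters named in this abstract can be sketched as follows. The abstract does not give the thesis's exact definitions, so the formulas below are the common textbook ones and should be read as assumptions; the function name `rr_features` and the sample interval values are hypothetical.

```python
import math
from statistics import mean, median, pstdev
from typing import Dict, Sequence


def rr_features(rr: Sequence[float]) -> Dict[str, float]:
    """Compute simple variability features from RR intervals given in seconds."""
    if len(rr) < 2:
        raise ValueError("at least two RR intervals are required")
    avg = mean(rr)
    sd = pstdev(rr)  # population standard deviation
    # Differences between consecutive RR intervals.
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    rmssd = math.sqrt(mean([d * d for d in diffs]))
    med = median(rr)
    mad = median([abs(x - med) for x in rr])
    return {
        "sd": sd,
        "cv": sd / avg,  # coefficient of variation
        "rmssd": rmssd,  # root mean square of the successive differences
        "mad": mad,      # median absolute deviation
    }


if __name__ == "__main__":
    regular = [0.80, 0.80, 0.80, 0.80, 0.80, 0.80]    # hypothetical steady rhythm
    irregular = [0.60, 1.00, 0.50, 0.90, 0.70, 1.10]  # hypothetical irregular rhythm
    print(rr_features(regular))
    print(rr_features(irregular))
```

An irregular rhythm yields larger values for all four features than a steady one, which is what makes such features usable as classifier inputs.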
|
66 |
Adaptivní klient pro sociální síť Twitter / Adaptive Client for Twitter Social Network
Guňka, Jiří January 2011
The goal of this term project is to create a user-friendly Twitter client. The client may use machine learning methods, such as a naive Bayes classifier, to point out new tweets of interest. Hyperbolic trees and some other methods will be used to visualize these tweets.
|
67 |
Fuel failure analysis in Boiling Water Reactors (BWR) using Machine Learning. : A comparison of different machine learning algorithms and their performance at predicting fuel failures.
Borg, Sofia January 2024
In collaboration with Westinghouse Electric AB, this project studies the possibilities of using machine learning methods to predict fuel failure in Boiling Water Reactors (BWRs). The main objective has been to create a dataset consisting of both empirical measurements and simulated samples from a physics model, and to evaluate different machine learning algorithms that use this data to predict fuel defects. The simulated data is created using a physics model derived from the ANS-5.4 standard, which allows good control over specific parameter values. Three machine learning algorithms were deemed fit for this type of problem and used throughout the project: Random Forest (RF), K-Nearest Neighbor (KNN) and Neural Network (NN). Both classification and regression problems have been assessed. All three methods showed good results for the classification problems, where the goal was to predict whether or not there was a fuel failure. All models reached an accuracy above 97%; the RF model had the highest overall accuracy, at 98.2%. However, the NN method made the fewest false negative predictions and can therefore be seen as the best model for this purpose. For the regression problems, where the aim was to predict escape rates, both RF and KNN showed similarly promising results with very small errors overall. Yet there is a slight increase in errors when predicting higher escape rates for both models, most likely because the available data consists mostly of low escape rates. The NN did not perform well on this problem: its predictions had large errors for both low and high escape rates, possibly because of the lack of data. To improve the results and create even better models, the empirical measurements need to contain more information, such as defect location and fuel failure size; an increase in the number of samples taken during fuel failure operation would also be valuable.
|
68 |
Αναγνώριση βασικών κινήσεων του χεριού με χρήση ηλεκτρομυογραφήματος / Recognition of basic hand movements using electromyography
Σαψάνης, Χρήστος 13 October 2013
Ο στόχος αυτής της εργασίας ήταν η αναγνώριση έξι βασικών κινήσεων του χεριού με χρήση δύο συστημάτων. Όντας θέμα διεπιστημονικού επιπέδου έγινε μελέτη της ανατομίας των μυών του πήχη, των βιοσημάτων, της μεθόδου της ηλεκτρομυογραφίας (ΗΜΓ) και μεθόδων αναγνώρισης προτύπων. Παράλληλα, το σήμα περιείχε αρκετό θόρυβο και έπρεπε να αναλυθεί, με χρήση του EMD, να εξαχθούν χαρακτηριστικά αλλά και να μειωθεί η διαστασιμότητά τους, με χρήση των RELIEF και PCA, για βελτίωση του ποσοστού επιτυχίας ταξινόμησης. Στο πρώτο μέρος γίνεται χρήση συστήματος ΗΜΓ της Delsys αρχικά σε ένα άτομο και στη συνέχεια σε έξι άτομα με το κατά μέσο όρο επιτυχημένης ταξινόμησης, για τις έξι αυτές κινήσεις, να αγγίζει ποσοστά άνω του 80%. Το δεύτερο μέρος περιλαμβάνει την κατασκευή αυτόνομου συστήματος ΗΜΓ με χρήση του Arduino μικροελεγκτή, αισθητήρων ΗΜΓ και ηλεκτροδίων, τα οποία είναι τοποθετημένα σε ένα ελαστικό γάντι. Τα αποτελέσματα ταξινόμησης σε αυτή την περίπτωση αγγίζουν το 75%. / The aim of this work was to recognize six basic hand movements using two systems. Because the topic is interdisciplinary, the anatomy of the forearm muscles, biosignals, the method of electromyography (EMG) and pattern recognition methods were studied. The signal contained considerable noise, so it had to be analyzed using EMD, features had to be extracted, and their dimensionality had to be reduced using RELIEF and PCA to improve the classification success rate. The first part uses a Delsys EMG system, initially on one individual and then on six people, with the average rate of successful classification for the six movements exceeding 80%. The second part involves the construction of a standalone EMG system using an Arduino microcontroller, EMG sensors and electrodes, which are mounted on an elastic glove. Classification results in this case reached 75%.
|
69 |
Detekce fibrilace síní v EKG / ECG based atrial fibrillation detection
Prokopová, Ivona January 2020
Atrial fibrillation is one of the most common cardiac rhythm disorders characterized by ever-increasing prevalence and incidence in the Czech Republic and abroad. The incidence of atrial fibrillation is reported at 2-4 % of the population, but due to the often asymptomatic course, the real prevalence is even higher. The aim of this work is to design an algorithm for automatic detection of atrial fibrillation in the ECG record. In the practical part of this work, an algorithm for the detection of atrial fibrillation is proposed. For the detection itself, the k-nearest neighbor method, the support vector method and the multilayer neural network were used to classify ECG signals using features indicating the variability of RR intervals and the presence of the P wave in the ECG recordings. The best detection was achieved by a model using a multilayer neural network classification with two hidden layers. Results of success indicators: Sensitivity 91.23 %, Specificity 99.20 %, PPV 91.23 %, F-measure 91.23 % and Accuracy 98.53 %.
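The success indicators quoted in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `detection_metrics` and the counts in the usage example are hypothetical and are not data from the thesis.

```python
from typing import Dict


def _ratio(numerator: int, denominator: int) -> float:
    """Divide two counts, rejecting an empty denominator."""
    if denominator == 0:
        raise ValueError("metric undefined: denominator is zero")
    return numerator / denominator


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification indicators from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)  # share of true AF episodes detected
    specificity = _ratio(tn, tn + fp)  # share of non-AF records left unflagged
    ppv = _ratio(tp, tp + fp)          # positive predictive value (precision)
    # F-measure: harmonic mean of PPV and sensitivity.
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the formulas.
    for name, value in detection_metrics(tp=90, fp=10, tn=890, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Since the F-measure is the harmonic mean of sensitivity and PPV, it equals both whenever the two coincide, which is consistent with the identical values of 91.23 % reported above.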
|
70 |
Neue Indexingverfahren für die Ähnlichkeitssuche in metrischen Räumen über großen Datenmengen
Guhlemann, Steffen 08 April 2016
Ein zunehmend wichtiges Thema in der Informatik ist der Umgang mit Ähnlichkeit in einer großen Anzahl unterschiedlicher Domänen. Derzeit existiert keine universell verwendbare Infrastruktur für die Ähnlichkeitssuche in allgemeinen metrischen Räumen. Ziel der Arbeit ist es, die Grundlage für eine derartige Infrastruktur zu legen, die in klassische Datenbankmanagementsysteme integriert werden könnte.
Im Rahmen einer Analyse des State of the Art wird der M-Baum als am besten geeignete Basisstruktur identifiziert. Dieser wird anschließend zum EM-Baum erweitert, wobei strukturelle Kompatibilität mit dem M-Baum erhalten wird. Die Abfragealgorithmen werden im Hinblick auf eine Minimierung notwendiger Distanzberechnungen optimiert. Aufbauend auf einer mathematischen Analyse der Beziehung zwischen Baumstruktur und Abfrageaufwand werden Freiheitsgrade in Baumänderungsalgorithmen genutzt, um Bäume so zu konstruieren, dass Ähnlichkeitsanfragen mit einer minimalen Anzahl an Anfrageoperationen beantwortet werden können. / A topic of growing importance in computer science is the handling of similarity in multiple heterogeneous domains. Currently there is no common infrastructure to support this for general metric spaces. The goal of this work is to lay the foundation for such an infrastructure, which could be integrated into classical database management systems.
After an analysis of the state of the art, the M-Tree is identified as the most suitable base structure and is extended in multiple ways into the EM-Tree, while retaining structural compatibility. The query algorithms are optimized to reduce the number of necessary distance calculations. On the basis of a mathematical analysis of the relation between tree structure and query performance, degrees of freedom in the tree edit algorithms are used to build trees optimized for answering similarity queries with a minimal number of distance calculations.
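The idea of saving distance calculations in a metric space rests on the triangle inequality, which can be sketched with a single pivot. This is a deliberately simplified illustration and not the M-Tree or EM-Tree from the thesis; the names `build_pivot_table` and `range_query` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Distance = Callable[[T, T], float]


def build_pivot_table(items: Sequence[T], pivot: T, dist: Distance) -> List[Tuple[T, float]]:
    """Precompute the distance from one pivot to every indexed item."""
    return [(item, dist(pivot, item)) for item in items]


def range_query(
    table: List[Tuple[T, float]], pivot: T, query: T, radius: float, dist: Distance
) -> Tuple[List[T], int]:
    """Return items within `radius` of `query` and the number of distance calls made."""
    d_query_pivot = dist(query, pivot)
    computed = 1
    hits: List[T] = []
    for item, d_pivot_item in table:
        # Triangle inequality: |d(q, p) - d(p, x)| <= d(q, x).
        # If this lower bound already exceeds the radius, x cannot match.
        if abs(d_query_pivot - d_pivot_item) > radius:
            continue
        computed += 1
        if dist(query, item) <= radius:
            hits.append(item)
    return hits, computed


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 12, 20]

    def absolute_difference(a: float, b: float) -> float:
        return abs(a - b)

    table = build_pivot_table(points, pivot=0, dist=absolute_difference)
    hits, computed = range_query(table, 0, 11, 1.5, absolute_difference)
    print(hits, "found with", computed, "distance calculations for", len(points), "items")
```

Tree-based metric indexes apply the same bound at every node, so that whole subtrees can be skipped rather than single items.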
|