Global ETD Search

61	中文詞彙集的來源與權重對中文裁判書分類成效的影響 / Exploring the Influences of Lexical Sources and Term Weights on the Classification of Chinese Judgment Documents 鄭人豪, Cheng, Jen-Hao Unknown Date (has links) 國外法學資訊系統已研究多年，嘗試利用科技幫助提昇司法審判的效率。重要的議題包括輔助判決，法律文件分類，或是相似案件搜尋等。本研究將針對中文裁判書的分類做進一步探討。在文件特徵表示方面，我們以有序詞組來表達中文裁判書，我們嘗試比較採用不同的詞彙來源對於分類效果的影響。實驗中我們分別採用一般通用的電子詞典建立一般詞組；以及以演算法取出法學專業詞彙集建立專業詞組。並依tf-idf(term frequency – inverse document frequency)的概念，設計兩種詞組權重tpf-idf(term pair frequency – inverse document frequency)以及tpf-icf(term pair frequency – inverse category frequency)，來計算特徵詞組權重。在文件分類演算法方面，我們實作以相似度為基礎的k最近鄰居法作為系統分類機制，藉由裁判書的案由欄位，將案例分為七種類別，分別為竊盜、搶奪、強盜、贓物、傷害、恐嚇以及賭博。並藉由觀察案例資料庫的相似度分佈，以找出恰當的參數，進一步得到較佳的分類正確率與較低的拒絕率。我們並依照自省式學習法的精神，建立權重調整的機制。企圖藉由自省式學習法提昇分類效果，以及找出對分類有影響的詞組。而我們以案例資料庫的相似度差異值以及距離差異值，分析調整前後案例資料庫的變化，藉以觀察自省式學習法的效果。 / Legal information systems for non-Chinese languages have been studied intensively for many years. Topics under discussion include judgment assistance, legal document classification, and similar-case search. This thesis studies the classification of Chinese judgment documents. I use phrases as the indices for documents and compare the influence of different lexical sources for segmenting Chinese text. One of the lexical sources is a general machine-readable dictionary, Hownet, and the other is a set of terms algorithmically extracted from legal documents. Based on the concept of tf-idf (term frequency – inverse document frequency), I design two kinds of phrase weights: tpf-idf (term pair frequency – inverse document frequency) and tpf-icf (term pair frequency – inverse category frequency). In the experiments, I use the k-nearest neighbor method to classify Chinese judgment documents into seven categories based on their prosecution reasons: larceny (竊盜), robbery (搶奪), robbery by threatening or disabling the victims (強盜), receiving stolen property (贓物), causing bodily harm (傷害), intimidation (恐嚇), and gambling (賭博). To achieve high accuracy with low rejection rates, I observe and discuss the distribution of similarity of the training documents to select appropriate parameters. 
In addition, I conduct a set of analogous experiments for classifying documents based on the cited legal articles for gambling cases. To improve classification performance, I apply the introspective learning technique to adjust the weights of phrases. I observe the intra-cluster similarity and inter-cluster similarity to evaluate the effects of weight adjustment in the experiments on classifying documents by their prosecution reasons and cited articles. 法學資訊系統自然語言處理 k最近鄰居法自省式學習法 Legal information system Natural language processing k nearest neighbor introspective learning
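The record above names the two phrase weights but does not define them. The following is only a sketch under an assumed form, by analogy with tf-idf: the term-pair frequency in a document is multiplied by the logarithm of an inverse document frequency (tpf-idf) or an inverse category frequency (tpf-icf). The function name `pair_weights` and the exact formulas are hypothetical and may differ from the thesis.

```python
import math
from collections import Counter

def pair_weights(docs, labels):
    """Assumed tpf-idf / tpf-icf weights (illustrative, not the thesis's exact formulas).

    docs:   list of documents, each a list of term pairs (tuples)
    labels: category label of each document
    """
    n_docs = len(docs)
    n_cats = len(set(labels))
    doc_freq = Counter()   # number of documents containing each pair
    cats_with = {}         # set of categories in which each pair occurs
    for pairs, label in zip(docs, labels):
        for pair in set(pairs):
            doc_freq[pair] += 1
            cats_with.setdefault(pair, set()).add(label)

    weighted = []
    for pairs in docs:
        tpf = Counter(pairs)  # term-pair frequency within this document
        weighted.append({
            pair: {
                "tpf-idf": freq * math.log(n_docs / doc_freq[pair]),
                "tpf-icf": freq * math.log(n_cats / len(cats_with[pair])),
            }
            for pair, freq in tpf.items()
        })
    return weighted
```

Under this assumed form, a pair that occurs in every category receives a tpf-icf weight of zero, which matches the intuition that such a pair carries no information for telling categories apart.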
62	Classification of uncertain data in the framework of belief functions : nearest-neighbor-based and rule-based approaches / Classification des données incertaines dans le cadre des fonctions de croyance : la méthode des k plus proches voisins et la méthode à base de règles Jiao, Lianmeng 26 October 2015 (has links) Dans de nombreux problèmes de classification, les données sont intrinsèquement incertaines. Les données d’apprentissage disponibles peuvent être imprécises, incomplètes, ou même peu fiables. En outre, des connaissances spécialisées partielles qui caractérisent le problème de classification peuvent également être disponibles. Ces différents types d’incertitude posent de grands défis pour la conception de classifieurs. La théorie des fonctions de croyance fournit un cadre rigoureux et élégant pour la représentation et la combinaison d’une grande variété d’informations incertaines. Dans cette thèse, nous utilisons cette théorie pour résoudre les problèmes de classification des données incertaines sur la base de deux approches courantes, à savoir, la méthode des k plus proches voisins (kNN) et la méthode à base de règles. Pour la méthode kNN, une préoccupation est que les données d’apprentissage imprécises dans les régions où les classes se chevauchent peuvent affecter ses performances de manière importante. Une méthode d’édition a été développée dans le cadre de la théorie des fonctions de croyance pour modéliser l’information imprécise apportée par les échantillons dans les régions qui se chevauchent. Une autre considération est que, parfois, seul un ensemble de données d’apprentissage incomplet est disponible, auquel cas les performances de la méthode kNN se dégradent considérablement. Motivé par ce problème, nous avons développé une méthode de fusion efficace pour combiner un ensemble de classifieurs kNN couplés utilisant des métriques couplées apprises localement. 
Pour la méthode à base de règles, afin d’améliorer sa performance dans les applications complexes, nous étendons la méthode traditionnelle dans le cadre des fonctions de croyance. Nous développons un système de classification fondé sur des règles de croyance pour traiter des informations incertaines dans les problèmes de classification complexes. En outre, dans certaines applications, en plus de données d’apprentissage, des connaissances expertes peuvent également être disponibles. Nous avons donc développé un système de classification hybride fondé sur des règles de croyance permettant d’utiliser ces deux types d’information pour la classification. / In many classification problems, data are inherently uncertain. The available training data might be imprecise, incomplete, or even unreliable. Besides, partial expert knowledge characterizing the classification problem may also be available. These different types of uncertainty bring great challenges to classifier design. The theory of belief functions provides a well-founded and elegant framework to represent and combine a large variety of uncertain information. In this thesis, we use this theory to address uncertain data classification problems based on two popular approaches, i.e., the k-nearest neighbor rule (kNN) and rule-based classification systems. For the kNN rule, one concern is that imprecise training data in regions where classes overlap may greatly affect its performance. An evidential editing version of the kNN rule was developed based on the theory of belief functions in order to better model the imprecise information for samples in overlapping regions. Another consideration is that, sometimes, only an incomplete training data set is available, in which case the performance of the kNN rule degrades dramatically. 
Motivated by this problem, we designed an evidential fusion scheme for combining a group of pairwise kNN classifiers developed based on locally learned pairwise distance metrics. For rule-based classification systems, in order to improve their performance in complex applications, we extended the traditional fuzzy rule-based classification system in the framework of belief functions and developed a belief rule-based classification system to address uncertain information in complex classification problems. Further, considering that in some applications, apart from training data collected by sensors, partial expert knowledge can also be available, a hybrid belief rule-based classification system was developed to make use of these two types of information jointly for classification. Classification à base de règles Classifieurs Fusion de données Théorie des fonctions de croyances Gestion de l'incertitude K plus proches voisins Data classification Information fusion Uncertainty management Theory of belief functions K-nearest neighbor rule Rule-based classification system
63	Bank Customer Churn Prediction : A comparison between classification and evaluation methods Tandan, Isabelle, Goteman, Erika January 2020 (has links) This study aims to assess which supervised statistical learning method (random forest, logistic regression, or K-nearest neighbor) is best at predicting bank customer churn. Additionally, the study evaluates which cross-validation approach, k-Fold cross-validation or leave-one-out cross-validation, yields the most reliable results. Predicting customer churn has increased in popularity since new technology, regulation, and changed demand have led to an increase in competition for banks. Thus, with greater reason, banks acknowledge the importance of maintaining their customer base. The findings of this study are that an unrestricted random forest model estimated using k-Fold is preferable in terms of performance measurements, computational efficiency, and from a theoretical point of view. Although k-Fold cross-validation and leave-one-out cross-validation yield similar results, k-Fold cross-validation is preferable due to its computational advantages. For future research, methods that generate models with both good interpretability and high predictability would be beneficial, in order to combine the knowledge of which customers end their engagement with an understanding of why. Moreover, it would be interesting to analyze at which dataset size leave-one-out cross-validation and k-Fold cross-validation yield the same results. machine learning cross-validation k-fold leave-one-out random forest decision trees k-nearest neighbor logistic regression supervised learning supervised statistical learning binary classification customer churn bank customer churn. Probability Theory and Statistics Sannolikhetsteori och statistik
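The computational advantage of k-Fold over leave-one-out cross-validation mentioned in this abstract can be illustrated with a small sketch. The helper names `k_fold_indices` and `leave_one_out_indices` are hypothetical, the folds are contiguous and unshuffled for simplicity, and nothing here is taken from the thesis itself.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def leave_one_out_indices(n_samples):
    """Leave-one-out is the special case of k-fold with k equal to n_samples."""
    return k_fold_indices(n_samples, n_samples)
```

With n samples, leave-one-out requires n model fits, whereas k-Fold requires only k, which is where the difference in computing time comes from. In practice the sample order would usually be shuffled before splitting.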
64	Detekce fibrilace síní v krátkodobých EKG záznamech / Detection of atrial fibrillation in short-term ECG Ambrožová, Monika January 2019 (has links) Atrial fibrillation is diagnosed in 1-2% of the population, and a significant increase in the number of patients with this arrhythmia is expected in the coming decades in connection with the aging of the population and the higher incidence of some diseases that are considered risk factors for atrial fibrillation. The aim of this work is to describe the problem of atrial fibrillation and the methods that allow its detection in the ECG record. The first part of the work covers the theory of cardiac physiology and atrial fibrillation, together with a basic description of the detection of atrial fibrillation. The practical part describes software for the detection of atrial fibrillation provided by the BTL company. Furthermore, an atrial fibrillation detector is designed. Several parameters were selected to detect the variation of RR intervals: standard deviation, coefficients of skewness and kurtosis, coefficient of variation, root mean square of the successive differences, normalized absolute deviation, normalized absolute difference, median absolute deviation, and entropy. Three different classification models were used: support vector machine (SVM), k-nearest neighbor (KNN), and discriminant analysis classification. The SVM classification model achieves the best results. Results of success indicators: sensitivity 67.1%; specificity 97.0%; F-measure 66.8%; accuracy 92.9%.
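Several of the RR-interval variability parameters listed in this abstract have standard definitions and can be sketched in a few lines. The function name `rr_features` and the choice of the sample standard deviation are assumptions made for illustration; the thesis may define or normalize these quantities differently.

```python
import math
import statistics

def rr_features(rr):
    """Variability features of a series of RR intervals (seconds, at least two values)."""
    mean = statistics.fmean(rr)
    sd = statistics.stdev(rr)                       # sample standard deviation
    diffs = [b - a for a, b in zip(rr, rr[1:])]     # successive differences
    rmssd = math.sqrt(statistics.fmean([d * d for d in diffs]))
    med = statistics.median(rr)
    mad = statistics.median([abs(x - med) for x in rr])  # median absolute deviation
    return {
        "sd": sd,
        "cv": sd / mean,      # coefficient of variation
        "rmssd": rmssd,       # root mean square of successive differences
        "mad": mad,
    }
```

A perfectly regular rhythm gives zero for all four features, while the irregular RR intervals typical of atrial fibrillation push them up, which is what makes them usable as classifier inputs.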
65	Adaptivní klient pro sociální síť Twitter / Adaptive Client for Twitter Social Network Guňka, Jiří January 2011 (has links) The goal of this term project is to create a user-friendly Twitter client. It may use machine learning methods, such as a naive Bayes classifier, to point out new tweets of interest. Hyperbolic trees and some other methods will be used to visualize these tweets.
66	Αναγνώριση βασικών κινήσεων του χεριού με χρήση ηλεκτρομυογραφήματος / Recognition of basic hand movements using electromyography Σαψάνης, Χρήστος 13 October 2013 (has links) Ο στόχος αυτής της εργασίας ήταν η αναγνώριση έξι βασικών κινήσεων του χεριού με χρήση δύο συστημάτων. Όντας θέμα διεπιστημονικού επιπέδου έγινε μελέτη της ανατομίας των μυών του πήχη, των βιοσημάτων, της μεθόδου της ηλεκτρομυογραφίας (ΗΜΓ) και μεθόδων αναγνώρισης προτύπων. Παράλληλα, το σήμα περιείχε αρκετό θόρυβο και έπρεπε να αναλυθεί, με χρήση του EMD, να εξαχθούν χαρακτηριστικά αλλά και να μειωθεί η διαστασιμότητά τους, με χρήση των RELIEF και PCA, για βελτίωση του ποσοστού επιτυχίας ταξινόμησης. Στο πρώτο μέρος γίνεται χρήση συστήματος ΗΜΓ της Delsys αρχικά σε ένα άτομο και στη συνέχεια σε έξι άτομα με το κατά μέσο όρο επιτυχημένης ταξινόμησης, για τις έξι αυτές κινήσεις, να αγγίζει ποσοστά άνω του 80%. Το δεύτερο μέρος περιλαμβάνει την κατασκευή αυτόνομου συστήματος ΗΜΓ με χρήση του Arduino μικροελεγκτή, αισθητήρων ΗΜΓ και ηλεκτροδίων, τα οποία είναι τοποθετημένα σε ένα ελαστικό γάντι. Τα αποτελέσματα ταξινόμησης σε αυτή την περίπτωση αγγίζουν το 75%. / The aim of this work was to identify six basic movements of the hand using two systems. As this is an interdisciplinary topic, the anatomy of the forearm muscles, biosignals, the method of electromyography (EMG), and methods of pattern recognition were studied. Moreover, the signal contained considerable noise and had to be analyzed using EMD, after which features were extracted and their dimensionality was reduced using RELIEF and PCA to improve the classification success rate. The first part uses a Delsys EMG system, initially on one individual and then on six people, with the average rate of successful classification for these six movements exceeding 80%. The second part involves the construction of an autonomous EMG system using an Arduino microcontroller, EMG sensors, and electrodes, which are arranged in an elastic glove. 
Classification results in this case reached a success rate of 75%. RELIEF αλγόριθμος Αναγνώριση προτύπων 612.76 Biomedical signal analysis RELIEF algorithm Empirical Mode Decomposition (EMD) Principal Component Analysis (PCA) Pattern recognition Arduino Support Vector Machines (SVM) K-nearest neighbor (KNN) Feature selection Electromyography (EMG)
67	Detekce fibrilace síní v EKG / ECG based atrial fibrillation detection Prokopová, Ivona January 2020 (has links) Atrial fibrillation is one of the most common cardiac rhythm disorders characterized by ever-increasing prevalence and incidence in the Czech Republic and abroad. The incidence of atrial fibrillation is reported at 2-4 % of the population, but due to the often asymptomatic course, the real prevalence is even higher. The aim of this work is to design an algorithm for automatic detection of atrial fibrillation in the ECG record. In the practical part of this work, an algorithm for the detection of atrial fibrillation is proposed. For the detection itself, the k-nearest neighbor method, the support vector method and the multilayer neural network were used to classify ECG signals using features indicating the variability of RR intervals and the presence of the P wave in the ECG recordings. The best detection was achieved by a model using a multilayer neural network classification with two hidden layers. Results of success indicators: Sensitivity 91.23 %, Specificity 99.20 %, PPV 91.23 %, F-measure 91.23 % and Accuracy 98.53 %.
68	Neue Indexingverfahren für die Ähnlichkeitssuche in metrischen Räumen über großen Datenmengen Guhlemann, Steffen 08 April 2016 (has links) Ein zunehmend wichtiges Thema in der Informatik ist der Umgang mit Ähnlichkeit in einer großen Anzahl unterschiedlicher Domänen. Derzeit existiert keine universell verwendbare Infrastruktur für die Ähnlichkeitssuche in allgemeinen metrischen Räumen. Ziel der Arbeit ist es, die Grundlage für eine derartige Infrastruktur zu legen, die in klassische Datenbankmanagementsysteme integriert werden könnte. Im Rahmen einer Analyse des State of the Art wird der M-Baum als am besten geeignete Basisstruktur identifiziert. Dieser wird anschließend zum EM-Baum erweitert, wobei strukturelle Kompatibilität mit dem M-Baum erhalten wird. Die Abfragealgorithmen werden im Hinblick auf eine Minimierung notwendiger Distanzberechnungen optimiert. Aufbauend auf einer mathematischen Analyse der Beziehung zwischen Baumstruktur und Abfrageaufwand werden Freiheitsgrade in Baumänderungsalgorithmen genutzt, um Bäume so zu konstruieren, dass Ähnlichkeitsanfragen mit einer minimalen Anzahl an Anfrageoperationen beantwortet werden können. / A topic of growing importance in computer science is the handling of similarity in multiple heterogeneous domains. Currently there is no common infrastructure to support this for general metric spaces. The goal of this work is to lay the foundation for such an infrastructure, which could be integrated into classical database management systems. After an analysis of the state of the art, the M-Tree is identified as the most suitable base and is enhanced in multiple ways to the EM-Tree while retaining structural compatibility. The query algorithms are optimized to reduce the number of necessary distance calculations. 
On the basis of a mathematical analysis of the relation between the tree structure and the query performance, degrees of freedom in the tree edit algorithms are used to build trees optimized for answering similarity queries using a minimal number of distance calculations. info:eu-repo/classification/ddc/004 ddc:004
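Index structures for metric spaces generally save distance calculations by exploiting the triangle inequality. The sketch below is not the M-Tree or EM-Tree from this thesis; it is a minimal single-pivot range search, with the hypothetical names `range_search` and `pivot_dists`, that only illustrates the pruning idea such trees apply at every node.

```python
def range_search(query, radius, items, pivot, dist, pivot_dists):
    """Return (hits, computed): items within `radius` of `query` and the
    number of distance evaluations that were actually performed.

    pivot_dists[i] must hold the precomputed dist(items[i], pivot).
    """
    d_query_pivot = dist(query, pivot)
    hits, computed = [], 1
    for item, d_item_pivot in zip(items, pivot_dists):
        # Triangle inequality: |d(q, p) - d(i, p)| <= d(q, i).
        # If this lower bound already exceeds the radius, the item
        # cannot qualify and its distance need not be computed.
        if abs(d_query_pivot - d_item_pivot) > radius:
            continue
        computed += 1
        if dist(query, item) <= radius:
            hits.append(item)
    return hits, computed
```

The only requirement on `dist` is that it is a metric, which is why the same idea carries over to strings, images, or any other domain with a valid distance function.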
69	Topics in random matrices and statistical machine learning / ランダム行列と統計的機械学習について Sushma, Kumari 25 September 2018 (has links) 京都大学 / 0048 / 新制・課程博士 / 博士(理学) / 甲第21327号 / 理博第4423号 / 新制||理||1635(附属図書館) / 京都大学大学院理学研究科数学・数理解析専攻 / (主査)准教授 COLLINS,Benoit Vincent Pierre, 教授泉正己, 教授日野正訓 / 学位規則第4条第1項該当 / Doctor of Science / Kyoto University / DFAM Wishart matrices $(m, n, \beta)$-Laguerre matrices compound Wishart matrices joint eigenvalue density gap probability inverse moments finiteness condition $k$-nearest neighbor rule Stone's theorem metrically sigma-finite dimensional space Nagata dimension generalized Stone's lemma, Preiss' result weak and strong consistency 400
70	Classification of Radar Emitters Based on Pulse Repetition Interval using Machine Learning Svensson, André January 2022 (has links) In electronic warfare, one of the key technologies is radar. Radar is used to detect and identify unknown aerial, nautical or land-based objects. An attribute of a pulsed radar signal is the Pulse Repetition Interval (PRI), which is the time interval between pulses in a pulse train. In a passive radar receiver system, the PRI can be used to recognize the emitter system. Correct classification of emitter systems is a crucial part of Electronic Support Measures (ESM) and Radar Warning Receivers (RWR) in order to deploy appropriate measures depending on the emitter system. Inaccurate predictions of emitter systems can have lethal consequences, and variables such as time and confidence in the predictions are essential for an effective predictive method. Due to the classified nature of military systems and techniques, there are no industry standard systems or techniques that perform quick and accurate classifications of emitter systems based on PRI. Therefore, methods that allow for fast and accurate predictions based on PRI are highly desirable and worthy of research. This thesis explores and compares the capabilities of two machine learning methods for the task of classifying emitters based on received PRI. The first method is an attention based model which performs well throughout all levels of realistic noise and is quick to learn and even quicker to give accurate predictions. The second method is a K-Nearest Neighbor (KNN) implementation that, while performing well for noise-free PRI, finds its performance degrading as the amount of noise increases. An additional outcome of this thesis is the development of a system to generate samples in an automated fashion. 
The attention based model performs well, achieving a macro average F1-score of 63% in the 59-class recognition task, whereas the performance of the KNN is lower, achieving a macro average F1-score of 43%. Future research could be conducted with the purpose of designing a better attention based model for producing more accurate and more confident predictions and designing algorithms to reduce the time complexity of the KNN implementation. / En av de viktigaste teknikerna inom telekrig är radarn. Radar används för att upptäcka och identifiera okända, luftburna, sjögående eller landbaserade föremål. En komponent av radar är Pulsrepetitionsinterval (Pulse Repetition Intervall, PRI) som beskrivs som tidsintervallet mellan två inkommande pulser. I ett radarvarnar system (Radar Warning Receiver, RWR) kan PRI användas för att identifiera radarsystem. Korrekt identifiering av radarsystem är en viktig uppgift för elektroniska understödsmedel (Electronic Support Measures, ESM) med syfte att tillsätta lämpliga medel beroende på radarsystemet i fråga. Icke tillförlitlig identifiering av radarsystem kan ha dödliga konsekvenser och variabler som tid och säkerhet i identifieringen är avgörande för ett effektivt system. Då dokumentation och specifikationer för militära system i regel är hemligstämplade är det svårt att utröna någon typ av industristandard för att utföra snabb och säker klassificering av radarsystem baserat på PRI. Därför är det av stort intresse detta område och möjligheterna för sådana lösningar utforskas. Detta examensarbete utforskar och jämför förmågorna hos två maskininlärningsmetoder i avseende att korrekt identifiera radarsändare baserat på genererat PRI. Den första metoden är ett djupt neuralt nätverk som använder sig av tekniken ”attention”. Det djupa nätverket presterar bra för alla brusnivåer och lär sig snabbt att känna igen attributen hos PRI som kännetecknar vilken radarsändare och som efter träning dessutom är snabb på att korrekt identifiera PRI. 
Den andra metoden är en K-Nearest Neighbor implementation som förvisso presterar bra på icke brusig data men vars förmåga försämras allt eftersom brusnivåerna ökar. Ett ytterligare resultat av arbetet är utvecklingen och implementationen av en metod för att specificera PRI och sedan generera PRI efter specifikation. Attention modellen genererar bra prediktioner för data bestående av 59 klasser, med ett F1-score snitt om 63% medan KNN-implementationen för samma uppgift har en lägre träffsäkerhet med ett F1-score snitt om 43%. Vidare forskning kan innefatta utökad utveckling av det djupa, neurala nätverket i syfte att förbättra dess förmåga för identifiering och metoder för att minimera tidsåtgången för KNN implementationen. Radar Warning Systems Pulse Repetition Interval Artificial Neural Networks Attention Transformers K-Nearest Neighbour Classification Data Generation Radar Temporal Sequences Artificiella Neurala Nätverk Tidsserie data Puls Repetitions Intervall Attention Transformers K-Nearest Neighbor Klassificering Datagenerering Radar Computer and Information Sciences Data- och informationsvetenskap
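The macro average F1-score reported in this abstract is the unweighted mean of the per-class F1-scores, so each of the 59 classes counts equally regardless of how many samples it has. The following is a minimal sketch of that standard definition; the function name `macro_f1` is hypothetical, and assigning a score of zero to a class with no true or predicted samples is an assumption.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class has the same weight, a model that performs poorly on a few rare emitter classes is penalized more under macro averaging than under plain accuracy.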

Search results