Global ETD Search

21	Méthodes ensembliste pour des problèmes de classification multi-vues et multi-classes avec déséquilibres / Tackling the uneven views problem with cooperation based ensemble learning methods Koco, Sokol 16 December 2013 Nowadays, in many fields, such as bioinformatics or multimedia, data may be described using different sets of features, also called views. For a given classification task, we distinguish two types of views: strong views, which are suited for the task, and weak views, which are suited for a (small) part of the task. In multi-class learning, a view can be strong with respect to some (few) classes and weak for the rest of the classes: these are imbalanced views. The work presented in this thesis falls in the supervised learning setting, and its aim is to address the problem of multi-view learning under strong, weak and imbalanced views, grouped together under the notion of uneven views. The first contribution of this thesis is a theoretically grounded multi-view learning algorithm based on the multi-class boosting framework used by AdaBoost.MM. The second part of this thesis proposes a unifying framework for supervised learning methods for imbalanced classes (where some classes are more represented than others). In the third part, we tackle the uneven views problem by combining the imbalanced classes framework with the between-views cooperation used to take advantage of the multiple views. To test the proposed methods on real-world data, we consider the task of phone call classification, which was the subject of the ANR DECODA project. Each part of this thesis thus deals with a different aspect of the problem. Machine learning Supervised learning Multi-view learning Uneven views Ensemble methods Between-views cooperation Confusion matrix Imbalanced classes Boosting 004
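The idea of an imbalanced view, strong for some classes and weak for others, can be made concrete with per-class recall read off each view's confusion matrix. The sketch below is only an illustration under stated assumptions: the function names, the 0.7 threshold and the matrices are hypothetical, and it is not the thesis's boosting algorithm (which builds on AdaBoost.MM).

```python
from typing import Dict, List

Matrix = List[List[int]]  # rows: true class, columns: predicted class


def per_class_recall(cm: Matrix) -> List[float]:
    """Recall of each class: correct predictions (diagonal) over the row total."""
    recalls = []
    for i, row in enumerate(cm):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def view_strength(cm: Matrix, threshold: float = 0.7) -> Dict[int, str]:
    """Label each class 'strong' or 'weak' for one view (the threshold is hypothetical)."""
    return {c: ("strong" if r >= threshold else "weak")
            for c, r in enumerate(per_class_recall(cm))}


# Hypothetical 3-class confusion matrices from classifiers trained on two views.
view_a = [[45, 3, 2], [20, 25, 5], [18, 12, 20]]
view_b = [[15, 20, 15], [4, 44, 2], [3, 2, 45]]

print(view_strength(view_a))  # {0: 'strong', 1: 'weak', 2: 'weak'}
print(view_strength(view_b))  # {0: 'weak', 1: 'strong', 2: 'strong'}
```

Neither hypothetical view is strong for every class, yet together they cover all three. That complementarity is the kind of situation in which cooperation between views can help.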
22	Srovnání vybraných klasifikačních metod pro vícerozměrná data / Comparison of selected classification methods for multivariate data Stecenková, Marina January 2012 The aim of this thesis is to compare selected classification methods: logistic regression (binary and multinomial), the multilayer perceptron, and the classification trees CHAID and CRT. The first part reviews the theoretical basis of these methods and explains the meaning of the model parameters. The next part applies these classification methods to six data sets and compares their outputs. Particular emphasis is placed on rating the discriminatory power of the models, to which a separate chapter is devoted. The discriminatory power of a model is rated by its overall accuracy, its F-measure, and the area under the ROC curve. The contribution of this work is not only a comparison of the selected classification methods using statistical measures of discriminatory power, but also an overview of the strengths and weaknesses of each method.
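The three criteria used to rate discriminatory power can be computed directly from a model's predictions. Here is a minimal plain-Python sketch for a binary task, assuming class 1 is the positive class and that the classifier outputs a score per case; the data are hypothetical. ROC AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]          # hypothetical predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, fn, tn), f_measure(tp, fp, fn), roc_auc(y_true, scores))
```

Accuracy and the F-measure depend on the chosen threshold, while AUC uses only the ordering of the scores. This is why the three criteria can rank models differently.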
23	Využití umělé inteligence ve vibrodiagnostice / Utilization of artificial intelligence in vibrodiagnostics Dočekalová, Petra January 2021 The diploma thesis deals with machine learning, expert systems, fuzzy logic, genetic algorithms, neural networks and chaos theory, all of which fall under artificial intelligence. The aim of this work is to describe and implement three different classification methods and to use them to process the data set. The GNU Octave software environment was chosen for licensing reasons. The thesis then evaluates the success of the data classification, including visualization. Using three different classification methods allows their results on the processed data to be compared with one another.
24	Machine Learning based Predictive Data Analytics for Embedded Test Systems Al Hanash, Fayad January 2023 Organizations gather enormous amounts of data and analyze them to extract insights that help them make better decisions. Predictive data analytics is a crucial subfield of data analytics that aims to make accurate predictions; it extracts insights from data by using machine learning algorithms. This thesis applies supervised learning algorithms to perform predictive data analytics in the Embedded Test System at the company Nordic Engineering Partner. Predictive maintenance, a concept often used in manufacturing industries, refers to predicting asset failures before they occur. The machine learning algorithms used in this thesis are support vector machines, multi-layer perceptrons, random forests, and gradient boosting. Both binary and multi-class classifiers were fitted, and cross-validation, sampling techniques, and a confusion matrix were used to measure their performance accurately. In addition to accuracy, the recall, precision, F1 score, kappa, Matthews correlation coefficient (MCC), and ROC AUC were measured. The fitted prediction models achieve high accuracy. Machine learning Artificial Intelligence Predictive data analytics Embedded test systems Confusion matrix Predictive maintenance Support vector machines Random forest Gradient Boosting Multi-layer perceptron Binary classification Multi-class classification Computer Sciences
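Two of the less familiar measures listed above, kappa and MCC, both summarize a binary confusion matrix in a way that corrects for class imbalance better than raw accuracy. The following is a minimal sketch, assuming "kappa" refers to Cohen's kappa and that the counts (hypothetical here) come from a binary pass/fail classifier:

```python
import math


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a binary confusion matrix: agreement corrected for chance."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement: product of predicted and actual marginals, summed over classes.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if p_expected == 1:
        return float("nan")  # kappa is undefined when chance agreement is perfect
    return (p_observed - p_expected) / (1 - p_expected)


def matthews_cc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for a pass/fail test-outcome classifier.
tp, fp, fn, tn = 40, 10, 5, 45
print(round(cohen_kappa(tp, fp, fn, tn), 3))  # 0.7
print(round(matthews_cc(tp, fp, fn, tn), 3))
```

In practice, scikit-learn's `sklearn.metrics.cohen_kappa_score` and `sklearn.metrics.matthews_corrcoef` compute the same quantities directly from label arrays.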
25	Predicting Customer Churn in a Subscription-Based E-Commerce Platform Using Machine Learning Techniques Aljifri, Ahmed January 2024 This study investigates the performance of the Logistic Regression, k-Nearest Neighbors (KNN), and Random Forest algorithms in predicting customer churn on an e-commerce platform. These algorithms were chosen because of the particular characteristics of the dataset and the distinct perspective and value each algorithm provides. Iterative model examinations, encompassing preprocessing techniques, feature engineering, and rigorous evaluations, were conducted. Logistic Regression showed moderate predictive capability but lagged in accurately identifying potential churners because of its assumption of linearity between the log odds and the predictors. KNN emerged as the most accurate classifier, achieving the highest sensitivity and specificity (98.22% and 96.35%, respectively) and outperforming the other models. Random Forest, with a sensitivity of 91.75% and a specificity of 95.83%, did well on specificity but lagged slightly in sensitivity. Feature importance analysis highlighted "Tenure" as the most impactful variable for churn prediction. The effect of preprocessing techniques differed across models, emphasizing the importance of tailored preprocessing. The study's findings underscore the significance of continuous model refinement and optimization in addressing complex business challenges such as customer churn. The insights serve as a foundation for businesses to implement targeted retention strategies, mitigate customer attrition, and promote growth on e-commerce platforms. Customer churn prediction E-commerce Machine learning algorithms Logistic Regression k-Nearest Neighbors (KNN) Random Forest Feature engineering Preprocessing techniques Model evaluation performance measures supervised machine learning classification confusion matrix. Computer Sciences
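Sensitivity and specificity, the two figures reported for KNN and Random Forest, come straight from the binary confusion matrix. Sensitivity is the share of actual churners the model flags. Specificity is the share of non-churners it correctly leaves alone. Below is a minimal sketch, assuming string labels with "churn" as the positive class; the labels and example data are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                            positive: Hashable = "churn") -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # churner correctly flagged
            else:
                fn += 1  # churner missed
        elif p == positive:
            fp += 1  # loyal customer wrongly flagged
        else:
            tn += 1  # loyal customer correctly left alone
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical labels for nine customers.
y_true = ["churn"] * 4 + ["stay"] * 5
y_pred = ["churn", "churn", "churn", "stay", "stay", "stay", "stay", "stay", "churn"]
print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.8)
```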
26	Klientų duomenų valdymas bankininkystėje / Client data management in banking Žiupsnys, Giedrius 09 July 2011 This work analyzes regularities in the historical credit data of bank clients. First, the bank's data repositories are examined in order to understand the banking data as thoroughly as possible. Then, using bank data sets that describe credit repayment history, data mining algorithms and software are adapted to estimate clients' insolvency risk. The analysis begins with preprocessing and preparing the information for data mining. Various classification algorithms are then applied to build models that classify the data sets and identify insolvent clients as accurately as possible. Besides classification, regression algorithms are used to build prediction models that estimate how many days a client is late in repaying a loan. Finally, data mart models and a data flow schema are presented, together with the classification and regression algorithms and models that gave the best estimates for the given data sets. Data mining Data mart Credit risk estimation Classification Regression Cross validation Confusion matrix Linear regression Classification rule Decision tree
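Cross validation, listed among the keywords, estimates how well a model generalizes by rotating which part of the data is held out for testing. The following is a minimal k-fold sketch in plain Python. The `fit_predict` callback interface and the majority-class baseline are hypothetical stand-ins for the thesis's actual models, and the solvent/insolvent data are made up.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1, split into k folds, and return (train_idx, test_idx) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def cross_val_accuracy(X: Sequence, y: Sequence, fit_predict: Callable,
                       k: int = 5, seed: int = 0) -> float:
    """Mean test accuracy over k folds; fit_predict(X_train, y_train, X_test) -> labels."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_class(X_train, y_train, X_test):
    """Hypothetical baseline: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Hypothetical clients: one numeric feature each, 8 solvent and 2 insolvent.
X = [[float(i)] for i in range(10)]
y = ["solvent"] * 8 + ["insolvent"] * 2
print(cross_val_accuracy(X, y, majority_class, k=5))  # 0.8
```

The baseline reaches 0.8 accuracy without ever identifying an insolvent client. On imbalanced credit data, per-class results from a confusion matrix are therefore more informative than accuracy alone.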
27	Engineering Ecosystems of Systems: UML Profile, Credential Design, and Risk-balanced Cellular Access Control Bissessar, David 14 December 2021 This thesis proposes an Ecosystem perspective for the engineering of Systems of Systems (SoS) and Cyber-Physical Systems (CPS) and illustrates the impact of this perspective in three areas of contribution. First, from a conceptual and Systems Engineering perspective, it provides a conceptual framework that includes the Ecosystems of Systems Unified Modeling Language (EoS-UML) profile, a set of Ecosystem Ensemble Diagrams, the Arms Length Trust Model, and the Cyber Physical Threat Model. Second, having established this conceptual view of the ecosystem, we recognize the unique role of cryptographic credentials within it: they enable the ecosystem's long-term value proposition and act as value transfer agents, implementing a careful balance of properties to meet stakeholder needs. Third, we propose that the ecosystem's computers can be used as a distributed compute engine to run Collaborative Algorithms. To demonstrate this, we define an access control scheme, risk-balanced Cellular Access Control (rbCAC). The rbCAC algorithm defines access control within a cyber-physical environment in a manner that balances cost, risk, and net utility in a multi-authority setting. rbCAC is demonstrated in an Air Travel and Border Services scenario. Other domains are also discussed, including air traffic control, where drone identity attacks in protected airspace must be prevented. These contributions offer significant material for future development and for ongoing credential and ecosystem design, including dynamic perimeters and continuous-time sampling, intelligent and self-optimizing ecosystems, runtime collaborative platform design contracts and constraints, and the analysis of APT attacks on SCADA systems using ecosystem approaches.
cryptography digital credentials biometrics fuzzy extractors ecosystems systems engineering UML EoS-UML digital credential design rbCAC SoS Systems of Systems CPS Cyber-physical Systems Distributed Computing Collaborative Computing Design by Contract Design by Smart Contract Ecosystem Ensemble Diagram Confusion Matrix Classifier Evaluation Emergent Behavior Ecosystems Engineering
28	Klasifikace emailové komunikace / Classification of eMail Communication Piják, Marek January 2018 This diploma thesis centers on creating a classifier that recognizes the email communication received daily by Topefekt s.r.o. and assigns each message to a class. The project implements several of the most commonly used classification methods, including machine learning methods, and the thesis includes an evaluation comparing all of the methods used.
29	Využití umělé inteligence v technické diagnostice / Utilization of artificial intelligence in technical diagnostics Konečný, Antonín January 2021 The diploma thesis focuses on using artificial intelligence methods to evaluate the fault condition of machinery. The evaluated data come from a vibrodiagnostic model for simulating static and dynamic unbalance. Machine learning methods are applied, specifically supervised learning. The thesis describes the Spyder software environment and its alternatives, as well as the Python programming language in which the scripts are written. It contains an overview and description of the libraries (Scikit-learn, SciPy, Pandas ...) and of the methods: K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees (DT) and Random Forest classifiers (RF). The classification results of each method are visualized as a confusion matrix. The appendix includes the scripts written for feature engineering, hyperparameter tuning, evaluation of learning success, and classification with visualization of the results.
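The KNN method and the per-method confusion matrix can be sketched without any libraries. The thesis works with Scikit-learn, whose `sklearn.neighbors.KNeighborsClassifier` and `sklearn.metrics.confusion_matrix` cover the same ground. This plain-Python version only illustrates the mechanics, and the two vibration features and their values are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

Point = Sequence[float]


def knn_predict(X_train: List[Point], y_train: List[str], X_test: List[Point],
                k: int = 3) -> List[str]:
    """Predict each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        # Euclidean distance to every training point, nearest first.
        nearest = sorted(zip((math.dist(x, xt) for xt in X_train), y_train))[:k]
        votes = Counter(label for _, label in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


def confusion_matrix(y_true: List[str], y_pred: List[str],
                     labels: List[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes, in the order of `labels`."""
    index = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


# Hypothetical 2-D vibration features per measurement and the rotor's unbalance state.
X_train = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2),   # balanced
           (1.0, 0.2), (1.1, 0.3), (0.9, 0.1),   # static unbalance
           (0.5, 1.0), (0.6, 1.1), (0.4, 0.9)]   # dynamic unbalance
y_train = ["balanced"] * 3 + ["static"] * 3 + ["dynamic"] * 3
X_test = [(0.15, 0.15), (1.0, 0.25), (0.5, 1.05), (0.7, 0.3)]
y_test = ["balanced", "static", "dynamic", "dynamic"]

labels = ["balanced", "static", "dynamic"]
y_pred = knn_predict(X_train, y_train, X_test)  # the last point is misread as static
for label, row in zip(labels, confusion_matrix(y_test, y_pred, labels)):
    print(f"{label:>9}", row)
```

Off-diagonal cells show which fault states a method confuses. Here, one dynamic-unbalance case lands in the static column.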
30	Moderní řečové příznaky používané při diagnóze chorob / State of the art speech features used during the Parkinson disease diagnosis Bílý, Ondřej January 2011 This work deals with the diagnosis of Parkinson's disease by analyzing the speech signal. It begins by describing how the speech signal is produced, followed by speech signal analysis, signal preparation, and subsequent feature extraction. It then describes Parkinson's disease and the changes the disease causes in the speech signal, followed by the features used to diagnose Parkinson's disease (FCR, VSA, VOT, etc.). Another part of the work deals with feature selection and reduction using learning algorithms (SVM, ANN, k-NN) and their subsequent evaluation. The last part of the thesis describes a program that computes the features, followed by the feature selection and, finally, the evaluation of all results.
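Two of the features named above, the vowel space area (VSA) and the formant centralization ratio (FCR), have commonly used closed-form definitions. Both use the first two formants (F1, F2) of the corner vowels /a/, /i/ and /u/. The sketch below assumes those formants have already been measured in Hz. The values are hypothetical, and the thesis's feature program may compute the features differently. By construction, FCR increases and VSA shrinks as the vowels centralize.

```python
from dataclasses import dataclass


@dataclass
class Formants:
    """First two formant frequencies of one vowel, in Hz."""
    f1: float
    f2: float


def triangular_vsa(a: Formants, i: Formants, u: Formants) -> float:
    """Area (Hz^2) of the /a/-/i/-/u/ triangle in the F1-F2 plane (shoelace formula)."""
    return abs(i.f1 * (a.f2 - u.f2) + a.f1 * (u.f2 - i.f2) + u.f1 * (i.f2 - a.f2)) / 2


def fcr(a: Formants, i: Formants, u: Formants) -> float:
    """Formant centralization ratio: (F2u + F2a + F1i + F1u) / (F2i + F1a)."""
    return (u.f2 + a.f2 + i.f1 + u.f1) / (i.f2 + a.f1)


# Hypothetical formant values for one speaker.
a, i, u = Formants(800.0, 1300.0), Formants(300.0, 2300.0), Formants(350.0, 800.0)
print(triangular_vsa(a, i, u))  # 350000.0
print(round(fcr(a, i, u), 3))   # 0.887
```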
