Global ETD Search

11	System Complexity Reduction via Feature Selection January 2011 (has links) abstract: This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset then can be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve the classification interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear, Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features, and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of time series, and interpretable features can be extracted. These features can be further reduced, and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. 
It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called a pForest. Experimental results show the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages. / Dissertation/Thesis / Ph.D. Industrial Engineering 2011 Industrial Engineering Artificial Intelligence Information Technology associative classification attribute importance feature selection random forest time series classification
12	A machine learning approach for automatic and generic side-channel attacks Lerman, Liran 10 June 2015 (has links) The ubiquity of interconnected devices has generated massive interest in computer security, which is provided in part by the field of cryptography. For decades, cryptographers assessed the security level of a cryptographic algorithm independently of its implementation in a device. However, since the publication of implementation attacks in 1996, physical attacks have become an active research area that considers the physical properties of cryptographic devices. In this dissertation, we focus on profiled attacks. Traditionally, profiled attacks apply parametric methods in which a priori information about the physical properties is assumed. The field of machine learning produces automatic and generic models that require no a priori information about the phenomenon under study. This dissertation sheds new light on the capabilities of machine learning methods. We first demonstrate that parametric profiled attacks outperform machine learning methods when there is neither estimation error nor assumption error. In contrast, attacks based on machine learning are advantageous in realistic scenarios where the amount of data available during the learning step is small. We then propose a new formal evaluation metric that makes it possible (1) to compare parametric and non-parametric attacks and (2) to interpret the results of each method. The new measure identifies the causes of a high or low success rate of an attack and therefore suggests ways to improve the evaluation of an implementation. 
Finally, we present experimental results on unprotected and protected devices. The first study shows that machine learning has a higher success rate than a parametric method when only a few data are available. The second experiment demonstrates that a protected device can be attacked with a machine learning approach. The machine-learning-based strategy requires the same amount of data during the learning phase as when it attacks an unprotected product. We also show that parametric methods overestimate or underestimate the security level provided by the device, whereas the machine-learning-based approach improves this estimate. In summary, our thesis is that machine-learning-based attacks are advantageous over classical techniques when both the amount of a priori information about the target device and the amount of data in the learning phase are small. / Doctorate in Sciences / info:eu-repo/semantics/nonPublished General computer science Cryptography Machine learning time series classification cryptanalysis cryptography side-channel attack machine learning power analysis
13	Investigation of Machine Learning Methods for Anomaly Detection and Characterisation of Cable Shoe Pressing Processes Härenby Deak, Elliot January 2021 (has links) The ability to reliably connect electrical cables is important in many applications. A poor connection can become a fire hazard, so it is important that cables are always appropriately connected. This thesis investigates methods for monitoring a machine that presses cable connectors onto cables. Using sensor data from the machine, would it be possible to create an algorithm that can automatically identify the cable and connector and thus make decisions on how a connector should be pressed for successful attachment? Furthermore, would it be possible to create an anomaly detection algorithm that is able to detect whether a connector has been incorrectly pressed by the end user? If these two questions can be addressed, the solutions would minimise the likelihood of errors and enable detection of the errors that do arise anyway. In this thesis, it is shown that the k-Nearest Neighbour (kNN) algorithm and the Long Short-Term Memory (LSTM) network are both successful in classifying connectors and cables, both performing with 100% accuracy on the test set. The LSTM is the more promising alternative in terms of convergence and speed, being 28 times faster as well as requiring less memory. Distance-based methods and an autoencoder are investigated for the anomaly detection task. Data corresponding to a wide variety of possible incorrect uses of the tool were collected. The best anomaly detector detects 92% of incorrect cases of varying degrees of difficulty, a number which was higher than expected. On the tasks investigated, the performance of the neural networks is equal to or higher than the performance of the alternative methods. Machine Learning LSTM kNN Time-series Classification Anomaly Detection Other Electrical Engineering and Electronics
14	Classicação de séries temporais utilizando diferentes representações de dados e ensembles / Time series classification using multiple representations and ensembles Giusti, Rafael 23 August 2017 (has links) Temporal data are ubiquitous in nearly all areas of human knowledge. The research field known as machine learning has contributed to temporal data mining with algorithms for classification, clustering, anomaly or exception detection, and motif detection, among others. These algorithms often rely on a distance function that must be capable of expressing a similarity concept among the data. One of the most important classification models, the 1-NN, employs a distance function when comparing a time series of interest against a reference set, and assigns to the former the label of the most similar reference time series. There are, however, several domains in which the temporal data are insufficient to characterize neighbors according to the concepts associated with the classes. One possible approach to this problem is to transform the time series into a representation domain in which the meaningful attributes for the classifier are more clearly expressed. For instance, a time series may be decomposed into periodic components of different frequency and amplitude values. For several applications, those components are much more meaningful in discriminating the classes than the temporal evolution of the original observations. 
In this work, we employ diversity of representation and distance functions for the classification of time series. By choosing a data representation that is more suitable to express the discriminating characteristics of the domain, we are able to achieve classifications that are more faithful to the target concept. With this goal in mind, we promote a study of time series representation domains, and we evaluate how such domains can provide alternative decision spaces. Different models of the 1-NN classifier are evaluated both in isolation and combined in classification ensembles in order to construct more robust classifiers. We also use distance functions and alternative representation domains in order to extract nontemporal attributes, known as distance features. Distance features reflect neighborhood concepts of the instances with respect to the training samples, and they may be used to induce classification models which are typically not as effective when trained with the original time series observations. We show that distance features allow for classification results compatible with the state of the art. Artificial intelligence Machine learning Time series Time series classification Time series representation
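As a rough illustration of the two ideas in the abstract above, the following Python sketch shows a 1-NN classifier driven by a distance function, and the construction of distance features (one feature per training series). The Euclidean distance, the function names, and the toy data are assumptions made for this example; the thesis evaluates several distance functions and representation domains that are not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_nn(query, train_series, train_labels, dist=euclidean):
    """Return the label of the training series closest to the query."""
    best = min(range(len(train_series)),
               key=lambda i: dist(query, train_series[i]))
    return train_labels[best]


def distance_features(series, train_series, dist=euclidean):
    """Map a series to a non-temporal vector: its distance to each training series."""
    return [dist(series, ref) for ref in train_series]


# Hypothetical toy data: two "low" series and one "high" series.
train = [[0, 0, 1, 0], [0, 1, 0, 0], [5, 5, 6, 5]]
labels = ["low", "low", "high"]
query = [4, 5, 5, 5]

pred = one_nn(query, train, labels)
feats = distance_features(query, train)
```

The vector `feats` has one entry per training series, so it could be passed to any conventional classifier in place of the raw observations.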
16	Sequence classification on gamified behavior data from a learning management system : Predicting student outcome using neural networks and Markov chain Elmäng, Niclas January 2020 (has links) This study investigated whether it is possible to classify time series data originating from a gamified learning management system. Using school data provided by the gamification company Insert Coin AB, the aim was to distribute the teacher's supervision more efficiently among students who are more likely to fail. The motivation is the possibility of increasing student retention and completion rates. This was done by using long short-term memory (LSTM) networks, convolutional neural networks, and a Markov chain to classify time series of event data. Since the classes are balanced, the classification was evaluated using only the accuracy metric. The neural networks show positive results, but overfitting seems to occur strongly for the convolutional network and less so for the LSTM network. The Markov chain shows potential, but further work is needed to mitigate the problem of a strong correlation between sequence length and likelihood. Long Short-term Memory Convolutional neural network Markov Chain Time series Classification Gamification Information Systems, Social aspects
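To make the Markov chain approach and the sequence-length issue concrete, here is a minimal Python sketch that fits one first-order chain per class and classifies a sequence by its average log-likelihood per transition. Dividing by the number of transitions is one possible way to reduce the dependence of the score on sequence length; it is an assumption of this example and not a method taken from the thesis. The event names, class labels, and smoothing constant `alpha` are hypothetical.

```python
import math


def fit_chain(sequences, states, alpha=1.0):
    """Estimate transition probabilities with additive (Laplace) smoothing."""
    counts = {s: {t: alpha for t in states} for s in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {s: {t: counts[s][t] / sum(counts[s].values()) for t in states}
            for s in states}


def avg_log_likelihood(seq, chain):
    """Log-likelihood per transition; requires at least two events."""
    steps = list(zip(seq, seq[1:]))
    return sum(math.log(chain[a][b]) for a, b in steps) / len(steps)


def classify(seq, chains):
    """Pick the class whose chain gives the highest normalized score."""
    return max(chains, key=lambda label: avg_log_likelihood(seq, chains[label]))


# Hypothetical event vocabulary and training sequences.
states = ["login", "quiz", "idle"]
chains = {
    "pass": fit_chain([["login", "quiz", "quiz", "quiz"],
                       ["login", "quiz", "login", "quiz"]], states),
    "fail": fit_chain([["login", "idle", "idle", "idle"],
                       ["login", "idle", "login", "idle"]], states),
}

prediction = classify(["login", "quiz", "quiz"], chains)
```

Without the division by `len(steps)`, longer sequences would accumulate more negative log terms and score lower regardless of class, which is the kind of length effect the abstract describes.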
17	Diagnosis of Evaporative Emissions Control System Using Physics-based and Machine Learning Methods Yang, Ruochen 24 September 2020 (has links) No description available. Automotive Engineering Mechanical Engineering Electrical Engineering Evaporative Emissions Control System Gasoline Evaporation Physics-based Model and Simulation Temporal Convolutional Network
18	A Transformer-Based Scoring Approach for Startup Success Prediction : Utilizing Deep Learning Architectures and Multivariate Time Series Classification to Predict Successful Companies Halvardsson, Gustaf January 2023 (has links) The Transformer, an attention-based deep learning architecture, has shown promising capabilities in both Natural Language Processing and Computer Vision. Recently, it has also been applied to time series classification, which has traditionally used statistical methods or the Gated Recurrent Unit (GRU). The aim of this project was to apply multivariate time series classification to evaluate Transformer-based models, in comparison with the traditional GRUs. The evaluation was done within the problem of startup success prediction at a venture and private equity firm called EQT. Four different Machine Learning (ML) models – the Univariate GRU, Multivariate GRU, Transformer Encoder, and an already existing implementation, the Time Series Transformer (TST) – were benchmarked using two public datasets and the EQT dataset which utilized an investor-centric data split. The results suggest that the TST is the best-performing model on EQT’s dataset within the scope of this project, with a 47% increase in performance – measured by the Area Under the Curve (AUC) metric – compared to the Univariate GRU, and a 12% increase compared to the Multivariate GRU. It was also the best, and third-best, performing model on the two public datasets. Additionally, the model also demonstrated the highest training stability out of all four models, and 15 times shorter training times than the Univariate GRU. The TST also presented several potential qualitative advantages such as utilizing its embeddings for downstream tasks, an unsupervised learning technique, higher explainability, and improved multi-modal compatibility. 
The project results, therefore, suggest that the TST is a viable alternative to the GRU architecture for multivariate time series classification within the investment domain. With its performance, stability, and added benefits, the TST is certainly worth considering for time series modeling tasks. Machine learning Time Series Classification Transformers Gated Recurrent Unit Venture Capital Computer Sciences
19	Machine learning for detecting financial crime from transactional behaviour Englund, Markus January 2023 (has links) Banks and other financial institutions are to a certain extent obligated to ensure that their services are not utilized for any type of financial crime. This thesis investigates the possibility of analyzing bank customers' transactional behaviour with machine learning to detect whether they are involved in financial crime. The purpose is to see whether a new approach to processing and analyzing transaction data could make financial crime detection more accurate and efficient. Transactions of a customer over a time period are processed to form multivariate time series. These time series are then used as input to different machine learning models for time series classification. The best method involves a transform called Random Convolutional Kernel Transform that extracts features from the time series. These features are then used as input to a logistic regression model that generates probabilities of the different class labels. This method achieves a ROC AUC score of 0.856 when classifying customers as being involved in financial crime or not. The results indicate that the time series models detect patterns in transaction data that connect customers to financial crime, patterns which previously investigated methods have not been able to find. machine learning deep learning financial crime time series time series classification XGBoost Computer and Information Sciences
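The feature-extraction step described in the abstract above can be sketched in a few lines of Python. This is a heavily simplified, univariate illustration of the random convolutional kernel idea: each random kernel is slid over the series and summarized by its maximum output and the proportion of positive outputs. The kernel lengths, dilations, bias range, and function names are assumptions chosen for brevity; they do not reproduce the configuration used in the thesis, which works on multivariate series.

```python
import random


def random_kernels(n_kernels, seed=0):
    """Draw random kernels: mean-centred weights, a bias, and a dilation."""
    rng = random.Random(seed)
    kernels = []
    for _ in range(n_kernels):
        length = rng.choice([3, 5, 7])          # assumed kernel lengths
        weights = [rng.gauss(0, 1) for _ in range(length)]
        mean = sum(weights) / length
        weights = [w - mean for w in weights]
        bias = rng.uniform(-1, 1)
        dilation = rng.choice([1, 2])           # assumed dilations
        kernels.append((weights, bias, dilation))
    return kernels


def apply_kernel(series, kernel):
    """Return (max output, proportion of positive outputs) for one kernel."""
    weights, bias, dilation = kernel
    span = (len(weights) - 1) * dilation        # series must be longer than span
    outputs = [
        bias + sum(w * series[start + j * dilation]
                   for j, w in enumerate(weights))
        for start in range(len(series) - span)
    ]
    ppv = sum(o > 0 for o in outputs) / len(outputs)
    return max(outputs), ppv


def transform(series, kernels):
    """Concatenate the two summary features of every kernel."""
    features = []
    for kernel in kernels:
        features.extend(apply_kernel(series, kernel))
    return features


# Hypothetical toy series of length 30 and ten kernels -> 20 features.
series = [float(i % 5) for i in range(30)]
kernels = random_kernels(10)
feats = transform(series, kernels)
```

The resulting fixed-length vector `feats` is the kind of input that a linear model such as logistic regression can consume, which matches the two-stage pipeline the abstract outlines.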
20	Classification de séries temporelles avec applications en télédétection / Time Series Classification Algorithms with Applications in Remote Sensing Bailly, Adeline 25 May 2018 (has links) Time Series Classification (TSC) has received a significant amount of interest over the past years due to its many real-life applications. In this PhD, we create new algorithms for TSC, with a particular emphasis on Remote Sensing (RS) time series data. We first propose the Dense Bag-of-Temporal-SIFT-Words (D-BoTSW) method, which uses dense local features based on SIFT features, adapted to 1D data and extracted at regular intervals. Extensive experiments show that D-BoTSW significantly outperforms nearly all compared standalone baseline classifiers. 
Then, we propose an enhancement of the Learning Time Series Shapelets (LTS) algorithm called Adversarially-Built Shapelets (ABS), based on the introduction of adversarial time series during the learning process. Adversarial time series provide an additional regularization benefit for the shapelets, and experiments show a performance improvement between the baseline and our proposed framework. Due to the lack of available RS time series datasets, we also present and experiment on two remote sensing time series datasets called TiSeLaC and Brazilian-Amazon. Time Series Classification Bag-of-Words Time Series Shapelets Convolutional Neural Networks Adversarial Examples SIFT features 004.2 510
