Global ETD Search

1	Adaptively-Halting RNN for Tunable Early Classification of Time Series Hartvigsen, Thomas 11 November 2018 Early time series classification is the task of predicting the class label of a time series before it is observed in its entirety. In time-sensitive domains where information is collected over time, it is worth sacrificing some classification accuracy in favor of earlier predictions, ideally early enough for actions to be taken. However, since accuracy and earliness are contradictory objectives, a solution to this problem must find a task-dependent trade-off. There are two common state-of-the-art methods. The first involves an analyst selecting a timestep at which all predictions must be made. This does not capture earliness on a case-by-case basis: if the selected timestep is too early, all later signals are missed, and if a signal happens early, the classifier still waits to generate a prediction. The second method is an exhaustive search for signals, which encodes no timing information and does not scale to high dimensions or long time series. We design EARLIEST, the first early classification model to tackle this multi-objective optimization problem, jointly learning (1) at which timestep to halt and generate predictions and (2) how to classify the time series. Each of these is learned based on the task and data features. We achieve an analyst-controlled balance between the goals of earliness and accuracy by pairing a recurrent neural network that learns to classify time series as a supervised learning task with a stochastic controller network that learns a halting policy as a reinforcement learning task. The halting policy dictates sequential decisions, one per timestep, of whether or not to halt the recurrent neural network and classify the time series early. This pairing of networks optimizes a global objective function that incorporates both earliness and accuracy. 
We validate our method via critical clinical prediction tasks in the MIMIC III database from the Beth Israel Deaconess Medical Center along with another publicly available time series classification dataset. We show that EARLIEST out-performs two state-of-the-art LSTM-based early classification methods. Additionally, we dig deeper into our model's performance using a synthetic dataset which shows that EARLIEST learns to halt when it observes signals without having explicit access to signal locations. The contributions of this work are three-fold. First, our method is the first neural network-based solution to early classification of time series, bringing the recent successes of deep learning to this problem. Second, we present the first reinforcement-learning based solution to the unsupervised nature of early classification, learning the underlying distributions of signals without access to this information through trial and error. Third, we propose the first joint-optimization of earliness and accuracy, allowing learning of complex relationships between these contradictory goals.
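The per-timestep halting decision described in this abstract can be illustrated with a toy sketch. This is not the EARLIEST implementation: the running-average state (standing in for the RNN hidden state), the logistic controller, the parameter names, and the form of the objective are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_with_halting(series, halt_weight, halt_bias, rng, decay=0.5):
    """Read a univariate series one step at a time and stop when a
    stochastic controller samples "halt" (or the series ends).

    The state is a running average that stands in for an RNN hidden
    state (an assumption, not the thesis architecture).
    Returns (halting_step, state), with steps counted from 1.
    """
    if not series:
        raise ValueError("series must not be empty")
    state = 0.0
    for step, value in enumerate(series, start=1):
        state = decay * state + (1.0 - decay) * value
        # The controller turns the current state into a halting probability.
        p_halt = sigmoid(halt_weight * state + halt_bias)
        if rng.random() < p_halt or step == len(series):
            return step, state


def objective(correct, halt_step, length, earliness_weight):
    """Hypothetical global objective: reward a correct label and penalise
    the fraction of the series consumed, weighted by the analyst."""
    return (1.0 if correct else 0.0) - earliness_weight * (halt_step / length)
```

Raising `earliness_weight` makes late halting more costly, which is one simple way to picture the analyst-controlled trade-off; in the thesis the controller is trained with reinforcement learning rather than set by hand.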
2	Difference Histograms: A new tool for time series analysis applied to bearing fault diagnosis van Wyk, BJ, van Wyk, MA, Qi, G 24 December 2008 Abstract A powerful tool for bearing time series feature extraction and classification is introduced that is computationally inexpensive, easy to implement and suitable for real-time applications. In this paper the proposed technique is applied to two rolling element bearing time series classification problems, and it is shown that in some cases no data pre-processing, artificial neural network or nearest neighbour approaches are required. From the results obtained it is clear that, for the specific applications considered, the proposed method performed as well as or better than alternative approaches based on conventional feature extraction. Time series classification Bearing fault diagnosis
3	Concatenated Decision Paths Classification for Time Series Shapelets - A New Approach for One Dimensional Data Classification and its Application Mitzev, Ivan Stefanov 04 May 2018 Time series are very common in presenting collected data such as economic indicators, natural phenomena, and control engineering data, among others. In the last decade, interest in time series data mining has grown as the amount of collected data has increased dramatically. Standard approaches for time series classification are based on distance measures, such as the Euclidean distance (ED) and dynamic time warping (DTW), combined with a 1-NN classifier. Recently, more advanced types of classification were found, introducing primitives (named time series shapelets) that consistently represent a certain class. A time series shapelet is a small sub-section of the entire time series that is “particularly discriminating”. It appears that shapelet-based classification produces higher accuracies on some data sets, because global features are more sensitive to noise than local ones. Despite its advantages, time series shapelet classification has an apparent disadvantage: very slow training. This work attempts to improve the training time of the originally proposed time series shapelet classification algorithm and introduces a new approach for time series classification based on concatenated decision tree paths. First, the classical algorithm for time series classification based on shapelets is significantly improved in terms of training time. The improvement is based on using randomly generated sequences tuned in a particle-swarm-optimization (PSO) environment, instead of using sub-series from the original time series. Second, a new highly accurate classification method, based on concatenated decision tree paths, is introduced. 
The approach builds a unique representative pattern of a certain class based on the paths taken in a pool of decision trees. Third, the proposed method has been successfully extended to a two-class classification problem, where only one decision tree can be built. A variety of two-class decision trees were built based on different splitting criteria (distance to a random shapelet), thus increasing the pool of decision trees and the overall accuracy. Fourth, the proposed method was successfully applied to a two-class image classification problem by converting the image into time series. An accuracy of around 95% was achieved for the pedestrian detection case from the Daimler database. machine learning time series classification shapelets
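The splitting criterion mentioned in this abstract, distance to a shapelet, is commonly computed as the smallest Euclidean distance between the shapelet and any equally long window of the series. The sketch below shows that standard definition and a single decision-tree-style split; the function names and the threshold rule are illustrative assumptions, not the thesis code.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    contiguous window of the series with the same length."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


def split_by_shapelet(dataset, shapelet, threshold):
    """One decision-tree split: series close to the shapelet go left,
    the rest go right. `dataset` is a list of (series, label) pairs."""
    left, right = [], []
    for series, label in dataset:
        side = left if shapelet_distance(series, shapelet) <= threshold else right
        side.append((series, label))
    return left, right
```

A series that contains the shapelet exactly has distance 0, so it always lands on the left branch for any non-negative threshold.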
4	TEST ORACLE AUTOMATION WITH MACHINE LEARNING : A FEASIBILITY STUDY Imamovic, Nermin January 2018 A train is a complex system in which every subsystem has an important role. If a subsystem does not work as it should, the correctness of the whole train becomes uncertain. To ensure that the system works properly, each subsystem should be tested individually and then integrated into the whole system. Each subsystem consists of different modules with different functionalities that should be tested, and testing different functionalities often requires different approaches. Some functionalities require domain knowledge from a human expert, such as the classification of signals in different use cases in Propulsion and Controls (PPC) at Bombardier Transportation. For this reason, we need to simulate the use of expert knowledge in a given domain. We investigate the use of machine learning techniques for solving such cases and for creating a system that automatically classifies different signals using previous human knowledge. This case study was conducted at Bombardier Transportation (BT), Västerås, in the departments Train Control Management System (TCMS) and Propulsion and Controls (PPC), where data was collected, analyzed and evaluated. We propose a machine-learning-based method for solving the oracle problem for a certain use case. We also explain the steps that can be used to solve the test oracle problem when signals are part of the verdict process. Test oracle automation the oracle problem machine learning classification feature engineering signal classification time-series analysis time-series classification multivariate time-series classification Computer Systems Datorsystem
5	The Application of Machine Learning Techniques in Flight Test Applications Cooke, Alan, Melia, Thomas, Grayson, Siobhan 11 1900 This paper discusses the use of diagnostics based on machine learning (ML) within a flight test context. The paper begins by discussing some of the problems associated with instrumenting a test aircraft and how they could be ameliorated using ML-based diagnostics. We then describe a number of types of supervised ML algorithms which can be used in this context. In addition, key practical aspects of applying these algorithms, such as feature engineering and parameter selection, are also discussed. The paper then outlines a real-world application developed by Curtiss-Wright, called Machine Learning for Advanced System Diagnostics (MLASD). This description includes key challenges that were encountered during the development process and how suitable input features were identified. Real-world results are also presented. Finally, we suggest some further applications of ML techniques, in addition to describing other areas of development. FTI Machine Learning Time Series Classification Anomaly Detection Resource Constrained Environments
6	Bug Prediction with Machine Learning : Bloodhound 0.1 Rehnholm, Gustav, Rysjö, Felix January 2021 Introduction: Bugs in software are a problem that grows over time if they are not dealt with at an early stage; it is therefore desirable to find bugs as early as possible. Bugs usually correlate with low software quality, which can be measured with different code metrics. The goal of this thesis is to find out whether machine learning can be used to predict bugs using code metric trends. Method: To achieve the thesis goal, a program called Bloodhound was developed, which analyses code metric trends to predict bugs using the machine learning algorithm k-nearest neighbour. The required code metrics are extracted using the program cdbs, which in turn uses the program SonarQube to create the source code metrics. Results: Bloodhound was trained on a time frame of 42 days, from June 1, 2016 to July 13, 2016, containing 202 commits and 312 changed files from the JabRef repository. The files were changed on average 1.5 times. Bloodhound never found more than 25% of the bugs, and its bug predictions were right at most 42% of the time. Conclusion: Bloodhound did not succeed in predicting bugs, most likely because the time frame was too short to generate any significant trends. Bug prediction Machine learning Time series classification Computer Sciences Datavetenskap (datalogi)
7	IMBALANCED TIME SERIES FORECASTING AND NEURAL TIME SERIES CLASSIFICATION Chen, Xiaoqian 01 August 2023 This dissertation will focus on the forecasting and classification of time series. Specifically, the forecasting problem will focus on imbalanced time series (ITS), which contain a mix of low-probability extreme observations and high-probability normal observations. Two approaches are proposed to improve the forecasting of ITS. In the first approach, proposed in chapter 2, an ITS will be modelled as a composition of normal and extreme observations, the input predictor variables and the associated forecast output will be combined into moving blocks, and the blocks will be categorized as extreme event (EE) or normal event (NE) blocks. Imbalance will be decreased by oversampling the minority EE blocks and undersampling the majority NE blocks using modifications of block bootstrapping and the synthetic minority oversampling technique (SMOTE). Convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) will be selected for forecast modelling. In the second approach, described in chapter 3, which focuses on improving the forecasting accuracy of LSTM models, a training strategy called Circular-Shift Circular Epoch Training (CSET) is proposed to preserve the natural ordering of observations in epochs during training without any attempt to balance the extreme and normal observations. The strategy will be universal because it could be applied to train LSTMs to forecast events in normal time series or in imbalanced time series in exactly the same manner. The CSET strategy will be formulated for both univariate and multivariate time series forecasting. The classification problem will focus on the classification of event-related potential (ERP) neural time series by exploiting information offered by the cone of influence (COI) of the continuous wavelet transform (CWT). 
The COI is a boundary that is superimposed on the wavelet scalogram to delineate the coefficients that are accurate from those that are inaccurate due to edge effects. The features derived from the inaccurate coefficients are, therefore, unreliable. It is hypothesized that the classifier performance would improve if unreliable features, which are outside the COI, are zeroed out, and the performance would improve even further if those features are cropped out completely. Two CNN multidomain models will be introduced to fuse the multichannel Z-scalograms and the V-scalograms. In the first multidomain model, referred to as the Z-CuboidNet, the input to the CNN will be generated by fusing the Z-scalograms of the multichannel ERPs into a frequency-time-spatial cuboid. In the second multidomain model, referred to as the V-MatrixNet, the CNN input will be formed by fusing the frequency-time vectors of the V-scalograms of the multichannel ERPs into a frequency-time-spatial matrix. CNN cone of influence Deep learning imbalanced time series forecasting LSTM neural time series classification
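The block-level rebalancing described for the first forecasting approach can be pictured with a simplified sketch. It uses plain random oversampling and undersampling, not the modified block bootstrap or SMOTE from the dissertation, and the amplitude threshold used to label a block as EE is a hypothetical rule chosen for illustration.

```python
import random


def make_blocks(series, block_len):
    """Cut a series into overlapping moving blocks of fixed length."""
    return [series[i:i + block_len] for i in range(len(series) - block_len + 1)]


def label_blocks(blocks, threshold):
    """Label a block EE if any value reaches the (assumed) amplitude
    threshold, otherwise NE."""
    return ["EE" if max(abs(v) for v in block) >= threshold else "NE"
            for block in blocks]


def rebalance(blocks, labels, rng):
    """Resize both groups to their midpoint size: the smaller group is
    oversampled with replacement, the larger one is undersampled."""
    extreme = [b for b, lab in zip(blocks, labels) if lab == "EE"]
    normal = [b for b, lab in zip(blocks, labels) if lab == "NE"]
    if not extreme or not normal:
        return list(blocks), list(labels)
    target = (len(extreme) + len(normal)) // 2

    def resize(group):
        if len(group) >= target:
            return rng.sample(group, target)
        return group + rng.choices(group, k=target - len(group))

    new_extreme, new_normal = resize(extreme), resize(normal)
    new_labels = ["EE"] * len(new_extreme) + ["NE"] * len(new_normal)
    return new_extreme + new_normal, new_labels
```

Note that the rebalanced set no longer follows the original time order, which is the property the second approach (CSET) is said to preserve instead.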
8	Banger for the Buck : Predicting Growth of Music Tracks using Machine Learning / En sång för slanten Nilsson, Elliot, Wensink, Liza January 2022 The advent of music streaming has made it increasingly important for actors in the music industry to understand whether tracks are going to succeed or not. This study investigates whether it is possible to accurately classify the growth of the listener base of a music track based on multivariate time series of listener behavior data. 18 popular time series classification algorithms were used to build predictive models, which were evaluated with 10-fold cross-validation. We also examined the algorithms' potential to deliver business value for a record label. Lastly, the possibilities and challenges of applying a data-driven business model in the music industry were investigated through a comparative analysis of a modern and a traditional record label. Six algorithms were found to significantly outperform the baseline. Two algorithms based on convolutional kernels, RR and AMini, were found to offer the greatest business value because of their accuracy and low time complexity. While it may be necessary for record labels to adopt data-driven business models to flourish in the modern market, there are difficulties regarding the competitiveness of digital solutions and complications in moving the focus from networking to developing technology. Time series classification Multivariate time series Music industry Record label business model. Computer Sciences Datavetenskap (datalogi)
9	Extending the ROCKET Machine Learning algorithm to improve Multivariate Time Series classification / Utökning av maskininlärningsalgoritmen ROCKET för att förbättra dess multivariata tidsserieklassificering Solana i Carulla, Adrià January 2024 While the norm in Time Series Classification (TSC) has been to improve accuracy, new models focusing on efficiency have recently been attracting attention. In particular, models known as "ROCKET" (RandOm Convolutional KErnel Transform), which work by randomly generating a large number of kernels used as feature extractors to train a simple ridge classifier, can yield results as good as other state-of-the-art algorithms while presenting a significant increase in efficiency. Although ROCKET models were originally designed for Univariate Time Series (UTS), which are defined by a single channel or sequence, these classifiers have also shown excellent results when tested on Multivariate Time Series (MTS), where the characteristics of the time series are spread across multiple channels. Therefore, it is of scientific interest to explore these models to assess their overall performance and whether efficiency can be further improved. Recent studies present a novel algorithm named Sequential Feature Detachment (SFD) which, on top of ROCKET, can significantly reduce the model size while slightly increasing accuracy through a sequential feature selection technique. Despite these remarkable results, the experiments leading to the conclusions were limited to the use of UTS, leaving room for the exploration of this algorithm on MTS. 
Consequently, this thesis evaluates different strategies to implement ROCKET and SFD algorithms for MTS classification tasks, focusing not only on improving efficiency and accuracy, but also on adding interpretability to the classifier. To achieve this, experiments were conducted by testing model ensembles, grouping channels based on predictability, and examining channel relevances alongside SFD. The University of East Anglia (UEA) MTS archive was used to evaluate the resulting models, as it is common with TSC algorithms. The results demonstrate that model ensembling does not increase accuracy in the test sets and that the predictability of individual channels is not maintained across dataset splits. However, the study shows that using SFD with MiniROCKET, a variant of ROCKET that includes random channel combinations, not only can improve classification results but also provide a statistically significant channel relevance measure. Time Series Classification ROCKET Multivariate Time Series Tidsserieklassificering ROCKET Multivariate tidsserier Computer and Information Sciences Data- och informationsvetenskap
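The random-kernel feature extraction that this abstract attributes to ROCKET can be sketched for a single channel as follows. The kernel lengths, mean-centred Gaussian weights, uniform bias, power-of-two dilation and the two pooled features (maximum and proportion of positive values) follow the commonly described ROCKET recipe; padding, the ridge classifier, SFD and the multichannel handling studied in the thesis are left out, and all function names are illustrative.

```python
import math
import random


def random_kernel(series_len, rng):
    """Draw one random dilated kernel for series of the given length."""
    length = rng.choice([7, 9, 11])
    if series_len < length:
        raise ValueError("series is shorter than the kernel")
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean = sum(weights) / length
    weights = [w - mean for w in weights]        # mean-centre the weights
    bias = rng.uniform(-1.0, 1.0)
    max_exponent = math.log2((series_len - 1) / (length - 1))
    dilation = int(2 ** rng.uniform(0.0, max_exponent))
    return weights, bias, dilation


def apply_kernel(series, kernel):
    """Convolve without padding and pool to two features:
    the maximum response and the proportion of positive values (PPV)."""
    weights, bias, dilation = kernel
    span = (len(weights) - 1) * dilation
    outputs = [
        bias + sum(w * series[start + j * dilation] for j, w in enumerate(weights))
        for start in range(len(series) - span)
    ]
    ppv = sum(1 for value in outputs if value > 0) / len(outputs)
    return max(outputs), ppv


def transform(series, kernels):
    """Concatenate (max, PPV) for every kernel into one feature vector."""
    features = []
    for kernel in kernels:
        features.extend(apply_kernel(series, kernel))
    return features
```

The resulting vector has two features per kernel and would normally be passed to a linear classifier such as ridge regression; a feature selection step like SFD would then prune entries of this vector.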
10	Анализ и обработка данных окулографии методом машинного обучения для временных рядов : магистерская диссертация / Analysis and Processing of Oculography Data Using Machine Learning Methods for Time Series Трокин, М. А., Trokin, M. A. January 2024 This master's thesis addresses the pressing task of classifying multivariate time series of oculographic data using machine learning methods for diagnosing dyslexia. Dyslexia is a widespread disorder, affecting one in ten individuals in the population, and early diagnosis can prevent its consequences and improve the quality of life of these individuals. Modern methods for classifying oculographic data achieve high diagnostic accuracy for this condition, but they do not use the raw eye-tracker data, which consist of eye movement parameters. In this study, raw eye-position data from the subjects of a Swedish longitudinal project investigating reading disabilities in children were analyzed. 
A k-NN method with dynamic time warping (DTW) was proposed for classifying multivariate time series of oculographic data. Metrics for evaluating the model's performance were proposed, optimal hyperparameters were selected, and the errors of the constructed classifier were analyzed. MASTER'S THESIS OCULOGRAPHY DATA CLASSIFICATION MULTIVARIATE TIME SERIES CLASSIFICATION OCULOGRAPHY K-NN WITH DYNAMIC TIME WARPING TIME SERIES CLASSIFICATION
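A k-NN classifier with dynamic time warping, as named in this abstract, can be sketched in a few lines. The version below is a textbook formulation for multivariate sequences, with each time step stored as a tuple such as an (x, y) gaze position; it is an illustration under those assumptions, not the thesis implementation, and it omits warping windows and hyperparameter search.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of points.
    The local cost is the Euclidean distance between aligned points."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def knn_predict(train, query, k=1):
    """Majority label among the k training sequences nearest to `query`
    under DTW. `train` is a list of (sequence, label) pairs."""
    nearest = sorted(train, key=lambda item: dtw_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because DTW may repeat points when aligning, two recordings that trace the same path at different speeds can have distance 0, which is why it suits sequences of unequal length.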
