The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Sparse Representation and its Application to Multivariate Time Series Classification

Sani, Habiba M. January 2022
In the signal processing field, various measures can be employed to analyse and represent a signal in order to obtain meaningful outcomes. Sparse representation (SR) has continued to receive great attention as a well-known tool in statistical theory that, among other uses, can extract latent temporal features revealing the salient, sparsely represented structure of complex data signals, including temporal data. Under reasonable conditions, many signals are assumed to be sparse within some domain, such as the spatial, time, or time-frequency domain, and this sparse structure can be recovered through SR. The ECG signal, for instance, is typically a temporally sparse signal comprising various periodic components, such as time delays and frequency amplitudes, plus additive noise and possible interference. A particular challenge in signal processing, especially for time series signals, is how to reconstruct the signal and extract the features that characterize it. Many problems (e.g., signal component analysis, feature extraction/selection, signal reconstruction, and classification) can be formulated as linear models and solved using the SR technique. Reconstructing a signal through SR can offer a rich representation of the sparsified temporal structure of the original signal. Its advantages, such as noise tolerance and widespread use in various signal processing tasks, have motivated many researchers to adopt this technique for signal representation and analysis, yielding a better and richer representation of the original input signal. In line with this, the goal of this study is to propose an SR-based mathematical framework and a coherence function for reconstruction and feature extraction from signals for subsequent analysis. The time embedding principle is first applied to restructure the signal into time delay vectors; the proposed approach, referred to as the temporal subsequence SR approach, is then used to reconstruct the noisy signals and provide a sparsified, time-dependent representation of the input signal; and the coherence function is finally used to compute the correlation coefficients between the temporal subsequences, which form the final feature vectors representing the discriminative features of each signal. These feature vectors are then used as inputs to machine learning classifiers. Experiments illustrate the usefulness of the proposed methods and assess their impact on the classification performance of SVM and MLP classifiers using a popular and widely used ECG time series benchmark dataset. This study supports the general hypothesis that signal reconstruction methods (a data-driven approach) can be valuable for learning compact features from the original signals for classification.
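A minimal sketch of the pipeline this abstract describes: delay-embed the signal, learn a sparse dictionary reconstruction, and extract coherence-based features for an SVM. The embedding dimension, dictionary size, and the use of mean pairwise coherence between consecutive subsequences are illustrative assumptions, not the thesis's exact settings.

```python
import numpy as np
from scipy.signal import coherence
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import SVC

def delay_embed(x, dim, lag=1):
    """Stack time-delay vectors: row t is (x_t, x_{t+lag}, ..., x_{t+(dim-1)lag})."""
    n = len(x) - (dim - 1) * lag
    return np.stack([x[i * lag : i * lag + n] for i in range(dim)], axis=1)

def sparse_reconstruct(subseq, n_atoms=32, alpha=1.0):
    """Learn a dictionary on the delay vectors and return their sparse reconstruction."""
    dl = DictionaryLearning(n_components=n_atoms, alpha=alpha, max_iter=100)
    codes = dl.fit_transform(subseq)               # sparse codes per subsequence
    return codes @ dl.components_                  # denoised subsequences

def coherence_features(recon, fs=250.0, n_keep=50):
    """Mean spectral coherence between consecutive reconstructed subsequences."""
    feats = [coherence(recon[i], recon[i + 1], fs=fs, nperseg=16)[1].mean()
             for i in range(len(recon) - 1)]
    return np.asarray(feats)[:n_keep]

# toy usage on a small batch of labelled, noisy signals
rng = np.random.default_rng(0)
signals = rng.standard_normal((8, 400))            # stand-ins for ECG segments
labels = rng.integers(0, 2, size=8)
X = np.stack([coherence_features(sparse_reconstruct(delay_embed(s, dim=30)))
              for s in signals])
clf = SVC().fit(X, labels)                         # SVM on the coherence features
```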
22

Contributions to Efficient Statistical Modeling of Complex Data with Temporal Structures

Hu, Zhihao 03 March 2022
This dissertation focuses on three research projects: neighborhood vector autoregression for multivariate time series, uncertainty quantification for agent-based modeling of networked anagram games, and a scalable algorithm for multi-class classification. The first project studies the modeling of multivariate time series, with applications in the environmental sciences and other areas. In this work, a so-called neighborhood vector autoregression (NVAR) model is proposed to efficiently analyze large-dimensional multivariate time series. The time series are assumed to have underlying distances among them based on the inherent setting of the problem. When this distance matrix is available or can be obtained, the proposed NVAR method is demonstrated to provide a computationally efficient and theoretically sound estimation of model parameters. The performance of the proposed method is compared with existing approaches in both simulation studies and a real application to a stream nitrogen study. The second project focuses on the study of group anagram games, in which players are provided letters to form as many words as possible. In this work, enhanced agent behavior models for networked group anagram games are built, exercised, and evaluated under an uncertainty quantification framework. Specifically, the game data for players are clustered based on their skill levels (forming words, requesting letters, and replying to requests), multinomial logistic regressions for transition probabilities are performed, and the uncertainty is quantified within each cluster. The result of this process is a model where players are assigned different numbers of neighbors and different skill levels in the game. Simulations of ego agents with neighbors are conducted to demonstrate the efficacy of the proposed methods. The third project aims to develop efficient and scalable algorithms for multi-class classification that achieve a balance between prediction accuracy and computing efficiency, especially in high-dimensional settings. Traditional multinomial logistic regression becomes slow in high-dimensional settings where the number of classes (M) and the number of features (p) are large. Our algorithms are computationally efficient and scalable to data with even higher dimensions. The simulation and case study results demonstrate that our algorithms have a substantial advantage over traditional multinomial logistic regression while maintaining comparable prediction performance. / Doctor of Philosophy / In many data-centric applications, data often have complex structures involving temporal dependence and high dimensionality. Modeling of complex data with temporal structures has attracted great attention in applications such as environmental sciences, network science, data mining, neuroscience, and economics. However, modeling such data is challenging due to their large uncertainty and dimensionality. This dissertation focuses on modeling and prediction of complex data with temporal structures. Three different types of complex data are modeled: the nitrogen levels of multiple streams are modeled jointly, human actions in networked group anagram games are modeled with quantified uncertainty, and data with multiple labels are classified. Different models are proposed and demonstrated to be efficient through simulation and case studies.
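One way to realize the neighborhood idea behind NVAR, sketched under the assumption that a distance matrix D is given and that each series is regressed on the lag-1 values of itself and of the series within a radius; the radius and lag order are illustrative, and the thesis's exact parameterization may differ.

```python
import numpy as np

def nvar1_fit(X, D, radius):
    """X: (T, p) series; D: (p, p) distances. Returns a sparse coefficient matrix A."""
    T, p = X.shape
    A = np.zeros((p, p))
    for i in range(p):
        nbrs = np.flatnonzero(D[i] <= radius)      # includes i itself (D[i, i] = 0)
        Z, y = X[:-1][:, nbrs], X[1:, i]
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        A[i, nbrs] = coef                          # only neighbor entries are nonzero
    return A

# toy usage: 5 stations on a line, neighbors within distance 1
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)).cumsum(axis=0) * 0.1
D = np.abs(np.subtract.outer(np.arange(5), np.arange(5)))
A = nvar1_fit(X, D, radius=1)
x_next = X[-1] @ A.T                               # one-step-ahead forecast
```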
23

Data-Driven Methods for Modeling and Predicting Multivariate Time Series using Surrogates

Chakraborty, Prithwish 05 July 2016
Modeling and predicting multivariate time series data has been of prime interest to researchers for many decades. Traditionally, time series prediction models have focused on finding attributes that have consistent correlations with the target variable(s). However, diverse surrogate signals, such as News data and Twitter chatter, are increasingly available and can provide real-time information, albeit with inconsistent correlations. Intelligent use of such sources can lead to early and real-time warning systems such as Google Flu Trends. Furthermore, the target variables of interest, such as public health surveillance data, can be noisy, so models built for such sources should be flexible as well as adaptable to changing correlation patterns. In this thesis we explore various methods of using surrogates to generate more reliable and timely forecasts for noisy target signals. We primarily investigate three key components of the forecasting problem, viz. (i) short-term forecasting, where surrogates can be employed in a now-casting framework, (ii) long-term forecasting, where surrogates act as forcing parameters to model system dynamics, and (iii) robust drift models that detect and exploit 'changepoints' in the surrogate-target relationship to produce robust models. We explore various 'physical' and 'social' surrogate sources to study these sub-problems, primarily to generate real-time forecasts for endemic diseases. On the modeling side, we employed matrix factorization and generalized linear models to detect short-term trends, and explored various Bayesian sequential analysis methods to model long-term effects. Our research indicates that, in general, a combination of surrogates can lead to more robust models. Interestingly, our findings indicate that under specific scenarios, particular surrogates can decrease overall forecasting accuracy, thus providing an argument for 'Good data' over 'Big data'. / Ph. D.
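A hedged sketch of the short-term now-casting component: a Poisson GLM that regresses a noisy surveillance target on last week's surrogate reading, so the current, still-unreported week can be estimated. The surrogate series and the single-lag design are hypothetical simplifications of the thesis's models.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
weeks = 120
surrogate = np.abs(np.sin(np.arange(weeks) / 8)) * 50 + rng.normal(0, 3, weeks)
cases = rng.poisson(np.clip(5 + 0.4 * surrogate, 0.1, None))  # noisy target

X = sm.add_constant(surrogate[:-1])                # last week's surrogate reading
model = sm.GLM(cases[1:], X, family=sm.families.Poisson()).fit()

# estimate the still-unreported week from the surrogate available today
nowcast = model.predict([1.0, surrogate[-1]])
```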
24

Extending the ROCKET Machine Learning algorithm to improve Multivariate Time Series classification

Solana i Carulla, Adrià January 2024
While the norm in Time Series Classification (TSC) has been to improve accuracy, new models focusing on efficiency have recently been attracting attention. In particular, models known as "ROCKET" (RandOm Convolutional KErnel Transform), which work by randomly generating a large number of kernels used as feature extractors to train a simple ridge classifier, can yield results as good as other state-of-the-art algorithms while presenting a significant increase in efficiency. Although ROCKET models were originally designed for Univariate Time Series (UTS), which are defined by a single channel or sequence, these classifiers have also shown excellent results when tested on Multivariate Time Series (MTS), where the characteristics of the time series are spread across multiple channels. Therefore, it is of scientific interest to explore these models to assess their overall performance and whether their efficiency can be further improved. Recent studies present a novel algorithm named Sequential Feature Detachment (SFD) which, on top of ROCKET, can significantly reduce the model size while slightly increasing accuracy through a sequential feature selection technique.
Despite these remarkable results, the experiments leading to those conclusions were limited to UTS, leaving room for the exploration of this algorithm on MTS. Consequently, this thesis evaluates different strategies to implement ROCKET and SFD algorithms for MTS classification tasks, focusing not only on improving efficiency and accuracy, but also on adding interpretability to the classifier. To achieve this, experiments were conducted by testing model ensembles, grouping channels based on predictability, and examining channel relevances alongside SFD. The University of East Anglia (UEA) MTS archive was used to evaluate the resulting models, as is common with TSC algorithms. The results demonstrate that model ensembling does not increase accuracy on the test sets and that the predictability of individual channels is not maintained across dataset splits. However, the study shows that using SFD with MiniROCKET, a variant of ROCKET that includes random channel combinations, can not only improve classification results but also provide a statistically significant channel relevance measure.
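A sketch of the Sequential Feature Detachment idea the thesis builds on: repeatedly fit a ridge classifier on ROCKET-style features and detach the features with the smallest absolute weights. The random matrix below stands in for a MiniROCKET transform, and the 10% detachment rate per step is an assumption rather than the published setting.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

def sfd(X, y, keep=200, drop_frac=0.10):
    """Return indices of the surviving feature subset."""
    active = np.arange(X.shape[1])
    while len(active) > keep:
        clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(X[:, active], y)
        w = np.abs(clf.coef_).max(axis=0)          # importance across classes
        n_drop = max(1, int(drop_frac * len(active)))
        active = active[np.argsort(w)[n_drop:]]    # detach the weakest features
    return active

# toy stand-in for an (n_samples, n_features) MiniROCKET feature matrix
rng = np.random.default_rng(3)
X, y = rng.standard_normal((60, 2000)), rng.integers(0, 3, 60)
kept = sfd(X, y)
final = RidgeClassifierCV().fit(X[:, kept], y)     # compact final classifier
```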
25

A study of demand forecasting of the cashew trade in Ceará through multivariate time series

Diego Duarte Lima 14 June 2013
nÃo hà / The application of time series in varius areas such as engineering, logistics, operations research and economics, aims to provide the knowledge of the dependency between observations, trends, seasonality and forecasts. Considering the lack of effective supporting methods od logistics planning in the area of foreign trade, the multivariate models habe been presented and used in this work, in the area of time series: vector autoregression (VAR), vector autoregression moving-average (VARMA) and state-space integral equation (SS). These models were used for the analysis of demand forecast, the the bivariate series of value and volume of cashew nut exports from Cearà from 1996 to 2012. The results showed that the model state space was more successful in predicting the variables value and volume over the period that goes from january to march 2013, when compared to other models by the method of root mean squared error, getting the lowest values for those criteria. / A aplicaÃÃo de sÃries temporais em diversas Ãreas como engenharia, logÃstica, pesquisa operacional e economia, tem como objetivo o conhecimento da dependÃncia entre dados, suas possÃveis tendÃncias, sazonalidades e a previsÃo de dados futuros. Considerando a carÃncia de mÃtodos eficazes de suporte ao planejamento logÃstico na Ãrea de comÃrcio exterior, neste trabalho foram apresentados e utilizados os modelos multivariados, na Ãrea de sÃries temporais: auto-regressivo vetorial (VAR), auto-regressivomÃdias mÃveis vetorial (ARMAV) e espaÃo de estados (EES). Estes modelos foram empregados para a anÃlise de previsÃo de demanda, da sÃrie bivaria de valor e volume das exportaÃÃes cearenses de castanha de caju no perÃodo de 1996 à 2012. Os resultados mostraram que o modelo espaÃo de estados foi mais eficiente na previsÃo das variÃveis valor e volume ao longo do perÃodo janeiro à marÃo de 2013, quando comparado aos demais modelos pelo mÃtodo da raiz quadrada do erro mÃdio quadrÃtico, obtendo os menores valores para o referido critÃrio.
26

Interest rates modeling for insurance: interpolation, extrapolation, and forecasting

Moudiki, Thierry 05 July 2018
The Own Risk and Solvency Assessment (ORSA) is a set of processes defined by the European prudential directive Solvency II that serve for decision-making and strategic risk analysis. In the context of the ORSA, insurance companies are required to assess their solvency needs in a continuous and prospective way. For this purpose, they notably need to forecast their balance sheet (assets and liabilities) over a defined horizon. In this work, we focus specifically on the asset forecasting part. This thesis is about the yield curve, forecasting, and forecasting the yield curve. We present a few novel techniques for the construction and extrapolation of static curves (that is, curves constructed at a fixed date), and for forecasting spot interest rates over time. Throughout the text, when we say "yield curve", we actually mean "discount curve": we ignore counterparty credit risk and consider the curves to be risk-free. The same techniques could, however, be applied to construct and forecast actual risk-free curves and credit spread curves, and to combine both to obtain pseudo-discount curves incorporating counterparty credit risk.
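The thesis develops its own construction and extrapolation techniques; purely as a generic illustration of the task, the sketch below fits a Nelson-Siegel zero curve to a few observed maturities, then interpolates, extrapolates, and converts to discount factors. The quoted rates and the fixed decay parameter are made up.

```python
import numpy as np
from scipy.optimize import curve_fit

def nelson_siegel(t, b0, b1, b2, lam=2.0):
    """Zero rate at maturity t (years); lam is held fixed so only the betas are fitted."""
    x = t / lam
    slope = (1 - np.exp(-x)) / x
    return b0 + b1 * slope + b2 * (slope - np.exp(-x))

mats = np.array([1.0, 2, 5, 10, 20])               # observed maturities (years)
rates = np.array([0.010, 0.013, 0.018, 0.022, 0.024])
betas, _ = curve_fit(nelson_siegel, mats, rates, p0=[0.02, -0.01, 0.0])

grid = np.linspace(0.25, 60, 240)                  # interpolate and extrapolate to 60y
zero = nelson_siegel(grid, *betas)
discount = np.exp(-zero * grid)                    # discount factors P(0, t)
```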
27

Uniform interval normalization: Data representation of sparse and noisy data sets for machine learning

Sävhammar, Simon January 2020
The uniform interval normalization technique is proposed as an approach to handle sparse data and noise in the data. The technique is evaluated by transforming and normalizing the MoodMapper and Safebase data sets; the predictive capabilities are compared by forecasting the data sets with an LSTM model. The results are compared to both the commonly used MinMax normalization technique and MinMax normalization with a time2vec layer. Uniform interval normalization was found to perform better on both the sparse MoodMapper data set and the denser Safebase data set. Future work consists of studying the performance of uniform interval normalization on other data sets and with other machine learning models.
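The abstract does not define the technique precisely, so the sketch below shows the comparison scaffold with MinMax as the baseline and an assumed reading of uniform interval normalization (quantizing values into equal-width bins and rescaling bin indices to [0, 1]); the thesis's actual definition may differ.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def uniform_interval_normalize(x, n_bins=10):
    """Assumed interpretation: map each value to its equal-width bin, scaled to [0, 1]."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    return idx / (n_bins - 1)

rng = np.random.default_rng(5)
x = rng.normal(size=300).cumsum()                  # stand-in for a sparse mood series
x_minmax = MinMaxScaler().fit_transform(x.reshape(-1, 1)).ravel()
x_uin = uniform_interval_normalize(x)
# either representation would then be windowed and fed to the LSTM forecaster
```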
28

Evaluating the use of Brush and Tooltip for Time Series visualizations: A comparative study

Helin, Sebastian, Eklund, André January 2023
This study uses a combination of user testing and analysis to evaluate the impact of brush and tooltip interactions on the comprehension of time series visualizations. It employs a sequential mixed-methods approach, with qualitative data from semi-structured interviews used to inform the design of a visualization tool, followed by a quantitative user study to validate it. Sixteen (16) participants from various fields of study, predominantly computer science, took part. A MANOVA test indicated a statistically significant difference between the groups. The results show that the use of brush and tooltip increases user accuracy in detecting outliers, as well as their perception of trends and patterns. The study's context was limited to desktop usage, and all participants were treated as a homogeneous group, which limits the applicability of these findings to other devices or more diverse user groups. The results provide guidance on improving time series visualizations to facilitate more efficient and effective understanding, which is particularly relevant to data analysts and academic researchers.
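A sketch of the study's statistical step: a MANOVA over two dependent measures with the interface condition as the factor. The column names, group sizes, and simulated scores are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(6)
n = 8                                              # participants per group
df = pd.DataFrame({
    "condition": ["interactive"] * n + ["static"] * n,
    "outlier_acc": np.r_[rng.normal(0.85, 0.05, n), rng.normal(0.70, 0.05, n)],
    "trend_score": np.r_[rng.normal(4.2, 0.4, n), rng.normal(3.5, 0.4, n)],
})
res = MANOVA.from_formula("outlier_acc + trend_score ~ condition", data=df)
print(res.mv_test())                               # Wilks' lambda, Pillai's trace, ...
```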
29

Machine learning for complex evaluation and detection of combustion health of Industrial Gas turbines

Mshaleh, Mohammad January 2024
This study addresses the challenge of identifying anomalies within multivariate time series data, focusing specifically on the operational parameters of gas turbine combustion systems. In search of an effective detection method, the research explores three distinct machine learning methods: the Long Short-Term Memory (LSTM) autoencoder, the Self-Organizing Map (SOM), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Through experiments, these models are evaluated to determine their efficacy in anomaly detection. The findings show that the LSTM autoencoder not only surpasses its counterparts on performance metrics but also demonstrates a unique capability to identify the underlying causes of detected anomalies. This thesis provides a comparative analysis of these techniques and discusses the implications of the models for maintaining the reliability and safety of gas turbine operations.
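A minimal sketch of the best-performing approach: an LSTM autoencoder trained on healthy windows, with windows whose reconstruction error exceeds a threshold flagged as anomalous. The window length, layer sizes, and 99th-percentile threshold are illustrative, not the thesis's tuned values.

```python
import numpy as np
import tensorflow as tf

steps, n_feat = 30, 8                              # window length x turbine channels
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32),                      # encode the window
    tf.keras.layers.RepeatVector(steps),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_feat)),  # decode
])
model.compile(optimizer="adam", loss="mse")

rng = np.random.default_rng(7)
healthy = rng.standard_normal((512, steps, n_feat)).astype("float32")
model.fit(healthy, healthy, epochs=5, batch_size=64, verbose=0)

err = ((model.predict(healthy, verbose=0) - healthy) ** 2).mean(axis=(1, 2))
threshold = np.quantile(err, 0.99)                 # flag test windows above this
```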
30

On the validation of multivariate spatio-temporal time series models

Saint-Frard, Robinson 06 1900
In this master's thesis, the R software was used for the programming. / Time series models are studied that, in addition to the usual time index, also have a spatial component. More particularly, we study a certain class of models, the Generalized Space-Time AutoRegressive (GSTAR) time series models. First, links are drawn between vector autoregressive (VAR) models and GSTAR models. We obtain explicitly the asymptotic distribution of the residual autocovariances for GSTAR models, assuming that the error term is a Gaussian white noise, which is a first original contribution. From that result, test statistics of the portmanteau type are proposed, and their asymptotic distributions are studied. In order to illustrate the behaviour of the test statistics, a simulation study is conducted where GSTAR models are simulated and correctly fitted. The methodology is illustrated with real monthly data on tea production in west Java for 24 cities over the period January 1992 to December 1999.
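A rough sketch of the GSTAR(1;1) model these tests are built around, Z_t = Phi10 Z_{t-1} + Phi11 W Z_{t-1} + e_t with diagonal Phi matrices and a row-normalized spatial weight matrix W; the ring-shaped weights and the per-site least-squares fit are illustrative only.

```python
import numpy as np

def gstar11_fit(Z, W):
    """Z: (T, p) series; W: (p, p) row-normalized weights. Returns the diagonal phi's."""
    lag, spat = Z[:-1], Z[:-1] @ W.T               # own lag and lagged neighbour average
    phi10, phi11 = np.empty(Z.shape[1]), np.empty(Z.shape[1])
    for i in range(Z.shape[1]):
        X = np.column_stack([lag[:, i], spat[:, i]])
        (phi10[i], phi11[i]), *_ = np.linalg.lstsq(X, Z[1:, i], rcond=None)
    return phi10, phi11

# toy usage: 6 sites on a ring, each neighbouring the two adjacent sites
p = 6
W = np.roll(np.eye(p), 1, axis=1) / 2 + np.roll(np.eye(p), -1, axis=1) / 2
rng = np.random.default_rng(8)
Z = rng.standard_normal((300, p)).cumsum(axis=0) * 0.05
phi10, phi11 = gstar11_fit(Z, W)
resid = Z[1:] - (Z[:-1] * phi10 + (Z[:-1] @ W.T) * phi11)   # input to portmanteau tests
```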
