Global ETD Search

41	Exploratory and predictive methods for multivariate time series data analysis in healthcare Aumon, Adrien Andréas 08 1900 (has links) Ce mémoire s'inscrit dans l'émergente globalisation de l'intelligence artificielle aux domaines de la santé. Par le biais de l'application d'algorithmes modernes d'apprentissage automatique à deux études de cas concrètes, l'objectif est d'exposer de manière rigoureuse et intelligible aux experts de la santé comment l'intelligence artificielle exploite des données cliniques à la fois multivariées et longitudinales à des fins de visualisation et de prognostic de populations de patients en situation d'urgence médicale. Nos résultats montrent que la récente méthode de réduction de la dimensionalité PHATE couplée à un algorithme de regroupement surpasse d'autres méthodes plus établies dans la projection en deux dimensions de trajectoires multidimensionelles et aide ainsi les experts à mieux visualiser l'évolution de certaines sous-populations. Nous mettons aussi en évidence l'efficacité des réseaux de neurones récurrents traditionnels et conditionnels dans le prognostic précoce de patients malades. Enfin, nous évoquons l'analyse topologique de données comme piste de solution adéquate aux problèmes usuels de données incomplètes et irrégulières auxquels nous faisons face inévitablement au cours de la seconde étude de cas. / This thesis aligns with the trending globalization of artificial intelligence in healthcare. Through two real-world applications of recent machine learning approaches, our fundamental goal is to rigorously and intelligibly expose to the domain experts how artificial intelligence uses clinical multivariate time series to provide visualizations and predictions related to populations of patients in an emergency condition. 
Our results demonstrate that the recent dimensionality reduction tool PHATE, combined with a clustering algorithm, outperforms other more established methods in projecting multivariate time series into two dimensions and thus helps the experts visualize sub-populations' trajectories. We also highlight the proficiency of traditional and conditional recurrent neural networks in the early prognosis of ill patients. Finally, we point to topological data analysis as a suitable solution to the common problems of data irregularity and incompleteness that we inevitably face in the second case study. Santé Apprentissage automatique Données multivariées longitudinales Visualisation Prognostic Healthcare Machine Learning Multivariate Time Series Visualization Prognosis
42	[en] E-AUTOMFIS: INTERPRETABLE MODEL FOR TIME SERIES FORECASTING USING ENSEMBLE LEARNING OF FUZZY INFERENCE SYSTEM / [pt] E-AUTOMFIS: MODELO INTERPRETÁVEL PARA PREVISÃO DE SÉRIES MULTIVARIADAS USANDO COMITÊS DE SISTEMAS DE INFERÊNCIA FUZZY THIAGO MEDEIROS CARVALHO 17 June 2021 (has links) [pt] Por definição, a série temporal representa o comportamento de uma variável em função do tempo. Para o processo de previsão de séries, o modelo deve ser capaz de aprender a dinâmica temporal das variáveis para obter valores futuros. Contudo, prever séries temporais com exatidão é uma tarefa que vai além de escolher o modelo mais complexo, e portanto a etapa de análise é um processo fundamental para orientar o ajuste do modelo. Especificamente em problemas multivariados, o AutoMFIS é um modelo baseado na lógica fuzzy, desenvolvido para introduzir uma explicabilidade dos resultados através de regras semanticamente compreensíveis. Mesmo com características promissoras e positivas, este sistema possui limitações que tornam sua utilização impraticável em problemas com bases de dados com alta dimensionalidade. E com a presença cada vez maior de bases de dados mais volumosas, é necessário que a síntese automática de sistemas fuzzy seja adaptada para abranger essa nova classe de problemas de previsão. Por conta desta necessidade, a presente dissertação propõe a extensão do modelo AutoMFIS para a previsão de séries temporais com alta dimensionalidade, chamado de e-AutoMFIS. Apresentase uma nova metodologia, baseada em comitê de previsores, para o aprendizado distribuído de geração de regras fuzzy. Neste trabalho, são descritas as características importantes do modelo proposto, salientando as modificações realizadas para aprimorar tanto a previsão quanto a interpretabilidade do sistema. Além disso, também é avaliado o seu desempenho em problemas reais, comparando-se a acurácia dos resultados com as de outras técnicas descritas na literatura. 
Por fim, em cada problema selecionado também é considerado o aspecto da interpretabilidade, discutindo-se os critérios utilizados para a análise de explicabilidade. / [en] By definition, a time series represents the behavior of a variable as a function of time. For the series forecasting process, the model must be able to learn the temporal dynamics of the variables in order to obtain consistent future values. However, accurate time series prediction is a task that goes beyond choosing the most complex (or promising) model applicable to the type of problem, and therefore the analysis step is a fundamental procedure to guide the adaptation of a model. Specifically, in multivariate problems, AutoMFIS is a model based on fuzzy logic, developed not only to give accurate forecasts but also to make results explainable through semantically understandable rules. Even with such promising characteristics, this system has shown practical limitations in problems that involve datasets of high dimensionality. With the increasing demand for methods that deal with large datasets, approaches for the automatic synthesis of fuzzy systems need to be adapted to cover this new class of forecasting problems. This dissertation proposes an extension of the base AutoMFIS model for time series forecasting with high-dimensional data, named e-AutoMFIS. Based on ensemble learning theory, this new methodology applies distributed learning to generate fuzzy rules. The main characteristics of the proposed model are described, highlighting the changes made to improve both the accuracy and the interpretability of the system. The proposed model is also evaluated in different case studies, in which its accuracy is compared against the results produced by other methods in the literature. 
In addition, in each selected problem, the aspect of interpretability is also assessed, which is essential for explainability evaluation. [pt] BASE DE DADOS [pt] COMITE DE PREVISORES [pt] PREVISAO DE SERIES MULTIVARIADAS [pt] INTERPRETABILIDADE [pt] SISTEMA DE INFERENCIA FUZZY [en] BIG DATA [en] ENSEMBLE METHOD [en] INTERPRETABILITY [en] FUZZY INFERENCE SYSTEM
43	Anomaly Detection in the EtherCAT Network of a Power Station : Improving a Graph Convolutional Neural Network Framework Barth, Niklas January 2023 (has links) In this thesis, an anomaly detection framework is assessed and fine-tuned to detect and explain anomalies in a power station, where EtherCAT, an Industrial Control System, is employed for monitoring. The chosen framework is based on a previously published Graph Neural Network (GNN) model, utilizing attention mechanisms to capture complex relationships between diverse measurements within the EtherCAT system. To address the challenges in graph learning and improve model performance and computational efficiency, the study introduces a novel similarity thresholding approach. This approach dynamically selects the number of neighbors for each node based on their similarity instead of adhering to a fixed 'k' value, thus making the learning process more adaptive and efficient. Further in the exploration, the study integrates Extreme Value Theory (EVT) into the framework to set the anomaly detection threshold and assess its effectiveness. The effect of temporal features on model performance is examined, and the role of seconds of the day as a temporal feature is notably highlighted. These various methodological innovations aim to refine the application of the attention based GNN framework to the EtherCAT system. The results obtained in this study illustrate that the similarity thresholding approach significantly improves the model's F1 score compared to the standard TopK approach. The inclusion of seconds of the day as a temporal feature led to modest improvements in model performance, and the application of EVT as a thresholding technique was explored, although it did not yield significant benefits in this context. 
Despite the limitations, including the utilization of a single-day dataset for training, the thesis provides valuable insights for the detection of anomalies in EtherCAT systems, contributing both to the literature and the practitioners in the field. It lays the groundwork for future research in this domain, highlighting key areas for further exploration such as larger datasets, alternative anomaly detection techniques, and the application of the framework in streaming data environments. / I denna avhandling utvärderas och finslipas ett ramverk för att detektera och förklara anomalier på ett kraftverk, där EtherCAT, ett industriellt styrsystem, används för övervakning. Det valda ramverket är baserat på en tidigare publicerad graf neurala nätverksmodell (GNN) som använder uppmärksamhetsmekanismer för att fånga komplexa samband mellan olika mätningar inom EtherCAT-systemet. För att hantera utmaningar inom grafiskt lärande och förbättra modellens prestanda och beräkningseffektivitet introducerar studien en ny metod för likhetsgränsdragning. Denna metod väljer dynamiskt antalet grannar för varje nod baserat på deras likhet istället för att hålla sig till ett fast 'k'-värde, vilket gör inlärningsprocessen mer anpassningsbar och effektiv. I en vidare undersökning integrerar studien extremvärdesteori (EVT) i ramverket för att sätta tröskeln för detektering av anomalier och utvärdera dess effektivitet. Effekten av tidsberoende egenskaper på modellens prestanda undersöks, och sekunder av dagen som en tidsberoende egenskap framhävs särskilt. Dessa olika metodologiska innovationer syftar till att förädla användningen av det uppmärksamhetsbaserade GNN-ramverket på EtherCAT-systemet. Resultaten som erhållits i denna studie illustrerar att likhetsgränsdragning väsentligt förbättrar modellens F1-poäng jämfört med den standardiserade TopK-metoden. 
Inkluderingen av sekunder av dagen som en tidsberoende egenskap ledde till blygsamma förbättringar i modellens prestanda, och användningen av EVT som en tröskelmetod undersöktes, även om den inte gav några betydande fördelar i detta sammanhang. Trots begränsningarna, inklusive användningen av ett dataset för endast en dag för träning, ger avhandlingen värdefulla insikter för detektering av anomalier i EtherCAT-system, och bidrar både till litteraturen och praktiker inom området. Den lägger grunden för framtida forskning inom detta område, och belyser nyckelområden för ytterligare utforskning såsom större dataset, alternativa tekniker för detektering av anomalier och tillämpningen av ramverket i strömmande data-miljöer. Unsupervised Learning Multivariate Time Series Graph Convolutional Neural Networks Anomaly Detection Industrial Control System EtherCAT Power Station Electricity Grid Computer and Information Sciences Data- och informationsvetenskap
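The contrast between a fixed 'k' and similarity thresholding described in entry 43 can be illustrated with a small sketch. This is not the thesis's implementation: the cosine similarity measure, the function names, the toy embeddings, and the threshold value of 0.9 are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_neighbors(embeddings, k):
    """Fixed-size neighborhoods: every node keeps its k most similar nodes."""
    neighbors = {}
    for i, u in enumerate(embeddings):
        scored = [
            (cosine_similarity(u, v), j)
            for j, v in enumerate(embeddings)
            if j != i
        ]
        scored.sort(reverse=True)
        neighbors[i] = [j for _, j in scored[:k]]
    return neighbors


def threshold_neighbors(embeddings, threshold):
    """Variable-size neighborhoods: keep only nodes at or above the threshold."""
    neighbors = {}
    for i, u in enumerate(embeddings):
        neighbors[i] = [
            j
            for j, v in enumerate(embeddings)
            if j != i and cosine_similarity(u, v) >= threshold
        ]
    return neighbors


# Hypothetical sensor embeddings: three similar sensors and one unrelated one.
embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.2],
    [0.0, 1.0],
]

fixed = top_k_neighbors(embeddings, k=2)
adaptive = threshold_neighbors(embeddings, threshold=0.9)
```

With a fixed k, the unrelated fourth node is still forced to take two neighbors, whereas the thresholded version leaves it without any, which is the adaptivity the abstract refers to.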
44	An empirical study of the impact of data dimensionality on the performance of change point detection algorithms / En empirisk studie av data dimensionalitetens påverkan på change point detection algoritmers prestanda Noharet, Léo January 2023 (has links) When a system is monitored over time, changes can be discovered in the time series of monitored variables. Change Point Detection (CPD) aims at finding the time point where a change occurs in the monitored system. While CPD methods date back to the 1950’s with applications in quality control, few studies have been conducted on the impact of data dimensionality on CPD algorithms. This thesis intends to address this gap by examining five different algorithms using synthetic data that incorporates changes in mean, covariance, and frequency across dimensionalities up to 100. Additionally, the algorithms are evaluated on a collection of data sets originating from various domains. The studied methods are then assessed and ranked based on their performance on both synthetic and real data sets, to aid future users in selecting an appropriate CPD method. Finally, stock data from the 30 most traded companies on the Swedish stock market are collected to create a new CPD data set to which the CPD algorithms are applied. The changes of the monitored system that the CPD algorithms aim to detect are the changes in policy rate set by the Swedish central bank, Riksbank. The results of the thesis show that the dimensionality impacts the accuracy of the methods when noise is present and when the degree of mean or covariance change is small. Additionally, the application of the algorithms on real world data sets reveals large differences in performance between the studied methods, underlining the importance of comparison studies. Ultimately, the kernel based CPD method performed the best across the real world data set employed in the thesis. 
/ När system övervakas över tid kan förändringar upptäckas i de uppmätade variablers tidsseriedata. Change Point Detection (CPD) syftar till att hitta tidpunkten då en förändring inträffar i det övervakade systemet’s tidseriedata. Medan CPD-metoder har sitt urspring i kvalitetskontroll under 1950-talet, har få studier undersökt datans dimensionalitets påverkan på CPD-algoritmer’s förmåga. Denna avhandling avser att fylla denna kunskapslucka genom att undersöka fem olika algoritmer med hjälp av syntetiska data som inkorporerar förändringar i medelvärde, kovarians och frekvens över dimensioner upp till 100. Dessutom jämförs algoritmerna med hjälp av en samling av data från olika domäner. De studerade metoderna bedöms och rangordnas sedan baserat på deras prestanda på både syntetiska och verkliga datauppsättningar för att hjälpa framtida användare att välja en lämplig CPD algoritm. Slutligen har aktiedata samlats från de 30 mest handlade företagen på den svenska aktiemarknaden för att skapa ett nytt data set. De förändringar i det övervakade systemet som CPD-algoritmerna syftar till att upptäcka är förändringarna i styrräntan som fastställs av Riksbanken. Resultaten av studien tyder på att dimensionaliteten påverkar förmågan hos algoritmerna att upptäcka förändringspunkterna när brus förekommer i datan och när graden av förändringen är liten. Dessutom avslöjar tillämpningen av algoritmerna på den verkliga datan stora skillnader i prestanda mellan de studerade metoderna, vilket understryker vikten av jämförelsestudier för att avslöja dessa skillnader. Slutligen presterade den kernel baserade CPD metoden bäst. Time series segmentation Change point detection Multivariate time series Data dimensionality Tidsserie-segmentering Förändringspunkts detektering Mulitvariabla tidsserier Data dimentionalitet Computer and Information Sciences Data- och informationsvetenskap
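As a rough illustration of what detecting a change in mean involves for multivariate data, the sketch below finds the single split point that minimizes the within-segment squared error summed over all dimensions. It is a generic least-squares baseline, not one of the five algorithms evaluated in entry 44; the function names and the toy data are assumptions.

```python
def segment_cost(segment):
    """Sum of squared deviations from the per-dimension mean of a segment."""
    n = len(segment)
    dims = len(segment[0])
    cost = 0.0
    for d in range(dims):
        mean = sum(row[d] for row in segment) / n
        cost += sum((row[d] - mean) ** 2 for row in segment)
    return cost


def best_mean_change_point(series):
    """Return the index t that best splits series into series[:t] and series[t:]."""
    best_index, best_cost = None, float("inf")
    for t in range(1, len(series)):
        cost = segment_cost(series[:t]) + segment_cost(series[t:])
        if cost < best_cost:
            best_index, best_cost = t, cost
    return best_index


# Toy two-dimensional series: only the first dimension shifts, at index 4.
series = [
    [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1], [0.0, 1.0],
    [3.0, 1.0], [3.1, 0.9], [2.9, 1.1], [3.0, 1.0],
]

change_point = best_mean_change_point(series)
```

Because the cost is summed over dimensions, every dimension that does not change contributes only noise to the criterion, which gives some intuition for why dimensionality can matter when the change is small.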
45	Contributions to Data Reduction and Statistical Model of Data with Complex Structures Wei, Yanran 30 August 2022 With advanced technology and the information explosion, the data of interest often have complex structures, with large size and high dimension, in the form of continuous or discrete features. There is an emerging need for data reduction, efficient modeling, and model inference. For example, data can contain millions of observations with thousands of features. Traditional methods, such as linear regression or LASSO regression, cannot effectively deal with such a large dataset directly. This dissertation aims to develop several techniques to effectively analyze large datasets with complex structures in observational, experimental, and time series data. In Chapter 2, I focus on data reduction for model estimation in sparse regression. Commonly used subdata selection methods often consider sampling or feature screening. For data with both a large number of observations and a large number of predictors, we propose a filtering approach for model estimation (FAME) that reduces both the number of data points and the number of features. The proposed algorithm can be easily extended to data with a discrete response or discrete predictors. In simulations and case studies, the proposed method performs well in parameter estimation while remaining computationally efficient. In Chapter 3, I focus on modeling experimental data with a quantitative-sequence (QS) factor. Here the QS factor concerns both the quantities and the sequence order of several components in the experiment. Existing methods can usually focus only on either the sequence order or the quantities of the multiple components. To fill this gap, we propose a QS transformation that maps the QS factor to a generalized permutation matrix, and consequently develop a simple Gaussian process approach to model experimental data with QS factors. 
In Chapter 4, I focus on forecasting multivariate time series data by leveraging autoregression and clustering. Existing time series forecasting methods treat each series independently and ignore their inherent correlation. To fill this gap, I propose a clustering method based on autoregression and control the sparsity of the transition matrix estimate through the adaptive lasso and a clustering coefficient. The clustering-based cross prediction can outperform conventional time series forecasting methods. Moreover, the clustering result can also enhance the forecasting accuracy of other forecasting methods. The proposed method can be applied to practical data, such as stock forecasting and topic trend detection. / Doctor of Philosophy / This dissertation focuses on three projects related to data reduction and statistical modeling of data with complex structures. In Chapter 2, we propose a data filtering approach for parameter estimation in sparse regression. Given data with thousands of observations and predictors or even more, large storage and computation resources are needed to handle these data. This is challenging in terms of computational power and takes a long time. So we come up with an algorithm (FAME) that can reduce both the number of observations and the number of predictors. After data reduction, the subdata selected by FAME keeps most of the information in the original dataset in terms of parameter estimation. Compared with existing methods, the dimension of the subdata generated by the proposed algorithm is smaller, while the computational time does not increase. In Chapter 3, we use a quantitative-sequence (QS) factor to describe experimental data. One simple example of experimental data is milk tea. Adding 1 cup of milk first or adding 2 cups of tea first will influence the flavor. This case can be extended to cases where thousands of ingredients need to be input into the experiment. 
Then the order and amount of the ingredients will generate different experimental results. We use the QS factor to describe this kind of order and amount. Then, by transforming the QS factor into a matrix of continuous values and using this matrix as input, we model the experimental results with a simple Gaussian process. In Chapter 4, we propose an autoregression-based clustering and forecasting method for multivariate time series data. Existing research often treats each time series independently. Our approach incorporates the inherent correlation of the data and clusters related series into one group. The forecasting is built on each cluster, and data within one cluster can cross-predict each other. One application of this method is topic trend detection. With thousands of topics, it is infeasible to apply one model to forecast all time series. Considering the similarity of trends among related topics, the proposed method can cluster topics based on their similarity and then perform forecasting with an autoregression model based on historical data within each cluster. high-dimensional data subdata selection filtering approach Analysis of experimental data Gaussian process Permutation matrix. QS factor multivariate time series spectral clustering autoregression cross prediction
46	Interaktiv identifiering av avvikelser i mätdata från testning av kretskort Berglund, Ebba, Kazemi, Baset January 2024 (has links) Visualisering är ett kraftfullt verktyg vid dataanalys, särskilt för att identifiera avvikelser. Att effektivt kunna identifiera felaktiga komponenter i elektronik kan förbättra och utveckla produktionsprocesserna avsevärd. Genom att tydligt visa korrelationen mellan felaktiga och fungerande komponenter kan analytiker identifiera nyckelkomponenter som orsakar defekta produkter. Multivariata data och multivariata tidsseriedata ställer höga krav på visualiseringar på grund av deras komplexitet. Den höga dimensionaliteten kan leda till problem som överlappning och dolda mönster beroende på vilken visualiseringsteknik som används. För att uppnå effektiv visualisering av multivariata data och multivariata tidsseriedata krävs det att både trender över tid och korrelationer mellan olika variabler visas. Studien genomfördes i samarbete med konsultföretaget Syntronic AB för att identifiera lämpliga visualiseringstekniker för data som samlats in vid testning av kretskort. Metoden som användes är design science, vilket omfattar en litteraturstudie, utveckling av prototyp och utvärdering av prototypen. Prototypen består av tre visualiseringstekniker som är: Kategorisk heatmap, Parallella koordinater och Scatterplot. Dessa tekniker jämfördes systematiskt för att bedöma deras effektivitet. Utvärderingen består av kvantitativa metoder såsom mätningar och enkäter, samt den kvalitativa metoden intervju. Resultatet av studien presenterar den utvecklade prototypen och analysen av utvärderingen. Resultatet av studien visar att kategoriska heatmaps är effektiv för att identifiera samband mellan avvikelser i multivariat data. Även om alla användare upplevde visualiseringen svårtolkad vid en första anblick uttryckte de att visualiseringen var effektiv på att visa korrelationer mellan avvikelser. 
Parallella koordinater upplevdes svårtolkad och ineffektiv på grund av den höga dimensionaliteten där alla dimensioner inte kan visas samtidigt. Förbättringsförslag för att öka användarvänlighet och användarupplevelse lyftes där tree view förslogs som ett alternativ för att välja de dimensioner som ska visas i stället för reglaget. Scatterplots visade sig vara användbar för att analysera enskilda testpunkter och visade generella trender på ett tydligt och begripligt sätt. Studien har även visat att interaktiviteten påverkar upplevelsen av visualisering, där begränsad interaktivitet medför att tekniken upplevdes mindre användbar för att identifiera relationer mellan avvikelser. / Visualization is of great importance when analyzing data, especially when identifying anomalies. Identifying faulty electronic components effectively could improve production processes considerably. By clearly displaying the correlation between faulty and working components, analysts can identify key components causing faulty products. Multivariate data and multivariate time series data place high demands on visualizations due to their complexity. The high dimensionality can lead to issues such as overlapping and hidden patterns, depending on the visualization technique used. To achieve effective visualization of multivariate data and multivariate time series data, it is necessary to show both trends over time and correlations between different variables. This study was conducted in cooperation with Syntronic AB, a consulting company, to help identify suitable visualization techniques for data gathered by testing circuit boards. The methodology used is design science, which includes a literature study, development of a prototype, and evaluation of the prototype. The prototype consists of three visualization techniques: Categorical heatmap, Parallel Coordinates, and Scatterplot. These techniques were systematically compared to assess their effectiveness. 
The evaluation consists of quantitative methods such as time measurements and surveys, and the qualitative method of interviews. The results of the study present the developed prototype and the analysis of the evaluation. The study found categorical heatmaps effective for identifying correlations between anomalies in multivariate data. Although all users found the visualization difficult to interpret at first glance, they stated that it was effective at showing correlations between anomalies. Parallel Coordinates were perceived as difficult to interpret and ineffective for high-dimensional datasets where not all dimensions can be displayed simultaneously. Interactive options such as a tree view for selecting the test points to visualize were suggested to further improve the usefulness of Parallel Coordinates. Scatterplots proved useful for analyzing individual test points and showed general trends in a user-friendly way. Furthermore, the study showed that interactivity affects the perception of visualizations. With limited interactivity, users found the visualizations less effective at identifying anomalies and less user-friendly. Visualization Multivariate Time Series Data Anomaly Detection of Coherence Parallel Coordinates Scatterplot Datavisualisering Multivariat tidsseriedata Felidentifiering av samband Heatmap Parallella koordinater Scatterplot Computer and Information Sciences Data- och informationsvetenskap
47	AUGMENTATION AND CLASSIFICATION OF TIME SERIES FOR FINDING ACL INJURIES Johansson, Marie-Louise January 2022 This thesis addresses the problem of applying machine learning to a small data set of multivariate time series. A challenge when classifying such data is that a small data set puts the classifier at risk of overfitting. Augmenting a small data set might mitigate overfitting. The multivariate time series used in this project represent motion data of people with reconstructed ACLs and a control group. The approach was to pair motion data from the training set and use Euclidean Barycentric Averaging to create a new set of synthetic motion data, so as to increase the size of the training set. The classifiers used were Dynamic Time Warping with One Nearest Neighbour and Time Series Forest. In our example, we found this way of enlarging the training set to be a less productive strategy. We also found that Time Series Forest generally achieved higher accuracy on the chosen data sets, but there may be more effective augmentation strategies to avoid overfitting. computer science machine learning motion analysis reconstructed ACL anterior cruciate ligament time series forest dynamic time warping ACL multivariate time series classification MTSC time series classification TSC euclidean barycentric average euclidean barycentric averaging augmentation of time series augmentation of multivariate time series data augmentation augmentation Computer Sciences Datavetenskap (datalogi)
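The augmentation step in entry 47 can be sketched as follows. The Euclidean barycenter of equal-length series is their point-wise arithmetic mean; everything else here is an assumption for illustration: the function names, the univariate toy series, and in particular the choice to pair only series that share a class label, which the abstract does not specify.

```python
from itertools import combinations


def euclidean_barycenter(series_list):
    """Point-wise arithmetic mean of equal-length series."""
    length = len(series_list[0])
    if any(len(s) != length for s in series_list):
        raise ValueError("all series must have the same length")
    n = len(series_list)
    return [sum(s[t] for s in series_list) / n for t in range(length)]


def augment_by_pairs(training_set):
    """Create one synthetic series per same-label pair (assumed pairing rule).

    training_set is a list of (series, label) tuples.
    """
    synthetic = []
    for (a, label_a), (b, label_b) in combinations(training_set, 2):
        if label_a == label_b:
            synthetic.append((euclidean_barycenter([a, b]), label_a))
    return synthetic


# Hypothetical single-channel motion traces with class labels.
training_set = [
    ([0.0, 2.0, 4.0], "acl"),
    ([2.0, 4.0, 6.0], "acl"),
    ([1.0, 1.0, 1.0], "control"),
]

synthetic = augment_by_pairs(training_set)
```

For multivariate motion data the same averaging would be applied to each channel separately. Since each synthetic series lies between two existing ones, it adds little new variation, which may help explain why the thesis found the strategy less productive.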
48	Elastic matching for classification and modelisation of incomplete time series / Appariement élastique pour la classification et la modélisation de séries temporelles incomplètes Phan, Thi-Thu-Hong 12 October 2018 (has links) Les données manquantes constituent un challenge commun en reconnaissance de forme et traitement de signal. Une grande partie des techniques actuelles de ces domaines ne gère pas l'absence de données et devient inutilisable face à des jeux incomplets. L'absence de données conduit aussi à une perte d'information, des difficultés à interpréter correctement le reste des données présentes et des résultats biaisés notamment avec de larges sous-séquences absentes. Ainsi, ce travail de thèse se focalise sur la complétion de larges séquences manquantes dans les séries monovariées puis multivariées peu ou faiblement corrélées. Un premier axe de travail a été une recherche d'une requête similaire à la fenêtre englobant (avant/après) le trou. Cette approche est basée sur une comparaison de signaux à partir d'un algorithme d'extraction de caractéristiques géométriques (formes) et d'une mesure d'appariement élastique (DTW - Dynamic Time Warping). Un package R CRAN a été développé, DTWBI pour la complétion de série monovariée et DTWUMI pour des séries multidimensionnelles dont les signaux sont non ou faiblement corrélés. Ces deux approches ont été comparées aux approches classiques et récentes de la littérature et ont montré leur faculté de respecter la forme et la dynamique du signal. Concernant les signaux peu ou pas corrélés, un package DTWUMI a aussi été développé. Le second axe a été de construire une similarité floue capable de prender en compte les incertitudes de formes et d'amplitude du signal. Le système FSMUMI proposé est basé sur une combinaison floue de similarités classiques et un ensemble de règles floues. 
Ces approches ont été appliquées à des données marines et météorologiques dans plusieurs contextes : classification supervisée de cytogrammes phytoplanctoniques, segmentation non supervisée en états environnementaux d'un jeu de 19 capteurs issus d'une station marine MAREL CARNOT en France et la prédiction météorologique de données collectées au Vietnam. / Missing data are a prevalent problem in many domains of pattern recognition and signal processing. Most of the existing techniques in the literature suffer from one major drawback, which is their inability to process incomplete datasets. Missing data produce a loss of information and thus yield inaccurate data interpretation, biased results, or unreliable analysis, especially for large missing sub-sequences. This thesis therefore focuses on dealing with large consecutive missing values in univariate and low/un-correlated multivariate time series. We begin by investigating an imputation method to overcome these issues in univariate time series. This approach is based on the combination of a shape-feature extraction algorithm and the Dynamic Time Warping method. A new R package, namely DTWBI, is then developed. In the following work, the DTWBI approach is extended to complete large successive missing data in low/un-correlated multivariate time series (called DTWUMI), and a DTWUMI R package is also established. The key idea of these two proposed methods is to use elastic matching to retrieve similar values in the series before and/or after the missing values. This preserves as much as possible the dynamics and shape of the known data, while applying the shape-feature extraction algorithm reduces the computing time. Next, we introduce a new method for filling large successive missing values in low/un-correlated multivariate time series, namely FSMUMI, which makes it possible to manage a high level of uncertainty. 
To this end, we propose to use novel fuzzy grades of basic similarity measures and fuzzy logic rules. Finally, we employ DTWBI to (i) complete the MAREL Carnot dataset, on which we then perform a detection of rare/extreme events, and (ii) forecast various meteorological univariate time series collected in Vietnam. Imputation Données manquantes Séries temporelles univariées Dynamic Time Warping Mesure de similarité Système d'inférence floue Imputation Missing data Univariate time series Uncorrelated multivariate time series Dynamic Time Warping Similarity measure Fuzzy inference system
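The elastic matching measure that entry 48 builds on, Dynamic Time Warping, can be sketched with the classic dynamic-programming recurrence. This is a textbook version using absolute difference as the local cost, not the DTWBI or DTWUMI package code; the function name and the toy series are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two univariate series."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Same shape at a different speed: the warping absorbs the repeated value.
same_shape = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
# A constant offset cannot be warped away.
offset = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

In an imputation setting such as the one described, a measure like this would be used to find the window elsewhere in the series that best matches the data next to the gap, so that the values adjacent to that window can fill the gap.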
49	Détection de ruptures multiples dans des séries temporelles multivariées : application à l'inférence de réseaux de dépendance / Multiple change-point detection in multivariate time series : application to the inference of dependency networks Harlé, Flore 21 June 2016 This thesis presents a method for the offline detection of multiple change-points in multivariate time series, and uses the results to estimate the dependency relationships between the variables of the system. The originality of the model, called the Bernoulli Detector, lies in combining local statistics from a robust test, based on the ranks of the observations, with a global Bayesian framework. This nonparametric model does not require strong assumptions about the distribution of the observations. It is applicable without modification to Gaussian data as well as to data corrupted by outliers. Control of the detection of a single change-point is proven even for small samples. In the multivariate setting, a term is introduced to model the dependencies between the changes, assuming that if two components of the system are connected, events affecting one are observed instantaneously on the other with high probability. The model thus adapts to the data, and the segmentation accounts for changes shared by several signals as well as isolated changes occurring in a single signal. The method is compared with other state-of-the-art solutions, notably on real datasets of household electricity consumption and genomic measurements. These experiments highlight the value of the model for detecting change-points in independent, conditionally independent, or fully connected signals. Finally, the synchronization of change-points across the time series is exploited to estimate the relationships between the variables, using the Bayesian network formalism. By adapting the score function of a structure learning method, it is verified that the independence model describing the system can be partly recovered from the information provided by the change-points estimated by the Bernoulli Detector. Change-point detection Bayesian inference Rank statistics Multivariate time series Bayesian networks Markov equivalence classes
50	Ensaios em alocação de portfólio com mudança de regime / Essays on portfolio allocation with regime change Oliveira, André Barbosa 15 August 2014 Regime change is an important stylized fact of financial assets. Asset prices show little variability in normal periods, but they drop unexpectedly and are unstable in times of crisis. This thesis studies portfolio allocation with regime change. The first essay considers the optimal investment decision among risky assets when the financial market is subject to regime switching; the optimal portfolios depend on expected returns and risk as well as on beliefs about the state of the financial market. The second essay studies portfolio allocation based on estimates of a factor model with regime change and compares it with allocations based on linear factor models and sample moments. Regime change has a greater effect on the portfolio selection process than on the estimates used to define the portfolios. Portfolio theory Factor models Asset allocation Economics Investments Time series analysis Multivariate analysis Mathematical optimization
