51 |
Análise da qualidade do ar : um estudo de séries temporais para dados de contagem / Silva, Kelly Cristina Ramos da, 30 April 2013 (has links)
Previous issue date: 2013-04-30 / Financiadora de Estudos e Projetos / The aim of this study was to investigate the monthly number of days unfavourable to pollutant dispersion in the atmosphere of the metropolitan region of São Paulo (RMSP). Two data sets derived from air quality monitoring in the RMSP were considered: (1) monthly observations of the time series over the full year and (2) monthly observations of the time series restricted to May through September. Two classes of models were used: Vector Autoregressive (VAR) models and Generalized Additive Models for Location, Scale and Shape (GAMLSS). The techniques presented in this dissertation focus, for the VAR class, on modelling stationary time series and, for the GAMLSS class, on models for count data: Delaporte (DEL), Negative Binomial type I (NBI), Negative Binomial type II (NBII), Poisson (PO), Zero-Inflated Poisson (ZIP), Poisson Inverse Gaussian (PIG) and Sichel (SI). The VAR model was used only on data set (1), yielding good predictions of the monthly number of unfavourable days, although the fit presented relatively large residuals. The GAMLSS were used on both data sets; the NBII model performed best on data set (1) and the ZIP model on data set (2). In addition, a simulation study was carried out to better understand the GAMLSS class for count data. The data were generated from three different Negative Binomial distributions. The results show that the NBI, NBII and PIG models fitted the generated data well. The statistical techniques used in this dissertation were important for describing and understanding the air quality problem.
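The GAMLSS count-data comparison described above amounts to fitting several count distributions to the same monthly series and choosing among them with an information criterion. As a minimal, dependency-free sketch (the monthly counts below are invented for illustration, and the ZIP fit uses a coarse grid search rather than the full GAMLSS machinery), a Poisson and a zero-inflated Poisson fit can be compared by AIC:

```python
import math

def poisson_logpmf(k, lam):
    # log P(K = k) for a Poisson(lam) variable
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def zip_logpmf(k, pi, lam):
    # Zero-inflated Poisson: extra probability mass pi at zero
    if k == 0:
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) + poisson_logpmf(k, lam)

def fit_poisson(data):
    lam = sum(data) / len(data)           # MLE is the sample mean
    ll = sum(poisson_logpmf(k, lam) for k in data)
    return ll, 1                          # log-likelihood, number of parameters

def fit_zip(data, grid=50):
    # Coarse grid search over (pi, lambda), for illustration only
    best_ll = -float("inf")
    mean = sum(data) / len(data)
    for i in range(1, grid):
        pi = i / grid * 0.9               # inflation probability in (0, 0.9)
        for j in range(1, grid):
            lam = mean * 2 * j / grid + 1e-9
            ll = sum(zip_logpmf(k, pi, lam) for k in data)
            best_ll = max(best_ll, ll)
    return best_ll, 2

# Hypothetical monthly counts of unfavourable-dispersion days
counts = [0, 0, 1, 0, 4, 6, 9, 8, 5, 2, 0, 0, 0, 1, 0, 5, 7, 10, 9, 4, 1, 0]

for name, (ll, p) in {"PO": fit_poisson(counts), "ZIP": fit_zip(counts)}.items():
    print(name, "AIC =", round(2 * p - 2 * ll, 2))
```

The lower AIC wins; with many excess zeros, as in a series where several months have no unfavourable days at all, the ZIP fit dominates the plain Poisson, mirroring the dissertation's result for data set (2).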
|
52 |
Phillipsova křivka z pohledu analýzy časových řad v České republice a Německu / Phillips curve verification by time series analysis in the Czech Republic and Germany / Král, Ondřej, January 2017 (has links)
Government fiscal and monetary policy has long been based on a theory that has been neither proven nor refuted since its origination. The original form of the Phillips curve has undergone significant modifications, but its relevance remains questionable. This thesis examines the correlation between inflation and unemployment observed in the Czech Republic and Germany over the last twenty years. The validity of the theory is tested with advanced methods of time series analysis in the R environment. All the variables are tested in turn, leading to an assessment of the correlation between the time series. The outcome of the testing is presented for both countries and a comparison at the international level is drawn. It is discovered that both countries show dependencies in their data. The Czech Republic shows a significant dependency in both directions; for Germany the dependency is significantly weaker and holds in only one direction.
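The direction-of-dependency finding above can be illustrated with a simple lead-lag correlation check between the two series. This sketch uses invented unemployment and inflation figures and plain Pearson correlations at a few lags, a far cruder tool than the time-series tests applied in the thesis:

```python
def pearson(x, y):
    # Sample Pearson correlation coefficient
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

def lagged_corr(x, y, lag):
    # Correlation between x_t and y_{t+lag}: a positive lag means x leads y
    if lag >= 0:
        return pearson(x[: len(x) - lag or None], y[lag:])
    return lagged_corr(y, x, -lag)

# Hypothetical annual unemployment (%) and inflation (%) series
unemp = [6.1, 5.8, 5.5, 5.0, 4.4, 4.0, 4.3, 5.2, 6.0, 5.7, 5.1, 4.6]
infl  = [2.0, 2.4, 2.9, 3.5, 4.1, 4.4, 3.9, 3.0, 2.2, 2.1, 2.6, 3.2]

for lag in (-1, 0, 1):
    print(lag, round(lagged_corr(unemp, infl, lag), 3))
```

A strongly negative correlation at lag 0 is the textbook Phillips-curve trade-off; asymmetry between the positive and negative lags would hint at a one-directional dependency of the kind reported for Germany.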
|
53 |
PERFORMANCE EVALUATION OF UNIVARIATE TIME SERIES AND DEEP LEARNING MODELS FOR FOREIGN EXCHANGE MARKET FORECASTING: INTEGRATION WITH UNCERTAINTY MODELING / Wajahat Waheed (11828201), 13 December 2021 (has links)
The foreign exchange market is the largest financial market in the world, and thus prediction of foreign exchange rates is of interest to millions of people. In this research, I evaluated the performance of Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Autoregressive Integrated Moving Average (ARIMA) and Moving Average (MA) models on the USD/CAD and USD/AUD exchange pairs for 1-day, 1-week and 2-week predictions. For LSTM and GRU, twelve macroeconomic indicators along with past exchange rate values were used as features, with data from January 2001 to December 2019. Predictions from each model were then integrated with uncertainty modeling to find the chance of a model's prediction being greater than or less than a user-defined target value, using the error distribution from the test dataset, Monte Carlo simulation trials and the ChancCalc Excel add-in. Results showed that ARIMA performs slightly better than LSTM and GRU for 1-day predictions for both the USD/CAD and USD/AUD exchange pairs. However, when the horizon is increased to 1 week and 2 weeks, LSTM and GRU outperform both ARIMA and Moving Average for both exchange pairs.
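The uncertainty-modeling step described above, estimating the chance that the realized rate ends up above a user-defined target given the test-set error distribution, can be sketched with a bootstrap Monte Carlo. The forecast value, residuals and target below are invented for illustration:

```python
import random

def chance_above(point_forecast, test_errors, target, trials=100_000, seed=42):
    """Estimate P(actual > target) by resampling test-set forecast errors
    and adding them to the point forecast (bootstrap Monte Carlo)."""
    rng = random.Random(seed)
    hits = sum(
        point_forecast + rng.choice(test_errors) > target
        for _ in range(trials)
    )
    return hits / trials

# Hypothetical USD/CAD 1-week-ahead forecast and residuals from a test set
forecast = 1.3450
errors = [-0.0042, -0.0031, -0.0015, -0.0006, 0.0002,
          0.0008, 0.0019, 0.0027, 0.0036, 0.0051]

p = chance_above(forecast, errors, target=1.3460)
print(f"P(rate > 1.3460) ~ {p:.3f}")
```

With these ten residuals, four exceed the 0.0010 gap between forecast and target, so the estimate settles around 0.4; a smoother alternative would fit a parametric distribution to the errors instead of resampling them.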
|
54 |
Time Dependencies Between Equity Options Implied Volatility Surfaces and Stock Loans, A Forecast Analysis with Recurrent Neural Networks and Multivariate Time Series / Tidsberoenden mellan aktieoptioners implicerade volatilitetsytor och aktielån, en prognosanalys med rekursiva neurala nätverk och multidimensionella tidsserier / Wahlberg, Simon, January 2022 (has links)
Synthetic short positions constructed from equity options and stock loan short sells are linked by arbitrage. This thesis analyses the link by considering the implied volatility surface (IVS) at 80%, 100%, and 120% moneyness, together with stock loan variables such as the benchmark rate (rt), utilization, short interest, and transaction trends, to inspect time-dependent structures between the two assets. Multiple multivariate time-series analyses are applied: vector autoregression (VAR) and the recurrent neural networks long short-term memory (LSTM) and gated recurrent units (GRU), with a sliding-window methodology. This thesis discovers linear and complex relationships between the IVS and stock loan data. The three-day-ahead out-of-sample LSTM forecast of IV at 80% moneyness improved by including lagged values of rt, yielding 19.6% MAPE and forecasting the correct direction for 81.1% of samples. The corresponding 100% moneyness GRU forecast was also improved by including stock loan data, at 10.8% MAPE and correct direction for 60.0% of samples. The 120% moneyness VAR forecast did not improve with stock loan data, at 26.5% MAPE and correct direction for 66.2% of samples. The one-month-ahead rt VAR forecast improved by including a lagged IVS, at 25.5% MAPE and 63.6% correct directions. The presented data were optimal for each target variable, showing that the application of LSTM and GRU was justified. These results indicate that considering stock loan data when forecasting the IVS at 80% and 100% moneyness is advised to gain exploitable insights for short-term positions. They are further validated since the different models yielded parallel inferences. Similar analysis with other equities is advised to gain insight into the relationship and improve such forecasts.
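The sliding-window methodology mentioned above turns a time series into supervised (window, target) pairs, here for a three-day-ahead setup. A minimal sketch with an invented univariate IV series (the thesis's actual inputs stack IVS points and stock loan variables into feature vectors):

```python
def sliding_windows(series, window, horizon):
    """Split a sequence into (input window, target) pairs: each sample uses
    `window` consecutive observations to predict the value `horizon` steps
    after the window ends."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start : start + window])
        y.append(series[start + window + horizon - 1])
    return X, y

# Toy implied-volatility series, values invented for illustration
iv = [0.21, 0.22, 0.20, 0.23, 0.25, 0.24, 0.26, 0.27]

X, y = sliding_windows(iv, window=3, horizon=3)   # three-day-ahead setup
print(X[0], "->", y[0])   # [0.21, 0.22, 0.20] -> 0.24
```

Each pair then feeds an LSTM, GRU or VAR fit; sliding the window by one step at a time maximizes the number of training samples extracted from a fixed history.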
|
55 |
Combinaison de l’Internet des objets, du traitement d’évènements complexes et de la classification de séries temporelles pour une gestion proactive de processus métier / Combining the Internet of things, complex event processing, and time series classification for a proactive business process management / Mousheimish, Raef, 27 October 2017 (has links)
The Internet of Things is at the core of smart industrial processes thanks to its capacity for event detection from sensor data. However, much remains to be done to make the most of this recent technology and make it scale. This thesis aims at filling the gap between the massive data flows collected by sensors and their effective exploitation in business process management. It proposes a global approach that combines data stream processing, supervised learning and/or complex event processing rules to predict (and thereby avoid) undesirable events, and finally business process management extended with these complex rules. The scientific contributions of this thesis lie in several areas: making business processes more intelligent and more dynamic; automating complex event processing by learning the rules; and, last but not least, data mining of multivariate time series through early prediction of risks. The target application of this thesis is the instrumented transportation of artworks.
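A complex event processing rule of the kind this approach learns can be sketched as a stateful predicate over a sensor stream that fires a predicted undesirable event. All thresholds, field names and readings below are hypothetical:

```python
from collections import deque

def make_rule(window, vib_limit, temp_limit):
    """A toy complex-event rule: fire a 'predicted shock' event when the
    mean vibration over the last `window` readings exceeds vib_limit
    while the current temperature exceeds temp_limit."""
    vib_buf = deque(maxlen=window)

    def rule(reading):
        vib_buf.append(reading["vibration"])
        if len(vib_buf) < window:
            return False
        mean_vib = sum(vib_buf) / window
        return mean_vib > vib_limit and reading["temperature"] > temp_limit

    return rule

rule = make_rule(window=3, vib_limit=0.8, temp_limit=30.0)

stream = [
    {"vibration": 0.2, "temperature": 22.0},
    {"vibration": 0.5, "temperature": 24.0},
    {"vibration": 0.9, "temperature": 27.0},
    {"vibration": 1.1, "temperature": 31.0},   # mean vib 0.83, temp > 30
]
alerts = [rule(r) for r in stream]
print(alerts)   # [False, False, False, True]
```

In the thesis the rule bodies are not hand-written like this but learned from labelled multivariate series; the runtime shape, however, stays the same: a predicate evaluated incrementally as readings arrive, whose firing triggers a proactive action in the business process.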
|
56 |
Robust methods in multivariate time series / Méthodes robustes dans les séries chronologiques multivariées / Métodos robustos em séries temporais multivariadas / Aranda Cotta, Higor Henrique, 22 August 2019 (has links)
This manuscript proposes new robust estimation methods for the autocovariance and autocorrelation matrix functions of stationary multivariate time series that may contain additive random outliers. These functions play an important role in the identification and estimation of time series model parameters. We first propose new estimators of the autocovariance and autocorrelation matrix functions constructed using a spectral approach based on the periodogram matrix, the natural estimator of the spectral density matrix. As with the classical estimators of the autocovariance and autocorrelation matrix functions, these estimators are affected by aberrant observations, so any identification or estimation procedure using them is directly affected, leading to erroneous conclusions. To mitigate this problem, we propose the use of robust statistical techniques to build estimators resistant to aberrant random observations. As a first step, we propose new estimators of the autocovariance and autocorrelation functions of univariate time series. The time and frequency domains are linked by the relationship between the autocovariance function and the spectral density. As the periodogram is sensitive to aberrant data, we obtain a robust estimator by replacing it with the M-periodogram. The M-periodogram is obtained by replacing the Fourier coefficients of the periodogram, computed by standard least squares regression, with those computed by robust M-regression. The asymptotic properties of the estimators are established. Their performance is studied by means of numerical simulations for different sample sizes and different contamination scenarios. The empirical results indicate that the proposed methods provide values close to those obtained by the classical autocorrelation function when the data are not contaminated, and resist different contamination scenarios. Thus, the estimators proposed in this thesis are alternative methods usable for time series with or without outliers. The estimators obtained for univariate time series are then extended to the multivariate case. This extension is simplified by the fact that the calculation of the cross-periodogram only involves the Fourier coefficients of each component series. The M-periodogram matrix thus provides a robust alternative to the periodogram matrix for building robust estimators of the autocovariance and autocorrelation matrix functions. The asymptotic properties are studied and numerical experiments are performed. As an example of an application with real data, we use the proposed functions to fit an autoregressive model by the Yule-Walker method to pollution data collected in the Vitória region of Brazil. Finally, robust estimation of the number of factors in large factor models is considered in order to reduce dimensionality. It is well known that additive random outliers affect the covariance and correlation matrices, and that techniques depending on the calculation of their eigenvalues and eigenvectors, such as principal component analysis and factor analysis, are affected. Thus, in the presence of outliers, the information criteria proposed by Bai & Ng (2002) tend to overestimate the number of factors. To alleviate this problem, we propose to replace the standard covariance matrix with the robust covariance matrix proposed in this manuscript. Our Monte Carlo simulations show that, in the absence of contamination, the standard and robust methods are equivalent. In the presence of outliers, the number of estimated factors increases with the non-robust methods while it remains the same with the robust methods. As an application with real data, we study PM$_{10}$ pollutant concentrations measured in the Île-de-France region of France.
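The core construction described above, replacing the least squares Fourier coefficients with M-regression estimates, can be sketched for a single frequency using Huber weights and iteratively reweighted least squares. This is an illustrative simplification of the manuscript's M-periodogram, with invented data and a single additive outlier:

```python
import math

def m_fourier_coeffs(y, freq, c=1.345, iters=20):
    """Huber M-estimates of the Fourier coefficients (a, b) at angular
    frequency `freq`, via iteratively reweighted least squares on the
    regression y_t ~ a*cos(freq*t) + b*sin(freq*t)."""
    n = len(y)
    cos = [math.cos(freq * t) for t in range(n)]
    sin = [math.sin(freq * t) for t in range(n)]
    w = [1.0] * n                           # first pass is plain OLS
    a = b = 0.0
    for _ in range(iters):
        # Weighted least squares normal equations for the two regressors
        scc = sum(w[t] * cos[t] * cos[t] for t in range(n))
        sss = sum(w[t] * sin[t] * sin[t] for t in range(n))
        scs = sum(w[t] * cos[t] * sin[t] for t in range(n))
        syc = sum(w[t] * y[t] * cos[t] for t in range(n))
        sys_ = sum(w[t] * y[t] * sin[t] for t in range(n))
        det = scc * sss - scs * scs
        a = (syc * sss - sys_ * scs) / det
        b = (sys_ * scc - syc * scs) / det
        resid = [y[t] - a * cos[t] - b * sin[t] for t in range(n)]
        # Robust scale: median absolute residual (MAD-style), then Huber weights
        scale = sorted(abs(r) for r in resid)[n // 2] / 0.6745 or 1.0
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return a, b

# Sinusoid at one Fourier frequency plus one large additive outlier
n, k = 64, 5
freq = 2 * math.pi * k / n
y = [2.0 * math.cos(freq * t) for t in range(n)]
y[10] += 30.0                       # additive outlier

a, b = m_fourier_coeffs(y, freq)
print(round(a, 3), round(b, 3))     # close to (2, 0) despite the outlier
```

Plain OLS on the same data is pulled far from (2, 0) by the single corrupted point, while the downweighting recovers the underlying amplitude; squaring the robust coefficients over all Fourier frequencies yields the robust periodogram from which the M-autocovariance estimates follow.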
|
57 |
Modélisation des modèles autorégressifs vectoriels avec variables exogènes et sélection d’indices / Oscar, Mylène, 05 1900 (has links)
Ce mémoire porte sur l’étude des modèles autorégressifs avec variables exogènes et sélection d’indices. La littérature classique regorge de textes concernant la sélection d’indices dans les modèles autorégressifs. Ces modèles sont particulièrement utiles pour des données macroéconomiques mesurées sur des périodes de temps modérées à longues. Effectivement, la lourde paramétrisation des modèles complets peut souvent être allégée en utilisant la sélection d’indices aboutissant ainsi à des modèles plus parcimonieux. Les modèles à variables exogènes sont très intéressants dans le contexte où il est connu que les variables à l’étude sont affectées par d’autres variables, jouant le rôle de variables explicatives, que l’analyste ne veut pas forcément modéliser. Ce mémoire se propose donc d’étudier les modèles autorégressifs vectoriels avec variables exogènes et sélection d’indices. Ces modèles ont été explorés, entre autres, par Lütkepohl (2005), qui se contente cependant d’esquisser les développements mathématiques. Nous concentrons notre étude sur l’inférence statistique sous des conditions précises, la modélisation ainsi que les prévisions. Notre objectif est de comparer les modèles avec sélection d’indices aux modèles autorégressifs avec variables exogènes complets classiques. Nous désirons déterminer si l’utilisation des modèles avec sélection d’indices est marquée par une différence favorable au niveau du biais et de l’écart-type des estimateurs ainsi qu’au niveau des prévisions de valeurs futures. Nous souhaitons également comparer l’efficacité de la sélection d’indices dans les modèles autorégressifs ayant des variables exogènes à celle dans les modèles autorégressifs. Il est à noter qu’une motivation première dans ce mémoire est l’estimation dans les modèles autorégressifs avec variables exogènes à sous-ensemble d’indices.
Dans le premier chapitre, nous présentons les séries temporelles ainsi que les diverses notions qui y sont rattachées. De plus, nous présentons les modèles linéaires classiques multivariés, les modèles à variables exogènes puis des modèles avec sélection d’indices. Dans le deuxième chapitre, nous exposons le cadre théorique de l’estimation des moindres carrés dans les modèles autorégressifs à sous-ensemble d’indices ainsi que le comportement asymptotique de l’estimateur. Ensuite, nous développons la théorie pour l’estimation des moindres carrés (LS) ainsi que la loi asymptotique des estimateurs pour les modèles autorégressifs avec sélection d’indices (SVAR) puis nous faisons de même pour les modèles
autorégressifs avec variables exogènes et tenant compte de la sélection des indices (SVARX). Spécifiquement, nous établissons la convergence ainsi que la distribution asymptotique pour l’estimateur des moindres carrés d’un processus autorégressif vectoriel à sous-ensemble d’indices et avec variables exogènes. Dans le troisième chapitre, nous appliquons la théorie spécifiée précédemment lors de simulations de Monte Carlo. Nous évaluons de manière empirique les biais et les écarts-types des coefficients trouvés lors de l’estimation ainsi que la proportion de fois que le modèle ajusté correspond au vrai modèle pour différents critères de sélection, tailles échantillonnales et processus générateurs des données. Dans le quatrième chapitre, nous appliquons la théorie élaborée aux chapitres 1 et 2 à un vrai jeu de données provenant du système canadien d’information socioéconomique (CANSIM), constitué de la production mensuelle de fromage mozzarella, cheddar et ricotta au Canada, expliquée par les prix mensuels du lait de bovin non transformé dans les provinces de Québec, d’Ontario et de la Colombie-Britannique pour la période allant de janvier 2003 à juillet 2021. Nous ajustons ces données à un modèle autorégressif avec variables exogènes complet puis à un modèle autorégressif avec variables exogènes et sélection d’indices. Nous comparons ensuite les résultats obtenus avec le modèle complet à ceux obtenus avec le modèle restreint.
Mots-clés : Processus autorégressif à sous-ensemble d’indices, variables exogènes, esti mation des moindres carrés, sélection de modèle, séries chronologiques multivariées, processus
stochastiques, séries chronologiques. / This Master’s Thesis focuses on the study of subset autoregressive models with exoge nous variables. Many texts from the classical literature deal with the selection of indexes in autoregressive models. These models are particularly useful for macroeconomic data measured over moderate to long periods of time. Indeed, the heavy parameterization of full models can often be simplified by using the selection of indexes, thus resulting in more parsimonious models. Models with exogenous variables are very interesting in the context where it is known that the variables under study are affected by other variables, playing the role of explanatory variables, not necessarily modeled by the analyst. This Master’s
Thesis therefore proposes to study vector subset autoregressive models with exogenous variables. These models have been explored, among others, by Lütkepohl (2005), who merely sketches proofs of the statistical properties. We focus our study on statistical inference under precise conditions, modeling and forecasting for these models. Our goal is to compare
restricted models to full classical autoregressive models with exogenous variables. We want to determine whether the use of restricted models is marked by a favorable difference in the bias and standard deviation properties of the estimators as well as in forecasting future values. We also compare the efficiency of index selection in autoregressive models with exogenous variables to that in autoregressive models. It should be noted that a primary motivation in this Master’s Thesis is the estimation in subset autoregressive models with exogenous variables.
In the first chapter, we introduce time series and the various concepts attached to them, present the classical multivariate linear models and models with exogenous variables, and then present subset models. In the second chapter, we develop the theory of least squares (LS) estimation and the asymptotic distribution of the estimators, first for subset autoregressive models (SVAR) and then for subset autoregressive models with exogenous variables (SVARX). Specifically, we establish the convergence and the asymptotic distribution of the least squares estimator of a subset autoregressive process with exogenous variables. In the third chapter, we apply this theory in Monte Carlo simulations: we evaluate empirically the biases and the standard deviations of the estimated coefficients, as well as the proportion of times the adjusted model matches the true model, for different selection criteria, sample sizes and data generating processes. In the fourth chapter, we apply the theory developed in chapters 1 and 2 to a real dataset from the Canadian Socio-Economic Information System (CANSIM), consisting of the monthly production of mozzarella, cheddar and ricotta cheese in Canada, explained by the monthly prices of unprocessed bovine milk in the provinces of Quebec, Ontario and British Columbia from January 2003 to July 2021. We fit these data with a full autoregressive model with exogenous variables and then with a subset autoregressive model with exogenous variables, and compare the results obtained with the complete model to those obtained with the subset model.
Keywords: subset autoregressive process, exogenous variables, least squares estimation, model selection, multivariate time series, stochastic process, time series.
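The estimation problem the thesis studies can be illustrated with a minimal sketch of least squares for a subset VARX model: only the retained lags enter the regressor matrix, and the remaining lag coefficients are implicitly restricted to zero. This is a hypothetical Python/NumPy illustration under simplifying assumptions (intercept included, the same lag subset applied to every equation), not the thesis's actual estimator.

```python
import numpy as np

def fit_subset_varx(y, x, ar_lags, exog_lags):
    """Least squares fit of a subset VAR model with exogenous variables.

    y : (T, k) endogenous series; x : (T, m) exogenous series.
    ar_lags / exog_lags : the lags retained in the subset model
    (a full VARX of order p would use every lag 1..p).
    """
    p = max(max(ar_lags), max(exog_lags))
    rows = []
    for t in range(p, len(y)):
        z = [1.0]  # intercept
        for l in ar_lags:
            z.extend(y[t - l])   # retained endogenous lags only
        for l in exog_lags:
            z.extend(x[t - l])   # retained exogenous lags only
        rows.append(z)
    Z = np.asarray(rows)
    Y = y[p:]
    # Multivariate OLS: each column of B minimises ||Y[:, j] - Z b||^2.
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ B
    return B, resid
```

On data generated by a genuinely sparse process, one would then compare the bias and standard deviation of these restricted estimates against those from the full model, in the spirit of the thesis's Monte Carlo study.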
|
58 |
Real-time Classification of Multi-sensor Signals with Subtle Disturbances Using Machine Learning: A threaded fastening assembly case study / Realtidsklassificering av multi-sensorsignaler med små störningar med hjälp av maskininlärning: En fallstudie inom åtdragningsmontering. Olsson, Theodor. January 2021 (has links)
Sensor fault detection is an actively researched area, and there is a plethora of studies on sensor fault detection in applications such as nuclear power plants, wireless sensor networks, weather stations and nuclear fusion. However, there does not seem to be any study focusing on detecting sensor faults in threaded fastening assembly. Since threaded fastening tools use torque and angle measurements to determine whether or not a screw or bolt has been fastened properly, faulty measurements from these sensors can have dire consequences. This study aims to investigate the use of machine learning to detect a subtle kind of sensor fault, common in this application, that is difficult to detect using conventional model-based approaches. Because of the subtle and infrequent nature of these faults, a two-stage system was designed. The first component of this system is given sensor data from a tightening and tries to classify each data point as normal or faulty, using a combination of low-pass filtering to generate residuals and a support vector machine to classify the residual points. The second component uses the output of the first to determine whether the complete tightening is normal or faulty. Despite the modest performance of the first component, with the best model having an F1-score of 0.421 for classifying data points, the design showed promising performance for classifying the tightening signals, with the best model having an F1-score of 0.976. These results indicate that there indeed exist patterns in these kinds of torque and angle multi-sensor signals that make machine learning a feasible approach to classifying them and detecting sensor faults. / Sensor fault detection is currently an active research area, with numerous studies on fault detection in applications such as nuclear power, wireless sensor networks, weather stations and fusion power.
One application area that does not appear to have been examined is threaded fastening assembly. Since threaded fastening tools use torque and angle measurements to determine whether a screw or bolt has been tightened sufficiently, faulty readings from these sensors can have serious consequences. The goal of this study is to investigate whether machine learning can be used to detect a subtle kind of sensor fault that is common in threaded fastening assembly and has proven difficult to detect with conventional model-based methods. Since these faults are both subtle and infrequent, a system consisting of two components was designed. The first receives sensor data from a tightening and tries to classify each data point as normal or abnormal, using a combination of low-pass filtering to generate residuals and a support vector machine to classify them. The second component uses the output of the first to decide whether the whole tightening should be classified as normal or abnormal. Although the first component performed rather modestly at classifying data points, the system as a whole showed very promising results at classifying whole tightenings. These results indicate that there are patterns in this kind of sensor data that make machine learning a suitable tool for classifying the data and detecting sensor faults.
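The two-stage design described above can be sketched in a few lines. This is an illustrative assumption-laden toy, not the thesis's implementation: a moving-average low-pass filter generates the residuals, and a plain z-score threshold stands in for the thesis's support vector machine so the sketch stays dependency-free; the window size and thresholds are invented for illustration.

```python
import numpy as np

def point_flags(signal, window=5, thresh=3.0):
    """Stage 1: flag individual samples as suspicious.

    A moving-average low-pass filter extracts the smooth tightening
    trend; the residual (signal minus trend) is large wherever a
    subtle disturbance sits on top of the curve.  A z-score threshold
    classifies residual points (a stand-in for the thesis's SVM).
    """
    kernel = np.ones(window) / window
    trend = np.convolve(signal, kernel, mode="same")
    resid = signal - trend
    z = (resid - resid.mean()) / (resid.std() + 1e-12)
    return np.abs(z) > thresh

def tightening_is_faulty(signal, min_flagged=3):
    """Stage 2: declare the whole tightening faulty when enough
    individual points were flagged in stage 1."""
    return int(point_flags(signal).sum()) >= min_flagged
```

For example, a clean linear torque ramp produces almost no flagged interior points, while the same ramp with a handful of injected spikes crosses the stage-2 threshold and is classified as faulty.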
|
59 |
A Deep Learning Approach to Predicting the Length of Stay of Newborns in the Neonatal Intensive Care Unit / En djupinlärningsstrategi för att förutsäga vistelsetiden för nyfödda i neonatala intensivvårdsavdelningen. Straathof, Bas Theodoor. January 2020 (has links)
Recent advancements in machine learning and the widespread adoption of electronic health records have enabled breakthroughs for several predictive modelling tasks in health care. One such task that has seen considerable improvements brought by deep neural networks is length of stay (LOS) prediction, in which research has mainly focused on adult patients in the intensive care unit. This thesis uses multivariate time series extracted from the publicly available Medical Information Mart for Intensive Care III database to explore the potential of deep learning for classifying the remaining LOS of newborns in the neonatal intensive care unit (NICU) at each hour of the stay. To investigate this, the thesis describes experiments conducted with various deep learning models, including long short-term memory cells, gated recurrent units, fully-convolutional networks and several composite networks. This work demonstrates that modelling the remaining LOS of newborns in the NICU as a multivariate time series classification problem naturally facilitates repeated predictions over time as the stay progresses, and enables advanced deep learning models to outperform a multinomial logistic regression baseline trained on hand-crafted features. Moreover, it shows the importance of the newborn's gestational age and of binary masks indicating missing values as variables for predicting the remaining LOS. / Advances in machine learning and the widespread adoption of electronic health records have enabled breakthroughs for several predictive modelling tasks in health care. One such task that has seen significant improvements associated with deep neural networks is the prediction of hospital length of stay, although research has mainly focused on adult patients in intensive care.
This thesis uses multivariate time series extracted from the publicly available Medical Information Mart for Intensive Care III database to investigate the potential of deep learning for classifying the remaining length of stay of newborns in the neonatal intensive care unit (NICU) at each hour of the stay. The thesis describes experiments conducted with various deep learning models, including long short-term memory cells, gated recurrent units, fully-convolutional networks and several composite networks. This work shows that modelling the remaining length of stay of newborns in the NICU as a multivariate time series classification problem naturally facilitates repeated predictions over time, and enables advanced deep learning models to outperform a multinomial logistic regression baseline trained on hand-crafted features. It also shows the importance of the newborn's gestational age and of binary masks indicating missing values as variables for predicting the remaining length of stay.
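The binary masks mentioned above can be illustrated with a small sketch: each channel of the clinical time series is forward-filled (zero-imputed before the first observation) and paired with a mask channel marking which values were actually measured, so the model can tell a real reading from an imputed one. This is a hypothetical helper written for illustration, not code from the thesis.

```python
import numpy as np

def add_missing_masks(series):
    """Turn a (T, d) array with NaNs into model input of shape (T, 2*d):
    the observed channels (forward-filled, then zero-imputed where no
    prior value exists) concatenated with a binary mask per channel
    marking which entries were actually measured."""
    mask = (~np.isnan(series)).astype(float)
    filled = series.copy()
    for j in range(filled.shape[1]):
        last = 0.0  # zero-imputation before the first observation
        for t in range(filled.shape[0]):
            if np.isnan(filled[t, j]):
                filled[t, j] = last  # carry the last observation forward
            else:
                last = filled[t, j]
    return np.concatenate([filled, mask], axis=1)
```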
|
60 |
Neural Ordinary Differential Equations for Anomaly Detection / Neurala Ordinära Differentialekvationer för Anomalidetektion. Hlöðver Friðriksson, Jón; Ågren, Erik. January 2021 (has links)
Today, a large amount of time series data is being produced by a variety of devices such as smart speakers, cell phones and vehicles. This data can be used to make inferences and predictions. Neural network based methods are among the most popular ways to model time series data. The field of neural networks is constantly expanding, and new methods and model variants are frequently introduced. In 2018, a new family of neural networks was introduced: Neural Ordinary Differential Equations (Neural ODEs). Neural ODEs have shown great potential in modelling the dynamics of temporal data. Here we present an investigation into using Neural Ordinary Differential Equations for anomaly detection. We tested two model variants, LSTM-ODE and latent-ODE. The former utilises a Neural ODE to model the continuous-time hidden state in between observations of an LSTM model; the latter is a variational autoencoder that uses the LSTM-ODE as encoder and a Neural ODE as decoder. Both models are suited for modelling sparsely and irregularly sampled time series data. Here, we test their ability to detect anomalies at various levels of sparsity and irregularity of the data. The models are compared to a Gaussian mixture model, a vanilla LSTM model and an LSTM variational autoencoder. Experimental results using the Human Activity Recognition dataset showed that the Neural ODE-based models obtained a better ability to detect anomalies than their LSTM-based counterparts. However, the computational training cost of the Neural ODE models was considerably higher than for the models that only utilise the LSTM architecture, and the Neural ODE-based methods were also more memory consuming than their LSTM counterparts. / Today, a large amount of time series data is produced by a variety of devices such as smart speakers, mobile phones and vehicles. This data can be used to draw inferences and make predictions.
Neural network based methods are among the most popular ways of modelling time series data. Much research within neural networks is ongoing, and new methods and model variants are introduced often. In 2018, a new family of neural networks was introduced: Neural Ordinary Differential Equations (Neural ODEs). Neural ODEs have shown great potential in modelling the dynamics of temporal data. We present here an investigation into using neural ordinary differential equations for anomaly detection. We tested two model variants, one called LSTM-ODE and another called latent-ODE. The former uses a Neural ODE to model the continuous hidden state between observations of an LSTM model; the latter is a variational autoencoder that uses the LSTM-ODE as encoder and a Neural ODE as decoder. Both models are suited for modelling sparsely and irregularly sampled time series, so their ability to detect anomalies was tested at various levels of sparsity and irregularity of the data. The models are compared with a Gaussian mixture model, a vanilla LSTM model and an LSTM variational autoencoder. Experimental results on the Human Activity Recognition (HAR) dataset showed that the Neural ODE-based models were better at detecting anomalies than their LSTM-based counterparts. However, training the Neural ODE-based models was considerably slower than training their LSTM-based counterparts, and the Neural ODE-based methods also required more memory than their LSTM counterparts.
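The core LSTM-ODE idea, letting an ODE govern the hidden state across the gap between two irregularly spaced observations instead of holding it constant, can be sketched with a fixed-step Euler integrator. In the actual models a learned network plays the role of the vector field; here a fixed decay field `f` stands in, and the step size is an arbitrary illustrative choice.

```python
import numpy as np

def ode_evolve(h, dt, f, step=0.05):
    """Evolve hidden state h across a gap of length dt by
    Euler-integrating dh/dt = f(h).  In an LSTM-ODE this replaces the
    constant hidden state an ordinary LSTM keeps between observations."""
    n = max(1, int(dt / step))  # number of Euler steps over the gap
    for _ in range(n):
        h = h + (dt / n) * f(h)
    return h
```

With a simple decay field `f(h) = -h`, evolving the state over a gap of length `dt` approximates `h * exp(-dt)`, i.e. the state decays smoothly toward a rest point during long gaps rather than staying frozen.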
|