Global ETD Search

1	Canonical auto and cross correlations of multivariate time series Woolf Bulach, Marcia January 1997 No description available. 519.5 Multivariate time series
2	Learning to Recognize Agent Activities and Intentions Kerr, Wesley January 2010 Psychological research has demonstrated that subjects shown animations consisting of nothing more than simple geometric shapes perceive the shapes as being alive, having goals and intentions, and even engaging in social activities such as chasing and evading one another. While the subjects could not directly perceive affective state, motor commands, or the beliefs and intentions of the actors in the animations, they still used intentional language to describe the moving shapes. The purpose of this dissertation is to design, develop, and evaluate computational representations and learning algorithms that learn to recognize the behaviors of agents as they perform and execute different activities. These activities take place within simulations, both 2D and 3D. Our goal is to add as little hand-crafted knowledge to the representation as possible and to produce algorithms that perform well over a variety of different activity types. Any patterns found in similar activities should be discovered by the learning algorithm and not by us, the designers. In addition, we demonstrate that if an artificial agent learns about activities through participation, where it has access to its own internal affective state, motor commands, etc., it can then infer the unobservable affective state of other agents. activity recognition artificial intelligence classification data mining multivariate time series
3	Modeling time-series with deep networks Längkvist, Martin January 2014 No description available. multivariate time-series deep learning representation learning unsupervised
4	Forecasting errors, directional accuracy and profitability of currency trading: The case of EUR/USD exchange rate Costantini, Mauro, Crespo Cuaresma, Jesus, Hlouskova, Jaroslava January 2016 We provide a comprehensive study of out-of-sample forecasts for the EUR/USD exchange rate based on multivariate macroeconomic models and forecast combinations. We use profit maximization measures based on directional accuracy and trading strategies in addition to standard loss minimization measures. When comparing predictive accuracy and profit measures, we use tests that are free of data-snooping bias. The results indicate that forecast combinations, in particular those based on principal components of forecasts, help to improve over benchmark trading strategies, although the excess return per unit of deviation is limited.
5	Time series data mining in systems biology Tapinos, Avraam January 2013 Analysis of time series data constitutes an important activity in many scientific disciplines. In recent years there has been an increase in the collection of time series data across scientific fields and disciplines, as well as in industry and engineering. Due to the increasing size of time series datasets, new automated time series data mining techniques have been devised for comparing time series data and presenting information in a logical and easily comprehensible structure. In systems biology in particular, time series are used to study biological systems. The time series representations of a system's dynamic behaviour are multivariate time series. Time series are considered multivariate when they contain observations for more than one variable component. Time series of a biological system's dynamics contain observations for every feature component that is included in the system; they are thus multivariate time series. Recently, there has been an increasing interest in the collection of biological time series. It would therefore be beneficial for systems biologists to be able to compare these multivariate time series. Over the last decade, the field of time series analysis has attracted the attention of people from different scientific disciplines. A number of researchers from the data mining community focus their efforts on providing solutions to numerous problems regarding different time series data mining tasks. Different methods have been proposed, for instance, for comparing, indexing and clustering univariate time series.
Furthermore, different methods have been proposed for creating abstract representations of time series data and investigating the benefits of using these representations for data mining tasks. The introduction of more advanced computing resources facilitated the collection of multivariate time series, which has become common practice in various scientific fields. The increasing volume of multivariate time series data triggered the demand for methods to compare them. A small number of well-suited methods have been proposed for comparing these multivariate time series data. All the currently available methods for multivariate time series comparison are more than adequate for comparing multivariate time series with the same dimensionality. However, they all suffer from the same drawback: current techniques cannot process multivariate time series with different dimensions. A proposed solution for comparing multivariate time series with arbitrary dimensions requires the creation of weighted averages. However, the accumulation of weights data is not always feasible. In this project, a new method is proposed which enables the comparison of multivariate time series with arbitrary dimensions. The method is evaluated on multivariate time series from different disciplines in order to test the method's applicability to data from different fields of science and industry. Lastly, the newly formed method is applied to perform different time series data mining analyses on a set of biological data. 006.3
6	Explainable and Network-based Approaches for Decision-making in Emergency Management Tabassum, Anika 19 October 2021 Critical Infrastructures (CIs), such as power, transportation, healthcare, etc., refer to systems, facilities, technologies, and networks vital to national security, public health, and the socio-economic well-being of people. CIs play a crucial role in emergency management. For example, Hurricane Ida, the Texas winter storm, and the Colonial cyber-attack, all of which occurred in the US during 2021, show that CIs are highly inter-dependent with complex interactions. Hence, power system failures and the shutdown of natural gas pipelines, in turn, led to debilitating impacts on communication, waste systems, public health, etc. Consider power failures during a disaster, such as a hurricane. Subject Matter Experts (SMEs) such as emergency management authorities may be interested in several decision-making tasks. Can we identify disaster phases in terms of the severity of damage by analyzing changes in power failures? Can we tell the SMEs which power grids or regions are the most affected during each disaster phase and need immediate action to recover? Answering these questions can help SMEs to respond quickly and send resources for fast recovery from damage. Can we systematically show how the failure of different power grids may impact CIs as a whole due to inter-dependencies? This can help SMEs to better prepare and mitigate the risks by improving system resiliency. In this thesis, we explore problems in efficiently carrying out decision-making tasks during a disaster for emergency management authorities. Our research has two primary directions: guiding decision-making in resource allocation and planning to improve system resiliency. Our work is done in collaboration with the Oak Ridge National Laboratory to contribute impactful research on real-life CIs and disaster power failure data. 1.
Explainable resource allocation: In contrast to current interpretable or explainable models, which provide answers that help to understand a model's output, we view explanations as answers to guide resource allocation decision-making. In this thesis, we focus on developing a novel model and algorithm to identify disaster phases from changes in power failures. We also pinpoint the regions likely to be most affected at each disaster phase so that the SMEs can send resources for fast recovery. 2. Networks for improving system resiliency: We view CIs as a large heterogeneous network with infrastructure components as nodes and dependencies as edges. Our goal is to construct a visual analytic tool and develop a domain-inspired model to identify the important components and connections on which the SMEs need to focus in order to better prepare for and mitigate the risk of a disaster. / Doctor of Philosophy / Critical Infrastructure Systems (CIs) comprise multiple infrastructures valuable for maintaining public life and national security, e.g., power, water, transportation. The US Federal Emergency Management Agency (FEMA) aims to protect the nation and its citizens by mitigating all hazards during natural or man-made disasters. To this end, it aims to adopt different decision-making strategies efficiently: for example, during an ongoing disaster, when to send resources and which regions to send them to first. It also needs to plan how to prepare for a future disaster and which CIs need maintenance to improve system resiliency. We explore several data-mining problems which can guide FEMA towards developing efficient decision-making strategies. Our thesis emphasizes explainable and network-based models and algorithms that support decision-making operations for emergency management experts by leveraging critical infrastructure data. Critical Infrastructure Urban analytics Multivariate Time-series Urban-Net Explanations
7	Analysis of construction cost variations using macroeconomic, energy and construction market variables Shahandashti, Seyed Mohsen 27 August 2014 Recently, construction cost variations have been larger and less predictable. These variations are apparent in trends of indices such as Engineering News Record (ENR) Construction Cost Index (CCI) and National Highway Construction Cost Index (NHCCI). These variations are problematic for cost estimation, bid preparation and investment planning. Inaccurate cost estimation can result in bid loss or profit loss for contractors and hidden price contingencies, delayed or cancelled projects, inconsistency in budgets and unsteady flow of projects for owner organizations. Cost variation has become a major concern in all industry sectors, such as infrastructure, heavy industrial, light industrial, and building. The major problem is that construction cost is subject to significant variations that are difficult to forecast. The objectives of this dissertation are to identify the leading indicators of CCI and NHCCI from existing macroeconomic, energy and construction market variables and create appropriate models to use the information in past values of CCI and NHCCI and their leading indicators in order to forecast CCI and NHCCI more accurately than existing CCI and NHCCI forecasting models. A statistical approach based on multivariate time series analysis is used as the main research approach. The first step is to identify leading indicators of construction cost variations. A pool of 16 candidate (potential) leading indicators is initially selected based on a comprehensive literature review about construction cost variations. Then, the leading indicators of CCI are identified from the pool of candidate leading indicators using empirical tests including correlation tests, unit root tests, and Granger causality tests.
The identified leading indicators represent the macroeconomic and construction market context in which the construction cost is changing. Based on the results of statistical tests, several multivariate time series models are created and compared with existing models for forecasting CCI. These models take advantage of contextual information about macroeconomic conditions, energy prices and the construction market for forecasting CCI accurately. These multivariate time series models are rigorously diagnosed using statistical tests including Breusch-Godfrey serial correlation Lagrange multiplier tests and Autoregressive conditional heteroskedasticity (ARCH) tests. They are also compared with each other and other existing models. Comparison is based on two typical error measures: out-of-sample mean absolute prediction error and out-of-sample mean squared error. Based on the unit root tests and Granger causality tests, consumer price index, crude oil price, producer price index, housing starts and building permits are selected as leading indicators of CCI. In other words, past values of these variables contain information that is useful for forecasting CCI. Based on the results of cointegration tests, Vector Error Correction (VEC) models are created as proper multivariate time series models to forecast CCI. Our results show that the multivariate time series model including CCI and crude oil price passes diagnostic tests successfully. It is also more accurate than existing models for forecasting CCI in terms of out-of-sample mean absolute prediction error and out-of-sample mean squared error. The predictability of the multivariate time series modeling for forecasting CCI is also evaluated using stochastically simulated data (simulated CCI and crude oil price). First, 50 paths of crude oil price are created using Geometric Brownian Motion (GBM). Then, 50 paths of CCI are created using a Gaussian process that accounts for the relationship between CCI and crude oil price over time.
Finally, 50 multivariate and univariate time series models are created using the simulated data and the predictability of univariate and multivariate time series models is compared. The results show that the multivariate modeling is more accurate than univariate modeling for forecasting simulated CCI. The sensitivity of the models to inputs is also examined by adding errors to the simulated data and conducting sensitivity analysis. The proposed approach is also implemented for identifying the leading indicators of NHCCI from the pool of candidate leading indicators and creating appropriate multivariate forecasting models that use the information in past values of NHCCI and its leading indicators. Based on the unit root tests and Granger causality tests, crude oil price and average hourly earnings in the construction industry are selected as leading indicators of NHCCI. In other words, past values of these variables contain information that is useful for forecasting NHCCI. Based on the results of cointegration tests, Vector Error Correction (VEC) models are created as the proper multivariate time series models to forecast NHCCI. The results show that the VEC model including NHCCI and crude oil price, and the VEC model including NHCCI, crude oil price, and average hourly earnings pass diagnostic tests. These VEC models are also more accurate than the univariate models for forecasting NHCCI in terms of out-of-sample prediction error and out-of-sample mean squared error. The findings of this dissertation contribute to the body of knowledge in construction cost forecasting by rigorous identification of the leading indicators of construction cost variations and creation of multivariate time series models that are more accurate than the existing models for forecasting construction cost variations.
It is expected that the proposed forecasting models will enhance the theory and practice of construction cost forecasting and help cost engineers and capital planners prepare more accurate bids, cost estimates and budgets for capital projects. Construction cost variations Multivariate time series models Macroeconomic
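The simulation step mentioned in the abstract above (50 paths of crude oil price generated by Geometric Brownian Motion) can be sketched as follows. This is only an illustrative sketch: the function name `simulate_gbm_paths` and the starting price, drift, volatility and horizon are hypothetical values chosen for the example, not figures from the dissertation, and the dissertation's own implementation is not shown in the abstract.

```python
import math
import random


def simulate_gbm_paths(s0, mu, sigma, n_steps, n_paths, dt=1.0, seed=0):
    """Simulate Geometric Brownian Motion paths with the exact log-normal step.

    Each step multiplies the previous value by
    exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z), with Z ~ N(0, 1).
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    paths = []
    for _ in range(n_paths):
        path = [s0]
        for _ in range(n_steps):
            shock = rng.gauss(0.0, 1.0)
            path.append(path[-1] * math.exp(drift + vol * shock))
        paths.append(path)
    return paths


# Hypothetical monthly parameters (assumptions, not taken from the dissertation).
oil_paths = simulate_gbm_paths(s0=100.0, mu=0.004, sigma=0.08,
                               n_steps=120, n_paths=50)
```

Each simulated path stays strictly positive because every step multiplies the previous value by an exponential, which is one common reason GBM is used for price series.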
8	Modely časových řad s exogenními proměnnými a jejich aplikace na ekonomická data / Time series models with exogenous variables and their application to economical data Vaverová, Jana January 2015 This thesis deals with analyzing multivariate financial and economic data. The first section describes the theory of multivariate time series and multivariate ARMA models. The second part deals with some models with exogenous variables, such as simultaneous equations models and the ARMAX model. In the final chapter, the described theory is applied to analyze the reciprocal dependence of time series of inflation rates and the dependence of inflation rates on various macroeconomic indicators. The results were obtained using the software Mathematica 8, Mathematica 10, EViews and R.
9	Statistical methods for certain large, complex data challenges Li, Jun 15 November 2018 Big data concerns large-volume, complex, growing data sets, and it provides us opportunities as well as challenges. This thesis focuses on statistical methods for several specific large, complex data challenges, each involving representation of data with complex format, utilization of complicated information, and/or intensive computational cost. The first problem we work on is hypothesis testing for multilayer network data, motivated by an example in computational biology. We show how to represent the complex structure of a multilayer network as a single data point within the space of supra-Laplacians and then develop a central limit theorem and hypothesis testing theories for multilayer networks in that space. We develop both global and local testing strategies for mean comparison and investigate sample size requirements. The methods were applied to the motivating computational biology example and compared with the classic Gene Set Enrichment Analysis (GSEA). More biological insights are found in this comparison. The second problem is the source detection problem in epidemiology, which is one of the most important issues for control of epidemics. Ideally, we want to locate the sources based on all history data. However, this is often infeasible, because the history data is complex, high-dimensional and cannot be fully observed. Epidemiologists have recognized the crucial role of human mobility as an important proxy to a complete history, but little in the literature to date uses this information for source detection. We recast the source detection problem as identifying a relevant mixture component in a multivariate Gaussian mixture model. Human mobility within a stochastic PDE model is used to calibrate the parameters. The capability of our method is demonstrated in the context of the 2000-2002 cholera outbreak in the KwaZulu-Natal province.
The third problem is about multivariate time series imputation, which is a classic problem in statistics. To address the common problem of low signal-to-noise ratio in high-dimensional multivariate time series, we propose models based on state-space models which provide more precise inference of missing values by clustering multivariate time series components in a nonparametric way. The models are suitable for large-scale time series due to their efficient parameter estimation. / 2019-05-15T00:00:00Z Statistics Big data Hypothesis testing Multivariate time series imputation Source detection Statistical network analysis
10	Shrinkage methods for multivariate spectral analysis Böhm, Hilmar 29 January 2008 In spectral analysis of high dimensional multivariate time series, it is crucial to obtain an estimate of the spectrum that is both numerically well conditioned and precise. The conventional approach is to construct a nonparametric estimator by smoothing locally over the periodogram matrices at neighboring Fourier frequencies. Despite being consistent and asymptotically unbiased, these estimators are often ill-conditioned. This is because a kernel smoothed periodogram is a weighted sum over the local neighborhood of periodogram matrices, which are each of rank one. When treating high dimensional time series, the result is a bad ratio between the smoothing span, which is the effective local sample size of the estimator, and dimension. In classification, clustering and discrimination, and in the analysis of non-stationary time series, this is a severe problem, because inverting an estimate of the spectrum is unavoidable in these contexts. Areas of application like neuropsychology, seismology and econometrics are affected by this theoretical problem. We propose a new class of nonparametric estimators that have the appealing properties of simultaneously having smaller L2-risk than the smoothed periodogram and being numerically more stable due to a smaller condition number. These estimators are obtained as convex combinations of the averaged periodogram and a shrinkage target. The choice of shrinkage target depends on the availability of prior knowledge on the cross dimensional structure of the data. In the absence of any information, we show that a multiple of the identity matrix is the best choice. By shrinking towards identity, we trade the asymptotic unbiasedness of the averaged periodogram for a smaller mean-squared error.
Moreover, the eigenvalues of this shrinkage estimator are closer to the eigenvalues of the real spectrum, rendering it numerically more stable and thus more appropriate for use in classification. These results are derived under a rigorous general asymptotic framework that allows for the dimension p to grow with the length of the time series T. Under this framework, the averaged periodogram even ceases to be consistent and has asymptotically almost surely higher L2-risk than our shrinkage estimator. Moreover, we show that it is possible to incorporate background knowledge on the cross dimensional structure of the data in the shrinkage targets. We derive an exemplary instance of a custom-tailored shrinkage target in the form of a one factor model. This offers a new answer to problems of model choice: instead of relying on information criteria such as AIC or BIC for choosing the order of a model, the minimum order model can be used as a shrinkage target and combined with a non-parametric estimator of the spectrum, in our case the averaged periodogram. The comprehensive Monte Carlo studies we perform show the overwhelming gain in terms of L2-risk of our shrinkage estimators, even for very small sample sizes. We also give an overview of regularization techniques that have been designed for iid data, such as ridge regression or sparse PCA, and show the interconnections between them. Factor model Spectral analysis Regularization Condition number Shrinkage Multivariate time series
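The convex-combination estimator described in the abstract above can be written out as a short worked equation. The notation is assumed here for illustration only and is not taken from the thesis: $\bar{I}(\omega)$ stands for the averaged periodogram at frequency $\omega$, $\lambda$ for the shrinkage weight, $\mu$ for the scale of the identity target, and $e_1 \ge \dots \ge e_p \ge 0$ for the eigenvalues of $\bar{I}(\omega)$. The abstract does not say how $\lambda$ and $\mu$ are chosen.

```latex
% Shrinkage toward a multiple of the identity (notation assumed for illustration)
\hat{f}_{\lambda}(\omega) = (1-\lambda)\,\bar{I}(\omega) + \lambda\,\mu\,\mathrm{Id}_p,
\qquad 0 \le \lambda \le 1,\quad \mu > 0 .

% The eigenvalues of \hat{f}_{\lambda}(\omega) are (1-\lambda)\,e_i + \lambda\mu, so
\kappa\bigl(\hat{f}_{\lambda}(\omega)\bigr)
  = \frac{(1-\lambda)\,e_1 + \lambda\mu}{(1-\lambda)\,e_p + \lambda\mu}
  \;\le\; \frac{e_1}{e_p}
  = \kappa\bigl(\bar{I}(\omega)\bigr).
```

The inequality follows because adding the same non-negative constant to the numerator and the denominator of a ratio that is at least one cannot increase it. When the averaged periodogram is rank deficient ($e_p = 0$), the right-hand side is read as infinite, while the shrunken estimator has a finite condition number for any $\lambda > 0$. This is consistent with the abstract's point that shrinkage yields a smaller condition number.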
