Global ETD Search

361	Adaptively-Halting RNN for Tunable Early Classification of Time Series
Hartvigsen, Thomas (11 November 2018)
Early time series classification is the task of predicting the class label of a time series before it has been observed in its entirety. In time-sensitive domains where information is collected over time, it is worth sacrificing some classification accuracy in favor of earlier predictions, ideally early enough for actions to be taken. However, since accuracy and earliness are contradictory objectives, a solution to this problem must find a task-dependent trade-off. There are two common state-of-the-art approaches. In the first, an analyst selects a single timestep at which all predictions must be made. This does not capture earliness on a case-by-case basis: if the selected timestep is too early, all later signals are missed, and if a signal occurs early, the classifier still waits before generating a prediction. The second is an exhaustive search for signals, which encodes no timing information and does not scale to high dimensions or long time series. We design the first early classification model, called EARLIEST, to tackle this multi-objective optimization problem by jointly learning (1) at which timestep to halt and generate a prediction and (2) how to classify the time series. Both are learned from the task and the data features. We achieve an analyst-controlled balance between earliness and accuracy by pairing a recurrent neural network, which learns to classify time series as a supervised learning task, with a stochastic controller network, which learns a halting policy as a reinforcement learning task. The halting policy makes sequential decisions, one per timestep, about whether to halt the recurrent neural network and classify the time series early. This pairing of networks optimizes a global objective function that incorporates both earliness and accuracy.
We validate our method on critical clinical prediction tasks from the MIMIC-III database (Beth Israel Deaconess Medical Center) and on another publicly available time series classification dataset. We show that EARLIEST outperforms two state-of-the-art LSTM-based early classification methods. We also examine the model's behavior on a synthetic dataset. This experiment shows that EARLIEST learns to halt when it observes signals, even without explicit access to signal locations. The contributions of this work are threefold. First, our method is the first neural-network-based solution to early classification of time series, bringing the recent successes of deep learning to this problem. Second, we present the first reinforcement-learning-based solution to the unsupervised nature of early classification. It learns the underlying distribution of signals through trial and error, without access to this information. Third, we propose the first joint optimization of earliness and accuracy, which allows the model to learn complex relationships between these contradictory goals.
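The trade-off that EARLIEST's global objective encodes can be made concrete with a toy calculation. This is not the thesis's training procedure, which learns the halting policy with reinforcement learning. Instead it assumes a hypothetical fixed sequence of per-timestep halting probabilities and a hypothetical penalty weight `lam`. From these it computes three things: the resulting distribution over halting steps, the expected fraction of the series observed, and an illustrative objective of the assumed form "accuracy minus `lam` times earliness cost".

```python
from typing import List


def halting_distribution(halt_probs: List[float]) -> List[float]:
    """P(halt at step t) when the controller halts at step t with probability
    halt_probs[t], given it has not halted yet; the final step forces a halt."""
    dist = []
    survive = 1.0  # probability that no halt has occurred before the current step
    last = len(halt_probs) - 1
    for t, p in enumerate(halt_probs):
        p_halt = 1.0 if t == last else p
        dist.append(survive * p_halt)
        survive *= 1.0 - p_halt
    return dist


def expected_fraction_observed(halt_probs: List[float]) -> float:
    """Expected fraction of the T timesteps seen before halting (steps counted from 1)."""
    T = len(halt_probs)
    dist = halting_distribution(halt_probs)
    return sum((t + 1) * q for t, q in enumerate(dist)) / T


def toy_objective(accuracy: float, halt_probs: List[float], lam: float) -> float:
    """Hypothetical objective: reward accuracy, penalize late halting with weight lam."""
    return accuracy - lam * expected_fraction_observed(halt_probs)
```

Increasing `lam` favors policies that halt early over policies that wait. This loosely mirrors the analyst-controlled balance between earliness and accuracy described above.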
362	Fast, Scalable, and Accurate Algorithms for Time-Series Analysis
Paparrizos, Ioannis (January 2018)
Time is a critical element in understanding natural processes (e.g., earthquakes and weather) and human-made artifacts (e.g., the stock market and speech signals). Time series result from sequentially collecting observations of such processes and artifacts, and their analysis is becoming increasingly prevalent across scientific and industrial applications. Extracting non-trivial features (e.g., patterns, correlations, and trends) from time series is a critical step in devising effective time-series mining methods for real-world problems, and it has been the subject of active research for decades. In this dissertation, we address this fundamental problem by studying and presenting computational methods for efficient unsupervised learning of robust feature representations from time series. Our objective is to (i) simplify and unify the design of scalable and accurate time-series mining algorithms, and (ii) provide a set of readily available tools for effective time-series analysis. We focus on two kinds of application: those operating solely over time-series collections, and those where the analysis of time series complements the analysis of other types of data, such as text and graphs. For applications operating solely over time-series collections, we propose a generic computational framework, GRAIL, to learn low-dimensional representations that natively preserve the invariances offered by a given time-series comparison method. GRAIL departs from classic approaches in the time-series literature, where representation methods are agnostic to the similarity function used in subsequent learning processes. GRAIL relies on the attractive idea that, once the data-to-data similarity matrix has been constructed, most time-series mining tasks can be solved trivially.
To overcome the scalability issues of approaches that rely on such matrices, GRAIL uses time-series clustering to construct a small set of landmark time series. It then learns representations that reduce the data-to-data matrix to a data-to-landmark matrix. To demonstrate the effectiveness of GRAIL, we first present domain-independent, highly accurate, and scalable time-series clustering methods that facilitate exploration and summarization of time-series collections. We then show that GRAIL representations, combined with suitable methods, significantly outperform state-of-the-art methods in both efficiency and accuracy on major time-series mining tasks, such as querying, clustering, classification, sampling, and visualization. Overall, GRAIL emerges as a new primitive for highly accurate, yet scalable, time-series analysis. For applications where the analysis of time series complements the analysis of other types of data, such as text and graphs, we propose generic, simple, and lightweight methodologies to learn features from time-varying measurements. Such applications often organize operations over different types of data in a pipeline, in which one operation provides input (in the form of feature vectors) to subsequent operations. To reason about the temporal patterns and trends in the underlying features, we need to (i) track the evolution of features over different time periods, and (ii) transform these time-varying features into actionable knowledge (e.g., forecasting an outcome). To address this challenging problem, we propose principled approaches to model time-varying features and study two large-scale, real-world applications. Specifically, we first study the problem of predicting the impact of scientific concepts through temporal analysis of characteristics extracted from the metadata and full text of scientific articles.
Then, we explore the promise of harnessing temporal patterns in behavioral signals extracted from web search engine logs for the early detection of devastating diseases. In both applications, combinations of features with time-series-relevant features had a greater impact than any other indicator considered in our analysis. We believe that our simple methodology, along with the interesting domain-specific findings that our work revealed, will motivate new studies across different scientific and industrial settings.
Subjects: Computer science; Algorithms; Time-series analysis
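The idea of reducing a data-to-data similarity matrix to a data-to-landmark matrix can be illustrated with a Nyström-style approximation. The sketch below is an assumption-laden stand-in, not GRAIL itself. It uses a Gaussian (RBF) kernel on raw vectors rather than a time-series comparison method. It also takes the landmarks as given, whereas GRAIL selects them via time-series clustering. The rows of the returned matrix `Z` are low-dimensional representations whose inner products approximate the full kernel matrix.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Assumed similarity: exp(-gamma * ||a - b||^2) between rows of A and rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def landmark_representation(X: np.ndarray, L: np.ndarray,
                            gamma: float = 0.5, eps: float = 1e-10) -> np.ndarray:
    """Nystrom-style embedding Z = K(X, L) @ W^{-1/2}, where W = K(L, L)."""
    W = rbf_kernel(L, L, gamma)   # landmark-to-landmark similarities (k x k)
    C = rbf_kernel(X, L, gamma)   # data-to-landmark similarities (n x k)
    vals, vecs = np.linalg.eigh(W)
    keep = vals > eps             # drop numerically null directions
    W_inv_sqrt = vecs[:, keep] @ np.diag(vals[keep] ** -0.5) @ vecs[:, keep].T
    return C @ W_inv_sqrt         # n x k; Z @ Z.T approximates K(X, X)
```

When every data point is used as a landmark, `Z @ Z.T` reproduces the full kernel matrix exactly. With only a few landmarks, storage and computation scale with the number of landmarks rather than with the square of the collection size.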
363	Fourier expansions for Eisenstein series twisted by modular symbols and the distribution of multiples of real points on an elliptic curve
Cowan, Alexander (January 2019)
This thesis consists of two unrelated parts. In the first part, we give explicit expressions for the Fourier coefficients of Eisenstein series E∗(z, s, χ) twisted by modular symbols ⟨γ, f⟩, in the case where the level of f is prime and equal to the conductor of the Dirichlet character χ. We obtain these expressions by computing the spectral decomposition of an automorphic function closely related to E∗(z, s, χ), and we then give applications of them. In particular, we evaluate sums such as Σ χ(γ)⟨γ, f⟩, where the sum is over γ ∈ Γ∞\Γ₀(N) with c² + d² < X, and c and d are the lower-left and lower-right entries of γ, respectively. This parallels past work of Goldfeld, Petridis, and Risager, and we observe that these sums exhibit a different amount of cancellation than one might expect. In the second part, given an elliptic curve E and a point P in E(ℝ), we investigate the distribution of the points nP as n varies over the integers. We give bounds on the x- and y-coordinates of nP and determine the natural density of the integers n for which nP lies in an arbitrary open subset of ℝ². Our proofs rely on a connection to classical topics in the theory of Diophantine approximation.
Subjects: Mathematics; Diophantine approximation; Fourier series; Curves, Elliptic
364	Similarity searching in sequence databases under time warping
Wong, Siu Fung (January 2004)
Thesis (M.Phil.), Chinese University of Hong Kong, 2004; submitted December 2003. Includes bibliographical references (leaves 77-84). Abstracts in English and Chinese.
Contents: Abstract; Acknowledgement
Chapter 1. Introduction
Chapter 2. Preliminary
  2.1 Dynamic Time Warping (DTW)
  2.2 Spatial Indexing
  2.3 Relevance Feedback
Chapter 3. Literature Review
  3.1 Searching Sequences under Euclidean Metric
  3.2 Searching Sequences under Dynamic Time Warping Metric
Chapter 4. Subsequence Matching under Time Warping
  4.1 Subsequence Matching
    4.1.1 Sequential Search
    4.1.2 Indexing Scheme
  4.2 Lower Bound Technique
    4.2.1 Properties of Lower Bound Technique
    4.2.2 Existing Lower Bound Functions
  4.3 Point-Based Indexing
    4.3.1 Lower Bound for Subsequence Matching
    4.3.2 Algorithm
  4.4 Rectangle-Based Indexing
    4.4.1 Lower Bound for Subsequence Matching
    4.4.2 Algorithm
  4.5 Experimental Results
    4.5.1 Candidate Ratio vs. Width of Warping Window
    4.5.2 CPU Time vs. Number of Subsequences
    4.5.3 CPU Time vs. Width of Warping Window
    4.5.4 CPU Time vs. Threshold
  4.6 Summary
Chapter 5. Relevance Feedback under Time Warping
  5.1 Integrating Relevance Feedback with DTW
  5.2 Query Reformulation
    5.2.1 Constraint Updating
    5.2.2 Weight Updating
    5.2.3 Overall Strategy
  5.3 Experiments and Evaluation
    5.3.1 Effectiveness of the Strategy
    5.3.2 Efficiency of the Strategy
    5.3.3 Usability
  5.4 Summary
Chapter 6. Conclusion
Appendix A. Deduction of Data Bounding Hyper-rectangle
Appendix B. Proof of Theorem 2
Bibliography; Publications
Subjects: Time-series analysis; Database searching; Information retrieval
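The lower-bound technique outlined in Chapter 4 can be sketched with two standard components: the DTW distance under a Sakoe-Chiba warping window, and the well-known LB_Keogh envelope bound. This textbook pairing is chosen for illustration and assumes equal-length sequences. It is not the thesis's own subsequence-matching bounds. Because LB_Keogh never exceeds the windowed DTW distance, a candidate whose bound already exceeds the search threshold can be discarded without computing DTW.

```python
import math
from typing import List


def dtw(a: List[float], b: List[float], w: int) -> float:
    """DTW distance (sqrt of summed squared differences) within a Sakoe-Chiba window w."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])


def lb_keogh(query: List[float], cand: List[float], w: int) -> float:
    """LB_Keogh: distance from cand to the upper/lower envelope of query (equal lengths)."""
    total = 0.0
    n = len(query)
    for i, c in enumerate(cand):
        window = query[max(0, i - w): min(n, i + w + 1)]
        upper, lower = max(window), min(window)
        if c > upper:
            total += (c - upper) ** 2
        elif c < lower:
            total += (c - lower) ** 2
    return math.sqrt(total)
```

In a filter-and-refine search, `lb_keogh` runs first on every candidate, and `dtw` is computed only for candidates whose bound falls below the threshold.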
365	Structural breaks estimation methods for time series data
Kong, Cheuk Kwan (January 2007)
Thesis (M.Phil.), Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 42-44). Abstracts in English and Chinese.
Contents:
Chapter 1. Introduction
Chapter 2. Modelling Piecewise AR Models
  2.1 Background
  2.2 Introduction to Auto-PARM
  2.3 Minimum Description Length
  2.4 Genetic Algorithm
  2.5 Reproduction Rules
Chapter 3. Bayesian-SCAD Approach
  3.1 Estimation via Penalty Function
  3.2 Introduction to SCAD
  3.3 Local Quadratic Approximation of SCAD
  3.4 Bayesian Formulation and GA Implementation
Chapter 4. Simulation Study
  4.1 Piecewise AR Process from Davis et al. (2006)
  4.2 Piecewise Seasonal AR Process
Chapter 5. Real Data Analysis
  5.1 Description and Source of Data
  5.2 Model Fitting
  5.3 Prediction Results
Chapter 6. Conclusion
Bibliography
Subjects: Time-series analysis; Estimation theory; Autoregression (Statistics)
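A piecewise AR model lets the autoregressive coefficients change at unknown break points. The sketch below illustrates the estimation problem in its simplest form. It assumes a single break in an AR(1) process, fits each segment by least squares, and locates the break by exhaustive search over candidate positions. The exhaustive search and the BIC-style penalty are simplifying assumptions. The thesis instead uses a minimum description length criterion with a genetic algorithm and a Bayesian-SCAD approach, which handle multiple breaks and higher AR orders.

```python
from typing import Optional

import numpy as np


def ar1_rss(x: np.ndarray) -> float:
    """Residual sum of squares of a least-squares AR(1) fit x[t] = c + phi * x[t-1] + e[t]."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    y = x[1:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def segment_cost(x: np.ndarray) -> float:
    """-2 x profile Gaussian log-likelihood of an AR(1) segment, up to constants."""
    m = len(x) - 1  # number of regression equations in the segment
    return m * np.log(ar1_rss(x) / m)


def best_single_break(x: np.ndarray, min_seg: int = 20) -> Optional[int]:
    """Break index minimizing an assumed BIC-style criterion, or None if no break wins."""
    n = len(x)
    best_tau, best = None, segment_cost(x) + 2 * np.log(n)  # 2 params: intercept, phi
    for tau in range(min_seg, n - min_seg + 1):
        # 4 AR params (two segments) plus the break location itself
        cost = segment_cost(x[:tau]) + segment_cost(x[tau:]) + 5 * np.log(n)
        if cost < best:
            best_tau, best = tau, cost
    return best_tau
```

The exhaustive search costs one pair of least-squares fits per candidate break. This becomes infeasible for multiple breaks, which is why search heuristics such as genetic algorithms are used in practice.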
366	Efficient similarity search in time series data
Zhou, Mi (January 2007). CUHK electronic theses & dissertations collection.
Time series data are ubiquitous in the real world, and similarity search in time series data is of great importance to many applications. The problem has two major parts: how to define the similarity between time series, and how to search for similar time series efficiently. For the similarity measure, the Euclidean distance is a good starting point, but it has several limitations. First, it is sensitive to shifting and scaling transformations. We analyze this problem extensively under a geometric model and propose an angle-based similarity measure that is invariant to shifting and scaling transformations. We then extend the conical index to support the proposed angle-based similarity measure efficiently. Besides distortions along the amplitude axis, the Euclidean distance is also sensitive to distortions along the time axis. The Dynamic Time Warping (DTW) distance is a very good similarity measure that is invariant to such time distortion. However, the high time complexity of DTW inhibits its application to large datasets. Indexing under the DTW distance is a common solution to this problem, and the lower-bound technique plays an important role in indexing for DTW. We explain the existing lower-bound functions within a unified framework and propose a group of new lower-bound functions that perform much better. Based on the proposed lower-bound functions, we implement an efficient index structure under the DTW distance. Despite the great success of DTW, it is not well suited to the time-scaling search problem, where the time distortion is too large. We therefore modify the traditional DTW distance and propose the Segment-wise Time Warping (STW) distance for the time-scaling search problem.
Finally, we devise an efficient search algorithm for the problem of online pattern detection in data streams under the DTW distance.
Adviser: Man Hon Wong. Source: Dissertation Abstracts International, Volume 68-09, Section B, page 6100. Thesis (Ph.D.), Chinese University of Hong Kong, 2007. Includes bibliographical references (p. 167-180). Electronic reproduction: Hong Kong: Chinese University of Hong Kong, [2012]. Electronic reproduction: [Ann Arbor, MI]: ProQuest Information and Learning, [200-]. System requirements: Adobe Acrobat Reader; available via World Wide Web. Abstract in English and Chinese. School code: 1307.
Subjects: Data mining; Database searching; Time-series analysis
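One way to read the geometric argument for shift and scale invariance starts by mean-centering both sequences. Centering removes any added constant, and multiplying by a positive factor does not change a vector's direction. The angle between the centered vectors is therefore unaffected by either transformation. The sketch below is an assumed illustration of that idea (it equals the arccosine of the Pearson correlation). It is not necessarily the exact measure or the conical index proposed in the thesis.

```python
import math
from typing import Sequence


def centered_angle(x: Sequence[float], y: Sequence[float]) -> float:
    """Angle (radians) between mean-centered x and y; unchanged by y -> a*y + b, a > 0."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    dot = sum(a * b for a, b in zip(xc, yc))
    nx = math.sqrt(sum(a * a for a in xc))
    ny = math.sqrt(sum(b * b for b in yc))
    if nx == 0 or ny == 0:
        raise ValueError("a constant sequence has no direction after centering")
    cos = max(-1.0, min(1.0, dot / (nx * ny)))  # clamp floating-point rounding
    return math.acos(cos)
```

For example, a series and its copy scaled by 3 and shifted by 10 have an angle of essentially zero, even though their Euclidean distance is large.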
367	A robust test for threshold-type non-linearity in bivariate time series analysis
Chow, Wai Kit (January 2012)
Bivariate time series data are frequently encountered in practice. Ordinary linear bivariate time series models are often not sufficient to describe complex social and natural phenomena, and many analysts believe that vector non-linear time series models could provide a viable solution. Among the many types of vector non-linear time series processes, an important class is the bivariate threshold autoregressive (BTAR) model. This model can capture limit cycles, jump phenomena, and amplitude-frequency dependency in time series data. Tsay (1998) [38] proposed a test for threshold-type non-linearity in a vector time series. However, this test does not perform satisfactorily when the data are contaminated by outliers. To remedy this drawback, we propose a robust testing procedure. The focus of this thesis is on bivariate time series data. The reweighted bivariate least trimmed squares method is adopted to derive a robust test for threshold-type non-linearity, and the asymptotic null distribution of the proposed test statistic is derived. Simulation studies are conducted to investigate the performance of the proposed test and to compare it with Tsay's (1998) [38] least-squares-based test. Numerical examples are provided for illustration.
Thesis (M.Phil.), Chinese University of Hong Kong, 2012. Includes bibliographical references (leaves 40-44). Abstracts also in Chinese.
Contents: Abstract; Acknowledgement
Chapter 1. Introduction
  1.1 Linear Time Series Models and Their Applications
  1.2 Non-linear Time Series Models and Their Applications
  1.3 Threshold Autoregressive Model (TAR) and the Self-exciting Threshold Autoregressive Model (SETAR)
  1.4 Outliers in Univariate Time Series
  1.5 Bivariate Autoregressive Model (BAR)
  1.6 Bivariate Threshold Autoregressive Model (BTAR)
  1.7 Outliers in Bivariate Time Series
  1.8 Objectives of the Thesis
  1.9 Organisation of the Thesis
Chapter 2. The Proposed Test
  2.1 Tsay's Test
  2.2 Reweighted Multivariate Least Trimmed Squares Method
  2.3 The Proposed Test
Chapter 3. Simulation Study
  3.1 Under the Null Hypothesis
  3.2 Under the Alternative Hypothesis
  3.3 The Choice of γ and δ
Chapter 4. Examples
  4.1 Simulated Data
  4.2 Gas-Furnace Data
  4.3 Blowfly Data
Chapter 5. Conclusions and Further Research
Bibliography
Subjects: Nonlinear theories
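The sketch below makes the BTAR model concrete by simulating a two-regime bivariate threshold AR(1) process. The coefficient matrix switches according to whether the lagged first component (the threshold variable) is at or below a threshold `r`. All of the following are arbitrary illustrative assumptions: the coefficient matrices, the choice of threshold variable, the delay of one step, and `r = 0`. The sketch implements neither Tsay's test nor the proposed robust test. An optional additive outlier shows the kind of contamination that the robust test is designed to tolerate.

```python
import numpy as np


def simulate_btar(n: int, A1, A2, r: float = 0.0, seed: int = 0,
                  outlier_at=None, outlier_size: float = 10.0) -> np.ndarray:
    """Simulate y_t = A1 @ y_{t-1} + e_t if y_{t-1}[0] <= r, else A2 @ y_{t-1} + e_t."""
    rng = np.random.default_rng(seed)
    A1, A2 = np.asarray(A1, float), np.asarray(A2, float)
    y = np.zeros((n, 2))
    for t in range(1, n):
        A = A1 if y[t - 1, 0] <= r else A2  # regime chosen by the lagged first component
        y[t] = A @ y[t - 1] + rng.normal(size=2)
    obs = y.copy()
    if outlier_at is not None:
        obs[outlier_at] += outlier_size      # additive outlier: corrupts the observation only
    return obs
```

Because the outlier is added after simulation, the underlying process is unchanged and only one observed vector is corrupted. A least-squares-based test can nonetheless be sensitive to such a point.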
368	Simultaneous prediction intervals for autoregressive integrated moving average models in the presence of outliers
Cheung, Tsai-Yee Crystal (January 2001)
Thesis (M.Phil.), Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 83-85). Abstracts in English and Chinese.
Contents:
Chapter 1. Introduction
  1.1 The Importance of Forecasting
Chapter 2. Methodology
  2.1 Basic Idea
  2.2 Outliers in Time Series
    2.2.1 One Outlier Case
    2.2.2 Two Outliers Case
    2.2.3 General Case
    2.2.4 Time Series Parameters Are Unknown
  2.3 Iterative Procedure for Detecting Outliers
    2.3.1 General Procedure for Detecting Outliers
  2.4 Methods of Constructing Simultaneous Prediction Intervals
    2.4.1 The Bonferroni Method
    2.4.2 The Exact Method
Chapter 3. An Illustrative Example
  3.1 Case A
  3.2 Case B
  3.3 Comparison
Chapter 4. Simulation Study
  4.1 Generate AR(1) with an Outlier
    4.1.1 Case A
    4.1.2 Case B
  4.2 Simulation Results I
  4.3 Generate AR(1) with Two Outliers
  4.4 Simulation Results II
  4.5 Concluding Remarks
Bibliography
Subjects: Outliers (Statistics); Autoregression (Statistics); Time-series analysis
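The Bonferroni method listed in the outline can be illustrated for a zero-mean AR(1) model with known parameters. The h-step forecast is φ^h x_n, with forecast error variance σ²(1 + φ² + ... + φ^(2(h-1))). To obtain joint coverage of at least 1 - α over H horizons, each two-sided interval is built at level 1 - α/H. This sketch assumes Gaussian errors, known φ and σ, and no outliers. It does not implement the thesis's outlier adjustments or its exact method.

```python
from statistics import NormalDist
from typing import List, Tuple


def bonferroni_ar1_intervals(x_n: float, phi: float, sigma: float, H: int,
                             alpha: float = 0.05) -> List[Tuple[float, float]]:
    """Simultaneous (1 - alpha) Bonferroni prediction intervals, horizons 1..H, zero-mean AR(1)."""
    z = NormalDist().inv_cdf(1 - alpha / (2 * H))  # two-sided, alpha split over H intervals
    intervals = []
    var = 0.0
    for h in range(1, H + 1):
        var += sigma ** 2 * phi ** (2 * (h - 1))   # add the squared psi-weight phi^(h-1)
        center = phi ** h * x_n
        half = z * var ** 0.5
        intervals.append((center - half, center + half))
    return intervals
```

Bonferroni intervals are conservative: their joint coverage is at least 1 - α but usually higher. This is the usual motivation for comparing them with an exact method.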
369	Time Series Modeling with Shape Constraints
Zhang, Jing (January 2017)
This thesis focuses on the development of semiparametric estimation methods for a class of time series models using shape constraints. Many existing time series models assume that the noise follows a known parametric distribution, typically a Gaussian or t distribution, and estimate the model parameters by maximizing the resulting likelihood function. For example, autoregressive moving average (ARMA) models (Brockwell and Davis, 2009) assume a Gaussian noise sequence and are estimated under the causal-invertible constraint by maximizing the Gaussian likelihood. The same estimates can also be used in the causal-invertible non-Gaussian case, but they are not asymptotically optimal (Rosenblatt, 2012). Moreover, the Gaussian likelihood estimation procedure is not applicable in the noncausal/noninvertible case, since second-order-based methods cannot distinguish between causal-invertible and noncausal/noninvertible models (Brockwell and Davis, 2009). As a result, many estimation methods for noncausal/noninvertible ARMA models assume that the noise follows a known non-Gaussian distribution, such as a Laplace or t distribution. To relax this distributional assumption and allow noncausal/noninvertible models, we borrow ideas from nonparametric shape-constrained density estimation. We propose a semiparametric estimation procedure for general ARMA models that projects the underlying noise distribution onto the space of log-concave measures (Cule and Samworth, 2010; Dümbgen et al., 2011). We show that the maximum likelihood estimators (MLEs) in this semiparametric setting are consistent. In fact, the MLE is robust to misspecification of log-concavity when the true noise distribution is close to its log-concave projection.
We derive a lower bound for the best asymptotic variance of regular estimators at rate √n for AR models and construct a semiparametric efficient estimator. We also consider modeling time series of counts with shape constraints. Many of the formulated models for count time series are expressed via a pair of generalized state-space equations. In this set-up, the observation equation specifies the conditional distribution of the observation Y_t at time t given a state variable X_t. For count time series, this conditional distribution is usually taken from a known parametric family, such as the Poisson or negative binomial distribution. To relax this parametric framework, we introduce a concave shape constraint into the one-parameter exponential family, which essentially amounts to assuming that the reference measure is log-concave. In this way, we extend the class of observation-driven models studied in Davis and Liu (2016). Under this formulation, there exists a stationary and ergodic solution to the state-space model. Within this new modeling framework, we consider the inference problem of estimating both the parameters of the mean model and the log-concave function corresponding to the reference measure. We compute and maximize the likelihood function over both the parameters of the mean function and the reference measure, subject to a concavity constraint. The estimators of the mean function and of the conditional distribution are shown to be consistent and to perform well compared with a fully parametric model specification. The finite-sample behavior of the estimators is studied via simulation, and two empirical examples are provided to illustrate the methodology.
Subjects: Statistics
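The causal-invertible constraint mentioned above has a simple computational form. An ARMA model is causal when all roots of the AR polynomial φ(z) = 1 - φ₁z - ... - φ_p z^p lie strictly outside the unit circle. It is invertible when the same holds for the MA polynomial θ(z) = 1 + θ₁z + ... + θ_q z^q. The helpers below sketch this check using NumPy's polynomial root finder. They are not part of the thesis's semiparametric estimator, which is designed precisely to handle models that fail the check as well.

```python
from typing import Sequence

import numpy as np


def roots_outside_unit_circle(coeffs_low_to_high: Sequence[float]) -> bool:
    """True if every root of c0 + c1*z + ... + ck*z^k has modulus greater than 1."""
    c = np.trim_zeros(np.asarray(coeffs_low_to_high, float), 'b')  # drop zero top terms
    if len(c) <= 1:
        return True                   # constant polynomial: no roots
    roots = np.roots(c[::-1])         # np.roots expects the highest degree first
    return bool(np.all(np.abs(roots) > 1.0))


def is_causal(ar: Sequence[float]) -> bool:
    """Check the AR polynomial 1 - phi_1 z - ... - phi_p z^p."""
    return roots_outside_unit_circle([1.0] + [-p for p in ar])


def is_invertible(ma: Sequence[float]) -> bool:
    """Check the MA polynomial 1 + theta_1 z + ... + theta_q z^q."""
    return roots_outside_unit_circle([1.0] + list(ma))
```

For instance, an AR(1) model with φ₁ = 2 is noncausal. Its root lies at z = 0.5, inside the unit circle.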
370	Software products for financial time series analysis (Czech title: Softwarové možnosti pro analýzu finančních časových řad)
Vlasáková, Romana (January 2012)
This work deals with selected methods suitable for working with financial time series. First, univariate linear ARMA models are introduced, followed by a description of ARCH volatility models and their generalization to GARCH models. Many modifications of the standard GARCH model have been designed to reflect the nature of financial data, and some of them are presented. The part of the work dealing with multivariate time series focuses on VAR models and bivariate GARCH models. The most important part of the work is a set of practical examples of building the theoretically described models in various software packages with built-in procedures for time series analysis. We use five commercial and non-commercial packages: EViews, Mathematica, R, S-PLUS, and XploRe. These software products are presented and compared in terms of their capabilities and the results they produce for particular methods.
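The GARCH models introduced in this work are commonly illustrated by the GARCH(1,1) recursion for the conditional variance, σ²_t = ω + α ε²_{t-1} + β σ²_{t-1}. The sketch below evaluates that recursion for a given residual series. It assumes illustrative parameter values and initializes at the unconditional variance ω / (1 - α - β), which requires α + β < 1. It is not taken from any of the packages compared in the thesis; those are described as providing built-in procedures for this kind of analysis.

```python
from typing import List, Sequence


def garch11_variance(eps: Sequence[float], omega: float,
                     alpha: float, beta: float) -> List[float]:
    """Conditional variances sigma2[t] = omega + alpha*eps[t-1]**2 + beta*sigma2[t-1]."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1.0 - alpha - beta)]  # start at the unconditional variance
    for t in range(1, len(eps)):
        sigma2.append(omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1])
    return sigma2
```

A large shock ε_{t-1} raises the next conditional variance, and the effect decays geometrically at rate β. This is the volatility clustering that GARCH models are designed to capture.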
