Return to search

Time Series Data Analytics

Given the ubiquity of time series data, and the exponential growth of databases, there has recently been an explosion of interest in time series data mining. Finding similar trends and patterns among time series data is critical for many applications ranging from financial planning, weather forecasting, stock analysis to policy making. With time series being high-dimensional objects, detection of similar trends especially at the granularity of subsequences or among time series of different lengths and temporal misalignments incurs prohibitively high computation costs. Finding trends using non-metric correlation measures further compounds the complexity, as traditional pruning techniques cannot be directly applied. My dissertation addresses these challenges while meeting the need to achieve near real-time responsiveness. First, for retrieving exact similarity results using Lp-norm distances, we design a two-layered time series index for subsequence matching. Time series relationships are compactly organized in a directed acyclic graph embedded with similarity vectors capturing subsequence similarities. Powerful pruning strategies leveraging the graph structure greatly reduce the number of time series as well as subsequence comparisons, resulting in a several order of magnitude speed-up. Second, to support a rich diversity of correlation analytics operations, we compress time series into Euclidean-based clusters augmented by a compact overlay graph encoding correlation relationships. Such a framework supports a rich variety of operations including retrieving positive or negative correlations, self correlations and finding groups of correlated sequences. Third, to support flexible similarity specification using computationally expensive warped distance like Dynamic Time Warping we design data reduction strategies leveraging the inexpensive Euclidean distance with subsequent time warped matching on the reduced data. This facilitates the comparison of sequences of different lengths and with flexible alignment still within a few seconds of response time. Comprehensive experimental studies using real-world and synthetic datasets demonstrate the efficiency, effectiveness and quality of the results achieved by our proposed techniques as compared to the state-of-the-art methods.

Identiferoai:union.ndltd.org:wpi.edu/oai:digitalcommons.wpi.edu:etd-dissertations-1528
Date29 April 2019
CreatorsAhsan, Ramoza
ContributorsElke A. Rundensteiner, Advisor, Vassilis Athitsos, Committee Member, Xiangnan Kong, Committee Member, Gabor Sarkozy
PublisherDigital WPI
Source SetsWorcester Polytechnic Institute
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceDoctoral Dissertations (All Dissertations, All Years)

Page generated in 0.0024 seconds