  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A sliding window BIRCH algorithm with performance evaluations

Li, Chuhe January 2017 (has links)
A growing number of applications across many fields generate transactional or other time-stamped data, all of which is time series data. Time series data mining is a popular topic in the data mining field, and it raises challenges for improving the accuracy and efficiency of algorithms. Time series data are dynamic, large-scale and highly complex, which makes it difficult to discover patterns among them with methods suited to static data. BIRCH, a hierarchical clustering method, was proposed to address the problem of large datasets: it minimizes I/O and time costs, building a CF tree during processing and producing clusters after the four phases of the BIRCH procedure. A drawback of BIRCH is that it is not very scalable. This thesis is devoted to improving the accuracy and efficiency of the BIRCH algorithm. A sliding-window BIRCH (SW BIRCH) algorithm is implemented on the basis of BIRCH, and its accuracy and efficiency are evaluated at the end of the thesis. A performance comparison among SW BIRCH, BIRCH and K-means is also presented, using the Silhouette Coefficient and the Calinski-Harabasz index. The preliminary results indicate that SW BIRCH may achieve better performance than BIRCH in some cases.
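The clustering feature (CF) triples that BIRCH maintains can be sketched in a few lines of Python. Everything below (the `CFEntry` class, the `absorb` and `cluster` names, the greedy threshold test) is an illustrative simplification of BIRCH's insertion phase, not the thesis's SW BIRCH implementation:

```python
import math

class CFEntry:
    """BIRCH clustering feature: (N, linear sum, squared sum) per dimension."""
    def __init__(self, point):
        self.n = 1
        self.ls = list(point)                     # linear sum
        self.ss = [x * x for x in point]          # squared sum

    def absorb(self, point):
        # Adding a point only updates the sufficient statistics.
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # RMS distance of members to the centroid, from (N, LS, SS) alone.
        c = self.centroid()
        var = sum(self.ss[i] / self.n - c[i] ** 2 for i in range(len(c)))
        return math.sqrt(max(var, 0.0))

def cluster(points, threshold):
    """Greedy one-pass insertion: absorb into the nearest CF entry if the
    point is within `threshold` of its centroid, else open a new entry."""
    entries = []
    for p in points:
        best = min(entries, default=None,
                   key=lambda e: math.dist(p, e.centroid()))
        if best is not None and math.dist(p, best.centroid()) <= threshold:
            best.absorb(p)
        else:
            entries.append(CFEntry(p))
    return entries
```

A sliding-window variant would additionally evict or decay CF entries as old points leave the window; because the statistics are additive, both absorption and removal cost O(d) per point.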
2

Data mining časových řad / Time series data mining

Novák, Petr January 2009 (has links)
This work deals with modern trends in time series data mining.
3

REAL-TIME RECOGNITION OF TIME-SERIES PATTERNS

Morrill, Jeffrey P., Delatizky, Jonathan October 1993 (has links)
International Telemetering Conference Proceedings / October 25-28, 1993 / Riviera Hotel and Convention Center, Las Vegas, Nevada / This paper describes a real-time implementation of the pattern recognition technology originally developed by BBN [Delatizky et al] for post-processing of time-sampled telemetry data. This makes it possible to monitor a data stream for a characteristic shape, such as an arrhythmic heartbeat or a step-response whose overshoot is unacceptably large. Once programmed to recognize patterns of interest, it generates a symbolic description of a time-series signal in intuitive, object-oriented terms. The basic technique is to decompose the signal into a hierarchy of simpler components using rules of grammar, analogous to the process of decomposing a sentence into phrases and words. This paper describes the basic technique used for pattern recognition of time-series signals and the problems that must be solved to apply the techniques in real time. We present experimental results for an unoptimized prototype demonstrating that 4000 samples per second can be handled easily on conventional hardware.
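The grammar-style decomposition the paper describes, breaking a signal into simpler components the way a sentence breaks into phrases and words, can be illustrated at its lowest level with a run-labeling sketch. The function names and the toy overshoot rule below are hypothetical stand-ins, not BBN's actual rule set:

```python
def primitives(series, tol=0.0):
    """Label consecutive differences as 'up', 'down', or 'flat' and merge
    equal neighbouring labels into runs: the lowest level of a
    grammar-style decomposition of a time-series signal."""
    labels = []
    for a, b in zip(series, series[1:]):
        d = b - a
        labels.append('up' if d > tol else 'down' if d < -tol else 'flat')
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1] = (lab, runs[-1][1] + 1)     # extend the current run
        else:
            runs.append((lab, 1))                 # start a new run
    return runs

def match_overshoot(runs):
    """A toy 'word'-level rule: a step response with overshoot reads as an
    up-run immediately followed by a down-run (decay back to the plateau)."""
    return any(a[0] == 'up' and b[0] == 'down'
               for a, b in zip(runs, runs[1:]))
```

Higher grammar levels would combine such runs into phrases (spike, plateau, oscillation), mirroring the sentence-parsing analogy in the paper.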
4

NEW METHODS FOR MINING SEQUENTIAL AND TIME SERIES DATA

Al-Naymat, Ghazi January 2009 (has links)
Doctor of Philosophy (PhD) / Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns on the basis of the requirements of the domain, including association rule mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST) and time series data (TSD) is the rationale for developing specialized techniques to mine such data. In spatial data mining, the spatial co-location rule problem differs from the association rule problem, since there is no natural notion of transactions in spatial datasets embedded in continuous geographic space. We have therefore proposed an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships, especially negative relationships, in the patterns; these relationships can be obtained only from the maximal clique patterns, which have not been exploited before. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS). ST data is continuously collected and made accessible in the public domain. We present an approach to mine and query large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern: a flock is a large subset of objects moving along paths close to each other for a predefined time.
One approach to processing a "flock query" is to map ST data into high-dimensional space and reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures rapidly deteriorates in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection and present experimental results showing that the curse of dimensionality can be managed in an ST setting by combining random projections with traditional data structures. In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series that always yields the optimal result, in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit similarity and/or correlation between the time series: the greater the similarity between the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. Applying the SparseDTW algorithm, we discover an interesting "pairs trading" pattern in a large stock-market dataset of daily index prices from the Australian Stock Exchange (ASX) from 1980 to 2002.
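For context, the full-matrix dynamic programming recurrence that SparseDTW sparsifies looks like this (a textbook DTW baseline with squared-error local cost, not the thesis's algorithm):

```python
def dtw(a, b):
    """Standard O(len(a) * len(b)) dynamic time warping distance.
    D[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j];
    each cell extends the best of its three predecessors."""
    n, m = len(a), len(b)
    INF = float('inf')
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],       # step in a only
                                 D[i][j - 1],       # step in b only
                                 D[i - 1][j - 1])   # step in both
    return D[n][m]
```

Note how `dtw([1, 2, 3], [1, 2, 2, 3])` is zero: warping absorbs the repeated sample. SparseDTW's contribution, per the abstract, is computing this same optimal value while materializing only the matrix cells the data's similarity structure requires.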
6

Uncertainty in Renewable Energy Time Series Prediction using Neural Networks

Aupke, Phil January 2021 (has links)
With the increasing demand for solar energy, the forecast of PV station energy production has to be as precise as possible. To make the prediction more robust, correlated information about the weather can be added to the station's previous energy production. This thesis is part of a project whose goal is to build an energy marketplace for a smart energy grid between households. To make the prosumers' decisions more accurate, the forecast of PV station energy production must be as accurate as possible. Because not every household, or even every smart grid, will contain a weather station, interpolated weather information also has to be considered. The objective of this work is to evaluate the difference in accuracy between precise weather information located directly at the PV station and interpolated weather data. Errors in the data, recorded due to sensor malfunctions, were removed using winsorization, and unnecessary weather features were detected with several feature selection methods. For forecasting energy production, three established machine learning algorithms were used: Random Forest, LSTM and Facebook Prophet, compared using several performance metrics. The three models were validated by walk-forward cross-validation on unseen data. Furthermore, one model of each type was trained on each of the two datasets; for the performance measurement, e.g., the LSTM model trained on precise weather information also received the interpolated data as input for prediction, and vice versa. In conclusion, the Random Forest model performed better than the other two model types, with an average normalized error of 0.15, whereas the LSTM model had an error of 0.37 and the Prophet model 0.58.
Regarding the difference between interpolated and actual weather information, the results show that the uncertainty in those variables also affects the prediction of the PV station's energy output: the MSE of the LSTM model increased by 14 percent and that of the Random Forest model by 16 percent. The thesis ends with a discussion of the results and possible tasks for future work.
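The walk-forward cross-validation used to validate the models can be sketched as a split generator. The expanding-window form below is one common variant and an assumption about the thesis's exact setup:

```python
def walk_forward_splits(n, initial, horizon):
    """Expanding-window walk-forward splits over n time steps: train on
    [0, t), test on the next `horizon` points, then roll forward, so every
    test fold is strictly unseen future data (no leakage from the future)."""
    t = initial
    while t + horizon <= n:
        yield list(range(t)), list(range(t, t + horizon))
        t += horizon
```

For example, `walk_forward_splits(10, 4, 2)` yields three folds whose test windows are `[4, 5]`, `[6, 7]` and `[8, 9]`, each trained only on earlier indices.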
7

Advanced Data Analytics Methodologies for Anomaly Detection in Multivariate Time Series Vehicle Operating Data

Alizadeh, Morteza 06 August 2021 (has links)
Early detection of faults in vehicle operating systems is a research domain of high significance for sustaining full control of the systems, since anomalous behaviors usually cause performance loss for a long time before they are detected as critical failures. In other words, operating systems exhibit degradation when failure begins to occur. Repeated signs of failure in system performance are not only signals of anomalous behavior; they also show that maintenance actions to preserve system performance are vital. Maintaining a system at nominal performance over its lifetime at the lowest maintenance cost is extremely challenging, and it is important to be aware of imminent failure before it arises and to implement the best countermeasures to avoid extra losses. In this context, timely anomaly detection in the performance of the operating system is worthy of investigation. Early detection of imminent anomalous behavior is difficult without appropriate modeling, prediction, and analysis of the system's time series records. Data-based technologies have laid a strong foundation for developing advanced methods for modeling and predicting time series data streams. In this research, we propose novel methodologies to predict the patterns of multivariate time series operational data of a vehicle and to recognize second-wise unhealthy states. These approaches help with early detection of abnormalities in vehicle behavior based on multiple data channels recorded second by second for different functional working groups in the vehicle's operating systems. Furthermore, a real case study dataset is used to validate the accuracy of the proposed prediction and anomaly detection methodologies.
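As a minimal stand-in for the kind of second-wise anomaly flagging the dissertation develops (its actual methodologies are more sophisticated and multivariate), a trailing-window z-score detector illustrates the idea on a single channel:

```python
import math
from collections import deque

def rolling_anomalies(series, window, z_thresh=3.0):
    """Flag indices whose value deviates more than `z_thresh` standard
    deviations from the trailing-window mean, a simple sketch of detecting
    degradation against recent nominal behavior."""
    buf = deque(maxlen=window)
    flags = []
    for i, x in enumerate(series):
        if len(buf) == window:                       # enough history to judge
            mu = sum(buf) / window
            sd = math.sqrt(sum((v - mu) ** 2 for v in buf) / window)
            if sd > 0 and abs(x - mu) / sd > z_thresh:
                flags.append(i)
        buf.append(x)                                # then admit the new point
    return flags
```

A multivariate version would score each channel (or a joint distance such as Mahalanobis) per functional working group, in line with the abstract's multiple-data-channel setting.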
8

Time Series Data Analytics

Ahsan, Ramoza 29 April 2019 (has links)
Given the ubiquity of time series data and the exponential growth of databases, there has recently been an explosion of interest in time series data mining. Finding similar trends and patterns among time series data is critical for many applications, ranging from financial planning, weather forecasting and stock analysis to policy making. With time series being high-dimensional objects, detecting similar trends, especially at the granularity of subsequences or among time series of different lengths and temporal misalignments, incurs prohibitively high computation costs. Finding trends using non-metric correlation measures further compounds the complexity, as traditional pruning techniques cannot be directly applied. My dissertation addresses these challenges while meeting the need for near real-time responsiveness. First, for retrieving exact similarity results under Lp-norm distances, we design a two-layered time series index for subsequence matching. Time series relationships are compactly organized in a directed acyclic graph embedded with similarity vectors capturing subsequence similarities. Powerful pruning strategies leveraging the graph structure greatly reduce the number of time series as well as subsequence comparisons, resulting in a speed-up of several orders of magnitude. Second, to support a rich diversity of correlation analytics operations, we compress time series into Euclidean-based clusters augmented by a compact overlay graph encoding correlation relationships. Such a framework supports a rich variety of operations, including retrieving positive or negative correlations, self-correlations, and finding groups of correlated sequences. Third, to support flexible similarity specification using computationally expensive warped distances such as Dynamic Time Warping, we design data reduction strategies that leverage the inexpensive Euclidean distance, with subsequent time-warped matching on the reduced data. This facilitates the comparison of sequences of different lengths and with flexible alignment, still within a few seconds of response time. Comprehensive experimental studies using real-world and synthetic datasets demonstrate the efficiency, effectiveness and quality of the results achieved by our proposed techniques compared to state-of-the-art methods.
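The cheap-filter-then-warp strategy of the third contribution can be sketched as a two-stage search. The shortlist size and the equal-length assumption for the Euclidean stage are illustrative choices, not the dissertation's design:

```python
import math

def dtw(a, b):
    """Plain dynamic time warping with absolute-difference local cost."""
    INF = float('inf')
    D = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[-1][-1]

def top_match(query, candidates, shortlist=2):
    """Two-stage nearest-sequence search: rank all candidates by the cheap
    Euclidean distance first (equal-length sequences assumed here), then run
    the expensive warped distance only on the shortlist."""
    ranked = sorted(candidates,
                    key=lambda c: math.dist(query, c))[:shortlist]
    return min(ranked, key=lambda c: dtw(query, c))
```

The design point is that Euclidean distance costs O(n) per candidate while DTW costs O(n^2), so pruning with the cheap measure keeps response times interactive even over large collections.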
9

Exploratory Visualization of Data Pattern Changes in Multivariate Data Streams

Xie, Zaixian 21 October 2011 (has links)
" More and more researchers are focusing on the management, querying and pattern mining of streaming data. The visualization of streaming data, however, is still a very new topic. Streaming data is very similar to time-series data since each datapoint has a time dimension. Although the latter has been well studied in the area of information visualization, a key characteristic of streaming data, unbounded and large-scale input, is rarely investigated. Moreover, most techniques for visualizing time-series data focus on univariate data and seldom convey multidimensional relationships, which is an important requirement in many application areas. Therefore, it is necessary to develop appropriate techniques for streaming data instead of directly applying time-series visualization techniques to it. As one of the main contributions of this dissertation, I introduce a user-driven approach for the visual analytics of multivariate data streams based on effective visualizations via a combination of windowing and sampling strategies. To help users identify and track how data patterns change over time, not only the current sliding window content but also abstractions of past data in which users are interested are displayed. Sampling is applied within each single time window to help reduce visual clutter as well as preserve data patterns. Sampling ratios scheduled for different windows reflect the degree of user interest in the content. A degree of interest (DOI) function is used to represent a user's interest in different windows of the data. Users can apply two types of pre-defined DOI functions, namely RC (recent change) and PP (periodic phenomena) functions. The developed tool also allows users to interactively adjust DOI functions, in a manner similar to transfer functions in volume visualization, to enable a trial-and-error exploration process. In order to visually convey the change of multidimensional correlations, four layout strategies were designed. 
User studies showed that three of these are effective techniques for conveying data pattern changes compared to traditional time-series data visualization techniques. Based on this evaluation, a guide for the selection of appropriate layout strategies was derived, considering the characteristics of the targeted datasets and data analysis tasks. Case studies were used to show the effectiveness of DOI functions and the various visualization techniques. A second contribution of this dissertation is a data-driven framework to merge and thus condense time windows having small or no changes and distort the time axis. Only significant changes are shown to users. Pattern vectors are introduced as a compact format for representing the discovered data model. Three views, juxtaposed views, pattern vector views, and pattern change views, were developed for conveying data pattern changes. The first shows more details of the data but needs more canvas space; the last two need much less canvas space via conveying only the pattern parameters, but lose many data details. The experiments showed that the proposed merge algorithms preserves more change information than an intuitive pattern-blind averaging. A user study was also conducted to confirm that the proposed techniques can help users find pattern changes more quickly than via a non-distorted time axis. A third contribution of this dissertation is the history views with related interaction techniques were developed to work under two modes: non-merge and merge. In the former mode, the framework can use natural hierarchical time units or one defined by domain experts to represent timelines. This can help users navigate across long time periods. Grid or virtual calendar views were designed to provide a compact overview for the history data. In addition, MDS pattern starfields, distance maps, and pattern brushes were developed to enable users to quickly investigate the degree of pattern similarity among different time periods. 
For the merge mode, merge algorithms were applied to selected time windows to generate a merge-based hierarchy. The contiguous time windows having similar patterns are merged first. Users can choose different levels of merging with the tradeoff between more details in the data and less visual clutter in the visualizations. The usability evaluation demonstrated that most participants could understand the concepts of the history views correctly and finished assigned tasks with a high accuracy and relatively fast response time. "
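The data-driven merging of adjacent windows with little change can be sketched with a single summary statistic standing in for the dissertation's pattern vectors (an illustrative simplification; the real framework compares richer pattern parameters):

```python
def merge_windows(windows, tol):
    """Condense contiguous time windows whose summary statistic (here just
    the mean) differs by less than `tol` from the running merged window,
    so only significant pattern changes remain as window boundaries."""
    merged = []                      # list of (window_values, window_mean)
    for w in windows:
        mean = sum(w) / len(w)
        if merged and abs(mean - merged[-1][1]) < tol:
            prev_w, _ = merged[-1]
            combined = prev_w + w                     # absorb into previous
            merged[-1] = (combined, sum(combined) / len(combined))
        else:
            merged.append((w, mean))                  # keep as a new window
    return [w for w, _ in merged]
```

Applied repeatedly with growing `tol`, this yields the kind of merge-based hierarchy the dissertation describes, letting users trade detail against clutter.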
10

An Application of Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Modelling on Taiwan's Time-Series Data: Three Essays

Chang, Tsangyao 01 May 1995 (has links)
In this dissertation, three essays are presented that apply recent advances in time-series methods to the analysis of inflation and stock market index data for Taiwan. Specifically, ARCH and GARCH methodologies are used to investigate claims of increased volatility in economic time-series data since 1980. In the first essay, analysis that accounts for structural change reveals that the fundamental relationship between inflation and its variability was severed by policies implemented during economic liberalization in Taiwan in the early 1980s. Furthermore, if residuals are corrected for serial correlation, evidence in favor of ARCH effects is weakened. In the second essay, dynamic linkages between daily stock returns and daily trading volume are explored. Both linear and nonlinear dependence are evaluated using Granger causality tests and GARCH modelling. Results suggest significant unidirectional Granger causality from stock returns to trading volume. In the third essay, comparative analysis of the frequency structure of the Taiwan stock index data is conducted using daily, weekly, and monthly data. Results demonstrate that the relationship between mean return and its conditional standard deviation is positive and significant only for high-frequency daily data.
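The GARCH(1,1) conditional-variance recursion underlying these essays can be written out directly; the parameter values in the test are arbitrary illustrations, not estimates from the Taiwan data:

```python
def garch11_variance(returns, omega, alpha, beta):
    """GARCH(1,1) conditional variance:
        sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1],
    seeded with the unconditional variance omega / (1 - alpha - beta).
    Setting beta = 0 recovers the ARCH(1) special case."""
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2
```

The recursion makes the volatility-clustering intuition concrete: a large shock r[t-1] raises next period's conditional variance, which then decays geometrically at rate alpha + beta toward the unconditional level.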
