1 |
Real-time data stream clustering over sliding windows
Badiozamany, Sobhan January 2016
In many applications, e.g. urban traffic monitoring, stock trading, and industrial sensor data monitoring, clustering algorithms are applied to data streams in real time to find current patterns. Here, sliding windows are commonly used because they capture concept drift. Real-time clustering over sliding windows means detecting continuously evolving clusters as soon as they occur in the stream, which requires efficient maintenance of cluster memberships that change as windows slide. Data stream management systems (DSMSs) provide high-level query languages for searching and analyzing streaming data. In this thesis we extend a DSMS with a real-time data stream clustering framework called the Generic 2-phase Continuous Summarization framework (G2CS). G2CS modularizes data stream clustering by taking as input clustering algorithms expressed in terms of a number of functions and indexing structures. G2CS supports real-time clustering through an efficient window-sliding mechanism and algorithm-transparent indexing. A particular challenge for real-time detection of a large number of rapidly evolving clusters is the efficiency of window slides for clustering algorithms that do not support deletion of expired data, e.g. BIRCH. To that end, G2CS includes a novel window maintenance mechanism called Sliding Binary Merge (SBM). To further improve real-time sliding performance, G2CS uses generation-based multi-dimensional indexing, where indexing structures suitable for the clustering algorithms can be plugged in.
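The difficulty with summaries that cannot delete expired data can be illustrated with a small sketch. It is not the thesis's Sliding Binary Merge; it only shows the general idea of keeping one additive, BIRCH-style cluster feature (count, linear sum, squared sum) per slide and rebuilding the window by merging, so that expiry means dropping a whole summary rather than removing points from one. The class names `ClusterFeature` and `PaneWindow`, the restriction to 1-D points, and the single-cluster simplification are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterFeature:
    """Additive summary of 1-D points: count, linear sum, squared sum."""
    n: int = 0
    ls: float = 0.0
    ss: float = 0.0

    def add(self, x: float) -> "ClusterFeature":
        # Insertion is supported; there is deliberately no delete().
        return ClusterFeature(self.n + 1, self.ls + x, self.ss + x * x)

    def merge(self, other: "ClusterFeature") -> "ClusterFeature":
        # Two summaries combine by component-wise addition.
        return ClusterFeature(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    @property
    def mean(self) -> float:
        return self.ls / self.n if self.n else 0.0


class PaneWindow:
    """Sliding window stored as one summary per slide (hypothetical helper)."""

    def __init__(self, panes_per_window: int) -> None:
        # deque(maxlen=...) discards the oldest summary when a new one arrives.
        self.panes = deque(maxlen=panes_per_window)

    def slide(self, points) -> ClusterFeature:
        pane = ClusterFeature()
        for x in points:
            pane = pane.add(x)
        self.panes.append(pane)
        # Rebuild the window summary by merging the surviving panes.
        window = ClusterFeature()
        for p in self.panes:
            window = window.merge(p)
        return window
```

Re-merging every pane on each slide costs time linear in the number of panes; reducing that cost is the kind of problem a dedicated window maintenance mechanism addresses.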
|
2 |
New Blind Constant Modulus Sliding Windows GSC-RLS Algorithm for DS-CDMA Receiver with Min/Max Criterion
Luo, Yin-chen 30 August 2007
The code division multiple access (CDMA) system implemented with the direct-sequence (DS) spread spectrum (SS) technique is one of the most promising multiplexing technologies for wireless communication services. SS communication uses a much wider bandwidth than is necessary to transmit the information over the channel, and has been proposed for third-generation broadband wireless access. The capacity and performance of the DS-CDMA system are mainly limited by multiple access interference (MAI) and inter-symbol interference (ISI) caused by the multipath fading channel. To circumvent these problems, many adaptive multiuser detectors have been proposed, for instance those based on the minimum mean square error (MMSE) and minimum output energy (MOE) criteria, subject to certain constraints. Since the LCMV criterion is the linearly constrained (LC) version of MOE, it is highly sensitive to channel mismatch caused by unreliable estimation. To deal with this problem, the LC constant modulus (LCCM) criterion was considered, which avoids capturing an interfering user instead of the desired user when the interfering user's power is much higher than the desired user's.
In this thesis, based on the Min/Max criterion, we propose a novel blind LCCM recursive least-squares (RLS) algorithm with the generalized sidelobe canceller (GSC) structure, named the CM GSC-RLS algorithm, to effectively alleviate the effects of MAI and ISI in a DS-CDMA receiver over a time-varying channel. Because the channel varies at the receiver, the desired user's amplitude or power is not available and has to be estimated. To solve this problem, we propose a simple scheme, used together with the CM GSC-RLS algorithm, that adaptively estimates the constant modulus parameter. With the proposed algorithm, the amplitude variation of the desired user caused by changing channel characteristics can be tracked effectively. Thus, better performance than the conventional GSC-RLS algorithms can be expected in terms of output signal-to-interference-plus-noise ratio (SINR) and bit error rate (BER).
|
3 |
Today's Space Weather in the Planetarium: visualization and feature extraction pipeline for astrophysical observation and simulation data
Huy Nikkilä, Sovanny, Kollberg, Axel January 2019
This thesis describes the work of two students in collaboration with OpenSpace and the Community Coordinated Modelling Center (CCMC). The need expressed by both parties is a more accessible way to visualize space weather data from the CCMC in OpenSpace. First, space weather data is preprocessed for downloading and visualization, a process that reduces the size of the data while keeping important features. Second, a pipeline is created for dynamically fetching the time-varying data from the web while OpenSpace is running. A sliding window technique is employed to manage the downloading of the data. The results show a complete and working system for downloading data during runtime. Measurements comparing the space weather visualizations run with dynamic downloading against the same visualizations run locally show that the new system affects the frame time only marginally. The results also show a visualization of space weather data with enhanced features, which facilitates exploration and gives a more comprehensible representation of the data. The data is originally kept in a tabular FITS file format, and file sizes after data reduction and feature extraction are approximately 3% of the original file sizes.
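A sliding window over time steps is one plausible way to manage such downloads. The sketch below is an assumption-laden illustration, not the OpenSpace implementation: `SlidingCache`, `window_indices`, the `ahead` and `behind` parameters, and the `fetch` callable are all hypothetical names. It keeps only the time steps near the current playback position in memory, fetching new ones as the window moves and evicting those that fall outside it.

```python
def window_indices(current: int, ahead: int, behind: int, total: int) -> range:
    """Indices of the time steps that should be resident around `current`."""
    start = max(0, current - behind)
    stop = min(total, current + ahead + 1)
    return range(start, stop)


class SlidingCache:
    """Keeps a sliding window of downloaded time steps (hypothetical helper)."""

    def __init__(self, fetch, total: int, ahead: int = 2, behind: int = 1) -> None:
        self.fetch = fetch      # assumed callable: index -> data for that step
        self.total = total
        self.ahead = ahead
        self.behind = behind
        self.cache = {}

    def move_to(self, current: int):
        if not 0 <= current < self.total:
            raise IndexError(f"time step {current} out of range")
        wanted = set(window_indices(current, self.ahead, self.behind, self.total))
        # Evict steps that slid out of the window.
        for i in list(self.cache):
            if i not in wanted:
                del self.cache[i]
        # Fetch steps that slid into the window, oldest first.
        for i in sorted(wanted):
            if i not in self.cache:
                self.cache[i] = self.fetch(i)
        return self.cache[current]
```

In a real renderer the fetches would run asynchronously so that the frame loop never blocks on the network; the synchronous version here only shows the window bookkeeping.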
|
4 |
Multiple Continuous Query Processing with Relative Window Predicates "Juggler"
Silva, Asima 27 May 2004
Efficient querying over streaming data is a critical technology that requires the ability to handle numerous and possibly similar queries in real-time dynamic environments such as the stock market and medical devices. Existing DBMS technology is not well suited to this domain, since it was developed for static historical data. Queries over streams often contain relative window predicates, as in the query "Heart rate decreased to fifty-two beats per second within four seconds after the patient's temperature started rising." Relative window predicates are a specific type of join between streams that is based on the tuples' timestamps. In our operator, called Juggler, predicates are classified into three types: attribute, join, and window. Attribute predicates compare stream values to a constant. Join predicates compare stream values to another stream's values. Window predicates are join predicates in which the streams' timestamp values are compared. Juggler's composite operator handles similar, though not identical, query functionalities as one complex computation process. This execution strategy handles multi-way joins for multiple selection and join predicates. It adaptively orders the execution of predicates by their selectivity to efficiently process multiple continuous queries based on stream characteristics. In Juggler, all similar predicates are grouped into lists. The indices of these lists are represented by a collection of bits. Every tuple contains the bit structure representation of the predicate lists, which encodes the tuple's predicate evaluation history. Every query also contains a similar bit structure to encode the predicates' relationship to the registered queries. The tuple's and query's bit structures are compared to assess whether the tuple has satisfied a query. Juggler is designed and implemented in Java. Experiments were conducted to verify correctness and to assess the performance of Juggler's three features.
Its adaptive reordering of the evaluation of predicate types performed as well as the most selective predicate ordering. Its ability to exploit similar predicates in multiple queries reduced the number of comparisons. Its effectiveness when multiple queries are combined in a single Juggler operator indicated potential performance improvements after optimization of Juggler's data structures.
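The comparison of tuple and query bit structures can be sketched with plain integer bitmasks. This is only an illustration of the idea described in the abstract, written in Python rather than Juggler's Java; the predicate names, the `BIT` table, and the helpers `mask` and `satisfied` are hypothetical. Each predicate gets one bit, a tuple's bits record which predicates it has passed, and a query's bits record which predicates it requires.

```python
# Hypothetical predicate names, one bit position each.
PREDICATES = ["hr_le_52", "temp_rising", "within_4s"]
BIT = {name: 1 << i for i, name in enumerate(PREDICATES)}


def mask(*names: str) -> int:
    """Build a bitmask with the bits of the named predicates set."""
    m = 0
    for name in names:
        m |= BIT[name]
    return m


def satisfied(tuple_bits: int, query_mask: int) -> bool:
    """A tuple satisfies a query when every bit the query requires is set."""
    return (tuple_bits & query_mask) == query_mask
```

Because several queries can share a predicate, evaluating that predicate once and setting one bit serves all of them, which is where the reduction in comparisons would come from.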
|
5 |
Processing Exact Results for Queries over Data Streams
Chakraborty, Abhirup 23 February 2010
In a growing number of information-processing applications, such as network-traffic monitoring, sensor networks, financial analysis, and data mining for e-commerce, data takes the form of continuous data streams rather than traditional stored databases or relational tuples. These applications share common features such as the need for real-time analysis, huge volumes of data, and unpredictable, bursty arrivals of stream elements. In all of these applications, it is infeasible to process queries over data streams by loading the data into a traditional database management system (DBMS) or into main memory. Such an approach does not scale to high stream rates. As a consequence, systems that can manage streaming data have gained tremendous importance. The need to process a large number of continuous queries over bursty, high-volume online data streams, potentially in real time, makes it imperative to design algorithms that use limited resources.
This dissertation focuses on processing exact results for join queries over high-speed data streams using limited resources, and proposes several novel techniques for processing join queries that incorporate secondary storage and non-dedicated computers. Existing approaches for stream joins either (a) deal with memory limitations by shedding load, and therefore cannot produce exact or highly accurate results for joins over data streams with time-varying arrivals of stream tuples, or (b) suffer from large I/O overheads due to random disk accesses. The proposed techniques exploit the high bandwidth of a disk subsystem by rendering the data access pattern largely sequential, eliminating small, random disk accesses. This dissertation proposes an I/O-efficient algorithm to process hybrid join queries that join a fast, time-varying or bursty data stream with a persistent disk relation. Such a hybrid join is the crux of a number of common transformations in an active data warehouse. Experimental results demonstrate that the proposed scheme reduces the response time of output results by exploiting spatio-temporal locality within the input stream, and minimizes disk overhead through disk-I/O amortization.
The dissertation also proposes an algorithm to parallelize a stream join operator over a shared-nothing system. The proposed algorithm distributes the processing loads across a number of independent, non-dedicated nodes, based on a fixed or predefined communication pattern; dynamically maintains the degree of declustering in order to minimize communication and processing overheads; and presents mechanisms for reducing storage and communication overheads while scaling over a large number of nodes. We present experimental results showing the efficacy of the proposed algorithms.
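The idea of amortizing disk I/O in a stream-relation join can be shown with a generic batching sketch. It is not the dissertation's algorithm; `batched_hybrid_join`, `_probe`, `relation_scan`, and `batch_size` are hypothetical names, and the relation is modeled as a callable that yields `(key, row)` pairs in storage order. Stream tuples are buffered, and each full batch is matched against the relation in a single sequential pass instead of one random lookup per tuple.

```python
from collections import defaultdict


def _probe(batch, relation_scan):
    """Join one buffered batch against the relation in one sequential pass."""
    by_key = defaultdict(list)
    for key, payload in batch:
        by_key[key].append(payload)
    for key, row in relation_scan():      # assumed: yields (key, row) in disk order
        for payload in by_key.get(key, ()):
            yield (key, payload, row)


def batched_hybrid_join(stream, relation_scan, batch_size: int):
    """Yield (key, stream_payload, relation_row) for every matching pair."""
    batch = []
    for item in stream:                   # item is a (key, payload) pair
        batch.append(item)
        if len(batch) == batch_size:
            yield from _probe(batch, relation_scan)
            batch = []
    if batch:                             # flush the final partial batch
        yield from _probe(batch, relation_scan)
```

Larger batches mean fewer scans per stream tuple but a longer wait before a tuple's results appear, so the batch size trades response time against disk overhead. Results are exact: no stream tuple is dropped.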
|
7 |
Neural basis and behavioral effects of dynamic resting state functional magnetic resonance imaging as defined by sliding window correlation and quasi-periodic patterns
Thompson, Garth John 20 September 2013
While task-based functional magnetic resonance imaging (fMRI) has helped us understand the functional role of many regions in the human brain, many diseases and complex behaviors still defy explanation. Alternatively, if no task is performed, the fMRI signal between distant, anatomically connected brain regions is similar over time. These correlations in “resting state” fMRI have been strongly linked to behavior and disease. Previous work primarily calculated correlation over entire fMRI runs of six minutes or more, making it difficult to understand the neural underpinnings of these fluctuations. Recently, coordinated dynamic activity on shorter time scales has been observed in resting state fMRI: correlation calculated in comparatively short sliding windows, and quasi-periodic (periodic but not constantly active) spatiotemporal patterns. However, little relevance to behavior or underlying neural activity has been demonstrated. This dissertation addresses this problem, first by using 12.3-second windows to demonstrate a behavior-fMRI relationship previously observed only in entire fMRI runs. Second, simultaneous recording of fMRI and electrical signals from the brains of anesthetized rats is used to demonstrate that both types of dynamic activity have strong correlates in electrophysiology. Very slow neural signals correspond to the quasi-periodic patterns, supporting the idea that low-frequency activity organizes large-scale information transfer in the brain. This work both validates the use of dynamic analysis of resting state fMRI and provides a starting point for investigating the systemic basis of many neuropsychiatric diseases.
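Sliding window correlation itself is simple to state: compute the Pearson correlation between two time series inside a short window, then move the window along the series. The sketch below is a minimal pure-Python version under assumed names (`pearson`, `sliding_window_correlation`, `window`, `step`); window lengths are given in samples, so a window of a given duration in seconds would first be converted using the scan's sampling interval. It does not reproduce the dissertation's preprocessing or statistics.

```python
from math import sqrt


def pearson(x, y) -> float:
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def sliding_window_correlation(x, y, window: int, step: int = 1):
    """Correlation of x and y inside each window of `window` samples."""
    if len(x) != len(y):
        raise ValueError("time series must have the same length")
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(0, len(x) - window + 1, step)
    ]
```

A single correlation over the whole run would average away the changes that the per-window values expose, which is the motivation for the windowed analysis.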
|