Global ETD Search

1	Platforms for Real-time Moving Object Location Stream Processing Gadhoumi, Shérazade January 2017 Border security is usually based on observing and analyzing the movement of Moving Point Objects (MPOs): vehicles, boats, pedestrians or aircraft, for example. This movement analysis can be made directly by an operator observing the MPOs in real time, but the process is time-consuming and approximate. This is why the states of each MPO (ID, location, speed, direction) are sensed in real time using Global Navigation Satellite System (GNSS), Automatic Identification System (AIS) and radar sensing, thus creating a stream of MPO states. This research work proposes and carries out (1) a method for detecting four different moving point patterns based on this input stream and (2) a comparison between three possible implementations of the moving point pattern detectors based on three different Data Stream Management Systems (DSMS). Moving point patterns can be divided into two groups: (1) individual location patterns are based on the analysis of the successive states of one MPO; (2) set-based relative motion patterns are based on the analysis of the relative motion of groups of MPOs within a set. This research focuses on detecting four moving point patterns: (1) the geofence pattern consists of one MPO entering or exiting one of the predefined areas called geofences; (2) the track pattern consists of one MPO following the same direction for a given number of time steps and satisfying a given spatial constraint; (3) the flock pattern consists of a group of geographically close MPOs following the same direction; (4) the leadership pattern consists of a track pattern with the corresponding MPO anticipating the direction of geographically close MPOs at the last time step. The first two patterns are individual location patterns, while the others are set-based relative motion patterns.
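As an illustration of the geofence pattern defined above, the following is a minimal sketch rather than the thesis implementation. It assumes rectangular geofences and a stream of `(mpo_id, x, y)` tuples; the names `Rect` and `detect_geofence_events` are hypothetical. It remembers which geofences contained each MPO's previous state and reports an event when that set changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned geofence (the rectangular shape is an assumption)."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def detect_geofence_events(states, geofences):
    """Return a list of (mpo_id, geofence_name, 'enter' | 'exit') events.

    states: iterable of (mpo_id, x, y) in arrival order.
    """
    last_inside = {}  # mpo_id -> names of geofences containing its previous state
    events = []
    for mpo_id, x, y in states:
        now_inside = {g.name for g in geofences if g.contains(x, y)}
        before = last_inside.get(mpo_id)
        if before is not None:  # the first state of an MPO cannot produce an event
            for name in sorted(now_inside - before):
                events.append((mpo_id, name, "enter"))
            for name in sorted(before - now_inside):
                events.append((mpo_id, name, "exit"))
        last_inside[mpo_id] = now_inside
    return events


if __name__ == "__main__":
    port = Rect("port", 4, 4, 10, 10)
    stream = [("a", 0, 0), ("a", 5, 5), ("b", 5, 5), ("a", 20, 20)]
    print(detect_geofence_events(stream, [port]))
```

In this sketch an MPO whose first observed state already lies inside a geofence produces no event, since there is no earlier state to compare against.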
This research work proposes a method for detecting geofence patterns based on the update of a table storing the last sensed state of each MPO. The approach used for detecting track, flock and leadership patterns is based on the update of a REMO matrix (RElative MOtion matrix) where rows correspond to MPOs, columns to time steps and cells record the direction of movement. For the detection of flock patterns, a simple but effective probabilistic grid-based approach is proposed in order to detect clusters of MPOs within the MPOs following the same direction: (1) the Filtering phase partitions the study area into square-shaped cells, according to the dimension of the spatial constraint, and selects spatially contiguous grid cells called candidate areas that potentially contain flock patterns; (2) for each candidate area, the Refinement phase generates disks of the size of the spatial constraint within the selected area until one disk contains enough MPOs, so that the corresponding MPOs are considered to form a flock pattern. The pattern detectors are implemented on three DSMSs presenting different characteristics: Esri ArcGIS GeoEvent Extension for Server (GeoEvent Ext.), a workflow-based technology that ingests each MPO state separately; Apache Spark Streaming (Spark), a MapReduce-based technology that processes the input stream in batches in a highly parallel processing framework; and Apache Flink (Flink), a hybrid technology that ingests the states separately but offers several MapReduce semantics. GeoEvent Ext. only lends itself to a natural implementation of the geofence detector, while the other DSMSs accommodate the implementation of all detectors. Therefore, the geofence, track, flock and leadership pattern detectors are implemented on Spark and Flink, and empirically evaluated in terms of scalability in time/space based on the variation of parameters characterizing the patterns and/or the input stream.
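The REMO matrix described above can be sketched as follows. This is a hedged illustration and not the thesis code: it assumes positions sampled at common time steps, discretises direction into eight compass sectors (an assumption), and checks only the direction part of the track pattern, leaving out the spatial constraint. The function names are hypothetical.

```python
import math


def direction_class(dx, dy, sectors=8):
    """Discretise a displacement into one of `sectors` direction classes.

    Class 0 is centred on the positive x axis; classes increase counter-clockwise.
    A zero displacement falls into class 0 in this simplified sketch.
    """
    angle = math.atan2(dy, dx) % (2 * math.pi)
    width = 2 * math.pi / sectors
    return int((angle + width / 2) // width) % sectors


def build_remo(tracks, sectors=8):
    """Build a REMO-like matrix: one row per MPO, one cell per time step.

    tracks: {mpo_id: [(x, y), ...]} with positions at common time steps.
    Returns {mpo_id: [direction class for each step]}.
    """
    remo = {}
    for mpo_id, pts in tracks.items():
        remo[mpo_id] = [
            direction_class(x1 - x0, y1 - y0, sectors)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return remo


def has_track(row, length):
    """True if the last `length` cells of a REMO row hold the same direction."""
    return len(row) >= length and len(set(row[-length:])) == 1


if __name__ == "__main__":
    tracks = {
        "a": [(0, 0), (1, 0), (2, 0), (3, 0)],  # keeps heading along +x
        "b": [(0, 0), (1, 0), (1, 1), (0, 1)],  # turns at every step
    }
    remo = build_remo(tracks)
    for mpo_id, row in remo.items():
        print(mpo_id, row, has_track(row, 3))
```

Flock and leadership detection would compare cells across rows of the same matrix; that step, and the grid-based clustering, are not covered by this sketch.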
The results of the empirical evaluation show that the implementation on Flink uses overall fewer computer resources than the one on Spark. Moreover, the program based on Flink is less sensitive to the variability of parameters describing either the input stream or the patterns to be detected. Data Stream Management Systems Data mining Moving point patterns Information Systems
2	A Dynamic Attribute-Based Load Shedding and Data Recovery Scheme for Data Stream Management Systems Ahuja, Amit 29 June 2006 It is very common for data streams to be transmitted over a network channel whose capacity is less than the data rate of the streams, for example over dial-up or low-bandwidth wireless links. Not only does this lower capacity create delays, but it also causes consequent network problems such as packet losses, network congestion, and errors in data packets, which give rise to other problems and create a cycle of problems that is hard to break. In this thesis, we present a new approach for shedding the less informative attribute data from a data stream with a fixed schema to maintain a data rate lower than the network channel's capacity. A scheme for shedding attributes, instead of tuples, becomes imperative in stream data where the data for one of the attributes remains relatively constant or changes less frequently than the data for the other attributes. In such a data stream management system, shedding a complete tuple would lead to shedding some informative-attribute data along with the less informative-attribute data in the tuple, whereas shedding the less informative-attribute data would cause only the less informative data to be dropped. In this thesis, we deal with two major problems in load shedding: the intra-stream and the inter-stream load shedding problems. The intra-stream load shedding problem deals with shedding the less informative attributes when a single data stream with a data rate greater than the channel capacity has to be transmitted to the destination over the channel. The inter-stream load shedding problem refers to shedding attributes among different streams when more than one stream has to be transferred to the destination over a channel whose capacity is less than the combined data rate of all the streams to be transmitted.
As a solution to the inter-stream or intra-stream load shedding problem, we apply our load shedding schema approach to determine a ranking amongst the attributes on a single data stream or multiple data streams, with the least informative attribute(s) being ranked the highest. The amount of data to be shed to maintain the data rate below the capacity is calculated dynamically, which means that the amount of data to be shed changes with any change in the channel capacity or in the data rate. Using these two pieces of information, a load shedding schema describing the attributes to be shed is generated. The load shedding schema is generated dynamically, which means that it is updated with any change in (i) the rankings of attributes, which capture the rate of change of the values of each attribute, (ii) the channel capacity, and (iii) the data rate, even after load shedding has been invoked. The load shedding schema is updated using our load shedding schema re-evaluation algorithm, which adapts to the data stream characteristics and follows the attribute data variation curve of the data stream. Since data dropped at the source may be of interest to the user at the destination, we also propose a recovery module which can be invoked to recover attribute data already shed. The recovery module maintains the minimal amount of information about data already shed for recovery purposes. Preliminary experimental results have shown that recovery accuracy ranges from 90% to 99%, which requires only 5% to 33% and 4.88% to 50% of the dropped data to be stored for weather reports and stock exchanges, respectively. Storing recovery information imposes a storage and processing burden on the source site, and our recovery method aims at satisfactory recovery accuracy while imposing minimal burden on the source site.
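The ranking and schema generation described above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the algorithm from the thesis: the rate of change is taken as the fraction of consecutive tuples in which an attribute's value differs, each attribute has a fixed size in bytes, the tuple rate is constant, and the recovery module is ignored. The names `change_rates` and `shedding_schema` are hypothetical.

```python
def change_rates(tuples, attributes):
    """Fraction of consecutive tuples in which each attribute's value changes."""
    changes = {a: 0 for a in attributes}
    for prev, cur in zip(tuples, tuples[1:]):
        for a in attributes:
            if prev[a] != cur[a]:
                changes[a] += 1
    steps = max(len(tuples) - 1, 1)  # avoid division by zero on short input
    return {a: changes[a] / steps for a in attributes}


def shedding_schema(rates, sizes, tuples_per_sec, capacity):
    """Choose attributes to shed, least informative first, until the rate fits.

    rates: {attribute: change rate}; lower means less informative here.
    sizes: {attribute: bytes per value}; capacity: bytes per second.
    Returns (attributes to shed, resulting data rate in bytes per second).
    """
    data_rate = tuples_per_sec * sum(sizes.values())
    shed = []
    # Ties are broken by attribute name so the result is deterministic.
    for a in sorted(rates, key=lambda k: (rates[k], k)):
        if data_rate < capacity:
            break
        shed.append(a)
        data_rate -= tuples_per_sec * sizes[a]
    return shed, data_rate


if __name__ == "__main__":
    window = [
        {"station": "A", "temp": 20, "wind": 3},
        {"station": "A", "temp": 21, "wind": 3},
        {"station": "A", "temp": 22, "wind": 4},
    ]
    attrs = ["station", "temp", "wind"]
    rates = change_rates(window, attrs)
    sizes = {"station": 8, "temp": 4, "wind": 4}
    print(shedding_schema(rates, sizes, tuples_per_sec=10, capacity=100))
```

Re-running both functions whenever the capacity, the data rate, or the observed change rates move would approximate the dynamic re-evaluation that the abstract describes.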
Our load shedding approach, which achieves high performance in reducing the data stream load, (i) handles a wide range of data streams in different application domains (such as weather, stocks, and network performance), (ii) is dynamic in nature, which means that the load shedding scheme adjusts the amount of data to be shed and which attribute data to shed according to the current load and network capacity, and (iii) provides a data recovery mechanism that is capable of recovering any shed attribute data with recovery accuracy of up to 90% with a very low burden on the source site and 99% with a higher burden on some stream data. To the best of our knowledge, the dynamic load shedding scheme we propose is the first one in the literature to shed attributes, instead of tuples, along with providing a recovery mechanism in a data stream management system. Our load shedding approach is unique since it is not a static load shedding schema, which is less appealing in an ever-changing (sensor) network environment, and is not based on queries, but works on the general characteristics of the data stream under consideration instead. data stream load shedding dynamic load shedding shed data recovery amit ahuja attribute based data stream management systems Computer Sciences
