11. Efficient Stream Analysis and its Application to Big Data Processing / Analyse efficace de flux de données et applications au traitement des grandes masses de données. Rivetti di Val Cervo, Nicolo. 30 September 2016.
Nowadays stream analysis is used in many contexts where the amount of data and/or the rate at which it is generated rules out other approaches (e.g., batch processing). The data streaming model provides randomized and/or approximated solutions to compute specific functions over (distributed) streams of data items in worst-case scenarios, while striving for small resource usage. In particular, we look into two classical and related data streaming problems: frequency estimation and (distributed) heavy hitters. A less common field of application is stream processing, which is somehow complementary and more practical, providing efficient and highly scalable frameworks to perform soft real-time generic computation on streams, relying on cloud computing. This duality allows us to apply data streaming solutions to optimize stream processing systems. In this thesis, we provide a novel algorithm to track heavy hitters in distributed streams and two extensions of a well-known algorithm to estimate the frequencies of data items. We also tackle two related problems and their solutions: providing an even partitioning of the item universe based on item weights, and estimating the values carried by the items of the stream. We then apply these results to both network monitoring and stream processing. In particular, we leverage these solutions to perform load shedding as well as to load-balance parallelized operators in stream processing systems.
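The abstract does not name the frequency-estimation algorithm it extends; the Count-Min Sketch is the standard example of this class of one-pass, small-memory estimators, and the following minimal sketch of it is purely illustrative (the class name and parameter defaults are assumptions, not the thesis's actual code):

```python
import random

class CountMinSketch:
    """Illustrative Count-Min Sketch: estimates item frequencies in one
    pass using sub-linear memory. It only overestimates, within eps*N
    with probability 1-delta for width ~ e/eps and depth ~ ln(1/delta)."""

    def __init__(self, width=272, depth=5, seed=42):  # illustrative defaults
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.salts = [rng.getrandbits(32) for _ in range(depth)]
        self.counters = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        for row, salt in enumerate(self.salts):
            yield row, hash((salt, item)) % self.width

    def update(self, item, count=1):
        for row, col in self._buckets(item):
            self.counters[row][col] += count

    def estimate(self, item):
        # True frequency <= estimate; collisions only inflate counts.
        return min(self.counters[row][col] for row, col in self._buckets(item))

# Usage: feed a stream once, then query frequencies.
cms = CountMinSketch()
for item in ["a", "b", "a", "c", "a"]:
    cms.update(item)
print(cms.estimate("a"))  # >= 3
```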
12. A Belief Rule Based Flood Risk Assessment Expert System Using Real Time Sensor Data Streaming. Monrat, Ahmed Afif. January 2018.
Among the various natural calamities, flood is considered one of the most catastrophic natural hazards, with a significant impact on the socio-economic lifeline of a country. The assessment of flood risks facilitates taking appropriate measures to reduce the consequences of flooding. Flood risk assessment requires big data coming from different sources, such as sensors, social media, and organizations. However, these data sources contain various types of uncertainty because of the presence of incomplete and inaccurate information. This paper presents a belief rule-based expert system (BRBES) developed on a big data platform to assess flood risk in real time. The system processes extremely large datasets by integrating BRBES with Apache Spark, while a web-based interface has been developed to allow the visualization of flood risk in real time. Since the integrated BRBES employs a knowledge-driven learning mechanism, it has been compared with data-driven learning mechanisms to determine its reliability in assessing flood risk. The integrated BRBES produces more reliable results than the other data-driven approaches. Data for the expert system have been collected from different case study areas in Bangladesh to validate the integrated system.
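The abstract does not detail the BRBES inference step; a heavily simplified sketch of the general belief-rule idea (weighted-sum aggregation rather than the full evidential-reasoning algorithm, with all rules, inputs, and belief values invented for illustration) might look like:

```python
# Hypothetical, simplified belief-rule inference: each rule maps matched
# antecedent degrees to a belief distribution over risk levels.
rules = [
    # (antecedent matching function, belief over ["low", "high"])
    (lambda rain, lvl: min(rain / 100.0, 1.0) * min(lvl / 5.0, 1.0), [0.1, 0.9]),
    (lambda rain, lvl: max(1.0 - rain / 100.0, 0.0), [0.8, 0.2]),
]

def assess(rainfall_mm, river_level_m):
    # Activation weight of each rule, normalized over all rules.
    weights = [match(rainfall_mm, river_level_m) for match, _ in rules]
    total = sum(weights) or 1.0
    combined = [0.0, 0.0]
    for w, (_, belief) in zip(weights, rules):
        for i, b in enumerate(belief):
            combined[i] += (w / total) * b  # weighted-sum aggregation
    return dict(zip(["low", "high"], combined))

print(assess(rainfall_mm=80, river_level_m=4))  # mostly "high" risk belief
```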
13. Measurement and resource allocation problems in data streaming systems. Zhao, Haiquan. 26 April 2010.
In a data streaming system, each component consumes one or several streams of data on the fly and produces one or several streams of data for other components. The entire Internet can be viewed as a giant data streaming system. Other examples include real-time exploratory data mining and high performance transaction processing. In this thesis we study several measurement and resource allocation optimization problems of data streaming systems.
Measuring quantities associated with one or several data streams is often challenging because the sheer volume of data makes it impractical to store the streams in memory or ship them across the network. A data streaming algorithm processes a long stream of data in one pass using a small working memory (called a sketch). Estimation queries can then be answered from one or more such sketches. An important task is to analyze the performance guarantee of such algorithms. In this thesis we describe a tail bound problem that often occurs and present a technique for solving it using majorization and convex ordering theories. We present two algorithms that utilize our technique. The first is to store a large array of counters in DRAM while achieving the update speed of SRAM. The second is to detect global icebergs across distributed data streams.
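The abstract only names the DRAM/SRAM counter result; a hedged sketch of one common way to realize such an idea (counter caching with batched flushes, not necessarily the thesis's actual scheme) is:

```python
from collections import Counter

class HybridCounterArray:
    """Illustrative counter-caching scheme: a small, fast buffer (standing
    in for SRAM) absorbs per-item increments; the large array (standing in
    for DRAM) is touched only on periodic batched flushes."""

    def __init__(self, size, flush_threshold=64):
        self.dram = [0] * size           # large, slow counter array
        self.sram = Counter()            # small, fast increment buffer
        self.flush_threshold = flush_threshold

    def increment(self, index):
        self.sram[index] += 1
        if len(self.sram) >= self.flush_threshold:
            self.flush()

    def flush(self):
        for index, delta in self.sram.items():
            self.dram[index] += delta    # one batched slow-memory write each
        self.sram.clear()

    def read(self, index):
        return self.dram[index] + self.sram.get(index, 0)
```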
Resource allocation decisions are important for the performance of a data streaming system. The processing graph of a data streaming system forms a fork-and-join network. The underlying data processing tasks involve a rich set of semantics that includes synchronous and asynchronous data fork and data join. The different types of semantics and processing requirements introduce complex interdependence between the various data streams within the network. We study the distributed resource allocation problem in such systems with the goal of achieving the maximum total utility of output streams. For networks with only synchronous fork and join semantics, we present several decentralized iterative algorithms using primal- and dual-based optimization techniques. For general networks with both synchronous and asynchronous fork and join semantics, we present a novel modeling framework to formulate the resource allocation problem, and a shadow-queue based decentralized iterative algorithm to solve it. We show that all the algorithms guarantee optimality and demonstrate through simulation that they adapt quickly to dynamically changing environments.
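The abstract does not spell out the primal/dual iteration; as a toy analogue (my own textbook-style example, not the dissertation's algorithm), dual-based rate allocation for a single shared resource updates a price until demand meets capacity:

```python
# Toy dual decomposition: maximize sum(log(x_i)) s.t. sum(x_i) <= capacity.
# Each "stream" i picks its rate selfishly given the price; the price
# (dual variable) rises when the resource is over-subscribed.
def allocate(num_streams=4, capacity=10.0, step=0.05, iters=2000):
    price = 1.0
    for _ in range(iters):
        # Best response: maximize log(x) - price * x  =>  x = 1 / price.
        rates = [1.0 / price] * num_streams
        excess = sum(rates) - capacity
        price = max(price + step * excess, 1e-6)  # projected gradient ascent
    return rates, price

rates, price = allocate()
print(rates, sum(rates))  # each rate -> capacity / num_streams = 2.5
```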
14. Fast data streaming in resource constrained wireless sensor networks. Soroush, Emad. 19 August 2008.
In many emerging applications, data streams are monitored in a network environment. Due to limited communication bandwidth and other resource constraints, a critical and practical demand is to compress data streams online, continuously, and with a quality guarantee. Although many data compression and digital signal processing methods have been developed to reduce data volume, their super-linear time and more-than-constant space complexity prevent them from being applied directly to data streams, particularly over resource-constrained sensor networks. In this thesis, we tackle the problem of online quality-guaranteed compression of data streams using fast linear approximation (i.e., using line segments to approximate a time series). Technically, we address two versions of the problem, which explore quality guarantees in different forms. We develop online algorithms with linear time complexity and constant cost in space. Our algorithms are optimal in the sense that they generate the minimum number of segments needed to approximate a time series with the required quality guarantee. To meet the resource constraints in sensor networks, we also develop a fast algorithm which creates connecting segments with very simple computation. The low cost of our methods gives them a unique edge in massive and high-speed streaming environments, low-bandwidth networks, and nodes heavily constrained in computational power (e.g., tiny sensor nodes). We implement and evaluate our methods in the application of an acoustic wireless sensor network.
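The thesis's optimal algorithms are not reproduced in the abstract; a minimal greedy sketch of connected piecewise-linear approximation under a maximum-error bound (an illustration of the general idea only, not the thesis's optimal method) is:

```python
def connected_pla(series, max_error):
    """Greedy connected piecewise-linear approximation: grow each segment
    from its anchor point until some sample deviates from the chord by
    more than max_error, then start the next segment there."""
    segments, start = [], 0
    while start < len(series) - 1:
        end = start + 1
        while end + 1 < len(series):
            x0, y0 = start, series[start]
            x1, y1 = end + 1, series[end + 1]
            slope = (y1 - y0) / (x1 - x0)
            # Check every sample under the candidate chord.
            if any(abs(y0 + slope * (i - x0) - series[i]) > max_error
                   for i in range(start + 1, end + 1)):
                break
            end += 1
        segments.append((start, end))
        start = end
    return segments

# Hypothetical toy input with one jump between index 3 and 4.
print(connected_pla([0, 1, 2, 3, 10, 11, 12], max_error=0.5))
# -> [(0, 3), (3, 4), (4, 6)]
```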
15. A unified framework for real-time streaming and processing of IoT data. Zamam, Mohamad. January 2017.
The emergence of the Internet of Things (IoT) is introducing a new era to the realm of computing and technology. The proliferation of sensors and actuators embedded in things enables these devices to understand their environments and respond accordingly more than ever before. Additionally, it opens the space to unlimited possibilities for building applications that turn this sensing capability into big benefits across various domains, from smart cities to smart transportation and smart environments, and the list is quite long. However, this revolutionary spread of IoT devices and technologies raises big challenges. One major challenge is the diversity of IoT vendors, which results in data heterogeneity. This research tackles this problem by developing a data management tool that normalizes IoT data. Another important challenge is the lack of practical IoT technology with low cost and low maintenance, which has often limited large-scale deployments and mainstream adoption. This work utilizes open-source data analytics in one unified IoT framework in order to address this challenge. What is more, billions of connected things are generating unprecedented amounts of data from which intelligence must be derived in real time. This unified framework processes real-time streams of IoT data. A questionnaire involving participants with background knowledge of IoT was conducted in order to collect feedback about the proposed framework. The aspects of the framework were presented to the participants in the form of a demonstration video describing the work that has been done. Finally, using the participants' feedback, the contribution of the developed framework to the IoT was discussed and presented.
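The abstract does not show the normalization step; one plausible shape for such a data management tool (vendor names, field names, and units below are all invented for illustration) is a per-vendor adapter mapping heterogeneous payloads into one common record schema:

```python
# Hypothetical vendor payloads mapped to one common schema.
COMMON_FIELDS = ("device_id", "timestamp", "metric", "value", "unit")

VENDOR_ADAPTERS = {
    "vendor_a": lambda raw: {
        "device_id": raw["id"],
        "timestamp": raw["ts"],
        "metric": "temperature",
        "value": raw["temp_c"],
        "unit": "celsius",
    },
    "vendor_b": lambda raw: {
        "device_id": raw["sensor"],
        "timestamp": raw["time"],
        "metric": "temperature",
        "value": (raw["temp_f"] - 32) * 5 / 9,  # normalize units too
        "unit": "celsius",
    },
}

def normalize(vendor, raw):
    record = VENDOR_ADAPTERS[vendor](raw)
    assert set(record) == set(COMMON_FIELDS)
    return record

print(normalize("vendor_b", {"sensor": "s1", "time": 1700000000, "temp_f": 98.6}))
```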
16. Využití protokolu RTP pro distribuci procesních dat v reálném čase / RTP Protocol for Real-Time Data Distribution. Škarecký, Tomáš. January 2010.
This paper explores the potential of real-time process data distribution using the RTP protocol. Firstly, existing protocols used for this purpose are evaluated. Secondly, the trio of protocols RTP, RTCP, and RTSP is described, their functions are explained, and the possibilities for their extension are explored. On the basis of this knowledge, problems which may occur when distributing process data by means of RTP are identified and possible solutions are proposed. To demonstrate data distribution via RTP, an application that collects GPS positions from a PDA and then provides them to clients has been designed and implemented. Clients can display these data on a map.
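The paper's own payload format is not given in the abstract; the fixed 12-byte RTP header itself, however, is defined by RFC 3550, and packing one around an arbitrary process-data payload (the payload type and all field values below are illustrative choices) looks like:

```python
import struct

def rtp_packet(payload: bytes, seq: int, timestamp: int,
               ssrc: int, payload_type: int = 96) -> bytes:
    """Build a minimal RTP packet (RFC 3550): version 2, no padding,
    no extension, no CSRCs, marker bit clear. Payload type 96 is from
    the dynamic range, a common choice for custom payloads."""
    byte0 = 2 << 6                      # V=2, P=0, X=0, CC=0
    byte1 = payload_type & 0x7F         # M=0, PT
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

# Illustrative GPS payload: latitude/longitude as two doubles.
payload = struct.pack("!dd", 49.2255, 16.5983)   # hypothetical coordinates
pkt = rtp_packet(payload, seq=1, timestamp=90000, ssrc=0x1234ABCD)
print(len(pkt))  # 12-byte header + 16-byte payload = 28
```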
17. Developing a Car2X Communication Application using a Queriable Wireless Sensor Network. Jitta, Srinivasu. 11 September 2018.
The development of wireless sensor networks has reached a point where each individual node of a network may store and deliver a massive amount of (sensor-based) information, at once or over time. In the future, massively connected, highly dynamic wireless sensor networks, such as vehicle-to-vehicle communication scenarios, may hold an even greater information potential. This is mostly due to the increase in node complexity. Consequently, data volumes will become a problem for traditional data aggregation strategies, both traffic-wise and with regard to energy efficiency. Therefore, this thesis proposes a database aggregation strategy that can be used to mitigate most of these big-data problems in embedded and wireless sensor networks, enabling the efficient use of energy and the handling of large data volumes. Moreover, latency and traffic volume in the network are evaluated based on experiments using sensor platforms.
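The thesis's aggregation strategy is not specified in the abstract; as a hedged sketch of the general "queriable node" idea (all names invented), a node can answer small aggregate queries from a local store instead of streaming every raw sample over the radio:

```python
# Hypothetical queriable sensor node: raw readings stay local; only
# small aggregate answers travel over the network.
class QueriableNode:
    def __init__(self):
        self.readings = []          # local time-series store

    def sample(self, value):
        self.readings.append(value)

    def query(self, kind, last_n):
        window = self.readings[-last_n:]
        if kind == "avg":
            return sum(window) / len(window)
        if kind == "max":
            return max(window)
        raise ValueError(kind)

node = QueriableNode()
for v in [21.0, 21.4, 22.1, 35.0]:
    node.sample(v)
# One small reply instead of four raw samples on the air:
print(node.query("avg", last_n=4))
```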
18. Real-time Outlier Detection using Unbounded Data Streaming and Machine Learning. Åkerström, Emelie. January 2020.
Accelerated advancements in technology, the Internet of Things, and cloud computing have spurred an emergence of unstructured data that is contributing to rapid growth in data volumes. No human can manage to keep up with monitoring and analyzing these unbounded data streams, so predictive and analytic tools are needed. By leveraging machine learning, this data can be converted into insights that enable data-driven decisions, which can drastically accelerate innovation, improve user experience, and drive operational efficiency. The purpose of this thesis is to design and implement a system for real-time outlier detection on unbounded data streams using machine learning. Traditionally, this is accomplished by using alarm thresholds on important system metrics. Yet a static threshold cannot account for changes in trends and seasonality, changes in the system, or an increased system load. The intention is therefore to leverage machine learning to instead look for deviations in the behavior of the data caused not by natural changes but by malfunctions. The use case driving the thesis forward is real-time outlier detection in a Content Delivery Network (CDN). The input data include HTTP error messages received by clients and contextual information such as region, cache domains, and error codes, to provide tailor-made predictions that account for the trends in the data. The outlier detection system consists of a data collection pipeline leveraging stream processing, a MiniBatchKMeans clustering model that provides online clustering of incoming data according to their similar characteristics, and an LSTM autoencoder that accounts for the temporal nature of the data and detects outlier data points within the clusters. An important finding is that an outlier is defined as an abnormal number of outlier data points all originating from the same cluster, not a single outlier data point. Thus, the alerting system implements an outlier-percentage threshold. The experimental results show that an outlier is detected within one minute of a cache breakdown. This triggers an alert to the system owners containing graphs of the clustered data, narrowing down the search area for the cause and enabling preventive action against the prominent incident. Further results show that within two minutes of fixing the cause, the system provides feedback that the actions taken were successful. Considering the real-time requirements of the CDN environment, it is concluded that the short detection delay is indeed real-time, proving that machine learning is able to detect outliers in unbounded data streams in a real-time manner. Further analysis shows that the system is more accurate during peak hours, when more data is in circulation, than during off-peak hours, despite the temporal LSTM layers; presumably an effect of the model needing to train on more data to better account for seasonality and trends. Future work necessary to put the outlier detection system into production thus includes more training to improve accuracy and correctness. Furthermore, one could consider implementing the functionality necessary for a production environment and possibly adding features that can automatically avert detected incidents and handle their causes.
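The thesis's pipeline is only summarized above; a minimal sketch of the online-clustering stage plus an outlier-percentage alarm (using scikit-learn's MiniBatchKMeans; the thresholds, shapes, and data are illustrative, and the LSTM autoencoder stage is omitted) could look like:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

model = MiniBatchKMeans(n_clusters=3, random_state=0)
OUTLIER_PERCENTAGE = 0.30        # illustrative alerting threshold
DIST_THRESHOLD = 2.0             # illustrative per-point outlier cutoff

def process_batch(batch: np.ndarray):
    """Cluster one mini-batch online, flag far-from-centroid points,
    and alert when a single cluster's outlier share is abnormal."""
    model.partial_fit(batch)                      # online clustering step
    labels = model.predict(batch)
    dists = np.linalg.norm(batch - model.cluster_centers_[labels], axis=1)
    flagged = dists > DIST_THRESHOLD
    for cluster in np.unique(labels):
        mask = labels == cluster
        share = flagged[mask].mean()
        if share > OUTLIER_PERCENTAGE:            # many outliers, one cluster
            print(f"ALERT: cluster {cluster}, outlier share {share:.0%}")

rng = np.random.default_rng(0)
for _ in range(5):                                # stand-in for the stream
    process_batch(rng.normal(size=(64, 4)))
```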
19. Towards Ideal Network Traffic Measurement: A Statistical Algorithmic Approach. Zhao, Qi. 03 October 2007.
With the emergence of computer networks as one of the primary platforms of communication, and with their adoption for an increasingly broad range of applications, there is a growing need for high-quality network traffic measurements to better understand, characterize, and engineer network behaviors. Due to the inherent lack of fine-grained measurement capabilities in the original design of the Internet, it does not have enough data or information to compute or even approximate some traffic statistics such as traffic matrices and per-link delay. While it is possible to infer these statistics from indirect aggregate measurements that are widely supported by network measurement devices (e.g., routers), how to obtain the best possible inferences is often a challenging research problem. We name this the "too little data" problem after its root cause. Interestingly, while "too little data" is clearly a problem, "too much data" is not a blessing either. With the rapid increase of network link speeds, even keeping sampled, summarized network traffic (for inferring various network statistics) at low sample rates results in too much data to be stored, processed, and transmitted by measurement devices. In summary, high-quality measurement in today's Internet is very challenging due to resource limitations and lack of built-in support, manifested as either "too little data" or "too much data".

We present some new practices and proposals to alleviate these two problems. The contribution is fourfold: i) designing universal methodologies towards ideal network traffic measurement; ii) providing accurate estimations for several critical traffic statistics guided by the proposed methodologies; iii) offering multiple useful and extensible building blocks which can be used to construct a universal network measurement system in the future; iv) leading to some notable mathematical results, such as a new large deviation theorem that finds applications in various areas.
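Neither problem's solution is detailed in the abstract; the "too much data" side is commonly handled by packet sampling, where a flow's true size is estimated by inverse-probability scaling (a generic textbook estimator, not the dissertation's method):

```python
import random
from collections import Counter

def sample_and_estimate(packets, p=0.01, seed=7):
    """Keep each packet independently with probability p, then scale
    sampled per-flow counts by 1/p (an unbiased Horvitz-Thompson-style
    estimate of true flow sizes)."""
    rng = random.Random(seed)
    sampled = Counter(flow for flow in packets if rng.random() < p)
    return {flow: count / p for flow, count in sampled.items()}

# Synthetic stream: flow "f1" sends 50_000 packets, "f2" sends 5_000.
packets = ["f1"] * 50_000 + ["f2"] * 5_000
print(sample_and_estimate(packets))  # roughly {'f1': ~50000, 'f2': ~5000}
```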
20. Resource management for data streaming applications. Agarwalla, Bikash Kumar. 07 July 2010.
This dissertation investigates novel middleware mechanisms for building streaming applications. Developing streaming applications is a challenging task because (i) they are continuous in nature; (ii) they require fusion of data coming from multiple sources to derive higher-level information; (iii) they require efficient transport of data from/to distributed sources and sinks; (iv) they need access to heterogeneous resources spanning sensor networks and high performance computing; and (v) they are time critical in nature. My thesis is that an intuitive programming abstraction will make it easier to build dynamic, distributed, and ubiquitous data streaming applications. Moreover, such an abstraction will enable an efficient allocation of shared and heterogeneous computational resources, thereby making it easier for domain experts to build these applications.

In support of the thesis, I present a novel programming abstraction, called DFuse, that makes it easier to develop these applications. A domain expert only needs to specify the input and output connections to fusion channels, and the fusion functions. The subsystems developed in this dissertation take care of instantiating the application, allocating resources for the application (via the scheduling heuristic developed in this dissertation), and dynamically managing the resources (via the dynamic scheduling algorithm presented in this dissertation). Through extensive performance evaluation, I demonstrate that the resources are allocated efficiently to optimize the throughput and latency constraints of an application.
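DFuse's actual API is not given in the abstract; a toy rendition of the fusion-channel idea (the names and signatures are my own guesses at the shape of such an abstraction, not DFuse's real interface) is:

```python
# Toy fusion channel in the spirit of DFuse: a domain expert wires input
# streams to a fusion function; the runtime pulls, fuses, and forwards.
from typing import Callable, Iterable, Iterator

def fusion_channel(inputs: Iterable[Iterator[float]],
                   fuse: Callable[..., float]) -> Iterator[float]:
    """Pull one item from every input stream, apply the fusion function,
    and emit the fused item downstream."""
    for items in zip(*inputs):
        yield fuse(*items)

# Two hypothetical sensor streams fused by averaging.
left = iter([20.0, 21.0, 22.0])
right = iter([24.0, 23.0, 22.0])
for fused in fusion_channel([left, right], lambda a, b: (a + b) / 2):
    print(fused)   # 22.0, 22.0, 22.0
```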