1 |
An investigation into the effects of using limited precision integer arithmetic in digital modems. Hale, Roger G. January 1990.
The main aim of this thesis is to study the effects of using a reduced level of arithmetic precision (as found in a 16-bit microprocessor) whilst running various algorithms in the detection stages of a digital modem. The purpose of using the lower precision is to determine whether these algorithms can run in real time on a limited-precision device such as a Texas Instruments TMS320C25 digital signal processor.
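To make the precision regime concrete, here is a minimal sketch of 16-bit Q15 fixed-point multiplication, the integer format typically used on 16-bit DSPs such as the TMS320C25. It illustrates the kind of quantization error the thesis studies; the code itself is not from the thesis.

```python
# A minimal, illustrative sketch of 16-bit Q15 fixed-point arithmetic
# (not code from the thesis). 1.0 maps to 2^15, so values lie in [-1, 1).

Q15_ONE = 1 << 15  # scale factor: 1.0 == 32768 (saturates at 32767)

def float_to_q15(x: float) -> int:
    """Quantize a value in [-1, 1) to a signed 16-bit Q15 integer."""
    return max(-32768, min(32767, int(round(x * Q15_ONE))))

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: form the 32-bit product, shift back to Q15."""
    return (a * b) >> 15

# Compare against full floating-point precision:
a, b = 0.12345678, 0.87654321
exact = a * b
approx = q15_mul(float_to_q15(a), float_to_q15(b)) / Q15_ONE
print(f"exact={exact:.8f}  q15={approx:.8f}  error={abs(exact - approx):.2e}")
```

Repeating such multiplications inside a feedback loop, as in the detection stages of a modem, is what makes the accumulated quantization error, and hence the real-time feasibility question, non-trivial.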
|
2 |
Microblaze-based coprocessor for data stream management systems. Alqaisi, Tareq S. 06 December 2017.
Indiana University-Purdue University Indianapolis (IUPUI)
Data networks' speed and availability are increasing at a record rate. More and more devices can now connect to the Internet and stream data. Processing this ever-growing amount of data in real time continues to be a challenge.
Multiple studies have been conducted to address the growing demand for real-time processing and analysis of continuous data streams. Developed in previous work, the Symbiote Coprocessor Unit (SCU) is a hardware accelerator capable of providing up to 150X speedup over traditional data stream processors in the field of data stream management systems.
However, the SCU's implementation is complex and fixed, and it uses an outdated host interface, which limits future improvements.
In this study, we present a new SCU architecture based on a Xilinx MicroBlaze configurable microcontroller. The proposed architecture reduces complexity and allows new algorithms to be implemented in a relatively short time while maintaining the SCU's high performance. It also has an industry-standard PCIe host interface. Finally, it uses a standard AMBA AXI4 bus interconnect, which enables easier integration of new hardware components.
The new architecture is implemented using a Xilinx VC709 development board. Our experimental results have shown a minimal loss of performance as compared to the original SCU while providing a flexible and simple design.
|
3 |
Extending AdaBoost: Varying the Base Learners and Modifying the Weight Calculation. Neves de Souza, Erico. 27 May 2014.
AdaBoost has been considered one of the best classifiers ever developed, but two important problems have not yet been addressed. The first is the dependency on the "weak" learner, and the second is the failure to maintain the performance of learners with small error rates (i.e., "strong" learners). To solve the first problem, this work proposes using a different learner in each iteration, a method known as AdaBoost Dynamic (AD), thereby ensuring that the performance of the algorithm is almost equal to that of the best "weak" learner executed with AdaBoost.M1. The work then further modifies the procedure to vary the learner in each iteration, in order to locate the learner with the smallest error rate on its training data. This is done using the same weight calculation as in the original AdaBoost; this version is known as AdaBoost Dynamic with Exponential Loss (AB-EL). The results were poor, because AdaBoost does not perform well with strong learners, so, in this sense, the work confirmed previous results. To improve the performance, the weight calculation is modified to use the sigmoid function, with the algorithm's output being the derivative of that sigmoid, rather than the logistic regression weight calculation originally used by AdaBoost; this version is known as AdaBoost Dynamic with Logistic Loss (AB-DL). This work presents a convergence proof that the binomial weight calculation works, and shows both theoretically and empirically that this approach improves the results for strong learners. AB-DL also has some disadvantages: it searches for the "best" classifier, and this search reduces the diversity among the classifiers. To attack these issues, another algorithm is proposed that combines AD's "weak" learner execution policy with a small modification of AB-DL's weight calculation, called AdaBoost Dynamic with Added Cost (AD-AC). AD-AC also has a theoretical upper bound on error, and it offers a small accuracy improvement over AB-DL and traditional AdaBoost approaches. Lastly, this work adapts AD-AC's weight calculation to the data stream problem, where classifiers must handle very large data sets (on the order of millions of instances) with limited memory.
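A minimal sketch of the AdaBoost Dynamic idea described above: at each boosting round, every learner in a candidate pool is trained on the current weights, and the one with the smallest weighted error is kept, using AdaBoost.M1's usual weight update. The two-learner pool and other details are illustrative assumptions, not the thesis's exact configuration.

```python
# Sketch of AdaBoost Dynamic (AD): pick the best base learner per round.
# The learner pool and stopping rule are illustrative assumptions.
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def adaboost_dynamic(X, y, n_rounds=20):
    """y must take values in {-1, +1}; returns (models, alphas)."""
    pool = [DecisionTreeClassifier(max_depth=1), GaussianNB()]
    w = np.full(len(y), 1.0 / len(y))
    models, alphas = [], []
    for _ in range(n_rounds):
        # Train every candidate; keep the one with the lowest weighted error.
        best, best_err = None, np.inf
        for proto in pool:
            m = clone(proto).fit(X, y, sample_weight=w)
            err = np.sum(w * (m.predict(X) != y))
            if err < best_err:
                best, best_err = m, err
        if best_err >= 0.5:                    # no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - best_err) / max(best_err, 1e-12))
        w *= np.exp(-alpha * y * best.predict(X))   # AdaBoost.M1-style update
        w /= w.sum()
        models.append(best)
        alphas.append(alpha)
    return models, alphas

def ad_predict(models, alphas, X):
    """Weighted majority vote over the selected learners."""
    return np.sign(sum(a * m.predict(X) for m, a in zip(models, alphas)))
```

In the abstract's terms, varying the weight-update rule inside this loop is what yields the AB-EL, AB-DL, and AD-AC variants.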
|
4 |
Scalable Validation of Data Streams. Xu, Cheng. January 2016.
In manufacturing industries, sensors are often installed on industrial equipment, generating high volumes of data in real time. To shorten machine downtime and reduce maintenance costs, it is critical to analyze such streams efficiently in order to detect abnormal equipment behavior. To validate data streams and detect anomalies, a data stream management system called SVALI was developed. Based on requirements from the application domain, different stream window semantics are explored and an extensible set of window-forming functions is implemented, where dynamic registration of window aggregations allows incremental evaluation of aggregate functions over windows. To facilitate stream validation at a high level, the system provides two second-order validation functions, model-and-validate and learn-and-validate. Model-and-validate allows the user to define mathematical models based on physical properties of the monitored equipment, while learn-and-validate builds statistical models by sampling the stream in real time as it flows. To validate geographically distributed equipment with short response times, SVALI is a distributed system in which many SVALI instances can be started and run in parallel on board the equipment. Central analyses are made at a monitoring center, where streams of detected anomalies are combined and analyzed on a cluster computer. SVALI is an extensible system in which functions can be implemented using external libraries written in C, Java, and Python without any modification of the original code. The system and the developed functionality have been applied in several applications, both industrial and in sports analytics.
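A minimal sketch of the two validation styles named above, under assumed signatures (these are not SVALI's actual functions): model-and-validate checks readings against a user-supplied physical model, while learn-and-validate learns a statistical model (here, a window mean and standard deviation) from the stream itself.

```python
# Illustrative sketches of the two validation styles; function names and
# signatures are assumptions, not SVALI's API.
from collections import deque
from statistics import mean, stdev

def model_and_validate(stream, model, threshold):
    """Flag (time, reading) pairs that deviate from a physical model."""
    for t, reading in stream:
        if abs(reading - model(t)) > threshold:
            yield t, reading

def learn_and_validate(stream, window_size=20, k=3.0):
    """Flag readings more than k standard deviations from the window mean."""
    window = deque(maxlen=window_size)
    for reading in stream:
        if len(window) >= 2:
            m, s = mean(window), stdev(window)
            if s > 0 and abs(reading - m) > k * s:
                yield reading          # deviates from the learned model
        window.append(reading)
```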
|
5 |
Targeted Prioritized Processing in Overloaded Data Stream Systems. Works, Karen E. 11 December 2013.
"We are in an era of big data, sensors, and monitoring technology. One consequence of this technology is the continuous generation of massive volumes of streaming data. To support this, stream processing systems have emerged. These systems must produce results while meeting near-real time response obligations. However, computation intensive processing on high velocity streams is challenging. Stream arrival rates are often unpredictable and can fluctuate. This can cause systems to not always be able to process all incoming data within their required response time.Yet inherently some results may be much more significant than others. The delay or complete neglect of producing certain highly significant results could result in catastrophic consequences. Unfortunately, this critical problem of targeted prioritized processing in overloaded environments remains largely unaddressed to date. In this talk, I will describe four key challenges that my dissertation successfully tackled. First, I address the problem of optimally processing the most significant tuples identified by the user at compile-time before less critical ones. Second, I propose a new aggregate operator that increases the accuracy of aggregate results produced for TP systems. Third, I address the problem of identifying and pulling forward significant tuples at run-time via dynamic determinants. Fourth, I design multi-input operators, such as the join operator, which produce multi-stream results in significance order. My experimental studies explore a rich diversity of workloads, queries, and data sets, including real data streams. The results substantiate that my approaches are a significant improvement over the state-of-the-art approaches."
|
6 |
A dynamic approximate representation scheme for streaming time series. Zhou, Pu. January 2009.
The huge volume of time series data generated in many applications poses new challenges for data storage, transmission, and computation. Furthermore, when the time series arrive as streaming data, new problems emerge and new techniques are required because of the streaming characteristics: high volume, high speed, and continuous flow. Approximate representation is one of the most efficient and effective solutions to this large-volume, high-speed problem. In this thesis, we propose a dynamic representation scheme for streaming time series. Existing methods use a single function form for the entire approximation task. In contrast, our method adopts a set of candidate functions such as linear functions, polynomial functions (degree ≥ 2), and exponential functions. We provide a novel segmenting strategy to generate subsequences and dynamically choose candidate functions to approximate them.

Since we are dealing with streaming time series, the segmenting points and the corresponding approximating functions are produced incrementally. For a given function form, we use a buffer window to find the locally farthest possible segmenting point under a user-specified error tolerance threshold. To achieve this, we define a feasible space for the coefficients of the function and show that the locally best segmenting point can be found indirectly through calculations in this coefficient space. Given the error tolerance threshold, the candidate function representing the most information per parameter is chosen as the approximating function, making our representation scheme more flexible and compact. We provide two dynamic algorithms, PLQS and PLQES, which involve two and three candidate functions, respectively, and we present a general function selection strategy for when more candidate functions are considered. In the experimental evaluation, we examine the effectiveness of our algorithms on synthetic and real time series data sets. We compare our method with the piecewise linear approximation method, and the results demonstrate the evident superiority of our dynamic approach under the same error tolerance threshold.
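The sketch below shows the greedy skeleton of such a scheme in simplified form: grow the current segment while a candidate function still fits within the error tolerance, and cut a new segment when it no longer does. For brevity it uses a single least-squares line as the candidate; the thesis's algorithms instead work in the candidate functions' coefficient spaces and choose among several function families.

```python
# Simplified piecewise-linear segmentation under a max-error threshold;
# the thesis's PLQS/PLQES algorithms are more general than this sketch.
import numpy as np

def segment_stream(values, tolerance):
    """Return (first, last_exclusive) index pairs for each segment."""
    segments, start = [], 0
    end = start + 2                    # a fit needs at least two points
    while end <= len(values):
        x = np.arange(start, end)
        y = np.asarray(values[start:end], dtype=float)
        slope, intercept = np.polyfit(x, y, 1)      # least-squares line
        if np.max(np.abs(slope * x + intercept - y)) > tolerance:
            # Tolerance exceeded: close the segment just before this point.
            segments.append((start, end - 1))
            start, end = end - 1, end + 1
        else:
            end += 1
    segments.append((start, len(values)))
    return segments
```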
|
7 |
Mining Time-Changing Data Streams. Tao, Yingying. January 2011.
Streaming data have gained considerable attention in the database and data mining communities because of the emergence of a class of applications, such as financial marketing, sensor networks, Internet IP monitoring, and telecommunications, that produce these data. Data streams have some unique characteristics not exhibited by traditional data: they are unbounded, fast-arriving, and time-changing. Traditional data mining techniques that make multiple passes over the data or that ignore distribution changes are not applicable to dynamic data streams. Mining data streams has thus been an active research area addressing the requirements of streaming applications.

This thesis focuses on developing techniques for distribution change detection and for mining time-changing data streams. Two techniques are proposed that can detect distribution changes in generic data streams. An approach to one of the most popular stream mining tasks, frequent itemset mining, is also presented. All the proposed techniques are implemented and empirically studied. Experimental results show that the proposed techniques achieve promising performance for detecting changes and mining dynamic data streams.
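The thesis's two detectors are not reproduced here, but the sketch below illustrates the underlying task in its most common form: compare a reference window against the most recent window and signal a change when their means differ by more than a threshold measured in standard errors.

```python
# Generic two-window change detection; the windowing and test statistic are
# illustrative, not the thesis's proposed techniques.
from collections import deque
from math import sqrt
from statistics import mean, variance

def detect_changes(stream, window_size=100, threshold=3.0):
    ref, cur = deque(maxlen=window_size), deque(maxlen=window_size)
    for i, x in enumerate(stream):
        (ref if len(ref) < window_size else cur).append(x)
        if len(cur) == window_size:
            se = sqrt(variance(ref) / window_size + variance(cur) / window_size)
            if se > 0 and abs(mean(cur) - mean(ref)) / se > threshold:
                yield i                     # change signalled at position i
                ref = deque(cur, maxlen=window_size)   # restart from new regime
                cur = deque(maxlen=window_size)
```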
|
8 |
Design of Buffering Mechanism for Improving Instruction and Data Stream. Wu, Chih-Kang. 25 June 2003.
In a microprocessor system, the bandwidth of the instruction stream and the data stream is a main factor limiting system performance. Although a cache can effectively mitigate this problem, the processor still needs more than one clock cycle to get the data, and the large hardware cost and power consumption limit the use of caches in embedded system applications. Buffering techniques, such as the loop buffer and the prefetch buffer, can improve performance with little hardware, but their mechanisms emphasize buffering of contiguous regions of the address space. For the non-contiguous accesses caused by branch instructions, they cannot exploit reference locality. In this thesis, we propose a new buffering mechanism, called the ABP buffer, which is composed of a buffering mechanism and a prefetching mechanism. The buffering mechanism can effectively buffer non-contiguous regions and replaces buffer lines using a replacement policy suitable for hardware realization. The prefetching mechanism exploits hit time to prefetch data likely to be used in the near future. Simulation and implementation results show that the ABP buffer achieves high performance at low hardware cost, with the control parts of the mechanism occupying only 4% of the total hardware.
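Since the ABP buffer itself is a hardware design, the sketch below only illustrates in software the two ideas it combines: a small line buffer with a simple replacement policy, plus a prefetch of the next line issued alongside an access. The FIFO policy, line size, and next-line prefetch here are illustrative assumptions, not the ABP design.

```python
# Software illustration of buffering plus prefetching; parameters and the
# FIFO policy are assumptions, not the ABP buffer's actual mechanism.
from collections import OrderedDict

class LineBuffer:
    def __init__(self, backing, n_lines=4, line_size=8):
        self.backing = backing                # callable: tag -> line contents
        self.n_lines, self.line_size = n_lines, line_size
        self.lines = OrderedDict()            # tag -> line data, FIFO order

    def _fill(self, tag):
        if len(self.lines) >= self.n_lines:
            self.lines.popitem(last=False)    # FIFO replacement
        self.lines[tag] = self.backing(tag)

    def read(self, addr):
        tag, offset = divmod(addr, self.line_size)
        if tag not in self.lines:             # miss: fetch the demanded line
            self._fill(tag)
        value = self.lines[tag][offset]
        if tag + 1 not in self.lines:         # prefetch the next line
            self._fill(tag + 1)
        return value

# Example backing store: each line holds eight consecutive words.
buf = LineBuffer(backing=lambda tag: [tag * 8 + i for i in range(8)])
print(buf.read(3), buf.read(11))   # second read hits the prefetched line
```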
|
9 |
Lightweight Top-K Analysis in DBMSs Using Data Stream Analysis Techniques. Huang, Jing. 03 September 2009.
Problem determination is the identification of problems and performance issues that occur in an observed system and the discovery of solutions to resolve them. Top-k analysis is a common task in problem determination in database management systems. It involves identifying the set of most frequently occurring objects according to some criterion, such as the top-k most frequently used tables, the top-k most frequent queries, or the top-k queries with respect to CPU usage or amount of I/O.
Effective problem determination requires sufficient monitoring and rapid analysis of the collected monitoring statistics. System monitoring often incurs a great deal of overhead and can interfere with the performance of the observed system. Processing vast amounts of data may require several passes through the analysis system and thus be very time consuming.
In this thesis, we present our lightweight top-k analysis framework, in which lightweight monitoring tools continuously poll system statistics, producing several continuous data streams that are then processed with stream mining techniques. The results produced by our tool are the “top-k” values for the observed statistics. This information can be valuable to an administrator in determining the source of a problem.
We implement the framework as a prototype system called Tempo. Tempo uses IBM DB2’s snapshot API and a lightweight monitoring tool called DB2PD to generate the data streams. The system reports the top-k executed SQL statements and the top-k most frequently accessed tables in an on-line fashion. Several experiments are conducted to verify the feasibility and effectiveness of our approach; the experimental results show that our approach achieves low system overhead.
Thesis (Master, Computing), Queen's University, 2009.
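A standard stream-mining technique for exactly this top-k task is the Space-Saving algorithm, sketched below, which maintains approximate frequencies of the most common items in bounded memory. Whether Tempo uses this particular algorithm is not stated here; the sketch only illustrates the class of techniques involved.

```python
# Space-Saving: approximate top-k frequent items using at most k counters.
# Illustrative of the stream-mining approach; not necessarily Tempo's code.

def space_saving(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count + 1
            # (this is the source of the algorithm's bounded overestimation).
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return sorted(counters.items(), key=lambda kv: -kv[1])

statements = ["SELECT a", "SELECT b", "SELECT a", "UPDATE c", "SELECT a"]
print(space_saving(statements, k=2))   # [('SELECT a', 3), ('UPDATE c', 2)]
```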
|