1 |
Characterizing Popularity Dynamics of User-generated Videos: A Category-based Study of YouTube
2013 August 1900
Understanding the growth pattern of content popularity has become a subject of immense interest to
Internet service providers, content makers and on-line advertisers. This understanding is also important for
the sustainable development of content distribution systems. As an approach to comprehend the characteristics of this growth pattern, a significant amount of research has been done in analyzing the popularity
growth patterns of YouTube videos. Unfortunately, no work has been done that intensively investigates the
popularity patterns of YouTube videos based on video object category. In this thesis, an in-depth analysis
of the popularity pattern of YouTube videos is performed, considering the categories of videos.
Metadata and request patterns were collected by employing category-specific YouTube crawlers. The
request patterns were observed for a period of five months. Results confirm that the time-varying popularity
patterns of different YouTube categories are conspicuously different, although some sets of categories have very
similar viewing patterns. In particular, News and Sports exhibit similar growth curves, as do Music and
Film.
While for some categories views at early ages can be used to predict future popularity, for others
predicting future popularity is a challenging task that requires more sophisticated techniques, e.g., time-series clustering. The outcomes of these analyses are instrumental in designing a reliable workload generator, which can in turn be used to evaluate different caching policies for YouTube and similar sites. In this
thesis, workload generators for four of the YouTube categories are developed. The performance of these workload generators suggests that a complete category-specific workload generator can be developed using time-series clustering. Patterns of users' interaction with YouTube videos are also analyzed using a dataset collected in a local network. This analysis shows possible ways of improving the performance of Peer-to-Peer video distribution
techniques, along with a new video recommendation method.
|
2 |
Generating and Analyzing Synthetic Workloads using Iterative Distillation
Kurmas, Zachary Alan 14 May 2004
The exponential growth in computing capability and use has produced a
high demand for large, high-performance storage systems.
Unfortunately, advances in storage system research have been limited
by (1) a lack of evaluation workloads, and (2) a limited understanding
of the interactions between workloads and storage systems. We have
developed a tool, the Distiller, that helps address both
limitations.
Our thesis is as follows: Given a storage system and a workload for
that system, one can automatically identify a set of workload
characteristics that describes a set of synthetic workloads with the
same performance as the workload they model. These representative
synthetic workloads increase the number of available workloads with
which storage systems can be evaluated. More importantly, the
characteristics also identify those workload properties that affect
disk array performance, thereby highlighting the interactions between
workloads and storage systems.
This dissertation presents the design and evaluation of the Distiller.
Specifically, our contributions are as follows. (1) We demonstrate
that the Distiller finds synthetic workloads with at most 10% error
for six out of the eight workloads we tested. (2) We also find that
all of the potential error metrics we use to compare workload
performance have limitations. Additionally, although the internal
threshold that determines which attributes the Distiller chooses has a
small effect on the accuracy of the final synthetic workloads, it has
a large effect on the Distiller's running time. Similarly, (3) we find
that we can reduce the precision with which we measure attributes and
only moderately reduce the resulting synthetic workload's
accuracy. Finally, (4) we show how to use the information contained in
the chosen attributes to predict the performance effects of modifying
the storage system's prefetch length and stripe unit size.
|
3 |
MACHINE LEARNING-ASSISTED LOAD TESTING
Isaku, Erblin January 2021
The increasing worldwide demand for software systems has created a need in which functionality alone is not enough: end-user satisfaction in terms of availability, throughput, and response time is also essential and should be preserved. Thus, systems must be evaluated at preset load levels to assess non-functional quality problems under conditions as close as possible to real application use. In this context, where the problem involves a large and complex search space, a search-based approach for load test generation is required. This thesis proposes and evaluates an evolutionary search-based approach for load test generation using multi-objective optimization methods consisting of selection, crossover, and mutation operators. Load testing is addressed as a multi-objective optimization problem by using four different evolutionary algorithms, namely Non-dominated Sorting Genetic Algorithm II (NSGA-II), Pareto Archived Evolution Strategy (PAES), Strength Pareto Evolutionary Algorithm 2 (SPEA2), and Multi-Objective Cellular Genetic Algorithm (MOCell), as well as a Random Search algorithm. Additionally, this study demonstrates the applicability of the proposed approach by running several experiments that compare the algorithms’ efficiency based on different quality indicators such as hypervolume, spread, and epsilon. Experimental results show that evolutionary search-based methods can be used to generate effective workloads. Since all algorithms found the optimal workload, with hypervolume values of zero, we believe that the objectives of the problem could be combined into a single objective, and hence that scalarization techniques are applicable. Based on the other quality indicators (spread and epsilon), NSGA-II and MOCell tend to perform better than the other algorithms.
Finally, the study concludes that multi-objective evolutionary algorithms can be used for load testing purposes, obtaining better results in generating optimal workloads than an existing (adapted) model-free reinforcement learning approach.
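The multi-objective framing above rests on Pareto dominance, which the following minimal sketch illustrates. It is not the thesis's implementation: the abstract does not name the objectives, so the two minimized objectives and the candidate values below are hypothetical, and the function names `dominates` and `non_dominated` are chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate workloads, each scored on two objectives to minimize.
candidates = [(10, 0.9), (20, 0.5), (20, 0.7), (40, 0.5)]
front = non_dominated(candidates)
print(front)
```

If the front found for a problem collapses to a single point, the objectives do not conflict on that instance, which is the kind of situation in which combining them into one scalar objective becomes plausible.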
|
4 |
Automated Performance Test Generation and Comparison for Complex Data Structures - Exemplified on High-Dimensional Spatio-Temporal Indices
Menninghaus, Mathias 23 August 2018
There exist numerous approaches to index either spatio-temporal or high-dimensional data. None of them is able to efficiently index hybrid data types, i.e., data that is both spatio-temporal and high-dimensional. As the best high-dimensional indexing techniques are only able to index point data and not now-relative data, and the best spatio-temporal indexing techniques suffer from the curse of dimensionality, this thesis introduces the Spatio-Temporal Pyramid Adapter (STPA). The STPA maps spatio-temporal data onto points and now-values onto the median of the data set, and indexes them with the pyramid technique. For high-dimensional and spatio-temporal index structures no generally accepted benchmark exists. Most index structures are only evaluated by custom benchmarks and compared to a tiny set of competitors. Benchmarks may be biased, as a structure may be created to perform well in a certain benchmark, or a benchmark may not cover a certain speciality of the investigated structures. In this thesis, the Interface Based Performance Comparison (IBPC) technique is introduced. It automatically generates test sets with high code coverage on the system under test (SUT) on the basis of all functions defined by a certain interface which all competitors support. Every test set is performed on every SUT, and the performance results are weighted by the achieved coverage and summed up. These weighted performance results are then used to compare the structures. An implementation of the IBPC, the Performance Test Automation Framework (PTAF), is compared to a classic custom benchmark, a workload generator whose parameters are optimized by a genetic algorithm, and a specific PTAF alternative which incorporates the specific behavior of the systems under test. This is done for a set of two high-dimensional spatio-temporal indices and twelve variants of the R-tree. The evaluation indicates that PTAF performs at least as well as the other approaches in terms of minimal test cases with maximized coverage.
Several case studies on PTAF demonstrate its broad applicability.
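The coverage-weighted comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the IBPC or PTAF implementation: the abstract does not say how the weighting is computed, so the sketch assumes that performance is a runtime in seconds, that coverage is a fraction between 0 and 1, and that weighting means multiplication. The function name `weighted_score` and all numbers are hypothetical.

```python
from typing import List, Tuple


def weighted_score(results: List[Tuple[float, float]]) -> float:
    """Sum the per-test-set performance values, each weighted by the code
    coverage that the test set achieved on the system under test.

    results holds one (runtime_seconds, coverage_fraction) pair per test set.
    Multiplying runtime by coverage is an assumption made for this sketch.
    """
    return sum(runtime * coverage for runtime, coverage in results)


# Hypothetical measurements: the same two test sets run on two systems under test.
sut_a = [(2.0, 0.5), (4.0, 0.25)]
sut_b = [(1.0, 0.5), (2.0, 0.25)]

score_a = weighted_score(sut_a)
score_b = weighted_score(sut_b)
print(score_a, score_b)
```

Under these assumptions a lower score indicates the faster structure, and test sets that exercise more of the code contribute more to the comparison.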
|