Global ETD Search

1	ENABLING REAL TIME INSTRUMENTATION USING RESERVOIR SAMPLING AND BIN PACKING Sai Pavan Kumar Meruga (16496823) 30 August 2023 (has links) <p><em>Software Instrumentation is the process of collecting data during an application’s runtime,</em></p> <p><em>which will help us debug, detect errors and optimize the performance of the binary. The</em></p> <p><em>recent increase in demand for low latency and high throughput systems has introduced new</em></p> <p><em>challenges to the process of Software Instrumentation. Software Instrumentation, especially</em></p> <p><em>dynamic, has a huge impact on systems performance in scenarios where there is no early</em></p> <p><em>knowledge of data to be collected. Naive approaches collect too much or too little</em></p> <p><em>data, negatively impacting the system’s performance.</em></p> <p><em>This thesis investigates the overhead added by reservoir sampling algorithms at different</em></p> <p><em>levels of granularity in real-time instrumentation of distributed software systems. Also, this thesis describes the implementation of sampling techniques and algorithms to reduce the overhead caused by instrumentation.</em></p> Software quality, processes and metrics Reservoir Sampling Bin Packing Dynamic Binary Instrumentation
2	Non-uniformity issues and workarounds in bounded-size sampling Gemulla, Rainer, Haas, Peter J., Lehner, Wolfgang 27 January 2023 (has links) A variety of schemes have been proposed in the literature to speed up query processing and analytics by incrementally maintaining a bounded-size uniform sample from a dataset in the presence of a sequence of insertion, deletion, and update transactions. These algorithms vary according to whether the dataset is an ordinary set or a multiset and whether the transaction sequence consists only of insertions or can include deletions and updates. We report on subtle non-uniformity issues that we found in a number of these prior bounded-size sampling schemes, including some of our own. We provide workarounds that can avoid the non-uniformity problem; these workarounds are easy to implement and incur negligible additional cost. We also consider the impact of non-uniformity in practice and describe simple statistical tests that can help detect non-uniformity in new algorithms. info:eu-repo/classification/ddc/004 ddc:004
3	Maintaining bounded-size sample synopses of evolving datasets Gemulla, Rainer, Lehner, Wolfgang, Haas, Peter J. 12 January 2023 (has links) Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such samples are widely used to speed up processing of analytic queries and data-mining tasks, enhance query optimization, and facilitate information integration. The ability to bound the maximum size of a sample can be very convenient from a system-design point of view, because the task of memory management is simplified, especially when many samples are maintained simultaneously. In this paper, we study methods for incrementally maintaining a bounded-size uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions and deletions. For “stable” datasets whose size remains roughly constant over time, we provide a novel sampling scheme, called “random pairing” (RP), that maintains a bounded-size uniform sample by using newly inserted data items to compensate for previous deletions. The RP algorithm is the first extension of the 45-year-old reservoir sampling algorithm to handle deletions; RP reduces to the “passive” algorithm of Babcock et al. when the insertions and deletions correspond to a moving window over a data stream. Experiments show that, when dataset-size fluctuations over time are not too extreme, RP is the algorithm of choice with respect to speed and sample-size stability. For “growing” datasets, we consider algorithms for periodically resizing a bounded-size random sample upwards. We prove that any such algorithm cannot avoid accessing the base data, and provide a novel resizing algorithm that minimizes the time needed to increase the sample size. We also show how to merge uniform samples from disjoint datasets to obtain a uniform sample of the union of the datasets; the merged sample can be incrementally maintained. Our new RPMerge algorithm extends the HRMerge algorithm of Brown and Haas to effectively deal with deletions, thereby facilitating efficient parallel sampling. info:eu-repo/classification/ddc/004 ddc:004
4	Online Sample Selection for Resource Constrained Networked Systems Sjösvärd, Philip, Miksits, Samuel January 2022 (has links) As more devices with different service requirements become connected to networked systems, such as Internet of Things (IoT) devices, maintaining quality of service becomes increasingly difficult. Large data sets can be obtained ahead of time in networks to train prediction models offline, however, resulting in high computational costs. Online learning is an alternative approach where a smaller cache of fixed size is maintained for training using sample selection algorithms, allowing for lower computational costs and real-time model re-computation. This project has resulted in two newly designed sample selection algorithms, Binned Relevance and Redundancy Sample Selection (BRR-SS) and Autoregressive First, In First Out-buffer (AR-FIFO). The algorithms are evaluated on data traces retrieved from a Key Value store and a Video on Demand service. Prediction accuracy of the resulting model while using the sample selection algorithms and the time to process a received sample is evaluated and compared to the pre-existing Reservoir Sampling (RS) and Relevance and Redundancy Sample Selection (RR-SS) with and without model re-computation. The results show that, while RS maintains the lowest computational overhead, BRR-SS outperforms both RS and RR-SS in prediction accuracy on the investigated traces. AR-FIFO, with its low computational cost, outperforms offline learning for larger cache sizes on the Key Value data set but shows inconsistencies on the Video on Demand trace. Model re-computation results in reduced error rates and significantly lowered variance on the investigated data traces, where periodic model re-computation overall outperforms change detection in practicality, prediction accuracy, and computational overhead. / Allteftersom fler enheter med olika servicekrav ansluts till nätverkssystem, såsom Internet of Things (IoT) enheter, ökar svårigheten att erhålla nödvändig servicekvalitet. Nätverk kan ge upphov till stora datamängder för träning av prediktionsmodeller offline, dock till en hög beräkningskostnad. Ett alternativt tillvägagångssätt är onlineinlärning där en mindre cache av fast storlek upprätthålls för träning med hjälp av datapunkturvalsalgoritmer. Detta möjliggör lägre beräkningskostnader samt realtidsmodellomräkningar. Detta projekt har resulterat i två nydesignade datapunkturvalsalgoritmer, Binned Relevance and Redundancy Sample Selection (BRR-SS) och Autoregressive First In, First Out-buffer (AR-FIFO). Algoritmerna utvärderas på dataspår som hämtats från ett Key Value-lager och en Video on Demand-tjänst. Förutsägelseförmåga för den resulterande modellen när datapunkturvalsalgoritmerna används och tid för bearbetning av mottagen datapunkt utvärderas och jämförs med dem redan existerande Reservoir Sampling (RS) och Relevance and Redundancy Sample Selection (RR-SS), med och utan modellomräkning. RS resulterar i lägst beräkningskostnad medan BRR-SS överträffar både RS och RR-SS i förutsägelseförmåga på dem undersökta spåren. AR-FIFO, med sin låga beräkningskostnad, överträffar offlineinlärning för större cachestorlekar på Key Value-spåret, men visar inkonsekvent beteende på Video on Demand-spåret. Modellomräkning resulterar i mindre fel och avsevärt sänkt varians på dem undersökta spåren, där periodisk modellomräkning totalt sett överträffar förändringsdetektering i praktikalitet, förutsägelseförmåga och beräkningskostnad. / Kandidatexjobb i elektroteknik 2022, KTH, Stockholm Online learning Sample Selection Real-time Learning Random Forest Reservoir Sampling Relevance and Redundancy Binned Distribution Autoregressive Model Change Detection Model Re-computation Elektroteknik och elektronik

1

Page generated in 0.0728 seconds