1 |
Using Rabin-Karp fingerprints and LevelDB for faster searches
Deighton, Richard A., 01 December 2012
This thesis presents the results of a study on using fingerprints generated according to the Rabin-Karp Algorithm, together with a LevelDB database, to achieve Text Search times below those of GREP, a standard command-line UNIX text-search tool.
Text Search is a family of algorithms that find a string of characters, called a Search Pattern, within a much larger string of characters in a document we call a text file.
The Rabin-Karp Algorithm iterates through a text file, converting character strings into fingerprints at each location. A fingerprint numerically represents the window-length string of characters to the left of its location. The algorithm compares the calculated fingerprint to the Search Pattern's fingerprint. When the fingerprints are not equal, we can guarantee that the corresponding strings do not match; when they are equal, the strings probably match. A verification process confirms matches by checking the respective characters.
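As a rough illustration of the fingerprinting idea (a sketch, not the thesis's implementation), the code below computes a rolling Rabin-Karp fingerprint over a fixed window and verifies candidate matches character by character; the base and modulus are arbitrary example values.

```cpp
// Minimal Rabin-Karp sketch: rolling window fingerprints compared against the
// pattern's fingerprint, with character-by-character verification on a hit.
// Base and modulus are illustrative choices, not the values used in the thesis.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

std::vector<size_t> rabinKarp(const std::string& text, const std::string& pattern) {
    const uint64_t base = 257;           // number-system base for fingerprints
    const uint64_t mod  = 1000000007ULL; // large prime to limit collisions
    const size_t m = pattern.size(), n = text.size();
    std::vector<size_t> matches;
    if (m == 0 || n < m) return matches;

    uint64_t patFp = 0, winFp = 0, highPow = 1;
    for (size_t i = 0; i + 1 < m; ++i) highPow = (highPow * base) % mod;  // base^(m-1)
    for (size_t i = 0; i < m; ++i) {
        patFp = (patFp * base + (unsigned char)pattern[i]) % mod;
        winFp = (winFp * base + (unsigned char)text[i]) % mod;
    }
    for (size_t i = 0; i + m <= n; ++i) {
        // Equal fingerprints only suggest a match; verify the characters.
        if (patFp == winFp && text.compare(i, m, pattern) == 0) matches.push_back(i);
        if (i + m < n) {  // roll the window one character to the right
            winFp = (winFp + mod - ((unsigned char)text[i] * highPow) % mod) % mod;
            winFp = (winFp * base + (unsigned char)text[i + m]) % mod;
        }
    }
    return matches;
}

int main() {
    for (size_t pos : rabinKarp("abracadabra", "abra"))
        std::cout << "match at " << pos << '\n';
}
```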
Our application emerges after making the following major changes to the Rabin-Karp Algorithm. First, we employ a two-step technique rather than a single pass. In step 1, the preprocessing step, we calculate fingerprints and store them in a LevelDB database called an Index Database; this is our first major change. Step 2, the matching step, is our second change: we look up the Search Pattern's fingerprint in the Index Database and gather its set of locations. Finally, we allow the pattern to be any length relative to the window length, and we derive an equation to check whether the difference in length is too great for the fingerprint's number-system base.
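A minimal sketch of this two-step idea, assuming LevelDB's standard C++ API; the key/value encoding (fingerprint as key, comma-separated locations as value), the database path, and the fingerprint parameters are illustrative assumptions rather than the thesis's actual design.

```cpp
// Sketch of the two-step approach: a preprocessing pass stores window
// fingerprints in a LevelDB "Index Database", and the matching step looks up
// the Search Pattern's fingerprint to retrieve candidate locations.
// Key/value encoding and parameters below are illustrative assumptions.
#include <cstdint>
#include <iostream>
#include <string>
#include "leveldb/db.h"

static uint64_t fingerprint(const std::string& s, size_t pos, size_t window) {
    const uint64_t base = 257, mod = 1000000007ULL;  // illustrative parameters
    uint64_t fp = 0;
    for (size_t i = 0; i < window; ++i)
        fp = (fp * base + (unsigned char)s[pos + i]) % mod;
    return fp;
}

// Step 1: preprocessing -- append each location to the list stored under its fingerprint.
void buildIndex(leveldb::DB* db, const std::string& text, size_t window) {
    for (size_t i = 0; i + window <= text.size(); ++i) {
        std::string key = std::to_string(fingerprint(text, i, window));
        std::string locations;
        db->Get(leveldb::ReadOptions(), key, &locations);  // stays empty if absent
        locations += std::to_string(i) + ",";
        db->Put(leveldb::WriteOptions(), key, locations);
    }
}

// Step 2: matching -- a single lookup returns every candidate location for the pattern.
std::string lookup(leveldb::DB* db, const std::string& pattern, size_t window) {
    std::string locations;
    db->Get(leveldb::ReadOptions(), std::to_string(fingerprint(pattern, 0, window)),
            &locations);
    return locations;  // candidates still require character-level verification
}

int main() {
    leveldb::Options options;
    options.create_if_missing = true;
    leveldb::DB* db = nullptr;
    if (!leveldb::DB::Open(options, "/tmp/index_db", &db).ok()) return 1;
    buildIndex(db, "abracadabra", 4);
    std::cout << "candidate locations: " << lookup(db, "abra", 4) << '\n';
    delete db;
}
```

Reading and rewriting the location list at every window position is simply the shortest encoding to show here; a real preprocessing pass would likely batch writes (for example with leveldb::WriteBatch) or accumulate locations in memory before flushing them to the Index Database.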
We facilitated our performance experiments by first building our application and then testing it against GREP across a wide range of parameters. Our conclusions and recommendations show that, although we currently outperform GREP in only about half of the cases, there are promising opportunities to modify parts of our application so that it outperforms GREP in all instances.
2 |
An analysis of LSM caching in NVRAM
Lersch, Lucas; Oukid, Ismail; Lehner, Wolfgang; Schreter, Ivan, 13 June 2022
The rise of NVRAM technologies promises to change the way we think about system architectures. To fully exploit its advantages, systems must be specially tailored for NVRAM devices. Not only does this impose great challenges, but developing full system architectures from scratch is also undesirable in many scenarios due to prohibitive development costs. Instead, in this paper we analyze the behavior of an existing log-structured persistent key-value store, namely LevelDB, when run on top of an emulated NVRAM device. We investigate initial opportunities for improvement when adapting a system tailored for HDDs/SSDs to run in an NVRAM environment. Furthermore, we analyze the behavior of LevelDB's legacy DRAM caching component and whether more suitable caching policies are required.
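For context, LevelDB's main DRAM cache is its LRU block cache, configured through its options; the minimal sketch below shows where that component is set up, with the cache size and database path as arbitrary example values (not the configuration studied in the paper).

```cpp
// Minimal sketch: LevelDB's DRAM caching component is its LRU block cache,
// configured via Options. The 64 MiB size and path are arbitrary example values.
#include "leveldb/cache.h"
#include "leveldb/db.h"

int main() {
    leveldb::Options options;
    options.create_if_missing = true;
    options.block_cache = leveldb::NewLRUCache(64 * 1024 * 1024);  // DRAM block cache

    leveldb::DB* db = nullptr;
    leveldb::Status status = leveldb::DB::Open(options, "/tmp/nvram_emulated_db", &db);
    if (!status.ok()) return 1;
    // ... workload runs here; cache hit behavior is what such an analysis measures ...
    delete db;
    delete options.block_cache;  // caller owns the cache; release it after the DB
    return 0;
}
```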
3 |
Measuring RocksDB performance and adaptive sampling for model estimation
Laprés-Chartrand, Jean, 01 1900
This thesis focuses on two topics: statistical learning, and the prediction of key performance indicators in the performance evaluation of a storage engine.
The statistical-learning part presents a novel algorithm that adjusts the sample size used in the Monte Carlo approximation of the function to be minimized, guaranteeing, with a given probability, a reduction of the true function at a lower numerical cost.
The sampling strategy is embedded in a trust-region algorithm that uses the Fisher information matrix, also called the BHHH approximation, to approximate the Hessian matrix. The sampling strategy is tested on a logit model generated from synthetic data.
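In standard notation (an illustration, not the thesis's exact formulation), the BHHH approximation mentioned above replaces the Hessian of the log-likelihood $\mathcal{L}(\theta)=\sum_{i=1}^{N}\ell_i(\theta)$ with a sum of outer products of per-observation score vectors:

$$\hat{B}(\theta) \;=\; \sum_{i=1}^{N} \nabla_\theta \ell_i(\theta)\,\nabla_\theta \ell_i(\theta)^{\top} \;\approx\; -\nabla_\theta^{2}\,\mathcal{L}(\theta),$$

which requires only first derivatives and, under standard regularity conditions, coincides in expectation with the Fisher information matrix at the true parameter.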
Numerical results exhibit a significant reduction in the time required to optimize the model when adequate smoothing is applied to the function.
The key performance indicator prediction part describes a novel strategy for selecting better RocksDB settings that optimize its throughput, using the log files to identify suboptimal parameters and opening the possibility of greatly accelerating the tuning of modern storage engines.

This thesis addresses two subjects, one related to statistical learning and the second to the prediction of performance indicators in a key-value storage system. The statistical-learning part develops an algorithm that adjusts the sample size for the Monte Carlo approximation of the function to be minimized, allowing a reduction of the true function with a given probability at a lower numerical cost. The sampling strategy is developed in a trust-region setting, using the Fisher information matrix, also called the BHHH approximation of the Hessian matrix. The sampling strategy is tested on a logit model generated from synthetic data following the same model. The numerical results show a significant reduction in the time required to optimize the model when adequate smoothing is applied. The performance indicator prediction part describes a new approach to optimizing the maximum insertion rate of key-value pairs in the RocksDB storage system. The log files are used to identify the system's suboptimal parameters and to accelerate the search for optimal parameters.
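Purely as an illustration of the kind of settings such a log-guided strategy might adjust, the sketch below opens RocksDB with a few commonly tuned options; the specific parameters and values are assumptions for this example, not settings identified by the thesis.

```cpp
// Illustrative only: opening RocksDB with a handful of commonly tuned options.
// The values below are example choices, not recommendations from the thesis;
// its strategy derives better settings from the RocksDB log files.
#include <cassert>
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
    rocksdb::Options options;
    options.create_if_missing = true;
    options.write_buffer_size = 64 * 1024 * 1024;     // memtable size (bytes)
    options.max_write_buffer_number = 4;              // memtables kept in memory
    options.level0_file_num_compaction_trigger = 8;   // L0 files before compaction
    options.max_background_jobs = 4;                  // flush/compaction threads

    rocksdb::DB* db = nullptr;
    rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/rocksdb_tuning_example", &db);
    assert(status.ok());
    // ... run the write-heavy workload whose insertion throughput is being measured ...
    delete db;
    return 0;
}
```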