Global ETD Search

1	Optimizing Communication Cost in Distributed Query Processing / Optimisation du coût de communication des données dans le traitement des requêtes distribuées
Belghoul, Abdeslem, 07 July 2017

In this thesis, we take a complementary look at the problem of optimizing the time for communicating query results in distributed query processing by investigating the relationship between communication time and middleware configuration. The middleware determines, among other things, how data is divided into batches of F tuples and messages of M bytes before being sent over the network. Concretely, we focus on the following research question: given a query Q and a network environment, which configuration of F and M minimizes the time for transferring the query result over the network? To the best of our knowledge, this problem has not been studied by the database research community, which has no well-established strategies for middleware tuning.

We first present an intensive experimental study that demonstrates the impact of middleware configuration on the time for communicating query results. We focus on two middleware parameters that we empirically identified as having an important influence on the communication time: (i) the fetch size F (i.e., the number of tuples in a batch that is communicated at once to an application consuming the data) and (ii) the message size M (i.e., the size in bytes of the middleware buffer, which corresponds to the amount of data that can be communicated at once from the middleware to the network layer; a batch of F tuples can be communicated via one or several messages of M bytes).

We then describe a cost model for estimating the communication time, based on how data is communicated between computation nodes. The model rests on two observations: (i) batches and messages are communicated differently over the network: batches are communicated synchronously, whereas the messages within a batch are pipelined (communicated asynchronously), and (ii) because of network latency, the first message in a batch is more expensive to communicate than any later message in the same batch. We propose a strategy for calibrating the network-dependent parameters of the communication time estimation function, i.e., the costs (weights) of a first message and of a non-first message in a batch.

Finally, we develop an optimization algorithm that computes the values of the middleware parameters F and M that minimize the communication time. The algorithm finds, in a small fraction of a second, values of F and M that offer a good trade-off between low resource consumption and low communication time. The proposed approach has been evaluated experimentally using a dataset from an astronomy application.

Keywords: Middleware; Distributed query processing; Data communication cost; Communication cost model; Fetch size; Message size; Optimizing communication cost
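The cost model described in this abstract (synchronous batches, pipelined messages within a batch, a more expensive first message) can be illustrated with a minimal Python sketch. This is not the thesis's actual model or algorithm: the linear per-message costs (`latency`, `bandwidth`), the memory proxy `F * tuple_bytes + M`, the `tolerance` parameter, and all function names are assumptions introduced here for illustration only. The sketch also assumes every batch is full and every message is charged as a full buffer of M bytes.

```python
import math


def estimate_time(n_tuples, tuple_bytes, F, M, latency, bandwidth):
    """Estimate the transfer time (seconds) of a query result.

    Assumed (hypothetical) calibration: the first message of a batch costs
    latency + M / bandwidth, every later message costs M / bandwidth.
    """
    n_batches = math.ceil(n_tuples / F)              # batches are sent one after another
    msgs_per_batch = math.ceil(F * tuple_bytes / M)  # messages inside a batch are pipelined
    first_cost = latency + M / bandwidth
    next_cost = M / bandwidth
    per_batch = first_cost + (msgs_per_batch - 1) * next_cost
    return n_batches * per_batch


def best_config(n_tuples, tuple_bytes, fetch_sizes, message_sizes,
                latency, bandwidth, tolerance=0.05):
    """Grid search: among configurations within `tolerance` of the best
    estimated time, pick the one with the smallest buffer footprint
    (hypothetical proxy: F * tuple_bytes + M)."""
    candidates = [
        (estimate_time(n_tuples, tuple_bytes, F, M, latency, bandwidth), F, M)
        for F in fetch_sizes
        for M in message_sizes
    ]
    best_time = min(t for t, _, _ in candidates)
    near_best = [
        (F * tuple_bytes + M, t, F, M)
        for t, F, M in candidates
        if t <= best_time * (1 + tolerance)
    ]
    _, t, F, M = min(near_best)
    return F, M, t


if __name__ == "__main__":
    F, M, t = best_config(1000, 100, [10, 100, 1000], [1000, 10000],
                          latency=0.01, bandwidth=1e6)
    print(f"F={F} tuples, M={M} bytes, estimated time={t:.3f} s")
```

Raising `tolerance` in this sketch trades a longer estimated transfer time for smaller buffers, which mirrors the trade-off between resource consumption and communication time that the abstract mentions.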
2	Forest edges in boreal landscapes - factors affecting edge influence
Jansson, Ulrika, January 2009

The boreal forest in Fennoscandia has been subjected to major loss and fragmentation of natural forests due to intensive forestry. As a result, forest edges are now abundant and important landscape features. Edges have documented effects on the structure, function and biodiversity of forests. Edge influence on biodiversity is complex and depends on interactions between many local and regional factors. This thesis focuses on sharp forest edges and their potential to influence biodiversity at the landscape level.

I have developed a method for quantifying and characterizing sharp forest edges by interpretation of colour infrared (CIR) aerial photographs in combination with line intersect sampling (LIS) and sample plots. The method was used to estimate the density of forest edges in 28 landscapes (each 1600 ha) in northern Sweden, differing in management intensity, landscape composition and geographical location. Forest edges were described in detail using edge, canopy and neighbourhood attributes. By combining these attributes it was possible to classify edges with respect to levels of exposure. A field experiment was conducted to examine the effect of edge contrast on the growth of the old-forest lichen Usnea longissima.

The edge quantification method is accurate and efficient for estimating the length of sharp forest edges on an area basis (edge density, m ha⁻¹) and for collecting detailed attributes of edges and their surroundings. In northern Sweden, forest edge density is high (54 m ha⁻¹) but varies extensively (12-102 m ha⁻¹) between landscapes. Edge density is strongly correlated with the level of human disturbance and increases towards the southern part of the study area, at lower altitudes where management intensity is highest. Edge orientation, contrast and neighbourhood size show immense variation between edges and also vary between edge types. Regenerating edges are generally of higher contrast and face larger neighbourhoods than natural edges. Maintained edges have high contrast but small neighbourhoods. A larger proportion of edges in mature forests than of edges in general is highly exposed to microclimatic edge influence. The field experiment revealed that growth of U. longissima was highest near edges where the vegetation on the adjacent area sheltered, but did not shade, the lichen.

In this thesis, I provide a tool for estimating the density of forest edges, with the potential to yield information on important factors determining edge influence at the landscape level. The large variability in edge density and in edge and neighbourhood attributes implies large differences in microclimate and thus in the potential for edge influence. Management and conservation strategies must incorporate these factors to realistically address edge influence on biota at the landscape level.

Keywords: aerial photographs; edge contrast; edge density; edge length; fetch size; forest edge; forest fragmentation; lichen; lichen growth; line intersect sampling; pendulous lichen; photo interpretation; Terrestrial ecology
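The abstract does not give the estimator used with line intersect sampling. As an illustration only, the textbook LIS estimator for the length of linear features per unit area is (π/2) · n / T, where n is the number of intersections between transects and edges and T is the total transect length. The Python sketch below applies it and converts the result to m ha⁻¹; it assumes transects are randomly oriented relative to the edges, and the function name and example numbers are hypothetical.

```python
import math


def edge_density_m_per_ha(n_intersections, transect_length_m):
    """Textbook LIS estimate of edge length per unit area, in m/ha.

    Assumes transects are randomly oriented relative to the edges
    (an assumption of this sketch, not a detail taken from the thesis).
    """
    if transect_length_m <= 0:
        raise ValueError("transect length must be positive")
    density_m_per_m2 = (math.pi / 2) * n_intersections / transect_length_m
    return density_m_per_m2 * 10_000  # 1 ha = 10,000 m^2


if __name__ == "__main__":
    # Hypothetical survey: 10 edge crossings along 1000 m of transects.
    print(f"{edge_density_m_per_ha(10, 1000.0):.1f} m/ha")
```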
