Global ETD Search

1	Computations on Massive Data Sets : Streaming Algorithms and Two-party Communication / Calculs sur des grosses données : algorithmes de streaming et communication entre deux joueurs Konrad, Christian 05 July 2013 (has links) Dans cette thèse on considère deux modèles de calcul qui abordent des problèmes qui se posent lors du traitement des grosses données. Le premier modèle est le modèle de streaming. Lors du traitement des grosses données, un accès aux données de façon aléatoire est trop couteux. Les algorithmes de streaming ont un accès restreint aux données: ils lisent les données de façon séquentielle (par passage) une fois ou peu de fois. De plus, les algorithmes de streaming utilisent une mémoire d'accès aléatoire de taille sous-linéaire dans la taille des données. Le deuxième modèle est le modèle de communication. Lors du traitement des données par plusieurs entités de calcul situées à des endroits différents, l'échange des messages pour la synchronisation de leurs calculs est souvent un goulet d'étranglement. Il est donc préférable de minimiser la quantité de communication. Un modèle particulier est la communication à sens unique entre deux participants. Dans ce modèle, deux participants calculent un résultat en fonction des données qui sont partagées entre eux et la communication se réduit à un seul message. On étudie les problèmes suivants: 1) Les couplages dans le modèle de streaming. L'entrée du problème est un flux d'arêtes d'un graphe G=(V,E) avec n=\|V\|. On recherche un algorithme de streaming qui calcule un couplage de grande taille en utilisant une mémoire de taille O(n polylog n). L'algorithme glouton remplit ces contraintes et calcule un couplage de taille au moins 1/2 fois la taille d'un couplage maximum. Une question ouverte depuis longtemps demande si l'algorithme glouton est optimal si aucune hypothèse sur l'ordre des arêtes dans le flux est faite. Nous montrons qu'il y a un meilleur algorithme que l'algorithme glouton si les arêtes du graphe sont dans un ordre uniformément aléatoire. De plus, nous montrons qu'avec deux passages on peut calculer un couplage de taille strictement supérieur à 1/2 fois la taille d'un couplage maximum sans contraintes sur l'ordre des arêtes. 2) Les semi-couplages en streaming et en communication. Un semi-couplage dans un graphe biparti G=(A,B,E) est un sous-ensemble d'arêtes qui couple tous les sommets de type A exactement une fois aux sommets de type B de façon pas forcement injective. L'objectif est de minimiser le nombre de sommets de type A qui sont couplés aux même sommets de type B. Pour ce problème, nous montrons un algorithme qui, pour tout 0<=ε<=1, calcule une O(n^((1-ε)/2))-approximation en utilisant une mémoire de taille Ô(n^(1+ε)). De plus, nous montrons des bornes supérieures et des bornes inférieurs pour la complexité de communication entre deux participants pour ce problème et des nouveaux résultats concernant la structure des semi-couplages. 3) Validité des fichiers XML dans le modèle de streaming. Un fichier XML de taille n est une séquence de balises ouvrantes et fermantes. Une DTD est un ensemble de contraintes de validité locales d'un fichier XML. Nous étudions des algorithmes de streaming pour tester si un fichier XML satisfait les contraintes décrites dans une DTD. Notre résultat principal est un algorithme de streaming qui fait O(log n) passages, utilise 3 flux auxiliaires et une mémoire de taille O(log^2 n). De plus, pour le problème de validation des fichiers XML qui décrivent des arbres binaires, nous présentons des algorithmes en un passage et deux passages qui une mémoire de taille sous-linéaire. 4) Correction d'erreur pour la distance du cantonnier. Alice et Bob ont des ensembles de n points sur une grille en d dimensions. Alice envoit un échantillon de petite taille à Bob qui, après réception, déplace ses points pour que la distance du cantonnier entre les points d'Alice et les points de Bob diminue. Pour tout k>0 nous montrons qu'il y a un protocole presque optimal de communication avec coût de communication Ô(kd) tel que les déplacements des points effectués par Bob aboutissent à un facteur d'approximation de O(d) par rapport aux meilleurs déplacements de d points. / In this PhD thesis, we consider two computational models that address problems that arise when processing massive data sets. The first model is the Data Streaming Model. When processing massive data sets, random access to the input data is very costly. Therefore, streaming algorithms only have restricted access to the input data: They sequentially scan the input data once or only a few times. In addition, streaming algorithms use a random access memory of sublinear size in the length of the input. Sequential input access and sublinear memory are drastic limitations when designing algorithms. The major goal of this PhD thesis is to explore the limitations and the strengths of the streaming model. The second model is the Communication Model. When data is processed by multiple computational units at different locations, then the message exchange of the participating parties for synchronizing their calculations is often a bottleneck. The amount of communication should hence be as little as possible. A particular setting is the one-way two-party communication setting. Here, two parties collectively compute a function of the input data that is split among the two parties, and the whole message exchange reduces to a single message from one party to the other one. We study the following four problems in the context of streaming algorithms and one-way two-party communication: (1) Matchings in the Streaming Model. We are given a stream of edges of a graph G=(V,E) with n=\|V\|, and the goal is to design a streaming algorithm that computes a matching using a random access memory of size O(n polylog n). The Greedy matching algorithm fits into this setting and computes a matching of size at least 1/2 times the size of a maximum matching. A long standing open question is whether the Greedy algorithm is optimal if no assumption about the order of the input stream is made. We show that it is possible to improve on the Greedy algorithm if the input stream is in uniform random order. Furthermore, we show that with two passes an approximation ratio strictly larger than 1/2 can be obtained if no assumption on the order of the input stream is made. (2) Semi-matchings in Streaming and in Two-party Communication. A semi-matching in a bipartite graph G=(A,B,E) is a subset of edges that matches all A vertices exactly once to B vertices, not necessarily in an injective way. The goal is to minimize the maximal number of A vertices that are matched to the same B vertex. We show that for any 0<=ε<=1, there is a one-pass streaming algorithm that computes an O(n^((1-ε)/2))-approximation using Ô(n^(1+ε)) space. Furthermore, we provide upper and lower bounds on the two-party communication complexity of this problem, as well as new results on the structure of semi-matchings. (3) Validity of XML Documents in the Streaming Model. An XML document of length n is a sequence of opening and closing tags. A DTD is a set of local validity constraints of an XML document. We study streaming algorithms for checking whether an XML document fulfills the validity constraints of a given DTD. Our main result is an O(log n)-pass streaming algorithm with 3 auxiliary streams and O(log^2 n) space for this problem. Furthermore, we present one-pass and two-pass sublinear space streaming algorithms for checking validity of XML documents that encode binary trees. (4) Budget-Error-Correcting under Earth-Mover-Distance. We study the following one-way two-party communication problem. Alice and Bob have sets of n points on a d-dimensional grid [Δ]^d for an integer Δ. Alice sends a small sketch of her points to Bob and Bob adjusts his point set towards Alice's point set so that the Earth-Mover-Distance of Bob's points and Alice's points decreases. For any k>0, we show that there is an almost tight randomized protocol with communication cost Ô(kd) such that Bob's adjustments lead to an O(d)-approximation compared to the k best possible adjustments that Bob could make. Algorithmes de streaming Complexité de la communication Mémoire sous-linéaire Couplage Semi-couplage Xml Validité Distance du cantonnier Streaming algorithms Communication complexity Sublinear space Matching Semi-matching Xml Validity Earth-mover-distance
2	Développement de bancs de tests dédiés à la modélisation comportementale d’ampliﬁcateurs de puissance RF et micro-ondes / Development of test benches dedicated to the behavioral modeling of RF and microwave power ampliﬁers Gapillout, Damien 15 November 2017 (has links) Le travail présenté dans ce manuscrit a pour objet l’étude et le développement d’un banc de caractérisation généraliste appliqué à l’extraction du modèle comportemental d’ampliﬁcateur TPM-NIM (Two-Path Memory Nonlinear Integral). Ce modèle qui dispose d’une des architectures les plus abouties au laboratoire XLIM requiert une instrumentation microonde haut de gamme, très onéreuse, hors de portée de la majorité des concepteurs pour sa mise en œuvre expérimentale. L’objectif est donc de proposer des principes de mesure originaux permettant d’identiﬁer le modèle TPM-NIM avec une instrumentation standard. Dans ces travaux, deux bancs sont présentés : tout d’abord, un banc de caractérisation développé autour d’une instrumentation de pointe disposant des meilleures propriétés pour extraire le modèle. Puis, un banc construit autour d’une instrumentation standard mais incluant des méthodes de traitement et de mesure novatrices. Ces deux bancs ont été utilisés avec plusieurs véhicules de tests et il ressort que le second permet de diminuer le bruit des mesures de phase tout en réduisant le coût total des équipements. Enﬁn, une dernière partie est consacrée à la comparaison du modèle TPM-NIM avec deux modèles comportementaux classiques mettant en avant sa polyvalence. / The work presented in this manuscript is devoted to the study and development of a general characterization bench applied to the extraction of the TPM-NIM (Two-Path Memory Nonlinear Integral) ampliﬁer behavioral model. This model, has one of the most advanced architectures at the XLIM laboratory. It requires a high-end microwave instrumentation, overpriced and beyond reach for most of the designers for its experimental implementation. The aim is to propose some original measurements principles allowing the TPM-NIM model’s identiﬁcation with a standard instrumentation. Two benches are presented in these works : ﬁrstly, a characterization bench, developed using a high performance instrumentation with the best properties to extract the model. Then, a bench, built with a standard instrumentation but through innovative processing and measurement methods. These two benches have been used with several test vehicles and it appears that the second one decreases the noise of phase measurements while reducing the equipment’s total cost. Finally, a last part is dedicated to the comparison of the TPM-NIM model with two classic behavioral models by emphasizing its versatility. Amplificateur de puissance Modélisation comportementale Séries de Volterra Mémoire non-linéaire Banc de mesure NVNA VNA Power amplifier Behavioral modeling Volterra series Nonlinear memory Measurement setup NVNA VNA 621.381 535
3	Profile guided hybrid compilation / Compilation hybride guidée pour profilage Nunes Sampaio, Diogo 14 December 2016 (has links) L'auteur n'a pas fourni de résumé en français / The end of chip frequency scaling capacity, due heat dissipation limitations, made manufacturers search for an alternative to sustain the processing capacity growth. The chosen solution was to increase the hardware parallelism, by packing multiple independent processors in a single chip, in a Multiple-Instruction Multiple-Data (MIMD) fashion, each with special instructions to operate over a vector of data, in a Single-Instruction Multiple-Data (SIMD) manner. Such paradigm change, brought to software developer the convoluted task of producing efficient and scalable applications. Programming languages and associated tools evolved to aid such task for new developed applications. But automated optimizations capable of coping with such a new complex hardware, from legacy, single threaded applications, is still lacking.To apply code transformations, either developers or compilers, require to assert that, by doing so, they are not changing the expected comportment of the application producing unexpected results. But syntactically poor codes, such as use of pointer parameters with multiple possible indirections, complex loop structures, or incomplete codes, make very hard to extract application behavior solely from the source code in what is called a static analyses. To cope with the lack of information extracted from the source code, many tools and research has been done in, how to use dynamic analyses, that does application profiling based on run-time information, to fill the missing information. The combination of static and dynamic information to characterize an application are called hybrid analyses. This works advocates for the use of hybrid analyses to be able to optimizations on loops, regions where most of computations are done. It proposes a framework capable of statically applying some complex loop transformations, that previously would be considered unsafe, by assuring their safe use during run-time with a lightweight test.The proposed framework uses application execution profiling to help the static loop optimizer to: 1) identify and classify program hot-spots, so as to focus only on regions vital for the execution time; 2) guide the optimizer in understanding the overall loop behavior, so as to reduce the valid loop transformations search space; 3) using instruction's memory access functions, it statically builds a lightweight run-time test that determine, based on the program parameters values, if a given optimization is safe to be used or not. It's applicability is shown by performing complex loop transformations into a variety of loops, obtained from applications of different fields, and demonstrating that the run-time overhead is insignificant compared to the loop execution time or gained performance, in the vast majority of cases. Optimisation de boucles Compilation hybride Profilage Accès mémoire non-Linéaire Élimination quantificateurs Compilateur Compilers Hybrid compilation Loop optimization Non-affine memory access functions Profiling Quantifier elimination 004

1

Page generated in 0.0544 seconds