11 |
Estimation spectrale parcimonieuse de signaux à échantillonnage irrégulier : application à l’analyse vibratoire d’aubes de turbomachines à partir de signaux tip-timing / Sparse spectral analysis of irregularly sampled signals : application to the vibrating analysis of turbomachine blades from tip-timing signals
Bouchain, Antoine, 25 April 2019 (has links)
Dans le cadre de la certification de ses moteurs d'hélicoptères, Safran Helicopter Engines réalise des essais en fonctionnement lors desquels les réponses vibratoires de turbomachines (compresseurs et turbines) sont mesurées. Les réponses vibratoires contiennent des modes (ou raies spectrales) dont les fréquences et amplitudes doivent être caractérisées. Les mesures sont réalisées par la technologie tip-timing qui permet d'observer les vibrations de toutes les pales d'un aubage en rotation. Cependant, la technologie tip-timing présente deux spécificités importantes. Premièrement, l'échantillonnage des signaux de vibrations est irrégulier quasi-périodique. Deuxièmement, l'ordre de grandeur des fréquences de vibration est généralement supérieur à la fréquence d'échantillonnage équivalente. Ces deux caractéristiques donnent lieu à des artefacts des composantes fréquentielles sur les spectres des signaux de vibrations. Ceux-ci gênent alors fortement l'identification du contenu spectral et perturbent donc l'interprétation du comportement vibratoire des pales. La nouvelle méthode d'analyse spectrale proposée s'appuie sur une modélisation parcimonieuse des signaux tip-timing et prend en compte les variations de la fréquence de rotation. L'analyse spectrale des signaux est alors réalisée par la minimisation d'un critère des moindres carrés linéaires régularisé par une pénalisation de "norme-l0" par l'algorithme Block-OMP. À l'aide de résultats numériques sur signaux synthétiques, il est démontré que cette méthode fournit de bonnes performances d'estimation des composantes spectrales et réalise une réduction importante de leurs artefacts. La prise en compte des variations de la fréquence de rotation permet en effet de tirer profit de l'utilisation de longues durées d'observation afin de réduire significativement les artefacts des composantes fréquentielles contenus dans les spectres. Par ailleurs, avec des performances légèrement meilleures que celles de l'ESMV (méthode reconnue pour l'analyse spectrale des signaux tip-timing), la méthode proposée est environ cent fois plus rapide. Deux cas de données réelles sont étudiés. À travers une détection de crique de pale, le premier cas d'étude montre que la méthode proposée est pertinente et réalise des estimations comparables aux méthodes industrielles. Le second cas d'étude présente plusieurs vibrations synchrones et asynchrones simultanées. Cela met en avant la capacité de réduction des artefacts des composantes fréquentielles de la méthode développée afin de faciliter l'interprétation du contenu vibratoire complexe de ce signal. L'optimisation du placement des sondes tip-timing est également étudiée pour faciliter l'identification des composantes synchrones. À partir de résultats numériques, il est démontré qu'éloigner les capteurs améliore l'estimation des amplitudes de ce type de composantes. / As part of the certification of its helicopter engines, Safran Helicopter Engines performs operational tests in which the vibratory responses of turbomachines (compressors and turbines) are measured. The vibratory responses contain modes (or spectral lines) whose frequencies and amplitudes must be characterized. The measurements are obtained with the tip-timing technology, which makes it possible to observe the vibrations of all the blades of a rotating bladed disk. However, tip-timing has two important features. Firstly, the sampling of the vibration signals is irregular and quasi-periodic. Secondly, the vibration frequencies are generally higher than the equivalent sampling frequency.
These two characteristics produce artefacts of the frequency components in the spectra of the vibration signals. As a consequence, they strongly hinder the identification of the spectral content and thus disturb the interpretation of the blades' vibratory behaviour. The proposed spectral analysis method relies on a sparse model of the tip-timing signals and takes the variations of the rotational frequency into account. The spectral analysis is then performed by minimizing a linear least-squares criterion regularized by an l0-norm penalty, solved with the Block-OMP algorithm. Numerical results on synthetic signals show that this method provides good estimation performance for the spectral components and strongly reduces their artefacts. Taking the variations of the rotational frequency into account makes it possible to exploit long observation periods and thereby significantly reduce the frequency-component artefacts in the spectra. In addition, with slightly better performance than ESMV (a recognized method for the spectral analysis of tip-timing signals), the proposed method is about one hundred times faster. Two real-data cases are studied. Through the detection of a blade crack, the first case shows that the proposed method is relevant and produces estimates comparable to those of industrial methods. The second case involves several simultaneous synchronous and asynchronous vibrations; it highlights the ability of the developed method to reduce frequency-component artefacts and thus simplify the interpretation of the complex vibratory content of the signal. The placement of the tip-timing probes is also optimized to ease the identification of synchronous components. Numerical results demonstrate that spacing the probes farther apart improves the amplitude estimation of this type of component.
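To make the sparse-estimation idea concrete, the sketch below shows greedy l0-regularized least-squares spectral estimation from irregularly sampled data, using plain OMP over a dictionary of complex exponentials evaluated at the actual sampling instants. It is only an illustration, not the thesis's implementation: Block-OMP additionally groups atoms into blocks, which is omitted here, and the frequency grid, tones and sample times are invented.

    import numpy as np

    def omp_spectral_estimate(t, y, freq_grid, k_max):
        # Dictionary of complex exponentials evaluated at the irregular instants
        A = np.exp(2j * np.pi * np.outer(t, freq_grid))
        residual = y.astype(complex)
        support = []
        coeffs = np.array([])
        for _ in range(k_max):
            # Pick the candidate frequency most correlated with the residual
            idx = int(np.argmax(np.abs(A.conj().T @ residual)))
            if idx not in support:
                support.append(idx)
            # Least-squares refit on the selected support (l0-regularized LS in spirit)
            As = A[:, support]
            coeffs, *_ = np.linalg.lstsq(As, y, rcond=None)
            residual = y - As @ coeffs
        return [(freq_grid[i], c) for i, c in zip(support, coeffs)]

    # Toy usage: two tones observed at irregular instants over one second
    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0.0, 1.0, 200))
    y = np.exp(2j*np.pi*37.0*t) + 0.5*np.exp(2j*np.pi*122.0*t) + 0.01*rng.standard_normal(200)
    for f, a in omp_spectral_estimate(t, y, np.arange(0.0, 200.0, 1.0), k_max=2):
        print(f"{f:6.1f} Hz  amplitude {abs(a):.2f}")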
|
12 |
Compressed Sensing for Electronic Radio Frequency Receiver: Detection, Sensitivity, and Implementation
Lin, Ethan, 02 May 2016 (has links)
No description available.
|
13 |
Capteur d'images événementiel, asynchrone à échantillonnage non-uniforme / Asynchronous Event-driven Image Sensor
Darwish, Amani, 27 June 2016 (has links)
Face aux défis actuels liés à la conception de capteurs d'images à forte résolution comme la limitation de la consommation électrique, l'augmentation du flux de données ainsi que le traitement de données associé, on propose, à travers cette thèse, un capteur d'image novateur asynchrone à échantillonnage non uniforme. Ce capteur d’images asynchrone est basé sur une matrice de pixels événementiels qui intègrent un échantillonnage non uniforme par traversée de niveaux. Contrairement aux imageurs conventionnels, où les pixels sont lus systématiquement lors de chaque trame, les pixels événementiels proposés ne sont consultés que lorsqu’ils contiennent une information pertinente. Cela induit un flux de données réduit et dépendant de l’image. Pour compléter la chaîne de traitement des pixels, on présente également une architecture numérique de lecture dédiée conçue en utilisant de la logique asynchrone et destinée à contrôler et à gérer le flux de données des pixels événementiels. Ce circuit de lecture numérique permet de surmonter les difficultés classiques rencontrées lors de la gestion des demandes simultanées des pixels événementiels sans dégrader la résolution et le facteur de remplissage du capteur d’images. En outre, le circuit de lecture proposé permet de réduire considérablement les redondances spatiales dans une image ce qui diminue encore le flux de données. Enfin, en combinant l'aspect échantillonnage par traversée de niveau et la technique de lecture proposée, on a pu remplacer la conversion analogique numérique classique de la chaîne de traitement des pixels par une conversion temps-numérique (Time-to-Digital Conversion). En d'autres termes, l'information du pixel est codée par le temps. Il en résulte une diminution accrue de la consommation électrique du système de vision, le convertisseur analogique-numérique étant un des composants les plus consommant du système de lecture des capteurs d'images conventionnels. / In order to overcome the challenges associated with the design of high-resolution image sensors, we propose, through this thesis, an innovative asynchronous event-driven image sensor based on non-uniform sampling. The proposed image sensor aims at reducing the data flow and the associated data processing by limiting the activity of the sensor to newly captured information. The proposed asynchronous image sensor is based on event-driven pixels that incorporate non-uniform level-crossing sampling. Unlike conventional imagers, where the pixels are read systematically at each frame, the proposed event-driven pixels are only read when they hold new and relevant information. This induces a reduced, scene-dependent data flow. In this thesis, we introduce a complete pixel reading sequence. Besides the event-driven pixel, the proposed reading system is designed using asynchronous logic and adapted to control and manage the flow of data from the event-driven pixels. This digital reading system overcomes the traditional difficulties encountered in the management of simultaneous requests from event-driven pixels without degrading the resolution and fill factor of the image sensor. In addition, the proposed reading circuit significantly reduces the spatial redundancy in an image, which further reduces the data flow. Finally, by combining level-crossing sampling with the proposed reading technique, the conventional analog-to-digital conversion of the pixel processing chain is replaced by a time-to-digital conversion (TDC). In other words, the pixel information is encoded in time.
This further reduces the power consumption of the vision system, the analog-to-digital converter being one of the most power-hungry components of the readout system in conventional image sensors.
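As a rough illustration of the level-crossing principle behind the event-driven pixel (a behavioural sketch only, not the sensor circuit; the quantization step delta and the test signal are invented), the model below emits a timestamped sample only when the input moves by at least one quantization step, so the number of events follows the scene activity rather than a frame clock.

    import numpy as np

    def level_crossing_sample(t, x, delta):
        # Emit (time, level) events only when the signal crosses a quantization level;
        # time carries the information, as in a time-to-digital conversion scheme.
        events = [(t[0], x[0])]
        last_level = x[0]
        for ti, xi in zip(t[1:], x[1:]):
            if abs(xi - last_level) >= delta:
                # Snap to the last crossed level and timestamp the event
                last_level += delta * np.sign(xi - last_level) * np.floor(abs(xi - last_level) / delta)
                events.append((ti, last_level))
        return events

    # A mostly static scene with a short burst of activity
    t = np.linspace(0.0, 1.0, 10_000)              # dense "analog" time base
    x = np.where((t > 0.45) & (t < 0.55), np.sin(2*np.pi*50*t), 0.0)
    events = level_crossing_sample(t, x, delta=0.1)
    print(f"{len(events)} events instead of {len(t)} uniform samples")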
|
14 |
Spectrométrie de masse FT-ICR bidimensionnelle, développements et applications / Two-dimensional FT-ICR mass spectrometry, developments and applications
Bouclon, Julien, 26 January 2018 (has links)
La spectrométrie de masse fournit deux types d’informations : la masse moléculaire des molécules présentes dans un mélange, en une première expérience (MS), puis leurs structures après isolation suivie de fragmentation, obtenues une à une (MS/MS). La spectrométrie de masse FT-ICR bidimensionnelle permet d’obtenir toutes ces informations en une seule expérience, sans isolation, quelle que soit la complexité de l’échantillon. Le prix à payer est une faible résolution dans la dimension indirecte, pouvant être améliorée par une augmentation du temps d’analyse, mais qui semblait limiter cette technique à une simple curiosité scientifique. Le premier objectif est d’implémenter l’échantillonnage non uniforme (NUS) en FT-ICR MS 2D. Cette technique consiste en l’acquisition aléatoire du même nombre de points dans la dimension indirecte que lors d’une acquisition uniforme, mais sur une plage de t1max plus grande. Les points manquants sont ensuite reconstruits par des algorithmes, entrainant une augmentation significative de la résolution du signal sans perte de temps sur le spectromètre. La première étape est de créer un algorithme générant un échantillonnage aléatoire de distribution uniforme pour une couverture optimale de la plage de t1max. Les algorithmes de reconstruction ayant des difficultés à reconstituer des signaux de faible intensité quand le nombre de points non échantillonnés augmente, la deuxième étape est de déterminer le facteur de sous-échantillonnage optimal afin d’obtenir le bon compromis entre résolution et signal-sur-bruit. La troisième étape est de réaliser des spectres MS/MS ayant des massifs isotopiques corrects pour des fragments produits à partir des isotopes les plus lourds. Le deuxième objectif est de décrire le comportement des ions dans la cellule ICR en fonction des impulsions RF utilisées à partir des équations de Lorentz. Dans une première partie, le but est d’établir les équations qui régissent le mouvement des ions précurseurs jusqu’à leur détection. Ensuite, il s’agit d’introduire la fragmentation et de déterminer les solutions analytiques décrivant le mouvement des fragments. La dernière étape est de simuler le comportement des ions précurseurs tout au long de la séquence d’impulsions ainsi que celui des nuages de fragments, de leur formation à leur détection. / Mass spectrometry provides two kinds of information: the molecular mass of the molecules present in a mixture, obtained all at once (MS), and their structures through isolation and fragmentation, obtained one by one (MS/MS). Two-dimensional FT-ICR MS allows simultaneous, parallel acquisition of structural information without isolation, regardless of the number of molecules. Nevertheless, the low resolution in the indirect dimension, which could be improved by increasing the acquisition time, seemed to limit this method to a mere scientific curiosity. The first objective is to implement non-uniform sampling (NUS) in 2D FT-ICR MS. This method consists of the random acquisition of the same number of points in the indirect dimension as in a uniform acquisition, but over a wider t1max range. Processing algorithms then reconstruct the skipped points, and the result is a significant increase in signal resolution at no cost in spectrometer time. The first step is to create an algorithm that generates random sampling with a uniform distribution for an optimal coverage of the t1max range. Processing algorithms may have trouble reconstructing small peaks when the number of skipped points increases.
The second step is therefore to determine the under-sampling ratio giving the best compromise between the gain in resolution and the signal-to-noise ratio. The third and last step is to obtain MS/MS spectra with correct isotopic patterns for fragments produced from the heavier isotopes. The second objective is to describe, using the Lorentz equations, the behavior of ions in the ICR cell as a function of the applied RF pulses. The first goal is to determine the equations governing the motion of the precursor ions until their detection. Fragmentation is then introduced and analytical solutions describing the motion of the fragments are established. The last step is to simulate the trajectories of the precursor ions throughout the entire pulse sequence, as well as the behavior of the fragment clouds from their formation to their detection.
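For illustration, a naive flat-distribution NUS schedule for the indirect dimension can be drawn as below. This is only a sketch of the baseline idea under stated assumptions (parameter names are invented, and the thesis develops its own scheduling algorithm): the same number of t1 increments as a uniform experiment is kept, but they are spread at random, with a flat distribution, over a grid that is a chosen under-sampling factor longer, so that the t1max range is covered evenly.

    import numpy as np

    def flat_nus_schedule(n_kept, undersampling, seed=None):
        # n_kept        : number of t1 increments actually acquired
        # undersampling : factor by which t1max is extended; the full grid has
        #                 n_kept * undersampling points, of which n_kept are kept
        # Returns the sorted indices of the kept increments; index 0 is always
        # kept so that the indirect-dimension signal starts at t1 = 0.
        rng = np.random.default_rng(seed)
        grid_size = n_kept * undersampling
        kept = rng.choice(np.arange(1, grid_size), size=n_kept - 1, replace=False)
        return np.sort(np.concatenate(([0], kept)))

    # 2048 increments spread over a grid 4x larger: same spectrometer time,
    # roughly 4x longer t1max, hence finer resolution in the indirect dimension
    schedule = flat_nus_schedule(n_kept=2048, undersampling=4, seed=1)
    print(schedule[:10], "...", schedule[-1])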
|
15 |
Padrões de Diversidade de Muscidae (Insecta, Diptera) na Planície Costeira do Rio Grande do Sul, Brasil / Diversity patterns of Muscidae (Insecta, Diptera) in the Coastal Plains of Rio Grande do Sul, Brazil
SILVA, Ândrio Zafalon da, 05 July 2013 (has links)
The fragmentation of natural environments is a major concern among conservation ecologists nowadays, especially in regions with a great diversity of habitats such as the coastal plains of Rio Grande do Sul. In order to provide a basis for the managers responsible for the maintenance of natural areas, and to further the knowledge of the family Muscidae in southern Brazil, this study aimed to determine the biodiversity of the family Muscidae and to propose a method for evaluating the species-area relationship (SAR) without applying different sampling efforts to areas of different sizes. To this end, 140 Malaise traps were installed, distributed across 35 areas in five regions of the Coastal Plains. Each region comprised seven areas with four traps placed at proportional distances. A total of 6102 individuals were collected, distributed among 120 species/morphospecies across the five regions. The species lists indicate the degree of conservation and the anthropic influence on the sampled areas, mainly through the presence of species of Muscinae and Coenosiinae. The similarity between the Muscidae communities was determined by the proximity of the collection sites or by the climatic conditions of the period, and a positive species-area relationship was also observed for the linear and logistic models. Although the power-function model was not significant, its coefficients allow the interpretation of the types of fragments that occur in the coastal plains of RS. / A fragmentação das áreas naturais é uma das maiores preocupações entre os ecólogos da conservação na atualidade, principalmente em áreas com uma grande diversidade de habitats como a Planície Costeira do Rio Grande do Sul. A fim de prover embasamento aos gestores responsáveis pela manutenção das áreas naturais e desenvolver o conhecimento sobre a família Muscidae no sul do Brasil, o presente estudo objetivou conhecer a biodiversidade da família Muscidae, além de propor um método para a avaliação do padrão de relacionamento espécie área (SAR) sem utilizar esforço amostral distinto entre as áreas de diferentes tamanhos. Para isso, foram instaladas 140 armadilhas Malaise distribuídas em 35 áreas de cinco regiões na Planície Costeira. Cada região foi composta por sete áreas com quatro armadilhas com distâncias proporcionais. Foram coletados 6102 indivíduos distribuídos entre 120 espécies/morfoespécies nas cinco regiões. A partir das listas podemos constatar o grau de conservação e a influência da ação antrópica nas áreas naturais amostradas, principalmente pela presença de espécies de Muscinae e Coenosiinae. A similaridade entre as comunidades de Muscidae foi determinada pela proximidade entre os locais de coletas ou pelas condições climáticas do período, e observamos ainda uma relação positiva no relacionamento espécie-área para os modelos linear e logístico. Apesar do modelo com a função de poder não ser significativo, apresentou coeficientes que possibilitam a interpretação dos tipos de fragmentos que existem na planície costeira do RS.
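Since the species-area relationship is discussed under linear, logistic and power-function forms, a small worked example may help: the classical power-law SAR, S = c * A^z, can be fitted by linear regression on its log-log form, log S = log c + z log A. The area and richness values below are invented for illustration and are not the thesis data.

    import numpy as np

    # Illustrative fragment areas (ha) and observed species richness (made-up values)
    area = np.array([2.0, 5.0, 12.0, 30.0, 80.0, 150.0, 400.0])
    richness = np.array([14, 19, 25, 33, 41, 47, 58])

    # Power-law SAR, S = c * A**z, fitted as log S = log c + z * log A
    z, log_c = np.polyfit(np.log(area), np.log(richness), 1)
    print(f"power law: z = {z:.2f}, c = {np.exp(log_c):.1f}")

    # Linear SAR, S = a + b * A, for comparison
    b, a = np.polyfit(area, richness, 1)
    print(f"linear: S = {a:.1f} + {b:.3f} * A")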
|
16 |
Discrétisation des systèmes de Lur'e : stabilisation et consistance / Discretization of Lur’e systems : stabilization and consistency
Louis, Julien, 27 August 2015 (has links)
De récents résultats sur l’étude des systèmes de Lur’e (commutés) à temps discret mettent en avant une fonction de Lyapunov de type Lur’e avancée, dont les lignes de niveau peuvent être non convexes et non connexes. Celles-ci soulèvent de larges questions pour les systèmes de Lur’e à temps discret obtenus par la discrétisation d’un système continu. Les contributions de cette thèse sont d’apporter des éléments de réponse à ces questions. Tout d’abord, le verrou des lignes de niveau non-connexes est levé en construisant à partir de celles-ci une suite décroissante d’ensembles connexes et bornés qui converge vers l’origine et qui contient le futur de la trajectoire à temps continu. Dans un second temps, le problème de la stabilisation conjointe d’un système de Lur’e à données échantillonnées avec un échantillonnage non-uniforme est traité. Quand la période d’échantillonnage est à choisir parmi un nombre fini de valeurs, il est montré que ce problème se traduit comme la stabilisation conjointe d’un système commuté de Lur’e avec des incertitudes bornées en norme. En associant de plus à chaque mode un critère quadratique, une stratégie de type min-switching permet de résoudre cette question à l’aide d’un problème d’optimisation sous contraintes LMI. Enfin, les propriétés de la stratégie de min-switching pour les systèmes de Lur’e commutés à temps discret sont étudiées. Une extension de la notion de consistance permet de prouver que cette stratégie est consistante vis-à-vis de majorants quadratiques modaux du critère de performance et ainsi de garantir l’intérêt de la stratégie d’échantillonnage non-uniforme développée. / Recent studies dealing with discrete-time (switched) Lur’e systems involve an adapted Lur’e-type function whose level sets may be non-convex and disconnected. These properties raise fundamental issues for discrete-time Lur’e systems obtained by sampling a continuous-time one. This PhD thesis aims at answering these questions. The first contribution is to circumvent the disconnected discrete-time level sets by constructing a decreasing sequence of bounded and connected sets that converges to the origin and contains the future of the continuous-time trajectory. The second contribution deals with the joint stabilization of a sampled-data Lur’e system under non-uniform sampling. When the sampling period belongs to a finite set of values, this problem is reformulated as the joint stabilization of a discrete-time switched Lur’e system with norm-bounded uncertain parameters. Furthermore, if a quadratic criterion is associated with each mode, a min-switching strategy combined with LMI constraints provides a solution to this problem. Finally, the property of consistency for discrete-time switched Lur’e systems is investigated. It is shown that the min-switching strategy is consistent with respect to modal quadratic upper bounds on the performance criterion. This result is applied to the stabilization of Lur’e systems with non-uniform sampling.
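To illustrate the min-switching idea only (a simplified sketch: the Lur'e nonlinearity, the uncertainty bounds and the offline LMI design that would produce the matrices P_i are all omitted, and the matrices below are placeholders), the controller evaluates one quadratic function per mode and activates, at each step, the mode whose value is smallest.

    import numpy as np

    def min_switching_step(x, modes, P):
        # modes : list of state matrices A_i (one per mode / sampling period)
        # P     : list of positive-definite matrices defining x^T P_i x,
        #         assumed here to come from a separate LMI design (placeholders)
        # Activate the mode whose quadratic function is smallest at the current state
        i = int(np.argmin([x @ Pi @ x for Pi in P]))
        return i, modes[i] @ x

    # Placeholder data: two "sampling period" modes of a stable closed loop
    A1 = np.array([[0.9, 0.2], [0.0, 0.7]])
    A2 = np.array([[0.6, -0.1], [0.3, 0.8]])
    P1 = np.eye(2)
    P2 = np.diag([2.0, 0.5])

    x = np.array([1.0, -1.0])
    for k in range(5):
        i, x = min_switching_step(x, [A1, A2], [P1, P2])
        print(f"step {k}: mode {i}, |x| = {np.linalg.norm(x):.3f}")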
|
17 |
Sampling Algorithms for Evolving Datasets
Gemulla, Rainer, 24 October 2008 (has links) (PDF)
Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such samples are widely used to speed up the processing of analytic queries and data-mining tasks, to enhance query optimization, and to facilitate information integration. Most of the existing work on database sampling focuses on how to create or exploit a random sample of a static database, that is, a database that does not change over time. The assumption of a static database, however, severely limits the applicability of these techniques in practice, where data is often not static but continuously evolving. In order to maintain the statistical validity of the sample, any changes to the database have to be appropriately reflected in the sample. In this thesis, we study efficient methods for incrementally maintaining a uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions, updates, and deletions. We consider instances of the maintenance problem that arise when sampling from an evolving set, from an evolving multiset, from the distinct items in an evolving multiset, or from a sliding window over a data stream. Our algorithms completely avoid any accesses to the base data and can be several orders of magnitude faster than algorithms that do rely on such expensive accesses. The improved efficiency of our algorithms comes at virtually no cost: the resulting samples are provably uniform and only a small amount of auxiliary information is associated with the sample. We show that the auxiliary information not only facilitates efficient maintenance, but it can also be exploited to derive unbiased, low-variance estimators for counts, sums, averages, and the number of distinct items in the underlying dataset. In addition to sample maintenance, we discuss methods that greatly improve the flexibility of random sampling from a system's point of view. More specifically, we initiate the study of algorithms that resize a random sample upwards or downwards. Our resizing algorithms can be exploited to dynamically control the size of the sample when the dataset grows or shrinks; they facilitate resource management and help to avoid under- or oversized samples. Furthermore, in large-scale databases with data being distributed across several remote locations, it is usually infeasible to reconstruct the entire dataset for the purpose of sampling. To address this problem, we provide efficient algorithms that directly combine the local samples maintained at each location into a sample of the global dataset. We also consider a more general problem, where the global dataset is defined as an arbitrary set or multiset expression involving the local datasets, and provide efficient solutions based on hashing.
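As a point of reference for the insertion-only case (the thesis targets the much harder setting with updates and deletions, and avoids base-data accesses with more elaborate schemes not reproduced here), the classical reservoir-sampling algorithm maintains a provably uniform sample of fixed size k over a stream of insertions.

    import random

    def reservoir_sample(stream, k, rng=random):
        # Maintain a uniform random sample of size k over an insertion-only stream
        # (classical reservoir sampling; deletions and updates need the more
        # advanced maintenance schemes discussed in the thesis).
        sample = []
        for n, item in enumerate(stream, start=1):
            if n <= k:
                sample.append(item)
            else:
                # Item n is kept with probability k/n, replacing a random victim
                j = rng.randrange(n)
                if j < k:
                    sample[j] = item
        return sample

    print(reservoir_sample(range(1_000_000), k=5))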
|
18 |
Signal Processing for Spectroscopic Applications
Gudmundson, Erik, January 2010 (links)
Spectroscopic techniques allow for studies of materials and organisms on the atomic and molecular level. Examples of such techniques are nuclear magnetic resonance (NMR) spectroscopy—one of the principal techniques to obtain physical, chemical, electronic and structural information about molecules—and magnetic resonance imaging (MRI)—an important medical imaging technique for, e.g., visualization of the internal structure of the human body. The less well-known spectroscopic technique of nuclear quadrupole resonance (NQR) is related to NMR and MRI but with the difference that no external magnetic field is needed. NQR has found applications in, e.g., detection of explosives and narcotics. The first part of this thesis is focused on detection and identification of solid and liquid explosives using both NQR and NMR data. Methods allowing for uncertainties in the assumed signal amplitudes are proposed, as well as methods for estimation of model parameters that allow for non-uniform sampling of the data. The second part treats two medical applications. Firstly, new, fast methods for parameter estimation in MRI data are presented. MRI can be used for, e.g., the diagnosis of anomalies in the skin or in the brain. The presented methods allow for a significant decrease in computational complexity without loss in performance. Secondly, the estimation of blood flow velocity using medical ultrasound scanners is addressed. Information about anomalies in the blood flow dynamics is an important tool for the diagnosis of, for example, stenosis and atherosclerosis. The presented methods make no assumption on the sampling schemes, allowing for duplex mode transmissions where B-mode images are interleaved with the Doppler emissions.
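As a small illustration of parameter estimation from non-uniformly sampled data (a simplified sketch, not one of the thesis's methods: the resonance frequencies are assumed known exactly, damping is ignored, and all numerical values are invented), the complex amplitudes of assumed spectral lines can be estimated by least squares directly at the irregular sampling instants.

    import numpy as np

    def ls_amplitudes(t, y, freqs):
        # Least-squares estimate of the complex amplitudes of known resonance
        # frequencies from non-uniformly sampled data.
        A = np.exp(2j * np.pi * np.outer(t, freqs))
        amps, *_ = np.linalg.lstsq(A, y, rcond=None)
        return amps

    # Non-uniform sampling times and two hypothetical resonance lines
    rng = np.random.default_rng(3)
    t = np.sort(rng.uniform(0.0, 1e-3, 150))       # seconds
    freqs = np.array([3.41e6, 3.46e6])             # Hz, made-up values
    y = 1.0*np.exp(2j*np.pi*freqs[0]*t) + 0.3*np.exp(2j*np.pi*freqs[1]*t)
    y += 0.05*(rng.standard_normal(150) + 1j*rng.standard_normal(150))
    print(np.abs(ls_amplitudes(t, y, freqs)))      # close to [1.0, 0.3]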
|
19 |
Flot de conception pour l'ultra faible consommation : échantillonnage non-uniforme et électronique asynchrone / Design flow for ultra-low power : non-uniform sampling and asynchronous circuits
Simatic, Jean, 07 December 2017 (links)
Les systèmes intégrés sont souvent des systèmes hétérogènes avec des contraintes fortes de consommation électrique. Ils embarquent aujourd'hui des actionneurs, des capteurs et des unités pour le traitement du signal. Afin de limiter l'énergie consommée, ils peuvent tirer profit des techniques évènementielles que sont l'échantillonnage non uniforme et l'électronique asynchrone. En effet, elles permettent de réduire drastiquement la quantité de données échantillonnées pour de nombreuses classes de signaux et de diminuer l'activité. Pour aider les concepteurs à développer rapidement des plateformes exploitant ces deux techniques évènementielles, nous avons élaboré un flot de conception nommé ALPS. Il propose un environnement permettant de déterminer et de simuler au niveau algorithmique le schéma d'échantillonnage et les traitements associés afin de sélectionner les plus efficients en fonction de l'application ciblée. ALPS génère directement le convertisseur analogique/numérique à partir des paramètres d'échantillonnage choisis. L'élaboration de la partie de traitement s'appuie quant à elle sur un outil de synthèse de haut niveau synchrone et une méthode de désynchronisation exploitant des protocoles asynchrones spécifiques, capables d'optimiser la surface et la consommation du circuit. Enfin, des simulations au niveau portes logiques permettent d'analyser et de valider l'énergie consommée avant de poursuivre par un flot classique de placement et routage. Les évaluations conduites montrent une réduction d'un facteur 3 à 8 de la consommation des circuits automatiquement générés. Le flot ALPS permet à un concepteur non-spécialiste de se concentrer sur l'optimisation de l'échantillonnage et de l'algorithme en fonction de l'application et de potentiellement réduire d'un ou plusieurs ordres de grandeur la consommation du circuit. / Integrated systems are mainly heterogeneous systems with strong power-consumption constraints. They embed actuators, sensors and signal processing units. To limit the energy consumption, they can exploit event-based techniques, namely non-uniform sampling and asynchronous circuits. Indeed, these techniques drastically cut the amount of sampled data for many classes of signals and reduce the system activity. To help designers quickly develop platforms that exploit these two event-based techniques, we elaborated a design framework called ALPS. It provides an environment to determine and simulate, at the algorithmic level, the sampling scheme and the associated processing, in order to select the most efficient ones for the targeted application. ALPS directly generates the analog-to-digital converter from the chosen sampling parameters. The elaboration of the processing unit relies on a synchronous high-level synthesis tool and a desynchronization method that exploits specific asynchronous protocols to optimize the circuit area and power consumption. Finally, gate-level simulations allow the energy consumption to be analyzed and validated before continuing with a standard placement and routing flow. The evaluations carried out show a reduction factor of 3 to 8 in the consumption of the automatically generated circuits. The ALPS flow allows non-specialists to concentrate on the optimization of the sampling scheme and of the algorithm as a function of their application, and to potentially reduce the circuit power consumption by one to several orders of magnitude.
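The algorithmic-level exploration step that ALPS supports can be mimicked in a few lines (a toy sketch, not the ALPS tool or its API; the test signal and the quantization steps are invented): sweep the level-crossing step and report the event count against a crude reconstruction error, so that an efficient setting for the target application can be picked before any hardware is generated.

    import numpy as np

    def lc_events(t, x, delta):
        # Count level-crossing events and return a piecewise-constant reconstruction
        level, rec, n_events = x[0], np.empty_like(x), 0
        for i, xi in enumerate(x):
            if abs(xi - level) >= delta:
                level += delta * np.round((xi - level) / delta)
                n_events += 1
            rec[i] = level
        return n_events, rec

    # Sweep the quantization step and report the data-rate / accuracy trade-off
    t = np.linspace(0.0, 1.0, 20_000)
    x = np.sin(2*np.pi*5*t) * np.exp(-3*t)          # invented test signal
    for delta in (0.02, 0.05, 0.1, 0.2):
        n, rec = lc_events(t, x, delta)
        rmse = np.sqrt(np.mean((x - rec)**2))
        print(f"delta={delta:4.2f}: {n:5d} events, RMSE={rmse:.3f}")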
|
20 |
Sampling Algorithms for Evolving Datasets
Gemulla, Rainer, 20 October 2008 (links)
Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such samples are widely used to speed up the processing of analytic queries and data-mining tasks, to enhance query optimization, and to facilitate information integration. Most of the existing work on database sampling focuses on how to create or exploit a random sample of a static database, that is, a database that does not change over time. The assumption of a static database, however, severely limits the applicability of these techniques in practice, where data is often not static but continuously evolving. In order to maintain the statistical validity of the sample, any changes to the database have to be appropriately reflected in the sample. In this thesis, we study efficient methods for incrementally maintaining a uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions, updates, and deletions. We consider instances of the maintenance problem that arise when sampling from an evolving set, from an evolving multiset, from the distinct items in an evolving multiset, or from a sliding window over a data stream. Our algorithms completely avoid any accesses to the base data and can be several orders of magnitude faster than algorithms that do rely on such expensive accesses. The improved efficiency of our algorithms comes at virtually no cost: the resulting samples are provably uniform and only a small amount of auxiliary information is associated with the sample. We show that the auxiliary information not only facilitates efficient maintenance, but it can also be exploited to derive unbiased, low-variance estimators for counts, sums, averages, and the number of distinct items in the underlying dataset. In addition to sample maintenance, we discuss methods that greatly improve the flexibility of random sampling from a system's point of view. More specifically, we initiate the study of algorithms that resize a random sample upwards or downwards. Our resizing algorithms can be exploited to dynamically control the size of the sample when the dataset grows or shrinks; they facilitate resource management and help to avoid under- or oversized samples. Furthermore, in large-scale databases with data being distributed across several remote locations, it is usually infeasible to reconstruct the entire dataset for the purpose of sampling. To address this problem, we provide efficient algorithms that directly combine the local samples maintained at each location into a sample of the global dataset. We also consider a more general problem, where the global dataset is defined as an arbitrary set or multiset expression involving the local datasets, and provide efficient solutions based on hashing.
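For the special case of two disjoint local datasets, the sample-combination step described above can be sketched as follows (an illustrative sketch under stated assumptions, namely that both local samples are uniform samples drawn without replacement and contain at least k items, rather than the thesis's general algorithm for arbitrary set or multiset expressions): a hypergeometric draw decides how many of the k output items come from each local sample.

    import numpy as np

    def merge_uniform_samples(s1, n1, s2, n2, k, seed=None):
        # Combine uniform samples of two disjoint datasets into a uniform sample
        # of size k of their union, without accessing the base data.
        # s1, s2 : local uniform samples (lists), each of size >= k
        # n1, n2 : sizes of the underlying local datasets
        rng = np.random.default_rng(seed)
        # How many of the k picks would fall into dataset 1 if the union were sampled directly
        k1 = rng.hypergeometric(n1, n2, k)
        i1 = rng.choice(len(s1), size=k1, replace=False)
        i2 = rng.choice(len(s2), size=k - k1, replace=False)
        return [s1[i] for i in i1] + [s2[i] for i in i2]

    # Local samples of two remote partitions (toy data)
    sample_a = list(range(0, 100))        # uniform sample of a 10_000-item dataset
    sample_b = list(range(1000, 1100))    # uniform sample of a 2_500-item dataset
    print(merge_uniform_samples(sample_a, 10_000, sample_b, 2_500, k=20, seed=7))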
|