21

Dynamic Data Citation Service-Subset Tool for Operational Data Management

Schubert, Chris, Seyerl, Georg, Sack, Katharina January 2019 (has links) (PDF)
In Earth observation and the climate sciences, data and data services grow daily and over a large spatial extent, driven by the high coverage rate of satellite sensors and model calculations, but also by continuous meteorological in situ observations. To reuse such data, and especially data fragments and their data services, in a collaborative and reproducible manner that cites the original source, data analysts such as researchers or impact modelers need a way to identify the exact version, precise time information, parameters, and names of the dataset used. Done manually, citing a data fragment that is a subset of an entire dataset is complex and imprecise. Climate research data are in most cases multidimensional, structured grid data that can change partially over time; citing such evolving content calls for "dynamic data citation". The approach applied here associates queries with persistent identifiers. These queries contain the subsetting parameters, e.g., the spatial coordinates of the desired study area or the time frame with a start and end date. They are automatically included in the metadata of the newly generated subset and thus capture the data history, i.e., the data provenance that must be established in data repository ecosystems. The Research Data Alliance Data Citation Working Group (RDA Data Citation WG) summarized the scientific status quo and the state of the art of existing citation and data management concepts and developed a scalable dynamic data citation methodology for evolving data. The Data Centre at the Climate Change Centre Austria (CCCA) has implemented these recommendations and has offered an operational dynamic data citation service for climate scenario data since 2017. Aware that this objective depends on bibliographic citation research that is still under discussion, the CCCA service focuses on climate-domain-specific issues such as data characteristics, formats, software environment, and usage behavior. Beyond sharing the experience gained, current efforts target the scalability of the implementation, e.g., toward an Open Data Cube solution.
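The core of the approach, binding a normalized and timestamped subset query to a persistent identifier, can be sketched as follows. This is a minimal illustration with hypothetical names (SubsetCitation, the dataset identifier, the PID scheme), not the actual CCCA service interface:

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class SubsetCitation:
    """A re-executable subset query bound to a persistent identifier."""
    dataset_id: str   # identifier of the versioned source dataset
    query: dict       # subsetting parameters (bbox, time range, variable)
    executed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    @property
    def query_hash(self) -> str:
        # Normalize the query (sorted keys) so identical subsets hash identically.
        canonical = json.dumps(self.query, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()[:16]

    def pid(self) -> str:
        # Hypothetical PID scheme; a real service would mint a DOI or Handle.
        return f"subset:{self.dataset_id}:{self.query_hash}"

citation = SubsetCitation(
    dataset_id="ccca-scenario-v1.2",
    query={"bbox": [9.5, 46.4, 17.2, 49.0],
           "time": ["2021-01-01", "2050-12-31"],
           "variable": "tas"},
)
print(citation.pid())  # stable identifier for exactly this subset
```

Because the query is canonicalized before hashing, two requests for the same subset resolve to the same identifier, and re-executing the stored query against the recorded dataset version reproduces the cited fragment.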
22

Application of Dynamic Data Validation to Process Performance Monitoring

Ullrich, Christophe 17 October 2010 (has links)
The quality of the measurements used to track the evolution of chemical or petrochemical processes can significantly affect how those processes are operated. Unfortunately, every measurement is tainted by error. Errors in the measured data can lead to significant drifts in process operation, which can harm process safety or yield. Data validation is a very important task because it transforms the set of available data into a coherent set of values defining the state of the process. It corrects the measurements, estimates the values of unmeasured variables, and computes the a posteriori uncertainties of all variables. At industrial scale, data validation is routinely applied to continuously operating processes represented by steady-state models. For tracking transient phenomena, however, steady-state validation algorithms are no longer effective. The study addressed in this thesis is the application of dynamic data validation to monitoring the performance of chemical processes. The dynamic data validation algorithm developed here is based on solving the optimization problem and the model equations simultaneously. The differential equations are discretized with a weighted-residual method, namely orthogonal collocation, and a moving-time-window method keeps the problem at a reasonable size. The optimization algorithm is an interior-point "Successive Quadratic Programming" algorithm. The algorithm reduces the uncertainty of the estimates. The examples studied are presented from the simplest to the most complex. The first models are interconnected storage tanks, which involve only material balances. The next examples, chemical reactors, combine material and heat balances. The last model is a vapor-liquid separation drum, combining material and heat balances coupled with vapor-liquid equilibrium phenomena. The evaluation of the sensitivity matrix and the computation of a posteriori variances have been extended to processes represented by dynamic models, and their application is illustrated on several examples. Changing the validation-window parameters influences the redundancy within the window, and therefore the a posteriori variance reduction factor; the developments proposed in this work thus offer a rational criterion for choosing the window size in dynamic data validation applications. Integrating alternative estimators into the algorithm increases its robustness, since these estimators yield unbiased estimates in the presence of gross errors in the measurements. Organization of the thesis: the thesis opens with an introductory chapter presenting the problem, the research objectives, and the work plan. The first part covers the state of the art and the theoretical development of a dynamic data validation method. It is organized as follows: 
- The first chapter deals with steady-state data validation. It begins by showing the role data validation plays in process control. The different types of measurement errors and redundancy are then presented, along with several methods for solving linear and nonlinear steady-state problems. The chapter ends with the description of a method for computing a posteriori variances. 
- The second chapter presents two categories of dynamic data validation methods: filtering methods and nonlinear programming methods. For each, the main formulations found in the literature are presented with their chief advantages and drawbacks. 
- The third chapter is devoted to the theoretical development of the dynamic data validation algorithm designed in this thesis, together with the strategic choices made. The chosen algorithm is based on a formulation of the optimization problem that includes a system of differential-algebraic equations. The differential equations are discretized by orthogonal collocation using Lagrange interpolation polynomials. Different representations of the input variables are discussed. To reduce the computational cost and keep the optimization problem tractable, the moving-time-window method is used. An interior-point "Successive Quadratic Programming" algorithm solves the discretized differential equations and the model equations simultaneously. The analytical derivatives of the objective-function gradient and of the constraint Jacobian are also presented. Finally, a quality criterion is proposed for comparing the different variants of the algorithm. 
- The first part ends with the development of an original algorithm for computing a posteriori variances. The method is similar to the one described in the first chapter for steady-state processes, and is developed for the two representations of the input variables discussed in Chapter 3. To close the chapter, the method is applied theoretically to a small example consisting of a single differential equation and a single link equation. 
The second part of the thesis applies the dynamic data validation algorithm developed in the first part to several case studies. For each example, the influence of the algorithm's parameters on robustness, ease of convergence, and reduction of the estimates' uncertainty is examined; the algorithm's ability to reduce uncertainty is assessed through the error reduction rate and the variance reduction factor. 
- The first chapter of this second part studies one or more variable-level storage tanks, with or without fluid recycling; this case involves only material balances. 
- Chapter 6 examines a stirred-tank reactor with heat exchange, combining material and energy balances. 
- Chapter 7 studies a flash drum, bringing vapor-liquid equilibria into play. 
- Chapter 8 is devoted to robust estimators, whose performance is compared on the examples of Chapters 5 and 6. The thesis closes with a chapter presenting conclusions and some future perspectives.
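The simultaneous optimization-plus-model-equations formulation at the heart of the thesis can be conveyed with a small sketch. Here the collocation discretization is replaced by a simple Euler scheme on a single tank, all numbers are invented, and scipy's SLSQP solver stands in for the interior-point SQP algorithm:

```python
import numpy as np
from scipy.optimize import minimize

# Measured data over a moving window (hypothetical single-tank example):
# level h [m], inflow Fin and outflow Fout [m^3/h], cross-section A [m^2].
dt, A = 0.1, 2.0
h_meas    = np.array([1.00, 1.06, 1.09, 1.16, 1.19])
Fin_meas  = np.array([4.1, 4.0, 4.2, 3.9, 4.0])
Fout_meas = np.array([2.0, 2.1, 1.9, 2.0, 2.1])
sig_h, sig_F = 0.02, 0.1          # measurement standard deviations
n = len(h_meas)

def unpack(z):
    return z[:n], z[n:2*n], z[2*n:]

def objective(z):
    # Weighted least squares: penalize corrections relative to uncertainty.
    h, Fin, Fout = unpack(z)
    return (np.sum(((h - h_meas) / sig_h) ** 2)
            + np.sum(((Fin - Fin_meas) / sig_F) ** 2)
            + np.sum(((Fout - Fout_meas) / sig_F) ** 2))

def balance(z):
    # Discretized material balance: A*(h[k+1]-h[k])/dt = Fin[k] - Fout[k].
    h, Fin, Fout = unpack(z)
    return A * (h[1:] - h[:-1]) / dt - (Fin[:-1] - Fout[:-1])

z0 = np.concatenate([h_meas, Fin_meas, Fout_meas])
sol = minimize(objective, z0, constraints={"type": "eq", "fun": balance})
h_hat, Fin_hat, Fout_hat = unpack(sol.x)
print("reconciled levels:", np.round(h_hat, 3))
```

The reconciled values satisfy the balance exactly while staying as close to the measurements as their uncertainties allow, which is the basic mechanism the thesis extends with collocation, moving windows, and a posteriori variance analysis.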
23

SenMinCom: Pervasive Distributed Dynamic Sensor Data Mining for Effective Commerce

Hiremath, Naveen 18 July 2008 (has links)
In the last few years, the use of wireless sensor networks and cell phones has become ubiquitous; fusing these technologies in the field of business opens up new possibilities. To fill this gap, I propose a novel idea in which their combination enables companies to receive feedback on their products and services. The system's unobtrusive sensors not only collect shopping and mobile usage data from consumers but also put this information to effective use to increase revenue, cut costs, and so on; mobile-agent-based data mining allows the data to be analyzed from different dimensions and categorized on factors such as product positioning and promotion of goods, as in the case of a shopping store. Additionally, because the mining system is dynamic, companies get on-the-scene product recommendations rather than off-the-scene ones. In this thesis, a novel distributed pervasive mining system is proposed to capture customers' dynamic shopping information and mobile device usage.
24

DATA ASSIMILATION AND VISUALIZATION FOR ENSEMBLE WILDLAND FIRE MODELS

Chakraborty, Soham 01 January 2008 (has links)
This thesis describes an observation function for a dynamic data-driven application system designed to produce short-range forecasts of the behavior of a wildland fire. The thesis presents an overview of the atmosphere-fire model, which models the complex interactions between the fire and the surrounding weather, and of the data assimilation module, which is responsible for assimilating sensor information into the model. The observation function plays an important role in data assimilation, as it is used to estimate the model variables at the sensor locations. Also described is the implementation of a portable and user-friendly visualization tool that displays the locations of wildfires in the Google Earth virtual globe.
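An observation function maps the gridded model state to the values a sensor would report at its location. The following sketch shows one common choice, bilinear interpolation of a 2-D field at sensor coordinates, with invented values; it is illustrative, not the thesis's actual implementation:

```python
import numpy as np

def observation_operator(state_grid, xs, ys, dx, dy):
    """Map a gridded model state to expected sensor readings by
    bilinear interpolation at the sensor coordinates."""
    obs = []
    for x, y in zip(xs, ys):
        i, j = int(x // dx), int(y // dy)      # cell containing the sensor
        fx, fy = (x / dx) - i, (y / dy) - j    # fractional offsets in the cell
        obs.append((1-fx)*(1-fy)*state_grid[j, i]   + fx*(1-fy)*state_grid[j, i+1]
                 + (1-fx)*fy   *state_grid[j+1, i] + fx*fy   *state_grid[j+1, i+1])
    return np.array(obs)

# Toy 2-D temperature field [K] and two sensor positions (hypothetical values).
field = np.linspace(300.0, 600.0, 25).reshape(5, 5)
print(observation_operator(field, xs=[1.5, 3.2], ys=[2.0, 0.7], dx=1.0, dy=1.0))
```

The assimilation step then compares these predicted readings with the actual sensor data to correct the model state.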
25

A Personalized Smart Cube for Faster and Reliable Access to Data

Antwi, Daniel K. 02 December 2013 (has links)
Organizations own data sources that contain millions, billions, or even trillions of rows, and these data are usually highly dimensional in nature. Typically, these raw repositories comprise numerous independent data sources that are too big to be copied or joined, with the consequence that aggregations become highly problematic. Data cubes play an essential role in facilitating fast Online Analytical Processing (OLAP) in many multidimensional data warehouses. Current data cube computation techniques have had some success in addressing the above-mentioned aggregation problem. However, the combined problem of reducing data cube size for very large and highly dimensional databases, while guaranteeing fast query response times, has received less attention. Another issue is that most OLAP tools often cause users to be lost in the ocean of data while performing data analysis. Often, users are interested in only a subset of the data. Consider, for example, a business manager who wants to answer a crucial location-related business question: "Why are my sales declining at location X?" This manager wants fast, unambiguous, location-aware answers to his queries. He requires access to only the relevant filtered information, as found from the attributes that are directly correlated with his current needs. Therefore, it is important to determine and extract only that small data subset that is highly relevant from a particular user's location and perspective. In this thesis, we present the Personalized Smart Cube approach to address the above-mentioned scenario. Our approach consists of two main parts. First, we combine vertical partitioning, partial materialization, and dynamic computation to drastically reduce the size of the computed data cube while guaranteeing fast query response times. Second, our personalization algorithm dynamically monitors user query patterns and creates a personalized data cube for each user, ensuring that users work with only the small subset of data that is most relevant to them. Our experimental evaluation showed that our work compares favorably with other state-of-the-art methods. We evaluated our work on three main criteria: the storage space used, the query response time, and the cost-savings ratio of using a personalized cube. The results showed that our algorithm materializes a relatively smaller number of views than other techniques and also compares favorably in terms of query response time. Further, our personalization algorithm is superior to the state-of-the-art Virtual Cube algorithm when evaluated in terms of the number of user queries that were successfully answered using a personalized cube instead of the base cube.
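The personalization idea (watch which dimension combinations a user actually queries, then materialize only the most useful views) can be sketched as follows. The names and the frequency heuristic are hypothetical stand-ins for the thesis's algorithm:

```python
from collections import Counter
from itertools import combinations

class QueryMonitor:
    """Tracks which dimension subsets a user groups by, so the most
    frequent ones can be partially materialized as a personalized cube."""
    def __init__(self, budget=3):
        self.budget = budget       # max number of views to materialize
        self.freq = Counter()

    def record(self, dims):
        # Count the exact grouping and its sub-groupings, since a view on
        # {A, B} can also answer queries on {A} or {B} by further aggregation.
        dims = tuple(sorted(dims))
        for r in range(1, len(dims) + 1):
            for sub in combinations(dims, r):
                self.freq[sub] += 1

    def views_to_materialize(self):
        return [dims for dims, _ in self.freq.most_common(self.budget)]

m = QueryMonitor()
for q in [("region", "month"), ("region",), ("region", "product"), ("region", "month")]:
    m.record(q)
print(m.views_to_materialize())
```

A production system would also weigh view sizes and maintenance cost, but the sketch conveys how query-pattern monitoring narrows the cube to a user-relevant subset.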
27

Comparative Analysis of Machine Learning and Sequential Deep Learning Models in Higher Education Fundraising

Umeki, Atsuko 09 May 2022 (has links)
Deep learning models are used widely in many areas and applications of our everyday lives. They could also change the way non-profit organizations work and help optimize fundraising results. In this thesis, sequential models are applied to fundraising to compare their performance against traditional machine learning models. A sequential model is a type of neural network specialized for processing sequential data. Although some research applying machine learning algorithms in the fundraising context exists, it is based on data extracted from a specific time window and does not take the time dependency of features into account; time-series features are treated as independent at each data point, so the notion of time is lost. In this thesis, we experiment with time-dependent sequential models, including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and their variants, in the fundraising domain to predict alumni monetary contributions to the university. We also expand our study with an architecture that treats time-invariant demographic data as a condition on the sequential layers: the time-invariant data is concatenated with the output of the sequential layers. Sequential deep learning is empirically evaluated and compared against traditional machine learning models. The results demonstrate the potential of both traditional machine learning and sequential deep learning for predicting fundraising outcomes and offer non-profit organizations solutions to achieve their missions.
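The conditioned architecture described above, sequential layers over the giving history with time-invariant demographics concatenated afterwards, might look roughly like this in Keras. Input shapes and the binary target are assumptions for illustration, not the thesis's exact setup:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical shapes: 10 years of giving history with 4 features per year,
# plus 6 time-invariant demographic features per alumnus.
seq_in = tf.keras.Input(shape=(10, 4), name="giving_history")
static_in = tf.keras.Input(shape=(6,), name="demographics")

h = layers.LSTM(32)(seq_in)                # summarize the time-dependent sequence
h = layers.Concatenate()([h, static_in])   # concatenate static data after the LSTM
h = layers.Dense(16, activation="relu")(h)
out = layers.Dense(1, activation="sigmoid", name="will_donate")(h)

model = Model(inputs=[seq_in, static_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.summary()
```

Swapping `layers.LSTM(32)` for `layers.GRU(32)` gives the GRU variant compared in the study.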
28

A Data Layout Descriptor Language (LADEL).

Jeelani, Ashfaq Ahmed 01 May 2001 (has links) (PDF)
To transfer data between devices and main memory, standard C block I/O interfaces use block buffers of type char. C++ programs that perform block I/O commonly use typecasting to move data between structures and block buffers. The subject of this thesis, the layout description language (LADEL), represents a high-level solution to the problem of block buffer management. LADEL provides operators that hide the casting ordinarily required to pack and unpack buffers and guard against overflow of the virtual fields. LADEL also allows a programmer to dynamically define a structured view of a block buffer's contents. This view includes the use of variable-length field specifiers, which supports the development of a general specification for an I/O block that optimizes the use of preset buffers. The need for optimizing buffer use arises in file-processing algorithms that perform optimally when I/O buffers are filled to capacity. Packing a buffer to capacity can require reasonably complex C++ code; LADEL reduces this complexity to a considerable extent. C++ programs written using LADEL are less complex, easier to maintain, and easier to read than equivalent programs written without LADEL. This increase in maintainability is achieved at a cost of approximately 11% additional time compared to programs that use casting to manipulate block buffer data.
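LADEL's own syntax is not reproduced in the abstract, but the idea of a declarative layout descriptor that hides casting and guards against overflow has a rough analogue in Python's struct module, sketched below for comparison only:

```python
import struct

# A declarative layout: little-endian, 32-bit record id, 8-byte fixed
# name field, 64-bit double. The format string plays the role of a
# layout descriptor; pack/unpack hide the byte-level casting.
RECORD = struct.Struct("<I8sd")

buf = bytearray(64)                       # a small "block buffer"
offset = 0
for rid, name, value in [(1, b"alpha", 3.14), (2, b"beta", 2.71)]:
    if offset + RECORD.size > len(buf):   # guard against buffer overflow
        raise BufferError("block full")
    RECORD.pack_into(buf, offset, rid, name.ljust(8, b"\0"), value)
    offset += RECORD.size

for off in range(0, offset, RECORD.size):
    rid, name, value = RECORD.unpack_from(buf, off)
    print(rid, name.rstrip(b"\0"), value)
```

LADEL goes further by supporting variable-length field specifiers and dynamically defined views, which a fixed format string cannot express.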
29

Predicting multibody assembly of proteins

Rasheed, Md. Muhibur 25 September 2014 (has links)
This thesis addresses the multi-body assembly (MBA) problem in the context of protein assemblies. [...] In this thesis, we chose the protein assembly domain because accurate and reliable computational modeling, simulation, and prediction of such assemblies would clearly accelerate discoveries in understanding the complexities of metabolic pathways, identifying the molecular basis for normal health and disease, and designing new drugs and other therapeutics. [...] [We developed] F²Dock (Fast Fourier Docking), which uses a multi-term scoring function that includes both a statistical thermodynamic approximation of molecular free energy and several knowledge-based terms. Parameters of the scoring model were learned from a large set of positive/negative examples; when tested on 176 protein complexes of various types, the model showed excellent accuracy in ranking correct configurations higher (F²Dock ranks the correct solution as the top-ranked one in 22/176 cases, which is better than other unsupervised prediction software on the same benchmark). Most of the protein-protein interaction scoring terms can be expressed as integrals of distance-dependent decaying kernels over the occupied volume, the boundary, or a set of discrete points (atom locations). We developed a dynamic adaptive grid (DAG) data structure which computes smooth surface and volumetric representations of a protein complex in O(m log m) time, where m is the number of atoms, assuming that the smallest feature size h is Θ(r_max), where r_max is the radius of the largest atom; supports updates in O(log m) time; and uses O(m) memory. We also developed the dynamic packing grids (DPG) data structure, which supports quasi-constant-time updates (O(log w)) and spherical neighborhood queries (O(log log w)), where w is the word size of the RAM. Together, DPG and DAG yield O(k)-time approximation of the scoring terms, where k << m is the size of the contact region between proteins. [...] [W]e consider the symmetric spherical shell assembly case, where multiple copies of identical proteins tile the surface of a sphere. Though this is a restricted subclass of MBA, it is an important one, since it would accelerate the development of drugs and antibodies that prevent viruses from forming capsids, which have such spherical symmetry in nature. We proved that it is possible to characterize the space of possible symmetric spherical layouts using a small number of representative local arrangements (called tiles) and their global configurations (tilings). We further show that the tilings, and the mapping of proteins to tilings on arbitrarily sized shells, are parameterized by 3 discrete parameters and 6 continuous degrees of freedom, and that the 3 discrete DOF can be restricted to a constant number of cases if the size of the shell is known (in terms of the number of proteins n). We also consider the case where a coarse model of the whole protein complex is available. We show that even when such coarse models do not reveal atomic positions, they can be sufficient to identify a general location for each protein and its neighbors, and thereby restrict the configurational space. We developed an iterative refinement search protocol that leverages such multi-resolution structural data to predict accurate high-resolution models of protein complexes, and successfully applied the protocol to model gp120, a protein on the spike of HIV and currently the most feasible target for anti-HIV drug design.
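As a rough illustration of the kind of query the DPG supports, here is a simplified uniform-grid structure for spherical neighborhood queries over atom centers. It conveys the bucketing idea only; the actual DPG achieves the stronger quasi-constant bounds cited above:

```python
from collections import defaultdict
from itertools import product
import math

class PackingGrid:
    """Simplified uniform grid for spherical neighborhood queries over
    atom centers; cell size is chosen near the query radius so only a
    small, fixed neighborhood of cells needs scanning."""
    def __init__(self, cell):
        self.cell = cell
        self.cells = defaultdict(list)

    def _key(self, p):
        return tuple(int(math.floor(c / self.cell)) for c in p)

    def insert(self, p):
        self.cells[self._key(p)].append(p)

    def neighbors(self, center, radius):
        r = int(math.ceil(radius / self.cell))
        kx, ky, kz = self._key(center)
        out = []
        for dx, dy, dz in product(range(-r, r + 1), repeat=3):
            for q in self.cells.get((kx + dx, ky + dy, kz + dz), []):
                if math.dist(center, q) <= radius:
                    out.append(q)
        return out

g = PackingGrid(cell=2.0)
for atom in [(0.0, 0.0, 0.0), (1.5, 0.2, 0.0), (5.0, 5.0, 5.0)]:
    g.insert(atom)
print(g.neighbors((1.0, 0.0, 0.0), radius=2.0))  # -> the two nearby atoms
```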
30

Optimal synthesis of sensor networks

Gerkens, Carine 02 October 2009 (has links)
To allow monitoring and control of chemical processes, a sensor network has to be installed. It must allow the estimation of all important variables of the process. However, all measurements are subject to error, not every variable can be measured, and some types of sensors are expensive. Data reconciliation makes it possible to correct the measurements, estimate the values of unmeasured variables, and compute the a posteriori uncertainties of all variables. However, the a posteriori standard deviations are a function of the number, location, and precision of the measurement devices installed. A general method has been developed to design the cheapest sensor network able to estimate all key process variables within a prescribed accuracy for steady-state processes. The method estimates a posteriori variances through an analysis of the sensitivity matrix. The objective function of the optimization problem depends on the annualized cost of the sensor network and on the accuracies that can be reached for the key variables. The problem is solved by means of a genetic algorithm. To reduce the computing time, two parallelization techniques using the Message Passing Interface (MPI) were examined: global parallelization and distributed genetic algorithms. Both methods were tested on several examples. To extend the method to dynamic processes, a dynamic data reconciliation method capable of estimating a posteriori variances was needed. A Kalman filtering approach and an orthogonal-collocation-based moving-horizon method were compared. The a posteriori variance computation was developed using a method similar to the one used in the steady-state case, and the method was validated on several small examples. Based on this variance estimation, an observability criterion was defined for dynamic systems, so that the sensor network design algorithm could be adapted to the dynamic case. Another problem sensor networks must help solve is the detection and localization of process faults. The method was adapted to generate sensor networks that can detect and locate process faults from a list of candidate faults for steady-state processes.
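A minimal sketch of the genetic-algorithm formulation: a binary chromosome selects which candidate sensors to install, and the fitness combines cost with a penalty when a key variable misses its accuracy target. The variance model below is a toy stand-in for the sensitivity-matrix computation, and all numbers are invented:

```python
import random
random.seed(1)

# Toy problem: 8 candidate sensors, each measuring one process variable.
cost   = [5, 3, 4, 6, 2, 3, 5, 4]
var    = [0.1, 0.4, 0.2, 0.05, 0.5, 0.3, 0.15, 0.25]  # sensor variances
covers = [0, 0, 1, 1, 2, 2, 3, 3]                     # variable index measured
target = [0.2, 0.2, 0.3, 0.2]                         # required key-variable variances

def fitness(chrom):
    # Stand-in for the sensitivity-matrix variance computation: combine
    # the variances of installed sensors on each variable in parallel.
    post = [float("inf")] * 4
    for i, bit in enumerate(chrom):
        if bit:
            v = covers[i]
            post[v] = var[i] if post[v] == float("inf") else \
                      1.0 / (1.0 / post[v] + 1.0 / var[i])
    penalty = sum(1000 for v in range(4) if post[v] > target[v])
    return sum(c for c, b in zip(cost, chrom) if b) + penalty

def ga(pop_size=30, gens=60):
    pop = [[random.randint(0, 1) for _ in cost] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]              # keep the cheapest feasible half
        children = []
        while len(children) < pop_size - len(elite):
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, len(cost))  # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(len(cost))] ^= 1  # point mutation
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

best = ga()
print("sensors installed:", [i for i, b in enumerate(best) if b],
      "fitness:", fitness(best))
```

In the thesis this fitness evaluation is the expensive step, which is why global parallelization (distributing fitness evaluations) and distributed genetic algorithms (evolving sub-populations on separate processors) were both investigated.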
