91 |
A spaceborne synthetic aperture radar data processor
Welsh, Simon 21 September 2023
This thesis is concerned with the design and implementation of a Synthetic Aperture Radar (SAR) data processor. The processing follows a standard sequential approach and employs commonly used algorithms. It was implemented in C and run on an IBM-compatible personal computer. The raw data came from the Shuttle Imaging Radar B (SIR-B) and was supplied by the Jet Propulsion Laboratory (JPL) in California. The basic functions performed by the software are range and azimuth processing, both of which involve matched filtering of the raw data against reference functions. Compensation for the effects of operating a SAR from orbit, namely planet rotation and radar height, was also implemented. Images of the same area processed by JPL were also available, which allowed direct comparison between the outputs of the two SAR processors. The images produced were passed through a number of filters to improve image quality and compared favourably with the JPL-generated images. The images themselves are included in the later sections of the thesis.
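The matched filtering mentioned in the abstract above can be illustrated with a minimal sketch. It is written in Python rather than the thesis's C, and the chirp parameters, the function names `linear_fm_chirp` and `matched_filter`, and the simulated echo are all assumptions made for illustration, not values or code from the thesis.

```python
import cmath

def linear_fm_chirp(n_samples, rate):
    """Unit-amplitude linear FM chirp with phase pi * rate * t^2 (illustrative parameters)."""
    return [cmath.exp(1j * cmath.pi * rate * (k - n_samples / 2) ** 2)
            for k in range(n_samples)]

def matched_filter(raw, reference):
    """Correlate `raw` with the complex conjugate of `reference` at every full-overlap lag."""
    n_lags = len(raw) - len(reference) + 1
    out = []
    for lag in range(n_lags):
        acc = 0j
        for k, ref_sample in enumerate(reference):
            acc += raw[lag + k] * ref_sample.conjugate()
        out.append(acc)
    return out

# Simulated range line: a single echo of the chirp starting at sample 20.
reference = linear_fm_chirp(32, 0.01)
raw_line = [0j] * 100
for k, value in enumerate(reference):
    raw_line[20 + k] += value

compressed = matched_filter(raw_line, reference)
peak_index = max(range(len(compressed)), key=lambda i: abs(compressed[i]))
```

The correlation peaks at the lag where the echo starts, which is how a spread-out chirp is compressed into a sharp response. A production processor would normally do this with FFTs instead of the direct double loop shown here.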
|
92 |
Bayesian Spatiotemporal Modeling with Gaussian Processes
He, Qing 01 January 2022
Bayesian spatiotemporal models have been successfully applied to various fields of science, such as ecology and epidemiology. The complicated nature of spatiotemporal patterns can be well represented through priors such as Gaussian processes. This dissertation focuses on two applications of Bayesian spatiotemporal models: a) anomaly detection for spatiotemporal data with missingness and b) zero-inflated spatiotemporal count data analysis. Missingness in spatiotemporal data prevents anomaly detection algorithms from learning characteristic rules and patterns, because most of the data is unavailable. This project is motivated by a challenge posed by the National Science Foundation (NSF) and the National Geospatial-Intelligence Agency (NGA). The proposed model uses traffic patterns at nearby hours of the same day and at the same time on different days of the week to recover the complete data. We compare the proposed model with the baseline and other models on the given dataset. It is also tested on a new dataset by the challenge organizer. In the zero-inflated spatiotemporal data analysis, a set of latent variables from Pólya-Gamma distributions is introduced into the Bayesian zero-inflated negative binomial model. The parameters of interest have conjugate priors conditional on the latent variables, which facilitates efficient posterior Markov chain Monte Carlo sampling. Varying spatial and temporal random effects are accommodated through Gaussian processes. To overcome the computational bottleneck that Gaussian processes may suffer when the sample size is large, a nearest-neighbor Gaussian process approach is implemented by constructing a sparse covariance matrix. The proposed Bayesian zero-inflated nearest-neighbor Gaussian process model has been applied to simulated and COVID-19 data.
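For readers unfamiliar with the zero-inflated negative binomial model named in the abstract above, the following is a minimal sketch of its probability mass function. The parameterization (`r` successes, success probability `p`, zero-inflation weight `pi_zero`) and the function names are assumptions chosen for illustration; the dissertation may use a different parameterization, and nothing here reproduces its Pólya-Gamma sampler.

```python
import math

def neg_binomial_pmf(y, r, p):
    """NB pmf: probability of y failures before the r-th success (success probability p)."""
    log_coeff = math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
    return math.exp(log_coeff + r * math.log(p) + y * math.log(1.0 - p))

def zinb_pmf(y, pi_zero, r, p):
    """Zero-inflated NB: with probability pi_zero the count is a structural zero,
    otherwise it is drawn from the negative binomial distribution."""
    base = (1.0 - pi_zero) * neg_binomial_pmf(y, r, p)
    return pi_zero + base if y == 0 else base
```

The extra mass at zero is what lets the model fit count data with more zeros than a plain negative binomial would predict.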
|
93 |
Operationalizing Data Culture: The US Army's Engagements With Data Science 1961-2023
Jantzen, Linda Carol 21 August 2024
Culture frames what an institution values, reveres, and rewards. It emerges over an extended period, sometimes deliberately, often indirectly. As a subset of organizational culture, a common understanding of both data and culture is needed in order to build the data culture the Army desires. This study examines data culture within the context of the US Army and its interactions with data science over the past six decades. It uses Science and Technology Studies (STS) scholarship to analyze Army data culture from the perspectives of leadership, expertise, technology, and structure and practices to better understand how it can be shaped to better support the Army's goals. This study posits that rather than adopting a data culture as something entirely new, the Army would be better served by an understanding of the data culture it already has, made up of entrenched policy and operational approaches perpetuated over decades, some of which are unsuitable for the current and future environment. A second posit is that Army data culture is situated within a broader context and cannot be understood independently of the external cultures and social systems with which it interacts. And third, STS scholarship is uniquely suited to inform this type of analysis. I conclude that the Army should focus resources on educating leaders on how to assess, build and sustain positive data cultures in their organizations. / Doctor of Philosophy
|
94 |
ARCHITECTURE FOR A NEXT GENERATION TELEMETRY AND DATA ACQUISITION BUS
DAWSON, D.M. 11 1900
International Telemetering Conference Proceedings / November 04-07, 1991 / Riviera Hotel and Convention Center, Las Vegas, Nevada / During the requirements definition process for a new telemetry and data acquisition product, Veda Systems engineers had the opportunity to examine the requirements for the ideal bus architecture to support future needs. Design goals and requirements were solicited from major users in flight test, space ground station data monitoring and command applications, and C4I, as well as Veda's own engineers. The process resulted in a bus architecture design which could potentially set the standard for the next generation of telemetry and data acquisition systems. This paper outlines the design goals selected and the thought process that yielded them, in an attempt to promote advancement of current bus design approaches and increased availability of standard architectures and operating environments.
|
95 |
Data Visualization for the Benchmarking Engine
Joish, Sudha 16 May 2003
In today's information age, data collection is not the ultimate goal; it is simply the first step in extracting knowledge-rich information to shape future decisions. In this thesis, we present ChartVisio - a simple web-based visual data-mining system that lets users quickly explore databases and transform raw data into processed visuals. It is highly interactive, easy to use and hides the underlying complexity of querying from its users. Data from tables is internally mapped into charts using aggregate functions across tables. The tool thus integrates querying and charting into a single general-purpose application. ChartVisio has been designed as a component of the Benchmark data engine, being developed at the Computer Science department, University of New Orleans. The data engine is an intelligent website generator and users who create websites using the Data Engine are the site owners. Using ChartVisio, owners may generate new charts and save them as XML templates for prospective website surfers. Everyday Internet users may view saved charts with the touch of a button and get real-time data, since charts are generated dynamically. Website surfers may also generate new charts, but may not save them as templates. As a result, even non-technical users can design and generate charts with minimal time and effort.
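The step described above, mapping table data into chart series through aggregate functions, might look roughly like the sketch below. The function name `aggregate_for_chart`, the column names, and the sample rows are hypothetical; this is not ChartVisio's code or its XML template format.

```python
from collections import defaultdict

def aggregate_for_chart(rows, category_key, value_key, func=sum):
    """Group rows by a category column and reduce each group with an aggregate function,
    giving one (category -> value) pair per bar or pie slice."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[value_key])
    return {category: func(values) for category, values in sorted(groups.items())}

# Hypothetical table rows standing in for a database query result.
sales = [
    {"region": "North", "amount": 120},
    {"region": "South", "amount": 80},
    {"region": "North", "amount": 30},
]
series = aggregate_for_chart(sales, "region", "amount")
peaks = aggregate_for_chart(sales, "region", "amount", func=max)
```

Passing a different `func` (for example `max` or `len`) changes the aggregate without touching the grouping, which is one simple way to hide query details from the person designing the chart.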
|
96 |
Interactive data mining and visualization on multi-dimensional data.
January 1999
by Chu, Hong Ki. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. / Includes bibliographical references (leaves 75-79). / Abstracts in English and Chinese. / Acknowledgments / Abstract / Chapter 1 --- Introduction / Chapter 1.1 --- Problem Definitions / Chapter 1.2 --- Experimental Setup / Chapter 1.3 --- Outline of the thesis / Chapter 2 --- Survey on Previous Researches / Chapter 2.1 --- Association rules / Chapter 2.2 --- Clustering / Chapter 2.3 --- Motivation / Chapter 3 --- IDAN on discovering quantitative association rules / Chapter 3.1 --- Briefing / Chapter 3.2 --- A-Tree / Chapter 3.3 --- Insertion Algorithm / Chapter 3.4 --- Visualizing Association Rules / Chapter 4 --- IDAN on discovering patterns of clustering / Chapter 4.1 --- Briefing / Chapter 4.2 --- A-Tree / Chapter 4.3 --- Dimensionality Curse / Chapter 4.3.1 --- Discrete Fourier Transform / Chapter 4.3.2 --- Discrete Wavelet Transform / Chapter 4.3.3 --- Singular Value Decomposition / Chapter 4.4 --- IDAN - Algorithm / Chapter 4.5 --- Visualizing clustering patterns / Chapter 4.6 --- Comparison / Chapter 5 --- Performance Studies / Chapter 5.1 --- Association Rules / Chapter 5.2 --- Clustering / Chapter 6 --- Survey on data visualization techniques / Chapter 6.1 --- Geometric Projection Techniques / Chapter 6.1.1 --- Scatter-plot Matrix / Chapter 6.1.2 --- Parallel Coordinates / Chapter 6.2 --- Icon-based Techniques / Chapter 6.2.1 --- Chernoff Face / Chapter 6.2.2 --- Stick Figures / Chapter 6.3 --- Pixel-oriented Techniques / Chapter 6.4 --- Hierarchical Techniques / Chapter 7 --- Conclusion / Bibliography
|
97 |
Bioinformatic mining and analysis of genetic elements in genomes. / CUHK electronic theses & dissertations collection
January 2013
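Discovering important functional elements in massive biological datasets, and revealing their functional characteristics and the underlying biological mechanisms, is a major challenge of the post-genomic era. Here, bioinformatic theory and methods are applied to selected genomes to systematically mine and analyze genomic islands and post-translational modification systems. / First, more than 70,000 experimentally verified post-translational modification (PTM) events from 7 eukaryotes were collected. The features and functions of proteins regulated by multiple types of PTMs (MTP-proteins) were analyzed against proteins carrying no PTM target sites. (1) MTP-proteins show a significant tendency to form protein complexes, interact with more proteins, and preferentially act as hubs in protein-protein interaction networks. (2) MTP-proteins also have distinct functional preferences and specific subcellular localizations. (3) About 80% of PTM sites lie in disordered regions of proteins, and MTP-proteins have more disordered regions than proteins not regulated by PTMs. (4) MTP-proteins with fewer disordered regions are mainly associated with the formation of protein-DNA complexes. (5) Only a small fraction of single PTM events change the binding energy by more than 2 kcal/mol, but combinations of PTM types, such as phosphorylation plus acetylation, have a much larger effect on binding energy. / Next, the components of the ubiquitination system (ubiquitin, E1, E2, E3 and E3 substrates) were annotated and compared across 74 fungal genomes. (1) Mushroom-forming genomes contain significantly more ubiquitin than other basidiomycete genomes. (2) Although the number of E1 varies very little among the genomes studied, the number of E2 in mushroom-forming genomes is significantly higher than in other basidiomycetes. (3) Among E3 candidates, the numbers of Paracaspase and F-box domains are also significantly higher in mushroom-forming genomes than in other basidiomycetes. These results suggest that the ubiquitination system probably plays an important role in fungal morphological differentiation, especially mushroom formation. / Then, compared with the whole genome, genomic islands were found to be significantly enriched in transcription initiation signals. Based on this distinctive transcriptional regulatory signal, a new genomic island prediction program, named GIST, was designed. Analysis shows that GIST has high sensitivity and accuracy. Finally, GIST was used to detect and analyze, for the first time, the genomic islands in strain TY-2482 from the recent outbreak in Germany. / In summary, this work not only greatly expands our understanding of specific functional elements, such as MTP-proteins and genomic islands, but also provides important tools and clues for further research, such as GIST and the ubiquitination system in mushroom-forming genomes. / In the post-genomic era, it is a huge challenge to detect functional elements in the "ocean" of data and to draw meaningful biological inferences from them. Here, a number of functional elements are characterized and analyzed in targeted genomes. / First, more than 70,000 experimentally determined post-translational modification (PTM) events from 7 eukaryotic organisms were compiled, and the features and functions of proteins regulated by multiple types of PTMs (Mtp-Proteins) were analyzed by comparison with proteins harboring no known PTM target site. (1) Mtp-Proteins are significantly enriched in protein complexes, have more protein partners, and tend to act as hubs in the protein-protein interaction (PPI) network. (2) Mtp-Proteins also have a distinct functional focus and biased subcellular locations. (3) Overall, about 80% of the analyzed PTM events are embedded in intrinsically disordered regions (IDRs), and most Mtp-Proteins have more IDRs than proteins without PTM sites.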
This suggests that IDRs may largely account for why some proteins can carry so many distinctive functions. (4) Interestingly, particular Mtp-Proteins whose PTMs are biased toward ordered regions are mainly related to "protein-DNA complex assembly". (5) We further evaluated the energetic effects of PTMs on the stability of PPIs and found that only a small fraction of single PTM events change the binding energy by more than 2 kcal/mol, whereas combinations of PTM types, e.g. phosphorylation together with acetylation, can change the binding energy dramatically. / In the second part, the components of the ubiquitination system (ubiquitin, E1, E2, E3 and the substrates of E3) are identified and compared across 74 fungal genomes. The main results are: (1) The number of ubiquitins is significantly higher in mushroom-forming genomes than in other basidiomycota genomes. (2) The number of E1, 2.92 on average, is consistent among most genomes, whereas the number of E2 differs between mushroom-forming genomes and other basidiomycota genomes. (3) Among the E3 candidates, the numbers of Paracaspase and F-box domains are significantly higher in mushroom-forming genomes than in other basidiomycota genomes. These results suggest that the ubiquitination system may play a vital role in the divergence of fungal morphogenesis, especially in mushroom formation. / Then, the focus shifts to genomic islands (GIs). Compared with the whole genome, transcription initiation positions are first found to be highly enriched in GI regions. Based on this heterogeneous transcriptional regulatory signal, a novel procedure for genomic island detection, GIST (Genome-island Identification by Signals of Transcription), is designed.
Interestingly, our method demonstrates higher sensitivity in detecting genomic islands harboring genes with biased GI-like function, preferential subcellular localization, skewed GC property and shorter gene length. Finally, using GIST, many interesting GIs are detected and analyzed in the German outbreak strain TY-2482 for the first time. / In summary, this work not only considerably expands our understanding of several functional genetic elements, such as genomic islands and proteins regulated by combinations of multiple PTMs, but also provides important tools and clues, such as GIST and the potential E3 expansion in mushroom-forming fungi, for further related studies. / Detailed summary in vernacular field only. / Huang, Qianli. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 161-186). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese.
/ Abstract / Abstract (in Chinese) / Abbreviations / Acknowledgements / Declaration / Table of Contents / List of Figures / List of Tables / Chapter 1 --- Literature Review / Chapter 1.1 --- General introduction / Chapter 1.2 --- Post-translational modification / Chapter 1.2.1 --- Combinational multiple types of post-translational modification / Chapter 1.3 --- Genomic islands / Chapter 1.3.1 --- Brief introduction / Chapter 1.3.2 --- Bioinformatic tools and database for identification of Genomic islands / Chapter 1.4 --- Objectives and significance / Chapter 2 --- Systematic analysis on features and functions of proteins regulated by combinational multiple types of post-translational modifications / Chapter 2.1 --- Introduction / Chapter 2.2 --- Materials and Methods / Chapter 2.2.1 --- Annotation of PTM pattern and analyses on target residues / Chapter 2.2.2 --- Classification of Human Proteins / Chapter 2.2.3 --- Dataset of human protein-protein interactions (PPIs) and Construction of PPI network / Chapter 2.2.4 --- Calculation of Binding Energy / Chapter 2.2.5 --- Functional characterization and subcellular localization analysis / Chapter 2.2.6 --- Annotating IDR regions / Chapter 2.2.7 --- Statistical analyses / Chapter 2.3 --- Results / Chapter 2.3.1 --- Combinational interactions of multiple PTM types are undergoing evolutionary selection / Chapter 2.3.2 --- Evolutionary profile of modified amino acid residues / Chapter 2.3.3 --- Mtp-Proteins are enriched in the protein complex / Chapter 2.3.4 --- Multiple PTMs enable target protein function as hub or super-hub in PPI network / Chapter 2.3.5 --- Energetic effect of PTMs on the Stability of protein-protein binding / Chapter 2.3.6 --- Mtp-Proteins demonstrate distinct function focus / Chapter 2.3.7 --- Mtp-Proteins: located preferentially in Cytoplasm and Nucleus / Chapter 2.3.8 --- Why Mtp-Proteins possess so many special features: importance of IDR / Chapter 2.4 --- Discussion / Chapter 2.4.1 --- The hints from the features of Mtp-Proteins / Chapter 2.4.2 --- The implication of combinational interaction between two different functional PTM categories: biased locating in IDRs and ordered regions respectively / Chapter 3 --- Genome-wide comparative analyses of ubiquitome among basidiomycota and other typical fungi genomes / Chapter 3.1 --- Introduction / Chapter 3.2 --- Materials and Methods / Chapter 3.2.1 --- Genome sequences and annotation acquirement / Chapter 3.2.2 --- Bioinformatic prediction of components in ubiquitome / Chapter 3.3 --- Results / Chapter 3.3.1 --- Identification of ubiquitin candidates among 74 fungi genomes / Chapter 3.3.2 --- Detection of potential E1 and E2 among all considered genomes / Chapter 3.3.3 --- Prediction and comparative analysis of different types of E3 / Chapter 3.3.4 --- The possible substrates of E3 / Chapter 3.4 --- Discussion / Chapter 4 --- Genomic islands Identification by Signals of Transcription / Chapter 4.1 --- Introduction / Chapter 4.2 --- Materials and Methods / Chapter 4.2.1 --- Genome sequence and annotation data / Chapter 4.2.2 --- Transcription start points (TSPs) scanning / Chapter 4.2.3 --- Genomic island dataset construction / Chapter 4.2.4 --- GIST: Genomic-island Identification by Signal of Transcription / Chapter 4.2.5 --- Functional characterization and subcellular localization analysis / Chapter 4.2.6 --- Codon usage, GC content and gene length / Chapter 4.2.7 --- Statistical analyses / Chapter 4.3 --- Results / Chapter 4.3.1 --- High-density transcriptional initiation signals associated with GIs / Chapter 4.3.2 --- Predict the potential novel GIs through GIST: Genomic-island Identification by Signal of Transcription / Chapter 4.3.3 --- Comparative Analysis: Distribution of gene function categories / Chapter 4.3.4 --- Comparative Analysis: Divergence of subcellular locations / Chapter 4.3.5 --- Comparative Analysis: GC property and gene length / Chapter 4.3.6 --- Hints of "non-optimal" codon usage bias / Chapter 4.3.7 --- Application of GIST to analyze GIs in the German E. coli O104:H4 outbreak strain / Chapter 4.4 --- Discussion / Chapter 5 --- Concluding remarks / References
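The abstract of this record reports that transcription start points are enriched inside genomic islands. A rough sketch of how such enrichment could be scanned for with a sliding window follows. It is an illustration under assumed parameters (window size, step, fold threshold) and a hypothetical function name `enriched_windows`; it is not the published GIST procedure.

```python
def enriched_windows(tsp_positions, genome_length, window=10_000, step=5_000, fold=4.0):
    """Return (start, end, count) for every window whose density of transcription start
    points (TSPs) is at least `fold` times the genome-wide average density."""
    genome_density = len(tsp_positions) / genome_length
    hits = []
    for start in range(0, genome_length - window + 1, step):
        end = start + window
        count = sum(1 for pos in tsp_positions if start <= pos < end)
        if count / window >= fold * genome_density:
            hits.append((start, end, count))
    return hits

# Toy genome: one TSP every 5 kb, plus a dense cluster between 40 kb and 49 kb.
background = list(range(0, 100_000, 5_000))
cluster = [40_001 + 300 * i for i in range(30)]
hits = enriched_windows(background + cluster, genome_length=100_000)
```

In practice, overlapping flagged windows would be merged into candidate regions and then checked against other island signatures such as GC content and gene length, which the thesis also examines.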
|
98 |
A study of frequent pattern and association rule mining: with applications in inventory update and marketing.
January 2004
Wong, Chi-Wing. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 149-153). / Abstracts in English and Chinese. / Abstract / Acknowledgement / Chapter 1 --- Introduction / Chapter 1.1 --- MPIS / Chapter 1.2 --- ISM / Chapter 1.3 --- MPIS and ISM / Chapter 1.4 --- Thesis Organization / Chapter 2 --- MPIS / Chapter 2.1 --- Introduction / Chapter 2.2 --- Related Work / Chapter 2.2.1 --- Item Selection Related Work / Chapter 2.3 --- Problem Definition / Chapter 2.3.1 --- NP-hardness / Chapter 2.4 --- Cross Selling Effect by Association Rules / Chapter 2.5 --- Quadratic Programming Method / Chapter 2.6 --- Algorithm MPIS_Alg / Chapter 2.6.1 --- Overall Framework / Chapter 2.6.2 --- Enhancement Step / Chapter 2.6.3 --- Implementation Details / Chapter 2.7 --- Genetic Algorithm / Chapter 2.7.1 --- Crossover / Chapter 2.7.2 --- Mutation / Chapter 2.8 --- Performance Analysis / Chapter 2.8.1 --- Preparation Phase / Chapter 2.8.2 --- Main Phase / Chapter 2.9 --- Experimental Result / Chapter 2.9.1 --- Tools for Quadratic Programming / Chapter 2.9.2 --- Partition Matrix Technique / Chapter 2.9.3 --- Data Sets / Chapter 2.9.4 --- Empirical Study for GA / Chapter 2.9.5 --- Experimental Results / Chapter 2.9.6 --- Scalability / Chapter 2.10 --- Conclusion / Chapter 3 --- ISM / Chapter 3.1 --- Introduction / Chapter 3.2 --- Related Work / Chapter 3.2.1 --- Network Model / Chapter 3.3 --- Problem Definition / Chapter 3.4 --- Association Based Cross-Selling Effect / Chapter 3.5 --- Quadratic Programming / Chapter 3.5.1 --- Quadratic Form / Chapter 3.5.2 --- Algorithm / Chapter 3.5.3 --- Example / Chapter 3.6 --- Hill-Climbing Approach / Chapter 3.6.1 --- Efficient Calculation of Formula of Profit Gain / Chapter 3.6.2 --- FP-tree Implementation / Chapter 3.7 --- Empirical Study / Chapter 3.7.1 --- Data Set / Chapter 3.7.2 --- Experimental Results / Chapter 3.8 --- Conclusion / Chapter 4 --- Conclusion / Bibliography
|
99 |
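Uma Estratégia de Desenvolvimento de Data Warehouse Geográfico com Integração Híbrida Aplicada ao Monitoramento de Queimadas na Amazônia [A Development Strategy for a Geographic Data Warehouse with Hybrid Integration, Applied to Fire Monitoring in the Amazon]
Hiléia da Silva Melo, Áurea January 2003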
Previous issue date: 2003 / Decision support systems have crossed the boundaries of commercial business and now reach the most diverse areas of knowledge. Whether in educational, governmental, or private institutions, for-profit or not, wherever there is a decision-making process there will be a need for systems of this kind. Prominent in this category is the Data Warehouse (DW), which provides integrated, historical data so that its users can make decisions based on facts rather than speculation. Although the DW has existed for almost twenty years, there is still no formal methodology for its implementation. The same holds for Geographic Data Warehouses (GDW), that is, DW environments that process and store georeferenced data. In this context, this work defines a GDW development strategy that covers everything from requirements gathering to data loading, follows the hybrid integration method, and builds on the methodologies of Ralph Kimball and John A. Zachman for traditional DWs. In addition, a case study of the Fire Monitoring System for the Legal Amazon is carried out to evaluate and validate each stage of the implemented strategy.
|
100 |
Data Quality Through Active Constraint Discovery and Maintenance
Chiang, Fei Yen 10 December 2012
Although integrity constraints are the primary means for enforcing data integrity, there are cases in which they are not defined or are not strictly enforced. This leads to inconsistencies in the data, causing poor data quality. In this thesis, we leverage the power of constraints to improve data quality. To ensure that the data conforms to the intended application domain semantics, we develop two algorithms focusing on constraint discovery. The first algorithm discovers a class of conditional constraints, which hold over a subset of the relation, under specific conditional values. The second algorithm discovers attribute domain constraints, which bind specific values to the attributes of a relation for a given domain. These two types of constraints have been shown to be useful for data cleaning.
In practice, weak enforcement of constraints often occurs for performance reasons. This leads to inconsistencies between the data and the set of defined constraints. To resolve this inconsistency, we must determine whether it is the constraints or the data that is incorrect, and then make the necessary corrections. We develop a repair model that considers repairs to the data and repairs to the constraints on an equal footing. We present repair algorithms that find the necessary repairs to bring the data and the constraints back to a consistent state. Finally, we study the efficiency and quality of our techniques. We show that our constraint discovery algorithms find meaningful constraints with good precision and recall. We also show that our repair algorithms resolve many inconsistencies with high quality repairs, and propose repairs that previous algorithms did not consider.
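To make the notion of a conditional constraint from the abstract above concrete, the sketch below checks one such rule: among rows that satisfy a condition, one attribute must determine another. The function name `cfd_violations`, the attribute names, and the sample rows are hypothetical, and this is only a violation check, not the thesis's discovery or repair algorithms.

```python
from collections import defaultdict

def cfd_violations(rows, condition, lhs, rhs):
    """Among rows matching `condition` (attribute -> required value), report every `lhs`
    value that maps to more than one `rhs` value, i.e. that violates lhs -> rhs."""
    seen = defaultdict(set)
    for row in rows:
        if all(row.get(attr) == value for attr, value in condition.items()):
            seen[row[lhs]].add(row[rhs])
    return {key: values for key, values in seen.items() if len(values) > 1}

# Hypothetical rule: for UK addresses only, the zip code determines the city.
rows = [
    {"country": "UK", "zip": "EH1", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH1", "city": "Glasgow"},
    {"country": "US", "zip": "10001", "city": "New York"},
    {"country": "US", "zip": "10001", "city": "NYC"},
]
violations = cfd_violations(rows, {"country": "UK"}, "zip", "city")
```

The US rows disagree as well, but they fall outside the condition and are not reported. Whether a reported violation should be fixed by changing the data or by relaxing the rule is exactly the question the repair model in the abstract addresses.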
|