311

Estimating Per-pixel Classification Confidence of Remote Sensing Images

Jiang, Shiguo 19 December 2012 (has links)
No description available.
312

Quality of Data in Scope 3 Sustainability Reporting for the Mining and Extractive Industry

Bratan, Dastan, Jacob, Steve Anthony January 2024 (has links)
Climate change is the most pressing environmental challenge of our time, with global temperatures rising and extreme weather events becoming more frequent. The corporate sector, particularly large industrial entities, faces increasing scrutiny due to its significant contributions to global emissions. This thesis examines Scope 3 emissions reporting within the mining and extractive industries, focusing on data quality and comparing Nordic and international practices. Using a self-developed theoretical model of logic, concepts, metrics, and tools, this research investigates how these industries report their Scope 3 emissions and identifies gaps in current practices. The study uses an abductive approach, combining inductive and deductive methods, supported by a mixed-method design that combines empirical data from publicly available sustainability reports with an online questionnaire targeting mining and extractive companies and their sustainability managers. Key findings reveal considerable variability in how companies report Scope 3 emissions, with Nordic companies often lagging behind their international counterparts despite strong sustainability credentials. Data quality concerns, including issues of accuracy, completeness, and timeliness, undermine stakeholders' ability to make informed decisions. Additionally, the research highlights the diverse tools and methodologies employed by companies, noting that the lack of clear guidelines often hinders their effectiveness. This thesis contributes to a deeper understanding of Scope 3 emissions management, emphasizing the need for standardized and effective reporting practices.
313

Machine Learning Methods for Data Quality Aspects in Edge Computing Platforms

Mitra, Alakananda 12 1900 (has links)
In this research, three aspects of data quality with regard to artificial intelligence (AI) have been investigated: detection of misleading fake data, especially deepfakes; data scarcity; and data insufficiency, in particular how much training data an AI application requires. Application domains in which these aspects pose issues have been chosen, and, to address concerns of data privacy, security, and regulation, the solutions are targeted at edge devices. In Chapter 3, two solutions are proposed that aim to preempt misleading deepfake videos and images on social media; both are deployable on edge devices. In Chapter 4, a deepfake-resilient digital ID system is described. Data scarcity is addressed in Chapter 5 in the agricultural domain, where one such problem is estimating crop damage due to natural disasters. Data insufficiency is another aspect of data quality: the amount of data required to achieve acceptable accuracy in a machine learning (ML) model is studied in Chapter 6. Since the data scarcity problem is studied in agriculture, a similar scenario (plant disease detection and damage estimation) has been chosen for this verification. This research aims to provide ML or deep learning (DL)-based methods that solve several data quality-related issues in different application domains while achieving high accuracy. We hope that this work will contribute to research on the application of machine learning techniques in domains where data quality is a barrier to success.
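The data-insufficiency question studied in Chapter 6 (how much training data is needed to reach acceptable accuracy) is commonly explored with a learning curve. The sketch below is not drawn from the thesis itself; it is a minimal, generic illustration using scikit-learn on a toy dataset, and the dataset, model choice and accuracy threshold are assumptions made purely for the example.

```python
# Hypothetical learning-curve sketch: estimate how much training data a model
# needs to reach a target accuracy. Dataset, model, and threshold are assumed.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)          # stand-in for an image dataset
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Cross-validated accuracy at increasing training-set sizes.
sizes, _, test_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5, scoring="accuracy", n_jobs=-1,
)
mean_acc = test_scores.mean(axis=1)

target = 0.95                                 # assumed "acceptable" accuracy
for n, acc in zip(sizes, mean_acc):
    print(f"{int(n):5d} training samples -> accuracy {acc:.3f}")
enough = sizes[mean_acc >= target]
print("Smallest size reaching target:", enough[0] if len(enough) else "not reached")
```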
314

Three essays on the German capital market

Brückner, Roman 04 April 2013 (has links)
This thesis consists of three empirical essays on the German stock market. Prior empirical research on the German stock market has focused primarily on the top market segment, the Amtlicher Markt in Frankfurt; only a few empirical studies look at lower market segments such as the Geregelter Markt in Frankfurt. The first essay examines whether the performance of stocks in the Geregelter Markt is comparable to that of stocks listed in the Amtlicher Markt. For example, it was unclear whether stocks from the Geregelter Markt performed as disastrously as stocks from the Neuer Markt. We find that the performance of stocks from the Geregelter Markt is comparable to that of the Amtlicher Markt. The second essay examines whether the CAPM of Sharpe (1964), Lintner (1965) and Mossin (1966) explains the cross-section of German stock returns, and additionally evaluates whether the CAPM should be extended by factors for size and book-to-market. Based on our results, we conclude that extending the CAPM by size and book-to-market factors currently does not appear warranted for Germany. Data quality is an important aspect of both essays, so the third essay examines the quality of equity data from Datastream for the German stock market. We conclude that before 1990 Datastream should not be used as the primary data source for empirical studies on the German stock market; after 1990 we find random errors in Datastream, but these are rarely severe.
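For orientation, the display below restates the standard CAPM and its common size and book-to-market extension (the Fama-French three-factor form) named in the second essay. These are textbook formulations, not equations quoted from the thesis.

```latex
% Standard CAPM: expected excess return is proportional to market beta.
\[
  \mathbb{E}[R_i] - R_f = \beta_i \,\bigl(\mathbb{E}[R_m] - R_f\bigr)
\]
% Extension with size (SMB) and book-to-market (HML) factors:
\[
  \mathbb{E}[R_i] - R_f
    = \beta_i \,\bigl(\mathbb{E}[R_m] - R_f\bigr)
    + s_i \,\mathbb{E}[\mathit{SMB}]
    + h_i \,\mathbb{E}[\mathit{HML}]
\]
```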
315

Uso de propriedades visuais-interativas na avaliação da qualidade de dados / Using visual-interactive properties in the data quality assessment

Josko, João Marcelo Borovina 29 April 2016 (has links)
The effects of poor data quality on the reliability of the outcomes of analytical processes are notorious. Improving data quality requires choosing among alternatives that combine procedures, methods, techniques and technologies. The Data Quality Assessment process (DQAp) provides relevant and practical inputs for choosing the most suitable alternative by mapping data defects. Relevant computational approaches support this process; they apply quantitative or assertion-based methods that usually limit the human role to interpreting their outcomes. However, the DQAp strongly depends on knowledge of the data context, since it is impossible to confirm or refute a defect based on the data alone. Hence, human supervision is essential throughout this process. Visualization systems belong to a class of supervised approaches that can make data defect structures visible. Despite the considerable body of design knowledge for such systems, little of it addresses the visual assessment of data quality. This work therefore reports two contributions. The first is a taxonomy that organizes a detailed description of defects in structured, atemporal data related to the quality criteria of accuracy, completeness and consistency; the taxonomy followed a methodology that enabled systematic coverage of data defects and an improved description of them relative to state-of-the-art taxonomies. The second contribution is a set of property-defect relationships establishing that certain visual and interactive properties are more suitable for the visual assessment of certain data defects at a given data resolution. Revealed by an exploratory multiple-case study, these relationships provide guidance that reduces the subjectivity of designing visualization systems for the visual assessment of data quality.
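As a concrete (and entirely illustrative) reading of the three quality criteria covered by the taxonomy, the sketch below computes naive indicators of completeness, accuracy and consistency defects over a small pandas table. The column names, domain rules and thresholds are assumptions for the example, not definitions taken from the thesis.

```python
# Illustrative only: naive per-column indicators for three data quality criteria.
# Column names, domain rules, and the inter-attribute rule are assumptions.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "age":         [34, None, 210, 28],       # None -> completeness, 210 -> accuracy
    "birth_year":  [1991, 1980, 1999, 1997],
    "signup_year": [2010, 2005, 1990, 2015],  # signup 1990 before birth 1999 -> consistency
})

# Completeness: share of non-missing values per column.
completeness = 1.0 - df.isna().mean()

# Accuracy (syntactic): values outside an assumed valid domain for "age".
accuracy_violations = df[(df["age"] < 0) | (df["age"] > 120)]

# Consistency: an assumed inter-attribute rule, signup must not precede birth.
consistency_violations = df[df["signup_year"] < df["birth_year"]]

print(completeness)
print("accuracy violations:\n", accuracy_violations)
print("consistency violations:\n", consistency_violations)
```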
316

工商及服務業普查資料品質之研究 / Data quality research of industry and commerce census

邱詠翔 Unknown Date (has links)
Data quality affects decision quality and the outcomes of the actions based on it, so it has received increasing attention in recent years. This study involves two databases: an industrial innovation survey database and the 2006 (Republic of China year 95) Industry and Commerce Census database. Data quality is also a crucial issue for any database: databases often contain erroneous records, and erroneous records bias analysis results, so data cleaning and consolidation are necessary before analysis. From the population and sample data distributions we see that, before cleaning and consolidation, the average number of employees was 92.08 in the innovation survey and 135.54 in the census sample. After cleaning and consolidation, we compare the correlation, similarity and distance of the employee counts in the two databases; the results show that the two databases are highly consistent, with average employee counts of 39.01 (innovation survey) and 42.12 (census), much closer to the population average of 7.05, which also demonstrates the importance of data cleaning. The method used in this study is post-stratified sampling, and the main objective is to use the industrial innovation survey sample to assess the accuracy of the 2006 Industry and Commerce Census population data. Estimates of the number of employees and of operating revenue in the census population based on the innovation survey sample are both overestimated; we conjecture that this is because the population frame of the innovation survey is the register of the five thousand largest enterprises published by China Credit Information, whereas the census covers general enterprises. We therefore validate against the census sample corresponding to the innovation survey sample, and the results show that the 2006 census sample and the innovation survey sample are highly consistent.
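Post-stratified estimation, the method named in the abstract, weights stratum sample means by known population stratum sizes. The display below is the standard textbook estimator, stated here for orientation rather than taken from the thesis; the notation (strata indexed by h, e.g. industry classes) is assumed.

```latex
% Post-stratified estimator of a population total Y, with H strata,
% known stratum sizes N_h and stratum sample means \bar{y}_h:
\[
  \hat{Y}_{\mathrm{post}} \;=\; \sum_{h=1}^{H} N_h \,\bar{y}_h ,
  \qquad
  \hat{\bar{Y}}_{\mathrm{post}} \;=\; \frac{1}{N}\sum_{h=1}^{H} N_h \,\bar{y}_h ,
  \quad N = \sum_{h=1}^{H} N_h .
\]
```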
317

Skoner en kleiner vertaalgeheues / Cleaner and smaller translation memories

Wolff, Friedel 10 1900 (has links)
Computers can play a useful role in translation. Two approaches are translation memory systems and machine translation systems; both use a translation memory, a bilingual collection of previous translations. This thesis presents methods to improve the quality of a translation memory. A machine learning approach is followed to identify incorrect entries in a translation memory, using a variety of learning features in three categories: features associated with text length, features calculated by quality checkers such as translation checkers, a spell checker and a grammar checker, and statistical features computed with the help of external data. The evaluation of translation memory systems is not yet standardised; this thesis points out a number of problems with existing evaluation methods and develops an improved evaluation method. By removing the incorrect entries from a translation memory, a smaller, cleaner translation memory becomes available to applications. Experiments demonstrate that such a translation memory yields better performance in a translation memory system. As supporting evidence for the value of a cleaner translation memory, an improvement is also achieved in training a machine translation system. / School of Computing / Ph. D. (Computer Science)
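The feature categories listed above lend themselves to a simple tabular classifier. The sketch below is a hypothetical illustration, not the thesis pipeline: it derives one length-based feature and one placeholder checker-style feature for translation-memory entry pairs and trains a generic classifier on hand-labelled examples. The feature definitions, labels and model are all assumptions.

```python
# Hypothetical sketch: flag suspect translation-memory entries with a classifier.
# Features, labels, and model choice are assumptions for illustration only.
from sklearn.ensemble import GradientBoostingClassifier

tm_entries = [
    ("Open the file.", "Maak die lêer oop.", 0),          # 0 = correct entry
    ("Save your work.", "Stoor jou werk.", 0),
    ("Cancel", "Hierdie sin is heeltemal onverwant.", 1),  # 1 = incorrect entry
]

def features(source: str, target: str) -> list[float]:
    # Length-based feature: ratio of target to source length.
    length_ratio = len(target) / max(len(source), 1)
    # Placeholder checker-based feature: share of all-uppercase tokens,
    # standing in for spell/grammar/translation-checker output.
    tokens = target.split()
    suspicious = sum(tok.isupper() for tok in tokens) / max(len(tokens), 1)
    return [length_ratio, suspicious]

X = [features(s, t) for s, t, _ in tm_entries]
y = [label for _, _, label in tm_entries]

clf = GradientBoostingClassifier().fit(X, y)
print(clf.predict([features("Close the window.", "Maak die venster toe.")]))
```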
318

Une approche automatisée basée sur des contraintes d’intégrité définies en UML et OCL pour la vérification de la cohérence logique dans les systèmes SOLAP : applications dans le domaine agri-environnemental / An automated approach based on integrity constraints defined in UML and OCL for the verification of logical consistency in SOLAP systems : applications in the agri-environmental field

Boulil, Kamal 26 October 2012 (has links)
Spatial Data Warehouse (SDW) and Spatial OLAP (SOLAP) systems are Business Intelligence (BI) technologies allowing interactive multidimensional analysis of huge volumes of spatial data. In such systems the quality of analysis depends mainly on three components: the quality of the warehoused data, the quality of data aggregation, and the quality of data exploration. Warehoused data quality depends on elements such as accuracy, completeness and logical consistency. Data aggregation quality is affected by structural problems (e.g., non-strict dimension hierarchies that may cause double-counting of measure values) and semantic problems (e.g., summing temperature values does not make sense in many applications). Data exploration quality is mainly affected by inconsistent user queries (e.g., what were the temperature values in the USSR in 2010?), which can lead to meaningless interpretations of query results. This thesis addresses the problems of logical inconsistency that may affect data, aggregation and exploration quality in SOLAP systems.
Logical inconsistency is usually defined as the presence of incoherencies (contradictions) in data and is typically controlled by means of Integrity Constraints (IC). In this thesis we extend the notion of IC in the SOLAP domain to take aggregation and query incoherencies into account. To overcome the limitations of existing approaches to defining SOLAP IC, we propose a framework based on the standard languages UML and OCL. The framework permits platform-independent conceptual design and automatic implementation of SOLAP IC, and consists of three parts: (1) a classification of SOLAP IC; (2) a UML profile, implemented in the CASE tool MagicDraw, allowing the conceptual design of SOLAP models and their IC; (3) an automatic implementation based on the code generators Spatial OCL2SQL and UML2MDX, which transforms the conceptual specifications into code at the SDW and SOLAP server layers. Finally, the contributions of this thesis have been applied and validated in the context of French national projects aimed at developing (S)OLAP applications for agriculture and the environment.
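To make the two aggregation problems above concrete, the sketch below checks a toy dimension table for hierarchy strictness (each child member should map to exactly one parent, otherwise roll-ups can double-count) and applies a simple rule forbidding Sum over a temperature measure. It is a stand-alone illustration in Python, not the UML/OCL framework described in the thesis; the table contents and rule names are assumed.

```python
# Illustrative check for two SOLAP aggregation problems; not the thesis framework.
import pandas as pd

# Toy dimension hierarchy: city -> region. "Lyon" appearing under two regions
# makes the hierarchy non-strict, so region roll-ups would double-count Lyon.
dim = pd.DataFrame({
    "city":   ["Paris", "Lyon", "Lyon", "Nice"],
    "region": ["Ile-de-France", "Rhone-Alpes", "Auvergne", "PACA"],
})
parents_per_city = dim.groupby("city")["region"].nunique()
non_strict = parents_per_city[parents_per_city > 1]
print("Non-strict members (risk of double-counting):", list(non_strict.index))

# Semantic aggregation rule: some measures must not be summed.
ALLOWED_AGGS = {"temperature": {"avg", "min", "max"}, "rainfall": {"sum", "avg"}}

def check_aggregation(measure: str, func: str) -> None:
    if func not in ALLOWED_AGGS.get(measure, set()):
        raise ValueError(f"{func} is not a meaningful aggregation for {measure}")

check_aggregation("temperature", "avg")       # passes silently
try:
    check_aggregation("temperature", "sum")   # rejected by the rule
except ValueError as err:
    print(err)
```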
319

Dashboardy - jejich analýza a implementace v prostředí SAP Business Objects / An analysis and implementation of Dashboards within SAP Business Objects 4.0/4.1

Kratochvíl, Tomáš January 2013 (has links)
The diploma thesis focuses on the analysis and classification of dashboards and their subsequent implementation in the SAP Dashboards and Web Intelligence tools. The main goal is to analyse dashboards for different areas of company management according to the chosen solution architecture; another goal is to take into account the principles of dashboard design within a company, including a comparison of indicators. In the theoretical part, the author further defines the data life cycle within Business Intelligence and decomposes the particular dashboard types. The theory closes with a chapter on data quality, the data quality process and data quality improvement, and on the use of SAP Best Practices and Knowledge Base Articles (KBA) for the BI tools published by SAP. The implementation of dashboards is intended to back up the theoretical part and is divided into three chapters according to the selected architecture: using multi-source systems, using SAP InfoSets/Query, and using a Data Warehouse or Data Mart as the architecture for reporting purposes. The detailed implementation sections should help the reader form an opinion on the different architectures, and especially on the differences between the BI tools within SAP Business Objects. Each architecture section ends with its pros and cons.
320

Uso de propriedades visuais-interativas na avaliação da qualidade de dados / Using visual-interactive properties in the data quality assessment

João Marcelo Borovina Josko 29 April 2016 (has links)
The effects of poor data quality on the reliability of the outcomes of analytical processes are notorious. Improving data quality requires choosing among alternatives that combine procedures, methods, techniques and technologies. The Data Quality Assessment process (DQAp) provides relevant and practical inputs for choosing the most suitable alternative by mapping data defects. Relevant computational approaches support this process; they apply quantitative or assertion-based methods that usually limit the human role to interpreting their outcomes. However, the DQAp strongly depends on knowledge of the data context, since it is impossible to confirm or refute a defect based on the data alone. Hence, human supervision is essential throughout this process. Visualization systems belong to a class of supervised approaches that can make data defect structures visible. Despite the considerable body of design knowledge for such systems, little of it addresses the visual assessment of data quality. This work therefore reports two contributions. The first is a taxonomy that organizes a detailed description of defects in structured, atemporal data related to the quality criteria of accuracy, completeness and consistency; the taxonomy followed a methodology that enabled systematic coverage of data defects and an improved description of them relative to state-of-the-art taxonomies. The second contribution is a set of property-defect relationships establishing that certain visual and interactive properties are more suitable for the visual assessment of certain data defects at a given data resolution. Revealed by an exploratory multiple-case study, these relationships provide guidance that reduces the subjectivity of designing visualization systems for the visual assessment of data quality.
