161

An analysis of semantic data quality deficiencies in a national data warehouse: a data mining approach

Barth, Kirstin 07 1900 (has links)
This research determines whether data quality mining can be used to describe, monitor and evaluate the scope and impact of semantic data quality problems in the learner enrolment data on the National Learners’ Records Database. Previous data quality mining work has focused on anomaly detection and has assumed that the data quality aspect being measured exists as a data value in the data set being mined. The method is quantitative: the data mining techniques and model best suited to semantic data quality deficiencies are identified and then applied to the data. The research finds that unsupervised data mining techniques that allow for weighted analysis of the data would be most suitable for mining semantic data deficiencies. Further, the academic Knowledge Discovery in Databases model needs to be amended when it is applied to the mining of semantic data quality deficiencies. / School of Computing / M. Tech. (Information Technology)
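As an illustration of the kind of unsupervised, weighted analysis the abstract refers to, the sketch below scores records by a weighted deviation from the column means. It is a minimal stand-in (closer to the anomaly-detection style of earlier work than to the thesis's own model); the attributes, weights and review cut-off are hypothetical.

```python
# Illustrative only: weighted, unsupervised outlier scoring over enrolment-like records.
# The attribute weights and the number of records flagged are hypothetical, not from the thesis.
import numpy as np

def weighted_outlier_scores(X, weights):
    """Score each row by a weighted distance from the column means (z-scores)."""
    z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardise each attribute
    return np.sqrt((weights * z**2).sum(axis=1))  # weighted Euclidean norm per record

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                    # stand-in for numeric enrolment attributes
X[5] = [8.0, -7.5, 9.0]                           # an obviously deviant record
weights = np.array([0.5, 0.2, 0.3])               # hypothetical importance of each attribute

scores = weighted_outlier_scores(X, weights)
flagged = np.argsort(scores)[::-1][:10]           # ten most suspicious records for manual review
print(flagged)
```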
162

Strategies to Improve Data Quality for Forecasting Repairable Spare Parts

Eguasa, Uyi Harrison 01 January 2016 (has links)
Poor input data quality used in repairable spare parts forecasting by aerospace small and midsize enterprise (SME) suppliers results in poor inventory practices that manifest as higher costs and critical supply shortage risks. Guided by data quality management (DQM) theory as the conceptual framework, the purpose of this exploratory multiple case study was to identify the key strategies that aerospace SME repairable spares suppliers use to maximize the quality of the input data used in forecasting repairable spare parts. The multiple case study comprised a census sample of 6 forecasting business leaders from aerospace SME repairable spares suppliers located in the states of Florida and Kansas. Data were collected via semistructured interviews with the consenting participants and supporting documentation from organizational websites. Eight core themes emerged from the content analysis process coupled with methodological triangulation. These themes were labeled establish data governance, identify quality forecast input data sources, develop sustainable relationships and collaboration with customers and vendors, utilize a strategic data quality system, conduct continuous input data quality analysis, identify input data quality measures, incorporate continuous improvement initiatives, and engage in data quality training and education. Of the 8 core themes, 6 aligned with the DQM theory's conceptual constructs while 2 surfaced as outliers. The key implication of the research for positive social change is increased awareness among SME forecasting business leaders of the need to enhance input data quality practices when forecasting repairable spare parts, in order to attain sustainable profits.
163

Measurement properties of respondent-defined rating-scales : an investigation of individual characteristics and respondent choices

Chami-Castaldi, Elisa January 2010 (has links)
It is critical for researchers to be confident of the quality of survey data. Problems with data quality often relate to measurement method design, through choices made by researchers in their creation of standardised measurement instruments. This is known to affect the way respondents interpret and respond to these instruments, and can result in substantial measurement error. Current methods for removing measurement error are post-hoc and have been shown to be problematic. This research proposes that innovations can be made through the creation of measurement methods that take respondents' individual cognitions into consideration, to reduce measurement error in survey data. Specifically, the aim of the study was to develop and test a measurement instrument capable of having respondents individualise their own rating-scales. A mixed methodology was employed. The qualitative phase provided insights that led to the development of the Individualised Rating-Scale Procedure (IRSP). This electronic measurement method was then tested in a large multi-group experimental study, where its measurement properties were compared to those of Likert-Type Rating-Scales (LTRSs). The survey included pre-validated psychometric constructs, which provided a baseline for comparing the methods and allowed exploration of whether certain individual characteristics are linked to respondent choices. Structural equation modelling was used to analyse the survey data. Whilst no strong associations were found between individual characteristics and respondent choices, the results demonstrated that the IRSP is reliable and valid. This study has produced a dynamic measurement instrument that accommodates individual-level differences not addressed by typical fixed rating-scales.
164

Factors affecting the performance of trainable models for software defect prediction

Bowes, David Hutchinson January 2013 (has links)
Context. Reports suggest that defects in code cost the US in excess of $50 billion per year to put right. Defect prediction is an important part of software engineering. It allows developers to prioritise the code that needs to be inspected when trying to reduce the number of defects in code. A small change in the number of defects found will have a significant impact on the cost of producing software. Aims. The aim of this dissertation is to investigate the factors which affect the performance of defect prediction models. Identifying the causes of variation in the way that variables are computed should help to improve the precision of defect prediction models and hence improve the cost effectiveness of defect prediction. Methods. This dissertation is by published work. The first three papers examine variation in the independent variables (code metrics) and the dependent variable (number/location of defects). The fourth and fifth papers investigate the effect that different learners and datasets have on the predictive performance of defect prediction models. The final paper investigates the reported use of different machine learning approaches in studies published between 2000 and 2010. Results. The first and second papers show that independent variables are sensitive to the measurement protocol used, which suggests that the way data is collected affects the performance of defect prediction. The third paper shows that dependent variable data may be untrustworthy, as there is no reliable method for labelling a unit of code as defective or not. The fourth and fifth papers show that the dataset and learner used when producing defect prediction models have an effect on the performance of the models. The final paper shows that the approaches used by researchers to build defect prediction models are variable, with good practices being ignored in many papers. Conclusions. The measurement protocols for independent and dependent variables used for defect prediction need to be clearly described so that results can be compared like with like. It is possible that the predictive results of one research group have a higher performance value than another's because of the way that they calculated the metrics, rather than because of the method used to build the model that predicts the defect-prone modules. The machine learning approaches used by researchers need to be clearly reported in order to improve the quality of defect prediction studies and allow a larger corpus of reliable results to be gathered.
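To make concrete how the choice of learner (and dataset) can shift reported performance, here is a hedged sketch on synthetic data. The features merely stand in for code metrics and are not drawn from the dissertation's datasets; MCC is used simply as a familiar performance measure in this field.

```python
# Illustrative only: the same data, two different learners, two different performance scores.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic "module metrics" with a defect-prone minority class.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.85, 0.15], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

for name, learner in [("logistic regression", LogisticRegression(max_iter=1000)),
                      ("random forest", RandomForestClassifier(random_state=1))]:
    learner.fit(X_train, y_train)
    mcc = matthews_corrcoef(y_test, learner.predict(X_test))
    print(f"{name}: MCC = {mcc:.2f}")  # learner choice alone changes the reported score
```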
165

Historical aerial photographs and digital photogrammetry for landslide assessment

Walstra, Jan January 2006 (has links)
This study demonstrates the value of historical aerial photographs as a source for monitoring long-term landslide evolution, which can be unlocked by using appropriate photogrammetric methods. The understanding of landslide mechanisms requires extensive data records; a literature review identified quantitative data on surface movements as a key element for their analysis. It is generally acknowledged that, owing to the flexibility and high degree of automation of modern digital photogrammetric techniques, it is possible to derive detailed quantitative data from aerial photographs. In spite of the relative ease of such techniques, there is only scarce research on the data quality that can be achieved using commonly available material, hence the motivation for this study. Two landslide case studies (the Mam Tor and East Pentwyn landslides) explored the different types of products that can be derived from historical aerial photographs. These products comprised geomorphological maps, automatically derived digital elevation models (DEMs) and displacement vectors. They proved to be useful and sufficiently accurate for monitoring landslide evolution. Comparison with independent survey data showed good consistency, hence validating the techniques used. A wide range of imagery was used in terms of quality, media and format. Analysis of the combined datasets resulted in improvements to the stochastic model and the establishment of a relationship between image ground resolution and data accuracy. Undetected systematic effects placed a limiting constraint on the accuracy of the derived data, but the datasets proved insufficient to quantify each factor individually. An important advancement in digital photogrammetry is image matching, which allows automation of various stages of the working chain. However, it appeared that the radiometric quality of historical images may not always assure good results when extracting DEMs and displacement vectors using automatic methods. It can be concluded that the photographic archive can provide invaluable data for landslide studies when modern photogrammetric techniques are used. As ever, independent and appropriate checks should always be included in any photogrammetric design.
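As a small illustration of how products derived from different epochs can be compared to monitor surface change (not the study's own processing chain), the sketch below differences two synthetic DEM grids; the elevations, the subsidence zone and the 1 m noise threshold are all assumptions.

```python
# Illustrative only: differencing two DEMs (e.g. derived from photographs of different epochs)
# to quantify surface change. The grids here are synthetic stand-ins, not real survey data.
import numpy as np

dem_epoch_a = np.random.default_rng(0).normal(300.0, 5.0, size=(200, 200))  # elevations in metres
dem_epoch_b = dem_epoch_a.copy()
dem_epoch_b[80:120, 60:100] -= 2.5     # a hypothetical zone of subsidence on the landslide

diff = dem_epoch_b - dem_epoch_a        # per-cell elevation change between epochs
moved = np.abs(diff) > 1.0              # cells changing by more than an assumed 1 m noise level
print(f"mean change: {diff.mean():.2f} m, cells above threshold: {moved.sum()}")
```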
166

Postoje adolescentů ve výzkumech veřejného mínění, kvalita a spolehlivost získaných dat / Adolescents' attitudes in public opinion research, data quality and reliability

Šlégrová, Petra January 2014 (has links)
The diploma thesis focuses on the youngest age category of respondents in public opinion polls. The main goal is to examine the character and quality of information about adolescents' attitudes and opinions obtained in public opinion polls conducted by the Public Opinion Research Centre. To achieve this goal, nonattitudes are examined. The thesis is divided into a theoretical and a practical part. The theoretical part builds on the sociology of public opinion and developmental psychology, introducing the issue of attitude measurement alongside theories and characteristics of adolescent development. The practical part draws on the theoretical part and tests it against data collected by the Public Opinion Research Centre in its continuous research within the project Our Society. The analysis focuses on the examination of nonresponse, "don't know" answers and neutral attitudes. Results are compared across all age groups.
167

Modèle d'estimation de l'imprécision des mesures géométriques de données géographiques / A model to estimate the imprecision of geometric measurements computed from geographic data.

Girres, Jean-François 04 December 2012 (has links)
De nombreuses applications SIG reposent sur des mesures de longueur ou de surface calculées à partir de la géométrie des objets d'une base de données géographiques (comme des calculs d'itinéraires routiers ou des cartes de densité de population par exemple). Cependant, aucune information relative à l'imprécision de ces mesures n'est aujourd'hui communiquée à l'utilisateur. En effet, la majorité des indicateurs de précision géométrique proposés porte sur les erreurs de positionnement des objets, mais pas sur les erreurs de mesure, pourtant très fréquentes. Dans ce contexte, ce travail de thèse cherche à mettre au point des méthodes d'estimation de l'imprécision des mesures géométriques de longueur et de surface, afin de renseigner un utilisateur dans une logique d'aide à la décision. Pour répondre à cet objectif, nous proposons un modèle permettant d'estimer les impacts de règles de représentation (projection cartographique, non-prise en compte du terrain, approximation polygonale des courbes) et de processus de production (erreur de pointé et généralisation cartographique) sur les mesures géométriques de longueur et de surface, en fonction des caractéristiques des données vectorielles évaluées et du terrain que ces données décrivent. Des méthodes d'acquisition des connaissances sur les données évaluées sont également proposées afin de faciliter le paramétrage du modèle par l'utilisateur. La combinaison des impacts pour produire une estimation globale de l'imprécision de mesure demeure un problème complexe et nous proposons des premières pistes de solutions pour encadrer au mieux cette erreur cumulée. Le modèle proposé est implémenté au sein du prototype EstIM (Estimation de l'Imprécision des Mesures) / Many GIS applications are based on length and area measurements computed from the geometry of the objects of a geographic database (such as route planning or maps of population density, for example). However, no information concerning the imprecision of these measurements is currently communicated to the user. Indeed, most of the proposed indicators of geometric quality focus on positioning errors, not on measurement errors, which are nevertheless very frequent. In this context, this thesis seeks to develop methods for estimating the imprecision of geometric measurements of length and area, in order to inform the user for decision support. To achieve this objective, we propose a model to estimate the impacts of representation rules (cartographic projection, omission of terrain, polygonal approximation of curves) and production processes (digitizing error, cartographic generalisation) on geometric measurements of length and area, according to the characteristics and the spatial context of the evaluated objects. Methods for acquiring knowledge about the evaluated data are also proposed to facilitate the parameterisation of the model by the user. Combining the impacts to produce a global estimate of measurement imprecision remains a complex problem, and we propose initial approaches to bound this cumulated error. The proposed model is implemented in the EstIM prototype (Estimation of the Imprecision of Measurements).
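One of the representation rules the model covers, polygonal approximation of curves, is easy to illustrate with a worked example. The sketch below is not the EstIM model itself: it measures a half circle with polylines of increasing vertex density and reports the resulting underestimation of length; the radius and vertex counts are arbitrary.

```python
# Illustrative only: how polygonal approximation of a curve biases a length measurement.
import math

def polyline_length(points):
    """Sum of straight segment lengths between consecutive vertices."""
    return sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))

radius, true_length = 1000.0, math.pi * 1000.0      # half circle of radius 1000 m
for n_vertices in (5, 20, 200):
    arc = [(radius * math.cos(t), radius * math.sin(t))
           for t in (math.pi * k / (n_vertices - 1) for k in range(n_vertices))]
    approx = polyline_length(arc)
    print(f"{n_vertices:4d} vertices: measured {approx:9.2f} m, "
          f"underestimation {100 * (true_length - approx) / true_length:.2f} %")
```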
168

Innovative methods in European road freight transport statistics: A pilot study

Fürst, Elmar Wilhelm, Oberhofer, Peter, Vogelauer, Christian, Bauer, Rudolf, Herold, David Martin January 2019 (has links) (PDF)
By using innovative methods, such as the automated transfer of corporate electronic data to National Statistical Institutions, official transport data can be significantly improved in terms of reliability, costs and the burden on respondents. In this paper, we show that the automated compilation of statistical reports is possible and feasible. Based on previous findings, a new method and tool were developed in cooperation with two business partners from the logistics sector in Austria. The results show that the prototype could successfully be implemented at the partner companies. Improved data quality can lead to more reliable analyses in various fields. Compared to actual volumes of investments into transport, the costs of transport statistics are limited. By using the new and innovative data collection techniques, these costs can even be reduced in the long run; at the same time, the risk of bad investments and wrong decisions caused by analyses relying on poor data quality can be reduced. This results in a substantial value for business, research, the economy and the society.
169

L’évolution du web de données basée sur un système multi-agents / Web of data evolution based on a multi-agent system

Chamekh, Fatma 07 December 2016 (has links)
Cette thèse porte sur la modélisation d’un système d’aide à l’évolution du web de données en utilisant un système multi-agents. Plus particulièrement, elle a pour but de guider l’utilisateur dans sa démarche de modification d’une base de connaissances RDF. Elle aborde les problématiques suivantes : intégrer de nouveaux triplets résultant de l’annotation des documents, proposer le changement adéquat dans les deux niveaux, ontologie et données, en se basant sur des mesures de similarités, analyser les effets de changements sur la qualité des données et la gestion des versions en prenant en considération d’éventuels conflits. Cette question de recherche complexe engendre plusieurs problématiques dont les réponses sont dépendantes les unes des autres. Pour cela, nous nous sommes orientées vers le paradigme agent pour décomposer le problème. Il s’agit de répartir les tâches dans des agents. La coopération entre les agents permet de répondre au besoin de dépendance évoqué ci-dessus, de bénéficier de l’aspect dynamique et de combler les inconvénients d’un système modulaire classique. Le choix d’un tel écosystème nous a permis de proposer une démarche d’évaluation de la qualité des données en employant un modèle d’argumentation. Il s’agit d’établir un consensus entre les agents pour prendre en considération les dimensions intrinsèques : la cohérence, la concision, la complétude, la validation syntaxique et sémantique. Nous avons modélisé les métriques d’évaluation de chaque dimension sous forme d’arguments. L’acceptation ou pas d’un argument se décide via les préférences des agents. Chaque modification donne lieu à une nouvelle version de la base de connaissances RDF. Nous avons choisi de garder la dernière version de la base de connaissances et, pour cette raison, de préserver les URI des ressources. Pour garder la trace des changements, nous annotons chaque ressource modifiée. Néanmoins, une base de connaissances peut être modifiée par plusieurs collaborateurs, ce qui peut engendrer des conflits. Ils sont conjointement le résultat de l’intégration de plusieurs données et du chevauchement des buts des agents. Pour gérer ces conflits, nous avons défini des règles. Nous avons appliqué notre travail de recherche au domaine de la médecine générale. / In this thesis, we investigate the evolution of RDF datasets from documents and LOD. We identify the following issues: the integration of new triples, the proposal of adequate changes at both the ontology and data levels while taking data quality into account, and the management of different versions. To handle the complexity of web of data evolution, we propose an agent-based argumentation framework. We assume that agent specifications can facilitate the process of RDF dataset evolution, since agent technology is a useful way to cope with a complex problem: the agents work as a team and are autonomous, in the sense that they can decide for themselves which goals to adopt and how to achieve them. The agents use argumentation theory to reach a consensus about the best change alternative. For this purpose, we propose an argumentation model based on metrics related to the intrinsic quality dimensions. To keep a record of all modifications, we retain only the latest version of the RDF knowledge base, preserve the URIs of its resources and annotate each modified resource. In a collaborative environment, several conflicts may arise; they result jointly from the integration of multiple data and from overlapping agent goals, and we define rules to manage them. The application domain is general medicine.
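To make the acceptance mechanism tangible, here is a toy sketch, not the thesis's actual framework: each agent weights hypothetical quality scores for a candidate change by its own preferences over the intrinsic dimensions, and the change is accepted only on consensus. Agent names, weights, scores and the threshold are all invented for illustration.

```python
# Toy sketch only (not the thesis's framework): preference-weighted acceptance of an RDF change.
DIMENSIONS = ("consistency", "conciseness", "completeness")

# Hypothetical per-agent preference weights over the quality dimensions.
agents = {
    "ontology_agent": {"consistency": 0.6, "conciseness": 0.1, "completeness": 0.3},
    "data_agent":     {"consistency": 0.3, "conciseness": 0.3, "completeness": 0.4},
    "version_agent":  {"consistency": 0.4, "conciseness": 0.2, "completeness": 0.4},
}

def accepts(preferences, scores, threshold=0.5):
    """An agent accepts a change if its preference-weighted quality score clears the threshold."""
    return sum(preferences[d] * scores[d] for d in DIMENSIONS) >= threshold

# Hypothetical quality scores (0..1) computed for one candidate change.
change_scores = {"consistency": 0.9, "conciseness": 0.4, "completeness": 0.7}

votes = {name: accepts(prefs, change_scores) for name, prefs in agents.items()}
print(votes, "-> accepted" if all(votes.values()) else "-> rejected")
```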
170

Um estudo sobre qualidade de dados em biodiversidade: aplicação a um sistema de digitalização de ocorrências de espécies / A study about data quality in biodiversity: application to a species occurrences digitization system

Veiga, Allan Koch 09 February 2012 (has links)
Para o combate da atual crise de sustentabilidade ambiental, diversos estudos sobre a biodiversidade e o meio ambiente têm sido realizados com o propósito de embasar estratégias eficientes de conservação e uso de recursos naturais. Esses estudos são fundamentados em avaliações e monitoramentos da biodiversidade que ocorrem por meio da coleta, armazenamento, análise, simulação, modelagem, visualização e intercâmbio de um volume expressivo de dados sobre a biodiversidade em amplo escopo temporal e espacial. Dados sobre ocorrências de espécies são um tipo de dado de biodiversidade particularmente importante, pois são amplamente utilizados em diversos estudos. Contudo, para que as análises e os modelos gerados a partir desses dados sejam confiáveis, os dados utilizados devem ser de alta qualidade. Assim, para melhorar a Qualidade de Dados (QD) sobre ocorrências de espécies, o objetivo deste trabalho foi realizar um estudo sobre QD aplicado a dados de ocorrências de espécies que permitisse avaliar e melhorar a QD por meio de técnicas e recursos de prevenção a erros. O estudo foi aplicado a um Sistema de Informação (SI) de digitalização de dados de ocorrências de espécies, o Biodiversity Data Digitizer (BDD), desenvolvido no âmbito dos projetos da Inter-American Biodiversity Information Network Pollinators Thematic Network (IABIN-PTN) e BioAbelha FAPESP. Foi realizada uma revisão da literatura sobre dados de ocorrências de espécies e sobre os seus domínios de dados mais relevantes. Para os domínios de dados identificados como mais importantes (táxon, geoespacial e localização), foi realizado um estudo sobre a Avaliação da QD, no qual foi definido um conceito de QD em relação a cada domínio de dados por meio da identificação, definição e inter-relação de dimensões de QD (aspectos) importantes e de problemas que afetam essas dimensões. Embasado nesse estudo foram identificados recursos computacionais que permitissem melhorar a QD por meio da redução de erros. Utilizando uma abordagem de Gerenciamento da QD de prevenção a erros, foram identificados 13 recursos computacionais que auxiliam na prevenção de 8 problemas de QD, proporcionando, assim, uma melhoria da acurácia, precisão, completude, consistência, credibilidade da fonte e confiabilidade de dados taxonômicos, geoespaciais e de localização de ocorrências de espécies. Esses recursos foram implementados em duas ferramentas integradas ao BDD. A primeira é a BDD Taxon Tool. Essa ferramenta facilita a entrada de dados taxonômicos de ocorrências livres de erros por meio de, entre outros recursos, técnicas de fuzzy matching e sugestões de nomes e de hierarquias taxonômicas baseados no Catalog of Life. A segunda ferramenta, a BDD Geo Tool, auxilia o preenchimento de dados geoespaciais e de localização de ocorrências de espécies livres de erros por meio de técnicas de georeferenciamento a partir de descrição em linguagem natural da localização, de georeferenciamento reverso e de mapas interativos do Google Earth, entre outros recursos. Este trabalho demonstrou que com a implementação de determinados recursos computacionais em SI, problemas de QD podem ser reduzidos por meio da prevenção a erros. Como consequência, a QD em domínios de dados específicos é melhorada em relação a determinadas dimensões de QD. / For fighting the current environment sustainability crisis, several studies on biodiversity and the environment have been conducted in order to support efficient strategies for conservation and sustainable use of natural resources. 
These studies are based on the assessment and monitoring of biodiversity, which occur by means of the collection, storage, analysis, simulation, modeling, visualization and sharing of a significant volume of biodiversity data at broad temporal and spatial scales. Species occurrence data are a particularly important type of biodiversity data because they are widely used in various studies. Nevertheless, for the analyses and models obtained from these data to be reliable, the data used must be of high quality. Thus, to improve the Data Quality (DQ) of species occurrences, the aim of this work was to conduct a study on DQ applied to species occurrence data that allowed DQ to be assessed and improved through error-prevention techniques and resources. The study was applied to an Information System (IS) designed to digitize species occurrences, the Biodiversity Data Digitizer (BDD), which was developed within the scope of the Inter-American Biodiversity Information Network Pollinators Thematic Network (IABIN-PTN) and BioAbelha FAPESP projects. A literature review of species occurrence data and of their most relevant data domains was conducted. For the most important data domains identified (taxon, geospatial and location), a study on DQ assessment was performed, in which important DQ dimensions (aspects) and the problems that affect these dimensions were identified, defined and interrelated. Based on this study, computational resources that would allow the DQ to be improved by reducing errors were identified. Using the error-prevention DQ management approach, 13 computing resources supporting the prevention of 8 DQ problems were identified, thus improving the accuracy, precision, completeness, consistency, credibility of source and believability of taxonomic, geospatial and location data of species occurrences. These resources were implemented in two tools integrated into the BDD IS. The first tool is the BDD Taxon Tool. This tool facilitates the entry of error-free taxonomic occurrence data by means of fuzzy matching techniques and suggestions of taxonomic names and hierarchies based on the Catalog of Life, among other resources. The second tool, the BDD Geo Tool, helps fill in error-free geospatial and location data of species occurrences by means of georeferencing from natural-language descriptions of the location, reverse georeferencing and interactive Google Earth maps, among other resources. This work showed that, with certain computing resources implemented in an IS, DQ problems can be reduced by preventing errors. As a consequence, the DQ in specific data domains is improved with respect to certain DQ dimensions.
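The fuzzy matching mentioned for the BDD Taxon Tool can be illustrated with a short, self-contained sketch. The reference list below is a stand-in and the code does not call the Catalog of Life service; it only shows the kind of suggestion such a tool can make for a misspelled taxonomic name.

```python
# Illustrative only: fuzzy matching of a misspelled taxon name against a small reference list.
import difflib

reference_taxa = ["Apis mellifera", "Bombus terrestris", "Melipona quadrifasciata",
                  "Xylocopa frontalis", "Trigona spinipes"]

def suggest_taxon(name, candidates, cutoff=0.8):
    """Return up to three reference names whose similarity to `name` exceeds the cutoff."""
    return difflib.get_close_matches(name, candidates, n=3, cutoff=cutoff)

print(suggest_taxon("Apis melifera", reference_taxa))     # -> ['Apis mellifera']
print(suggest_taxon("Bombus terestris", reference_taxa))  # -> ['Bombus terrestris']
```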
