171

Innovative methods in European road freight transport statistics: A pilot study

Fürst, Elmar Wilhelm, Oberhofer, Peter, Vogelauer, Christian, Bauer, Rudolf, Herold, David Martin January 2019 (has links) (PDF)
By using innovative methods, such as the automated transfer of corporate electronic data to National Statistical Institutes, official transport data can be significantly improved in terms of reliability, costs and the burden on respondents. In this paper, we show that the automated compilation of statistical reports is feasible in practice. Based on previous findings, a new method and tool were developed in cooperation with two business partners from the logistics sector in Austria, and the results show that the prototype was successfully implemented at the partner companies. Improved data quality can lead to more reliable analyses in various fields. Compared with the actual volumes of investment in transport, the costs of transport statistics are limited; using the new data collection techniques, these costs can even be reduced in the long run, while the risk of bad investments and wrong decisions caused by analyses relying on poor data quality is lowered. This results in substantial value for business, research, the economy and society.
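A minimal sketch of the idea: compiling a road-freight report directly from corporate shipment records rather than a manual questionnaire. The column names and example records below are illustrative assumptions, not the authors' actual data model.

```python
# Hedged sketch: aggregate corporate consignment records into a statistical report.
# Field names and values are invented for illustration.
import pandas as pd

shipments = pd.DataFrame({
    "quarter":        ["2019Q1", "2019Q1", "2019Q1"],
    "goods_category": ["food", "food", "machinery"],
    "payload_tonnes": [12.0, 8.5, 20.0],
    "distance_km":    [310, 120, 450],
})

# Tonne-kilometres per consignment: payload (tonnes) times distance (km).
shipments["tonne_km"] = shipments["payload_tonnes"] * shipments["distance_km"]

# Aggregate into the kind of breakdown a statistical office typically requests.
report = (shipments
          .groupby(["quarter", "goods_category"])
          .agg(consignments=("payload_tonnes", "size"),
               tonnes=("payload_tonnes", "sum"),
               tonne_km=("tonne_km", "sum"))
          .reset_index())
print(report)
```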
172

L’évolution du web de données basée sur un système multi-agents / Web of data evolution based on a multi-agent system

Chamekh, Fatma 07 December 2016 (has links)
This thesis addresses the modelling of a system that supports the evolution of the web of data, using a multi-agent system. More specifically, it aims to guide the user through the process of modifying an RDF knowledge base. It tackles the following issues: integrating new triples resulting from the annotation of documents and from linked open data; proposing the appropriate change at both the ontology and the data level, based on similarity measures; analysing the effects of changes on data quality; and managing versions while taking possible conflicts into account. This complex research question raises several sub-problems whose answers depend on one another. We therefore turned to the agent paradigm to decompose the problem, distributing the tasks among agents. The agents work as a team and are autonomous, in the sense that they decide themselves which goals to adopt and how to achieve them; their cooperation addresses the interdependence mentioned above, provides dynamic behaviour and overcomes the drawbacks of a classical modular system. This ecosystem allowed us to propose an approach for evaluating data quality based on an argumentation model: the agents reach a consensus on the best change alternative by taking into account the intrinsic quality dimensions (consistency, conciseness, completeness) together with syntactic and semantic validation. The evaluation metrics of each dimension are modelled as arguments, and the acceptance or rejection of an argument is decided through the agents' preferences. Each modification gives rise to a new version of the RDF knowledge base. We chose to keep only the latest version and therefore to preserve the URIs of the resources; to keep track of changes, each modified resource is annotated. Nevertheless, a knowledge base may be modified by several collaborators, which can generate conflicts; these result both from the integration of several data sources and from overlapping agent goals. To manage these conflicts, we defined rules. The research was applied to the domain of general medicine.
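One small step of this pipeline can be illustrated in code: deciding whether an incoming triple should reuse an existing ontology term, based on a similarity measure. The URIs, threshold and similarity function below are assumptions for illustration; the thesis's multi-agent argumentation over quality dimensions is not modelled here.

```python
# Hedged sketch: reuse an existing RDF predicate when a newly annotated one is
# merely a near-duplicate (e.g. a typo). Not the thesis's actual algorithm.
from difflib import SequenceMatcher
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/onto#")
g = Graph()
g.add((EX.Patient, EX.hasDiagnosis, Literal("diabetes")))

def closest_property(graph, candidate_name, threshold=0.8):
    """Return an existing predicate whose local name is similar enough, else None."""
    best, best_score = None, 0.0
    for pred in set(p for _, p, _ in graph):
        local = str(pred).rsplit("#", 1)[-1]
        score = SequenceMatcher(None, local.lower(), candidate_name.lower()).ratio()
        if score > best_score:
            best, best_score = pred, score
    return best if best_score >= threshold else None

# A new annotation proposes "hasDiagnosys" (typo): reuse the existing property instead.
prop = closest_property(g, "hasDiagnosys") or EX.hasDiagnosys
g.add((EX.Patient, prop, Literal("hypertension")))
```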
173

Um estudo sobre qualidade de dados em biodiversidade: aplicação a um sistema de digitalização de ocorrências de espécies / A study of data quality in biodiversity: application to a species occurrences digitization system

Veiga, Allan Koch 09 February 2012 (has links)
To combat the current environmental sustainability crisis, numerous studies on biodiversity and the environment have been conducted to support efficient strategies for the conservation and sustainable use of natural resources. These studies rely on biodiversity assessment and monitoring, carried out through the collection, storage, analysis, simulation, modelling, visualization and exchange of a large volume of biodiversity data over broad temporal and spatial scales. Species occurrence data are a particularly important type of biodiversity data because they are widely used in many studies. Nevertheless, for the analyses and models derived from these data to be reliable, the data must be of high quality. To improve the Data Quality (DQ) of species occurrences, the aim of this work was to conduct a study on DQ applied to species occurrence data, allowing DQ to be assessed and improved through error-prevention techniques and resources. The study was applied to an Information System (IS) for digitizing species occurrence data, the Biodiversity Data Digitizer (BDD), developed within the Inter-American Biodiversity Information Network Pollinators Thematic Network (IABIN-PTN) and BioAbelha FAPESP projects. A literature review of species occurrence data and their most relevant data domains was conducted. For the data domains identified as most important (taxon, geospatial and location), a study on DQ assessment was performed, in which a concept of DQ for each domain was defined by identifying, defining and interrelating important DQ dimensions (aspects) and the problems that affect those dimensions. Based on this study, computational resources that improve DQ by reducing errors were identified. Following an error-prevention DQ management approach, 13 computational resources were identified that help prevent 8 DQ problems, thereby improving the accuracy, precision, completeness, consistency, credibility of source and believability of taxonomic, geospatial and location data of species occurrences. These resources were implemented in two tools integrated into the BDD. The first, the BDD Taxon Tool, facilitates the entry of error-free taxonomic data by means of fuzzy-matching techniques and suggestions of taxonomic names and hierarchies based on the Catalog of Life, among other resources. The second, the BDD Geo Tool, helps to fill in error-free geospatial and location data by means of georeferencing from natural-language descriptions of the location, reverse georeferencing and interactive Google Earth maps, among other resources. This work showed that, with the implementation of certain computational resources in an IS, DQ problems can be reduced through error prevention; as a consequence, the DQ of specific data domains is improved with respect to particular DQ dimensions.
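The error-prevention idea behind the BDD Taxon Tool can be sketched as fuzzy matching of an entered name against a reference list of valid names. The reference list below is a stand-in; the tool itself relies on the Catalog of Life.

```python
# Hedged sketch: suggest a valid scientific name when a digitized taxon name
# contains typos. Reference names are invented examples.
from difflib import get_close_matches

reference_names = [
    "Apis mellifera",
    "Bombus terrestris",
    "Melipona quadrifasciata",
    "Xylocopa frontalis",
]

def suggest_taxon(entered_name, names=reference_names, cutoff=0.75):
    """Return the closest valid names for a possibly misspelled taxon."""
    return get_close_matches(entered_name, names, n=3, cutoff=cutoff)

print(suggest_taxon("Apis melifera"))   # -> ['Apis mellifera']
```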
174

Modèles et méthodes pour l'information spatio-temporelle évolutive / Models and methods for handling evolving spatio-temporal data

Plumejeaud, Christine 22 September 2011 (has links)
This thesis is in the field of spatio-temporal modelling, and our work focuses on the management of territorial statistical information. Today, the availability of large amounts of territorial statistical information from different producers (Eurostat, INSEE, the European Environment Agency, the UN, etc.) offers a rich analytical perspective, allowing data on various topics (economic, social, environmental) to be combined at multiple levels of study, from the local (municipalities) to the global (states). However, the spatial supports, definitions, classification schemes and reliability levels of these data are heterogeneous in both space and time, which makes the data difficult to compare. This heterogeneity is at the core of our problem; in order to apprehend, measure and control it, we make three proposals that ultimately allow an informed exploitation of this kind of data. The first proposal targets the support of the territorial statistical information and accounts both for its evolution through time and for its hierarchical organisation. The second proposal addresses the semantic variability of the statistical values attached to this support through the use of metadata; we define a profile of the ISO 19115 standard that eases the creation of these metadata by data producers. The third proposal provides tools to analyse and explore this information interactively: a platform dedicated to statistical analyses that aims to detect exceptional values (outliers) and to relate them to their origin and to the way they were produced.
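A tiny sketch of the kind of outlier screening such a platform supports: flag territorial units whose indicator value is exceptional, so that the value can be traced back to its source metadata. The data and the IQR rule below are illustrative assumptions, not the thesis's actual method.

```python
# Hedged sketch: flag exceptional values in a territorial indicator with an IQR rule.
import pandas as pd

values = pd.Series(
    {"Unit A": 2.1, "Unit B": 2.4, "Unit C": 2.2, "Unit D": 9.8, "Unit E": 2.3}
)

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
print(outliers)  # Unit D stands out and should be checked against its metadata
```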
175

Avaliação experimental de uma técnica de padronização de escores de similaridade / Experimental evaluation of a similarity score standardization technique

Nunes, Marcos Freitas January 2009 (has links)
With the growth of the Web, the volume of information has grown considerably over the past years and, consequently, access to remote databases has become much easier, allowing the integration of physically distant data. Usually, instances of the same real-world object originating from distinct databases differ in the representation of their values; that is, the same real-world information can be represented in different ways. In this context, research on approximate matching using similarity functions has emerged, and with it the difficulty of interpreting the results of these functions and selecting ideal thresholds. When matching aggregates (records), there is the additional problem of combining similarity scores, since distinct functions have different distributions. To overcome this problem, a previous work developed a score standardization technique that replaces the score computed by the similarity function with an adjusted score (obtained through training), which is intuitive for the user and can be combined in the record-matching process. The technique was developed by a PhD student of the UFRGS database research group and is referred to here as MeaningScore (DORNELES et al., 2007). The present work studies this technique and performs a detailed experimental evaluation of it. The evaluation shows that the MeaningScore approach is valid and returns better results: in record matching, where distinct similarity scores must be combined, using the standardized score instead of the original score returned by the similarity function produces results of higher quality.
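A minimal calibration sketch in the spirit of score standardization: replace the raw similarity score of a function by the empirical probability of a true match among training pairs that scored at least as high. This is an illustrative stand-in, not the published MeaningScore procedure, and the training pairs are invented.

```python
# Hedged sketch: map a raw similarity score to a comparable, "adjusted" score.
def adjusted_score(raw_score, training):
    """training: list of (raw_score, is_match) pairs for one similarity function."""
    at_least = [match for score, match in training if score >= raw_score]
    return sum(at_least) / len(at_least) if at_least else 0.0

train = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, False)]
print(adjusted_score(0.85, train))  # comparable across different similarity functions
```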
176

Evaluation of Antiretroviral Therapy Information System In Mbale Regional Referral Hospital, Uganda.

Olupot-Olupot, Peter. January 2008 (has links)
HIV/AIDS is the largest and most serious global epidemic in recent times. To date, the epidemic has affected approximately 40 million people (range 33–46 million), of whom 67%, an estimated 27 million people, are in Sub-Saharan Africa. Sub-Saharan Africa is also reported to have the highest regional prevalence, 7.2%, compared with an average of 2% in other regions. A medical cure for HIV/AIDS remains elusive, but the use of antiretroviral therapy (ART) has improved both the quality and the length of life, as evidenced by the reduction of mortality and morbidity associated with the infection, hence longer and better-quality lives for HIV/AIDS patients on ART.
177

Varying data quality and effects in economic analysis and planning

Eklöf, Jan A. January 1992 (has links)
Economic statistics are often taken as given facts, assumed to describe actual phenomena in society exactly. Many economic series are published in several successive forms, from preliminary figures, via revisions, to definitive estimates; preliminary series are issued for a number of central economic processes in order to provide rapid, up-to-date signals. This dissertation focuses on the qualitative aspects of available data and on the effects of possible inaccuracy when data are used for economic modelling, analysis and planning. Four main questions are addressed: How can the quality of data for central economic time series be characterized? What effects may possible inaccuracies in data have when used in econometric modelling? What effects do inaccuracies and errors in data have when models are used for economic analysis and planning? Is it possible to specify a criterion for deciding the cost-effective quality of data to be produced as input for economic policy analysis? The various realizations of economic variables often show considerable systematic as well as stochastic discrepancies for the same quantity. Preliminary series are generally found to be of questionable quality, yet considerably better than simple trend forecasts; compared with a few other industrialized countries, the variability of Swedish economic statistics is nevertheless not extraordinary. Illustrations are presented of the effects of using inaccurate data, especially of combining preliminary, revised and definitive observations in the same model; such inconsistent combinations of realizations are in fact found in many open sources. Including preliminary series tends to indicate stronger changes in the economy than when definitive observations are used throughout. The study concludes with a section on the cost-benefit aspects of economic statistics, and a sketch model for appraising data of variable quality is proposed. / Diss. Stockholm : Handelshögsk.
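The comparison of preliminary and definitive series can be illustrated with a short revision-error computation. The series values below are invented, not from the dissertation.

```python
# Hedged sketch: quantify how far preliminary estimates deviate from definitive figures.
preliminary = [1.2, 0.8, 1.5, -0.3, 0.9]   # preliminary quarterly growth, %
definitive  = [1.0, 1.1, 1.4, -0.6, 1.0]   # later definitive figures, %

revisions = [d - p for p, d in zip(preliminary, definitive)]
mean_revision = sum(revisions) / len(revisions)                        # systematic bias
mean_abs_revision = sum(abs(r) for r in revisions) / len(revisions)    # typical size
print(mean_revision, mean_abs_revision)
```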
178

Detecting Disguised Missing Data

Belen, Rahime 01 February 2009 (has links) (PDF)
In some applications, explicit codes such as NA (not available) are provided for missing data; many applications, however, provide no such codes, and valid or invalid data codes are recorded as legitimate data values. Such missing values are known as disguised missing data. Disguised missing data may negatively affect the quality of data analysis; for example, the results of association rules discovered in the KDD-Cup-98 data sets have clearly shown the need to apply data quality management prior to analysis. In this thesis, to tackle the problem of disguised missing data, we analyzed the embedded unbiased sample heuristic (EUSH), demonstrated the method's drawbacks and proposed a new methodology based on the chi-square two-sample test. The proposed method does not require any domain background knowledge and compares favorably with EUSH.
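The two-sample chi-square idea can be sketched as follows: if records carrying a suspiciously frequent value (say, a default birth date) are distributed over some other attribute very differently from the remaining records, the value may be a disguise for missing data. The counts below are invented, and this is only the core statistical test, not the thesis's full methodology.

```python
# Hedged sketch: chi-square test of homogeneity between a suspect group and the rest.
from scipy.stats import chi2_contingency
import numpy as np

# Rows: records with the suspect value vs. all other records.
# Columns: levels of another attribute (e.g. region).
table = np.array([
    [120,  15,  10,   5],   # suspect group: heavily concentrated
    [300, 280, 260, 250],   # rest of the data: roughly uniform
])

chi2, p_value, dof, _ = chi2_contingency(table)
print(p_value)  # a very small p-value suggests the suspect value behaves like missing data
```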
179

Estimation and prediction of travel time from loop detector data for intelligent transportation systems applications

Vanajakshi, Lelitha Devi 01 November 2005 (has links)
With the advent of Advanced Traveler Information Systems (ATIS), short-term travel time prediction is becoming increasingly important. Travel time can be obtained directly from instrumented test vehicles, license plate matching, probe vehicles, etc., or from indirect methods such as loop detectors. Because of their widespread deployment, travel time estimation from loop detector data is one of the most widely used methods; however, the major criticism of loop detector data is the high probability of error due to the prevalence of equipment malfunctions. This dissertation presents methodologies for estimating and predicting travel time from loop detector data after correcting for errors. The methodology is a multi-stage process comprising data correction, travel time estimation and travel time prediction, with each stage involving the judicious use of suitable techniques. The test sites are freeways in San Antonio, Texas, equipped with dual inductance loop detectors and AVI. The techniques selected for the stages are: (i) a constrained non-linear optimization approach using the Generalized Reduced Gradient (GRG) method for data reduction and quality control, including a check for conservation of vehicles across a series of detectors in addition to the commonly adopted checks; (ii) a theoretical model based on traffic flow theory for travel time estimation under both off-peak and peak traffic conditions, using flow, occupancy and speed values obtained from the detectors; and (iii) the application of a recently developed technique, Support Vector Machines (SVM), for travel time prediction, with an Artificial Neural Network (ANN) method developed for comparison. Thus, a complete system for the estimation and prediction of travel time from loop detector data is detailed in this dissertation. Simulated data from the CORSIM simulation software are used for validation of the results.
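The prediction stage can be sketched as a support vector regression model mapping loop-detector measurements (flow, occupancy, speed) to corridor travel time. The data below are synthetic; the dissertation uses detector and AVI data from San Antonio.

```python
# Hedged sketch: SVM (support vector regression) for travel time prediction on synthetic data.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform([200, 5, 40], [2000, 60, 110], size=(300, 3))   # flow, occupancy, speed
# Synthetic "true" travel time (seconds) for a 10 km segment plus congestion effect and noise.
travel_time = 600 * 60 / X[:, 2] + 0.2 * X[:, 1] + rng.normal(0, 5, 300)

model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=1.0))
model.fit(X, travel_time)
print(model.predict([[900, 25, 80]]))  # predicted travel time for one detector reading
```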
180

Uncertainty in the information supply chain: Integrating multiple health care data sources

Tremblay, Monica Chiarini 01 June 2007 (has links)
Similar to a product supply chain, an information supply chain is a dynamic environment in which networks of information-sharing agents gather data from many sources and use the same data for different tasks. Unfortunately, raw data arriving from a variety of sources are often plagued by errors (Ballou et al. 1998), which can lead to poor decision making. Supporting decision making in this challenging environment demands a proactive approach to data quality management, since the decision maker has no control over these data sources (Shankaranarayan et al. 2003). This is true in health care, and in particular in health planning, where health care resource allocation is often based on summarized data from a myriad of sources such as hospital admissions, vital statistics records and specific disease registries. This work investigates issues of data quality in the information supply chain. It proposes three result-driven data quality metrics that inform and aid decision makers faced with incomplete and inconsistent data and help mitigate insensitivity to sample size, a well-known decision bias. To design and evaluate the result-driven data quality metrics, this thesis uses the design science paradigm (Simon 1996; Hevner, March et al. 2004). The metrics are implemented within a simple OLAP interface, using data aggregated from several health care data sources, and presented to decision makers in four focus groups. This research is one of the first to propose and outline the use of focus groups as a technique for demonstrating the utility and efficacy of design science artifacts. Results from the focus groups demonstrate that the proposed metrics are useful and effective in altering a decision maker's data-analytic strategies. Additionally, the results indicate that comparative techniques, such as benchmarking or scenario-based approaches, are promising approaches to data quality. Finally, the results reveal that the decision-making literature needs to be considered in the design of BI tools: participants of the focus groups confirmed that people are insensitive to sample size, but when attention was drawn to small sample sizes, this bias was mitigated.
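The flavour of a result-driven quality metric can be sketched as attaching a completeness score and a small-sample warning to each aggregated result, so the decision maker sees data-quality context alongside the number itself. This is an illustrative stand-in, not the dissertation's actual metrics; the data and threshold are invented.

```python
# Hedged sketch: annotate aggregated results with completeness and a small-sample flag.
import pandas as pd

admissions = pd.DataFrame({
    "county":    ["A", "A", "B", "B", "B", "C"],
    "diagnosis": ["asthma", None, "asthma", "asthma", None, "asthma"],
})

def summarize(group):
    n = len(group)
    return pd.Series({
        "records": n,
        "completeness": round(group["diagnosis"].notna().mean(), 2),
        "small_sample": n < 5,   # flag results a reader might otherwise over-trust
    })

print(admissions.groupby("county").apply(summarize))
```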
