281 |
Observation of a Higgs boson and measurement of its mass in the diphoton decay channel with the ATLAS detector at the LHC / Observation d’un boson de Higgs et mesure de sa masse dans le canal de désintégration en deux photons avec le détecteur ATLAS au LHC
Lorenzo Martinez, Narei, 10 September 2013
Le Modèle Standard de la physique des particules prédit l’existence d’un boson scalaire massif, appelé boson de Higgs dans la littérature, comme résultant d’un mécanisme de brisure spontanée de symétrie, qui permettrait de générer la masse des particules. Le boson de Higgs, dont la masse est inconnue théoriquement, est recherché expérimentalement depuis plusieurs décennies. L’expérience ATLAS, au collisionneur LHC, a aussi entrepris cette recherche, depuis le début des collisions de protons à haute énergie en 2010. Un des canaux de désintégrations les plus intéressants à étudier dans cet environnement est le canal en deux photons, car l’état final peut être intégralement reconstruit avec une grande précision. La réponse en énergie des photons est un point crucial dans la recherche du boson de Higgs, car une résonance fine émergeant d’un bruit de fond important est attendue. Dans cette thèse, une étude approfondie de la réponse en énergie des photons en utilisant le calorimètre électromagnétique d’ATLAS a été faite. Ces études ont permis de mieux comprendre la résolution et l’échelle d’énergie des photons, et donc d’améliorer la sensibilité de l’analyse d’une part et de mieux estimer les incertitudes expérimentales sur la position du signal d’autre part. Le canal en deux photons a eu un rôle prépondérant dans la découverte d’une nouvelle particule compatible avec le boson de Higgs en juillet 2012 par les expériences ATLAS et CMS. En utilisant ce canal ainsi que la meilleure compréhension de la réponse en énergie acquise au cours de cette thèse, une mesure de la masse du boson est proposée avec les données collectées durant les années 2011 et 2012 avec une énergie de centre de masse de 7 TeV et 8 TeV. Une masse de 126.8 +/- 0.2 (stat) +/- 0.7 (syst) GeV/c2 est trouvée. L’étalonnage de la mesure de l’énergie des photons avec le calorimètre électromagnétique est la plus grande source d’incertitude sur cette mesure. Une stratégie pour réduire cette erreur systématique sur la masse est discutée. / The Standard Model of particle physics predicts the existence of a massive scalar boson, usually referred to as the Higgs boson in the literature, resulting from the spontaneous symmetry breaking mechanism needed to generate the masses of the particles. The Higgs boson, whose mass is theoretically undetermined, has been searched for experimentally by various experiments for half a century. This is the case of the ATLAS experiment at the LHC, which started taking data from high-energy collisions in 2010. One of the most important decay channels in the LHC environment is the diphoton channel, because the final state can be completely reconstructed with high precision. The photon energy response is a key point in this analysis, as the signal would appear as a narrow resonance over a large background. In this thesis, a detailed study of the photon energy response using the ATLAS electromagnetic calorimeter has been performed. This study has provided a better understanding of the photon energy resolution and scale, thus enabling an improvement of the sensitivity of the diphoton analysis as well as a precise determination of the systematic uncertainties on the peak position. The diphoton decay channel had a prominent role in the discovery, in July 2012, of a new particle compatible with the Standard Model Higgs boson by the ATLAS and CMS experiments.
Using this channel, together with the improved understanding of the photon energy response, a measurement of the mass of this particle is presented in this thesis, based on the data collected in 2011 and 2012 at center-of-mass energies of 7 TeV and 8 TeV. A mass of 126.8 +/- 0.2 (stat) +/- 0.7 (syst) GeV/c2 is found. The calibration of the photon energy measurement with the calorimeter is the largest source of systematic uncertainty on this measurement. Strategies to reduce this systematic error are discussed.
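As a purely editorial illustration of the quantities quoted above (not code or numbers from the thesis, apart from the 126.8, 0.2 and 0.7 GeV values), the sketch below shows how a diphoton invariant mass is reconstructed from the two photon energies and their opening angle, why a photon energy-scale bias shifts the measured mass directly, and how the statistical and systematic uncertainties combine in quadrature. The photon energies and angle are invented for the example.

    import math

    def diphoton_mass(e1, e2, opening_angle):
        """Invariant mass of two (massless) photons: m = sqrt(2*E1*E2*(1 - cos(alpha)))."""
        return math.sqrt(2.0 * e1 * e2 * (1.0 - math.cos(opening_angle)))

    # Hypothetical photon energies (GeV) and opening angle (rad), chosen so that the
    # example lands near the quoted mass of 126.8 GeV.
    e1, e2, alpha = 100.0, 100.0, 1.3735
    m = diphoton_mass(e1, e2, alpha)

    # A 0.5% energy-scale bias on both photons shifts the reconstructed mass by ~0.5%,
    # which is why the calorimeter energy calibration dominates the systematic uncertainty.
    m_shifted = diphoton_mass(1.005 * e1, 1.005 * e2, alpha)

    # Statistical and systematic uncertainties quoted in the abstract, combined in quadrature.
    stat, syst = 0.2, 0.7
    total = math.sqrt(stat**2 + syst**2)

    print(f"m(gamma gamma) = {m:.1f} GeV, after a +0.5% scale shift: {m_shifted:.1f} GeV")
    print(f"combined uncertainty = {total:.2f} GeV")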
|
282 |
Estudo dos fatores influenciadores da intenção de uso da informação dos sistemas de Business Intelligence em empresas brasileiras / Study of factors that impact use intention of Business Intelligence systems in Brazilian companies
Santos, Claudinei de Paula, 21 August 2014
Neste final de século o processo de globalização dos mercados e seu efeito sobre os padrões de conduta econômica, política, social e organizacional, vêm assumindo importância crescente, compondo um cenário no qual a competitividade emerge como uma questão imperativa. Como característica das empresas modernas, tem-se o aumento de padrão de automação onde as tecnologias tem disponibilizado o acesso a uma grande quantidade de dados. Tecnologias de data warehouse (DW) têm servido como repositores desses dados e o avanço nas aplicações de extração, transformação e carregamento (ETL) têm aumentado a velocidade da coleta. Atualmente, muito se tem discutido a respeito desse produto secundário resultante dos processos empresariais, os dados, que tem sido vistos como uma potencial fonte de informação capaz de possibilitar às instituições a garantia de sobrevivência em sua indústria. Nesse contexto, os sistemas de Business Intelligence (SBI), que têm como função prover o tratamento dos dados e entregar informação acionável que pode ser usada para uma específica tomada de decisão, têm recebido o reconhecimento de sua importância por parte dos executivos para a continuidade de suas empresas. Fato esse reforçado pelos resultados de pesquisas realizadas mundialmente pelo Gartner onde por anos seguidos os SBI têm sido relatados pelos executivos como o sonho de consumo das empresas. Aplicações de business intelligence têm dominado a lista de prioridade de tecnologia de muitos CIOs. Apesar desse cenário bastante favorável para os SBI, o Gartner Group aponta um elevado índice na subutilização desses sistemas, o que nos leva a questionar porque um sistema importante e desejado pelas empresas não consegue atender as expectativas dos usuários. Assim surgiu a proposta de estudar nesse trabalho a influência das dimensões fatores críticos de sucesso (FCS) e benefícios esperados (BE) sobre a dimensão intenção de uso (USO) da informação disponibilizada pelos SBI, verificando o efeito das variáveis de cada dimensão sobre o USO. Para isso foi estabelecido um modelo conceitual relacionando as dimensões mencionadas utilizando-se como referência outros trabalhos acadêmicos, suas variáveis e resultados de pesquisa. Foi realizada uma pesquisa quantitativa com a aplicação da técnica estatística Partial Least Square (PLS) com os dados obtidos de usuários de SBI em diferentes áreas da empresa de diferentes setores. Com o uso da técnica PLS, foi possível obter os indicadores para as variáveis das dimensões e estabelecer o modelo estrutural baseado em confiança. / As this century ends, the market globalization process and its effect on patterns of economic, political, social and organizational behaviors become increasingly important, composing a scenario in which competitiveness emerges as an imperative issue. As a trait of modern enterprises, there is an increase in automation standards where technologies provide access to a large amount of data. Technologies of data warehouse (DW) have been serving as repositories of such data and advances in extraction, transformation and loading (ETL) applications have been increasing the speed of data collection. More recently, much has been discussed about this secondary product resulting from business processing: the data that has been seen as a potential source of information able to allow institutions guarantee survival in their industry. 
In this context, Business Intelligence Systems (BIS), whose function is to process data and deliver actionable information, i.e., information that can be used for a specific decision, have had their importance for business continuity recognized by executives: for years, research conducted worldwide by Gartner has reported them as the technology these professionals most desire. Business Intelligence applications have dominated the technology priority lists of many CIOs. Despite this favorable scenario for Business Intelligence Systems, the Gartner Group reports a high level of underutilization of these systems, which leads us to ask why such an important and desired system fails to meet users' expectations. This work therefore studies the influence of the dimensions critical success factors (CSF) and expected benefits (BE) on the dimension intention to use (USO) of the information provided by BIS, examining the effect of each dimension's variables on USO. To do so, a conceptual model relating these dimensions was established, using other academic works, their variables and their research results as references. A quantitative survey was carried out and the statistical technique Partial Least Squares (PLS) was applied to data obtained from BIS users in different areas of companies from different sectors. Using the PLS technique, it was possible to obtain indicators for the variables of each dimension and to establish the structural model based on trust.
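The abstract does not reproduce the survey items or model estimates, so the following is only a hypothetical sketch: it uses scikit-learn's PLSRegression (plain PLS regression, a simplification of the PLS path modeling used in the thesis) on synthetic indicator data to show how items measuring the CSF and BE dimensions can be related to a USO outcome. All variable names and data are invented.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(42)
    n = 300

    # Synthetic, standardized survey-style indicators: three items for the critical
    # success factors (CSF) dimension and three for the expected benefits (BE) dimension.
    csf_items = rng.normal(size=(n, 3))
    be_items = rng.normal(size=(n, 3))
    X = np.hstack([csf_items, be_items])

    # Synthetic "intention to use" (USO) outcome driven by both dimensions plus noise.
    uso = 0.6 * csf_items.mean(axis=1) + 0.4 * be_items.mean(axis=1) + 0.3 * rng.normal(size=n)

    pls = PLSRegression(n_components=2)
    pls.fit(X, uso)

    print("R^2 on the training data:", round(pls.score(X, uso), 3))
    print("indicator weights on the latent components:")
    print(np.round(pls.x_weights_, 3))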
|
283 |
Recherche de technicouleur avec l'expérience ATLAS. Développement d'outils et étude des performances du calorimètre à argon liquide / Looking for Technicolor using ATLAS. Tools development and performances study of the Liquid Argon Calorimeter.
Helary, Louis, 09 December 2011
En 2011, le LHC a fourni près de 5 fb−1 de données aux expériences. Ces données ont été utilisées pour comprendre plus avant les détecteurs, leurs performances et effectuer des analyses de physique. Cette thèse est organisée en cinq chapitres. Le premier est une introduction théorique au Modèle Standard et à une de ses extensions possibles : la TechniCouleur. Le deuxième chapitre donne un bref aperçu de l'accélérateur LHC et de l'expérience ATLAS. Dans le troisième chapitre, l'un des principaux sous-systèmes de l'expérience ATLAS est présenté : le calorimètre à argon liquide. L'algorithme de contrôle de l'acquisition et de la qualité des données que j'ai développé au cours de ma thèse est également présenté. Le quatrième chapitre présente une étude des performances de la reconstruction des jets basée sur l'ensemble des données acquises en 2010. Cette étude a montré qu'en 2010, la résolution en énergie des jets dans le Monte-Carlo a été sous-estimée d'un facteur relatif d'environ 10% par rapport aux données. Cette étude a été ensuite reconduite pour évaluer l'impact de la réduction de la HV dans des zones du calorimètre sur la résolution en énergie des jets. Cet impact a été jugé négligeable. Pour des jets produits avec une rapidité |y| < 0.8, l'augmentation de la résolution en énergie due à la réduction de la HV est évaluée à moins de 3 % pour un jet de pT = 30 GeV, et moins de 0,1 % pour un jet de pT = 500 GeV. Enfin, le dernier chapitre de cette thèse présente une étude de l'état final Wgamma. La contribution des différents processus du MS participant à cet état final a été estimée à partir du Monte Carlo et des données. Une recherche de résonances étroites a ensuite été effectuée en utilisant la distribution M(W,gamma) dans un intervalle [220,440] GeV, mais aucun écart significatif des prédictions du MS n'a été observé. Cette étude a permis de fixer des limites sur la production de particules TC correspondant à M(a_T) > 265 GeV ou M(rho_T) > 243 GeV. / In 2011 the LHC provided almost 5 fb-1 of data to the experiments. These data have been used to perform a deep commissioning of the detectors, understand their performance and carry out physics analyses. This thesis is organized in five chapters. The first one is a theoretical introduction to the Standard Model and to one of its possible extensions: TechniColor. The second chapter gives a brief overview of the LHC and of the ATLAS experiment. In the third chapter one of the key subsystems of the ATLAS experiment is presented: the LAr calorimeters. The data acquisition and data quality monitoring algorithm developed during my thesis is also presented in this chapter. The fourth chapter presents a study of the jet performance based on the data set acquired in 2010. This study has shown that in 2010 the Monte Carlo underestimated the jet energy resolution by a relative factor of about 10% with respect to data. The study was then extended to evaluate the impact of the reduced LAr HV regions on the jet energy resolution. This impact was found to be negligible. For jets produced with a rapidity |y| < 0.8, the increase of the energy resolution due to the HV reduction is evaluated at less than 3% for a pT = 30 GeV jet, and less than 0.1% for a pT = 500 GeV jet. Finally, the last chapter of this thesis presents a study of the Wgamma final state. The Standard Model backgrounds contributing to this final state were estimated from Monte Carlo and from data.
A search for narrow resonances was then conducted using the M(Wgamma) distribution in the range [220,440] GeV, but no significant deviation from the SM was observed. This study allowed limits to be set on the production of TC particles corresponding to M(a_T) > 265 GeV or M(rho_T) > 243 GeV.
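To make the jet-resolution statement concrete, the following sketch illustrates the standard oversmearing procedure used when simulation underestimates the jet energy resolution relative to data: an extra Gaussian term is added to the Monte Carlo jet pT so that the simulated resolution matches the measured one. Only the roughly 10% relative figure comes from the abstract; the resolution parameterisation and the procedure details are illustrative assumptions, not the thesis implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def mc_relative_resolution(pt):
        """Placeholder MC resolution model sigma(pT)/pT (purely illustrative numbers)."""
        return np.sqrt((0.8 / np.sqrt(pt)) ** 2 + 0.05 ** 2)

    def oversmear_to_data(pt_mc, data_over_mc=1.10):
        """Smear MC jet pT so its resolution matches the (larger) resolution seen in data.

        If sigma_data = k * sigma_mc with k > 1, adding a Gaussian of width
        sqrt(k**2 - 1) * sigma_mc reproduces sigma_data in quadrature.
        """
        sigma_mc = mc_relative_resolution(pt_mc) * pt_mc
        extra = np.sqrt(data_over_mc**2 - 1.0) * sigma_mc
        return pt_mc + rng.normal(0.0, extra, size=np.shape(pt_mc))

    jets = np.array([30.0, 100.0, 500.0])  # jet pT in GeV
    print("smeared pT:", np.round(oversmear_to_data(jets), 2))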
|
284 |
A conceptual framework on biodiversity data quality. / Um framework conceitual em qualidade de dados de biodiversidade.
Veiga, Allan Koch, 28 November 2016
The increasing availability of digitized biodiversity data worldwide, provided by an increasing number of sources, and the growing use of those data for a variety of purposes have raised concerns related to the "fitness for use" of such data and the impact of data quality (DQ) on the outcomes of analyses, reports and decision making. A consistent approach to assess and manage DQ is currently critical for biodiversity data users. However, achieving this goal has been particularly challenging because of the idiosyncrasies inherent to the concept of quality. DQ assessment and management cannot be suitably carried out without clearly establishing the meaning of quality from the data user's standpoint. This thesis presents a formal conceptual framework to support the Biodiversity Informatics (BI) community in consistently describing the meaning of data "fitness for use". Principles behind data fitness for use are used to establish a formal and common ground for the collaborative definition of DQ needs, solutions and reports useful for DQ assessment and management. Based on a study of the DQ domain and its contextualization in the BI domain, which involved discussions with experts in DQ and BI in an iterative process, a comprehensive framework was designed and formalized. The framework defines eight fundamental concepts and 21 derived concepts, organized into three classes: DQ Needs, DQ Solutions and DQ Report. The concepts of each class describe, respectively, the meaning of DQ in a given context, the methods and tools that can serve as solutions for meeting DQ needs, and reports that present the current status of quality of a data resource. The formalization of the framework was presented using conceptual map notation and set theory notation. In order to validate the framework, we present a proof of concept based on a case study conducted at the Museum of Comparative Zoology of Harvard University. The tools FP-Akka Kurator and the BDQ Toolkit were used in the case study to perform DQ measures, validations and improvements on a dataset of the Arizona State University Hasbrouck Insect Collection. The results illustrate how the framework enables data users to assess and manage the DQ of datasets and single records using quality control and quality assurance approaches. The proof of concept has also shown that the framework is adequately formalized and flexible, and sufficiently complete, for defining DQ needs, solutions and reports in the BI domain. The framework is able to formalize human thinking into well-defined components, making it possible to share and reuse definitions of DQ in different scenarios, to describe and find DQ tools and services, and to communicate the current status of data quality in a standardized format among stakeholders. In addition, the framework supports the players of that community in joining efforts on the collaborative gathering and development of the components necessary for DQ assessment and management in different contexts. The framework is also the foundation of a Task Group on Data Quality, under the auspices of Biodiversity Information Standards (TDWG) and the Global Biodiversity Information Facility (GBIF), and is being used to help collect users' needs on data quality, initially for agrobiodiversity and for species distribution modeling.
In future work, we plan to use the framework to engage the BI community to formalize and share DQ profiles related to a number of other data usages, and to recommend methods, guidelines, protocols, metadata schemas and controlled vocabularies for supporting data fitness for use assessment and management in distributed system and data environments. In addition, we plan to build a platform based on the framework to serve as a common backbone for registering and retrieving DQ concepts, such as DQ profiles, methods, tools and reports. / A crescente disponibilização de dados digitalizados sobre a biodiversidade em todo o mundo, fornecidos por um crescente número de fontes, e o aumento da utilização desses dados para uma variedade de propósitos, tem gerado preocupações relacionadas a "adequação ao uso" desses dados e ao impacto da qualidade de dados (QD) sobre resultados de análises, relatórios e tomada de decisões. Uma abordagem consistente para avaliar e gerenciar a QD é atualmente crítica para usuários de dados sobre a biodiversidade. No entanto, atingir esse objetivo tem sido particularmente desafiador devido à idiossincrasia inerente ao conceito de qualidade. A avaliação e a gestão da QD não podem ser adequadamente realizadas sem definir claramente o significado de qualidade de acordo com o ponto de vista do usuário dos dados. Esta tese apresenta um arcabouço conceitual formal para apoiar a comunidade de Informática para Biodiversidade (IB) a descrever consistentemente o significado de "adequação ao uso" de dados. Princípios relacionados à adequação ao uso são usados para estabelecer uma base formal e comum para a definição colaborativa de necessidades, soluções e relatórios de QD úteis para a avaliação e gestão de QD. Baseado no estudo do domínio de QD e sua contextualização no domínio de IB, que envolveu discussões com especialistas em QD e IB em um processo iterativo, foi projetado e formalizado um arcabouço conceitual abrangente. Ele define oito conceitos fundamentais e vinte e um conceitos derivados organizados em três classes: Necessidades de QD, Soluções de QD e Relatório de QD. Os conceitos de cada classe descrevem, respectivamente, o significado de QD em um dado contexto, métodos e ferramentas que podem servir como soluções para atender necessidades de QD, e relatórios que apresentam o estado atual da qualidade de um recurso de dado. A formalização do arcabouço foi apresentada usando notação de mapas conceituais e notação de teoria dos conjuntos. Para a validação do arcabouço, nós apresentamos uma prova de conceito baseada em um estudo de caso conduzido no Museu de Zoologia Comparativa da Universidade de Harvard. As ferramentas FP-Akka Kurator e BDQ Toolkit foram usadas no estudo de caso para realizar medidas, validações e melhorias da QD em um conjunto de dados da Coleção de Insetos Hasbrouck da Universidade do Estado do Arizona. Os resultados ilustram como o arcabouço permite a usuários de dados avaliarem e gerenciarem a QD de conjunto de dados e registros isolados usando as abordagens de controle de qualidade e a garantia de qualidade. A prova de conceito demonstrou que o arcabouço é adequadamente formalizado e flexível, e suficientemente completo para definir necessidades, soluções e relatórios de QD no domínio da IB.
O arcabouço é capaz de formalizar o pensamento humano em componentes bem definidos para fazer possível compartilhar e reutilizar definições de QD em diferentes cenários, descrever e encontrar ferramentas de QD e comunicar o estado atual da qualidade dos dados em um formato padronizado entre as partes interessadas da comunidade de IB. Além disso, o arcabouço apoia atores da comunidade de IB a unirem esforços na identificação e desenvolvimento colaborativo de componentes necessários para a avaliação e gestão da QD. O arcabouço é também o fundamento de um Grupos de Trabalho em Qualidade de Dados, sob os auspícios do Biodiversity Information Standard (TDWG) e do Biodiversity Information Facility (GBIF) e está sendo utilizado para coletar as necessidades de qualidade de dados de usuários de dados de agrobiodiversidade e de modelagem de distribuição de espécies, inicialmente. Em trabalhos futuros, planejamos usar o arcabouço apresentado para engajar a comunidade de IB para formalizar e compartilhar perfis de QD relacionados a inúmeros outros usos de dados, recomendar métodos, diretrizes, protocolos, esquemas de metadados e vocabulários controlados para apoiar a avaliação e gestão da adequação ao uso de dados em ambiente de sistemas e dados distribuídos. Além disso, nós planejamos construir uma plataforma baseada no arcabouço para servir como uma central integrada comum para o registro e recuperação de conceitos de QD, tais como perfis, métodos, ferramentas e relatórios de QD.
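The abstracts name the three classes of the framework (DQ Needs, DQ Solutions, DQ Report) but not their formal attributes, so the sketch below is a hypothetical rendering of that structure as Python dataclasses, showing how a data-quality need, a tool chosen to address it, and the resulting report entry could be linked. All field names and the example values are assumptions, not the framework's formal definitions.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class DQNeed:
        """One 'DQ Needs' element: what quality means for a given use (hypothetical fields)."""
        dimension: str      # e.g. "coordinate completeness"
        criterion: str      # e.g. "latitude and longitude are not empty"
        use_case: str       # e.g. "species distribution modeling"

    @dataclass
    class DQSolution:
        """One 'DQ Solutions' element: a method or tool that addresses a need."""
        need: DQNeed
        tool: str           # e.g. "FP-Akka Kurator workflow"
        action: str         # "measure", "validate" or "improve"

    @dataclass
    class DQReport:
        """One 'DQ Report' element: the current quality status of a data resource."""
        dataset: str
        results: List[str] = field(default_factory=list)

        def add(self, solution: DQSolution, passed: bool) -> None:
            status = "PASS" if passed else "FAIL"
            self.results.append(f"{solution.need.dimension}: {status} ({solution.tool})")

    need = DQNeed("coordinate completeness", "latitude and longitude are not empty",
                  "species distribution modeling")
    solution = DQSolution(need, tool="FP-Akka Kurator workflow", action="validate")
    report = DQReport(dataset="Hasbrouck Insect Collection sample")
    report.add(solution, passed=False)
    print(report.results)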
|
285 |
CORREÇÃO DE DADOS AGROMETEOROLÓGICOS UTILIZANDO MÉTODOS ESTATÍSTICOS / Correction of agrometeorological data using statistical methods
Baba, Ricardo Kazuo, 31 July 2012
Climate data are increasingly important for predicting climate phenomena and for evaluating historical records that support decision making, especially in agriculture. Ensuring the quality of these data is crucial. The data are collected by meteorological stations, and during this process gaps and inconsistent values may be generated. Identifying suspicious or inconsistent data is therefore essential to ensure data quality. This work presents an approach that uses statistical and geostatistical techniques to identify incorrect and suspicious data and to estimate new values to fill gaps and replace errors. A spatial database was used to implement these statistical and geostatistical techniques and to test and evaluate the weather data. The techniques were evaluated on the temperature variable using data from stations located in Paraná State. To check the estimated data, the mean absolute error (MAE) and the root mean square error (RMSE) were used. The error-identification techniques proved suitable for detecting basic and historical errors. The temporal validation showed poor performance, overestimating the amount of incorrect data. Regarding the estimation techniques applied (kriging, inverse distance weighting and linear regression), all showed similar performance in the error analysis. / A análise de dados climáticos serve de suporte na previsão de fenômenos relacionados, na avaliação de seus dados históricos e para a tomada de decisões, em especial na área da agricultura. Garantir a sua qualidade é fundamental. O processo de coleta desses dados, através das estações meteorológicas, pode apresentar problemas, onde dados inconsistentes podem ser geridos ou obtidos. A identificação de dados inconsistentes ou suspeitos é de fundamental importância na garantia de qualidade dos dados. Este trabalho apresenta uma abordagem para solução do problema, utilizando técnicas estatísticas e geoestatísticas na identificação de dados inconsistentes e na estimativa de dados a serem corrigidos ou preenchidos. A implementação destas técnicas em um banco de dados espacial apresentou-se como um facilitador na identificação e no preenchimento desses dados. Para avaliação destas técnicas utilizou-se de dados das estações localizadas no Estado do Paraná, para análise da variável temperatura. Para avaliar os resultados, foram utilizados os erros médio e quadrático. Como resultado, destaca-se que as técnicas de identificação de erros mostraram-se adequadas na consistência de erros básicos e históricos. A validação espacial apresentou baixo desempenho por superestimar a quantidade de dados incorretos. Quanto as técnicas utilizadas na estimativa dos dados, Krigagem, Inverso da Distância e Regressão Linear, todas apresentaram desempenho semelhantes com relação à análise dos erros.
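As an illustration of the error metrics and of one of the estimation techniques mentioned above (a sketch on invented station data, not the thesis code), the snippet below fills a missing temperature value by inverse distance weighting from neighbouring stations and evaluates estimates with MAE and RMSE.

    import numpy as np

    def idw_estimate(known_xy, known_values, target_xy, power=2.0):
        """Inverse distance weighted estimate of a value at target_xy."""
        d = np.linalg.norm(known_xy - target_xy, axis=1)
        if np.any(d == 0):                    # target coincides with a known station
            return float(known_values[d == 0][0])
        w = 1.0 / d**power
        return float(np.sum(w * known_values) / np.sum(w))

    def mae(obs, est):
        return float(np.mean(np.abs(obs - est)))

    def rmse(obs, est):
        return float(np.sqrt(np.mean((obs - est) ** 2)))

    # Hypothetical neighbouring stations (x, y in km) and their temperatures (deg C).
    stations = np.array([[0.0, 0.0], [10.0, 5.0], [3.0, 12.0], [8.0, 9.0]])
    temps = np.array([21.4, 19.8, 20.6, 20.1])

    # Estimate the missing reading at a station located at (5, 6).
    print("IDW estimate:", round(idw_estimate(stations, temps, np.array([5.0, 6.0])), 2))

    # Error metrics on a small validation set of observed vs. estimated values.
    observed = np.array([20.3, 21.1, 19.7])
    estimated = np.array([20.0, 21.6, 19.9])
    print("MAE :", round(mae(observed, estimated), 3))
    print("RMSE:", round(rmse(observed, estimated), 3))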
|
286 |
Data, learning and privacy in recommendation systems / Données, apprentissage et respect de la vie privée dans les systèmes de recommandation
Mittal, Nupur, 25 November 2016
Les systèmes de recommandation sont devenus une partie indispensable des services et des applications d’internet, en particulier dû à la surcharge de données provenant de nombreuses sources. Quel que soit le type, chaque système de recommandation a des défis fondamentaux à traiter. Dans ce travail, nous identifions trois défis communs, rencontrés par tous les types de systèmes de recommandation: les données, les modèles d'apprentissage et la protection de la vie privée. Nous élaborons différents problèmes qui peuvent être créés par des données inappropriées en mettant l'accent sur sa qualité et sa quantité. De plus, nous mettons en évidence l'importance des réseaux sociaux dans la mise à disposition publique de systèmes de recommandation contenant des données sur ses utilisateurs, afin d'améliorer la qualité des recommandations. Nous fournissons également les capacités d'inférence de données publiques liées à des données relatives aux utilisateurs. Dans notre travail, nous exploitons cette capacité à améliorer la qualité des recommandations, mais nous soutenons également qu'il en résulte des menaces d'atteinte à la vie privée des utilisateurs sur la base de leurs informations. Pour notre second défi, nous proposons une nouvelle version de la méthode des k plus proches voisins (knn, de l'anglais k-nearest neighbors), qui est une des méthodes d'apprentissage parmi les plus populaires pour les systèmes de recommandation. Notre solution, conçue pour exploiter la nature bipartie des ensembles de données utilisateur-élément, est évolutive, rapide et efficace pour la construction d'un graphe knn et tire sa motivation de la grande quantité de ressources utilisées par des calculs de similarité dans les calculs de knn. Notre algorithme KIFF utilise des expériences sur des jeux de données réelles provenant de divers domaines, pour démontrer sa rapidité et son efficacité lorsqu'il est comparé à des approches issues de l'état de l'art. Pour notre dernière contribution, nous fournissons un mécanisme permettant aux utilisateurs de dissimuler leur opinion sur des réseaux sociaux sans pour autant dissimuler leur identité. / Recommendation systems have gained tremendous popularity, both in academia and industry. They have evolved into many different varieties depending mostly on the techniques and ideas used in their implementation. This categorization also marks the boundary of their application domain. Regardless of the types of recommendation systems, they are complex and multi-disciplinary in nature, involving subjects like information retrieval, data cleansing and preprocessing, data mining etc. In our work, we identify three different challenges (among many possible) involved in the process of making recommendations and provide their solutions. We elaborate the challenges involved in obtaining user-demographic data, and processing it, to render it useful for making recommendations. The focus here is to make use of Online Social Networks to access publicly available user data, to help the recommendation systems. Using user-demographic data for the purpose of improving the personalized recommendations, has many other advantages, like dealing with the famous cold-start problem. It is also one of the founding pillars of hybrid recommendation systems. With the help of this work, we underline the importance of user’s publicly available information like tweets, posts, votes etc. to infer more private details about her. 
As the second challenge, we aim at improving the learning process of recommendation systems. Our goal is to provide a k-nearest neighbor method that deals with very large amount of datasets, surpassing billions of users. We propose a generic, fast and scalable k-NN graph construction algorithm that improves significantly the performance as compared to the state-of-the art approaches. Our idea is based on leveraging the bipartite nature of the underlying dataset, and use a preprocessing phase to reduce the number of similarity computations in later iterations. As a result, we gain a speed-up of 14 compared to other significant approaches from literature. Finally, we also consider the issue of privacy. Instead of directly viewing it under trivial recommendation systems, we analyze it on Online Social Networks. First, we reason how OSNs can be seen as a form of recommendation systems and how information dissemination is similar to broadcasting opinion/reviews in trivial recommendation systems. Following this parallelism, we identify privacy threat in information diffusion in OSNs and provide a privacy preserving algorithm for the same. Our algorithm Riposte quantifies the privacy in terms of differential privacy and with the help of experimental datasets, we demonstrate how Riposte maintains the desirable information diffusion properties of a network.
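The KIFF algorithm itself is not described in the abstract, so the sketch below only illustrates the general idea of exploiting the bipartite user-item structure: since only users who share at least one item can have non-zero similarity, candidate pairs are enumerated per item instead of comparing every pair of users. The data, the Jaccard similarity and the brute-force ranking are illustrative choices, not the thesis algorithm.

    from collections import defaultdict
    from itertools import combinations

    # Hypothetical user -> rated-items data (the bipartite user-item structure).
    ratings = {
        "u1": {"i1", "i2", "i3"},
        "u2": {"i2", "i3"},
        "u3": {"i7", "i8"},
        "u4": {"i1", "i8"},
    }

    def candidate_pairs(ratings):
        """Enumerate user pairs per item: only users sharing an item can be similar."""
        by_item = defaultdict(set)
        for user, items in ratings.items():
            for item in items:
                by_item[item].add(user)
        pairs = set()
        for users in by_item.values():
            pairs.update(combinations(sorted(users), 2))
        return pairs

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    def knn_graph(ratings, k=2):
        neighbours = defaultdict(list)
        for u, v in candidate_pairs(ratings):
            s = jaccard(ratings[u], ratings[v])
            neighbours[u].append((s, v))
            neighbours[v].append((s, u))
        return {u: [v for _, v in sorted(cands, reverse=True)[:k]]
                for u, cands in neighbours.items()}

    print(knn_graph(ratings))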
|
287 |
Data Governance: A conceptual framework in order to prevent your Data Lake from becoming a Data Swamp
Paschalidi, Charikleia, January 2015
Information Security is nowadays becoming a very popular subject of discussion among both academics and organizations. Proper Data Governance is the first step towards an effective Information Security policy. As a consequence, more and more organizations are changing their approach to data, treating it as an asset in order to get as much value as possible out of it. Living in an IT-driven world leads many researchers to approach Data Governance by borrowing IT Governance frameworks. The aim of this thesis is to contribute to this research by conducting Action Research at a large financial institution in the Netherlands that is currently releasing a Data Lake in which all of its data will be gathered and stored in a secure way. During this research a framework for implementing proper Data Governance in the Data Lake is introduced. The results were promising and indicate that, under specific circumstances, this framework could be beneficial not only for this specific institution, but for every organisation that would like to avoid confusion and apply Data Governance to its tasks.
|
289 |
Three essays on the German capital market
Brückner, Roman, 04 April 2013
Die vorliegende Dissertation umfasst drei eigenständige Aufsätze zum deutschen Aktienmarkt. In allen drei Aufsätzen stehen eigene empirische Untersuchungen im Mittelpunkt. Bisherige empirische Untersuchungen zum deutschen Aktienmarkt konzentrieren sich meist auf das höchste deutsche Marktsegment, den Amtlichen Markt in Frankfurt. Die Anzahl der empirischen Arbeiten zu den Aktien der unteren Marktsegmente, insbesondere zum Geregelten Markt in Frankfurt, ist hingegen gering. Der erste Aufsatz beschäftigt sich mit dem Geregelten Markt und analysiert, ob die Performance mit dem Amtlichen Markt zu vergleichen ist. Beispielsweise war unklar, ob der Geregelte Markt ein ähnliches Desaster wie der Neue Markt darstellt. Wir stellen fest, dass die Aktien des Geregelten Marktes durchaus mit denen des Amtlichen Marktes mithalten können. Im zweiten Aufsatz wird untersucht, inwiefern das CAPM nach Sharpe (1964), Lintner (1965) und Mossin (1966) die Renditen deutscher Aktien erklären kann. Zusätzlich wird untersucht, ob das CAPM in Deutschland um Faktoren für Size und Buchwert-Marktwert erweitert werden sollte. Auf Basis unserer Untersuchungen stellen wir fest, dass eine Erweiterung des CAPMs um Size- und/oder Buchwert-/Marktwertfaktoren derzeit nicht sinnvoll erscheint. In beiden Aufsätzen und in empirischen Untersuchungen im Allgemeinen spielt die Qualität der verwendeten Daten eine wichtige Rolle. Dementsprechend wird im dritten Aufsatz die Qualität der Datastream Daten für den deutschen Aktienmarkt untersucht. Hierbei kommen wir zu dem Ergebnis, dass Datastream als primäre Datenquelle für den deutschen Kapitalmarkt vor 1990 ungeeignet ist. Nach 1990 kommt es zwar zu zufälligen Fehlern in Datastream, diese sind allerdings nur selten gravierend. / This thesis consists of three empirical essays on the German stock market. Prior empirical research on the German stock market has primarily focused on the top market segment in Germany, the Amtlicher Markt in Frankfurt. Only a few empirical studies look at the lower market segments, such as the Geregelter Markt in Frankfurt. The first essay examines whether the performance of the stocks of the Geregelter Markt is comparable to the performance of the stocks listed in the Amtlicher Markt. For example, it was unclear whether the stocks of the Geregelter Markt performed as disastrously as the stocks of the Neuer Markt. We find that the performance of the stocks of the Geregelter Markt is comparable to that of the Amtlicher Markt. The second essay examines whether the CAPM of Sharpe (1964), Lintner (1965) and Mossin (1966) explains the cross-section of German stock returns. Additionally, we evaluate whether the CAPM should be extended by factors for size and book-to-market. Based on our results, we conclude that the CAPM should not be extended by size and book-to-market factors in Germany. Data quality is an important aspect of both essays. As a consequence, we examine the quality of equity data from Datastream for the German stock market in the third essay. We conclude that before 1990 Datastream should not be used as the primary data source for empirical studies on the German stock market. After 1990, we find random errors in Datastream, but these are rarely severe.
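As a generic illustration of the kind of regression behind a CAPM test (not the essay's methodology, data or results), the snippet below estimates a stock's beta by ordinary least squares from synthetic monthly excess returns and compares the realized mean excess return with the CAPM prediction.

    import numpy as np

    rng = np.random.default_rng(1)
    n_months = 120

    # Synthetic monthly excess returns (over the risk-free rate).
    market_excess = rng.normal(0.005, 0.04, n_months)
    stock_excess = 0.9 * market_excess + rng.normal(0.0, 0.03, n_months)

    # CAPM time-series regression: R_i - R_f = alpha + beta * (R_m - R_f) + eps
    X = np.column_stack([np.ones(n_months), market_excess])
    alpha, beta = np.linalg.lstsq(X, stock_excess, rcond=None)[0]

    capm_prediction = beta * market_excess.mean()
    print(f"alpha = {alpha:.4f}, beta = {beta:.2f}")
    print(f"mean excess return = {stock_excess.mean():.4f}, CAPM prediction = {capm_prediction:.4f}")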
|
290 |
Riktlinjer för att förbättra datakvaliteten hos data warehouse system / Guiding principles to improve data quality in data warehouse system
Carlswärd, Martin, January 2008
Data warehouse system är något som har växt fram under 1990-talet och det har implementeras hos flera verksamheter. De källsystem som en verksamhet har kan integreras ihop med ett data warehouse system för att skapa en version av verkligheten och ta fram rapporter för beslutsunderlag. Med en version av verkligheten menas att det skapas en gemensam bild som visar hur verksamhetens dagliga arbete sker och utgör grundinformation för de framtagna analyserna från data warehouse systemet. Det blir därför väsenligt för verksamheten att de framtagna rapporterna håller en, enligt verksamheten, tillfredställande god datakvalitet. Detta leder till att datakvaliteten hos data warehouse systemet behöver hålla en tillräckligt hög kvalitetsnivå. Om datakvaliteten hos beslutsunderlaget brister kommer verksamheten inte att ta de optimala besluten för verksamheten utan det kan förekomma att beslut tas som annars inte hade tagits. Att förbättra datakvaliteten hos data warehouse systemet blir därför centralt för verksamheten. Med hjälp av kvalitetsfilosofin Total Quality Management, TQM, har verksamheten ett stöd för att kunna förbättra datakvaliteten eftersom det möjliggör att ett helhetsgrepp om kvaliteten kan tas. Anledningen till att ta ett helhetsperspektiv angående datakvaliteten är att orsakerna till bristande datakvalitet inte enbart beror på orsaker inom själva data warehouse systemet utan beror även på andra orsaker. De kvalitetsförbättrande åtgärder som behöver utföras inom verksamheter varierar eftersom de är situationsanpassade beroende på hur verksamheten fungerar även om det finns mer övergripande gemensamma åtgärder. Det som kommuniceras i form av exempelvis rapporter från data warehouse systemet behöver anses av verksamhetens aktörer som förståeligt och trovärdigt. Anledningen till det är att de framtagna beslutunderlagen behöver vara förståliga och trovärdiga för mottagaren av informationen. Om exempelvis det som kommuniceras i form av rapporter innehåller skräptecken bli det svårt för mottagaren att anse informationen som trovärdig och förståelig. Förbättras kvaliteten hos det kommunikativa budskapet, det vill säga om kommunikationskvaliteten förbättras, kommer datakvaliteten hos data warehouse systemet i slutändan också förbättras. Inom uppsatsen har det tagits fram riktlinjer för att kunna förbättra datakvaliteten hos data warehouse system med hjälp av kommunikationskvaliteten samt TQM. Riktlinjernas syfte är att förbättra datakvaliteten genom att förbättra kvaliteten hos det som kommuniceras inom företagets data warehouse system. Det finns olika åtgärder som är situationsanpassade för att förbättra datakvaliteten med hjälp av kommunikationskvalitet. Ett exempel är att införa en möjlighet för mottagaren att få reda på vem som är sändaren av informationsinnehållet hos de framtagna rapporterna. Detta för att mottagaren bör ha möjlighet att kritisera och kontrollera den kommunikativa handlingen med sändaren, som i sin tur har möjlighet att försvara budskapet. Detta leder till att öka trovärdigheten hos den kommunikativa handlingen. Ett annat exempel är att införa inmatningskontroller hos källsystemen för att undvika att aktörer matar in skräptecken som sedan hamnar i data warehouse systemet. Detta leder till att mottagarens förståelse av det som kommuniceras förbättras. / The data warehouse system is something that has grown during the 1990s and has been implemented in many companies. 
The operational source systems of a company can be integrated with a data warehouse system to build one version of reality and produce reports that serve as a basis for decisions. This version of reality creates a common picture that shows how the company's daily work is carried out and forms the underlying information for the analyses produced from the data warehouse system. It is therefore important for the company that these reports have a data quality it considers satisfactory, which means that the data quality of the data warehouse system itself needs to reach a sufficiently high level. If the data quality of the decision basis is deficient, the company will not make optimal decisions; decisions may be made that otherwise would not have been taken. Improving the data quality of the data warehouse system is therefore central for the company. With the help of a quality philosophy such as Total Quality Management (TQM), the company has support for improving data quality, because it makes it possible to take a holistic view of quality. The reason for taking a holistic perspective is that poor data quality is caused not only by factors within the data warehouse system itself but also by other factors. The quality-improving measures that need to be carried out vary between companies, since they depend on the situation and on how the company works, even though some measures are more general. What is communicated, for example in the form of reports from the data warehouse system, needs to be regarded by the company's actors as understandable and trustworthy, because the decision basis must be understandable and trustworthy for the receiver of the information. If, for example, reports contain junk characters, it becomes difficult for the receiver to regard the information as trustworthy and understandable. If the quality of the communicated message improves, that is, if the communication quality improves, the data quality of the data warehouse system will ultimately improve as well. In this thesis, guiding principles have been developed to improve data quality in data warehouse systems with the help of communication quality and TQM. Their purpose is to improve data quality by improving the quality of what is communicated within the company's data warehouse system. The appropriate measures are situation-dependent. One example is to give the receiver the possibility to find out who the sender of the information content of a report is, so that the receiver can question and check the communicative act with the sender, who in turn can defend the message; this increases the trustworthiness of the communicative act. Another example is to introduce input controls in the source systems to prevent actors from entering junk characters that end up in the data warehouse system, which improves the receiver's understanding of what is communicated.
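As a concrete, deliberately simplified example of the input controls suggested above, the sketch below shows a validation step that could run in a source system or ETL job, rejecting records whose text fields are empty or contain junk characters before they reach the data warehouse. The field names and the allowed-character rule are assumptions made for the illustration.

    import re

    # Allow letters (including Swedish å/ä/ö), digits, spaces and common punctuation.
    ALLOWED = re.compile(r"^[A-Za-zÅÄÖåäö0-9 .,:;()\-/']*$")

    def validate_record(record, text_fields=("customer_name", "city")):
        """Return a list of data quality problems; an empty list means the record may be loaded."""
        problems = []
        for name in text_fields:
            value = record.get(name, "")
            if not value.strip():
                problems.append(f"{name}: missing value")
            elif not ALLOWED.match(value):
                problems.append(f"{name}: contains junk characters ({value!r})")
        return problems

    records = [
        {"customer_name": "Åsa Lindqvist", "city": "Göteborg"},
        {"customer_name": "###???", "city": ""},
    ]

    for rec in records:
        issues = validate_record(rec)
        print("LOAD" if not issues else f"REJECT: {issues}")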
|