1. Improving Data Quality Through Effective Use of Data Semantics
Madnick, Stuart E.
Data quality issues have taken on increasing importance in recent years. In our research, we have discovered that many “data quality” problems are actually “data misinterpretation” problems – that is, problems with data semantics. In this paper, we first illustrate some examples of these problems and then introduce a particular semantic problem that we call “corporate householding.” We stress the importance of “context” to get the appropriate answer for each task. Then we propose an approach to handle these tasks using extensions to the COntext INterchange (COIN) technology for knowledge storage and knowledge processing. / Singapore-MIT Alliance (SMA)
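The context idea above lends itself to a small illustration. The sketch below is not COIN's actual model or API; it is a minimal, hypothetical example (invented contexts, field names, and fixed conversion rates) of how attaching a context to a bare number lets a mediator reinterpret it correctly for another task.

```python
# A minimal sketch of context-mediated interpretation, loosely inspired by the
# COIN idea described above. Contexts, rates, and names are hypothetical.

# Each context states the assumptions under which a bare number was produced.
CONTEXTS = {
    "tokyo_exchange": {"currency": "JPY", "scale": 1_000_000},
    "nyse_feed": {"currency": "USD", "scale": 1},
}

# Illustrative fixed conversion rates (a real mediator would look these up).
TO_USD = {"JPY": 0.0067, "USD": 1.0}

def mediate(value: float, source_ctx: str, target_ctx: str) -> float:
    """Reinterpret a raw value from the source context in the target context."""
    src, tgt = CONTEXTS[source_ctx], CONTEXTS[target_ctx]
    usd = value * src["scale"] * TO_USD[src["currency"]]   # normalize
    return usd / (TO_USD[tgt["currency"]] * tgt["scale"])  # re-express

# The same number "5" means very different amounts in different contexts:
print(mediate(5, "tokyo_exchange", "nyse_feed"))  # 5 million yen, in dollars
```

Reading the bare value without its context is exactly the "data misinterpretation" failure the abstract describes.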
2. Addressing the Challenges of Aggregational and Temporal Ontological Heterogeneity
Zhu, Hongwei; Madnick, Stuart E.
In this paper, we first identify semantic heterogeneities that, when not resolved, often cause serious data quality problems. We discuss the especially challenging problems of temporal and aggregational ontological heterogeneity, which concerns how complex entities and their relationships are aggregated and reinterpreted over time. Then we illustrate how the COntext INterchange (COIN) technology can be used to capture data semantics and reconcile semantic heterogeneities in a scalable manner, thereby improving data quality. / Singapore-MIT Alliance (SMA)
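As a toy illustration of aggregational heterogeneity, the hedged sketch below rolls a hypothetical quarterly series up to annual granularity so it can be compared with another source's annual figure; the mismatch it surfaces is the kind of data quality problem the abstract describes. COIN's actual mechanism is declarative and far more general than this hand-written roll-up.

```python
# Toy example: one hypothetical source reports quarterly figures, another
# annual ones for the same entity. All numbers are invented.
from collections import defaultdict

quarterly = [  # (year, quarter, revenue) from source A
    (2003, 1, 20.0), (2003, 2, 25.0), (2003, 3, 30.0), (2003, 4, 25.0),
]
annual_from_b = {2003: 98.0}  # source B's annual number for the same entity

# Roll the finer-grained series up to the coarser granularity.
rolled_up = defaultdict(float)
for year, _quarter, revenue in quarterly:
    rolled_up[year] += revenue

# Any residual gap signals an aggregational or temporal mismatch to resolve.
for year, total in rolled_up.items():
    gap = annual_from_b[year] - total
    print(f"{year}: A={total} B={annual_from_b[year]} discrepancy={gap}")
```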
3. XML as a Format for Representation and Manipulation of Data from Radar Communications
Alfredsson, Anders. January 2001.
XML was designed to be a new standard for marking up data on the web. However, as a result of its extensible and flexible properties, XML is now being used more and more for purposes other than originally intended. Today XML is prompting an approach more focused on data exchange between different applications inside companies, or even between cooperating businesses.

Businesses are showing interest in using XML as an integral part of their work. Ericsson Microwave Systems (EMW) is a company that sees XML as a conceivable solution to problems in its work with radar communications. An approach based on a relational database system had been analysed earlier.

In this project we present an investigation of the work at EMW, and an identification and documentation of the problems in the radar communication work. The requirements and expectations that EMW has of XML are also presented. Moreover, an analysis has been made to decide to what extent XML could solve EMW's problems. The analysis was conducted by elucidating the problems and possibilities of XML compared to the previous approach, which was based on a relational database management system.

The analysis shows that XML has good features for representing hierarchically structured data, as in the EMW case, and that XML is well suited for data integration purposes. Furthermore, it shows that XML, due to its self-describing and weakly typed nature, is inappropriate for the data semantics and integrity problems at EMW. However, it also shows that the new XML Schema standard could be used as a complement to the core XML standard to partially solve the semantics problems.
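The closing point, that XML Schema can add the typing that core XML lacks, can be made concrete with a small sketch. The element names below are invented for illustration (they are not EMW's radar formats), and the example assumes the third-party lxml library.

```python
# Plain XML accepts any text; an XML Schema can enforce types on it.
from lxml import etree

schema_doc = etree.XML(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="frequencyMHz" type="xs:decimal"/>
        <xs:element name="timestamp" type="xs:dateTime"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>""")
schema = etree.XMLSchema(schema_doc)

good = etree.XML(b"<message><frequencyMHz>9410.5</frequencyMHz>"
                 b"<timestamp>2001-06-01T12:00:00</timestamp></message>")
bad = etree.XML(b"<message><frequencyMHz>very high</frequencyMHz>"
                b"<timestamp>yesterday</timestamp></message>")

print(schema.validate(good))  # True: values satisfy the declared types
print(schema.validate(bad))   # False: well-formed XML, but untyped nonsense
```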
4. Contextual Data Quality: Detection and Cleaning Guided by Data Semantics / Qualité contextuelle des données : détection et nettoyage guidés par la sémantique des données
Ben Salem, Aïcha. 31 March 2015.
Nowadays, complex applications such as knowledge extraction, data mining, e-learning, or web applications use heterogeneous and distributed data. In this context, the quality of any decision depends on the quality of the data used: without rich, accurate, and reliable data, an organization can potentially make bad decisions. This thesis aims at assisting the user in a data quality approach: to better extract, mix, interpret, and reuse data. To do so, the data must be tied to its semantic meaning, data types, constraints, and comments.

The first part deals with semantic recognition of the schema of a data source. It extracts data semantics from all the available information, including the data and the metadata. It consists, first, of categorizing the data by assigning each column a category and possibly a sub-category, and second, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. Once detected, these inter-column links offer a better understanding of the source as well as alternatives for correcting the data. This approach allows automatic detection of a large number of syntactic and semantic anomalies.

The second part is data cleansing, using the anomaly reports produced by the first part. It allows corrections within a column (data homogenization), between columns (semantic dependencies), and between rows (elimination of duplicates and near-duplicates). Throughout this process, recommendations and analyses are provided to the user.
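As a rough sketch of the two-phase approach above, the following hypothetical example first guesses a semantic category for a column from its values and then uses that category to flag anomalous entries. The categories and regular expressions are invented stand-ins for the thesis's much richer rule set.

```python
# Phase 1: infer a semantic category per column; phase 2: report anomalies.
import re

PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
}

def categorize(values: list[str]) -> str:
    """Assign the category matching the most non-empty values."""
    best, best_hits = "unknown", 0
    for name, pattern in PATTERNS.items():
        hits = sum(bool(pattern.match(v)) for v in values if v)
        if hits > best_hits:
            best, best_hits = name, hits
    return best

def report_anomalies(values: list[str], category: str) -> list[str]:
    """Values that do not fit the inferred category are likely errors."""
    pattern = PATTERNS.get(category)
    return [v for v in values if pattern and not pattern.match(v)]

column = ["alice@example.org", "bob@example.org", "not-an-email"]
cat = categorize(column)
print(cat, report_anomalies(column, cat))  # email ['not-an-email']
```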
5. A Resource-Oriented Architecture for Integration and Exploitation of Linked Data / Conception d'une architecture orientée services pour l'intégration et l'exploitation de données liées
De Vettor, Pierre. 29 September 2016.
This thesis focuses on the integration of raw data coming from heterogeneous and multi-origin data sources on the Web. The global objective is to provide a generic and adaptive architecture able to analyze and combine this heterogeneous, informal, and sometimes meaningless data into a coherent smart data set, where smart data is defined as significant, semantically explicit data, ready to be used to fulfill the stakeholders' objectives. This work is motivated by a live scenario from the French company Audience Labs, which requires the architecture to scale.

We propose new models and techniques to adapt the combination and integration process to the diversity of the data sources involved. The challenges are transparent and dynamic data source management, scalability and responsiveness with respect to the number of sources, adaptability to source characteristics, and, finally, consistency of the produced data (coherent data, without errors or duplicates).

To address these challenges, we first propose a meta-model to represent the variety of data source characteristics, related to access (URI, authentication), extraction (request format), or physical capabilities (volume, latency). Relying on this formalization, we define different data access strategies in order to adapt access and processing to each source's capabilities. With the help of these models and strategies, we propose a distributed resource-oriented software architecture in which every component is accessible through REST via its URI. From the source characteristics, specific, adapted execution workflows are generated that orchestrate the tasks of the integration process in an optimized way, prioritizing tasks so as to reduce processing time and the volume of data exchanged.

To improve the quality of the data produced by this approach, we then focus on the uncertainty that can appear in data on the Web and propose a model to represent it: the concept of an uncertain Web resource, based on a probabilistic model in which each resource can have several possible representations, each with a probability. This model is the basis of a further optimization of the architecture, allowing uncertainty to be taken into account during the combination process.
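The uncertain-resource model lends itself to a compact sketch. The example below is an assumption-laden simplification, with invented class and field names, not the thesis's formal model: a resource carries several candidate representations with probabilities, and the combination step picks the most probable one.

```python
# A bare-bones probabilistic model of an uncertain Web resource.
from dataclasses import dataclass

@dataclass
class UncertainResource:
    uri: str
    representations: list[tuple[dict, float]]  # (candidate data, probability)

    def most_probable(self) -> dict:
        """During combination, pick the highest-probability representation."""
        return max(self.representations, key=lambda rp: rp[1])[0]

person = UncertainResource(
    uri="http://example.org/person/42",
    representations=[
        ({"name": "J. Smith", "city": "Lyon"}, 0.7),
        ({"name": "John Smith", "city": "Paris"}, 0.3),
    ],
)
print(person.most_probable())  # {'name': 'J. Smith', 'city': 'Lyon'}
```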
6. Visualizing audit log events at the Swedish Police Authority to facilitate its use in the judicial system / Visualisering av spårbarhetslogg hos Polismyndigheten för att underlätta dess användning inom rättssystemet
Michel, Hannes. January 2019.
Within the Swedish Police Authority, physical users' actions within all systems that manage sensitive information are registered and sent to an audit log. The audit log contains entries that record information about the events that occur and the user who performed them. This means that the audit log continuously manages massive amounts of data that is collected, processed, and stored. For the police authority, the audit log may be useful for establishing a digital trail of something that has occurred. An audit log is based upon data collected from a security log. Security logs can collect data from most of the available systems and applications. This allows the organization to implement network surveillance over its digital assets: logs are collected in real time, which makes it possible to detect intrusions on the network. Further assets from which log events are generated include security software, firewalls, operating systems, workstations, networking equipment, and applications. The actors in a court of law usually do not possess the technical knowledge required to interpret log events, since these can contain variable names, unparsed data, or undefined values. This emphasizes the need for a user-friendly presentation of the audit log events that facilitates their use. Researching a way of taking the current data format and displaying it in a more presentable manner is beneficial as academic research, by producing a generalizable model. It would also prove useful for the internal investigations of the police authority, since it was shaped by their needs.
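A minimal, purely illustrative sketch of the kind of rendering the thesis argues for follows: mapping cryptic field names and event codes in a raw log entry to a plain-language sentence. The field names and event codes are invented; the authority's real log format is not described in the abstract.

```python
# Turn a raw audit log entry into a readable sentence for non-technical readers.
RAW_EVENT = {
    "ts": "2019-01-15T09:32:11Z",
    "uid": "a1234",
    "evt": "REC_READ",
    "obj": "case/2018-4471",
}

EVENT_TEXT = {
    "REC_READ": "viewed record",
    "REC_WRITE": "modified record",
}

def render(event: dict) -> str:
    """Map cryptic field names and codes to a plain-language description."""
    action = EVENT_TEXT.get(event["evt"], f"performed {event['evt']} on")
    return f"At {event['ts']}, user {event['uid']} {action} {event['obj']}."

print(render(RAW_EVENT))
# At 2019-01-15T09:32:11Z, user a1234 viewed record case/2018-4471.
```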
7. FOQuE System for Semantic Query Expansion Based on Fuzzy Ontologies / Sistema FOQuE para expansão semântica de consultas baseada em ontologias difusas
Yaguinuma, Cristiane Akemi. 22 June 2007.
As the availability of data from many areas of knowledge grows, effective techniques to retrieve the desired information become ever more necessary, aiming to reduce irrelevant answers and to ensure that relevant results are not missed. In this context, we present the FOQuE system, developed to perform several kinds of query expansion in order to retrieve semantically relevant and broad results. Based on fuzzy ontologies, the system is able to obtain approximate results that satisfy user requirements according to expansion parameters specified by the user. The additional answers retrieved by the FOQuE system are classified according to the kind of semantic expansion performed and their relevance to the query, thereby improving how results are presented to the user. / Financiadora de Estudos e Projetos
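The core mechanism, expansion guided by membership degrees in a fuzzy ontology, can be sketched briefly. The fragment below is a hypothetical illustration, not FOQuE's actual data structures: each related term carries a membership degree, expansion keeps terms above a user-chosen threshold, and answers are ranked by degree.

```python
# Fuzzy-ontology query expansion: related terms carry membership degrees.
FUZZY_ONTOLOGY = {
    "car": [("automobile", 1.0), ("vehicle", 0.8), ("truck", 0.5)],
    "vehicle": [("bicycle", 0.4)],
}

def expand(term: str, threshold: float) -> list[tuple[str, float]]:
    """Return the term plus related terms whose membership passes the threshold."""
    expanded = [(term, 1.0)]
    for related, degree in FUZZY_ONTOLOGY.get(term, []):
        if degree >= threshold:
            expanded.append((related, degree))
    # Higher membership first, so closer matches are presented first.
    return sorted(expanded, key=lambda td: -td[1])

print(expand("car", threshold=0.6))
# [('car', 1.0), ('automobile', 1.0), ('vehicle', 0.8)]
```

The threshold plays the role of the user-defined expansion parameter mentioned in the abstract, and the membership degree provides the relevance ordering for the additional answers.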