111

Allocation Strategies for Data-Oriented Architectures

Kiefer, Tim 12 January 2016 (has links) (PDF)
Data orientation is a common design principle in distributed data management systems. In contrast to process-oriented or transaction-oriented system designs, data-oriented architectures are based on data locality and function shipping. The tight coupling of data and processing thereon is implemented in different systems in a variety of application scenarios such as data analysis, database-as-a-service, and data management on multiprocessor systems. Data-oriented systems, i.e., systems that implement a data-oriented architecture, bundle data and operations together in tasks which are processed locally on the nodes of the distributed system. Allocation strategies, i.e., methods that decide the mapping from tasks to nodes, are core components in data-oriented systems. Good allocation strategies can lead to balanced systems, while bad allocation strategies cause skew in the load and therefore suboptimal application performance and infrastructure utilization. Optimal allocation strategies are hard to find given the complexity of the systems, the complicated interactions of tasks, and the huge solution space. To ensure the scalability of data-oriented systems and to keep them manageable with hundreds of thousands of tasks, thousands of nodes, and dynamic workloads, fast and reliable allocation strategies are mandatory. In this thesis, we develop novel allocation strategies for data-oriented systems based on graph partitioning algorithms. To this end, we show that systems from different application scenarios with different abstraction levels can be generalized to generic infrastructure and workload descriptions. We use weighted graph representations to model infrastructures with bounded and unbounded, i.e., overcommitted, resources and possibly non-linear performance characteristics. Based on our generalized infrastructure and workload model, we formalize the allocation problem, which seeks valid and balanced allocations that minimize communication. Our allocation strategies partition the workload graph using solution heuristics that work with single and multiple vertex weights. Novel extensions to these solution heuristics can be used to balance penalized and secondary graph partition weights. These extensions enable the allocation strategies to handle infrastructures with non-linear performance behavior. On top of the basic algorithms, we propose methods to incorporate heterogeneous infrastructures and to react to changing workloads and infrastructures by incrementally updating the partitioning. We evaluate all components of our allocation strategy algorithms and show their applicability and scalability with synthetic workload graphs. In end-to-end performance experiments in two actual data-oriented systems, a database-as-a-service system and a database management system for multiprocessor systems, we prove that our allocation strategies outperform alternative state-of-the-art methods.
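The abstract does not spell out the partitioning heuristic; as a rough illustration of the underlying idea only, the hypothetical sketch below treats tasks as weighted vertices, task interactions as weighted edges, and greedily assigns tasks to nodes so that load stays balanced while communicating tasks are co-located. The names, weights, and balance factor are assumptions, not the thesis's algorithm.

```python
# Hypothetical sketch: allocation as balanced graph partitioning.
# Vertices are tasks (with a load weight), edges are task interactions
# (with a communication weight), and partitions correspond to nodes.
from collections import defaultdict

def greedy_allocate(task_loads, edges, num_nodes):
    """Assign each task to a node, balancing load and reducing cut edges.

    task_loads: dict task_id -> load weight
    edges: list of (task_a, task_b, comm_weight)
    num_nodes: number of nodes in the infrastructure
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    node_load = [0.0] * num_nodes
    assignment = {}

    # Place heavy tasks first so the load balance is easier to maintain.
    for task in sorted(task_loads, key=task_loads.get, reverse=True):
        best_node, best_score = None, None
        for node in range(num_nodes):
            # Communication saved by co-locating with already-placed neighbors.
            affinity = sum(w for nb, w in neighbors[task]
                           if assignment.get(nb) == node)
            # Penalize imbalance; the trade-off factor 0.5 is arbitrary here.
            score = affinity - 0.5 * (node_load[node] + task_loads[task])
            if best_score is None or score > best_score:
                best_node, best_score = node, score
        assignment[task] = best_node
        node_load[best_node] += task_loads[task]
    return assignment

# Example: four tasks allocated to two nodes.
loads = {"t1": 3.0, "t2": 1.0, "t3": 2.0, "t4": 2.0}
links = [("t1", "t2", 5.0), ("t3", "t4", 4.0), ("t2", "t3", 1.0)]
print(greedy_allocate(loads, links, 2))
```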
112

Discipline and research data in geography

Tam, Wan Ting (Winnie) January 2016 (has links)
Research data is essential to scholarship. The value of research data and its management has been increasingly recognized by policy makers and higher education institutions. A deep understanding of disciplinary practices is vital to develop culturally-sensitive policy, tools and services for successful data management. Previous research has shown that data practices vary across sub-fields and disciplines. However, much less is known about how disciplinary cultures shape data practices. There is a need to theorise research data practices based on empirical evidence in order to inform policy, tools and services. The aim of the thesis is to examine the interrelation between data practices and disciplinary cultures within geography. Geography is well-established and multidisciplinary, consisting of elements from the sciences, social sciences and humanities. By examining a single discipline this thesis develops a theoretical understanding of research data practices at a finer level of granularity than would be achieved by looking at broad disciplinary groupings such as the physical and social sciences. Data collection and analysis consisted of two phases. Phase one was exploratory, including an analysis of geography department websites and researcher web profiles and a bibliometric study of collaboration patterns based on co-authorship. Phase one aimed to understand the disciplinary characteristics of geography in preparation for Phase two. The second phase consisted of a series of 23 semi-structured interviews with researchers in geography, which aimed to understand researchers' data practices and their attitudes toward data sharing within the context of the sub-discipline(s) they inhabited. The findings of the thesis show that there are contrasting intellectual, social and data differences between physical and human geography. For example, intellectually, these two branches of geography differ in terms of their research objects and methods; socially, they differ in terms of the scale of their collaborative activities and the motivations to collaborate; furthermore, the nature of data, how data is collected and data sharing practices are also different between physical and human geography. The thesis concludes that differences in the notion of data and data sharing practices are grounded in disciplinary characteristics. The thesis develops a new three-dimensional framework to better understand the notion of data from a disciplinary perspective. The three dimensions are (1) physical form, (2) intellectual content and (3) social construction. Furthermore, Becher and Trowler's (2001) disciplinary taxonomy (hard-soft/pure-applied) and the concepts of urban-rural ways of life and convergent-divergent communities are shown to be useful to explain the diverse data sharing practices of geographers. The thesis demonstrates the usefulness of applying disciplinary theories to the sphere of research data management.
113

A visual analytics approach for visualisation and knowledge discovery from time-varying personal life data

Parvinzamir, Farzad January 2018 (has links)
Today, the importance of big data from lifestyles and work activities has been the focus of much research. At the same time, advances in modern sensor technologies have enabled self-logging of a significant number of daily activities and movements. Lifestyle logging produces a wide variety of personal data along the lifespan of individuals, including locations, movements, travel distance, step counts and the like, and can be useful in many areas such as healthcare, personal life management, memory recall, and socialisation. However, the amount of obtainable personal life logging data has enormously increased and stands in need of effective processing, analysis, and visualisation to provide hidden insights owing to the lack of semantic information (particularly in spatiotemporal data), complexity, large volume of trivial records, and absence of effective information visualisation on a large scale. Meanwhile, new technologies such as visual analytics have emerged with great potential in data mining and visualisation to overcome the challenges in handling such data and to support individuals in many aspects of their life. Thus, this thesis contemplates the importance of scalability and conducts a comprehensive investigation into visual analytics and its impact on the process of knowledge discovery from the European Commission project MyHealthAvatar at the Centre for Visualisation and Data Analytics by actively involving individuals in order to establish a credible reasoning and effectual interactive visualisation of such multivariate data with particular focus on lifestyle and personal events. To this end, this work widely reviews the foremost existing work on data mining (with the particular focus on semantic enrichment and ranking), data visualisation (of time-oriented, personal, and spatiotemporal data), and methodical evaluations of such approaches. Subsequently, a novel automated place annotation is introduced with multilevel probabilistic latent semantic analysis to automatically attach relevant information to the collected personal spatiotemporal data with low or no semantic information in order to address the inadequate information, which is essential for the process of knowledge discovery. Correspondingly, a multi-significance event ranking model is introduced by involving a number of factors as well as individuals' preferences, which can influence the result within the process of analysis towards credible and high-quality knowledge discovery. The data mining models are assessed in terms of accuracy and performance. The results showed that both models are highly capable of enriching the raw data and providing significant events based on user preferences. An interactive visualisation is also designed and implemented including a set of novel visual components significantly based upon human perception and attentiveness to visualise the extracted knowledge. Each visual component is evaluated iteratively based on usability and perceptibility in order to enhance the visualisation towards reaching the goal of this thesis. Lastly, three integrated visual analytics tools (platforms) are designed and implemented in order to demonstrate how the data mining models and interactive visualisation can be exploited to support different aspects of personal life, such as lifestyle, life pattern, and memory recall (reminiscence).
The result of the evaluation for the three integrated visual analytics tools showed that this visual analytics approach can deliver a remarkable experience in gaining knowledge and supporting the users' life in certain aspects.
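The ranking model itself is not detailed in the abstract; a minimal sketch of multi-significance event ranking, with illustrative factor names and user-preference weights (all assumed, not taken from the thesis), might look like this:

```python
# Hypothetical sketch of multi-factor event ranking: each life-log event
# gets a significance score as a preference-weighted sum of factor values.
def rank_events(events, preferences):
    """events: list of dicts with factor values in [0, 1];
    preferences: dict factor -> user weight."""
    def score(event):
        return sum(preferences.get(f, 0.0) * v
                   for f, v in event["factors"].items())
    return sorted(events, key=score, reverse=True)

events = [
    {"id": "run_in_park", "factors": {"duration": 0.7, "rarity": 0.2, "distance": 0.8}},
    {"id": "trip_abroad", "factors": {"duration": 0.9, "rarity": 0.9, "distance": 1.0}},
]
prefs = {"duration": 0.3, "rarity": 0.5, "distance": 0.2}
for e in rank_events(events, prefs):
    print(e["id"])  # most significant event first
```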
114

Recommender systems and market approaches for industrial data management

Jess, Torben January 2017 (has links)
Industrial companies are dealing with an increasing data overload problem in all aspects of their business: vast amounts of data are generated in and outside each company. Determining which data is relevant and how to get it to the right users is becoming increasingly difficult. There are a large number of datasets to be considered, and an even higher number of combinations of datasets that each user could be using. Current techniques to address this data overload problem necessitate detailed analysis. These techniques have limited scalability due to their manual effort and their complexity, which makes them impractical for a large number of datasets. Search, the alternative used by many users, is limited by the user’s knowledge about the available data and does not consider the relevance or costs of providing these datasets. Recommender systems and so-called market approaches have previously been used to solve this type of resource allocation problem, as shown for example in allocation of equipment for production processes in manufacturing or for spare part supplier selection. They can therefore also be seen as a potential application for the problem of data overload. This thesis introduces the so-called RecorDa approach: an architecture using market approaches and recommender systems on their own or by combining them into one system. Its purpose is to identify which data is more relevant for a user’s decision and improve allocation of relevant data to users. Using a combination of case studies and experiments, this thesis develops and tests the approach. It further compares RecorDa to search and other mechanisms. The results indicate that RecorDa can provide significant benefit to users with easier and more flexible access to relevant datasets compared to other techniques, such as search in these databases. It is able to provide a fast increase in precision and recall of relevant datasets while still keeping high novelty and coverage of a large variety of datasets.
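As a hedged illustration of the general idea only (not the RecorDa architecture itself), a ranking that trades recommender relevance against a market-style provisioning cost could be sketched as follows; the weighting scheme and names are assumptions:

```python
# Hypothetical sketch: combine a recommender relevance score with a
# market-style cost term to rank datasets for a user.
def rank_datasets(candidates, alpha=0.7):
    """candidates: list of (dataset_id, relevance, provision_cost),
    with relevance and cost normalized to [0, 1]."""
    scored = [(ds, alpha * rel - (1 - alpha) * cost)
              for ds, rel, cost in candidates]
    # Higher utility means the dataset is more worth providing to the user.
    return sorted(scored, key=lambda x: x[1], reverse=True)

print(rank_datasets([("sensor_log", 0.9, 0.4), ("old_report", 0.3, 0.1)]))
```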
115

JavaRMS: a grid data management system based on a peer-to-peer model

Gomes, Diego da Silva January 2008 (has links)
The great demand for high-performance computing culminated in the construction of large-scale execution environments such as Computational Grids. As in other execution platforms, their users need to obtain input data for their applications and often need to store the results they generate. Although the term Grid arose from a metaphor in which computing resources are as easily accessible as those of the electric power grid, the available tools for managing data and storage resources fall far short of what is needed to make this idea a reality. The immaturity of these services becomes critical for scientific applications that need to process large volumes of data. In such cases, only high-performance resources are used, and data reliability, availability, and security are ensured through human presence. This work presents JavaRMS, a data management system for Grids. By employing a peer-to-peer model, it aggregates the less capable resources available in the Grid environment, thereby reducing the cost of the solution. The system uses the virtual node technique to handle the high heterogeneity of resources, distributing data according to the storage space each peer provides. Fragmentation is employed to make the use of less capable resources feasible and to improve the performance of file transfer operations, while replication provides data persistence and improves availability. JavaRMS also copes with the dynamicity and instability of resources through a state model, reducing the impact of maintenance operations. The architecture further includes user management services and protects resources against abuse through a quota system. All operations were designed to be secure. Finally, the system provides the infrastructure needed for search services and user interaction tools to be offered in the future. Experiments with the JavaRMS prototype show that using a peer-to-peer model to organize resources and locate data results in good scalability. The virtual node technique proved efficient for distributing data among machines in a balanced way, according to the storage capacity offered. Tests with the main file transfer operation showed that the model can significantly improve the performance of applications that need to process large volumes of data.
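The abstract names the virtual node technique but not its mechanics; one common way to realize capacity-proportional placement is a hash ring in which each peer owns a number of virtual nodes proportional to its storage, as in this hypothetical sketch (all names and parameters are illustrative, not JavaRMS internals):

```python
# Hypothetical sketch of capacity-proportional data placement with virtual
# nodes on a consistent-hashing ring.
import hashlib
from bisect import bisect

def h(key: str) -> int:
    # Hash a key to a position on the ring.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

def build_ring(peer_capacities_gb, vnodes_per_gb=0.1):
    ring = []  # list of (position, peer)
    for peer, cap in peer_capacities_gb.items():
        # More storage capacity -> more virtual nodes -> more data assigned.
        for i in range(max(1, int(cap * vnodes_per_gb))):
            ring.append((h(f"{peer}#{i}"), peer))
    ring.sort()
    return ring

def owner(ring, fragment_id):
    positions = [p for p, _ in ring]
    idx = bisect(positions, h(fragment_id)) % len(ring)
    return ring[idx][1]

ring = build_ring({"peerA": 100, "peerB": 300})  # peerB gets roughly 3x the vnodes
print(owner(ring, "file42.frag0"))
```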
117

Master Data Management : Creating a Common Language for Master Data Across an Extended and Complex Supply Chain

Lindmark, Fanny January 2018 (has links)
Connectivity provided by technology and the liberalization of trade have led to the globalization of organizations, causing supply chains to expand in complexity. As a result, many organizations today have challenges of managing information in a consistent manner throughout a complex system environment. This study aims to identify the most valuable attributes of a solution for managing master data, in an efficient and consistent manner, across an extended and complex supply chain. Master data, such as products, customers and suppliers, can be defined as valuable core business information, since it is vital for supporting business operations. A requirements elicitation was performed, including interviews conducted internally with employees at IFS and externally with customers. Furthermore, a requirements analysis resulted in a specification of requirements including the most desirable attributes of a future Master Data Management (MDM) solution. Five main themes of the attributes were identified; architecture, availability and integration, governance, user interface and lifecycle management. The study contributes to the area of research, by identifying challenges and valuable attributes to consider when developing or investing in a solution for MDM.
118

Early screening and diagnosis of diabetic retinopathy

Leontidis, Georgios January 2016 (has links)
Diabetic retinopathy (DR) is a chronic, progressive and possibly vision-threatening eye disease. Early detection and diagnosis of DR, prior to the development of any lesions, is paramount for more efficiently dealing with it and managing its consequences. This thesis investigates and proposes a number of candidate geometric and haemodynamic biomarkers, derived from fundus images of the retinal vasculature, which can be reliably utilised for identifying the progression from diabetes to DR. Numerous studies exist in the literature that investigate only some of these biomarkers in independent normal, diabetic and DR cohorts. However, none exist, to the best of my knowledge, that investigate more than 100 biomarkers altogether, both geometric and haemodynamic ones, for identifying the progression to DR, by also using a novel experimental design, where the same exact matched junctions and subjects are evaluated in a four year period that includes the last three years pre-DR (still diabetic eye) and the onset of DR (progressors’ group). Multiple additional conventional experimental designs, such as non-matched junctions, non-progressors’ group, and a combination of them are also adopted in order to present the superiority of this type of analysis for retinal features. Therefore, this thesis aims to present a complete framework and some novel knowledge, based on statistical analysis, feature selection processes and classification models, so as to provide robust, rigorous and meaningful statistical inferences, alongside efficient feature subsets that can identify the stages of the progression. In addition, a new and improved method for more accurately summarising the calibres of the retinal vessel trunks is also presented. The first original contribution of this thesis is that a series of haemodynamic features (blood flow rate, blood flow velocity, etc.), which are estimated from the retinal vascular geometry based on some boundary conditions, are applied to studying the progression from diabetes to DR. These features are found to undoubtedly contribute to the inferences and the understanding of the progression, yielding significant results, mainly for the venular network. The second major contribution is the proposed framework and the experimental design for more accurately and efficiently studying and quantifying the vascular alterations that occur during the progression to DR and that can be safely attributed only to this progression. The combination of the framework and the experimental design leads to more sound and concrete inferences, providing a set of features, such as the central retinal artery and vein equivalent, fractal dimension, blood flow rate, etc., that are indeed biomarkers of progression to DR. The third major contribution of this work is the new and improved method for more accurately summarising the calibre of an arterial or venular trunk, with a direct application to estimating the central retinal artery equivalent (CRAE), the central retinal vein equivalent (CRVE) and their quotient, the arteriovenous ratio (AVR). Finally, the improved method is shown to truly make a notable difference in the estimations, when compared to the established alternative method in the literature, with an improvement between 0.24% and 0.49% in terms of the mean absolute percentage error and 0.013 in the area under the curve.
I have demonstrated that some thoroughly planned experimental studies based on a comprehensive framework, which combines image processing algorithms, statistical and classification models, feature selection processes, and robust haemodynamic and geometric features, extracted from the retinal vasculature (as a whole and from specific areas of interest), provide altogether succinct evidence that the early detection of the progression from diabetes to DR can be indeed achieved. The performance that the eight different classification combinations achieved in terms of the area under the curve varied from 0.745 to 0.968.
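For context, the established summarisation approach the thesis improves upon is commonly computed with the revised Knudtson formulas; the sketch below shows that standard procedure (branch coefficients 0.88 for arterioles and 0.95 for venules, per Knudtson et al. 2003), not the thesis's improved method, and the vessel widths are made up for illustration:

```python
# Sketch of the revised Knudtson formulas for summarising retinal vessel
# calibres into CRAE, CRVE and AVR. This is the established method the
# abstract compares against, not the thesis's improved one.
import math

def combine(widths, k):
    """Repeatedly pair the narrowest with the widest branch per round,
    combining each pair as k * sqrt(a^2 + b^2), until one value remains."""
    w = sorted(widths)
    while len(w) > 1:
        nxt = []
        while len(w) > 1:
            a, b = w.pop(0), w.pop(-1)
            nxt.append(k * math.sqrt(a * a + b * b))
        nxt.extend(w)          # an odd leftover carries over to the next round
        w = sorted(nxt)
    return w[0]

def crae(arteriole_widths):    # central retinal artery equivalent
    return combine(arteriole_widths, 0.88)

def crve(venule_widths):       # central retinal vein equivalent
    return combine(venule_widths, 0.95)

arterioles = [105, 98, 112, 90, 120, 101]  # six largest, in micrometres (illustrative)
venules = [140, 150, 133, 128, 160, 145]
print(round(crae(arterioles), 1), round(crve(venules), 1),
      round(crae(arterioles) / crve(venules), 3))  # AVR = CRAE / CRVE
```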
119

An approach to query routing in PDMS based on semantic and quality aspects

Freire, Crishane Azevedo 31 January 2014 (has links)
Peer Data Management Systems (PDMS) allow the management of structured and semi-structured data in Peer-to-Peer (P2P) environments. In these systems, each peer corresponds to a data source whose schema represents the data to be shared on the network. Peers are connected through mappings (semantic correspondences between peer schemas), which establish a semantic neighborhood among them. Query processing is recognized as the main service a PDMS can provide. An important step of this process is query routing, i.e., the system's ability to identify, select, and forward a query to the best set of peers capable of answering it. At each forwarding step the query must be reformulated, i.e., rewritten according to the schema of the target peer. During reformulation, terms (concepts and/or properties used to formulate the query) may be lost because they have no exact correspondents in the target peer's schema. In this case, reformulation strategies based on expansion try to improve the query by adding new terms, with the goal of making the query broader and avoiding empty results. Over the course of routing, terms lost or added at each reformulation can lead to the loss of the original query's semantics. In this work we present SemRouting, an approach for query routing in PDMS based on semantic and quality aspects. SemRouting comprises a strategy for identifying and selecting the best set of peers, a model for representing semantic and quality information, and a strategy for analyzing and preserving the semantics of the original query during routing. To evaluate the approach, experiments were carried out and their results are presented and discussed. The analysis of the experimental results shows that the strategies adopted in SemRouting confirm the hypotheses raised in this thesis regarding the semantic preservation of the query and the selection of the best set of peers during query routing. Keywords: Semantic Query Routing. Semantic Information. Information Quality. Peer Data Management System.
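SemRouting's actual scoring is not given in the abstract; as a hypothetical illustration only, semantics- and quality-aware neighbor selection could combine a mapping's term coverage with a peer quality score, with all weights and names assumed:

```python
# Hypothetical sketch of semantics/quality-aware query routing in a PDMS:
# forward the query to the neighbors whose mappings cover the most query
# terms, weighted by a peer quality score.
def select_peers(query_terms, neighbors, top_k=2, w_sem=0.7, w_qual=0.3):
    """neighbors: dict peer -> {"mapped_terms": set, "quality": float in [0, 1]}"""
    ranked = []
    for peer, info in neighbors.items():
        coverage = len(query_terms & info["mapped_terms"]) / len(query_terms)
        ranked.append((peer, w_sem * coverage + w_qual * info["quality"]))
    ranked.sort(key=lambda x: x[1], reverse=True)
    return ranked[:top_k]

q = {"author", "title", "year"}
peers = {
    "p1": {"mapped_terms": {"author", "title"}, "quality": 0.9},
    "p2": {"mapped_terms": {"author", "title", "year"}, "quality": 0.5},
    "p3": {"mapped_terms": {"year"}, "quality": 0.8},
}
print(select_peers(q, peers))
```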
120

Ontology-based clustering in a Peer Data Management System

Pires, Carlos Eduardo Santos 31 January 2009 (has links)
Faculdade de Amparo à Ciência e Tecnologia do Estado de Pernambuco / Peer Data Management Systems (PDMS) are advanced P2P applications that allow users to transparently query several distributed, heterogeneous, and autonomous data sources. Each peer represents a data source and exports its full data schema or only part of it. This schema, called the exported schema, represents the data to be shared with other peers in the system and is commonly described by an ontology. The two most studied aspects of data management in PDMS concern schema mappings and query processing. Both can be improved if peers are efficiently arranged in the overlay network according to a semantics-based approach. In this context, the notion of a semantic community of peers is quite important, since it logically brings together peers with common interests in a specific topic. However, due to the dynamic behavior of peers, creating and maintaining semantic communities is a challenging aspect at the current stage of PDMS development. The main goal of this thesis is to propose a semantics-based process to incrementally cluster semantically similar peers that form communities in a PDMS. In this process, peers are clustered according to their exported schemas (ontologies), and ontology management processes (e.g., matching and summarization) are used to assist peer connection. A PDMS architecture is proposed to facilitate the semantic organization of peers in the overlay network. To obtain the semantic similarity between two peer ontologies, we propose a global similarity measure produced as the output of an ontology matching process. To optimize ontology matching, an automatic ontology summarization process is also proposed. A simulator was developed according to the PDMS architecture. The proposed ontology management processes were also implemented and included in the simulator. Experiments with each process in the context of the PDMS, as well as the results obtained from them, are presented.
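The global similarity measure is not specified in the abstract; a hypothetical sketch of aggregating pairwise concept-label matches into a symmetric ontology-level score could look like this (the label similarity and averaging scheme are assumptions, not the thesis's measure):

```python
# Hypothetical sketch of a global similarity measure between two peer
# ontologies, aggregated from best pairwise concept-label matches.
from difflib import SequenceMatcher

def label_sim(a, b):
    # Simple string similarity between two concept labels.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def global_similarity(onto_a, onto_b):
    """onto_a, onto_b: lists of concept labels (e.g., from summarised ontologies)."""
    if not onto_a or not onto_b:
        return 0.0
    best_a = sum(max(label_sim(a, b) for b in onto_b) for a in onto_a) / len(onto_a)
    best_b = sum(max(label_sim(b, a) for a in onto_a) for b in onto_b) / len(onto_b)
    return (best_a + best_b) / 2  # symmetric average of best matches

print(global_similarity(["Car", "Engine", "Wheel"], ["Automobile", "Motor", "Tire"]))
```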
