  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Composing DaaS web services : application to eHealth / Composition des services web DaaS : application à l'eSanté

Barhamgi, Mahmoud 08 October 2010 (has links)
Dans cette thèse, nous nous intéressons à l'automatisation de la composition de services Web d'accès aux données (i.e. DaaS, Data-as-a-Service Web services) pour les besoins de partage de données dans les environnements distribués. La composition de services Web permet de répondre aux besoins d'un utilisateur ne pouvant être satisfaits par un seul service Web, alors qu'une intégration de plusieurs le permettrait. La motivation principale de notre travail est que les méthodes de composition, telles qu'elles sont appliquées aux services Web traditionnels (i.e. AaaS, Application-as-a-Service Web services), ne permettent pas de prendre en compte la relation sémantique entre les entrées/sorties d'un service Web d'accès aux données, et en conséquence, elles ne sont pas adaptées pour composer les services Web d'accès aux données. Dans ce travail de thèse, nous proposons d'exploiter les principes de base des systèmes d'intégration des données pour composer les services Web d'accès aux données. Plus précisément, nous modélisons les services Web d'accès aux données comme des vues sur des ontologies de domaine. Cela permet de représenter la sémantique d'un service d'une manière déclarative en se basant sur des concepts et des relations dont les sémantiques sont formellement définies dans l'ontologie de domaine. Ensuite, nous utilisons les techniques de réécriture des requêtes pour sélectionner et composer automatiquement les services pour répondre aux requêtes des utilisateurs. Comme les services Web d'accès aux données peuvent être utilisés pour accéder à des données sensibles et privées, nous proposons également un mécanisme basé sur la modification des requêtes pour préserver la confidentialité des données. Ce mécanisme modifie les requêtes en se basant sur des politiques de confidentialité avant leur résolution par l'algorithme de composition, et il prend en considération les préférences des utilisateurs quant à la divulgation de leurs données privées. Le principal domaine d'application de notre approche est le domaine de l'e-santé, où les services Web d'accès aux données sont utilisés pour partager les dossiers médicaux des patients. / In this dissertation, we propose a novel approach for the automatic composition of DaaS (Data-as-a-Service) Web services. Automatic DaaS Web service composition requires dealing with three major research thrusts: (i) describing the semantics of DaaS Web services, (ii) selecting and combining relevant DaaS Web services, and (iii) generating composite service descriptions (i.e. the compositions' plans). We first propose to model DaaS Web services as RDF views over domain ontologies. An RDF view allows capturing the semantics of the associated DaaS Web service in a "declarative" way based on concepts and relationships whose semantics are formally defined in domain ontologies. The service description files (i.e. WSDL files) are annotated with the defined RDF views using the extensibility feature of the WSDL standard. We then propose to use query rewriting techniques for selecting and composing DaaS Web services. Specifically, we devised an efficient RDF-oriented query rewriting algorithm that selects relevant services based on their defined RDF views and combines them to answer a posed query. It also generates an execution plan for the obtained composition(s). Our algorithm takes into account the RDFS semantic constraints (i.e. "subClassOf", "subPropertyOf", "Domain" and "Range") and is able to address both specific and parameterized queries.
Since DaaS Web services may be used to access sensitive and private data, we also extended our DaaS service composition approach to handle data privacy concerns. Posed queries are modified to accommodate pertaining privacy conditions from data privacy policies before their resolution by the core composition algorithm. Our proposed privacy preservation model takes the user's privacy preferences into account.
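To make the RDF-view idea above concrete, here is a minimal sketch (not the thesis's actual algorithm) that models two hypothetical DaaS services as parameterized RDF views over a toy eHealth ontology and composes them for a query by a naive triple-pattern cover; the service names, ontology terms and the selection rule are illustrative assumptions.

```python
# Minimal sketch: DaaS services modelled as RDF views (parameterized triple
# patterns over a domain ontology) and selected by a naive rewriting step that
# greedily covers the query's patterns. A real rewriter would also check which
# inputs are bound and order the calls accordingly; names here are invented.

from dataclasses import dataclass

@dataclass
class RDFView:
    service: str
    inputs: set     # parameters the service must receive
    outputs: set    # variables the service returns
    triples: set    # (subject, predicate, object) patterns over the ontology

views = [
    RDFView("getPatientRecord", inputs={"?ssn"}, outputs={"?record"},
            triples={("?p", "ex:hasSSN", "?ssn"),
                     ("?p", "ex:hasMedicalRecord", "?record")}),
    RDFView("getPrescriptions", inputs={"?record"}, outputs={"?drug"},
            triples={("?record", "ex:prescribes", "?drug")}),
]

# Query: which drugs are prescribed to the patient identified by a given SSN?
query = {("?p", "ex:hasSSN", "?ssn"),
         ("?p", "ex:hasMedicalRecord", "?record"),
         ("?record", "ex:prescribes", "?drug")}

def select_and_compose(query, views):
    """Greedy cover: pick views whose triple patterns appear in the query."""
    remaining, plan = set(query), []
    for v in views:
        if v.triples & remaining:
            plan.append(v.service)
            remaining -= v.triples
    return plan if not remaining else None

print(select_and_compose(query, views))   # ['getPatientRecord', 'getPrescriptions']
```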
262

AcCORD: um modelo colaborativo assíncrono para a reconciliação de dados / AcCORD: asynchronous collaborative data reconciliation model

Dayse Silveira de Almeida 28 April 2016 (has links)
Reconciliação é o processo de prover uma visão consistente de dados provenientes de várias fontes de dados. Embora existam na literatura trabalhos voltados à proposta de soluções de reconciliação baseadas em colaboração assíncrona, o desafio de reconciliar dados quando vários usuários colaborativos trabalham de forma assíncrona sobre as mesmas cópias locais de dados, compartilhando somente eventualmente as suas decisões de integração particulares, tem recebido menos atenção. Nesta tese de doutorado investiga-se esse desafio, por meio da proposta do modelo AcCORD (Asynchronous COllaborative data ReconcIliation moDel). AcCORD é um modelo colaborativo assíncrono para reconciliação de dados no qual as atualizações dos usuários são mantidas em um repositório de operações na forma de dados de procedência. Cada usuário tem o seu próprio repositório para armazenar a procedência e a sua própria cópia das fontes. Ou seja, quando inconsistências entre fontes importadas são detectadas, o usuário pode tomar decisões de integração para resolvê-las de maneira autônoma, e as atualizações que são executadas localmente são registradas em seu próprio repositório. As atualizações são compartilhadas entre colaboradores quando um usuário importa as operações dos repositórios dos demais usuários. Uma vez que diferentes usuários podem ter diferentes pontos de vista para resolver o mesmo conflito, seus repositórios podem estar inconsistentes. Assim, o modelo AcCORD também inclui a proposta de diferentes políticas de reconciliação multiusuário para resolver conflitos entre repositórios. Políticas distintas podem ser aplicadas por diferentes usuários para reconciliar as suas atualizações. Dependendo da política aplicada, a visão final das fontes importadas pode ser a mesma para todos os usuários, ou seja, uma única visão global integrada, ou resultar em distintas visões locais para cada um deles. Adicionalmente, o modelo AcCORD também incorpora um método de propagação de decisões de integração, o qual tem como objetivo evitar que um usuário tome decisões inconsistentes a respeito de um mesmo conflito de dados presente em diferentes fontes, garantindo um processo de reconciliação multiusuário mais efetivo. O modelo AcCORD foi validado por meio de testes de desempenho que avaliaram as políticas propostas, e por entrevistas a usuários que avaliaram não somente as políticas propostas mas também a qualidade da reconciliação multiusuário. Os resultados obtidos demonstraram a eficiência e a eficácia do modelo proposto, além de sua flexibilidade para gerar uma visão integrada ou distintas visões locais. As entrevistas realizadas demonstraram diferentes percepções dos usuários quanto à qualidade do resultado provido pelo modelo AcCORD, incluindo aspectos relacionados à consistência, aceitabilidade, corretude, economia de tempo e satisfação. / Reconciliation is the process of providing a consistent view of the data imported from different sources. Despite some efforts reported in the literature for providing data reconciliation solutions with asynchronous collaboration, the challenge of reconciling data when multiple users work asynchronously over local copies of the same imported data has received less attention. In this thesis we investigate this challenge. We propose AcCORD, an asynchronous collaborative data reconciliation model. It stores users' integration decisions in logs, called repositories. Repositories keep data provenance, that is, the operations applied to the data sources that led to the current state of the data. Each user has her own repository for storing the provenance. That is, whenever inconsistencies among imported sources are detected, the user may autonomously take decisions to solve them, and integration decisions that are locally executed are registered in her repository. Integration decisions are shared among collaborators by importing each other's repositories. Since users may have different points of view, repositories may also be inconsistent. Therefore, AcCORD also introduces several policies that can be applied by different users in order to solve conflicts among repositories and reconcile their integration decisions. Depending on the applied policy, the final view of the imported sources may either be the same for all users, that is, a single integrated view, or result in distinct local views for each of them. Furthermore, AcCORD encompasses a decision integration propagation method, which aims to prevent a user from taking inconsistent decisions over the same data conflict present in different sources, thus guaranteeing a more effective reconciliation process. AcCORD was validated through performance tests that investigated the proposed policies and through user interviews that investigated not only the proposed policies but also the quality of the multiuser reconciliation. The results demonstrated the efficiency and efficacy of AcCORD, and highlighted its flexibility to generate a single integrated view or different local views. The interviews demonstrated different perceptions of the users with regard to the quality of the result provided by AcCORD, including aspects related to consistency, acceptability, correctness, time-saving and satisfaction.
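As a rough illustration of the repository idea (and only that), the sketch below keeps each user's integration decisions as provenance operations and reconciles two repositories with a single "latest decision wins" policy; the operation format and the policy are assumptions, and AcCORD's actual operation model and family of policies are richer.

```python
# Illustrative sketch: per-user repositories of integration decisions stored as
# provenance operations, reconciled with a toy "latest decision wins" policy.
# The operation format and the policy are assumptions, not AcCORD's own.

from collections import namedtuple

Op = namedtuple("Op", "user timestamp key chosen_value")

repo_alice = [Op("alice", 1, ("patient42", "birth_date"), "1980-03-01")]
repo_bob   = [Op("bob",   2, ("patient42", "birth_date"), "1980-03-11")]

def reconcile_latest(*repos):
    """Merge repositories; for each conflicting key keep the newest decision."""
    decision = {}
    for op in sorted((op for repo in repos for op in repo),
                     key=lambda o: o.timestamp):
        decision[op.key] = op.chosen_value   # later operations overwrite earlier ones
    return decision

# Under this policy every collaborator ends up with the same integrated view.
print(reconcile_latest(repo_alice, repo_bob))
# {('patient42', 'birth_date'): '1980-03-11'}
```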
263

Arquitetura e métodos de integração de dados e interoperabilidade aplicados na saúde mental / Investigation of the effectiveness of data integration and interoperability methods applied to mental health

Newton Shydeo Brandão Miyoshi 16 March 2018 (has links)
A disponibilidade e integração das informações em saúde relativas a um mesmo paciente entre diferentes níveis de atenção ou entre diferentes instituições de saúde é normalmente incompleta ou inexistente. Isso acontece principalmente porque os sistemas de informação que oferecem apoio aos profissionais da saúde não são interoperáveis, dificultando também a gestão dos serviços a nível municipal e regional. Essa fragmentação da informação também é desafiadora e preocupante na área da saúde mental, em que normalmente se exige um cuidado prolongado e que integra diferentes tipos de serviços de saúde. Problemas como a baixa qualidade e indisponibilidade de informações, assim como a duplicidade de registros, são importantes aspectos na gestão e no cuidado prolongado ao paciente portador de transtornos mentais. Apesar disso, ainda não existem estudos objetivos demonstrando o impacto efetivo da interoperabilidade e integração de dados na gestão e na qualidade de dados para a área de saúde mental. Objetivos: Neste contexto, o projeto tem como objetivo geral propor uma arquitetura de interoperabilidade para a assistência em saúde regionalizada e avaliar a efetividade de técnicas de integração de dados e interoperabilidade para a gestão dos atendimentos e internações em saúde mental na região de Ribeirão Preto, assim como o impacto na melhoria e disponibilidade dos dados por meio de métricas bem definidas. Métodos: O framework de interoperabilidade proposto tem como base a arquitetura cliente-servidor em camadas. O modelo de informação de interoperabilidade foi baseado em padrões de saúde internacionais e nacionais. Foi proposto um servidor de terminologias baseado em padrões de informação em saúde. Foram também utilizados algoritmos de Record Linkage para garantir a identificação unívoca do paciente. Para teste e validação da proposta foram utilizados dados de diferentes níveis de atenção à saúde provenientes de atendimentos na rede de atenção psicossocial na região de Ribeirão Preto. Os dados foram extraídos de cinco fontes diferentes: (i) a Unidade Básica de Saúde da Família - I, de Santa Cruz da Esperança; (ii) o Centro de Atenção Integrada à Saúde, de Santa Rita do Passa Quatro; (iii) o Hospital Santa Tereza; (iv) as informações de solicitações de internação contidas no SISAM (Sistema de Informação em Saúde Mental); e (v) dados demográficos do Barramento do Cartão Nacional de Saúde do Ministério da Saúde. As métricas de qualidade de dados utilizadas foram completude, consistência, duplicidade e acurácia. Resultados: Como resultado deste trabalho, foi projetada, desenvolvida e testada a plataforma de interoperabilidade em saúde, denominada eHealth-Interop. Foi adotada uma proposta de interoperabilidade por meio de serviços web com um modelo de integração de dados baseado em um banco de dados centralizador. Foi desenvolvido também um servidor de terminologias, denominado eHealth-Interop Terminology Server, que pode ser utilizado como um componente independente e em outros contextos médicos. No total foram obtidos dados de 31340 registros de pacientes pelo SISAM, e-SUS AB de Santa Cruz da Esperança, do CAIS de Santa Rita do Passa Quatro, do Hospital Santa Tereza e do Barramento do CNS do Ministério da Saúde. Desse total, 30,47% (9548) registros foram identificados como presentes em mais de uma fonte de informação, possuindo diferentes níveis de acurácia e completude.
A análise de qualidade de dados, abrangendo todos os registros integrados, obteve uma melhoria na completude média de 18,40% (de 56,47% para 74,87%) e na acurácia sintática média de 1,08% (de 96,69% para 96,77%). Na análise de consistência houve melhoras em todas as fontes de informação, variando de uma melhoria mínima de 14,4% até o máximo de 51,5%. Com o módulo de Record Linkage foi possível quantificar 1066 duplicidades e, dessas, 226 foram verificadas manualmente. Conclusões: A disponibilidade e a qualidade da informação são aspectos importantes para a continuidade do atendimento e gerenciamento de serviços de saúde. A solução proposta neste trabalho visa estabelecer um modelo computacional para preencher essa lacuna. O ambiente de interoperabilidade foi capaz de integrar a informação no caso de uso de saúde mental com o suporte de terminologias clínicas internacionais e nacionais, sendo flexível para ser estendido a outros domínios de atenção à saúde. / The availability and integration of health information from the same patient between different care levels or between different health services is usually incomplete or non-existent. This happens especially because the information systems that support health professionals are not interoperable, making it difficult to manage services at the municipal and regional level. This fragmentation of information is also challenging and worrying in the area of mental health, where long-term care is often required and integrates different types of health services and professionals. Problems such as poor quality and unavailability of information, as well as duplicate records, are important aspects in the management and long-term care of patients with mental disorders. Despite this, there are still no objective studies that demonstrate the effective impact of interoperability and data integration on the management and quality of data for the mental health area. Objectives: In this context, this project proposes an interoperability architecture for regionalized health care management. It also proposes to evaluate the effectiveness of data integration and interoperability techniques for the management of mental health hospitalizations in the Ribeirão Preto region, as well as the improvement in data availability through well-defined metrics. Methods: The proposed framework is based on a layered client-server architecture deployed on the web. The interoperability information model was based on international and national health standards. A terminology server based on health information standards was proposed. Record Linkage algorithms were implemented to guarantee unequivocal patient identification. In order to test and validate the proposal, we used data from different health care levels provided by the mental health care network in the Ribeirão Preto region. The data were extracted from five different sources: the Family Health Unit I of Santa Cruz da Esperança, the Center for Integrated Health Care of Santa Rita do Passa Quatro, Santa Tereza Hospital, the hospitalization request information contained in SISAM (the Mental Health Information System) and demographic data from the National Health Card service bus (Barramento do Cartão Nacional de Saúde) of the Brazilian Ministry of Health. Results: As a result of this work, the health interoperability platform, called eHealth-Interop, was designed, developed and tested. A proposal was adopted for interoperability through web services with a data integration model based on a centralizing database.
A terminology server, called eHealth-Interop Terminology Server, was also developed; it can be used as an independent component in other medical contexts. In total, 31,340 patient records were obtained from SISAM, from the e-SUS AB of Santa Cruz da Esperança, from the CAIS of Santa Rita do Passa Quatro, from Santa Tereza Hospital and from the CNS service bus of the Brazilian Ministry of Health. Of this total, 30.47% (9,548) records were identified as present in more than one information source, having different levels of accuracy and completeness. The data quality analysis, covering all integrated records, obtained an improvement in the average completeness of 18.40% (from 56.47% to 74.87%) and in the mean syntactic accuracy of 1.08% (from 96.69% to 96.77%). In the consistency analysis there were improvements in all information sources, ranging from a minimum improvement of 14.4% to a maximum of 51.5%. With the Record Linkage module it was possible to quantify 1066 duplications, of which 226 were manually verified. Conclusions: The availability and quality of information are both important aspects for the continuity of care and health services management. The solution proposed in this work aims to establish a computational model to fill this gap. It has been successfully applied in the mental health care context and is flexible enough to be extended to other medical domains.
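A small sketch of two of the measurements discussed above, a completeness metric and a naive record-linkage pass on name and birth date; the field names, sample records and similarity threshold are assumptions for illustration, not the eHealth-Interop implementation.

```python
# Sketch: completeness over integrated patient records, plus a naive duplicate
# search on birth date and name similarity. Fields and threshold are invented.

from difflib import SequenceMatcher

records = [
    {"source": "SISAM",    "name": "Maria Souza",    "birth": "1975-02-10", "phone": None},
    {"source": "e-SUS AB", "name": "Maria de Souza", "birth": "1975-02-10", "phone": "16-3333"},
]

def completeness(recs, fields):
    """Share of non-missing values over all (record, field) pairs."""
    filled = sum(1 for r in recs for f in fields if r.get(f) not in (None, ""))
    return filled / (len(recs) * len(fields))

def likely_duplicates(recs, threshold=0.8):
    """Pairs of records with the same birth date and similar names."""
    pairs = []
    for i, a in enumerate(recs):
        for b in recs[i + 1:]:
            similar = SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold
            if a["birth"] == b["birth"] and similar:
                pairs.append((a["source"], b["source"]))
    return pairs

print(completeness(records, ["name", "birth", "phone"]))   # 0.833...
print(likely_duplicates(records))                          # [('SISAM', 'e-SUS AB')]
```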
264

Intégrer des sources de données hétérogènes dans le Web de données / Integrating heterogeneous data sources in the Web of data

Michel, Franck 03 March 2017 (has links)
Le succès du Web de Données repose largement sur notre capacité à atteindre les données stockées dans des silos invisibles du web. Dans les 15 dernières années, des travaux ont entrepris d’exposer divers types de données structurées au format RDF. Dans le même temps, le marché des bases de données (BdD) est devenu très hétérogène avec le succès massif des BdD NoSQL. Celles-ci sont potentiellement d’importants fournisseurs de données liées. Aussi, l’objectif de cette thèse est de permettre l’intégration en RDF de sources de données hétérogènes, et notamment d'alimenter le Web de Données avec les données issues des BdD NoSQL. Nous proposons un langage générique, xR2RML, pour décrire le mapping de sources hétérogènes vers une représentation RDF arbitraire. Ce langage étend des travaux précédents sur la traduction de sources relationnelles, CSV/TSV et XML en RDF. Sur cette base, nous proposons soit de matérialiser les données RDF, soit d'évaluer dynamiquement des requêtes SPARQL sur la base native. Dans ce dernier cas, nous proposons une approche en deux étapes : (i) traduction d’une requête SPARQL en une requête pivot, abstraite, en se basant sur le mapping xR2RML ; (ii) traduction de la requête abstraite en une requête concrète, prenant en compte les spécificités du langage de requête de la BdD cible. Un souci particulier est apporté à l'optimisation des requêtes, aux niveaux abstrait et concret. Nous démontrons l’applicabilité de notre approche via un prototype pour la populaire base MongoDB. Nous avons validé la méthode dans un cas d’utilisation réel issu du domaine des humanités numériques. / To a great extent, the success of the Web of Data depends on the ability to reach out to legacy data locked in silos inaccessible from the web. In the last 15 years, various works have tackled the problem of exposing structured data of various types in the Resource Description Framework (RDF). Meanwhile, the overwhelming success of NoSQL databases has made the database landscape more diverse than ever. NoSQL databases are strong potential contributors of valuable linked open data. Hence, the object of this thesis is to enable RDF-based data integration over heterogeneous data sources and, in particular, to harness NoSQL databases to populate the Web of Data. We propose a generic mapping language, xR2RML, to describe the mapping of heterogeneous data sources into an arbitrary RDF representation. xR2RML relies on and extends previous works on the translation of RDBs, CSV/TSV and XML into RDF. With such an xR2RML mapping, we propose either to materialize RDF data or to dynamically evaluate SPARQL queries on the native database. In the latter case, we follow a two-step approach. The first step performs the translation of a SPARQL query into a pivot abstract query based on the xR2RML mapping of the target database to RDF. In the second step, the abstract query is translated into a concrete query, taking into account the specificities of the database query language. Great care is taken over the query optimization opportunities, both at the abstract and the concrete levels. To demonstrate the effectiveness of our approach, we have developed a prototype implementation for MongoDB, the popular NoSQL document store. We have validated the method using a real-life use case in Digital Humanities.
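As a rough illustration of the two-step translation described above (deliberately far simpler than the actual xR2RML-based rewriting), the sketch below turns a single SPARQL triple pattern into an abstract condition via an invented mapping and then concretizes it as a MongoDB find() filter; the mapping entries, collection and field names are assumptions.

```python
# Greatly simplified two-step rewriting for one SPARQL triple pattern:
# (1) mapping-driven translation into an abstract condition,
# (2) concretization into MongoDB's query language.
# The mapping, collection and field names are invented for illustration.

mapping = {
    # ontology property -> (MongoDB collection, document field)
    "foaf:name":    ("persons", "name"),
    "ex:birthDate": ("persons", "birth_date"),
}

def sparql_triple_to_abstract(triple):
    """Step 1: (?s, predicate, literal) -> abstract equality condition."""
    _, predicate, obj = triple
    collection, field = mapping[predicate]
    return {"collection": collection, "condition": ("equals", field, obj)}

def abstract_to_mongo(abstract):
    """Step 2: express the abstract condition as a MongoDB filter document."""
    op, field, value = abstract["condition"]
    assert op == "equals"
    return abstract["collection"], {field: value}

triple = ("?person", "foaf:name", "Ada Lovelace")
collection, mongo_filter = abstract_to_mongo(sparql_triple_to_abstract(triple))
print(collection, mongo_filter)        # persons {'name': 'Ada Lovelace'}
# With pymongo one would then evaluate: db[collection].find(mongo_filter)
```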
265

Crowdsourcing in pay-as-you-go data integration

Osorno Gutierrez, Fernando January 2016 (has links)
In pay-as-you-go data integration, feedback can inform the regeneration of different aspects of a data integration system and, as a result, helps to improve the system's quality. However, feedback can be expensive: the amount of feedback required to annotate all possible integration artefacts is potentially large, while the budget may be limited. Feedback can also be used in different ways: feedback of different types, collected in different orders, can have different effects on the quality of the integration, and some feedback types can give rise to more benefit than others. There is therefore a need for techniques to collect feedback effectively. Previous efforts have explored the benefit of feedback on one aspect of the integration; however, these contributions have not considered the benefit of different feedback types in a single integration task. We have investigated the annotation of mapping results using crowdsourcing, implementing techniques to improve reliability. The results indicate that precision estimates derived from crowdsourcing improve rapidly, suggesting that crowdsourcing can be used as a cost-effective source of feedback. We propose an approach to maximize the improvement of data integration systems given a budget for feedback. Our approach takes into account the annotation of schema matchings, mapping results and pairs of candidate record duplicates. We define a feedback plan, which indicates the type of feedback to collect, the amount of feedback to collect and the order in which different types of feedback are collected. We define a fitness function and a genetic algorithm to search for the most cost-effective feedback plans. We implemented a framework to test the application of feedback plans and measure the improvement of different data integration systems. In the framework, we use a greedy algorithm for the selection of mappings. We designed quality measures to estimate the quality of a dataspace after the application of a feedback plan. For the evaluation of our approach, we propose a method to generate synthetic data scenarios and evaluate our approach in scenarios with different characteristics. The results show that the generated feedback plans achieved higher quality values than randomly generated feedback plans in several scenarios.
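A toy sketch of what a feedback plan and its fitness could look like in the spirit of the approach above; the feedback types follow the abstract, but the gain estimates, unit costs, budget, and the random search standing in for the genetic algorithm are assumptions.

```python
# Toy feedback plan: an ordered list of (feedback_type, amount) pairs, scored by
# a fitness that trades estimated quality gain against a unit-cost budget.
# Gains, costs and budget are illustrative numbers; order is ignored here even
# though the real approach also optimizes the collection order.

import random

FEEDBACK_TYPES = ["matching", "mapping_result", "duplicate_pair"]
EST_GAIN  = {"matching": 0.004, "mapping_result": 0.003, "duplicate_pair": 0.002}
UNIT_COST = {"matching": 1.0,   "mapping_result": 0.5,   "duplicate_pair": 0.5}
BUDGET = 100.0

def random_plan():
    order = random.sample(FEEDBACK_TYPES, k=len(FEEDBACK_TYPES))
    return [(t, random.randint(0, 120)) for t in order]

def fitness(plan):
    """Estimated quality gain; zero if the plan exceeds the budget."""
    cost = sum(UNIT_COST[t] * n for t, n in plan)
    return 0.0 if cost > BUDGET else sum(EST_GAIN[t] * n for t, n in plan)

# Crude random search standing in for the genetic algorithm of the thesis.
best = max((random_plan() for _ in range(1000)), key=fitness)
print(best, round(fitness(best), 3))
```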
266

Intelligent knowledge discovery on building energy and indoor climate data

Raatikainen, M. (Mika) 29 November 2016 (has links)
Abstract A future vision of enabling technologies for energy conservation and energy efficiency is based on the most important megatrends identified, namely climate change, urbanization, and digitalization. In the United States and in the European Union, about 40% of total energy consumption goes into energy use by buildings. Moreover, poor indoor climate quality is recognized as a distinct health hazard. On account of these two factors, energy efficiency and healthy housing are active topics in international research. The main aims of this thesis are to study which elements affect indoor climate quality and how energy consumption describes building energy efficiency, and to analyse the measured data using intelligent computational methods. The data acquisition technology used in the studies relies heavily on smart metering technologies based on Building Automation Systems (BAS), big data and the Internet of Things (IoT). The data refining process presented and used is called Knowledge Discovery in Databases (KDD). It contains methods for data acquisition, pre-processing, data mining, visualisation and interpretation of results, and transformation into knowledge and new information for end users. In this thesis, four examples of data analysis and knowledge deployment concerning small houses and school buildings are presented. The results of the case studies show that the data mining methods used in building energy efficiency and indoor climate quality analysis have a great potential for processing a large amount of multivariate data effectively. An innovative use of computational methods provides a good basis for researching and developing new information services. In the KDD process, researchers should co-operate with end users, such as building management and maintenance personnel as well as residents, to achieve better analysis results, easier interpretation and correct conclusions for exploiting the knowledge. / Tiivistelmä Tulevaisuuden visio energiansäästön sekä energiatehokkuuden mahdollistavista teknologioista pohjautuu tärkeimpiin tunnistettuihin megatrendeihin, ilmastonmuutokseen, kaupungistumiseen ja digitalisoitumiseen. Yhdysvalloissa ja Euroopan unionissa käytetään noin 40 % kokonaisenergiankulutuksesta rakennusten käytön energiatarpeeseen. Myös rakennusten sisäilmaston on havaittu olevan ilmeinen terveysriski. Perustuen kahteen edellä mainittuun tekijään, energiatehokkuus ja asumisterveys ovat aktiivisia tutkimusaiheita kansainvälisessä tutkimuksessa. Tämän väitöskirjan päätavoitteena on ollut tutkia, mitkä elementit vaikuttavat sisäilmastoon ja rakennusten energiatehokkuuteen pääasiassa analysoimalla mittausdataa käyttäen älykkäitä laskennallisia menetelmiä. Tutkimuksissa käytetyt tiedonkeruuteknologiat perustuvat etäluentaan ja rakennusautomaatioon, big datan hyödyntämiseen ja esineiden internetiin (IoT). Väitöskirjassa esiteltävä tietämyksen muodostusprosessi (KDD) koostuu tiedonkeruusta, datan esikäsittelystä, tiedonlouhinnasta, visualisoinnista ja tutkimustulosten tulkinnasta sekä tietämyksen muodostamisesta ja oleellisen informaation esittämisestä loppukäyttäjille. Tässä väitöstutkimuksessa esitellään neljän data-analyysin ja niiden pohjalta muodostetun tietämyksen hyödyntämisen esimerkkiä, jotka liittyvät pientaloihin ja koulurakennuksiin. Esimerkkitapausten tulokset osoittavat, että käytetyillä tiedonlouhinnan menetelmillä sovellettuna rakennusten energiatehokkuus- ja sisäilmastoanalyyseihin on mahdollista jalostaa suuria monimuuttuja-aineistoja tehokkaasti.
Laskennallisten menetelmien innovatiivinen käyttö antaa hyvät perusteet tutkia ja kehittää uusia informaatiopalveluja. Tutkijoiden tulee tehdä yhteistyötä loppukäyttäjinä toimivien kiinteistöhallinnan ja -ylläpidon henkilöstön sekä asukkaiden kanssa saavuttaakseen parempia analyysituloksia, helpompaa tulosten tulkintaa ja oikeita johtopäätöksiä tietämyksen hyödyntämiseksi.
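To give one concrete (and entirely synthetic) picture of the KDD pipeline described above, the sketch below pre-processes a handful of made-up hourly indoor-climate readings, clusters them with k-means and prints cluster profiles; the variables, values and number of clusters are assumptions.

```python
# Illustrative KDD-style pass over synthetic indoor-climate readings:
# pre-process (scale), mine (k-means), interpret (cluster profiles).

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# columns: indoor temperature (C), CO2 (ppm), electricity use (kWh/h)
readings = np.array([
    [21.0, 450, 1.2], [21.5, 480, 1.3], [23.5, 900, 2.8],
    [23.8, 950, 3.0], [19.0, 420, 0.6], [18.5, 410, 0.5],
])

X = StandardScaler().fit_transform(readings)                              # pre-processing
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)   # data mining

for c in range(3):                                                        # interpretation
    profile = readings[labels == c].mean(axis=0)
    print(f"cluster {c}: temp={profile[0]:.1f} C, "
          f"CO2={profile[1]:.0f} ppm, elec={profile[2]:.1f} kWh/h")
```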
267

Intelligent information services in environmental applications

Räsänen, T. (Teemu) 22 November 2011 (has links)
Abstract The amount of information available has increased due to the development of our modern digital society. This has caused an information overflow, meaning that there is a lot of data available but the meaningful information or knowledge is hidden inside the overwhelming data smog. Nevertheless, the large amount of data together with the increased capabilities of computers provides a great opportunity to learn the behaviour of different kinds of phenomena at a more detailed level. The quality of life, well-being and a healthy living environment, for example, are fields where new information services can support proactive decisions to avoid environmental problems caused by industrial activity, traffic, or extraordinary weather conditions. The combination of data coming from different sources such as public registers, companies’ operational information systems, online sensors and process monitoring systems provides a fruitful basis for creating new valuable information for citizens, decision makers or other end users. The aim of this thesis is to present the concept of intelligent information services and a methodological background in order to add intelligence using computational methods for the enrichment of multidimensional data. Moreover, novel examples are presented where new significant information is created and then provided for end users. The data refining process used is called data mining and contains methods for data collection, pre-processing, modelling, visualizing and interpreting the results, and sharing the new information thus created. Information systems are a base for the creation of information services, meaning that stakeholder groups have access only to the information; they do not own the whole information system, which comprises measurement systems, data collection and a technological platform. Intelligence in information services comes from the use of computationally intelligent methods in data processing, modelling and visualization. In this thesis the general concept of such services is presented and concretized using five cases that focus on environmental and industrial examples. The results of these case studies show that the combination of different data sources provides fertile ground for developing new information services. The data mining methods used, such as clustering and predictive modelling, together with effective pre-processing methods, have great potential to handle large amounts of multivariate data in this environmental context as well. A self-organizing map combined with k-means clustering is useful for creating more detailed information about personal energy use. Predictive modelling using a multilayer perceptron (MLP) is well suited for estimating the number of tourists visiting a leisure centre and for finding the correspondence between pulp process characteristics and the chemicals used. These results have many indirect effects on reducing negative concerns regarding our surroundings and maintaining a healthy living environment. The innovative use of stored data is one of the main elements in the creation of future information services. Thus, more emphasis should be placed on the development of data integration and effective data processing methods. Furthermore, it is noted that final end users, such as citizens or decision makers, should be involved in the data refining process at the very first stage. In this way, the approach is truly customer-oriented and the results fulfil the concrete needs of specific end users.
/ Tiivistelmä Informaation määrä on kasvanut merkittävästi tietoyhteiskunnan kehittymisen myötä. Käytössämme onkin huomattava määrä erimuotoista tietoa, josta voimme hyödyntää kuitenkin vain osan. Jatkuvasti mitattavan datan suuri määrä ja sijoittuminen hajalleen asettavat osaltaan haasteita tiedon hyödyntämiselle. Tietoyhteiskunnassa hyvinvointi ja terveellisen elinympäristön säilyminen koetaan aiempaa tärkeämmäksi. Toisaalta yritysten toiminnan tehostaminen ja kestävän kehityksen edistäminen vaativat jatkuvaa parantamista. Informaatioteknologian avulla moniulotteista mittaus- ja rekisteritietoa voidaan hyödyntää esimerkiksi ennakoivaan päätöksentekoon, jolla voidaan edistää edellä mainittuja tavoitteita. Tässä työssä on esitetty ympäristöalan älykkäiden informaatiopalveluiden konsepti, jossa oleellista on loppukäyttäjien tarpeiden tunnistaminen ja ongelmien ratkaiseminen jalostetun informaation avulla. Älykkäiden informaatiopalvelujen taustalla on yhtenäinen tiedonlouhintaan perustuva tiedonjalostusprosessi, jossa raakatieto jalostetaan loppukäyttäjille soveltuvaan muotoon. Tiedonjalostusprosessi koostuu datan keräämisestä ja esikäsittelystä, mallintamisesta, tiedon visualisoinnista, tulosten tulkitsemisesta sekä oleellisen tiedon jakamisesta loppukäyttäjäryhmille. Datan käsittelyyn ja analysointiin on käytetty laskennallisesti älykkäitä menetelmiä, josta juontuu työn otsikko: älykkäät informaatiopalvelut. Väitöskirja pohjautuu viiteen artikkeliin, joissa osoitetaan tiedonjalostusprosessin toimivuus erilaisissa tapauksissa ja esitetään esimerkkejä kuhunkin prosessin vaiheeseen soveltuvista laskennallisista menetelmistä. Artikkeleissa on kuvattu matkailualueen kävijämäärien ennakointiin ja kotitalouksien sähköenergian kulutuksen pienentämiseen liittyvät informaatiopalvelut sekä analyysi selluprosessissa käytettävien kemikaalien määrän pienentämiseksi. Näistä saadut kokemukset ja tulokset on yleistetty älykkään informaatiopalvelun konseptiksi. Väitöskirjan toisena tavoitteena on rohkaista organisaatioita hyödyntämään tietovarantoja aiempaa tehokkaammin ja monipuolisemmin sekä rohkaista tarkastelemaan myös oman organisaation ulkopuolelta saatavien tietolähteiden käyttämistä. Toisaalta, uudenlaisten informaatiopalvelujen ja liiketoimintojen kehittämistä tukisi julkisilla varoilla kerättyjen, ja osin yritysten hallussa olevien, tietovarantojen julkaiseminen avoimiksi.
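As an illustration of the predictive-modelling step mentioned above (an MLP estimating visitor numbers), here is a minimal sketch on synthetic data; the features, the figures and the network size are assumptions rather than the thesis's actual model.

```python
# Sketch: a small multilayer perceptron estimating daily visitor numbers from
# weather and calendar features. Training data here is synthetic.

import numpy as np
from sklearn.neural_network import MLPRegressor

# features: [outdoor temperature (C), is_weekend (0/1), is_holiday (0/1)]
X = np.array([[-5, 0, 0], [-8, 1, 0], [ 2, 1, 1],
              [10, 0, 0], [15, 1, 0], [20, 1, 1]])
y = np.array([120, 340, 560, 150, 420, 610])    # daily visitors (made up)

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X, y)

# Rough estimate for a cold weekend day with no holiday.
print(int(model.predict([[0, 1, 0]])[0]))
```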
268

Query-Time Data Integration

Eberius, Julian 16 December 2015 (has links) (PDF)
Today, data is collected in ever increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources precludes up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established. This thesis introduces Query-Time Data Integration as an alternative concept to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced through fully automated retrieval and mapping methods is compensated by answering those queries with ranked lists of alternative results. Each result is then based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need. To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which is able to construct a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state of the art by producing a set of individually consistent but mutually diverse alternative solutions, while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database. The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality when compared to using separate systems for the two tasks. Finally, we studied the management of large-scale dataset corpora such as data lakes or Open Data platforms, which are used as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, which aims at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort. Collectively, these three contributions are the foundation of a Query-time Data Integration architecture that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.
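A toy sketch in the spirit of top-k entity augmentation: build alternative answers that each cover the queried entities from few sources while keeping the alternatives mutually diverse; the tables, coverage sets and the simple diversity rule (no source reuse across alternatives) are assumptions, not the thesis's algorithm.

```python
# Toy top-k augmentation: greedily cover the queried entities with few sources,
# then build further alternatives from sources not yet used (a crude stand-in
# for the diversity requirement). Sources and scores are invented.

entities = ["Germany", "France", "Italy"]

# candidate web tables: entities they cover and an (assumed) relevance score
sources = {
    "tableA": ({"Germany", "France", "Italy"}, 0.9),
    "tableB": ({"Germany", "France"},          0.8),
    "tableC": ({"Italy"},                      0.7),
}

def greedy_cover(excluded=frozenset()):
    """Cover all entities with as few not-yet-excluded sources as possible."""
    remaining, chosen = set(entities), []
    for name, (covered, _) in sorted(sources.items(),
                                     key=lambda kv: -len(kv[1][0])):
        if name in excluded or not (covered & remaining):
            continue
        chosen.append(name)
        remaining -= covered
        if not remaining:
            return chosen
    return None

top_k, used, k = [], set(), 2
while len(top_k) < k:
    solution = greedy_cover(excluded=frozenset(used))
    if solution is None:
        break
    top_k.append(solution)
    used.update(solution)

print(top_k)    # [['tableA'], ['tableB', 'tableC']]
```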
269

Vers la construction d'un référentiel géographique ancien : un modèle de graphe agrégé pour intégrer, qualifier et analyser des réseaux géohistoriques / Towards the construction of a geohistorical reference database : an aggregated graph to integrate, qualify and analyze geohistorical networks

Costes, Benoît 04 November 2016 (has links)
Les historiens et archéologues ont efficacement mis à profit les travaux réalisés dans le domaine des SIG pour répondre à leurs propres problématiques. Pour l'historien, un Système d’Information Géographique est avant tout un outil de compréhension des phénomènes sociaux. De nombreuses sources géohistoriques sont aujourd'hui mises à la disposition des chercheurs : plans anciens, bottins, etc. Le croisement de ces sources d'informations diverses et hétérogènes soulève de nombreuses questions autour des dynamiques urbaines. Mais les données géohistoriques sont par nature imparfaites, et pour pouvoir être exploitées, elles doivent être spatialisées et qualifiées. L'objectif de cette thèse est d'apporter une solution à ce verrou par la production de données anciennes de référence. En nous focalisant sur le réseau des rues de Paris entre la fin du XVIIIe et la fin du XIXe siècles, nous proposons plus précisément un modèle multi-représentations de données agrégées permettant, par confrontation d'observations homologues dans le temps, de créer de nouvelles connaissances sur les imperfections des données utilisées et de les corriger. Nous terminons par tester le rôle de référentiel géohistorique des données précédemment qualifiées et enrichies, en spatialisant et en intégrant dans le modèle de nouvelles données géohistoriques de types variés (sociales et spatiales) et en proposant de nouvelles approches d'appariement et de géocodage. / The increasing availability of geohistorical data, particularly through the development of collaborative projects, is a first step towards the design of a representation of space and its changes over time in order to study its evolution, whether social, administrative or topographical. Geohistorical data extracted from various and heterogeneous sources are highly inaccurate, uncertain or inexact according to the existing terminology. Before being processed, such data should be qualified and spatialized. In this thesis, we propose a solution to this issue by producing reference data. In particular, we focus on the historical street network of Paris and its evolution between the end of the XVIIIth and the end of the XIXth centuries. Our proposal is based on a merged structure of multiple representations of data capable of modelling spatial networks at different times, providing tools such as pattern detection in order to criticize, qualify and possibly correct data and sources, using not ground-truth data but the comparison of the data with each other through the merging process. Then, we use the produced reference data to spatialize and integrate other geohistorical data, such as social data, by proposing new approaches to data matching and geocoding.
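To illustrate the kind of data matching mentioned above, here is a minimal sketch that pairs street names from two historical snapshots using a normalised string similarity; the names and the threshold are invented, and the thesis's actual matching and geocoding approaches are considerably richer.

```python
# Sketch: match street names across two historical snapshots with a normalised
# edit similarity. Names, dates and threshold are illustrative only.

from difflib import SequenceMatcher
import unicodedata

def normalise(name):
    """Lowercase and strip accents so older and newer spellings compare fairly."""
    nfkd = unicodedata.normalize("NFKD", name.lower())
    return "".join(c for c in nfkd if not unicodedata.combining(c))

streets_1791 = ["Rue Sainct Honoré", "Rue de la Harpe"]
streets_1888 = ["Rue Saint-Honoré", "Rue de la Harpe", "Boulevard Saint-Michel"]

def similarity(a, b):
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def match(old, new, threshold=0.75):
    pairs = []
    for a in old:
        best = max(new, key=lambda b: similarity(a, b))
        if similarity(a, best) >= threshold:
            pairs.append((a, best, round(similarity(a, best), 2)))
    return pairs

print(match(streets_1791, streets_1888))
# [('Rue Sainct Honoré', 'Rue Saint-Honoré', 0.91), ('Rue de la Harpe', 'Rue de la Harpe', 1.0)]
```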
270

Representation of thermal building simulation in virtual reality for sustainable building / Représentation de simulation thermique en réalité virtuelle pour la construction durable

Nugraha Bahar, Yudi 15 April 2014 (has links)
La sobriété énergétique du bâti devient aujourd’hui un élément clé en phase de conception. L’intégration en amont d’outils numériques, notamment la réalité virtuelle (RV), nous a conduit, dans cette recherche, à nous concentrer sur les résultats de simulations thermiques visualisées dans un environnement virtuel. La contribution porte sur la représentation et la perception dans un EV de ces données issues de simulation. Nous nous limitons à la caractérisation de l’efficacité énergétique en processus de conception. Cette étude vise la prédiction des performances thermiques dans des systèmes de réalité virtuelle. Les problématiques de formats de données et de flux de travail entre la modélisation classique CAO (Conception Assistée par Ordinateur), les simulations thermiques, et la visualisation immersive sont également traitées. Il existe plusieurs outils logiciels dédiés à la représentation de simulations thermiques en EV et le premier enjeu de ces travaux fut de sélectionner l’outil approprié. De nombreux modeleurs CAO, logiciels de simulation thermique et outils de RV sont disponibles ; ils diffèrent notamment par leurs approches (fonctionnalités et environnement logiciel). La problématique d’interopérabilité (formats d’échange entre les outils logiciels) requiert de bâtir un flux de travail structuré. Les difficultés d’intégration entre outils CAO et outils de simulation, et les barrières au transfert vers des systèmes de réalité virtuelle sont également décrites. Il est apparu pertinent d'utiliser le Building Information Model (BIM), de plus en plus utilisé parmi les acteurs de l’architecture, ingénierie et construction (AIC). Puis nous avons poursuivi par l’évaluation des tendances actuelles en matière de représentation de données thermiques issues de simulation dans un EV, et par la création d’une méthode de transfert de données de sorte à les intégrer au flux de travail. Après un état de l’art sur la simulation thermique et une évaluation des travaux connexes, nous décrivons l'application, la méthode et les outils pour parvenir à nos objectifs. Une proposition de procédé de transfert de données et de présentation de données en EV est formulée et évaluée. Le flux d’échanges de données s’effectue en trois phases, de sorte à optimiser les passages entre la CAO, le calcul thermique et la réalité virtuelle. La représentation des données dans l’EV est réalisée grâce à une visualisation immersive et interactive. Une expérimentation a été conduite auprès de sujets : le scénario consistait en une visualisation interactive de données thermiques selon 4 modalités en environnement virtuel. L’interface développée pour l’interaction a été voulue intuitive et conviviale. L’application contient un modèle 3D réaliste du projet (salle Gunzo) dans deux configurations : état actuel et état rénové. Les données thermiques sont restituées selon plusieurs métaphores de représentation. L’expérimentation développe une approche qui associe au scénario de rénovation virtuelle une configuration matérielle/logicielle. Les résultats obtenus se concentrent sur la visualisation, l'interaction et le retour subjectif des utilisateurs. Quatre métaphores de visualisation sont testées et leur évaluation porte notamment sur deux critères : leurs capacités à restituer les résultats de simulation thermique ; le degré d’interaction et la perception de l’utilisateur des impacts de ses actions.
L’évaluation subjective révèle les préférences des utilisateurs et montre que les métaphores de représentation ont une influence sur la précision et l’efficience de l’interprétation des données. Ces travaux montrent que les techniques de représentation et de visualisation de données de simulation ont un effet sur la pertinence de leur interprétation. La méthode décrite spécifie les modalités de transfert de la donnée depuis la phase de conception jusqu’aux outils et systèmes de RV. Sa souplesse lui permet d’être transposée à tout type de projet (…) / The importance of energy efficiency, as well as the integration of advances in sustainable building design and VR technology, has led this research to focus on thermal simulation results visualized in a virtual environment (VE). The emphasis is on the representation of thermal building simulation (TBS) results and on the perception of thermal data simulated in a VE. The current application of the design process through energy efficiency in VR systems is limited mostly to building performance predictions and design review, as is the handling of the data formats and the workflow used for 3D modeling, thermal calculation and VR visualization. The different applications and tools involved in representing TBS results in a VE are the challenge of this work. Many 3D modellers, thermal simulation tools and VR tools are available, and they differ in their function and platform. Issues of data format exchange and of appropriate tools and equipment therefore require an interoperability solution that needs to be structured as a workflow method. The significant points and barriers in integrating design with CAD and TBS tools are also outlined in order to transfer the model to the VR system. Therefore, the idea is to use the Building Information Model (BIM), extensively used in the Architecture, Engineering and Construction (AEC) community. We then evaluate the current trends for TBS representation in a VE, create a data transfer method, and integrate them into the workflow. After a review of thermal simulation and an evaluation of related works, we specify the application, method and tools for our objectives. An application of a method for data transfer and presentation of data in a VE is formulated and tested. This effort relies on a specific data workflow that performs the data transfer in three phases, based on the smooth exchange of data between CAD tools, thermal calculation tools and VR tools. Presentation of data in the VE is conducted through immersive visualization and intuitive interaction. An experimental scenario of a thermal simulation in a VR system was created to interactively visualize the results in the immersion room and was tested by several respondents. The system includes a user-friendly interface for interaction. It presents a realistic 3D model of the project (the Gunzo room) in its existing condition and in a renovated version, with their TBS results visualized through several visualization metaphors. In the experiment, the method, bundled into an application, brings together a set of virtual scenarios and a software/hardware solution. The obtained results concentrate on visualization, interaction and user feedback. Several visualization metaphors are tested and evaluated to present more informative TBS results, where the user can interact and perceive the impact of their actions. Evaluation of the application prototype showed various levels of user satisfaction, and improvements in the accuracy and efficiency of data interpretation. The research has demonstrated that it is possible to improve the representation and interpretation of building performance data, particularly TBS results, using visualization techniques. Using the specified method, the data flow that starts from the design process is completely and accurately channelled to the VR system. The method can be used with any kind of construction project and, being a flexible application, accepts new data when necessary, allowing for a comparison between the planned and the constructed.
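One simple way to read the "visualization metaphor" idea above is a colour ramp applied to simulated surface temperatures before they are pushed into the VR scene; the sketch below shows such a mapping, with the temperature range, the linear ramp and the per-surface results all being assumptions rather than the metaphors actually evaluated in the thesis.

```python
# Sketch: map simulated surface temperatures to a blue (cold) to red (warm)
# colour for a VR visualization metaphor. Range and values are assumptions.

def temperature_to_rgb(t_celsius, t_min=15.0, t_max=30.0):
    """Linear blue-to-red ramp, clamped to [t_min, t_max]."""
    x = max(0.0, min(1.0, (t_celsius - t_min) / (t_max - t_min)))
    return (int(255 * x), 0, int(255 * (1 - x)))    # (R, G, B)

# e.g. per-surface results coming out of the thermal simulation step
simulated = {"north_wall": 17.2, "south_wall": 24.8, "window": 28.9}
for surface, temp in simulated.items():
    print(surface, temp, temperature_to_rgb(temp))
```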
