11 |
Duomenų integravimas panaudojant XML / XML-based data integration
Kazlauskas, Marius 07 January 2005
Using different information systems and ensuring correct communication between them is a very important problem today. Modern companies have a large number of major applications that run the business. At different times, different people used different technologies to write these applications for different purposes. Data resources in an organization can take many forms: relational databases, object databases, XML documents, etc. Databases can run on different systems and use distinct software. This problem must be solved by enterprise application integration (EAI) systems. The project analyzes data-level EAI. This type of integration is relatively inexpensive because it avoids the expense of changing, testing, and deploying the applications. XML can be a powerful ally for data integration. The main purpose of this project was to analyze the possibilities of using XML technologies for data integration. The problem solved is how to use the same XML flow for several purposes: updating one or more other databases, transforming into an HTML document, and producing a PDF document for printing. These problems are analyzed and solved in a particular domain: public utility management. An information system was designed and implemented for this domain.
|
12 |
Studijų modulių planavimo ir valdymo sistema / Planning and management system of study modules
Jurčikonis, Dainius 16 January 2005
This work analyzes several standards and technologies for transferring data between different data sources. Developing and using a new database requires integrating data from legacy databases into it, and a common integration structure must be created for this purpose. During integration, attention must be paid to name conflicts. XML is used because of its flexibility and simplicity. Exchanging data through XML also keeps the data stored in a form from which it can be regenerated in the future. The system was implemented for use in the activities of the KTU Computer Department.
|
13 |
SociQL: a query language for the social web
Serrano Suarez, Diego Fernando Unknown Date
No description available.
|
14 |
SociQL: a query language for the social web
Serrano Suarez, Diego Fernando 06 1900
Social network sites are becoming increasingly popular and useful, as well as a relevant means for serious social research. However, despite their user appeal and wide adoption, the current generation of sites is hard to query and explore, offering limited views of local network neighbourhoods. Moreover, these sites are disconnected islands of information due to application and interface differences. We describe SociQL: a query language, along with a prototype implementation, that enables the representation, querying and exploration of disparate social networks. Unlike generic web query languages, SociQL is designed to support the examination of sociological questions, incorporating social theory and the integration of networks so that they form a single unified source of information. The thesis discusses the design and rationale for the elements in the language, and reports on our experiences in querying real social network sites with it.
|
15 |
Keyword Join: Realizing Keyword Search for Information Integration
Yu, Bei, Liu, Ling, Ooi, Beng Chin, Tan, Kian Lee 01 1900
Information integration has been widely addressed over the last several decades. However, it is far from solved due to the complexity of resolving schema and data heterogeneities. In this paper, we present our attempt to alleviate this difficulty by realizing keyword search functionality for integrating information from heterogeneous databases. Our solution does not require a predefined global schema or any mappings between databases. Rather, it relies on an operator called keyword join, which takes a set of lists of partial answers from different data sources as input and outputs a list of integrated results, each formed by joining tuples from the input lists according to predefined similarity measures. Our system allows source databases to remain autonomous and the system to be dynamic and extensible. We have tested our system with a real dataset and a benchmark, which show that our proposed method is practical and effective. / Singapore-MIT Alliance (SMA)
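The operator described in this abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the Jaccard word-overlap measure, the `threshold` parameter, the rule of scoring a combination by its weakest pair, and the sample `papers` and `talks` lists are all hypothetical choices made here.

```python
from itertools import combinations, product


def tokens(row):
    """Lower-cased word set drawn from every value in a tuple (a dict here)."""
    return {word for value in row.values() for word in str(value).lower().split()}


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def keyword_join(answer_lists, threshold=0.2):
    """Join one tuple from each input list when every pair is similar enough.

    Returns (score, combination) pairs sorted by descending score, where the
    score is the weakest pairwise similarity inside the combination.
    The similarity measure and threshold are assumptions of this sketch.
    """
    results = []
    for combo in product(*answer_lists):
        token_sets = [tokens(row) for row in combo]
        score = min(
            (jaccard(x, y) for x, y in combinations(token_sets, 2)),
            default=0.0,
        )
        if score >= threshold:
            results.append((score, combo))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Hypothetical partial answers from two independent sources.
papers = [{"title": "keyword search in databases"}, {"title": "graph mining"}]
talks = [{"topic": "keyword search systems"}, {"topic": "protein folding"}]

joined = keyword_join([papers, talks])
```

Note that the sketch needs no global schema or mapping between the two sources: the tuples are compared only through the words they contain, which is the property the abstract emphasizes.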
|
16 |
Schema quality analysis in a data integration system
BATISTA, Maria da Conceição Moraes 31 January 2008
Previous issue date: 2008 / Conselho Nacional de Desenvolvimento Científico e Tecnológico / Information Quality (IQ) has become a critical aspect in organizations and in information systems research. Low-quality information can have a negative impact on an organization's effectiveness. The growing use of data warehouses, and the direct access of managers and users to information obtained from various sources, have increased the need for quality in corporate information. The notion of IQ in information systems has emerged in recent years and has been attracting ever greater interest. There is still no common agreement on a definition of IQ, only a consensus that it is a concept of fitness for use. Information is considered appropriate for use from the perspective of a user's requirements and needs; that is, the quality of information depends on its usefulness.
Integrated access to information spread across multiple heterogeneous, distributed and autonomous data sources is an important problem to be solved in many application domains. Typically there are several ways of obtaining answers to global queries over data in different sources with different combinations. However, obtaining all possible answers is quite costly. While much research has been done on query processing and plan selection with cost criteria, little is known about the problem of incorporating IQ aspects into the global schemas of data integration systems.
In this work, we propose the analysis of IQ in a data integration system, more specifically the quality of the system's schemas. Our main goal is to improve the quality of query execution. Our proposal is based on the hypothesis that one way of optimizing query processing is to build schemas with high IQ scores.
Thus, the focus of this work is on developing IQ analysis mechanisms aimed at data integration schemas, especially the global schema. Initially, we built a list of IQ criteria and related these criteria to the elements found in data integration systems. Next, we turned to the integrated schema and formally specified three schema quality criteria: minimality, schema completeness and type consistency. We also specified an algorithm that performs adjustments to improve minimality, and algorithms for measuring type consistency in schemas. With these experiments we were able to show that the execution time of a query in a data integration system can decrease if the query is submitted to a schema with high minimality and type consistency scores.
|
17 |
Open City Data Pipeline
Bischof, Stefan, Kämpgen, Benedikt, Harth, Andreas, Polleres, Axel, Schneider, Patrik 02 1900
Statistical data about cities, regions and countries is collected for various purposes and by various institutions. Yet, while access to recent, high-quality data of this kind is crucial both for decision makers and for the public, all too often such collections of data remain isolated and not re-usable, let alone properly integrated. In this paper we present the Open City Data Pipeline, a focused attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and to republish this data in a reusable manner as Linked Data. The main features of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a modular, extensible and always up-to-date fashion; (ii) we use both Machine Learning techniques and ontological reasoning over equational background knowledge to enrich the data by imputing missing values; (iii) we assess the estimated accuracy of such imputations per indicator. Additionally, (iv) we make the integrated and enriched data available both in a web browser interface and as machine-readable Linked Data, using standard vocabularies such as QB and PROV, and linking to e.g. DBpedia.
Lastly, in an exhaustive evaluation of our approach, we compare our enrichment and cleansing techniques to a preliminary version of the Open City Data Pipeline presented at ISWC2015: firstly, we demonstrate that the combination of equational knowledge and standard machine learning techniques significantly helps to improve the quality of our missing value imputations; secondly, we arguably show that the more data we integrate, the more reliable our predictions become. Hence, over time, the Open City Data Pipeline shall provide a sustainable effort to serve Linked Data about cities in increasing quality. / Series: Working Papers on Information Systems, Information Business and Operations
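To make the idea of imputing missing values from equational background knowledge concrete, here is a minimal Python sketch. It is not taken from the pipeline itself: the equation density = population / area, the indicator names `population`, `area_km2` and `density`, and the sample figures are assumptions chosen for illustration.

```python
def complete_density_equation(city):
    """Fill one missing value of density = population / area_km2, if possible.

    The indicator names and the equation are hypothetical examples of
    equational background knowledge; the input dict is not modified.
    """
    filled = dict(city)
    pop = filled.get("population")
    area = filled.get("area_km2")
    dens = filled.get("density")
    if dens is None and pop is not None and area:
        filled["density"] = pop / area
    elif pop is None and dens is not None and area is not None:
        filled["population"] = dens * area
    elif area is None and pop is not None and dens:
        filled["area_km2"] = pop / dens
    return filled


# Hypothetical city record with a missing density value.
example = complete_density_equation(
    {"population": 1_000_000, "area_km2": 400.0, "density": None}
)
```

A value derived this way is exact whenever the other two indicators are known, which is why such equations can complement statistical imputation where observations are sparse.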
|
18 |
On techniques for pay-as-you-go data integration of linked data
Christodoulou, Klitos January 2015
It is recognised that users nowadays interact with large amounts of data that exist in disparate forms and are stored under different settings. Moreover, the amount of structured and unstructured data outside a single well-organised data management system is expanding rapidly. To address the recent challenges of managing large amounts of potentially distributed data, the vision of a dataspace was introduced. This data management paradigm aims at reducing the complexity behind the challenges of integrating heterogeneous data sources. Recently, efforts by the Linked Data (LD) community gave rise to a Web of Data (WoD) that interweaves with the current Web of documents in a way that is useful for data consumption by both humans and computational agents. On the WoD, datasets are structured under a common data model and published as Web resources following a simple set of guidelines that enables them to be linked with other pieces of data, as well as to be annotated with useful metadata that help determine their semantics. The WoD is an evolving open ecosystem that includes specialist publishers as well as community efforts aiming at re-publishing isolated databases as LD on the WoD and annotating them with metadata. The WoD raises new opportunities and challenges. However, it currently relies mostly on manual effort for integrating the large number of heterogeneous data sources on the WoD. This dissertation makes the case that several techniques from the dataspaces research area (aiming at on-demand integration of data sources in a pay-as-you-go fashion) can support the integration of heterogeneous WoD sources. In so doing, this dissertation explores the opportunities and identifies the challenges of adapting existing pay-as-you-go data integration techniques in the context of LD.
More specifically, this dissertation makes the following contributions: (1) a case study for identifying the challenges that arise when existing pay-as-you-go data integration techniques are applied in a setting where data sources are LD; (2) a methodology that deals with the 'schema-less' nature of LD sources by automatically inferring a conceptual structure from a given RDF graph, thus enabling downstream tasks such as the identification of matches and the derivation of mappings, both of which are essential for the automatic bootstrapping of a dataspace; and (3) a well-defined, principled methodology that builds on a Bayesian inference technique for reasoning under uncertainty to improve pay-as-you-go integration. Although the developed methodology is generic in being able to reason with different hypotheses, its effectiveness has only been explored on reducing the uncertain decisions made by string-based matchers during the matching stage of a dataspace system.
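The kind of Bayesian reasoning mentioned in contribution (3) can be sketched in a few lines of Python. This is a generic application of Bayes' rule, not the dissertation's methodology: the prior, the likelihood values, and the assumption that the evidence from two matchers is conditionally independent are all hypothetical.

```python
def posterior_match(prior, p_evidence_given_match, p_evidence_given_nonmatch):
    """Bayes' rule: probability that two schema elements match, given evidence."""
    numerator = p_evidence_given_match * prior
    evidence = numerator + p_evidence_given_nonmatch * (1.0 - prior)
    return numerator / evidence if evidence else prior


# Hypothetical numbers: a 10% prior that two property names match.
belief = 0.10

# Evidence 1: a string matcher reports a high score. Assumed likelihoods:
# such a score occurs for 80% of true matches and 20% of non-matches.
belief = posterior_match(belief, 0.8, 0.2)

# Evidence 2: a second matcher agrees (assumed 70% vs 10%). Treating the two
# pieces of evidence as conditionally independent is an assumption.
belief = posterior_match(belief, 0.7, 0.1)
```

Each update turns a raw matcher score into a revised degree of belief, so a match decision can be deferred until the accumulated evidence is strong enough, in keeping with the pay-as-you-go idea.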
|
19 |
Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers
Sinkala, Musalula 24 February 2021
Recently, many "molecular profiling" projects have yielded vast amounts of genetic, epigenetic, transcription, protein expression, metabolic and drug response data for cancerous tumours, healthy tissues, and cell lines. We aim to facilitate a multi-scale understanding of these high-dimensional biological data and the complexity of the relationships between the different data types taken from human tumours. Further, we intend to identify molecular disease subtypes of various cancers, uncover the subtype-specific drug targets and identify sets of therapeutic molecules that could potentially be used to inhibit these targets. We collected data from over 20 publicly available resources. We then leverage integrative computational systems analyses, network analyses and machine learning, to gain insights into the pathophysiology of pancreatic cancer and 32 other human cancer types. Here, we uncover aberrations in multiple cell signalling and metabolic pathways that implicate regulatory kinases and the Warburg effect as the likely drivers of the distinct molecular signatures of three established pancreatic cancer subtypes. Then, we apply an integrative clustering method to four different types of molecular data to reveal that pancreatic tumours can be segregated into two distinct subtypes. We define sets of proteins, mRNAs, miRNAs and DNA methylation patterns that could serve as biomarkers to accurately differentiate between the two pancreatic cancer subtypes. Then we confirm the biological relevance of the identified biomarkers by showing that these can be used together with pattern-recognition algorithms to infer the drug sensitivity of pancreatic cancer cell lines accurately. Further, we evaluate the alterations of metabolic pathway genes across 32 human cancers. We find that while alterations of metabolic genes are pervasive across all human cancers, the extent of these gene alterations varies between them. 
Based on these gene alterations, we define two distinct cancer supertypes that tend to be associated with different clinical outcomes and show that these supertypes are likely to respond differently to anticancer drugs. Overall, we show that the time has already arrived where we can leverage available data resources to potentially elicit more precise and personalised cancer therapies that would yield better clinical outcomes at a much lower cost than is currently being achieved.
|
20 |
Mashup-Werkzeuge zur Ad-hoc-Datenintegration im Web
Aumüller, David, Thor, Andreas 05 November 2018
No description available.
|