211 |
A Common Programming Interface for Managed Heterogeneous Data Analysis. Luong, Johannes, 28 July 2021
The widespread success of data analysis in a growing number of application domains has led to the development of a variety of purpose-built data processing systems. Today, many organizations operate whole fleets of different data-related systems. Although this differentiation has good reasons, there is also a growing need to create holistic perspectives that cut across the borders of individual systems. Application experts who want to create such perspectives are confronted with a variety of programming interfaces and data formats, and with the task of combining the available systems in an efficient manner. These issues are generally unrelated to the application domain and require a specialized set of skills. As a consequence, development is slowed down and made more expensive, which stifles exploration and innovation. In addition, the direct use of specialized system interfaces can couple application code to specific processing systems.
In this dissertation, we propose the data processing platform DataCalc, which presents users with a unified, application-oriented programming interface and automatically executes programs written against this interface efficiently on a variety of processing systems. DataCalc offers a managed environment for data analyses that enables domain experts to concentrate on their application logic and decouples code from specific processing technology. The basis of this managed processing environment is formed by the high-level, domain-oriented program representation DCIL and a flexible, extensible cost-based optimization component. In addition to traditional up-front optimization, the optimizer also supports dynamic re-optimization of partially executed DCIL programs, which enables the system to benefit from information that only becomes available during query execution. DataCalc assigns workloads to the available processing systems using a fine-grained task scheduling model to enable efficient exploitation of the available resources.
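To make the optimizer's task concrete, the following Python sketch illustrates the kind of cost-based operator placement such a platform has to perform; the task names, backends, and cost figures are purely hypothetical and do not reflect DataCalc's actual DCIL representation or cost model.

```python
from itertools import product

# Hypothetical logical plan: each task can run on several backends
# with an estimated execution cost (arbitrary units).
exec_cost = {
    "scan_orders":   {"postgresql": 5.0, "pyproc": 9.0},
    "filter_region": {"postgresql": 1.0, "pyproc": 2.0},
    "graph_expand":  {"neo4j": 4.0},
    "join_results":  {"pyproc": 3.0, "postgresql": 6.0},
}

# Data flow between tasks and the estimated size of each intermediate
# result, used to price cross-system data movement.
edges = [("scan_orders", "filter_region", 10.0),
         ("filter_region", "join_results", 2.0),
         ("graph_expand", "join_results", 1.0)]

TRANSFER_COST_PER_UNIT = 0.5  # cost of moving one size unit between systems

def plan_cost(assignment):
    """Total cost of an assignment of tasks to backends."""
    cost = sum(exec_cost[task][backend] for task, backend in assignment.items())
    for src, dst, size in edges:
        # Data movement is only charged when producer and consumer
        # run on different processing systems.
        if assignment[src] != assignment[dst]:
            cost += size * TRANSFER_COST_PER_UNIT
    return cost

def best_assignment():
    """Exhaustive search over all placements (fine for small plans)."""
    tasks = list(exec_cost)
    candidates = [list(exec_cost[t]) for t in tasks]
    best, best_cost = None, float("inf")
    for combo in product(*candidates):
        assignment = dict(zip(tasks, combo))
        cost = plan_cost(assignment)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost

if __name__ == "__main__":
    assignment, cost = best_assignment()
    print(cost, assignment)
```

In DataCalc, decisions of this kind are additionally revisited at run time through the dynamic re-optimization of partially executed DCIL programs.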
In the second part of the dissertation, we present a prototypical implementation of the DataCalc platform, which includes connectors for the relational DBMS PostgreSQL, the document store MongoDB, the graph database Neo4j, and the custom-built PyProc processing system. For the evaluation of this prototype, we have implemented an extended application scenario. Our experiments demonstrate that DataCalc is able to find and execute efficient execution strategies that minimize cross-system data movement. The system achieves much better results than a naive implementation and comes close to the performance of a hand-optimized solution. Based on these findings, we conclude that the DataCalc platform architecture provides an excellent environment for cross-domain data analysis on a heterogeneous, federated processing architecture.
|
212 |
Datová integrace mezi databázovými systémy / Data Integration between Database Systems. Papež, Zdeněk, January 2010
This master's thesis deals with data integration, which is used for data transfer between various database systems in both directions: data migration and replication. It introduces the technologies of distributed databases. The system of health care providers is described in detail, and the particular tables involved in its data integration are explored. A proposal for the integration of this system is created, and the subsequent implementation is described.
|
213 |
Using DevOps principles to continuously monitor RDF data quality. Meissner, Roy; Junghanns, Kurt, 01 August 2017
One approach to continuously achieve a certain data quality level is to use an integration pipeline that continuously checks and monitors the quality of a data set according to defined metrics. This approach is inspired by continuous integration pipelines, which were introduced in the area of software development and DevOps to perform continuous source code checks. By investigating possible tools and discussing the specific requirements of RDF data sets, an integration pipeline is derived that joins current approaches from software development and the semantic web and reuses existing tools. As these tools have not been built explicitly for CI usage, we evaluate their usability and propose possible workarounds and improvements. Furthermore, a real-world usage scenario is discussed, outlining the benefits of using such a pipeline.
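To illustrate what a single stage of such a pipeline might look like (the metric, file name, and threshold below are invented for this sketch and are not taken from the paper), a data quality check can be written as a short script whose exit status fails the CI build:

```python
import sys
from rdflib import Graph, RDFS

# Hypothetical quality metric: every subject in the data set should carry
# an rdfs:label. File name and threshold are placeholders.
MIN_LABEL_COVERAGE = 0.95

def label_coverage(path: str) -> float:
    g = Graph()
    g.parse(path, format="turtle")
    subjects = set(g.subjects())
    if not subjects:
        return 1.0
    labelled = {s for s in subjects if (s, RDFS.label, None) in g}
    return len(labelled) / len(subjects)

if __name__ == "__main__":
    coverage = label_coverage("dataset.ttl")
    print(f"rdfs:label coverage: {coverage:.2%}")
    # A non-zero exit code marks this CI stage as failed.
    sys.exit(0 if coverage >= MIN_LABEL_COVERAGE else 1)
```

A CI server would run a set of such checks on every change to the data set and report the results, mirroring the source code checks familiar from software development.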
|
214 |
Connecting GOMMA with STROMA: an approach for semantic ontology mapping in the biomedical domain. Möller, Maximilian, 13 February 2018
This thesis establishes a connection between GOMMA and STROMA, both tools for ontology processing. A new workflow has been implemented that denotes a set of correspondences with five semantic relation types; such a rich denotation is scarcely discussed in the literature. The evaluation shows that trivial correspondences are easy to recognize (tF > 90), while the denotation of non-trivial types remains challenging (30 < ntF < 70). A prerequisite of the implemented workflow is the extraction of semantic relations between concepts. These relations represent additional background knowledge for the enrichment tool STROMA and are integrated into the repository SemRep, which is accessed by this tool; STROMA is thus able to determine a semantic type more precisely. UMLS was chosen as the biomedical knowledge source because it subsumes many different ontologies of this domain and therefore represents a rich resource. Nevertheless, only a small set of relations met the requirements imposed on SemRep relations; further studies may analyze whether there is an appropriate way to integrate the missing relations as well. The connection of GOMMA with STROMA allows the semantic enrichment of a biomedical mapping. As a consequence, this thesis sheds light on two subjects of research. First, STROMA had previously been tested with general ontologies, which model common-sense knowledge; within this thesis, STROMA was applied to domain ontologies. The studies have shown that, overall, STROMA was able to handle such ontologies as well. However, some strategies of the enrichment process are based on assumptions that are misleading in the biomedical domain. Consequently, further strategies that might improve the type denotation are suggested in this thesis. These strategies may lead to an optimization of STROMA for biomedical data sets; a more thorough analysis will have to review their scope, also beyond the biomedical domain. Second, the established connection may lead to deeper investigations into the advantages of semantic enrichment in the biomedical domain, as an enriched mapping is returned. Despite the heterogeneity of source and target ontology, such a mapping results in improved interoperability at a finer level of granularity. The utilization of semantically rich correspondences in the biomedical domain is a worthwhile focus for future research.
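As a loose illustration of the enrichment step (the relation labels, concept names, and fallback rule below are hypothetical and greatly simplify STROMA's actual strategies), a background-knowledge repository in the spirit of SemRep can be used to attach a semantic relation type to each correspondence:

```python
# Hypothetical background-knowledge repository: ordered concept pairs
# mapped to a semantic relation (not the real SemRep content).
background = {
    ("myocardium", "heart"): "part-of",
    ("carcinoma", "cancer"): "is-a",
    ("hypertension", "high blood pressure"): "equal",
}

def enrich(correspondences):
    """Attach a semantic relation type to each (source, target) correspondence.

    Falls back to a generic 'related' type when the background knowledge
    contains neither direction of the pair (placeholder rule only).
    """
    enriched = []
    for src, trg in correspondences:
        if (src, trg) in background:
            rel = background[(src, trg)]
        elif (trg, src) in background:
            # Invert the direction-sensitive relation types.
            rel = {"is-a": "inverse is-a", "part-of": "has-a"}.get(
                background[(trg, src)], background[(trg, src)])
        else:
            rel = "related"
        enriched.append((src, trg, rel))
    return enriched

print(enrich([("myocardium", "heart"), ("heart", "myocardium"), ("liver", "lung")]))
```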
|
215 |
Open(Geo-)Data - ein Katalysator für die Digitalisierung in der Landwirtschaft? Nölle, Olaf, 15 November 2016
Integrating, analysing and visualising (geo-)data, unlocking knowledge and feeding it into decision-making processes: this is what Disy has stood for for almost 20 years.
|
216 |
Knowledge Integration and Representation for Biomedical Analysis. Alachram, Halima, 04 February 2021
No description available.
|
217 |
Geometrische und stochastische Modelle für die integrierte Auswertung terrestrischer Laserscannerdaten und photogrammetrischer Bilddaten. Schneider, Danilo, 13 November 2008
The use of terrestrial laser scanning has grown in popularity in recent years; it replaces and complements previous measuring methods and opens up new fields of application. If data from terrestrial laser scanners are combined with photogrammetric image data, promising possibilities arise, as the properties of both types of data can be considered largely complementary: terrestrial laser scanners produce fast and reliable three-dimensional representations of object surfaces from a single position, while two-dimensional photogrammetric image data are characterised by high visual quality, ease of interpretation, and high lateral accuracy. Consequently, numerous approaches already exist, both hardware- and software-based, in which this combination is realised. In most approaches, however, the image data are only used to add supplementary characteristics, such as colouring point clouds or texturing surface models generated from laser scanner data. A thorough exploitation of the complementary characteristics of both types of sensors offers much greater potential.
For this reason, a calculation method – the integrated bundle adjustment – was developed within this thesis, in which the observations of discrete object points derived from terrestrial laser scanner data and photogrammetric image data are used on an equal footing. This approach has several advantages: by exploiting the individual characteristics of both types of data, they mutually strengthen each other in the determination of 3D object coordinates, so that a higher accuracy can be achieved; all involved data sets are optimally co-registered; and each instrument is simultaneously calibrated.
Due to the (spherical) field of view of most terrestrial laser scanners of 360° in the horizontal direction and up to 180° in the vertical direction, the combination with rotating line panoramic cameras or cameras with fisheye lenses is particularly appropriate, as they cover much wider angular ranges in a single image than central perspective cameras. The basis for the combined processing of terrestrial laser scanner and photogrammetric image data is the rigorous geometric modelling of the recording instruments. Therefore, geometric models, consisting of a basic model and additional parameters for the compensation of remaining systematic errors, were developed and verified for terrestrial laser scanners and different types of cameras. For the geometric laser scanner model in particular, different approaches described in the literature were considered, and correction models known from theodolites and total stations were applied as well.
The definition of the stochastic model is of particular importance within the combined processing. Since different types of observations, with different underlying geometric models and different stochastic properties, have to be adjusted simultaneously, adequate weights have to be assigned to the observations; an unfavourable weighting can have a negative influence on the adjustment results. Therefore, the integrated bundle adjustment was extended by a variance component estimation procedure, which allows optimal observation weights to be determined automatically. Only then does it become possible to fully exploit the potential of the combination of terrestrial laser scanner and photogrammetric image data.
For the calculation of the integrated bundle adjustment, software was developed that allows various algorithmic combinations of the different data types to be applied. Numerous laser scanner, panoramic image, fisheye image and central perspective image data sets were recorded in several test environments and processed using the developed software. Several calculation alternatives were analysed in detail, demonstrating the advantages and limitations of the presented method. An application example from the field of geology illustrates the potential of the algorithm in practice.
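For orientation, a standard textbook formulation of the variance component estimation step is sketched below; it is given here as background only and is not reproduced from the thesis itself. For each observation group i (e.g. image coordinates or laser scanner distances and angles) with residuals v_i, weight matrix P_i and redundancy contribution r_i, a variance factor is estimated and used to rescale the group's weights, iterating the adjustment until all factors converge to 1:

```latex
\hat{\sigma}_i^2 = \frac{v_i^{\mathsf{T}} P_i\, v_i}{r_i},
\qquad
P_i^{(k+1)} = \frac{1}{\hat{\sigma}_i^2}\, P_i^{(k)}
```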
|
218 |
Reportovací nástroj pre monitorovanie stavu energetickej siete v reálnom čase. Poľakovský, Michal, January 2019
The goal of this diploma thesis is to develop a tool for monitoring the electricity grid state in real time. The theoretical part of the thesis covers knowledge of real-time power grid monitoring and reporting, solutions and standards currently existing on the market, and answers the questions of what should be monitored, how, and why. The thesis describes TSCNET as an electricity grid coordinator in Central Europe, as well as Unicorn Systems and its current solutions for this industry. Modern approaches to the development of real-time BI are also mentioned, and a customised solution for the customer is presented. The Material and Methods chapter introduces the tools used for the development: Pentaho, JMS, MOM, AMICA, RIS, Vue, etc. BPMN is used for ETL process modelling. The Results chapter describes the tool implementation, based on application integration principles, the available solutions mentioned in the methods part of this thesis, and a feasibility study. In the next part, the physical model is created, the communication interface is set up, ETL diagrams are provided, and everything is implemented to generate a CSV file output with grid-state data. In addition, a presentation layer is developed using Vue, NodeJS, and client-server sockets. This layer visualises real-time tables containing grid-state contingency and load-flow analysis results and also allows managers to control the newly implemented tool. Even though the tool presented in this thesis is deeply customised for the business processes of a specific company, it can still serve as a starting point for further research on real-time solutions and tools.
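As a toy sketch of the CSV export step mentioned above (the message schema, field names, and file path are invented here; the actual tool builds this step with Pentaho on top of JMS/MOM messaging), incoming grid-state records could be flattened into a CSV snapshot roughly as follows:

```python
import csv
import json

# Hypothetical grid-state messages as they might arrive from a message queue
# (field names are placeholders, not the real TSCNET/AMICA schema).
messages = [
    '{"timestamp": "2019-03-01T12:00:00Z", "line": "L042", "load_mw": 512.3, "contingency_ok": true}',
    '{"timestamp": "2019-03-01T12:00:00Z", "line": "L043", "load_mw": 887.1, "contingency_ok": false}',
]

def export_grid_state(raw_messages, path="grid_state.csv"):
    """Parse raw JSON messages and write them to a CSV snapshot."""
    rows = [json.loads(m) for m in raw_messages]
    fieldnames = ["timestamp", "line", "load_mw", "contingency_ok"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

export_grid_state(messages)
```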
|
219 |
Spatial Transcriptomics Analysis Reveals Transcriptomic and Cellular Topology Associations in Breast and Prostate Cancers. Alsaleh, Lujain, 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Background: Cancer is the leading cause of death worldwide and, as a result, is one of the most studied topics in public health. Breast cancer and prostate cancer are the most common cancers among women and men, respectively. Gene expression and image features are independently prognostic of patient survival, but it is sometimes difficult to discern how the molecular profile, e.g., gene expression, of given cells relates to their spatial layout, i.e., topology, in the tumor microenvironment (TME). With the advent of spatial transcriptomics (ST) and integrative bioinformatics analysis techniques, however, we are now able to better understand the TME of common cancers.
Method: In this paper, we aim to determine the genes that are correlated with image topology features (ITFs) in common cancers, which we denote topology associated genes (TAGs). To achieve this objective, we compute the correlation coefficients between genes and image features after identifying the optimal number of clusters for each of them, and visualize the resulting correlation matrix as a heatmap using the R package pheatmap. The objective of this study is to identify common themes among the genes correlated with ITFs, which we pursue using functional enrichment analysis. Moreover, we measure the similarity between gene clusters and image feature clusters using the ranking of correlation coefficients in order to identify, compare and contrast the TAGs across breast and prostate cancer ST slides.
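A minimal sketch of this correlation step is shown below; the thesis performs it in R with pheatmap, so the Python/seaborn version here is only an analogue, and the spot counts, feature names, and the 0.3 cutoff are invented for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Hypothetical ST data: rows are spatial spots, columns are genes or image
# topology features (ITFs). Real values would come from the ST slides.
spots = [f"spot_{i}" for i in range(200)]
genes = pd.DataFrame(rng.normal(size=(200, 30)),
                     index=spots, columns=[f"gene_{j}" for j in range(30)])
itfs = pd.DataFrame(rng.normal(size=(200, 10)),
                    index=spots, columns=[f"itf_{k}" for k in range(10)])

# Gene-by-ITF Spearman correlation matrix (pandas aligns on the spot index).
corr = pd.concat([genes, itfs], axis=1).corr(method="spearman")
corr = corr.loc[genes.columns, itfs.columns]

# Clustered heatmap, analogous to the pheatmap output described above.
sns.clustermap(corr, cmap="vlag", center=0).savefig("gene_itf_heatmap.png")

# Genes whose strongest absolute correlation exceeds a (hypothetical) cutoff
# would be candidate topology associated genes (TAGs).
tags = corr.abs().max(axis=1).loc[lambda s: s > 0.3].index.tolist()
print(tags)
```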
Result: The analysis shows that there are groups of gene ontology terms that are common within breast cancer, within prostate cancer, and across both cancers. Notably, extracellular matrix (ECM) related terms appeared regularly in all ST slides.
Conclusion: We identified TAGs in every ST slide regardless of cancer type. These TAGs were enriched for ontology terms that add context to the ITFs generated from ST cancer slides.
|
220 |
DATAWAREHOUSE APPROACH TO DECISION SUPPORT SYSTEM FROM DISTRIBUTED, HETEROGENEOUS SOURCES. Sannellappanavar, Vijaya Laxmankumar, 05 October 2006
No description available.
|