Global ETD Search

11	Hybrid Hardware/Software Architectures for Network Packet Processing in Security Applications Fießler, Andreas Christoph Kurt 14 June 2019 The volume of data processed in computer networks is rising steadily, which poses challenges for network devices such as switches, bridges, routers, and firewalls. The performance of the widespread CPU/software-based approaches to implementing these tasks is limited by the inherent overhead of sequential data processing, which is why such functionality is increasingly implemented on dedicated hardware components. These offer fast, parallel processing with low latency, but are more costly to develop and less flexible. Moreover, not every application can be optimized for parallel processing. This thesis addresses hybrid approaches that make better use of the respective strengths of software and hardware systems, with a focus on packet classification. A firewall is implemented that achieves both the flexibility and depth of analysis of a software firewall and the throughput and latency of a hardware firewall. The approach is evaluated on a standard computer system that is extended with a reconfigurable logic device (FPGA) for hardware classification. A key challenge for a hybrid firewall is identifying dependencies in the rule set. Approaches are presented that reduce redundant classification effort to a minimum, such as reusing partial results of the hybrid classifiers or an exact dependency analysis by means of Header Space Analysis. For further problems in hardware-based packet classification, such as dynamically configurable filtering circuits and fast, secure hash functions for lookups, feasibility and optimizations are evaluated. 
The hybrid approach is subsequently transferred to a system with an SDN component instead of an FPGA extension. Significant performance gains can be achieved with this setup as well. / Network devices like switches, bridges, routers, and firewalls are under continuous development to keep up with ever-rising requirements. As the overhead of software network processing has already become the performance-limiting factor for a variety of applications, functions formerly implemented in software are increasingly shifted to dedicated network processing hardware. Although such application-specific circuits allow fast, parallel, and low-latency processing, they require expensive and time-consuming development with minimal possibilities for adaptation. Security can also be a major concern, as these circuits are virtually a black box for the user. Moreover, the highly parallel processing capabilities of specialized hardware are not necessarily an advantage for all kinds of tasks in network processing, where sometimes a classical CPU is better suited. This work introduces and evaluates concepts for building hybrid hardware-software systems that exploit the advantages of both hardware and software approaches in order to achieve performant, flexible, and versatile network processing and packet classification systems. The approaches are evaluated on standard software systems, extended by a programmable hardware circuit (FPGA) to provide full control and flexibility. One key achievement of this work is the identification and mitigation of the challenges that arise when a hybrid combination of multiple packet classification circuits with different characteristics is used. We introduce approaches to reduce redundant classification effort to a minimum, such as reuse of intermediate classification results and determination of dependencies by header space analysis. 
In addition, we describe feasibility and optimizations for further challenges in hardware-based packet classification, such as filtering circuits with dynamic updates and fast hash functions for lookups. Finally, the hybrid approach is evaluated using a standard SDN switch instead of the FPGA accelerator to demonstrate portability. Paketklassifikation FPGA Netzwerk Firewall packet classification FPGA computer networks firewall security 004 Datenverarbeitung; Informatik ST 277 ST 200 ST 230 ST 150 ddc:004 ddc:000
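The dependency problem mentioned in the abstract above can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes rules are written as ternary bit patterns ('0', '1', 'x' for wildcard), in the spirit of the wildcard expressions used in header space analysis, and it only performs a pairwise overlap test. The names `Rule`, `overlaps` and `dependencies` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: str  # ternary match over header bits: '0', '1' or 'x' (wildcard)
    action: str   # e.g. "accept" or "drop"


def overlaps(a: str, b: str) -> bool:
    """True if at least one header matches both ternary patterns."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return all(x == y or x == "x" or y == "x" for x, y in zip(a, b))


def dependencies(rules):
    """Return pairs (i, j) with i < j where the higher-priority rule i
    overlaps rule j and prescribes a different action.

    Assumption: list order is priority order (index 0 wins). This pairwise
    test over-approximates an exact analysis, which would also subtract
    the headers already claimed by intermediate rules.
    """
    deps = []
    for j, low in enumerate(rules):
        for i in range(j):
            high = rules[i]
            if high.action != low.action and overlaps(high.pattern, low.pattern):
                deps.append((i, j))
    return deps


rules = [
    Rule("10xx", "drop"),
    Rule("1xx0", "accept"),
    Rule("01xx", "accept"),
    Rule("xxxx", "drop"),
]
print(dependencies(rules))
```

Rules that appear in no pair could be evaluated by either classifier independently; dependent pairs are the ones for which the order of evaluation across the hardware and software paths matters.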
12	Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data Hellmann, Sebastian 09 January 2014 This thesis is a compendium of scientific works and engineering specifications that have been contributed to a large community of stakeholders to be copied, adapted, mixed, built upon and exploited in any way possible to achieve a common goal: Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data. The explosion of information technology in the last two decades has led to a substantial growth in quantity, diversity and complexity of web-accessible linguistic data. These resources become even more useful when linked with each other, and the last few years have seen the emergence of numerous approaches in various disciplines concerned with linguistic resources and NLP tools. It is the challenge of our time to store, interlink and exploit this wealth of data accumulated in more than half a century of computational linguistics, of empirical, corpus-based study of language, and of computational lexicography in all its heterogeneity. The vision of the Giant Global Graph (GGG) was conceived by Tim Berners-Lee, aiming at connecting all data on the Web and allowing the discovery of new relations between this openly accessible data. This vision has been pursued by the Linked Open Data (LOD) community, where the cloud of published datasets comprises 295 data repositories and more than 30 billion RDF triples (as of September 2011). RDF is based on globally unique and accessible URIs and it was specifically designed to establish links between such URIs (or resources). This is captured in the Linked Data paradigm that postulates four rules: (1) referred entities should be designated by URIs, (2) these URIs should be resolvable over HTTP, (3) data should be represented by means of standards such as RDF, and (4) a resource should include links to other resources. 
Although it is difficult to precisely identify the reasons for the success of the LOD effort, advocates generally argue that open licenses as well as open access are key enablers for the growth of such a network as they provide a strong incentive for collaboration and contribution by third parties. In his keynote at BNCOD 2011, Chris Bizer argued that with RDF the overall data integration effort can be “split between data publishers, third parties, and the data consumer”, a claim that can be substantiated by observing the evolution of many large data sets constituting the LOD cloud. As written in the acknowledgement section, parts of this thesis have received extensive feedback from other scientists, practitioners and industry in many different ways. The main contributions of this thesis are summarized here: Part I – Introduction and Background. During his keynote at the Language Resource and Evaluation Conference in 2012, Sören Auer stressed the decentralized, collaborative, interlinked and interoperable nature of the Web of Data. The keynote provides strong evidence that Semantic Web technologies such as Linked Data are on their way to becoming mainstream for the representation of language resources. The jointly written companion publication for the keynote was later extended as a book chapter in The People’s Web Meets NLP and serves as the basis for “Introduction” and “Background”, outlining some stages of the Linked Data publication and refinement chain. Both chapters stress the importance of open licenses and open access as an enabler for collaboration and the ability to interlink data on the Web as a key feature of RDF, and provide a discussion about scalability issues and decentralization. Furthermore, we elaborate on how conceptual interoperability can be achieved by (1) re-using vocabularies, (2) agile ontology development, (3) meetings to refine and adapt ontologies and (4) tool support to enrich ontologies and match schemata. 
Part II - Language Resources as Linked Data. “Linked Data in Linguistics” and “NLP & DBpedia, an Upward Knowledge Acquisition Spiral” summarize the results of the Linked Data in Linguistics (LDL) Workshop in 2012 and the NLP & DBpedia Workshop in 2013 and give a preview of the MLOD special issue. In total, five proceedings – three published at CEUR (OKCon 2011, WoLE 2012, NLP & DBpedia 2013), one Springer book (Linked Data in Linguistics, LDL 2012) and one journal special issue (Multilingual Linked Open Data, MLOD, to appear) – have been (co-)edited to create incentives for scientists to convert and publish Linked Data and thus to contribute open and/or linguistic data to the LOD cloud. Based on the disseminated call for papers, 152 authors contributed one or more accepted submissions to our venues and 120 reviewers were involved in peer-reviewing. “DBpedia as a Multilingual Language Resource” and “Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud” contain this thesis’ contribution to the DBpedia Project in order to further increase the size and inter-linkage of the LOD Cloud with lexical-semantic resources. Our contribution comprises extracted data from Wiktionary (an online, collaborative dictionary similar to Wikipedia) in more than four languages (now six) as well as language-specific versions of DBpedia, including a quality assessment of inter-language links between Wikipedia editions and internationalized content negotiation rules for Linked Data. In particular, the work described in “DBpedia as a Multilingual Language Resource” created the foundation for a DBpedia Internationalisation Committee with members from over 15 different languages with the common goal to push DBpedia as a free and open multilingual language resource. Part III - The NLP Interchange Format (NIF). “NIF 2.0 Core Specification”, “NIF 2.0 Resources and Architecture” and “Evaluation and Related Work” constitute one of the main contributions of this thesis. 
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The core specification is included in “NIF 2.0 Core Specification” and describes which URI schemes and RDF vocabularies must be used for (parts of) natural language texts and annotations in order to create an RDF/OWL-based interoperability layer with NIF built upon Unicode Code Points in Normal Form C. In “NIF 2.0 Resources and Architecture”, classes and properties of the NIF Core Ontology are described to formally define the relations between text, substrings and their URI schemes. “Evaluation and Related Work” contains the evaluation of NIF. In a questionnaire, we asked questions to 13 developers using NIF. UIMA, GATE and Stanbol are extensible NLP frameworks and NIF was not yet able to provide off-the-shelf NLP domain ontologies for all possible domains, but only for the plugins used in this study. After inspecting the software, the developers agreed however that NIF is adequate to provide a generic RDF output based on NIF using literal objects for annotations. All developers were able to map the internal data structure to NIF URIs to serialize RDF output (Adequacy). The development effort in hours (ranging between 3 and 40 hours) as well as the number of code lines (ranging between 110 and 445) suggest that the implementation of NIF wrappers is easy and fast for an average developer. Furthermore, the evaluation contains a comparison to other formats and an evaluation of the available URI schemes for web annotation. In order to collect input from the wide group of stakeholders, a total of 16 presentations were given with extensive discussions and feedback, which has led to a constant improvement of NIF from 2010 until 2013. 
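The role of Unicode code points in Normal Form C can be made concrete with a minimal sketch. It assumes an offset-based URI scheme that appends a `#char=begin,end` fragment to the document URI, in the style of RFC 5147; that fragment syntax, the function names and the `http://example.org/` document URI are assumptions for illustration and are not taken from the abstract.

```python
import unicodedata


def locate(text: str, phrase: str) -> tuple[int, int]:
    """Return (begin, end) code point offsets of the first occurrence of
    phrase in text, after normalizing both to Unicode Normal Form C."""
    nfc_text = unicodedata.normalize("NFC", text)
    nfc_phrase = unicodedata.normalize("NFC", phrase)
    begin = nfc_text.find(nfc_phrase)
    if begin < 0:
        raise ValueError("phrase does not occur in text")
    return begin, begin + len(nfc_phrase)


def offset_uri(document_uri: str, begin: int, end: int) -> str:
    """Build an offset-based string URI (assumed '#char=begin,end' form)."""
    if not 0 <= begin <= end:
        raise ValueError("offsets must satisfy 0 <= begin <= end")
    return f"{document_uri}#char={begin},{end}"


# 'e' followed by a combining acute accent: two code points before
# normalization, one code point ('é') in Normal Form C.
raw = "Cafe\u0301 au lait"
begin, end = locate(raw, "lait")
print(offset_uri("http://example.org/doc.txt", begin, end))
```

Python strings are indexed by code point, so offsets computed on the normalized text are stable across tools that agree on the normalization form. Offsets taken from the unnormalized string would be shifted by one in this example, which is why fixing a normal form matters for interoperable URIs.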
After the release of NIF (Version 1.0) in November 2011, a total of 32 vocabulary employments and implementations for different NLP tools and converters were reported (8 by the (co-)authors, including Wiki-link corpus, 13 by people participating in our survey and 11 more that we have heard of). Several roll-out meetings and tutorials were held (e.g. in Leipzig and Prague in 2013) and are planned (e.g. at LREC 2014). Part IV - The NLP Interchange Format in Use. “Use Cases and Applications for NIF” and “Publication of Corpora using NIF” describe 8 concrete instances where NIF has been successfully used. One major contribution in “Use Cases and Applications for NIF” is the usage of NIF as the recommended RDF mapping in the Internationalization Tag Set (ITS) 2.0 W3C standard and the conversion algorithms from ITS to NIF and back. One outcome of the discussions in the standardization meetings and telephone conferences for ITS 2.0 was the conclusion that there was no alternative RDF format or vocabulary other than NIF with the required features to fulfill the working group charter. Five further uses of NIF are described for the Ontology of Linguistic Annotations (OLiA), the RDFaCE tool, the Tiger Corpus Navigator, the OntosFeeder and visualisations of NIF using the RelFinder tool. These 8 instances provide an implemented proof-of-concept of the features of NIF. “Publication of Corpora using NIF” starts by describing the conversion and hosting of the huge Google Wikilinks corpus with 40 million annotations for 3 million web sites. The resulting RDF dump contains 477 million triples in a 5.6 GB compressed dump file in Turtle syntax. The chapter also describes how NIF can be used to publish extracted facts from news feeds in the RDFLiveNews tool as Linked Data. Part V - Conclusions. “Lessons Learned, Conclusions and Future Work” provides lessons learned for NIF, conclusions and an outlook on future work. Most of the contributions are already summarized above. 
One particular aspect worth mentioning is the increasing number of NIF-formatted corpora for Named Entity Recognition (NER) that have come into existence after the publication of the main NIF paper Integrating NLP using Linked Data at ISWC 2013. These include the corpora converted by Steinmetz, Knuth and Sack for the NLP & DBpedia workshop and an OpenNLP-based CoNLL converter by Brümmer. Furthermore, we are aware of three LREC 2014 submissions that leverage NIF: NIF4OGGD - NLP Interchange Format for Open German Governmental Data, N^3 – A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format and Global Intelligent Content: Active Curation of Language Resources using Linked Data as well as an early implementation of a GATE-based NER/NEL evaluation framework by Dojchinovski and Kliegr. Further funding for the maintenance, interlinking and publication of Linguistic Linked Data as well as support and improvements of NIF is available via the expiring LOD2 EU project, as well as the CSA EU project called LIDER, which started in November 2013. Based on the evidence of successful adoption presented in this thesis, we can expect a decent to high chance of reaching critical mass of Linked Data technology as well as the NIF standard in the field of Natural Language Processing and Language Resources.
CONTENTS
I Introduction and Background
  1 Introduction
    1.1 Natural Language Processing
    1.2 Open licenses, open access and collaboration
    1.3 Linked Data in Linguistics
    1.4 NLP for and by the Semantic Web – the NLP Interchange Format (NIF)
    1.5 Requirements for NLP Integration
    1.6 Overview and Contributions
  2 Background
    2.1 The Working Group on Open Data in Linguistics (OWLG)
      2.1.1 The Open Knowledge Foundation
      2.1.2 Goals of the Open Linguistics Working Group
      2.1.3 Open linguistics resources, problems and challenges
      2.1.4 Recent activities and on-going developments
    2.2 Technological Background
    2.3 RDF as a data model
    2.4 Performance and scalability
    2.5 Conceptual interoperability
II Language Resources as Linked Data
  3 Linked Data in Linguistics
    3.1 Lexical Resources
    3.2 Linguistic Corpora
    3.3 Linguistic Knowledgebases
    3.4 Towards a Linguistic Linked Open Data Cloud
    3.5 State of the Linguistic Linked Open Data Cloud in 2012
    3.6 Querying linked resources in the LLOD
      3.6.1 Enriching metadata repositories with linguistic features (Glottolog → OLiA)
      3.6.2 Enriching lexical-semantic resources with linguistic information (DBpedia (→ POWLA) → OLiA)
  4 DBpedia as a Multilingual Language Resource: the Case of the Greek DBpedia Edition
    4.1 Current state of the internationalization effort
    4.2 Language-specific design of DBpedia resource identifiers
    4.3 Inter-DBpedia linking
    4.4 Outlook on DBpedia Internationalization
  5 Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud
    5.1 Related Work
    5.2 Problem Description
      5.2.1 Processing Wiki Syntax
      5.2.2 Wiktionary
      5.2.3 Wiki-scale Data Extraction
    5.3 Design and Implementation
      5.3.1 Extraction Templates
      5.3.2 Algorithm
      5.3.3 Language Mapping
      5.3.4 Schema Mediation by Annotation with lemon
    5.4 Resulting Data
    5.5 Lessons Learned
    5.6 Discussion and Future Work
      5.6.1 Next Steps
      5.6.2 Open Research Questions
  6 NLP & DBpedia, an Upward Knowledge Acquisition Spiral
    6.1 Knowledge acquisition and structuring
    6.2 Representation of knowledge
    6.3 NLP tasks and applications
      6.3.1 Named Entity Recognition
      6.3.2 Relation extraction
      6.3.3 Question Answering over Linked Data
    6.4 Resources
      6.4.1 Gold and silver standards
    6.5 Summary
III The NLP Interchange Format (NIF)
  7 NIF 2.0 Core Specification
    7.1 Conformance checklist
    7.2 Creation
      7.2.1 Definition of Strings
      7.2.2 Representation of Document Content with the nif:Context Class
    7.3 Extension of NIF
      7.3.1 Part of Speech Tagging with OLiA
      7.3.2 Named Entity Recognition with ITS 2.0, DBpedia and NERD
      7.3.3 lemon and Wiktionary2RDF
  8 NIF 2.0 Resources and Architecture
    8.1 NIF Core Ontology
      8.1.1 Logical Modules
    8.2 Workflows
      8.2.1 Access via REST Services
      8.2.2 NIF Combinator Demo
    8.3 Granularity Profiles
    8.4 Further URI Schemes for NIF
      8.4.1 Context-Hash-based URIs
  9 Evaluation and Related Work
    9.1 Questionnaire and Developers Study for NIF 1.0
    9.2 Qualitative Comparison with other Frameworks and Formats
    9.3 URI Stability Evaluation
    9.4 Related URI Schemes
IV The NLP Interchange Format in Use
  10 Use Cases and Applications for NIF
    10.1 Internationalization Tag Set 2.0
      10.1.1 ITS2NIF and NIF2ITS conversion
    10.2 OLiA
    10.3 RDFaCE
    10.4 Tiger Corpus Navigator
      10.4.1 Tools and Resources
      10.4.2 NLP2RDF in 2010
      10.4.3 Linguistic Ontologies
      10.4.4 Implementation
      10.4.5 Evaluation
      10.4.6 Related Work and Outlook
    10.5 OntosFeeder – a Versatile Semantic Context Provider for Web Content Authoring
      10.5.1 Feature Description and User Interface Walkthrough
      10.5.2 Architecture
      10.5.3 Embedding Metadata
      10.5.4 Related Work and Summary
    10.6 RelFinder: Revealing Relationships in RDF Knowledge Bases
      10.6.1 Implementation
      10.6.2 Disambiguation
      10.6.3 Searching for Relationships
      10.6.4 Graph Visualization
      10.6.5 Conclusion
  11 Publication of Corpora using NIF
    11.1 Wikilinks Corpus
      11.1.1 Description of the corpus
      11.1.2 Quantitative Analysis with Google Wikilinks Corpus
    11.2 RDFLiveNews
      11.2.1 Overview
      11.2.2 Mapping to RDF and Publication on the Web of Data
V Conclusions
  12 Lessons Learned, Conclusions and Future Work
    12.1 Lessons Learned for NIF
    12.2 Conclusions
    12.3 Future Work
ddc:000 Informatik ddc:Informationswissenschaft ddc:allgemeine Werke ddc:004 Datenverarbeitung; Informatik ddc:410 Linguistik
13	Data Fusion for Multi-Sensor Nondestructive Detection of Surface Cracks in Ferromagnetic Materials Heideklang, René 28 November 2018 Fatigue cracking is a dangerous and costly phenomenon that must be detected early. However, because small flaws require high test sensitivity, inspection reliability is reduced by false indications. This thesis therefore exploits the diversity of different nondestructive surface inspection methods to increase the reliability of defect detection by means of data fusion. The first contribution of this thesis consists of novel approaches to the fusion of inspection images. These are obtained by surface scanning with eddy current testing, thermal testing, and magnetic flux leakage testing. The results show that even simple algebraic fusion rules yield good results, provided the data have been adequately preprocessed. Data fusion thus outperforms the best individual sensor in the pixel-based false detection rate by a factor of six at a groove depth of 10 μm. Furthermore, fusion in the image transform domain is investigated. However, the theoretical advantages of such direction-sensitive transforms are not attained in practice with the available data. Nevertheless, the advantage of fusion over single-sensor inspection is confirmed here as well. In addition, this thesis provides novel techniques for fusion at higher levels of signal abstraction. An approach based on kernel density functions is introduced to integrate spatially distributed detection hypotheses. It makes it possible to explicitly model the registration errors that are practically unavoidable. Surface discontinuities 30 μm deep can be reliably found by fusion, whereas the best individual method only succeeds at depths of 40–50 μm and above. The experiment is confirmed on a second test specimen. 
Guidelines for the use of data fusion are given at the end of the thesis, and the need for an initiative to share measurement data is emphasized in order to promote future research. / Fatigue cracking is a dangerous and cost-intensive phenomenon that requires early detection. But at high test sensitivity, the abundance of false indications limits the reliability of conventional materials testing. This thesis exploits the diversity of physical principles that different nondestructive surface inspection methods offer, by applying data fusion techniques to increase the reliability of defect detection. The first main contribution consists of novel approaches for the fusion of NDT images. These surface scans are obtained from state-of-the-art inspection procedures in Eddy Current Testing, Thermal Testing and Magnetic Flux Leakage Testing. The implemented image fusion strategy demonstrates that simple algebraic fusion rules are sufficient for high performance, given adequate signal normalization. Data fusion reduces the rate of false positives by a factor of six over the best individual sensor at a 10 μm deep groove. Moreover, the utility of state-of-the-art image representations, like the Shearlet domain, is explored. However, the theoretical advantages of such directional transforms are not attained in practice with the given data. Nevertheless, the benefit of fusion over single-sensor inspection is confirmed a second time. Furthermore, this work proposes novel techniques for fusion at a high level of signal abstraction. A kernel-based approach is introduced to integrate spatially scattered detection hypotheses. This method explicitly deals with registration errors that are unavoidable in practice. Surface discontinuities as shallow as 30 μm are reliably found by fusion, whereas the best individual sensor requires depths of 40–50 μm for successful detection. The experiment is replicated on a similar second test specimen. 
Practical guidelines are given at the end of the thesis, and the need for a data sharing initiative is stressed to promote future research on this topic. Datenfusion Zerstörungsfreie Prüfung Ermüdungsriss Werkstoffermüdung Bildverarbeitung Kerndichteschätzung Wirbelstromprüfung Streuflussprüfung Thermographieprüfung Thermographie ZfP Signalverarbeitung Data Fusion Nondestructive Testing NDT NDE Fatigue Cracking Fatigue Image Processing Kernel Density Estimation KDE Eddy Current Testing Magnetic Flux Leakage Testing Thermal Testing 004 Datenverarbeitung; Informatik ST 308 ddc:004
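The claim that simple algebraic fusion rules suffice after normalization can be illustrated with a toy sketch. Everything in it is assumed for illustration: the z-score normalization, the choice of `mean` and `min` as fusion rules, the function names, and the synthetic 3x3 "scans" stand in for whatever preprocessing and data the thesis actually uses.

```python
from statistics import mean, pstdev


def normalize(image):
    """Z-score an image (list of rows) so sensors with different units
    and amplitudes become comparable. Assumed preprocessing step."""
    flat = [v for row in image for v in row]
    mu, sigma = mean(flat), pstdev(flat)
    if sigma == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(v - mu) / sigma for v in row] for row in image]


def fuse(images, rule=mean):
    """Pixel-wise fusion of registered, equally sized images.
    rule maps the list of normalized values at one pixel to one value."""
    normed = [normalize(img) for img in images]
    rows, cols = len(normed[0]), len(normed[0][0])
    return [[rule([n[r][c] for n in normed]) for c in range(cols)]
            for r in range(rows)]


def argmax(image):
    """Coordinates (row, col) of the strongest response."""
    cells = ((r, c) for r in range(len(image)) for c in range(len(image[0])))
    return max(cells, key=lambda rc: image[rc[0]][rc[1]])


# Synthetic scans: every sensor sees the defect at (1, 1), and each one
# also shows a false indication at a different, sensor-specific position.
eddy    = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 0.0]]
thermal = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
flux    = [[0.0, 0.0, 2.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]

fused_mean = fuse([eddy, thermal, flux], rule=mean)
fused_min = fuse([eddy, thermal, flux], rule=min)
print(argmax(fused_mean), argmax(fused_min))
```

In this toy setting a single sensor cannot tell its false indication from the defect, because both pixels carry the same normalized value. After fusion the defect pixel stands out, since the false indications do not coincide across sensors. Real scans additionally require spatial registration, which this sketch takes for granted.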
14	On Invariant Formulae of First-Order Logic with Numerical Predicates Harwath, Frederik 12 December 2018 This thesis investigates order-invariant formulae of first-order logic (FO) and some of its extensions, as well as other closely related concepts of finite model theory. Many results of finite model theory assume that structures are equipped with an embedding of their universe into an initial segment of the natural numbers. This makes it possible to transfer arbitrary relations (e.g. the linear order) and operations (e.g. addition, multiplication) from the natural numbers to such structures. The resulting relations on the finite structures are called numerical predicates. When numerical predicates are used in formulae, one often restricts attention to formulae whose truth value on finite structures is invariant under changes to the embedding of the structures. If the only numerical predicate used is a linear order, for example, one speaks of order-invariant formulae. The results of this thesis can be divided into three parts. The first part considers the locality properties of FO formulae with modulo-counting quantifiers that use arbitrary numerical predicates invariantly. The second part considers, on finite trees, FO sentences that use a linear order together with the corresponding addition in an invariant way. It is shown that these define the same regular tree languages as FO sentences without numerical predicates but with certain cardinality predicates. For the proof, an algebraic characterisation of the tree languages definable in this logic is developed in terms of operations on trees. The third part of the thesis deals with the expressive power and succinctness of FO and extensions of FO on classes of structures of bounded tree-depth. 
/ This thesis studies the concept of order-invariance of formulae of first-order logic (FO) and some of its extensions as well as other closely related concepts from finite model theory. Many results in finite model theory assume that structures are equipped with an embedding of their universe into an initial segment of the natural numbers. This allows arbitrary relations (e.g. linear order) and operations (e.g. addition, multiplication) on the natural numbers to be transferred to structures. The arising relations on the structures are called numerical predicates. If formulae use these numerical predicates, it is often desirable to consider only such formulae whose truth value in finite structures is invariant under changes to the embeddings of the structures. If the numerical predicates include only a linear order, such formulae are called order-invariant. We study the effect of the invariant use of different kinds of numerical predicates on the expressive power of FO and extensions thereof. The results of this thesis can be divided into three parts. The first part considers the locality and non-locality properties of formulae of FO with modulo-counting quantifiers which may use arbitrary numerical predicates in an invariant way. The second part considers sentences of FO which may use a linear order and the corresponding addition in an invariant way and obtains a characterisation of the regular finite tree languages which can be defined by such sentences: these are the same tree languages which are definable by FO-sentences without numerical predicates with certain cardinality predicates. For the proof, we obtain a characterisation of the tree languages definable in this logic in terms of algebraic operations on trees. The third part compares the expressive power and the succinctness of different extensions of FO on structures of bounded tree-depth. 
Mathematische Logik Modelltheorie Prädikatenlogik erster Stufe Theoretische Informatik Komplexitätstheorie Boolesche Schaltkreise Baumsprachen Lokalität Deskriptive Komplexitätstheorie Ordnungsinvarianz mathematical logic model theory first-order logic theoretical computer science complexity theory circuit complexity tree languages locality descriptive complexity order-invariance 004 Datenverarbeitung; Informatik ST 125 ddc:004
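The notion of order-invariance defined in the abstract above can be tried out by brute force on a single small structure. The sketch below is only an illustration under stated assumptions: sentences are hand-written Python predicates rather than parsed FO formulae, the function names are hypothetical, and checking one structure does not establish invariance, which is a statement about all finite structures.

```python
from itertools import permutations


def is_invariant_on(universe, edges, sentence):
    """Evaluate sentence under every linear order of the universe and
    report whether the truth value is the same for all of them."""
    outcomes = set()
    for perm in permutations(universe):
        rank = {v: i for i, v in enumerate(perm)}  # the embedding into 0..n-1
        outcomes.add(sentence(edges, rank))
    return len(outcomes) == 1


def some_edge_goes_upward(edges, rank):
    """Exists x, y with E(x, y) and x < y."""
    return any(rank[u] < rank[v] for u, v in edges)


def minimum_has_successor(edges, rank):
    """The least element of the order has an outgoing edge."""
    first = min(rank, key=rank.get)
    return any(u == first for u, _ in edges)


universe = ["a", "b", "c"]
undirected = {("a", "b"), ("b", "a")}  # symmetric edge relation
directed = {("a", "b")}

print(is_invariant_on(universe, undirected, some_edge_goes_upward))
print(is_invariant_on(universe, undirected, minimum_has_successor))
print(is_invariant_on(universe, directed, some_edge_goes_upward))
```

On the symmetric relation the first sentence does not depend on the chosen order, since one of the two directions always points upward. The second sentence changes its truth value when the isolated element `c` is placed first, and the first sentence stops being invariant once the relation is no longer symmetric.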
15	Extracting and Aggregating Temporal Events from Texts Döhling, Lars 11 October 2017 Finding reliable information about given events in large and dynamic text collections, such as the web, is an important topic. For example, rescue teams and insurance companies are interested in concise facts about damage after disasters, which nowadays can be found online in web blogs, newspaper articles, social media, etc. Such facts help to determine the required relief measures and support their coordination. However, finding, extracting, and aggregating useful information is a highly complex undertaking: it requires identifying suitable text sources and placing them in time, extracting relevant facts from these texts, and aggregating them into a condensed view of the events, despite inconsistencies, vague statements, and changes over time. In this thesis, we present and evaluate techniques and solutions for each of these problems, embedded in a four-step framework. The methods applied are based on pattern matching, natural language processing, and machine learning. In addition, we report the results of two case studies based on the use of the entire framework: gathering data on earthquakes and floods from web documents. Our results show that, under certain circumstances, it is possible to automatically obtain reliable and timely data from the internet. / Finding reliable information about given events from large and dynamic text collections, such as the web, is a topic of great interest. For instance, rescue teams and insurance companies are interested in concise facts about damage after disasters, which can be found today in web blogs, online newspaper articles, social media, etc. 
Knowing these facts helps to determine the required scale of relief operations and supports their coordination. However, finding, extracting, and condensing specific facts is a highly complex undertaking: It requires identifying appropriate textual sources and their temporal alignment, recognizing relevant facts within these texts, and aggregating extracted facts into a condensed answer despite inconsistencies, uncertainty, and changes over time. In this thesis, we present and evaluate techniques and solutions for each of these problems, embedded in a four-step framework. Applied methods are pattern matching, natural language processing, and machine learning. We also report the results for two case studies applying our entire framework: gathering data on earthquakes and floods from web documents. Our results show that it is, under certain circumstances, possible to automatically obtain reliable and timely data from the web. Dokumentenretrieval Query Expansion Temporal Alignment Informationsextraktion Named Entity Recognition Relationsextraktion CRF SVM Dependenzgraph Informationsfusion Funktionsanpassung Erdbeben Flut Document Retrieval Query Expansion Temporal Alignment Information Extraction Named Entity Recognition Relationship Extraction CRF SVM Dependency Graph Information Fusion Curve Fitting Earthquake Flood 004 Datenverarbeitung; Informatik ST 530 ddc:004
16	Capturing Polynomial Time and Logarithmic Space using Modular Decompositions and Limited Recursion Grußien, Berit 10 November 2017 (has links) Diese Arbeit leistet Beiträge im Bereich der deskriptiven Komplexitätstheorie. Zunächst beschäftigen wir uns mit der ungelösten Frage, ob es eine Logik gibt, welche die Klasse der Polynomialzeit-Eigenschaften (PTIME) charakterisiert. Wir betrachten Graphklassen, die unter induzierten Teilgraphen abgeschlossen sind. Auf solchen Graphklassen lässt sich die 1976 von Gallai eingeführte modulare Zerlegung anwenden. Graphen, die durch modulare Zerlegung nicht zerlegbar sind, heißen prim. Wir stellen ein neues Werkzeug vor: das Modulare Zerlegungstheorem. Es reduziert (definierbare) Kanonisierung einer Graphklasse C auf (definierbare) Kanonisierung der Klasse aller primen Graphen aus C, die mit binären Relationen auf einer linear geordneten Menge gefärbt sind. Mit Hilfe des Modularen Zerlegungstheorems zeigen wir, dass Fixpunktlogik mit Zählen (FP+C) PTIME auf der Klasse aller Permutationsgraphen und auf der Klasse aller chordalen Komparabilitätsgraphen charakterisiert. Wir beweisen zudem, dass modulare Zerlegungsbäume in Symmetrisch-Transitive-Hüllen-Logik mit Zählen (STC+C) definierbar und damit in logarithmischem Platz berechenbar sind. Weiterhin definieren wir eine neue Logik für die Komplexitätsklasse Logarithmischer Platz (LOGSPACE). Wir erweitern die Logik erster Stufe mit Zählen um einen Operator, der eine in logarithmischem Platz berechenbare Form der Rekursion erlaubt. Die resultierende Logik LREC ist ausdrucksstärker als die Deterministisch-Transitive-Hüllen-Logik mit Zählen (DTC+C) und echt in FP+C enthalten. Wir zeigen, dass LREC LOGSPACE auf gerichteten Bäumen charakterisiert. Zudem betrachten wir eine Erweiterung LREC= von LREC, die sich gegenüber LREC durch bessere Abschlusseigenschaften auszeichnet und im Gegensatz zu LREC ausdrucksstärker als die Symmetrisch-Transitive-Hüllen-Logik (STC) ist. 
Wir beweisen, dass LREC= LOGSPACE sowohl auf der Klasse der Intervallgraphen als auch auf der Klasse der chordalen klauenfreien Graphen charakterisiert. / This thesis contributes to the field of descriptive complexity theory. First, we look at the main open problem in this area: the question of whether there exists a logic that captures polynomial time (PTIME). We consider classes of graphs that are closed under taking induced subgraphs. For such graph classes, an effective graph decomposition, called modular decomposition, was introduced by Gallai in 1976. The graphs that are non-decomposable with respect to modular decomposition are called prime. We present a tool, the Modular Decomposition Theorem, that reduces (definable) canonization of a graph class C to (definable) canonization of the class of prime graphs of C that are colored with binary relations on a linearly ordered set. By an application of the Modular Decomposition Theorem, we show that fixed-point logic with counting (FP+C) captures PTIME on the class of permutation graphs and the class of chordal comparability graphs. We also prove that the modular decomposition tree is definable in symmetric transitive closure logic with counting (STC+C), and therefore computable in logarithmic space. Further, we introduce a new logic for the complexity class logarithmic space (LOGSPACE). We extend first-order logic with counting by a new operator that allows it to formalize a limited form of recursion which can be evaluated in logarithmic space. We prove that the resulting logic LREC is strictly more expressive than deterministic transitive closure logic with counting (DTC+C) and that it is strictly contained in FP+C. We show that LREC captures LOGSPACE on the class of directed trees. We also study an extension LREC= of LREC that has nicer closure properties and that, unlike LREC, is more expressive than symmetric transitive closure logic (STC). 
We prove that LREC= captures LOGSPACE on the class of interval graphs and on the class of chordal claw-free graphs. deskriptive Komplexität modulare Zerlegung Polynomialzeit Fixpunktlogik Permutationsgraphen chordale Komparabilitätsgraphen Kanonisierung logarithmischer Platz Intervallgraphen chordale klauenfreie Graphen descriptive complexity modular decomposition polynomial time fixed-point logic permutation graphs chordal comparability graphs canonization logarithmic space interval graphs chordal claw-free graphs 004 Datenverarbeitung; Informatik ST 134 ddc:004
17	Scientific Workflows for Hadoop Bux, Marc Nicolas 07 August 2018 (has links) Scientific Workflows bieten flexible Möglichkeiten für die Modellierung und den Austausch komplexer Arbeitsabläufe zur Analyse wissenschaftlicher Daten. In den letzten Jahrzehnten sind verschiedene Systeme entstanden, die den Entwurf, die Ausführung und die Verwaltung solcher Scientific Workflows unterstützen und erleichtern. In mehreren wissenschaftlichen Disziplinen wachsen die Mengen zu verarbeitender Daten inzwischen jedoch schneller als die Rechenleistung und der Speicherplatz verfügbarer Rechner. Parallelisierung und verteilte Ausführung werden häufig angewendet, um mit wachsenden Datenmengen Schritt zu halten. Allerdings sind die durch verteilte Infrastrukturen bereitgestellten Ressourcen häufig heterogen, instabil und unzuverlässig. Um die Skalierbarkeit solcher Infrastrukturen nutzen zu können, müssen daher mehrere Anforderungen erfüllt sein: Scientific Workflows müssen parallelisiert werden. Simulations-Frameworks zur Evaluation von Planungsalgorithmen müssen die Instabilität verteilter Infrastrukturen berücksichtigen. Adaptive Planungsalgorithmen müssen eingesetzt werden, um die Nutzung instabiler Ressourcen zu optimieren. Hadoop oder ähnliche Systeme zur skalierbaren Verwaltung verteilter Ressourcen müssen verwendet werden. Diese Dissertation präsentiert neue Lösungen für diese Anforderungen. Zunächst stellen wir DynamicCloudSim vor, ein Simulations-Framework für Cloud-Infrastrukturen, welches verschiedene Aspekte der Variabilität adäquat modelliert. Im Anschluss beschreiben wir ERA, einen adaptiven Planungsalgorithmus, der die Ausführungszeit eines Scientific Workflows optimiert, indem er Heterogenität ausnutzt, kritische Teile des Workflows repliziert und sich an Veränderungen in der Infrastruktur anpasst. 
Schließlich präsentieren wir Hi-WAY, eine Ausführungsumgebung die ERA integriert und die hochgradig skalierbare Ausführungen in verschiedenen Sprachen beschriebener Scientific Workflows auf Hadoop ermöglicht. / Scientific workflows provide a means to model, execute, and exchange the increasingly complex analysis pipelines necessary for today's data-driven science. Over the last decades, scientific workflow management systems have emerged to facilitate the design, execution, and monitoring of such workflows. At the same time, the amounts of data generated in various areas of science outpaced hardware advancements. Parallelization and distributed execution are generally proposed to deal with increasing amounts of data. However, the resources provided by distributed infrastructures are subject to heterogeneity, dynamic performance changes at runtime, and occasional failures. To leverage the scalability provided by these infrastructures despite the observed aspects of performance variability, workflow management systems have to progress: Parallelization potentials in scientific workflows have to be detected and exploited. Simulation frameworks, which are commonly employed for the evaluation of scheduling mechanisms, have to consider the instability encountered on the infrastructures they emulate. Adaptive scheduling mechanisms have to be employed to optimize resource utilization in the face of instability. State-of-the-art systems for scalable distributed resource management and storage, such as Apache Hadoop, have to be supported. This dissertation presents novel solutions for these aspirations. First, we introduce DynamicCloudSim, a cloud computing simulation framework that is able to adequately model the various aspects of variability encountered in computational clouds. 
Secondly, we outline ERA, an adaptive scheduling policy that optimizes workflow makespan by exploiting heterogeneity, replicating bottlenecks in workflow execution, and adapting to changes in the underlying infrastructure. Finally, we present Hi-WAY, an execution engine that integrates ERA and enables the highly scalable execution of scientific workflows written in a number of languages on Hadoop. Scientific Workflows Workflow-Management-System Cloud Computing Simulation Workflow-Planungsalgorithmen Adaptive Planungsalgorithmen Brownsche Bewegung Wiener-Prozess Apache Hadoop Hadoop YARN Scientific Workflows Workflow Management Systems Cloud Computing Simulation Workflow Scheduling Adaptive Scheduling Brownian Motion Wiener Process Apache Hadoop Hadoop YARN 004 Datenverarbeitung; Informatik ST 201 H03 ST 201 ddc:004
18	Artificial Neural Networks in Greenhouse Modelling Miranda Trujillo, Luis Carlos 24 August 2018 (has links) Moderne Präzisionsgartenbaulicheproduktion schließt hoch technifizierte Gewächshäuser, deren Einsatz in großem Maße von der Qualität der Sensorik- und Regelungstechnik abhängt, mit ein. Zu den Regelungsstrategien gehören unter anderem Methoden der Künstlichen Intelligenz, wie z.B. Künstliche Neuronale Netze (KNN, aus dem Englischen). Die vorliegende Arbeit befasst sich mit der Eignung KNN-basierter Modelle als Bauelemente von Klimaregelungstrategien in Gewächshäusern. Es werden zwei Modelle vorgestellt: Ein Modell zur kurzzeitigen Voraussage des Gewächshausklimas (Lufttemperatur und relative Feuchtigkeit, in Minuten-Zeiträumen), und Modell zur Einschätzung von phytometrischen Signalen (Blatttemperatur, Transpirationsrate und Photosyntheserate). Eine Datenbank, die drei Kulturjahre umfasste (Kultur: Tomato), wurde zur Modellbildung bzw. -test benutzt. Es wurde festgestellt, dass die ANN-basierte Modelle sehr stark auf die Auswahl der Metaparameter und Netzarchitektur reagieren, und dass sie auch mit derselben Architektur verschiedene Kalkulationsergebnisse liefern können. Nichtsdestotrotz, hat sich diese Art von Modellen als geeignet zur Einschätzung komplexer Pflanzensignalen sowie zur Mikroklimavoraussage erwiesen. Zwei zusätzliche Möglichkeiten zur Erstellung von komplexen Simulationen sind in der Arbeit enthalten, und zwar zur Klimavoraussage in längerer Perioden und zur Voraussage der Photosyntheserate. Die Arbeit kommt zum Ergebnis, dass die Verwendung von KNN-Modellen für neue Gewächshaussteuerungstrategien geeignet ist, da sie robust sind und mit der Systemskomplexität gut zurechtkommen. Allerdings muss beachtet werden, dass Probleme und Schwierigkeiten auftreten können. Diese Arbeit weist auf die Relevanz der Netzarchitektur, die erforderlichen großen Datenmengen zur Modellbildung und Probleme mit verschiedenen Zeitkonstanten im Gewächshaus hin. 
/ One facet of the current developments in precision horticulture is technology-intensive production under cover. Intensive production in modern greenhouses relies heavily on instrumentation and control techniques to automate many tasks. Among these techniques are control strategies, which can also include methods developed within the field of Artificial Intelligence. This document presents research on Artificial Neural Networks (ANN), a technique derived from Artificial Intelligence, and aims to shed light on their applicability in greenhouse vegetable production. In particular, this work focuses on the suitability of ANN-based models for greenhouse environmental control. To this end, two models were built: a short-term climate prediction model (air temperature and relative humidity on a time scale of minutes), and a model of the plant response to the climate, the latter covering phytometric measurements of leaf temperature, transpiration rate, and photosynthesis rate. A dataset comprising three years of tomato cultivation was used to build and test the models. It was found that models of this kind are very sensitive to the fine-tuning of the metaparameters and that they can produce different results even with the same architecture. Nevertheless, it was shown that ANNs are useful for simulating complex biological signals and estimating future microclimate trends. Furthermore, two connection schemes are proposed for assembling several models in order to generate more complex simulations, such as long-term prediction chains and photosynthesis forecasts. It was concluded that ANNs could be used in greenhouse automation systems as part of the control strategy, as they are robust and can cope with the complexity of the system. However, a number of problems and difficulties are pointed out, including the importance of the architecture, the need for large datasets to build the models, and problems arising from different time constants in the whole greenhouse system. 
Gartenbau Gewächshaus Tomate Hydrokultur Photosynthese Klimamodelle Künstliche Intelligenz Künstliche Neuronale Netze KI KNN Horticulture Greenhouse Tomato Hydroponics Photosynthesis Climate Models Artificial Intelligence Artificial Neural Networks ANN AI Intensive Horticulture 004 Datenverarbeitung; Informatik ST 300 ZC 51300 ZD 56000 ddc:630 ddc:004
19	Development and application of new statistical methods for the analysis of multiple phenotypes to investigate genetic associations with cardiometabolic traits Konigorski, Stefan 27 April 2018 (has links) Die biotechnologischen Entwicklungen der letzten Jahre ermöglichen eine immer detailliertere Untersuchung von genetischen und molekularen Markern mit multiplen komplexen Traits. Allerdings liefern vorhandene statistische Methoden für diese komplexen Analysen oft keine valide Inferenz. Das erste Ziel der vorliegenden Arbeit ist, zwei neue statistische Methoden für Assoziationsstudien von genetischen Markern mit multiplen Phänotypen zu entwickeln, effizient und robust zu implementieren, und im Vergleich zu existierenden statistischen Methoden zu evaluieren. Der erste Ansatz, C-JAMP (Copula-based Joint Analysis of Multiple Phenotypes), ermöglicht die Assoziation von genetischen Varianten mit multiplen Traits in einem gemeinsamen Copula Modell zu untersuchen. Der zweite Ansatz, CIEE (Causal Inference using Estimating Equations), ermöglicht direkte genetische Effekte zu schätzen und testen. C-JAMP wird in dieser Arbeit für Assoziationsstudien von seltenen genetischen Varianten mit quantitativen Traits evaluiert, und CIEE für Assoziationsstudien von häufigen genetischen Varianten mit quantitativen Traits und Ereigniszeiten. Die Ergebnisse von umfangreichen Simulationsstudien zeigen, dass beide Methoden unverzerrte und effiziente Parameterschätzer liefern und die statistische Power von Assoziationstests im Vergleich zu existierenden Methoden erhöhen können - welche ihrerseits oft keine valide Inferenz liefern. Für das zweite Ziel dieser Arbeit, neue genetische und transkriptomische Marker für kardiometabolische Traits zu identifizieren, werden zwei Studien mit genom- und transkriptomweiten Daten mit C-JAMP und CIEE analysiert. In den Analysen werden mehrere neue Kandidatenmarker und -gene für Blutdruck und Adipositas identifiziert. 
Dies unterstreicht den Wert, neue statistische Methoden zu entwickeln, evaluieren, und implementieren. Für beide entwickelten Methoden sind R Pakete verfügbar, die ihre Anwendung in zukünftigen Studien ermöglichen. / In recent years, biotechnological advances have made it possible to investigate associations of genetic and molecular markers with multiple complex phenotypes in much greater depth. However, for the analysis of such complex datasets, available statistical methods often do not yield valid inference. The first aim of this thesis is to develop two novel statistical methods for association analyses of genetic markers with multiple phenotypes, to implement them in a computationally efficient and robust manner so that they can be used for large-scale analyses, and to evaluate them in comparison to existing statistical approaches under realistic scenarios. The first approach, called the copula-based joint analysis of multiple phenotypes (C-JAMP) method, allows investigating genetic associations with multiple traits in a joint copula model and is evaluated for genetic association analyses of rare genetic variants with quantitative traits. The second approach, called the causal inference using estimating equations (CIEE) method, allows estimating and testing direct genetic effects in directed acyclic graphs, and is evaluated for association analyses of common genetic variants with quantitative and time-to-event traits. The results of extensive simulation studies show that both approaches yield unbiased and efficient parameter estimators and can improve the power of association tests in comparison to existing approaches, which yield invalid inference in many scenarios. For the second aim of this thesis, to identify novel genetic and transcriptomic markers associated with cardiometabolic traits, C-JAMP and CIEE are applied in two large-scale studies including genome- and transcriptome-wide data. 
In the analyses, several novel candidate markers and genes are identified, which highlights the merit of developing, evaluating, and implementing novel statistical approaches. R packages are available for both methods and enable their application in future studies. Genomweite Assoziationsstudien Multiple Phänotypen Copula Modelle Kausale Inferenz Kardiometabolische Traits Seltene genetische Varianten R Pakete RNA Sequenzierung Genome-wide association studies Multiple phenotypes Copula models Causal inference Cardiometabolic traits Rare genetic variants R packages RNA Sequencing 004 Datenverarbeitung; Informatik 576 Genetik und Evolution 610 Medizin und Gesundheit WC 7700 ddc:519 ddc:004 ddc:576 ddc:610
20	Scalable and Efficient Analysis of Large High-Dimensional Data Sets in the Context of Recurrence Analysis Rawald, Tobias 13 February 2018 (has links) Die Recurrence Quantification Analysis (RQA) ist eine Methode aus der nicht-linearen Zeitreihenanalyse. Im Mittelpunkt dieser Methode steht die Auswertung des Inhalts sogenannter Rekurrenzmatrizen. Bestehende Berechnungsansätze zur Durchführung der RQA können entweder nur Zeitreihen bis zu einer bestimmten Länge verarbeiten oder benötigen viel Zeit zur Analyse von sehr langen Zeitreihen. Diese Dissertation stellt die sogenannte skalierbare Rekurrenzanalyse (SRA) vor. Sie ist ein neuartiger Berechnungsansatz, der eine gegebene Rekurrenzmatrix in mehrere Submatrizen unterteilt. Jede Submatrix wird von einem Berechnungsgerät in massiv-paralleler Art und Weise untersucht. Dieser Ansatz wird unter Verwendung der OpenCL-Schnittstelle umgesetzt. Anhand mehrerer Experimente wird demonstriert, dass SRA massive Leistungssteigerungen im Vergleich zu existierenden Berechnungsansätzen insbesondere durch den Einsatz von Grafikkarten ermöglicht. Die Dissertation enthält eine ausführliche Evaluation, die den Einfluss der Anwendung mehrerer Datenbankkonzepte, wie z.B. die Repräsentation der Eingangsdaten, auf die RQA-Verarbeitungskette analysiert. Es wird untersucht, inwiefern unterschiedliche Ausprägungen dieser Konzepte Einfluss auf die Effizienz der Analyse auf verschiedenen Berechnungsgeräten haben. Abschließend wird ein automatischer Optimierungsansatz vorgestellt, der performante RQA-Implementierungen für ein gegebenes Analyseszenario in Kombination mit einer Hardware-Plattform dynamisch bestimmt. Neben anderen Aspekten werden drastische Effizienzgewinne durch den Einsatz des Optimierungsansatzes aufgezeigt. / Recurrence quantification analysis (RQA) is a method from nonlinear time series analysis. It relies on the identification of line structures within so-called recurrence matrices and comprises a set of scalar measures. 
Existing computing approaches to RQA are either not capable of processing recurrence matrices exceeding a certain size or suffer from long runtimes for time series that contain hundreds of thousands of data points. This thesis introduces scalable recurrence analysis (SRA), an alternative computing approach that subdivides a recurrence matrix into multiple submatrices. Each submatrix is processed individually in a massively parallel manner by a single compute device. This is implemented, by way of example, using the OpenCL framework. It is shown that this approach delivers considerable performance improvements in comparison to state-of-the-art RQA software by exploiting the computing capabilities of many-core hardware architectures, in particular graphics cards. Using OpenCL makes it possible to execute identical SRA implementations on a variety of hardware platforms with different architectural properties. An extensive evaluation analyses the impact of applying concepts from database technology, such as memory storage layouts, to the RQA processing pipeline. It is investigated how different realisations of these concepts affect the performance of the computations on different types of compute devices. Finally, an approach based on automatic performance tuning is introduced that automatically selects well-performing RQA implementations for a given analytical scenario on specific computing hardware. Among other results, it is demonstrated that the customised auto-tuning approach considerably increases the efficiency of the processing by adapting the implementation selection. Paralleles Rechnen Paralleler Algorithmus Maschinelles Lernen Rekurrenzanalyse Nichtlineare Zeitreihenanalyse parallel computing parallel algorithm machine learning recurrence analysis nonlinear time series analysis 004 Datenverarbeitung; Informatik SK 845 ST 530 ddc:004 ddc:000 ddc:005
