321 |
Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data. Hellmann, Sebastian, 12 January 2015
This thesis is a compendium of scientific works and engineering
specifications that have been contributed to a large community of
stakeholders to be copied, adapted, mixed, built upon and exploited in
any way possible to achieve a common goal: Integrating Natural Language
Processing (NLP) and Language Resources Using Linked Data
The explosion of information technology in the last two decades has led
to a substantial growth in quantity, diversity and complexity of
web-accessible linguistic data. These resources become even more useful
when linked with each other and the last few years have seen the
emergence of numerous approaches in various disciplines concerned with
linguistic resources and NLP tools. It is the challenge of our time to
store, interlink and exploit this wealth of data accumulated in more
than half a century of computational linguistics, of empirical,
corpus-based study of language, and of computational lexicography in all
its heterogeneity.
The vision of the Giant Global Graph (GGG) was conceived by Tim
Berners-Lee with the aim of connecting all data on the Web and allowing new
relations to be discovered between these openly accessible data. This vision
has been pursued by the Linked Open Data (LOD) community, where the
cloud of published datasets comprises 295 data repositories and more
than 30 billion RDF triples (as of September 2011).
RDF is based on globally unique and accessible URIs and it was
specifically designed to establish links between such URIs (or
resources). This is captured in the Linked Data paradigm that postulates
four rules: (1) Referred entities should be designated by URIs, (2)
these URIs should be resolvable over HTTP, (3) data should be
represented by means of standards such as RDF, (4) and a resource should
include links to other resources.
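For illustration, the four rules can be made concrete with a few lines of rdflib; the namespace and the resource below are invented for the example, and serving the resulting Turtle at the resource's URI would cover the dereferencing rule.
```python
# Minimal sketch of the four Linked Data rules with rdflib; URIs are illustrative only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/resource/")      # hypothetical namespace (rule 1: URIs name entities)
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
g.bind("ex", EX)

corpus = EX["ExampleCorpus"]                         # rule 2 is met by serving this document at the URI
g.add((corpus, RDFS.label, Literal("An example corpus", lang="en")))  # rule 3: standard RDF vocabulary
g.add((corpus, RDFS.seeAlso, DBR["Text_corpus"]))    # rule 4: link to another resource

print(g.serialize(format="turtle"))
```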
Although it is difficult to precisely identify the reasons for the
success of the LOD effort, advocates generally argue that open licenses
as well as open access are key enablers for the growth of such a network
as they provide a strong incentive for collaboration and contribution by
third parties. In his keynote at BNCOD 2011, Chris Bizer argued that
with RDF the overall data integration effort can be “split between data
publishers, third parties, and the data consumer”, a claim that can be
substantiated by observing the evolution of many large data sets
constituting the LOD cloud.
As noted in the acknowledgement section, parts of this thesis have
received extensive feedback from other scientists, practitioners and
industry in many different ways. The main contributions of this thesis
are summarized here:
Part I – Introduction and Background.
During his keynote at the Language Resource and Evaluation Conference in
2012, Sören Auer stressed the decentralized, collaborative, interlinked
and interoperable nature of the Web of Data. The keynote provides strong
evidence that Semantic Web technologies such as Linked Data are on their
way to becoming mainstream for the representation of language resources.
The jointly written companion publication for the keynote was later
extended as a book chapter in The People’s Web Meets NLP and serves as
the basis for “Introduction” and “Background”, outlining some stages of
the Linked Data publication and refinement chain. Both chapters stress
the importance of open licenses and open access as an enabler for
collaboration and the ability to interlink data on the Web as a key feature
of RDF, and provide a discussion of scalability issues and
decentralization. Furthermore, we elaborate on how conceptual
interoperability can be achieved by (1) re-using vocabularies, (2) agile
ontology development, (3) meetings to refine and adapt ontologies and
(4) tool support to enrich ontologies and match schemata.
Part II - Language Resources as Linked Data.
“Linked Data in Linguistics” and “NLP & DBpedia, an Upward Knowledge
Acquisition Spiral” summarize the results of the Linked Data in
Linguistics (LDL) Workshop in 2012 and the NLP & DBpedia Workshop in
2013 and give a preview of the MLOD special issue. In total, five
proceedings – three published at CEUR (OKCon 2011, WoLE 2012, NLP &
DBpedia 2013), one Springer book (Linked Data in Linguistics, LDL 2012)
and one journal special issue (Multilingual Linked Open Data, MLOD to
appear) – have been (co-)edited to create incentives for scientists to
convert and publish Linked Data and thus to contribute open and/or
linguistic data to the LOD cloud. Based on the disseminated call for
papers, 152 authors contributed one or more accepted submissions to our
venues and 120 reviewers were involved in peer-reviewing.
“DBpedia as a Multilingual Language Resource” and “Leveraging the
Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked
Data Cloud” contain this thesis’ contribution to the DBpedia Project in
order to further increase the size and inter-linkage of the LOD Cloud
with lexical-semantic resources. Our contribution comprises extracted
data from Wiktionary (an online, collaborative dictionary similar to
Wikipedia) in more than four languages (now six) as well as
language-specific versions of DBpedia, including a quality assessment of
inter-language links between Wikipedia editions and internationalized
content negotiation rules for Linked Data. In particular, the work
described in created the foundation for a DBpedia Internationalisation
Committee with members from over 15 different languages and the common
goal of pushing DBpedia as a free and open multilingual language resource.
Part III - The NLP Interchange Format (NIF).
“NIF 2.0 Core Specification”, “NIF 2.0 Resources and Architecture” and
“Evaluation and Related Work” constitute one of the main contributions of
this thesis. The NLP Interchange Format (NIF) is an RDF/OWL-based format
that aims to achieve interoperability between Natural Language
Processing (NLP) tools, language resources and annotations. The core
specification is included in and describes which URI schemes and RDF
vocabularies must be used for (parts of) natural language texts and
annotations in order to create an RDF/OWL-based interoperability layer
with NIF built upon Unicode Code Points in Normal Form C. In , classes
and properties of the NIF Core Ontology are described to formally define
the relations between text, substrings and their URI schemes. contains
the evaluation of NIF.
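For illustration, the following sketch (written with rdflib and a placeholder document URI) shows the kind of offset-based URIs and core properties such an interoperability layer builds on; the normative modelling is given in the specification chapters themselves.
```python
# Sketch of a NIF context and one substring with RFC 5147-style offset URIs (document URI is illustrative).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
DOC = "http://example.org/doc1"

text = "Berlin is the capital of Germany."           # source text, Unicode Normal Form C
begin, end = 0, 6                                    # code point offsets of "Berlin"

g = Graph()
g.bind("nif", NIF)

context = URIRef(f"{DOC}#char=0,{len(text)}")        # URI for the whole text
g.add((context, RDF.type, NIF.Context))
g.add((context, NIF.isString, Literal(text)))

mention = URIRef(f"{DOC}#char={begin},{end}")        # URI for the substring
g.add((mention, RDF.type, NIF.String))
g.add((mention, NIF.referenceContext, context))
g.add((mention, NIF.beginIndex, Literal(begin, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.endIndex, Literal(end, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.anchorOf, Literal(text[begin:end])))

print(g.serialize(format="turtle"))
```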
In a questionnaire, we asked 13 developers using NIF a set of questions. UIMA,
GATE and Stanbol are extensible NLP frameworks, and NIF was not yet able
to provide off-the-shelf NLP domain ontologies for all possible domains,
but only for the plugins used in this study. After inspecting the
software, the developers agreed, however, that NIF is adequate to
provide a generic RDF output based on NIF using literal objects for
annotations. All developers were able to map the internal data structure
to NIF URIs to serialize RDF output (Adequacy). The development effort
in hours (ranging between 3 and 40 hours) as well as the number of code
lines (ranging between 110 and 445) suggest that the implementation of
NIF wrappers is easy and fast for an average developer. Furthermore, the
evaluation contains a comparison to other formats and an evaluation of
the available URI schemes for web annotation.
In order to collect input from this wide group of stakeholders, a total
of 16 presentations were given with extensive discussions and feedback,
which led to a constant improvement of NIF from 2010 until 2013.
After the release of NIF (Version 1.0) in November 2011, a total of 32
vocabulary employments and implementations for different NLP tools and
converters were reported (8 by the (co-)authors, including the Wiki-link
corpus, 13 by people participating in our survey and 11 more that we
have heard of). Several roll-out meetings and tutorials were held
(e.g. in Leipzig and Prague in 2013) and are planned (e.g. at LREC
2014).
Part IV - The NLP Interchange Format in Use.
“Use Cases and Applications for NIF” and “Publication of Corpora using
NIF” describe 8 concrete instances where NIF has been successfully used.
One major contribution in is the usage of NIF as the recommended RDF
mapping in the Internationalization Tag Set (ITS) 2.0 W3C standard
and the conversion algorithms from ITS to NIF and back. One outcome
of the discussions in the standardization meetings and telephone
conferences for ITS 2.0 was the conclusion that there was no
alternative RDF format or vocabulary other than NIF with the required
features to fulfill the working group charter. Five further uses of NIF
are described for the Ontology of Linguistic Annotations (OLiA), the
RDFaCE tool, the Tiger Corpus Navigator, the OntosFeeder and
visualisations of NIF using the RelFinder tool. These 8 instances
provide an implemented proof-of-concept of the features of NIF.
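As a rough illustration of the ITS-to-NIF direction (with an invented document and a single hard-coded annotation; the actual conversion algorithms cover complete HTML/XML documents and the reverse direction):
```python
# Sketch: one ITS 2.0 text analysis annotation mapped to NIF (document URI and span are invented).
from rdflib import Graph, Namespace, URIRef

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

doc_uri = "http://example.org/doc2"
plain_text = "Dublin is the capital of Ireland."
# Assume the HTML contained: <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span>
span_begin, span_end = 0, 6
ident_ref = URIRef("http://dbpedia.org/resource/Dublin")

g = Graph()
g.bind("nif", NIF)
g.bind("itsrdf", ITSRDF)

context = URIRef(f"{doc_uri}#char=0,{len(plain_text)}")
mention = URIRef(f"{doc_uri}#char={span_begin},{span_end}")
g.add((mention, NIF.referenceContext, context))
g.add((mention, ITSRDF.taIdentRef, ident_ref))       # the ITS 2.0 text-analysis target entity

print(g.serialize(format="turtle"))
```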
starts by describing the conversion and hosting of the huge Google
Wikilinks corpus with 40 million annotations for 3 million websites.
The resulting RDF dump contains 477 million triples in a 5.6 GB
compressed dump file in Turtle syntax. describes how NIF can be used to
publish extracted facts from news feeds in the RDFLiveNews tool as
Linked Data.
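For corpora of this size, conversion is typically done in a streaming fashion; the following sketch, with an invented record format and output file name, illustrates the idea of emitting one link per annotation without building an in-memory graph (written as N-Triples, which is valid Turtle):
```python
# Streaming conversion sketch for a large annotation corpus (record format and file name are invented).
import gzip

def write_mention_links(records, out_path="wikilinks-sample.nt.gz"):
    """records: iterable of (page_url, begin, end, target_uri) annotation tuples."""
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for page, begin, end, target in records:
            mention = f"{page}#char={begin},{end}"
            out.write(f"<{mention}> <http://www.w3.org/2005/11/its/rdf#taIdentRef> <{target}> .\n")

write_mention_links([("http://example.org/page1", 10, 16, "http://dbpedia.org/resource/Berlin")])
```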
Part V - Conclusions.
provides lessons learned for NIF, conclusions and an outlook on future
work. Most of the contributions are already summarized above. One
particular aspect worth mentioning is the increasing number of
NIF-formatted corpora for Named Entity Recognition (NER) that have come
into existence after the publication of the main NIF paper Integrating
NLP using Linked Data at ISWC 2013. These include the corpora converted
by Steinmetz, Knuth and Sack for the NLP & DBpedia workshop and an
OpenNLP-based CoNLL converter by Brümmer. Furthermore, we are aware of
three LREC 2014 submissions that leverage NIF: NIF4OGGD - NLP
Interchange Format for Open German Governmental Data, N^3 – A Collection
of Datasets for Named Entity Recognition and Disambiguation in the NLP
Interchange Format and Global Intelligent Content: Active Curation of
Language Resources using Linked Data as well as an early implementation
of a GATE-based NER/NEL evaluation framework by Dojchinovski and Kliegr.
Further funding for the maintenance, interlinking and publication of
Linguistic Linked Data as well as support and improvements of NIF is
available via the expiring LOD2 EU project, as well as the CSA EU
project called LIDER, which started in November 2013. Based on the
evidence of successful adoption presented in this thesis, we can expect
a good chance that Linked Data technology, as well as the NIF standard,
will reach critical mass in the field of Natural Language
Processing and Language Resources.
|
322 |
Vers plus d'automatisation dans la construction de systèmes mediateurs pour le web semantique : une application des logiques de description / Towards more automation in building mediator systems in the semantic web context: a description logic application. Niang, Cheikh Ahmed Tidiane, 05 July 2013
Les travaux que nous présentons dans cette thèse concernent l’automatisation de la construction de systèmes médiateurs pour le web sémantique. L’intégration de données de manière générale et la médiation en particulier sont des processus qui visent à exploiter conjointement des sources d’information indépendantes, hétérogènes et distribuées. L’objectif final est de permettre à un utilisateur d’interroger ces données comme si elles provenaient d’un système unique et centralisé grâce à une interface d’interrogation uniforme basée sur un modèle du domaine d’application, appelé schéma global. Durant ces dernières années, beaucoup de projets de recherche se sont intéressés à cette problématique et de nombreux systèmes d’intégration ont été proposés. Cependant, la quantité d’intervention humaine nécessaire pour construire ces systèmes est beaucoup trop importante pour qu’il soit envisageable de les mettre en place dans bien des situations. De plus, face à la diversité et à l’évolution croissante des sources d’information apparaissent de nouveaux chalenges relatifs notamment à la flexibilité et à la rapidité d’accès à l’information. Nos propositions s’appuient sur les modèles et technologies du web sémantique. Cette généralisation du web qui est un vaste espace d’échange de ressources, non seulement entre êtres humains, mais également entre machines, offre par essence les moyens d’une automatisation des processus d’intégration. Ils reposent d’une part sur des langages et une infrastructure dont l’objectif est d’enrichir le web d’informations "sémantiques", et d’autre part sur des usages collaboratifs qui produisent des ressources ontologiques pertinentes et réutilisables. / This thesis is set in a research effort that aims to bring more automation into the building of mediator-based data integration systems in the semantic web context. The mediator approach is a conceptual architecture of data integration that involves combining data residing in different sources and providing users with a unified view of these data. The problem of designing effective data integration solutions has been addressed by many researchers, and well-known data integration projects were developed during the 1990s. However, the process of building these systems relies heavily on human intervention, so that it is difficult to implement them in many situations. Moreover, faced with the diversity and growth of available information sources, the ease and speed of information access become new challenges. Our proposals are based on the models and technologies of the semantic web. The semantic web is recognized as a generalization of the current web which enables resources to be found, combined and shared, not only between humans but also between machines. It provides a promising path for automating the integration process. The possibilities offered by the semantic web are based, on the one hand, on languages and an infrastructure aiming to enrich the web with "semantic" information and, on the other hand, on collaborative practices that allow the production of relevant and reusable ontological resources.
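As a toy illustration of the mediator idea (global-as-view style, with invented schema and source names), a query against one global-schema relation can be rewritten into the source queries whose union answers it:
```python
# Toy global-as-view mediator sketch: relation and source names are invented for illustration.
GLOBAL_TO_SOURCES = {
    "Person": [
        ("source_A", "SELECT name, birth_date FROM people"),
        ("source_B", "SELECT full_name AS name, born AS birth_date FROM authors"),
    ],
}

def rewrite(global_relation):
    """Return the per-source queries whose union answers a query on the global relation."""
    return GLOBAL_TO_SOURCES.get(global_relation, [])

for source, query in rewrite("Person"):
    print(f"{source}: {query}")
```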
|
323 |
[en] TOWARDS A WELL-INTERLINKED WEB THROUGH MATCHING AND INTERLINKING APPROACHES / [pt] INTERLIGANDO RECURSOS NA WEB ATRAVÉS DE ABORDAGENS DE MATCHING E INTERLINKING. BERNARDO PEREIRA NUNES, 07 January 2016
[pt] Com o surgimento da Linked (Open) Data, uma série de novos e importantes
desafios de pesquisa vieram à tona. A abertura de dados, como muitas vezes a
Linked Data é conhecida, oferece uma oportunidade para integrar e conectar, de
forma homogênea, fontes de dados heterogêneas na Web. Como diferentes fontes
de dados, com recursos em comum ou relacionados, são publicados por diferentes
editores, a sua integração e consolidação torna-se um verdadeiro desafio. Outro
desafio advindo da Linked Data está na criação de um grafo denso de dados na
Web. Com isso, a identificação e interligação, não só de recursos idênticos, mas
também dos recursos relacionadas na Web, provê ao consumidor (data consumer)
uma representação mais rica dos dados e a possibilidade de exploração dos recursos
conectados. Nesta tese, apresentamos três abordagens para enfrentar os problemas
de integração, consolidação e interligação de dados. Nossa primeira abordagem
combina técnicas de informação mútua e programação genética para solucionar o
problema de alinhamento complexo entre fontes de dados, um problema raramente
abordado na literatura. Na segunda e terceira abordagens, adotamos e ampliamos
uma métrica utilizada em teoria de redes sociais para enfrentar o problema de
consolidação e interligação de dados. Além disso, apresentamos um aplicativo Web
chamado Cite4Me que fornece uma nova perspectiva sobre a pesquisa e recuperação
de conjuntos de Linked Open Data, bem como os benefícios da utilização de nossas
abordagens. Por fim, uma série de experimentos utilizando conjuntos de dados reais
demonstram que as nossas abordagens superam abordagens consideradas como
estado da arte. / [en] With the emergence of Linked (Open) Data, a number of novel and notable
research challenges have been raised. The openness that often characterises Linked
Data offers an opportunity to homogeneously integrate and connect heterogeneous
data sources on the Web. As disparate data sources with overlapping or related resources
are provided by different data publishers, their integration and consolidation
become a real challenge. An additional challenge of Linked Data lies in the creation
of a well-interlinked graph of Web data. Identifying and linking not only identical
Web resources, but also lateral Web resources, provides the data consumer with
a richer representation of the data and the possibility of exploiting connected resources.
In this thesis, we present three approaches that tackle data integration, consolidation
and linkage problems. Our first approach combines mutual information and genetic
programming techniques for complex datatype property matching, a rarely addressed
problem in the literature. In the second and third approaches, we adopt and extend a
measure from social network theory to address data consolidation and interlinking.
Furthermore, we present a Web-based application named Cite4Me that provides
a new perspective on search and retrieval of Linked Open Data sets, as well as
the benefits of using our approaches. Finally, we validate our approaches through
extensive evaluations using real-world datasets, reporting results that outperform
state-of-the-art approaches.
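As a much simplified illustration of the mutual-information ingredient of the first approach (the genetic-programming search over complex property expressions is omitted, and the aligned example values are invented):
```python
# Sketch: score a candidate match between two datatype properties by the mutual information
# of their aligned values; a higher score suggests the properties encode related information.
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Estimate MI (in nats) between two aligned sequences of property values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Hypothetical aligned instances from two sources: a 'label' column vs. a 'name' column.
print(mutual_information(["Rio", "Lima", "Rio"], ["Rio de Janeiro", "Lima", "Rio de Janeiro"]))
```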
|
324 |
Integration and analysis of phenotypic data from functional screens. Paszkowski-Rogacz, Maciej, 29 November 2010
Motivation: Although various high-throughput technologies provide a lot of valuable information, each of them gives an insight into different aspects of cellular activity and each has its own limitations. Thus, a complete and systematic understanding of the cellular machinery can be achieved only by a combined analysis of results coming from different approaches. However, methods and tools for the integration and analysis of heterogeneous biological data still have to be developed.
Results: This work presents a systemic analysis of basic cellular processes, i.e. cell viability and the cell cycle, as well as embryonic stem cell pluripotency and differentiation. These phenomena were studied using several high-throughput technologies, whose combined results were analysed with existing and novel clustering and hit selection algorithms.
This thesis also introduces two novel data management and data analysis tools. The first, called DSViewer, is a database application designed for integrating and querying results coming from various genome-wide experiments. The second, named PhenoFam, is an application performing gene set enrichment analysis by employing structural and functional information on families of protein domains as annotation terms. Both programs are accessible through a web interface.
Conclusions: Ultimately, the investigations presented in this work provide the research community with a novel and markedly improved repertoire of computational tools and methods that facilitate the systematic translation of information accumulated from high-throughput studies into novel biological insights.
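For illustration, an over-representation test of the kind commonly used in gene set enrichment analysis can be sketched as follows; whether PhenoFam uses exactly this statistic is not stated above, so the function and the numbers are purely illustrative.
```python
# Sketch of an over-representation test with protein domain families as annotation terms.
from scipy.stats import hypergeom

def enrichment_p(hits_with_term, hits_total, genes_with_term, genes_total):
    """P(X >= hits_with_term) for a hypergeometric draw of hits_total genes."""
    return hypergeom.sf(hits_with_term - 1, genes_total, genes_with_term, hits_total)

# Hypothetical screen: 40 of 300 hit genes carry a domain family found in 500 of 20000 genes.
print(enrichment_p(40, 300, 500, 20000))
```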
|
325 |
Geometrische und stochastische Modelle zur Verarbeitung von 3D-Kameradaten am Beispiel menschlicher Bewegungsanalysen [Geometric and stochastic models for processing 3D camera data, using human motion analyses as an example]. Westfeld, Patrick, 08 May 2012
Die dreidimensionale Erfassung der Form und Lage eines beliebigen Objekts durch die flexiblen Methoden und Verfahren der Photogrammetrie spielt für ein breites Spektrum technisch-industrieller und naturwissenschaftlicher Einsatzgebiete eine große Rolle. Die Anwendungsmöglichkeiten reichen von Messaufgaben im Automobil-, Maschinen- und Schiffbau über die Erstellung komplexer 3D-Modelle in Architektur, Archäologie und Denkmalpflege bis hin zu Bewegungsanalysen in Bereichen der Strömungsmesstechnik, Ballistik oder Medizin. In der Nahbereichsphotogrammetrie werden dabei verschiedene optische 3D-Messsysteme verwendet. Neben flächenhaften Halbleiterkameras im Einzel- oder Mehrbildverband kommen aktive Triangulationsverfahren zur Oberflächenmessung mit z.B. strukturiertem Licht oder Laserscanner-Systeme zum Einsatz.
3D-Kameras auf der Basis von Photomischdetektoren oder vergleichbaren Prinzipien erzeugen durch die Anwendung von Modulationstechniken zusätzlich zu einem Grauwertbild simultan ein Entfernungsbild. Als Einzelbildsensoren liefern sie ohne die Notwendigkeit einer stereoskopischen Zuordnung räumlich aufgelöste Oberflächendaten in Videorate. In der 3D-Bewegungsanalyse ergeben sich bezüglich der Komplexität und des Rechenaufwands erhebliche Erleichterungen. 3D-Kameras verbinden die Handlichkeit einer Digitalkamera mit dem Potential der dreidimensionalen Datenakquisition etablierter Oberflächenmesssysteme. Sie stellen trotz der noch vergleichsweise geringen räumlichen Auflösung als monosensorielles System zur Echtzeit-Tiefenbildakquisition eine interessante Alternative für Aufgabenstellungen der 3D-Bewegungsanalyse dar.
Der Einsatz einer 3D-Kamera als Messinstrument verlangt die Modellierung von Abweichungen zum idealen Abbildungsmodell; die Verarbeitung der erzeugten 3D-Kameradaten bedingt die zielgerichtete Adaption, Weiter- und Neuentwicklung von Verfahren der Computer Vision und Photogrammetrie. Am Beispiel der Untersuchung des zwischenmenschlichen Bewegungsverhaltens sind folglich die Entwicklung von Verfahren zur Sensorkalibrierung und zur 3D-Bewegungsanalyse die Schwerpunkte der Dissertation. Eine 3D-Kamera stellt aufgrund ihres inhärenten Designs und Messprinzips gleichzeitig Amplituden- und Entfernungsinformationen zur Verfügung, welche aus einem Messsignal rekonstruiert werden. Die simultane Einbeziehung aller 3D-Kamerainformationen in jeweils einen integrierten Ansatz ist eine logische Konsequenz und steht im Vordergrund der Verfahrensentwicklungen. Zum einen stützen sich die komplementären Eigenschaften der Beobachtungen durch die Herstellung des funktionalen Zusammenhangs der Messkanäle gegenseitig, wodurch Genauigkeits- und Zuverlässigkeitssteigerungen zu erwarten sind. Zum anderen gewährleistet das um eine Varianzkomponentenschätzung erweiterte stochastische Modell eine vollständige Ausnutzung des heterogenen Informationshaushalts.
Die entwickelte integrierte Bündelblockausgleichung ermöglicht die Bestimmung der exakten 3D-Kamerageometrie sowie die Schätzung der distanzmessspezifischen Korrekturparameter zur Modellierung linearer, zyklischer und signalwegeffektbedingter Fehleranteile einer 3D-Kamerastreckenmessung. Die integrierte Kalibrierroutine gleicht in beiden Informationskanälen gemessene Größen gemeinsam, unter der automatischen Schätzung optimaler Beobachtungsgewichte, aus. Die Methode basiert auf dem flexiblen Prinzip einer Selbstkalibrierung und benötigt keine Objektrauminformation, wodurch insbesondere die aufwendige Ermittlung von Referenzstrecken übergeordneter Genauigkeit entfällt. Die durchgeführten Genauigkeitsuntersuchungen bestätigen die Richtigkeit der aufgestellten funktionalen Zusammenhänge, zeigen aber auch Schwächen aufgrund noch nicht parametrisierter distanzmessspezifischer Fehler. Die Adaptivität und die modulare Implementierung des entwickelten mathematischen Modells gewährleisten aber eine zukünftige Erweiterung. Die Qualität der 3D-Neupunktkoordinaten kann nach einer Kalibrierung mit 5 mm angegeben werden. Für die durch eine Vielzahl von meist simultan auftretenden Rauschquellen beeinflusste Tiefenbildtechnologie ist diese Genauigkeitsangabe sehr vielversprechend, vor allem im Hinblick auf die Entwicklung von auf korrigierten 3D-Kameradaten aufbauenden Auswertealgorithmen.
2,5D Least Squares Tracking (LST) ist eine im Rahmen der Dissertation entwickelte integrierte spatiale und temporale Zuordnungsmethode zur Auswertung von 3D-Kamerabildsequenzen. Der Algorithmus basiert auf der in der Photogrammetrie bekannten Bildzuordnung nach der Methode der kleinsten Quadrate und bildet kleine Oberflächensegmente konsekutiver 3D-Kameradatensätze aufeinander ab. Die Abbildungsvorschrift wurde, aufbauend auf einer 2D-Affintransformation, an die Datenstruktur einer 3D-Kamera angepasst. Die geschlossen formulierte Parametrisierung verknüpft sowohl Grau- als auch Entfernungswerte in einem integrierten Modell. Neben den affinen Parametern zur Erfassung von Translations- und Rotationseffekten, modellieren die Maßstabs- sowie Neigungsparameter perspektivbedingte Größenänderungen des Bildausschnitts, verursacht durch Distanzänderungen in Aufnahmerichtung. Die Eingabedaten sind in einem Vorverarbeitungsschritt mit Hilfe der entwickelten Kalibrierroutine um ihre opto- und distanzmessspezifischen Fehler korrigiert sowie die gemessenen Schrägstrecken auf Horizontaldistanzen reduziert worden. 2,5D-LST liefert als integrierter Ansatz vollständige 3D-Verschiebungsvektoren. Weiterhin können die aus der Fehlerrechnung resultierenden Genauigkeits- und Zuverlässigkeitsangaben als Entscheidungskriterien für die Integration in einer anwendungsspezifischen Verarbeitungskette Verwendung finden. Die Validierung des Verfahrens zeigte, dass die Einführung komplementärer Informationen eine genauere und zuverlässigere Lösung des Korrespondenzproblems bringt, vor allem bei schwierigen Kontrastverhältnissen in einem Kanal. Die Genauigkeit der direkt mit den Distanzkorrekturtermen verknüpften Maßstabs- und Neigungsparameter verbesserte sich deutlich. Darüber hinaus brachte die Erweiterung des geometrischen Modells insbesondere bei der Zuordnung natürlicher, nicht gänzlich ebener Oberflächensegmente signifikante Vorteile.
Die entwickelte flächenbasierte Methode zur Objektzuordnung und Objektverfolgung arbeitet auf der Grundlage berührungslos aufgenommener 3D-Kameradaten. Sie ist somit besonders für Aufgabenstellungen der 3D-Bewegungsanalyse geeignet, die den Mehraufwand einer multiokularen Experimentalanordnung und die Notwendigkeit einer Objektsignalisierung mit Zielmarken vermeiden möchten. Das Potential des 3D-Kamerazuordnungsansatzes wurde an zwei Anwendungsszenarien der menschlichen Verhaltensforschung demonstriert. 2,5D-LST kam zur Bestimmung der interpersonalen Distanz und Körperorientierung im erziehungswissenschaftlichen Untersuchungsgebiet der Konfliktregulation befreundeter Kindespaare ebenso zum Einsatz wie zur Markierung und anschließenden Klassifizierung von Bewegungseinheiten sprachbegleitender Handgesten. Die Implementierung von 2,5D-LST in die vorgeschlagenen Verfahren ermöglichte eine automatische, effektive, objektive sowie zeitlich und räumlich hochaufgelöste Erhebung und Auswertung verhaltensrelevanter Daten.
Die vorliegende Dissertation schlägt die Verwendung einer neuartigen 3D-Tiefenbildkamera zur Erhebung menschlicher Verhaltensdaten vor. Sie präsentiert sowohl ein zur Datenaufbereitung entwickeltes Kalibrierwerkzeug als auch eine Methode zur berührungslosen Bestimmung dichter 3D-Bewegungsvektorfelder. Die Arbeit zeigt, dass die Methoden der Photogrammetrie auch für bewegungsanalytische Aufgabenstellungen auf dem bisher noch wenig erschlossenen Gebiet der Verhaltensforschung wertvolle Ergebnisse liefern können. Damit leistet sie einen Beitrag für die derzeitigen Bestrebungen in der automatisierten videographischen Erhebung von Körperbewegungen in dyadischen Interaktionen. / The three-dimensional documentation of the form and location of any type of object using flexible photogrammetric methods and procedures plays a key role in a wide range of technical-industrial and scientific areas of application. Potential applications include measurement tasks in the automotive, machine building and ship building sectors, the compilation of complex 3D models in the fields of architecture, archaeology and monumental preservation and motion analyses in the fields of flow measurement technology, ballistics and medicine. In the case of close-range photogrammetry a variety of optical 3D measurement systems are used. Area sensor cameras arranged in single or multi-image configurations are used besides active triangulation procedures for surface measurement (e.g. using structured light or laser scanner systems).
The use of modulation techniques enables 3D cameras based on photomix detectors or similar principles to simultaneously produce both a grey value image and a range image. Functioning as single image sensors, they deliver spatially resolved surface data at video rate without the need for stereoscopic image matching. In the case of 3D motion analyses in particular, this leads to considerable reductions in complexity and computing time. 3D cameras combine the practicality of a digital camera with the 3D data acquisition potential of conventional surface measurement systems. Despite the relatively low spatial resolution currently achievable, as a monosensory real-time depth image acquisition system they represent an interesting alternative in the field of 3D motion analysis.
The use of 3D cameras as measuring instruments requires the modelling of deviations from the ideal projection model, and indeed the processing of the 3D camera data generated requires the targeted adaptation, development and further development of procedures in the fields of computer vision and photogrammetry. This Ph.D. thesis therefore focuses on the development of methods of sensor calibration and 3D motion analysis in the context of investigations into inter-human motion behaviour. As a result of its intrinsic design and measurement principle, a 3D camera simultaneously provides amplitude and range data reconstructed from a measurement signal. The simultaneous integration of all data obtained using a 3D camera into an integrated approach is a logical consequence and represents the focus of current procedural development. On the one hand, the complementary characteristics of the observations made support each other due to the creation of a functional context for the measurement channels, which is expected to lead to increases in accuracy and reliability. On the other, the expansion of the stochastic model to include variance component estimation ensures that the heterogeneous information pool is fully exploited.
The integrated bundle adjustment developed facilitates the definition of precise 3D camera geometry and the estimation of range-measurement-specific correction parameters required for the modelling of the linear, cyclical and latency defectives of a distance measurement made using a 3D camera. The integrated calibration routine jointly adjusts appropriate dimensions across both information channels, and also automatically estimates optimum observation weights. The method is based on the same flexible principle used in self-calibration, does not require spatial object data and therefore foregoes the time-consuming determination of reference distances with superior accuracy. The accuracy analyses carried out confirm the correctness of the proposed functional contexts, but nevertheless exhibit weaknesses in the form of non-parameterized range-measurement-specific errors. This notwithstanding, the future expansion of the mathematical model developed is guaranteed due to its adaptivity and modular implementation. The accuracy of a new 3D point coordinate can be set at 5 mm further to calibration. In the case of depth imaging technology – which is influenced by a range of usually simultaneously occurring noise sources – this level of accuracy is very promising, especially in terms of the development of evaluation algorithms based on corrected 3D camera data.
2.5D Least Squares Tracking (LST) is an integrated spatial and temporal matching method developed within the framework of this Ph.D. thesis for the purpose of evaluating 3D camera image sequences. The algorithm is based on the least squares image matching method already established in photogrammetry, and maps small surface segments of consecutive 3D camera data sets on top of one another. The mapping rule has been adapted to the data structure of a 3D camera on the basis of a 2D affine transformation. The closed parameterization combines both grey values and range values in an integrated model. In addition to the affine parameters used to include translation and rotation effects, the scale and inclination parameters model perspective-related deviations caused by distance changes in the line of sight. A pre-processing phase sees the calibration routine developed used to correct optical and distance-related measurement specific errors in input data and measured slope distances reduced to horizontal distances. 2.5D LST is an integrated approach, and therefore delivers fully three-dimensional displacement vectors. In addition, the accuracy and reliability data generated by error calculation can be used as decision criteria for integration into an application-specific processing chain. Process validation showed that the integration of complementary data leads to a more accurate, reliable solution to the correspondence problem, especially in the case of difficult contrast ratios within a channel. The accuracy of scale and inclination parameters directly linked to distance correction terms improved dramatically. In addition, the expansion of the geometric model led to significant benefits, and in particular for the matching of natural, not entirely planar surface segments.
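As a heavily reduced illustration of the least-squares core of such a matching method: the sketch below estimates the six affine parameters from synthetic point correspondences with numpy, whereas 2.5D LST itself estimates them (plus scale and inclination terms) directly from intensity and range observations with variance component estimation.
```python
# Sketch: estimate 2D affine parameters by linear least squares from synthetic correspondences.
import numpy as np

def estimate_affine(src, dst):
    """src, dst: (n, 2) arrays of corresponding patch coordinates; returns the 6 affine parameters."""
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2], A[0::2, 2] = src, 1.0       # x' = a0*x + a1*y + a2
    A[1::2, 3:5], A[1::2, 5] = src, 1.0       # y' = a3*x + a4*y + a5
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return params

src = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], float)
dst = src @ np.array([[1.02, 0.01], [-0.01, 0.98]]).T + np.array([0.5, -0.3])
print(estimate_affine(src, dst))
```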
The area-based object matching and object tracking method developed functions on the basis of 3D camera data gathered without object contact. It is therefore particularly suited to 3D motion analysis tasks in which the extra effort involved in multi-ocular experimental settings and the necessity of object signalling using target marks are to be avoided. The potential of the 3D camera matching approach has been demonstrated in two application scenarios in the field of research into human behaviour. As in the case of the use of 2.5D LST to mark and then classify hand gestures accompanying verbal communication, the implementation of 2.5D LST in the proposed procedures for the determination of interpersonal distance and body orientation within the framework of pedagogical research into conflict regulation between pairs of child-age friends facilitates the automatic, effective, objective and high-resolution (from both a temporal and spatial perspective) acquisition and evaluation of data with relevance to behaviour.
This Ph.D. thesis proposes the use of a novel 3D range imaging camera to gather data on human behaviour, and presents both a calibration tool developed for data processing purposes and a method for the contact-free determination of dense 3D motion vector fields. It therefore makes a contribution to current efforts in the field of the automated videographic documentation of bodily motion within the framework of dyadic interaction, and shows that photogrammetric methods can also deliver valuable results within the framework of motion evaluation tasks in the as-yet relatively untapped field of behavioural research.
|
326 |
Projekt SiPro - Etablierung einer durchgehenden Simulationsprozesskette in der Schwerindustrie [Project SiPro - establishing an end-to-end simulation process chain in heavy industry]. Kittner, Kai, 24 May 2023
This contribution gives an overview of the research project SiPro, carried out within the funding framework of the German Federal Ministry for Economic Affairs and Climate Policy (Bundesministerium für Wirtschaft und Klimapolitik). The aim of the project is to achieve a breakthrough in horizontal data integration along the process chain of casting, forging and heat treatment and the subsequent processes such as rolling, die forging and drawing. Building on this, the processes can be optimized, which ultimately leads to savings in energy use and thus to improved competitiveness combined with a reduction in CO2 emissions.
|
327 |
Multimodal Data Management in Open-world Environment. K M A Solaiman (16678431), 02 August 2023
The availability of abundant multimodal data, including textual, visual, and sensor-based information, holds the potential to improve decision-making in diverse domains. Extracting data-driven decision-making information from heterogeneous and changing datasets in real-world data-centric applications requires achieving complementary functionalities of multimodal data integration, knowledge extraction and mining, situationally-aware data recommendation to different users, and uncertainty management in the open-world setting. To achieve a system that encompasses all of these functionalities, several challenges need to be effectively addressed: (1) How to represent and analyze heterogeneous source contents and application context for multimodal data recommendation? (2) How to predict and fulfill current and future needs as new information streams in without user intervention? (3) How to integrate disconnected data sources and learn information relevant to specific mission needs? (4) How to scale from processing petabytes of data to exabytes? (5) How to deal with uncertainties in the open world that stem from changes in data sources and user requirements?
This dissertation tackles these challenges by proposing novel frameworks, learning-based data integration and retrieval models, and algorithms to empower decision-makers to extract valuable insights from diverse multimodal data sources. The contributions of this dissertation can be summarized as follows: (1) We developed SKOD, a novel multimodal knowledge querying framework that overcomes the data representation, scalability, and data completeness issues while utilizing streaming brokers and RDBMS capabilities with entity-centric semantic features as an effective representation of content and context. Additionally, as part of the framework, a novel text attribute recognition model called HART was developed, which leveraged language models and syntactic properties of large unstructured texts. (2) In the SKOD framework, we incrementally proposed three different approaches for data integration of the disconnected sources from their semantic features to build a common knowledge base with the user information need: (i) EARS: a mediator approach using schema mapping of the semantic features and SQL joins was proposed to address scalability challenges in data integration; (ii) FemmIR: a data integration approach for more susceptible and flexible applications that utilizes neural network-based graph matching techniques to learn coordinated graph representations of the data. It introduces a novel graph creation approach from the features and a novel similarity metric among data sources; (iii) WeSJem: this approach allows zero-shot similarity matching and data discovery by using contrastive learning to embed data samples and query examples in a high-dimensional space using features as a novel source of supervision instead of relevance labels. (3) Finally, to manage uncertainties in multimodal data management for open-world environments, we characterized novelties in multimodal information retrieval based on data drift. Moreover, we proposed a novelty detection and adaptation technique as an augmentation to WeSJem.
The effectiveness of the proposed frameworks, models, and algorithms was demonstrated through real-world system prototypes that solved open problems requiring large-scale human endeavors and computational resources. Specifically, these prototypes assisted law enforcement officers in automating investigations and finding missing persons.
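The contrastive-learning component mentioned for WeSJem can be pictured with an InfoNCE-style objective; the sketch below uses a stand-in linear encoder and random tensors, so it only shows the shape of the idea, not the system's actual encoders or supervision signal.
```python
# Sketch of an InfoNCE-style contrastive loss: matching (query, sample) pairs sit on the diagonal.
import torch
import torch.nn.functional as F

def info_nce(query_emb, sample_emb, temperature=0.07):
    """query_emb, sample_emb: (batch, dim) tensors where row i of each side is a positive pair."""
    q = F.normalize(query_emb, dim=1)
    s = F.normalize(sample_emb, dim=1)
    logits = q @ s.t() / temperature            # cosine similarities of all query/sample pairs
    targets = torch.arange(q.size(0))           # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

encoder = torch.nn.Linear(64, 32)               # stand-in for the real encoders
queries, samples = torch.randn(8, 64), torch.randn(8, 64)
loss = info_nce(encoder(queries), encoder(samples))
loss.backward()
print(float(loss))
```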
|
328 |
CyberWater: An open framework for data and model integration. Ranran Chen (18423792), 03 June 2024
Workflow management systems (WMSs) are commonly used to organize/automate sequences of tasks as workflows to accelerate scientific discoveries. During complex workflow modeling, a local interactive workflow environment is desirable, as users usually rely on their rich, local environments for fast prototyping and refinements before they consider using more powerful computing resources.
This dissertation delves into the innovative development of the CyberWater framework based on Workflow Management Systems (WMSs). Against the backdrop of data-intensive and complex models, CyberWater exemplifies the transition of intricate data into insightful and actionable knowledge. The dissertation introduces the architecture of CyberWater, particularly focusing on its adaptation and enhancement from the VisTrails system, and highlights the significance of control and data flow mechanisms and the introduction of new data formats for effective data processing within the CyberWater framework.
This study presents an in-depth analysis of the design and implementation of Generic Model Agent Toolkits. The discussion centers on template-based component mechanisms and the integration with popular platforms, while emphasizing the toolkit's ability to facilitate on-demand access to High-Performance Computing resources for large-scale data handling. In addition, the development of an asynchronously controlled workflow within CyberWater is explored. This approach enhances computational performance by optimizing pipeline-level parallelism and allows for on-demand submission of HPC jobs, significantly improving the efficiency of data processing.
A comprehensive methodology for model-driven development and Python code integration within the CyberWater framework, as well as innovative applications of GPT models for automated data retrieval, are introduced in this research. The dissertation examines the implementation of Git Actions for system automation in data retrieval processes and discusses the transformation of raw data into a compatible format, enhancing the adaptability and reliability of the data retrieval component of the adaptive Generic Model Agent Toolkit.
For the development and maintenance of software within the CyberWater framework, tools such as GitHub are used for version control, and automated processes are outlined for software updates and error reporting. Beyond that, the discussion of user data collection emphasizes the role of the CyberWater Server in these processes.
In conclusion, this dissertation presents our comprehensive work on the CyberWater framework's advancements, setting new standards in scientific workflow management and demonstrating how technological innovation can significantly elevate the process of scientific discovery.
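The asynchronously controlled workflow can be pictured with a small asyncio sketch (task names, timings and the HPC submission stub are invented, standing in for real model components and a scheduler call): independent branches start immediately and downstream steps consume results as they complete.
```python
# Sketch of pipeline-level parallelism with on-demand HPC submission (all names and timings invented).
import asyncio

async def run_local(name, seconds):
    await asyncio.sleep(seconds)                 # stands in for a local model component
    return f"{name}: done"

async def submit_hpc_job(name):
    await asyncio.sleep(2)                       # stands in for job submission plus polling
    return f"{name}: finished on HPC"

async def workflow():
    # Both branches start immediately, giving pipeline-level parallelism.
    hydrology = asyncio.create_task(run_local("hydrology-preprocessing", 1))
    climate = asyncio.create_task(submit_hpc_job("climate-downscaling"))
    for task in asyncio.as_completed([hydrology, climate]):
        print(await task)                        # downstream steps consume results as they arrive

asyncio.run(workflow())
```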
|
329 |
Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data. Hellmann, Sebastian, 09 January 2014
Processing and Language Resources.:CONTENTS
i introduction and background 1
1 introduction 3
1.1 Natural Language Processing . . . . . . . . . . . . . . . 3
1.2 Open licenses, open access and collaboration . . . . . . 5
1.3 Linked Data in Linguistics . . . . . . . . . . . . . . . . . 6
1.4 NLP for and by the Semantic Web – the NLP Inter-
change Format (NIF) . . . . . . . . . . . . . . . . . . . . 8
1.5 Requirements for NLP Integration . . . . . . . . . . . . 10
1.6 Overview and Contributions . . . . . . . . . . . . . . . 11
2 background 15
2.1 The Working Group on Open Data in Linguistics (OWLG) 15
2.1.1 The Open Knowledge Foundation . . . . . . . . 15
2.1.2 Goals of the Open Linguistics Working Group . 16
2.1.3 Open linguistics resources, problems and chal-
lenges . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.4 Recent activities and on-going developments . . 18
2.2 Technological Background . . . . . . . . . . . . . . . . . 18
2.3 RDF as a data model . . . . . . . . . . . . . . . . . . . . 21
2.4 Performance and scalability . . . . . . . . . . . . . . . . 22
2.5 Conceptual interoperability . . . . . . . . . . . . . . . . 22
ii language resources as linked data 25
3 linked data in linguistics 27
3.1 Lexical Resources . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Linguistic Corpora . . . . . . . . . . . . . . . . . . . . . 30
3.3 Linguistic Knowledgebases . . . . . . . . . . . . . . . . 31
3.4 Towards a Linguistic Linked Open Data Cloud . . . . . 32
3.5 State of the Linguistic Linked Open Data Cloud in 2012 33
3.6 Querying linked resources in the LLOD . . . . . . . . . 36
3.6.1 Enriching metadata repositories with linguistic
features (Glottolog → OLiA) . . . . . . . . . . . 36
3.6.2 Enriching lexical-semantic resources with lin-
guistic information (DBpedia (→ POWLA) →
OLiA) . . . . . . . . . . . . . . . . . . . . . . . . 38
4 DBpedia as a multilingual language resource:
the case of the greek dbpedia edition. 39
4.1 Current state of the internationalization effort . . . . . 40
4.2 Language-specific design of DBpedia resource identifiers 41
4.3 Inter-DBpedia linking . . . . . . . . . . . . . . . . . . . 42
4.4 Outlook on DBpedia Internationalization . . . . . . . . 44
5 leveraging the crowdsourcing of lexical resources
for bootstrapping a linguistic linked data cloud 47
5.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2 Problem Description . . . . . . . . . . . . . . . . . . . . 50
5.2.1 Processing Wiki Syntax . . . . . . . . . . . . . . 50
5.2.2 Wiktionary . . . . . . . . . . . . . . . . . . . . . . 52
5.2.3 Wiki-scale Data Extraction . . . . . . . . . . . . . 53
5.3 Design and Implementation . . . . . . . . . . . . . . . . 54
5.3.1 Extraction Templates . . . . . . . . . . . . . . . . 56
5.3.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . 56
5.3.3 Language Mapping . . . . . . . . . . . . . . . . . 58
5.3.4 Schema Mediation by Annotation with lemon . 58
5.4 Resulting Data . . . . . . . . . . . . . . . . . . . . . . . . 58
5.5 Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . 60
5.6 Discussion and Future Work . . . . . . . . . . . . . . . 60
5.6.1 Next Steps . . . . . . . . . . . . . . . . . . . . . . 61
5.6.2 Open Research Questions . . . . . . . . . . . . . 61
6 nlp & dbpedia, an upward knowledge acquisition
spiral 63
6.1 Knowledge acquisition and structuring . . . . . . . . . 64
6.2 Representation of knowledge . . . . . . . . . . . . . . . 65
6.3 NLP tasks and applications . . . . . . . . . . . . . . . . 65
6.3.1 Named Entity Recognition . . . . . . . . . . . . 66
6.3.2 Relation extraction . . . . . . . . . . . . . . . . . 67
6.3.3 Question Answering over Linked Data . . . . . 67
6.4 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.4.1 Gold and silver standards . . . . . . . . . . . . . 69
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
iii the nlp interchange format (nif) 73
7 nif 2.0 core specification 75
7.1 Conformance checklist . . . . . . . . . . . . . . . . . . . 75
7.2 Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.2.1 Definition of Strings . . . . . . . . . . . . . . . . 78
7.2.2 Representation of Document Content with the
nif:Context Class . . . . . . . . . . . . . . . . . . 80
7.3 Extension of NIF . . . . . . . . . . . . . . . . . . . . . . 82
7.3.1 Part of Speech Tagging with OLiA . . . . . . . . 83
7.3.2 Named Entity Recognition with ITS 2.0, DBpe-
dia and NERD . . . . . . . . . . . . . . . . . . . 84
7.3.3 lemon and Wiktionary2RDF . . . . . . . . . . . 86
8 nif 2.0 resources and architecture 89
8.1 NIF Core Ontology . . . . . . . . . . . . . . . . . . . . . 89
8.1.1 Logical Modules . . . . . . . . . . . . . . . . . . 90
8.2 Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.2.1 Access via REST Services . . . . . . . . . . . . . 92
8.2.2 NIF Combinator Demo . . . . . . . . . . . . . . . 92
8.3 Granularity Profiles . . . . . . . . . . . . . . . . . . . . . 93
8.4 Further URI Schemes for NIF . . . . . . . . . . . . . . . 95
8.4.1 Context-Hash-based URIs . . . . . . . . . . . . . 99
9 evaluation and related work 101
9.1 Questionnaire and Developers Study for NIF 1.0 . . . . 101
9.2 Qualitative Comparison with other Frameworks and
Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
9.3 URI Stability Evaluation . . . . . . . . . . . . . . . . . . 103
9.4 Related URI Schemes . . . . . . . . . . . . . . . . . . . . 104
iv the nlp interchange format in use 109
10 use cases and applications for nif 111
10.1 Internationalization Tag Set 2.0 . . . . . . . . . . . . . . 111
10.1.1 ITS2NIF and NIF2ITS conversion . . . . . . . . . 112
10.2 OLiA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10.3 RDFaCE . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
10.4 Tiger Corpus Navigator . . . . . . . . . . . . . . . . . . 121
10.4.1 Tools and Resources . . . . . . . . . . . . . . . . 122
10.4.2 NLP2RDF in 2010 . . . . . . . . . . . . . . . . . . 123
10.4.3 Linguistic Ontologies . . . . . . . . . . . . . . . . 124
10.4.4 Implementation . . . . . . . . . . . . . . . . . . . 125
10.4.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . 126
10.4.6 Related Work and Outlook . . . . . . . . . . . . 129
10.5 OntosFeeder – a Versatile Semantic Context Provider
for Web Content Authoring . . . . . . . . . . . . . . . . 131
10.5.1 Feature Description and User Interface Walk-
through . . . . . . . . . . . . . . . . . . . . . . . 132
10.5.2 Architecture . . . . . . . . . . . . . . . . . . . . . 134
10.5.3 Embedding Metadata . . . . . . . . . . . . . . . 135
10.5.4 Related Work and Summary . . . . . . . . . . . 135
10.6 RelFinder: Revealing Relationships in RDF Knowledge
Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
10.6.1 Implementation . . . . . . . . . . . . . . . . . . . 137
10.6.2 Disambiguation . . . . . . . . . . . . . . . . . . . 138
10.6.3 Searching for Relationships . . . . . . . . . . . . 139
10.6.4 Graph Visualization . . . . . . . . . . . . . . . . 140
10.6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . 141
11 publication of corpora using nif 143
11.1 Wikilinks Corpus . . . . . . . . . . . . . . . . . . . . . . 143
11.1.1 Description of the corpus . . . . . . . . . . . . . 143
11.1.2 Quantitative Analysis with Google Wikilinks Cor-
pus . . . . . . . . . . . . . . . . . . . . . . . . . . 144
11.2 RDFLiveNews . . . . . . . . . . . . . . . . . . . . . . . . 144
11.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . 145
11.2.2 Mapping to RDF and Publication on the Web of
Data . . . . . . . . . . . . . . . . . . . . . . . . . 146
v conclusions 149
12 lessons learned, conclusions and future work 151
12.1 Lessons Learned for NIF . . . . . . . . . . . . . . . . . . 151
12.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 151
12.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 153
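Part III of the contents above (Chapters 7–9) specifies the NIF 2.0 core, built around the nif:Context and nif:String classes and offset-based URIs. As an orientation only, the following minimal Python sketch using rdflib shows the general shape of such an annotation; the document URI, example text and entity link are invented for illustration, and the snippet is not drawn from the specification itself.

```python
# Minimal, illustrative NIF-style annotation built with rdflib.
# The document URI, text and entity link are hypothetical; class and
# property names follow the published NIF Core ontology.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

text = "Leipzig is a city in Saxony."
base = "http://example.org/doc1#char="   # hypothetical document URI prefix

g = Graph()
g.bind("nif", NIF)
g.bind("itsrdf", ITSRDF)

# The whole document becomes a nif:Context carrying the reference text.
context = URIRef(f"{base}0,{len(text)}")
g.add((context, RDF.type, NIF.Context))
g.add((context, NIF.isString, Literal(text)))
g.add((context, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((context, NIF.endIndex, Literal(len(text), datatype=XSD.nonNegativeInteger)))

# A substring ("Leipzig", offsets 0-7) is anchored to the context and
# linked to a DBpedia resource via the ITS 2.0 RDF vocabulary.
mention = URIRef(f"{base}0,7")
g.add((mention, RDF.type, NIF.String))
g.add((mention, NIF.referenceContext, context))
g.add((mention, NIF.anchorOf, Literal(text[0:7])))
g.add((mention, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.endIndex, Literal(7, datatype=XSD.nonNegativeInteger)))
g.add((mention, ITSRDF.taIdentRef, URIRef("http://dbpedia.org/resource/Leipzig")))

print(g.serialize(format="turtle"))
```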
|
330 |
Drone-based Integration of Hyperspectral Imaging and Magnetics for Mineral ExplorationJackisch, Robert 15 August 2022 (has links)
The advent of unoccupied aerial systems (UAS) as a disruptive technology has a lasting impact on remote sensing, geophysics and most geosciences. Small, lightweight, and low-cost UAS enable researchers and surveyors to acquire earth observation data at higher spatial and spectral resolution than airborne and satellite data. UAS-based applications range from rapid topographic mapping using photogrammetric techniques to hyperspectral and geophysical measurements of surface and subsurface geology. UAS surveys contribute to identifying metal deposits, support the monitoring of mine sites and can reveal emerging environmental issues associated with mining. Further, affordable UAS technology will boost exploration data availability and expertise in the Global South.
This thesis investigates the application of UAS-based multi-sensor data for mineral exploration, in particular the integration of hyperspectral imagers, magnetometers and digital cameras (covering the visible red, green and blue light spectrum). UAS-based research is maturing; however, the aforementioned methods have not yet been integrated effectively. RGB-based photogrammetry is used to investigate topography and surface texture. Imaging spectrometers measure mineral-specific surface signatures. Magnetometers detect changes in the geomagnetic field caused by magnetic minerals at the surface and at depth. The integration of these UAS-based sensing methods in this thesis augments exploration with non-invasive, high-resolution, safe, rapid and practical survey workflows.
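As a schematic illustration of what such multi-sensor integration can look like once the individual surveys have been processed to co-registered grids, the following sketch stacks a hyperspectral iron index, a levelled magnetic anomaly grid and a photogrammetric elevation model and flags cells where spectral and magnetic indications coincide. The arrays, thresholds and grid size are synthetic placeholders, not values from the thesis.

```python
# Illustrative sketch: joint interpretation of co-registered UAS layers.
# Assumes all rasters were already resampled to a common grid; arrays and
# thresholds are synthetic placeholders, not survey values.
import numpy as np

rng = np.random.default_rng(0)
shape = (200, 200)                      # grid cells of a hypothetical survey block

iron_index = rng.random(shape)          # hyperspectral iron absorption index (0..1)
mag_anomaly = rng.normal(0, 50, shape)  # magnetic anomaly in nT after levelling
elevation = rng.normal(120, 5, shape)   # photogrammetric DEM in metres

def zscore(a):
    """Normalise a layer so differently scaled sensors can be compared."""
    return (a - a.mean()) / a.std()

# Co-registered multi-sensor cube (rows, cols, layers) for joint interpretation.
stack = np.dstack([zscore(iron_index), zscore(mag_anomaly), zscore(elevation)])
print("stacked layer cube:", stack.shape)

# Simple fusion rule: flag cells where the spectral iron signal and the
# magnetic anomaly are both strongly positive (hypothetical thresholds).
target_mask = (zscore(iron_index) > 1.5) & (zscore(mag_anomaly) > 1.5)
print(f"{target_mask.sum()} of {target_mask.size} cells flagged for follow-up")
```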
UAS surveys at three distinct test sites provided the data that were acquired, processed and integrated in this work. The sites are located in Finland (Fe-Ti-V at Otanmäki; apatite at Siilinjärvi) and Greenland (Ni-Cu-PGE at Qullissat, Disko Island) and were chosen as geologically diverse areas in subarctic to arctic environments. Restricted accessibility, unfavourable atmospheric conditions, dark rocks, debris and vegetation cover, and low solar illumination were common to all sites. While the topography in Finland was moderately flat, a steep landscape challenged the Greenland field work. These constraints meant that acquisitions varied from site to site, and how the data were integrated and interpreted depended on the commodity of interest.
At Otanmäki, iron-related spectral absorption and the response of magnetic minerals were detected using hyperspectral and magnetic surveying. At Siilinjärvi, multi-sensor image feature detection and classification combined with magnetic forward modelling enabled seamless geological mapping. In Greenland, detailed magnetic inversion and multispectral photogrammetry led to the construction of a comprehensive 3D model of magmatic exploration targets. Ground truthing of varying intensity was employed to verify the UAS-based data interpretations in all case studies.
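The detection of iron-related spectral absorption can be made concrete with a standard continuum-removal band-depth calculation around the iron absorption feature near 900 nm; the sketch below uses a synthetic spectrum and shoulder wavelengths chosen for illustration, not parameters taken from the thesis.

```python
# Illustrative band-depth calculation for an iron absorption feature near 900 nm.
# Wavelength grid, shoulder positions and the synthetic spectrum are assumptions.
import numpy as np

wavelengths = np.arange(400, 1001, 10)          # nm, hypothetical sensor bands

def band_depth(reflectance, wl, left=750, centre=900, right=1000):
    """Continuum-removed depth of an absorption feature centred at `centre` nm."""
    r = np.interp([left, centre, right], wl, reflectance)
    # Straight-line continuum between the two shoulders, evaluated at the centre.
    continuum = r[0] + (r[2] - r[0]) * (centre - left) / (right - left)
    return 1.0 - r[1] / continuum               # deeper absorption -> larger value

# Synthetic spectrum: flat background with a Gaussian absorption dip at 900 nm.
spectrum = 0.5 - 0.15 * np.exp(-((wavelengths - 900) ** 2) / (2 * 30 ** 2))
print(f"band depth at 900 nm: {band_depth(spectrum, wavelengths):.3f}")
```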
Laboratory analysis was applied where necessary to obtain geologic-mineralogic validation (e.g., X-ray diffraction and optical microscopy for mineral identification to establish lithologic domains, and magnetic susceptibility measurements for subsurface modelling), for example for trace amounts of magnetite in carbonatite (Siilinjärvi) and native iron occurrences in basalt (Qullissat). Technical achievements include the integration of data from a multicopter-based prototype fluxgate magnetometer, flown at different survey altitudes, with ground truth, and a feasibility study of a high-speed multispectral imaging system for fixed-wing UAS.
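Why flying height matters for UAS magnetics can be illustrated with a textbook dipole forward model: the anomaly of a buried magnetic source decays rapidly with sensor height, so the survey altitude directly limits what the magnetometer can resolve. Source depth, magnetic moment and the two flight altitudes in the sketch are invented for illustration and do not correspond to the case studies.

```python
# Illustrative forward model: vertical magnetic field of a buried point dipole,
# observed along a profile at two different UAS flight altitudes.
import numpy as np

MU0 = 4 * np.pi * 1e-7                       # vacuum permeability (T*m/A)

def dipole_bz(x_obs, z_obs, x_src=0.0, z_src=-20.0, moment=5.0e4):
    """Vertical field component (nT) of a vertically magnetised point dipole."""
    m = np.array([0.0, 0.0, moment])         # dipole moment along z (A*m^2)
    bz = np.empty_like(x_obs, dtype=float)
    for i, x in enumerate(x_obs):
        r = np.array([x - x_src, 0.0, z_obs - z_src])
        r_norm = np.linalg.norm(r)
        r_hat = r / r_norm
        b = MU0 / (4 * np.pi) * (3 * np.dot(m, r_hat) * r_hat - m) / r_norm**3
        bz[i] = b[2] * 1e9                    # tesla -> nanotesla
    return bz

profile = np.linspace(-100, 100, 201)         # metres along a survey line
for altitude in (15.0, 40.0):                 # two hypothetical flight heights (m)
    anomaly = dipole_bz(profile, altitude)
    print(f"altitude {altitude:4.0f} m: peak anomaly {anomaly.max():6.1f} nT")
```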
The case studies translate the experience gained into general recommendations for UAS-based multi-sensor integration. This thesis highlights the feasibility of UAS-based surveying at target scale (1–50 km²) and consolidates versatile survey approaches for multi-sensor integration. / The aim of this work was to investigate the potential of drone-based mineral exploration with multi-sensor data integration using optical-spectroscopic and magnetic methods, and, among other things, to establish transferable workflows.
The reviewed literature suggests that drone-based imaging spectroscopy and magnetic sensors are reaching a mature technological level and offer considerable potential for application development, but that there is not yet sufficient synergy between hyperspectral and magnetic methods.
This work comprised three case studies in which drone-based surveying of geological targets was applied in subarctic to arctic regions.
Combining drone technology with RGB, multispectral and hyperspectral cameras and magnetometers proved advantageous and laid the foundation for integrated modelling in the case studies.
The investigations were carried out in terrain with both flat and rugged topography, with concealed targets, and often under poor lighting conditions. Under these conditions, the goal was to assess the applicability of drone-based multi-sensor data in different exploration settings.
High-resolution surface imagery and subsurface information from magnetics were fused and interpreted jointly; selective rock sampling and analysis were an essential part of this work and necessary for validation.
For an iron ore deposit, a simple resource estimate was carried out by integrating magnetics, image-spectroscopy-based indices and 2D structural interpretation. Photogrammetric 3D modelling, magnetic forward modelling and hyperspectral classification were applied to a carbonatite intrusion in order to capture a complete exploration section. A vector inversion of magnetic data from Disko Island, Greenland, was used to identify, measure and build large-scale 3D models of undifferentiated landslide blocks.
Integrated spectral and magnetic mapping in complex areas improved the detection rate and spatial resolution of exploration targets and reduced the time, effort and amount of sample material required for a complex interpretation.
A prototype multispectral camera, built for a fixed-wing drone for rapid surveying, was developed, successfully tested and partly evaluated.
The presented work demonstrates the advantages and potential of multi-sensor drones as a practical, lightweight, safe, fast and convenient geoscientific tool for creating digital models for precise mineral exploration and geological mapping.
|