321

Robust relationship extraction in the biomedical domain

Thomas, Philippe 25 November 2015 (has links)
For several centuries, a great wealth of human knowledge has been communicated in natural language, often recorded in written documents. In the life sciences, an exponential increase in scientific articles has been observed, hindering the effective and fast reconciliation of previous findings into current research projects. This thesis studies the automatic extraction of relationships between named entities. Within this topic, it focuses on increasing robustness for relationship extraction. First, we evaluate the use of ensemble methods to improve performance using data provided by the drug-drug interaction challenge 2013. Ensemble methods aggregate several classifiers into one model, increasing robustness by reducing the risk of choosing an inappropriate single classifier. Second, this work discusses the problem of applying relationship extraction to documents with unknown text characteristics. Robustness of a text mining component is assessed by cross-learning, where a model is evaluated on a corpus different from the training corpus. We apply self-training, a semi-supervised learning technique, in order to increase cross-learning performance and show that it is more robust than a classifier trained on manually annotated text only. Third, we investigate the use of distant supervision to overcome the need for manually annotated training instances. Corpora derived by distant supervision are inherently noisy, thus benefiting from robust relationship extraction methods. We compare two different methods and show that both approaches achieve performance similar to fully supervised classifiers, evaluated in the cross-learning scenario. To facilitate the use of information extraction results, including those developed within this thesis, we develop the semantic search engine GeneView. We discuss the computational requirements to build this resource and present applications utilizing the data extracted by different text-mining components.
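As a rough illustration of the semi-supervised self-training mentioned above, the sketch below wraps a TF-IDF plus logistic-regression relation classifier in scikit-learn's SelfTrainingClassifier. The sentences, labels and entity markers are invented, and this is not the classifier actually built in the thesis.

```python
# Minimal sketch of self-training for sentence-level relation extraction,
# assuming sentences with marked entity pairs and a small labeled seed set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.semi_supervised import SelfTrainingClassifier

# Hypothetical training data: -1 marks unlabeled sentences from a new corpus.
sentences = [
    "ENTITY1 increases the plasma concentration of ENTITY2 .",
    "ENTITY1 was administered together with ENTITY2 without interaction .",
    "Co-administration of ENTITY1 and ENTITY2 is not recommended .",
    "ENTITY1 and ENTITY2 were measured in separate assays .",
]
labels = [1, 0, -1, -1]  # 1 = interaction, 0 = no interaction, -1 = unlabeled

features = TfidfVectorizer(ngram_range=(1, 2))
base = LogisticRegression(max_iter=1000)
model = make_pipeline(features, SelfTrainingClassifier(base, threshold=0.8))
model.fit(sentences, labels)
print(model.predict(["ENTITY1 raises ENTITY2 levels ."]))
```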
322

GoWeb: Semantic Search and Browsing for the Life Sciences

Dietze, Heiko 21 December 2010 (has links) (PDF)
Searching is a fundamental task in support of research. Current search engines are keyword-based; semantic technologies promise a next generation of semantic search engines that will be able to answer questions. Current approaches either apply natural language processing to unstructured text, or they assume the existence of structured statements over which they can reason. This work provides a system that combines classical keyword-based search engines with semantic annotation. Conventional search results are annotated using a customized annotation algorithm, which takes the textual properties and requirements such as speed and scalability into account. The biomedical background knowledge consists of the Gene Ontology, Medical Subject Headings and other related entities, e.g. protein/gene names and person names. Together they provide the relevant semantic context for a search engine for the life sciences. We develop the system GoWeb for semantic web search and evaluate it using three benchmarks. It is shown that GoWeb is able to aid question answering with success rates of up to 79%. Furthermore, the system includes semantic hyperlinks that enable semantic browsing of the knowledge space. The semantic hyperlinks facilitate the use of the eScience infrastructure, even complex workflows of composed web services. To complement the web search of GoWeb, other data sources and more specialized information needs are tested in different prototypes, including patent and intranet search. Semantic search is applicable in these usage scenarios, but the developed systems also show the limits of the semantic approach: the size, applicability and completeness of the integrated ontologies, as well as technical issues of text extraction and metadata gathering. Additionally, semantic indexing is implemented as an alternative approach to semantic search and evaluated with a question-answering benchmark. A semantic index can help to answer questions and address some limitations of GoWeb. Still, the maintenance and optimization of such an index is a challenge, whereas GoWeb provides a straightforward system.
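A minimal sketch of dictionary-based semantic annotation, mapping ontology terms found in result snippets to concept identifiers, might look as follows. The term list, identifiers and matching strategy are assumptions for illustration, not GoWeb's actual annotation algorithm.

```python
# Illustrative sketch of dictionary-based semantic annotation of search-result
# snippets. The ontology entries below are hypothetical examples.
import re

ontology = {
    "apoptosis": "GO:0006915",
    "insulin": "MeSH:D007328",
    "signal transduction": "GO:0007165",
}

# Longest-match-first, case-insensitive pattern over all known terms.
pattern = re.compile(
    "|".join(re.escape(t) for t in sorted(ontology, key=len, reverse=True)),
    re.IGNORECASE,
)

def annotate(snippet: str) -> list[tuple[str, str]]:
    """Return (surface form, concept id) pairs found in one result snippet."""
    return [(m.group(0), ontology[m.group(0).lower()]) for m in pattern.finditer(snippet)]

print(annotate("Insulin signaling can trigger apoptosis in beta cells."))
```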
323

Analyse der Meinungsentwicklung in Online Foren – Konzept und Fallstudie

Kaiser, Carolin, Bodendorf, Freimut 22 May 2014 (has links) (PDF)
Web 2.0 is, among other things, a worldwide platform for expressing opinions. More and more customers discuss products and exchange experiences online. The analysis of online posts therefore represents an important market research instrument. An approach for the automatic identification, aggregation and analysis of opinions by means of text mining is presented, and its application is demonstrated with an example from the sporting goods industry.
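To make the idea of automatic opinion analysis concrete, here is a toy lexicon-based polarity scorer for forum posts. The word list is invented; the authors' actual text-mining approach is not specified at this level of detail.

```python
# Minimal sketch of lexicon-based opinion scoring for forum posts,
# assuming a tiny hand-made polarity lexicon.
POLARITY = {"great": 1, "comfortable": 1, "durable": 1,
            "poor": -1, "broke": -1, "uncomfortable": -1}

def score_post(post: str) -> int:
    """Sum polarity values of known opinion words in one forum post."""
    return sum(POLARITY.get(tok.strip(".,!?").lower(), 0) for tok in post.split())

posts = ["The shoes are great and very comfortable!",
         "The sole broke after two weeks, poor quality."]
for p in posts:
    print(score_post(p), p)
```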
324

Exploring Trends, Patterns and Characteristics of Quality Management Through Text Mining

Carnerud, Daniel January 2016 (has links)
At frequent intervals, new reports and papers are published stressing the importance of high quality and quality improvement measures in the public and private sector if Sweden is to survive as a welfare state and industrial nation. The situation is not unique to Sweden: similar opinions can be heard in other parts of the world as well. In the 21st century, consumers and citizens should be provided with continuously improved quality at a lower cost; otherwise, businesses are likely to go bankrupt and politicians may lose the trust of the people. Quality is thus a word used persistently by people in power and the social commentators of today. From this perspective, it might seem fitting that quality, quality management (QM), total quality management (TQM) and other closely related terminologies are well defined, to make possible a constructive dialogue that culminates in effective action. This is often not the case, which is why vision statements, campaigns and other quality improvement measures risk falling short before they are even launched. Against this background, the purpose of this thesis is to facilitate fruitful dialogue by examining QM research and how trends, terminologies and research focus have shifted over time. By increasing the understanding of how QM research has evolved, it is also possible to create a coherent overview, which hopefully can help to reduce confusion and polarisation among scholars and practitioners. In this way, it might be possible to increase the number of successful quality improvement measures as well as to lay the foundations for sustainable and system-wide quality improvement actions in society at large. The thesis is based on three studies. The first examines conference proceedings from one of the world's most prominent scientific conferences on quality, the QMOD-ICQSS conference. The two subsequent studies use abstracts from three of the top-ranked scientific journals dealing with quality: International Journal of Quality and Reliability Management, Total Quality Management Journal, and Total Quality Management & Business Excellence. All studies were conducted using text mining methodology, which entails the use of statistical tools, in the form of hardware and software, for data collection, modelling and analysis. The approach is exploratory and has not previously been applied for this purpose, which is why the three studies offer unique perspectives on the research field, while new methodological tools and approaches are investigated and tested. Through the studies it is possible to show the occurrence of trends in research alignment as well as in publication design and popularity. The studies also identify central, perpetual topics around which the research has been concentrated. These topics indicate that the research field, in spite of momentary trends and fashions, rests on a firm foundation regarding problem definitions and approaches to solve them. Finally, a model is presented which summarizes the perspectives and starting points that distinguish QM and make it a research field in its own right. At the time of the licentiate defence the following papers were unpublished: paper B accepted for publication and paper C submitted.
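For a concrete impression of text mining over journal abstracts, the following sketch fits a small LDA topic model with scikit-learn. The example abstracts and settings are made up and do not reproduce the thesis's actual tooling.

```python
# Illustrative sketch of mining topics from journal abstracts with LDA;
# the corpus here is invented for demonstration.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "total quality management implementation in manufacturing firms",
    "customer satisfaction measurement and service quality improvement",
    "six sigma projects and process improvement in healthcare",
    "leadership commitment and quality culture in organisations",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for i, comp in enumerate(lda.components_):
    top = [terms[j] for j in comp.argsort()[-5:][::-1]]  # five strongest terms
    print(f"topic {i}: {', '.join(top)}")
```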
325

Graphdatenbanken für die textorientierten e-Humanities

Efer, Thomas 08 February 2017 (has links)
In light of the recent massive digitization efforts, most of the humanities disciplines are currently undergoing a fundamental transition towards the widespread application of digital methods. Between those traditional scholarly fields and computer science exists a methodological and communicational gap that the so-called "e-Humanities" aim to bridge systematically, via interdisciplinary project work. With text being the most common object of study in this field, many approaches from the area of Text Mining have been adapted to problems of the disciplines. While common workflows and best practices slowly emerge, it is evident that generic solutions are no ultimate fit for many specific application scenarios. To be able to create custom-tailored digital tools, one of the central issues is to digitally represent the text, as well as its many contexts and related objects of interest, in an adequate manner. This thesis introduces a novel form of text representation that is based on Property Graph databases, an emerging technology used to store and query highly interconnected data sets. Based on this modeling paradigm, a new text research system called "Kadmos" is introduced. It provides user-definable asynchronous web services and is built to allow for a flexible extension of the data model and system functionality within a prototype-driven development process. With Kadmos it is possible to easily scale up to text collections containing hundreds of millions of words on a single device, and even further when using a machine cluster. It is shown how various methods of Text Mining can be implemented with and adapted for the graph representation at a very fine granularity level, allowing the creation of fitting digital tools for different aspects of scholarly work. In extended usage scenarios it is demonstrated how the graph-based modeling of domain data can be beneficial even in research scenarios that go beyond a purely text-based study.
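One possible property-graph encoding of text, with documents, token positions and word types as nodes, is sketched below using networkx. The schema is an assumption made for illustration and may differ from the actual Kadmos data model.

```python
# Rough sketch of a property-graph style text representation with a simple
# schema: Document, Token and WordType nodes linked by typed edges.
import networkx as nx

def add_document(g: nx.MultiDiGraph, doc_id: str, text: str) -> None:
    g.add_node(doc_id, label="Document")
    prev = None
    for pos, word in enumerate(text.split()):
        tok = f"{doc_id}:t{pos}"
        g.add_node(tok, label="Token", position=pos)
        g.add_node(word.lower(), label="WordType")
        g.add_edge(doc_id, tok, type="CONTAINS")
        g.add_edge(tok, word.lower(), type="INSTANCE_OF")
        if prev is not None:
            g.add_edge(prev, tok, type="NEXT")  # preserve token order
        prev = tok

g = nx.MultiDiGraph()
add_document(g, "doc1", "the graph stores the text")
# Query: at which positions does the word type "the" occur?
positions = [g.nodes[t]["position"] for t, w, d in g.edges(data=True)
             if d["type"] == "INSTANCE_OF" and w == "the"]
print(sorted(positions))  # -> [0, 3]
```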
326

Genderový pohled na prezentaci žen na Wikipedii / Gender View on Female Presentation on Wikipedia

Stančíková, Ľubica January 2019 (has links)
This thesis examines, characterizes and quantifies how women and men are presented on the Czech and Slovak Wikipedia and elaborates on possible differences. The theoretical part of the thesis describes the current state of Wikipedia and of media in general and draws attention to gender inequality within its editorial base, which is linked to a general trend in free-culture communities, where a similar inequality is visible. Furthermore, the theoretical part also deals with the issue of gender in general and summarizes the current state of knowledge and the methods used to study gender on Wikipedia. The practical part of the thesis partly replicates the study First Women, Second Sex: Gender Bias in Wikipedia (Graells-Garrido, Lalmas and Menczer, 2015), using the RStudio programme to perform basic text mining and quantitative analysis of biographical texts about men and women in both languages, tracking word frequencies from selected word categories, namely Gender, Reference to the opposite sex, Family and family status, and Career. Overall, we analysed 24,510 Slovak biographical articles and 110,866 Czech biographical articles, and our findings confirm an imbalance and stereotyping in the presentation of women on Wikipedia in both languages.
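A toy version of the category-based word counting described above could look like this. The category word lists and the example biography are invented and far smaller than those used in the study.

```python
# Toy sketch of counting occurrences of words from predefined categories
# in a biography text; word lists here are hypothetical examples.
from collections import Counter

CATEGORIES = {
    "family": {"husband", "wife", "children", "married"},
    "career": {"career", "professor", "award", "elected"},
}

def category_counts(biography: str) -> Counter:
    tokens = [t.strip(".,").lower() for t in biography.split()]
    counts = Counter()
    for cat, words in CATEGORIES.items():
        counts[cat] = sum(1 for t in tokens if t in words)
    return counts

bio = "She married in 1970 and raised three children while pursuing her career."
print(category_counts(bio))  # Counter({'family': 2, 'career': 1})
```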
327

Analyse der Meinungsentwicklung in Online Foren – Konzept und Fallstudie

Kaiser, Carolin, Bodendorf, Freimut January 2010 (has links)
Web 2.0 is, among other things, a worldwide platform for expressing opinions. More and more customers discuss products and exchange experiences online. The analysis of online posts therefore represents an important market research instrument. An approach for the automatic identification, aggregation and analysis of opinions by means of text mining is presented, and its application is demonstrated with an example from the sporting goods industry.
328

Konzeption und Entwicklung eines automatisierten Workflows zur geovisuellen Analyse von georeferenzierten Textdaten(strömen) / Microblogging Content

Gröbe, Mathias 13 October 2015 (has links)
This Master's thesis deals with the conception and exemplary implementation of a workflow for processing georeferenced microblogging content. Data from Twitter is used as an example and as a starting point for deciding how to build that workflow. A whole range of useful software modules from the fields of data mining and text mining was found to exist already; for the most part, they only need to be lined up into a processing pipeline with appropriate settings. Although a logical order can be defined, further adjustments according to the research question and the data may be required. The process is supported by different forms of visualization such as histograms, tag clouds and maps. In this way new knowledge can be discovered and the parameterization of the individual steps can be refined step by step, an approach known as geovisual analytics. After a review of multiple existing software tools, the programming language R was chosen to implement the workflow, as this language is optimized for statistical problems. Finally, the workflow was tested using data from Twitter and Flickr.
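One step of such a workflow, filtering georeferenced posts by a bounding box and counting terms for a word cloud, is sketched below in Python for illustration. The records and bounding box are invented; the thesis itself implemented its pipeline in R.

```python
# Compact sketch of spatial filtering and term counting for georeferenced
# microblogging posts; the posts and bounding box are fabricated examples.
from collections import Counter

posts = [
    {"text": "Sunset over the Elbe in Dresden", "lat": 51.05, "lon": 13.73},
    {"text": "Traffic jam again in Dresden Neustadt", "lat": 51.07, "lon": 13.74},
    {"text": "Great coffee in Leipzig", "lat": 51.34, "lon": 12.37},
]
bbox = {"lat_min": 51.0, "lat_max": 51.1, "lon_min": 13.6, "lon_max": 13.9}

inside = [p for p in posts
          if bbox["lat_min"] <= p["lat"] <= bbox["lat_max"]
          and bbox["lon_min"] <= p["lon"] <= bbox["lon_max"]]

# Term frequencies over the filtered posts, e.g. as input for a tag cloud.
terms = Counter(w.lower() for p in inside for w in p["text"].split() if len(w) > 3)
print(terms.most_common(5))
```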
329

GoWeb: Semantic Search and Browsing for the Life Sciences

Dietze, Heiko 20 October 2010 (has links)
Searching is a fundamental task in support of research. Current search engines are keyword-based; semantic technologies promise a next generation of semantic search engines that will be able to answer questions. Current approaches either apply natural language processing to unstructured text, or they assume the existence of structured statements over which they can reason. This work provides a system that combines classical keyword-based search engines with semantic annotation. Conventional search results are annotated using a customized annotation algorithm, which takes the textual properties and requirements such as speed and scalability into account. The biomedical background knowledge consists of the Gene Ontology, Medical Subject Headings and other related entities, e.g. protein/gene names and person names. Together they provide the relevant semantic context for a search engine for the life sciences. We develop the system GoWeb for semantic web search and evaluate it using three benchmarks. It is shown that GoWeb is able to aid question answering with success rates of up to 79%. Furthermore, the system includes semantic hyperlinks that enable semantic browsing of the knowledge space. The semantic hyperlinks facilitate the use of the eScience infrastructure, even complex workflows of composed web services. To complement the web search of GoWeb, other data sources and more specialized information needs are tested in different prototypes, including patent and intranet search. Semantic search is applicable in these usage scenarios, but the developed systems also show the limits of the semantic approach: the size, applicability and completeness of the integrated ontologies, as well as technical issues of text extraction and metadata gathering. Additionally, semantic indexing is implemented as an alternative approach to semantic search and evaluated with a question-answering benchmark. A semantic index can help to answer questions and address some limitations of GoWeb. Still, the maintenance and optimization of such an index is a challenge, whereas GoWeb provides a straightforward system.
330

Text mining for social harm and criminal justice application

Ritika Pandey (9147281) 30 July 2020 (has links)
Increasing rates of social harm events and the plethora of available text data demand the use of text mining techniques, not only to better understand their causes but also to develop optimal prevention strategies. In this work, we study three social harm issues: crime topic models, transitions into drug addiction, and homicide investigation chronologies. Topic modeling for the categorization and analysis of crime report text allows for more nuanced categories of crime compared to official UCR categorizations. This study has important implications for hotspot policing. We investigate the extent to which topic models that improve coherence lead to higher levels of crime concentration. We further explore transitions into drug addiction using Reddit data. We propose a prediction model to classify users' transitions from casual drug discussion forums to recovery-oriented drug discussion forums and to estimate the likelihood of such transitions. Through this study we offer insights into modern drug culture and provide tools with potential applications in combating the opioid crisis. Lastly, we present a knowledge-graph-based framework for homicide investigation chronologies that may aid investigators in analyzing homicide case data and also allow for post hoc analysis of key features that determine whether a homicide is ultimately solved. For this purpose, we perform named entity recognition to identify witnesses, detectives and suspects in the chronologies, use keyword expansion to identify various evidence types, and finally link these entities and evidence to construct a homicide investigation knowledge graph. We compare the performance of several methodologies for these sub-tasks and analyze the association between network statistics of the knowledge graph and homicide solvability.
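As a small illustration of linking chronology entities and evidence into a graph and computing network statistics, the following sketch uses networkx. The chronology entries, roles and evidence types are fabricated, and the framework's NER and keyword-expansion steps are omitted.

```python
# Small sketch of building a case knowledge graph from (role, person, evidence)
# triples and computing simple network statistics; all data here is invented.
import networkx as nx

chronology = [
    ("witness", "J. Smith", "statement"),
    ("detective", "R. Lopez", "ballistics report"),
    ("suspect", "K. Doe", "phone records"),
    ("detective", "R. Lopez", "phone records"),
]

g = nx.Graph()
for role, person, evidence in chronology:
    g.add_node(person, kind="person", role=role)
    g.add_node(evidence, kind="evidence")
    g.add_edge(person, evidence)

# Network statistics of the kind that could be related to case solvability.
print("nodes:", g.number_of_nodes(), "edges:", g.number_of_edges())
print("degree of R. Lopez:", g.degree["R. Lopez"])
print("density:", round(nx.density(g), 3))
```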
