321

Robust relationship extraction in the biomedical domain

Thomas, Philippe 25 November 2015 (has links)
For several centuries, a great wealth of human knowledge has been communicated in natural language, often recorded in written documents. In the life sciences, an exponential increase in scientific articles has been observed, hindering the effective and fast reconciliation of previous findings into current research projects. This thesis studies the automatic extraction of relationships between named entities. Within this topic, it focuses on increasing robustness for relationship extraction. First, we evaluate the use of ensemble methods to improve performance using data provided by the drug-drug interaction challenge 2013. Ensemble methods aggregate several classifiers into one model, increasing robustness by reducing the risk of choosing an inappropriate single classifier. Second, this work discusses the problem of applying relationship extraction to documents with unknown text characteristics. Robustness of a text mining component is assessed by cross-learning, where a model is evaluated on a corpus different from the training corpus. We apply self-training, a semi-supervised learning technique, in order to increase cross-learning performance and show that it is more robust than a classifier trained on manually annotated text only. Third, we investigate the use of distant supervision to overcome the need for manually annotated training instances. Corpora derived by distant supervision are inherently noisy, thus benefiting from robust relationship extraction methods. We compare two different methods and show that both approaches achieve performance similar to fully supervised classifiers, evaluated in the cross-learning scenario. To facilitate the use of information extraction results, including those developed within this thesis, we develop the semantic search engine GeneView. We discuss the computational requirements to build this resource and present applications utilizing the data extracted by different text-mining components.
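As a rough illustration of the semi-supervised self-training mentioned above, the sketch below wraps a TF-IDF plus logistic-regression relation classifier in scikit-learn's SelfTrainingClassifier. The sentences, labels and entity markers are invented, and this is not the classifier actually built in the thesis.

```python
# Minimal sketch of self-training for sentence-level relation extraction,
# assuming sentences with marked entity pairs and a small labeled seed set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.semi_supervised import SelfTrainingClassifier

# Hypothetical training data: -1 marks unlabeled sentences from a new corpus.
sentences = [
    "ENTITY1 increases the plasma concentration of ENTITY2 .",
    "ENTITY1 was administered together with ENTITY2 without interaction .",
    "Co-administration of ENTITY1 and ENTITY2 is not recommended .",
    "ENTITY1 and ENTITY2 were measured in separate assays .",
]
labels = [1, 0, -1, -1]  # 1 = interaction, 0 = no interaction, -1 = unlabeled

features = TfidfVectorizer(ngram_range=(1, 2))
base = LogisticRegression(max_iter=1000)
model = make_pipeline(features, SelfTrainingClassifier(base, threshold=0.8))
model.fit(sentences, labels)
print(model.predict(["ENTITY1 raises ENTITY2 levels ."]))
```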
322

GoWeb: Semantic Search and Browsing for the Life Sciences

Dietze, Heiko 21 December 2010 (has links) (PDF)
Searching is a fundamental task in support of research. Current search engines are keyword-based; semantic technologies promise a next generation of semantic search engines that will be able to answer questions. Current approaches either apply natural language processing to unstructured text, or they assume the existence of structured statements over which they can reason. This work provides a system that combines classical keyword-based search engines with semantic annotation. Conventional search results are annotated using a customized annotation algorithm, which takes the textual properties and requirements such as speed and scalability into account. The biomedical background knowledge consists of the Gene Ontology, Medical Subject Headings and other related entities, e.g. protein/gene names and person names. Together they provide the relevant semantic context for a search engine for the life sciences. We develop the system GoWeb for semantic web search and evaluate it using three benchmarks. It is shown that GoWeb is able to aid question answering with success rates of up to 79%. Furthermore, the system includes semantic hyperlinks that enable semantic browsing of the knowledge space. The semantic hyperlinks facilitate the use of the eScience infrastructure, even complex workflows of composed web services. To complement the web search of GoWeb, other data sources and more specialized information needs are tested in different prototypes, including patent and intranet search. Semantic search is applicable in these usage scenarios, but the developed systems also show the limits of the semantic approach: the size, applicability and completeness of the integrated ontologies, as well as technical issues of text extraction and metadata gathering. Additionally, semantic indexing is implemented as an alternative approach to semantic search and evaluated with a question-answering benchmark. A semantic index can help to answer questions and address some limitations of GoWeb. Still, the maintenance and optimization of such an index is a challenge, whereas GoWeb provides a straightforward system.
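A minimal sketch of dictionary-based semantic annotation, mapping ontology terms found in result snippets to concept identifiers, might look as follows. The term list, identifiers and matching strategy are assumptions for illustration, not GoWeb's actual annotation algorithm.

```python
# Illustrative sketch of dictionary-based semantic annotation of search-result
# snippets. The ontology entries below are hypothetical examples.
import re

ontology = {
    "apoptosis": "GO:0006915",
    "insulin": "MeSH:D007328",
    "signal transduction": "GO:0007165",
}

# Longest-match-first, case-insensitive pattern over all known terms.
pattern = re.compile(
    "|".join(re.escape(t) for t in sorted(ontology, key=len, reverse=True)),
    re.IGNORECASE,
)

def annotate(snippet: str) -> list[tuple[str, str]]:
    """Return (surface form, concept id) pairs found in one result snippet."""
    return [(m.group(0), ontology[m.group(0).lower()]) for m in pattern.finditer(snippet)]

print(annotate("Insulin signaling can trigger apoptosis in beta cells."))
```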
323

Analyse der Meinungsentwicklung in Online Foren – Konzept und Fallstudie

Kaiser, Carolin, Bodendorf, Freimut 22 May 2014 (has links) (PDF)
Web 2.0 is, among other things, a worldwide platform for expressing opinions. More and more customers discuss products and exchange experiences online. The analysis of online posts therefore represents an important market research instrument. An approach for the automatic identification, aggregation and analysis of opinions by means of text mining is presented, and its application is demonstrated with an example from the sporting goods industry.
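To make the idea of automatic opinion analysis concrete, here is a toy lexicon-based polarity scorer for forum posts. The word list is invented; the authors' actual text-mining approach is not specified at this level of detail.

```python
# Minimal sketch of lexicon-based opinion scoring for forum posts,
# assuming a tiny hand-made polarity lexicon.
POLARITY = {"great": 1, "comfortable": 1, "durable": 1,
            "poor": -1, "broke": -1, "uncomfortable": -1}

def score_post(post: str) -> int:
    """Sum polarity values of known opinion words in one forum post."""
    return sum(POLARITY.get(tok.strip(".,!?").lower(), 0) for tok in post.split())

posts = ["The shoes are great and very comfortable!",
         "The sole broke after two weeks, poor quality."]
for p in posts:
    print(score_post(p), p)
```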
324

Exploring Trends, Patterns and Characteristics of Quality Management Through Text Mining

Carnerud, Daniel January 2016 (has links)
At frequent intervals, new reports and papers are published stressing the importance of high quality and quality improvement measures in the public and private sector if Sweden is to survive as a welfare state and industrial nation. The situation is not unique to Sweden: similar opinions can be heard in other parts of the world as well. In the 21st century, consumers and citizens should be provided with continuously improved quality at a lower cost; otherwise, businesses are likely to go bankrupt and politicians may lose the trust of the people. Quality is thus a word used persistently by people in power and the social commentators of today. From this perspective, it might seem fitting that quality, quality management (QM), total quality management (TQM) and other closely related terminologies are well defined, to make possible a constructive dialogue that culminates in effective action. This is often not the case, which is why vision statements, campaigns and other quality improvement measures risk falling short before they are even launched. Against this background, the purpose of this thesis is to facilitate fruitful dialogue by examining QM research and how trends, terminologies and research focus have shifted over time. By increasing the understanding of how QM research has evolved, it is also possible to create a coherent overview, which hopefully can help to reduce confusion and polarisation among scholars and practitioners. In this way, it might be possible to increase the number of successful quality improvement measures as well as to lay the foundations for sustainable and system-wide quality improvement actions in society at large. The thesis is based on three studies. The first examines conference proceedings from one of the world's most prominent scientific conferences on quality, the QMOD-ICQSS conference. The two subsequent studies use abstracts from three of the top-ranked scientific journals dealing with quality: International Journal of Quality and Reliability Management, Total Quality Management Journal, and Total Quality Management & Business Excellence. All studies were conducted using text mining methodology, which entails the use of statistical tools, in the form of hardware and software, for data collection, modelling and analysis. The approach is exploratory and has not previously been applied for this purpose, which is why the three studies offer unique perspectives on the research field, while new methodological tools and approaches are investigated and tested. Through the studies it is possible to show the occurrence of trends in research alignment as well as in publication design and popularity. The studies also identify central, perpetual topics around which the research has been concentrated. These topics indicate that the research field, in spite of momentary trends and fashions, rests on a firm foundation regarding problem definitions and approaches to solve them. Finally, a model is presented which summarizes the perspectives and starting points that distinguish QM and make it a research field in its own right. At the time of the licentiate defence the following papers were unpublished: paper B accepted for publication and paper C submitted.
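For a concrete impression of text mining over journal abstracts, the following sketch fits a small LDA topic model with scikit-learn. The example abstracts and settings are made up and do not reproduce the thesis's actual tooling.

```python
# Illustrative sketch of mining topics from journal abstracts with LDA;
# the corpus here is invented for demonstration.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "total quality management implementation in manufacturing firms",
    "customer satisfaction measurement and service quality improvement",
    "six sigma projects and process improvement in healthcare",
    "leadership commitment and quality culture in organisations",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for i, comp in enumerate(lda.components_):
    top = [terms[j] for j in comp.argsort()[-5:][::-1]]  # five strongest terms
    print(f"topic {i}: {', '.join(top)}")
```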
325

Graphdatenbanken für die textorientierten e-Humanities

Efer, Thomas 08 February 2017 (has links)
In light of the recent massive digitization efforts, most of the humanities disciplines are currently undergoing a fundamental transition towards the widespread application of digital methods. Between those traditional scholarly fields and computer science exists a methodological and communicational gap that the so-called "e-Humanities" aim to bridge systematically, via interdisciplinary project work. With text being the most common object of study in this field, many approaches from the area of Text Mining have been adapted to problems of the disciplines. While common workflows and best practices slowly emerge, it is evident that generic solutions are no ultimate fit for many specific application scenarios. To be able to create custom-tailored digital tools, one of the central issues is to digitally represent the text, as well as its many contexts and related objects of interest, in an adequate manner. This thesis introduces a novel form of text representation that is based on Property Graph databases, an emerging technology used to store and query highly interconnected data sets. Based on this modeling paradigm, a new text research system called "Kadmos" is introduced. It provides user-definable asynchronous web services and is built to allow for a flexible extension of the data model and system functionality within a prototype-driven development process. With Kadmos it is possible to easily scale up to text collections containing hundreds of millions of words on a single device, and even further when using a machine cluster. It is shown how various methods of Text Mining can be implemented with and adapted for the graph representation at a very fine granularity level, allowing the creation of fitting digital tools for different aspects of scholarly work. In extended usage scenarios it is demonstrated how the graph-based modeling of domain data can be beneficial even in research scenarios that go beyond a purely text-based study.
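One possible property-graph encoding of text, with documents, token positions and word types as nodes, is sketched below using networkx. The schema is an assumption made for illustration and may differ from the actual Kadmos data model.

```python
# Rough sketch of a property-graph style text representation with a simple
# schema: Document, Token and WordType nodes linked by typed edges.
import networkx as nx

def add_document(g: nx.MultiDiGraph, doc_id: str, text: str) -> None:
    g.add_node(doc_id, label="Document")
    prev = None
    for pos, word in enumerate(text.split()):
        tok = f"{doc_id}:t{pos}"
        g.add_node(tok, label="Token", position=pos)
        g.add_node(word.lower(), label="WordType")
        g.add_edge(doc_id, tok, type="CONTAINS")
        g.add_edge(tok, word.lower(), type="INSTANCE_OF")
        if prev is not None:
            g.add_edge(prev, tok, type="NEXT")  # preserve token order
        prev = tok

g = nx.MultiDiGraph()
add_document(g, "doc1", "the graph stores the text")
# Query: at which positions does the word type "the" occur?
positions = [g.nodes[t]["position"] for t, w, d in g.edges(data=True)
             if d["type"] == "INSTANCE_OF" and w == "the"]
print(sorted(positions))  # -> [0, 3]
```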
326

Genderový pohled na prezentaci žen na Wikipedii / Gender View on Female Presentation on Wikipedia

Stančíková, Ľubica January 2019 (has links)
This thesis examines, characterizes and quantifies how women and men are presented on the Czech and Slovak Wikipedia and elaborates on possible differences. The theoretical part of the thesis describes the current state of Wikipedia and of media in general and draws attention to gender inequality within its editorial base, which is linked to a general trend in free-culture communities, where a similar inequality is visible. Furthermore, the theoretical part also deals with the issue of gender in general and summarizes the current state of knowledge and the methods used to study gender on Wikipedia. The practical part of the thesis partly replicates the study First Women, Second Sex: Gender Bias in Wikipedia (Graells-Garrido, Lalmas and Menczer, 2015), using the RStudio programme to perform basic text mining and quantitative analysis of biographical texts about men and women in both languages, tracking word frequencies from selected word categories, namely Gender, Reference to the opposite sex, Family and family status, and Career. Overall, we analysed 24,510 Slovak biographical articles and 110,866 Czech biographical articles, and our findings confirm an imbalance and stereotyping in the presentation of women on Wikipedia in both languages.
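A toy version of the category-based word counting described above could look like this. The category word lists and the example biography are invented and far smaller than those used in the study.

```python
# Toy sketch of counting occurrences of words from predefined categories
# in a biography text; word lists here are hypothetical examples.
from collections import Counter

CATEGORIES = {
    "family": {"husband", "wife", "children", "married"},
    "career": {"career", "professor", "award", "elected"},
}

def category_counts(biography: str) -> Counter:
    tokens = [t.strip(".,").lower() for t in biography.split()]
    counts = Counter()
    for cat, words in CATEGORIES.items():
        counts[cat] = sum(1 for t in tokens if t in words)
    return counts

bio = "She married in 1970 and raised three children while pursuing her career."
print(category_counts(bio))  # Counter({'family': 2, 'career': 1})
```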
327

Analyse der Meinungsentwicklung in Online Foren – Konzept und Fallstudie

Kaiser, Carolin, Bodendorf, Freimut January 2010 (has links)
Web 2.0 is, among other things, a worldwide platform for expressing opinions. More and more customers discuss products and exchange experiences online. The analysis of online posts therefore represents an important market research instrument. An approach for the automatic identification, aggregation and analysis of opinions by means of text mining is presented, and its application is demonstrated with an example from the sporting goods industry.
328

Konzeption und Entwicklung eines automatisierten Workflows zur geovisuellen Analyse von georeferenzierten Textdaten(strömen) / Microblogging Content

Gröbe, Mathias 13 October 2015 (has links)
This Master's thesis deals with the conception and exemplary implementation of a workflow for processing georeferenced microblogging content. Data from Twitter is used as an example and as a starting point for deciding how to build that workflow. A whole range of useful software modules from the fields of data mining and text mining was found to exist already; for the most part, they only need to be lined up into a processing pipeline with appropriate settings. Although a logical order can be defined, further adjustments according to the research question and the data may be required. The process is supported by different forms of visualization such as histograms, tag clouds and maps. In this way new knowledge can be discovered and the parameterization of the individual steps can be refined step by step, an approach known as geovisual analytics. After a review of multiple existing software tools, the programming language R was chosen to implement the workflow, as this language is optimized for statistical problems. Finally, the workflow was tested using data from Twitter and Flickr.
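One step of such a workflow, filtering georeferenced posts by a bounding box and counting terms for a word cloud, is sketched below in Python for illustration. The records and bounding box are invented; the thesis itself implemented its pipeline in R.

```python
# Compact sketch of spatial filtering and term counting for georeferenced
# microblogging posts; the posts and bounding box are fabricated examples.
from collections import Counter

posts = [
    {"text": "Sunset over the Elbe in Dresden", "lat": 51.05, "lon": 13.73},
    {"text": "Traffic jam again in Dresden Neustadt", "lat": 51.07, "lon": 13.74},
    {"text": "Great coffee in Leipzig", "lat": 51.34, "lon": 12.37},
]
bbox = {"lat_min": 51.0, "lat_max": 51.1, "lon_min": 13.6, "lon_max": 13.9}

inside = [p for p in posts
          if bbox["lat_min"] <= p["lat"] <= bbox["lat_max"]
          and bbox["lon_min"] <= p["lon"] <= bbox["lon_max"]]

# Term frequencies over the filtered posts, e.g. as input for a tag cloud.
terms = Counter(w.lower() for p in inside for w in p["text"].split() if len(w) > 3)
print(terms.most_common(5))
```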
329

GoWeb: Semantic Search and Browsing for the Life Sciences

Dietze, Heiko 20 October 2010 (has links)
Searching is a fundamental task in support of research. Current search engines are keyword-based; semantic technologies promise a next generation of semantic search engines that will be able to answer questions. Current approaches either apply natural language processing to unstructured text, or they assume the existence of structured statements over which they can reason. This work provides a system that combines classical keyword-based search engines with semantic annotation. Conventional search results are annotated using a customized annotation algorithm, which takes the textual properties and requirements such as speed and scalability into account. The biomedical background knowledge consists of the Gene Ontology, Medical Subject Headings and other related entities, e.g. protein/gene names and person names. Together they provide the relevant semantic context for a search engine for the life sciences. We develop the system GoWeb for semantic web search and evaluate it using three benchmarks. It is shown that GoWeb is able to aid question answering with success rates of up to 79%. Furthermore, the system includes semantic hyperlinks that enable semantic browsing of the knowledge space. The semantic hyperlinks facilitate the use of the eScience infrastructure, even complex workflows of composed web services. To complement the web search of GoWeb, other data sources and more specialized information needs are tested in different prototypes, including patent and intranet search. Semantic search is applicable in these usage scenarios, but the developed systems also show the limits of the semantic approach: the size, applicability and completeness of the integrated ontologies, as well as technical issues of text extraction and metadata gathering. Additionally, semantic indexing is implemented as an alternative approach to semantic search and evaluated with a question-answering benchmark. A semantic index can help to answer questions and address some limitations of GoWeb. Still, the maintenance and optimization of such an index is a challenge, whereas GoWeb provides a straightforward system.
330

Text mining for social harm and criminal justice application

Ritika Pandey (9147281) 30 July 2020 (has links)
Increasing rates of social harm events and the plethora of available text data demand the use of text mining techniques, not only to better understand their causes but also to develop optimal prevention strategies. In this work, we study three social harm issues: crime topic models, transitions into drug addiction, and homicide investigation chronologies. Topic modeling for the categorization and analysis of crime report text allows for more nuanced categories of crime compared to official UCR categorizations. This study has important implications for hotspot policing. We investigate the extent to which topic models that improve coherence lead to higher levels of crime concentration. We further explore transitions into drug addiction using Reddit data. We propose a prediction model to classify users' transitions from casual drug discussion forums to recovery-oriented drug discussion forums and to estimate the likelihood of such transitions. Through this study we offer insights into modern drug culture and provide tools with potential applications in combating the opioid crisis. Lastly, we present a knowledge-graph-based framework for homicide investigation chronologies that may aid investigators in analyzing homicide case data and also allow for post hoc analysis of key features that determine whether a homicide is ultimately solved. For this purpose, we perform named entity recognition to identify witnesses, detectives and suspects in the chronologies, use keyword expansion to identify various evidence types, and finally link these entities and evidence to construct a homicide investigation knowledge graph. We compare the performance of several methodologies for these sub-tasks and analyze the association between network statistics of the knowledge graph and homicide solvability.
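As a small illustration of linking chronology entities and evidence into a graph and computing network statistics, the following sketch uses networkx. The chronology entries, roles and evidence types are fabricated, and the framework's NER and keyword-expansion steps are omitted.

```python
# Small sketch of building a case knowledge graph from (role, person, evidence)
# triples and computing simple network statistics; all data here is invented.
import networkx as nx

chronology = [
    ("witness", "J. Smith", "statement"),
    ("detective", "R. Lopez", "ballistics report"),
    ("suspect", "K. Doe", "phone records"),
    ("detective", "R. Lopez", "phone records"),
]

g = nx.Graph()
for role, person, evidence in chronology:
    g.add_node(person, kind="person", role=role)
    g.add_node(evidence, kind="evidence")
    g.add_edge(person, evidence)

# Network statistics of the kind that could be related to case solvability.
print("nodes:", g.number_of_nodes(), "edges:", g.number_of_edges())
print("degree of R. Lopez:", g.degree["R. Lopez"])
print("density:", round(nx.density(g), 3))
```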
