  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Using Dependency Parses to Augment Feature Construction for Text Mining

Guo, Sheng 18 June 2012 (has links)
With the prevalence of large data stored in the cloud, including unstructured information in the form of text, there is now an increased emphasis on text mining. A broad range of techniques are now used for text mining, including algorithms adapted from machine learning, NLP, computational linguistics, and data mining. Applications are also manifold, including classification, clustering, segmentation, relationship discovery, and practically any task that discovers latent information from written natural language. Classical mining algorithms have traditionally focused on shallow representations such as bag-of-words and similar feature-based models. With the advent of modern high performance computing, deep sentence-level linguistic analysis of large-scale text corpora has become practical. In this dissertation, we evaluate the utility of dependency parses as textual features for different text mining applications. Dependency parsing is one form of syntactic parsing, based on the dependency grammar implicit in sentences. While dependency parsing has traditionally been used for text understanding, we investigate here its application to supply features for text mining applications. We specifically focus on three methods to construct textual features from dependency parses. First, we consider a dependency parse as a general feature akin to a traditional bag-of-words model. Second, we consider the dependency parse as the basis to build a feature graph representation. Finally, we use dependency parses in a supervised collocation mining method for feature selection. To investigate these three methods, several applications are studied, including: (i) movie spoiler detection, (ii) text segmentation, (iii) query expansion, and (iv) recommender systems. / Ph. D.
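The first construction above — treating a dependency parse like a bag-of-words — can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the (relation, head, dependent) triples are assumed to come from any dependency parser, and the feature-name scheme (full triple plus backed-off variants) is a hypothetical choice.

```python
from collections import Counter

def dependency_features(triples):
    """Build bag-of-dependency features from (relation, head, dependent)
    triples, analogous to bag-of-words counts over tokens."""
    feats = Counter()
    for rel, head, dep in triples:
        feats[f"{rel}({head},{dep})"] += 1   # full triple as one feature
        feats[f"{rel}({head},*)"] += 1       # backed-off variants generalise
        feats[f"{rel}(*,{dep})"] += 1        # across lexical choices
    return dict(feats)

# Triples as a parser might produce for "the movie reveals the ending"
triples = [("det", "movie", "the"),
           ("nsubj", "reveals", "movie"),
           ("dobj", "reveals", "ending"),
           ("det", "ending", "the")]
feats = dependency_features(triples)
```

The backed-off features let a downstream classifier match, say, any direct object of "reveals", which a plain bag-of-words representation cannot express.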
32

Intégration du web social dans les systèmes de recommandation / Social web integration in recommendation systems

Nana jipmo, Coriane 19 December 2017 (has links)
The social Web continues to grow and provides access to a wide variety of resources, from bookmarking sites such as del.icio.us and messaging platforms such as Twitter to professional networks such as LinkedIn and general-purpose social networks such as Facebook and LiveJournal. The same individual may be registered and active on several social networks with potentially different purposes, where he or she publishes varied and constantly growing information such as name, locality, communities, and activities. Given the international dimension of the Web, this (textual) information is inherently multilingual and intrinsically ambiguous, since it is written in natural language with a free vocabulary by individuals of different origins. It is also a valuable source of data, particularly for applications seeking to know their users in order to better understand their needs, activities, and interests. The objective of our research is to exploit, essentially using the Wikipedia encyclopedia, the textual resources extracted from the different social networks of the same individual in order to build an extended profile characterizing that individual, which can be exploited by applications such as recommender systems. In particular, we conducted a study to characterize the personality traits of users. Many experiments, analyses, and evaluations were carried out on real data collected from different social networks.
33

Nutzen und Benutzen von Text Mining für die Medienanalyse / Benefits and Usage of Text Mining for Media Analysis

Richter, Matthias 26 January 2011 (has links) (PDF)
On the one hand, existing results from fields as diverse as empirical media research and text mining are brought together. The subject is content analysis — by hand, computer-assisted, or fully automatic — with particular attention to factors such as time, development, and change. This consolidation not only provides an overview from an unusual perspective; in the process, something new is also synthesized. The underlying thesis remains an inclusive one: just as it seems unlikely that computers will ever conduct analyses entirely without human interpretation, human interpreters will no longer be able to work on complex topics promptly, comprehensively, and without excessive subjective bias without the best available computer support — and substantively valuable analyses will no longer be able to afford to forgo such aids and instruments of quality assurance. Requirements follow directly from this: it must be clarified where the strengths and weaknesses of human analysts and of computational methods lie. Building on that, an optimal synthesis of the strengths of both sides, while minimizing their respective weaknesses, is to be achieved. The practical goal is ultimately the reduction of complexity and a way out of the systemic state of being "overnewsed but uninformed".
34

Entity-Centric Text Mining for Historical Documents

Coll Ardanuy, Maria 07 July 2017 (has links)
No description available.
35

Information extraction from chemical patents

Jessop, David M. January 2011 (has links)
The automated extraction of semantic chemical data from the existing literature is demonstrated. For reasons of copyright, the work is focused on the patent literature, though the methods are expected to apply equally to other areas of the chemical literature. Hearst Patterns are applied to the patent literature in order to discover hyponymic relations describing chemical species. The acquired relations are manually validated to determine the precision of the determined hypernyms (85.0%) and of the asserted hyponymic relations (94.3%). It is demonstrated that the system acquires relations that are not present in the ChEBI ontology, suggesting that it could function as a valuable aid to the ChEBI curators. The relations discovered by this process are formalised using the Web Ontology Language (OWL) to enable re-use. PatentEye - an automated system for the extraction of reactions from chemical patents and their conversion to Chemical Markup Language (CML) - is presented. Chemical patents published by the European Patent Office over a ten-week period are used to demonstrate the capability of PatentEye - 4444 reactions are extracted with a precision of 78% and recall of 64% with regards to determining the identity and amount of reactants employed and an accuracy of 92% with regards to product identification. NMR spectra are extracted from the text using OSCAR3, which is developed to greatly increase recall. The resulting system is presented as a significant advancement towards the large-scale and automated extraction of high-quality reaction information. Extended Polymer Markup Language (EPML), a CML dialect for the description of Markush structures as they are presented in the literature, is developed. Software to exemplify and to enable substructure searching of EPML documents is presented. Further work is recommended to refine the language and code to publication-quality before they are presented to the community.
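The Hearst-pattern step described above can be illustrated with one classic pattern. This is a simplified sketch, not the thesis's system: real chemical text needs chemistry-aware tokenisation and many more patterns, and the example sentence is invented.

```python
import re

# One classic Hearst pattern: "X such as Y1, Y2 and Y3" asserts that
# each Yi is a hyponym of the hypernym X.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+(?:, | and )?)+)")

def hearst_such_as(text):
    """Extract (hypernym, hyponym) pairs matched by the 'such as' pattern."""
    pairs = []
    for m in SUCH_AS.finditer(text):
        hypernym = m.group(1)
        hyponyms = re.split(r", | and ", m.group(2))
        pairs.extend((hypernym, h) for h in hyponyms if h)
    return pairs

pairs = hearst_such_as("solvents such as ethanol, acetone and toluene")
```

Each extracted pair is a candidate hyponymic relation; as in the thesis, such candidates would then be validated before being formalised (e.g. in OWL) or compared against an ontology like ChEBI.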
36

Extraction of chemical structures and reactions from the literature

Lowe, Daniel Mark January 2012 (has links)
The ever increasing quantity of chemical literature necessitates the creation of automated techniques for extracting relevant information. This work focuses on two aspects: the conversion of chemical names to computer readable structure representations and the extraction of chemical reactions from text. Chemical names are a common way of communicating chemical structure information. OPSIN (Open Parser for Systematic IUPAC Nomenclature), an open source, freely available algorithm for converting chemical names to structures, was developed. OPSIN employs a regular grammar to direct tokenisation and parsing, leading to the generation of an XML parse tree. Nomenclature operations are applied successively to the tree, with many requiring the manipulation of an in-memory connection table representation of the structure under construction. Areas of nomenclature supported are described, with attention being drawn to difficulties that may be encountered in name to structure conversion. Results on sets of generated names and names extracted from patents are presented. On generated names, recall of between 96.2% and 99.0% was achieved with a lower bound of 97.9% on precision, with all results either comparable or superior to the tested commercial solutions. On the patent names, OPSIN's recall was 2-10% higher than the tested solutions when the patent names were processed as found in the patents. The uses of OPSIN as a web service and as a tool for identifying chemical names in text are shown to demonstrate the direct utility of this algorithm. A software system for extracting chemical reactions from the text of chemical patents was developed. The system relies on the output of ChemicalTagger, a tool for tagging words and identifying phrases of importance in experimental chemistry text. Improvements to this tool required to facilitate this task are documented.
The structures of chemical entities are, where possible, determined using OPSIN in conjunction with a dictionary of name to structure relationships. Extracted reactions are atom mapped to confirm that they are chemically consistent. 424,621 atom mapped reactions were extracted from 65,034 organic chemistry USPTO patents. On a sample of 100 of these extracted reactions, chemical entities were identified with 96.4% recall and 88.9% precision. Quantities could be associated with reagents in 98.8% of cases and with products in 64.9% of cases, whilst the correct role was assigned to chemical entities in 91.8% of cases. Qualitatively, the system captured the essence of the reaction in 95% of cases. This system is expected to be useful in the creation of searchable databases of reactions from chemical patents and in facilitating analysis of the properties of large populations of reactions.
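The chemical-consistency check that atom mapping enables can be illustrated with a simple element balance. This is a deliberately reduced sketch: the thesis works with full structures and atom-to-atom maps, whereas here molecular formulas stand in for structures and brackets/isotopes are ignored.

```python
import re
from collections import Counter

def formula_counts(formula):
    """Parse a simple molecular formula like 'C2H6O' into element counts
    (no brackets or isotopes -- a deliberate simplification)."""
    counts = Counter()
    for elem, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[elem] += int(num) if num else 1
    return counts

def is_balanced(reactants, products):
    """A mapped reaction can only be chemically consistent if every atom
    on the left-hand side also appears on the right-hand side."""
    left = sum((formula_counts(f) for f in reactants), Counter())
    right = sum((formula_counts(f) for f in products), Counter())
    return left == right

# Fischer esterification: acetic acid + ethanol -> ethyl acetate + water
ok = is_balanced(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"])
```

A reaction failing this balance check would be flagged as incorrectly extracted or incompletely reported, which is the spirit of the atom-mapping validation described above.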
37

The Business Value of Text Mining

Stolt, Richard January 2017 (has links)
Text mining is an enabling technology that will come to change the process for how businesses derive insights and knowledge from the textual data available to them. The current literature has its focus set on text mining algorithms and techniques, whereas the practical aspects of text mining are lacking. This study aims to help companies understand the business value of text mining with the help of a case study. Subsequently, an SMS-survey method was used to identify additional business areas where text mining could be used to derive business value. A literature review was conducted to conceptualize the business value of text mining, and a concept matrix was established, in which each business category and its associated derived insights and knowledge, domain, and data source are specified. The concept matrix was then used to decide when information was of business value, to prove that text mining could be used to derive information of business value. Text mining analyses were conducted on traffic school survey feedback. The results were several patterns, where the business value was derived mainly for the categories of Quality Control and Quality Assurance. After comparing the results of the SMS-survey with the case study empirics, some difficulties emerged in the categorization of derived information, implying that the categories need to become more specific and distinct. Furthermore, the concept matrix does not comprise all of the business categories that are sure to exist.
38

Mining patient journeys from healthcare narratives

Dehghan, Azad January 2015 (has links)
The aim of the thesis is to investigate the feasibility of using text mining methods to reconstruct patient journeys from unstructured clinical narratives. A novel method to extract and represent patient journeys is proposed and evaluated in this thesis. A composition of methods was designed, developed and evaluated to this end, which included health-related concept extraction, temporal information extraction, and concept clustering and automated work-flow generation. A suite of methods to extract clinical information from healthcare narratives was proposed and evaluated in order to enable chronological ordering of clinical concepts. Specifically, we proposed and evaluated a data-driven method to identify key clinical events (i.e., medical problems, treatments, and tests) using a sequence labelling algorithm, CRF, with a combination of lexical and syntactic features, and a rule-based post-processing method including label correction, boundary adjustment and false positive filtering. The method was evaluated as part of the 2012 i2b2 challenge and achieved state-of-the-art performance, with strict and lenient micro F1-measures of 83.45% and 91.13% respectively. A method to extract temporal expressions using a hybrid knowledge-driven (dictionary and rules) and data-driven (CRF) approach has been proposed and evaluated. The method demonstrated state-of-the-art performance at the 2012 i2b2 challenge: an F1-measure of 90.48% and accuracy of 70.44% for identification and normalisation respectively. For temporal ordering of events we proposed and evaluated a knowledge-driven method, with an F1-measure of 62.96% (considering the reduced temporal graph) or 70.22% for extraction of temporal links.
The method developed consisted of initial rule-based identification and classification components which utilised contextual lexico-syntactic cues for inter-sentence links and string similarity for co-reference links, and subsequently a temporal closure component to calculate transitive relations of the extracted links. In a case study of survivors of childhood central nervous system tumours (medulloblastoma), qualitative evaluation showed that we were able to capture specific trends within patient journeys. An overall quantitative evaluation score (average precision and recall) of 94-100% for individual and 97% for aggregated patient journeys was also achieved, indicating that text mining methods can be used to identify, extract and temporally organise the key clinical concepts that make up a patient's journey. We also presented an analysis of healthcare narratives, specifically exploring the content of clinical and patient narratives by using the methods developed to extract patient journeys. We found that health-related quality of life concepts are more common in patient narratives, while clinical concepts (e.g., medical problems, treatments, tests) are more prevalent in clinical narratives. In addition, while both aggregated sets of narratives contain all investigated concepts, clinical narratives contain, proportionally, more health-related quality of life concepts than clinical concepts found in patient narratives. These results demonstrate that automated concept extraction, in particular of health-related quality of life concepts, as part of standard clinical practice is feasible. The proposed method demonstrated that text mining methods can be efficiently used to identify, extract and temporally organise the key clinical concepts that make up a patient's journey in a healthcare system.
Automated reconstruction of patient journeys can potentially be of value for clinical practitioners and researchers, to aid large scale analyses of implemented care pathways, and subsequently help monitor, compare, develop and adjust clinical guidelines both in the areas of chronic diseases where there is plenty of data and rare conditions where potentially there are no established guidelines.
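The temporal-closure step described in this entry — computing transitive relations over extracted links — can be sketched as follows. The event names are invented, and restricting the sketch to BEFORE links is a simplification of the full interval-relation calculus such a component would handle.

```python
def temporal_closure(before_links):
    """Compute the transitive closure of BEFORE links: if A BEFORE B
    and B BEFORE C, then infer A BEFORE C. Iterate until no new
    links can be derived (a fixed point)."""
    closed = set(before_links)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

# Links as an extraction component might produce them from one record
links = {("admission", "ct_scan"),
         ("ct_scan", "surgery"),
         ("surgery", "discharge")}
closed = temporal_closure(links)
```

The closure turns a sparse set of pairwise extractions into a chronology from which an ordered patient journey can be read off.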
39

Analyse de données textuelles d'un forum médical pour évaluer le ressenti exprimé par les internautes au sujet des antidépresseurs et des anxyolitiques / Text Mining Analysis of an Online Forum to Evaluate Users’ Perception about Antidepressants and Anxiolytics

Abbé, Adeline 08 November 2016 (has links)
Analysis of textual data is facilitated by text mining (TM), which automates content analysis and has numerous applications in healthcare. One of them is the use of TM to explore the content of posts shared online. We performed a systematic literature review to identify applications of TM in mental health. In addition, we used TM to explore the concerns of users of the Doctissimo.com forum about antidepressants and anxiolytics between 2013 and 2015, analysing word frequencies, co-occurrences, topic models (LDA), and topic popularity. The four applications of TM in mental health identified are the analysis of patients' narratives (psychopathology), feelings expressed online, the content of medical records, and screening of the biomedical literature. Four main topics were identified on the forum: withdrawal (the most frequent), escitalopram, anxiety related to treatment effects, and side effects. While concerns about side effects of treatment tended to decline, questions about withdrawal effects and switching medication increased and were associated with several antidepressants. Content analysis of online textual data allows us to better understand patients' major concerns and the support they receive, and to improve treatment adherence.
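The co-occurrence analysis used alongside word frequencies and LDA can be sketched as below. The posts and the post-level co-occurrence window are illustrative assumptions; the thesis's actual preprocessing and topic modelling are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(posts):
    """Count how often two distinct words appear in the same post.
    Sorting each post's vocabulary gives every pair a canonical order,
    so (a, b) and (b, a) are counted together."""
    pairs = Counter()
    for post in posts:
        words = sorted(set(post.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs

# Toy forum posts standing in for real messages
posts = ["withdrawal symptoms escitalopram",
         "escitalopram withdrawal again",
         "anxiety symptoms"]
pairs = cooccurrences(posts)
```

High co-occurrence counts (here, "escitalopram" with "withdrawal") are the kind of signal that motivates the topic-level findings reported above.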
40

Design und Implementierung eines Software-Ökosystems für textbasierte Inhaltsanalysen in den Sozialwissenschaften mit Schwerpunkt auf der Detektion schwacher Signale / Design and Implementation of a Software Ecosystem for Text-Based Content Analysis in the Social Sciences with a Focus on the Detection of Weak Signals

Kahmann, Christian 14 June 2021 (has links)
The use of automated quantitative methods in the social sciences is steadily gaining importance. This is due, on the one hand, to the rapidly growing amount and availability of digital data; on the other hand, innovative automated approaches make it possible to produce results that would not be attainable through qualitative work alone. Implementing innovative algorithms for quantitative methods, however, requires a great deal of expertise in programming and in how the methods being applied work. Since such expert knowledge is rarely available in purely social-science projects, other solutions are needed for applying automated quantitative methods in the social sciences. Only computational social science and the digital humanities are social-science research areas that, as pioneers, already have experience with automated quantitative methods. One possible solution for the broad use of automated methods across the social sciences is the creation and application of text mining infrastructures designed specifically for social-science use. These allow social scientists, with a comparatively low entry barrier, to apply current methods and research approaches from text mining and machine learning to their own research questions and data. For such infrastructures to actually deliver clear added value for social scientists, however, various requirements must be met. These divide into general software requirements, such as scalability and performance, and requirements specific to application in the social sciences.
Among the latter is the ability to handle diverse kinds of source data. This thesis focuses on textual data, which itself varies greatly in character and therefore in the processing it requires. Beyond that, three key requirements are identified that are essential for use in the social sciences. The first key requirement describes the general orientation of a text mining infrastructure as a generic platform which, through adaptability, extensibility, and the ability to export results, can be assimilated to the numerous and sometimes very diverse research questions of the social sciences. The second key requirement foregrounds the need to unite qualitative and quantitative research designs through dedicated interfaces, so that both research approaches can benefit from each other. Finally, the importance of weak signals as a basis for social-science research is highlighted. For all three key requirements, as well as the other derived requirements for a text mining infrastructure for use in the social sciences, possible implementations and solutions are presented. This is done, first, by describing the design and development of exactly such a text mining infrastructure, using the interactive Leipzig Corpus Miner as an example; the necessary trade-offs between different implementation strategies and software design decisions required to meet the stated requirements are explained. Second, a measure for quantifying diachronic context change, context volatility, is introduced.
Over the course of the thesis, this measure is used to detect and analyse weak signals in textual data. In the last part of the thesis, the realized implementations of the key requirements are demonstrated through various completed projects. The main contributions of this work are thus, first, a catalogue of specific requirements for text mining infrastructures for use in the social sciences; second, building on these, a detailed account of a possible design of a resulting research environment; and third, the further development of context volatility as a method for detecting weak signals in diachronic data.
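The context-volatility idea — quantifying how much a term's co-occurrence context shifts between time slices — can be sketched as a distance between context distributions. The measure below (1 minus the cosine similarity of co-occurrence count vectors) is a simplified stand-in for the thesis's refined definition, and the example contexts are invented.

```python
import math
from collections import Counter

def context_volatility(ctx_t1, ctx_t2):
    """Volatility of a term between two time slices, given its
    co-occurrence counts in each slice: 1 - cosine similarity,
    so 0 means a stable context and values near 1 a strong shift."""
    vocab = set(ctx_t1) | set(ctx_t2)
    dot = sum(ctx_t1.get(w, 0) * ctx_t2.get(w, 0) for w in vocab)
    n1 = math.sqrt(sum(v * v for v in ctx_t1.values()))
    n2 = math.sqrt(sum(v * v for v in ctx_t2.values()))
    if n1 == 0 or n2 == 0:
        return 1.0
    return 1.0 - dot / (n1 * n2)

# Hypothetical contexts of the term "cloud" in two yearly slices
ctx_2005 = Counter({"rain": 5, "sky": 3})
ctx_2015 = Counter({"computing": 6, "storage": 2, "sky": 1})
v = context_volatility(ctx_2005, ctx_2015)
```

A term whose volatility spikes across consecutive slices is a candidate weak signal: its usage context is drifting before its raw frequency necessarily changes.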
