Global ETD Search

121	Real Time Presentation Ortiz, Agustin, III 23 June 2017 (has links) No description available. Computer Science presentation inject documents querying text mining data
122	The Psychology of a Web Search Engine Ogbonna, Antoine I. January 2011 (has links) No description available. Computer Science Mathematics Search Yoool Search engine Text mining
123	A Sentiment Analysis Model Integrating Multiple Algorithms and Diverse Features Xu, Zhe 03 September 2010 (has links) No description available. Computer Science text mining sentiment analysis measure model
124	Examining the Educational Depth of Medical Case Reports and Radiology with Text Mining Collinsworth, Amy L. 12 1900 (has links) The purpose of this dissertation was to use text mining and topic modeling to explore unobserved themes in medical case reports that involve medical imaging. Case reports have a valuable place in medical research because they provide educational benefits, offer evidence, and encourage discussion. Their form has evolved over the years, but they remain a staple for conveying important information to medical communities around the world, with educational context and illuminating visuals. Examining medical case reports published over many years on multiple medical subjects can be challenging, so text mining and topic modeling methods were used to analyze a large set of abstracts from medical case reports involving radiology. The data analysis covered 68,845 abstracts published between 1975 and 2022. The findings indicate that text mining and topic modeling offer a unique and reproducible approach to examining a large quantity of abstracts for theme analysis. Case reports Medical imaging Radiology Text mining Topic modeling
125	Evaluation of Word and Paragraph Embeddings and Analogical Reasoning as an Alternative to Term Frequency-Inverse Document Frequency-based Classification in Support of Biocuration Sullivan, Daniel Edward 07 June 2016 (has links) This research addresses the question: can unsupervised learning generate a representation that improves on the commonly used term frequency-inverse document frequency (TF-IDF) representation by capturing semantic relations? The analysis measures the quality of sentence classification using TF-IDF representations and finds a practical upper limit to precision and recall in a biomedical text classification task (F1-score of 0.85). Arguably, one could use ontologies to supplement TF-IDF, but ontologies are sparse in coverage and costly to create. This prompts a related question: can unsupervised learning capture semantic relations at least as well as existing ontologies, and thus supplement existing sparse ontologies? A shallow neural network implementing the Skip-Gram algorithm is used to generate semantic vectors from a corpus of approximately 2.4 billion words. The ability to capture meaning is assessed by comparing the generated semantic vectors with MESH. Results indicate that semantic vectors trained by unsupervised methods capture comparable levels of semantic features in some cases, such as amino acid (92% of similarity represented in MESH), but perform substantially worse on more expansive topics, such as pathogenic bacteria (37.8% similarity represented in MESH). Possible explanations for this difference in performance are proposed, along with a method to combine manually curated ontologies with semantic vector spaces to produce a more comprehensive representation than either alone. Semantic vectors are also used as representations for paragraphs, which, when used for classification, achieve an F1-score of 0.92.
The results of the classification and analogical reasoning tasks are promising, but a formal model of semantic vectors, subject to the constraints of known linguistic phenomena, is needed. This research includes initial steps toward developing a formal model of semantic vectors based on a combination of linear algebra and fuzzy set theory, subject to the semantic molecularism linguistic model. This research is novel in its analysis of semantic vectors applied to the biomedical domain, its analysis of different performance characteristics in biomedical analogical reasoning tasks, its comparison of the semantic relations captured by vectors and by MESH, and the initial development of a formal model of semantic vectors. / Ph. D. text mining Machine learning biocuration linguistics natural language processing
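For readers unfamiliar with the TF-IDF baseline that entry 125 evaluates against, the following is a minimal sketch of one common textbook weighting (relative term frequency times the natural log of inverse document frequency). It is not the author's implementation: the function name `tf_idf` and the three toy documents are hypothetical, and real systems usually apply smoothed or normalized variants.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting (one of several textbook variants):
    weight = (count / doc_length) * ln(n_docs / doc_frequency)
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus, already tokenized.
docs = [
    "protein binds receptor".split(),
    "protein folds".split(),
    "receptor signals cell".split(),
]
vectors = tf_idf(docs)
```

In this toy corpus, "folds" occurs in only one document and so receives a higher weight than "protein", which occurs in two. The weights depend only on term counts, which is why such a representation cannot by itself capture the semantic relations the abstract discusses.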
126	Predicting the “helpfulness” of online consumer reviews Singh, J.P., Irani, S., Rana, Nripendra P., Dwivedi, Y.K., Saumya, S., Kumar Roy, P. 25 September 2020 (has links) Online shopping is increasingly becoming people's first choice when shopping, as it is very convenient to choose products based on their reviews. Even for moderately popular products, there are thousands of reviews constantly being posted on e-commerce sites. Such a large volume of data constantly being generated can be considered as a big data challenge for both online businesses and consumers. That makes it difficult for buyers to go through all the reviews to make purchase decisions. In this research, we have developed models based on machine learning that can predict the helpfulness of the consumer reviews using several textual features such as polarity, subjectivity, entropy, and reading ease. The model will automatically assign helpfulness values to an initial review as soon as it is posted on the website so that the review gets a fair chance of being viewed by other buyers. The results of this study will help buyers to write better reviews and thereby assist other buyers in making their purchase decisions, as well as help businesses to improve their websites. Helpfulness Online user reviews Product features Product ranking Text mining
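Entry 126 names entropy among its textual features without defining it. One plausible reading, offered here only as an assumption, is the Shannon entropy of a review's word distribution. The sketch below uses a hypothetical helper `word_entropy` with naive whitespace tokenization; the authors' actual feature extraction may differ.

```python
import math
from collections import Counter

def word_entropy(text):
    """Shannon entropy (in bits) of the word distribution in `text`.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0  # an empty review carries no information
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

repetitive = word_entropy("good good good good")   # single repeated word
varied = word_entropy("sturdy case fits phone")    # four distinct words
```

A review that repeats a single word scores zero, while one with four distinct words scores two bits, so under this reading the feature rewards lexical variety.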
127	Entwicklung eines generischen Vorgehensmodells für Text Mining [Development of a Generic Process Model for Text Mining] Schieber, Andreas, Hilbert, Andreas 29 April 2014 (has links) (PDF) Against the background of growing interest in computer-assisted text analysis in research and practice, this paper develops a generic process model for text mining processes on the basis of current literature. The aim of the paper is to structure the extensive activities involved and thereby reduce the complexity of text mining projects. The research goal rests on the fact that a systematic literature review carried out beforehand identified no detailed, application-neutral process models for text mining. Building on the findings of the literature review, the resulting model therefore contains both inductively grounded components from specific approaches and components derived deductively from literature-based requirements. The evaluation of the artifact demonstrates the usefulness of the process model in comparison with the previous state of research. Business Intelligence Text Mining Computerlinguistik Vorgehensmodell Business Intelligence Text Mining Computational Linguistics Process Model ddc:004 rvk:ST 505
128	Unsupervised Natural Language Processing for Knowledge Extraction from Domain-specific Textual Resources Hänig, Christian 25 April 2013 (has links) (PDF) This thesis aims to develop a Relation Extraction algorithm to extract knowledge from automotive data. While most approaches to Relation Extraction are evaluated only on newspaper data dealing with general relations from the business world, their applicability to other data sets is not well studied. Part I of this thesis deals with the theoretical foundations of Information Extraction algorithms. Text mining cannot be seen as the simple application of data mining methods to textual data. Instead, sophisticated methods have to be employed to accurately extract knowledge from text, which can then be mined using statistical methods from the field of data mining. Information Extraction itself can be divided into two subtasks: Entity Detection and Relation Extraction. The detection of entities is very domain-dependent due to terminology, abbreviations and general language use within the given domain. Thus, this task has to be solved for each domain employing thesauri or another type of lexicon. Supervised approaches to Named Entity Recognition will not achieve reasonable results unless they have been trained for the given type of data. Relation Extraction can basically be approached with pattern-based and kernel-based algorithms. The latter achieve state-of-the-art results on newspaper data and point out the importance of linguistic features. In order to analyze relations contained in textual data, syntactic features like part-of-speech tags and syntactic parses are essential. Chapter 4 presents the machine learning approaches and linguistic foundations that are essential for syntactic annotation of textual data and Relation Extraction. Chapter 6 analyzes the performance of state-of-the-art algorithms for POS tagging, syntactic parsing and Relation Extraction on automotive data.
The findings are that supervised methods trained on newspaper corpora do not achieve accurate results when applied to automotive data. There are various reasons for this. Besides low-quality text, the nature of automotive relations poses the main challenge. Automotive relation types of interest (e.g. component – symptom) are rather arbitrary compared to well-studied relation types like is-a or is-head-of. In order to achieve acceptable results, algorithms have to be trained directly on this kind of data. As the manual annotation of data for each language and data type is too costly and inflexible, unsupervised methods are the ones to rely on. Part II deals with the development of dedicated algorithms for all three essential tasks. Unsupervised POS tagging (Chapter 7) is a well-studied task, and algorithms that achieve accurate tagging exist. None of them disambiguates high-frequency words, however; only out-of-lexicon words are disambiguated. Most high-frequency words bear syntactic information, and thus it is very important to differentiate between their different functions. Domain languages in particular contain ambiguous high-frequency words bearing semantic information (e.g. pump). In order to improve POS tagging, an algorithm for disambiguation is developed and used to enhance an existing state-of-the-art tagger. This approach is based on context clustering, which is used to detect a word type's different syntactic functions. Evaluation shows that tagging accuracy is raised significantly. An approach to unsupervised syntactic parsing (Chapter 8) is developed in order to meet the requirements of Relation Extraction. These requirements include high-precision results on nominal and prepositional phrases, as they contain the entities relevant for Relation Extraction. Furthermore, accurate shallow parsing is more desirable than deep binary parsing, as it facilitates Relation Extraction more than deep parsing does.
Distinguishing endocentric from exocentric constructions improves phrase labeling. unsuParse detects phrase candidates based on the preferred positions of word types within phrases. Iterating the detection of simple phrases successively induces deeper structures. The proposed algorithm fulfills all of the required criteria and achieves competitive results on standard evaluation setups. Syntactic Relation Extraction (Chapter 9) is an approach that exploits syntactic statistics and text characteristics to extract relations between previously annotated entities. The approach is based on entity distributions given in a corpus and thus makes it possible to extend text mining processes to new data in an unsupervised manner. Evaluation on two different languages and two different text types from the automotive domain shows that it achieves accurate results on repair order data. Results are less accurate on internet data, but the tasks of sentiment analysis and extraction of the opinion target can be mastered. Thus, the incorporation of internet data is possible and important, as it provides useful insight into the customer's thoughts. To conclude, this thesis presents a complete unsupervised workflow for Relation Extraction – except for the highly domain-dependent Entity Detection task – improving the performance of each of the involved subtasks compared to state-of-the-art approaches. Furthermore, this work applies Natural Language Processing methods and Relation Extraction approaches to real-world data, unveiling challenges that do not occur in high-quality newspaper corpora. Text Mining Sprachverarbeitung Informationsextraktion Relationsextraktion POS Tagging Parsing Text Mining NLP Information Extraction Relation Extraction POS Tagging Parsing ddc:500
129	Idea Mining Schieber, Andreas, Kruse, Paul 17 April 2014 (has links) (PDF) Motivated by the success of Web 2.0 and social media in many areas of public life, and by the associated open innovation movement, which actively involves customers in the innovation process, this paper proposes integrating knowledge management and text mining to improve that innovation process. The approach described not only motivates customers to share their ideas and needs on web-based communication platforms; the resulting text-based data can also be evaluated automatically and used for the targeted and timely further development of products. Two practical application scenarios are used to present the resulting process model and illustrate its potential. Innovation Wissensmanagement Text Mining BPMN Innovation Knowledge Management Text Mining BPMN ddc:004 rvk:ST 515 rvk:QP 650
130	Maladies rares et "Big Data" : solutions bioinformatiques vers une analyse guidée par les connaissances : applications aux ciliopathies / Rare diseases and big data : biocomputing solutions towards knowledge-guided analyses : applications to ciliopathies Chennen, Kirsley 14 October 2016 (has links) Over the last decade, biomedical research and medical practice have been revolutionized by the post-genomic era and the emergence of Big Data in biology. Rare diseases are a special case, characterized by scarcity in everything from patient numbers to domain knowledge. Nevertheless, rare diseases are of real interest, because the fundamental knowledge accumulated by using them as study models, and the therapeutic solutions that follow, can also benefit more common diseases. This thesis focuses on the development of new bioinformatics solutions that integrate Big Data and knowledge-guided approaches to improve the study of rare diseases. In particular, my work resulted in (i) the creation of PubAthena, a literature-screening tool for recommending relevant new publications, and (ii) the development of VarScrut, a tool for the analysis of exome data that combines multi-level knowledge to improve the resolution rate. Maladies génétiques rares Ciliopathies Exomes Text-mining Rare diseases Exome sequencing Cilia Ciliopathies Text-mining 006.3 572.8
