About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Application of the Recommendation Architecture Model for Text Mining

Ratnayake, Hemali Uditha Wijewardane January 2004 (has links)
The Recommendation Architecture (RA) model is a new connectionist approach that simulates some aspects of the human brain. Applying the RA to a real-world problem is a novel research problem that has not previously been addressed in the literature. Research conducted with simulated data has shown much promise for the RA model's abilities in pattern discovery and pattern recognition. This thesis investigates the application of the RA model to text mining, where pattern discovery and recognition play an important role. The clustering system of the RA model is examined in detail, and a formal notation for representing its fundamental components and algorithms is proposed for clarity of understanding. A software simulation of the clustering system of the RA model is built for empirical studies. To argue that the RA model is applicable to text mining, the following aspects of the model are examined. With its pattern recognition ability, the clustering system of the RA is adapted for text classification and text organization. As the core of the RA model is concerned with pattern discovery, i.e. the identification of associative similarities in input, it is also used to discover unsuspected relationships within the content of documents. How the RA model can be applied to the problems of pattern discovery in text and classification of text is addressed, with results demonstrated from a series of experiments. The difficulties in applying the RA model to real-life data are described, and several extensions to the RA model are proposed for optimal performance, based on insights obtained from the experiments. Furthermore, the RA model can be extended to provide user-friendly interpretation of results. This research shows that, with the proposed extensions, the RA model can to a large extent be successfully applied to the problem of text mining. Some limitations, which are also demonstrated here, exist when the RA model is applied to very noisy data.
22

Applying computer-assisted assessment to auto-generating feedback on project proposals

Al-Yazeedi, Fatema January 2016 (has links)
Through different learning portals, computer-assisted assessment (CAA) tools have improved considerably over the past few decades. In the CAA community, these tools are categorised by type of question, type of testing, and type of assessment. Most provide assessment of multiple-choice questions, true/false questions, or matching questions. Other CAA tools evaluate short and long essay questions, each of which has different grading methods and techniques in terms of style and content. However, due to the complexity involved in analysing free-text writing, the development and evaluation of accurate, easy-to-use, and effective tools remains questionable. This research proposes a new contextual framework as a novel approach to the investigation of a new CAA tool which auto-generates feedback on project proposals. The research follows a Design Science Research paradigm to achieve and evaluate the accuracy, ease of use, and effectiveness of the new tool in the computer science domain in higher education institutions. This is achieved in three interrelated cycles: (1) based on the existing literature on this topic and an exploratory study of the currently available approaches to the provision of feedback on final-year project proposals, a proposed framework to auto-generate feedback on any electronically submitted coursework is constructed in order to gain a clear understanding of how such a CAA tool might work; (2) a contextual framework for final-year project proposals, based on the proposed framework, is constructed by considering both the style and the content of the free text and using different text mining techniques; and (3) the accuracy, ease of use, and effectiveness of the implemented web-based CAA application, named Feedback Automated Tool (FEAT), are evaluated against the contextual framework. This research applies CAA and text mining techniques to identify and model the key elements of the framework and its components in order to enable the development and evaluation of a novel CAA contextual framework which can be utilised for auto-generating accurate, easy-to-use, and effective feedback on final-year project proposals.
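FEAT's implementation is not reproduced in this record; the fragment below is a purely illustrative Python sketch of the general idea (rule-based checks on the content and style of a proposal). All section names and thresholds are hypothetical, not FEAT's actual rules.

```python
import re

# Hypothetical checks in the spirit of auto-generated proposal feedback;
# the section list and the 40-word threshold are illustrative assumptions.
REQUIRED_SECTIONS = ["introduction", "objectives", "methodology", "references"]

def feedback(proposal_text: str) -> list[str]:
    notes = []
    lower = proposal_text.lower()
    # Content check: are the expected sections present?
    for section in REQUIRED_SECTIONS:
        if section not in lower:
            notes.append(f"Missing expected section: '{section}'.")
    # Style check: flag very long sentences as hard to read.
    for sentence in re.split(r"(?<=[.!?])\s+", proposal_text):
        if len(sentence.split()) > 40:
            notes.append(f"Consider splitting a {len(sentence.split())}-word sentence.")
    return notes or ["No issues found by the automated checks."]

print("\n".join(feedback("Introduction. We propose a study of X. Methodology. Surveys.")))
```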
23

Spherical k-Means Clustering

Buchta, Christian, Kober, Martin, Feinerer, Ingo, Hornik, Kurt 09 1900 (has links) (PDF)
Clustering text documents is a fundamental task in modern data analysis, requiring approaches which perform well both in terms of solution quality and computational efficiency. Spherical k-means clustering is one approach to address both issues, employing cosine dissimilarities to perform prototype-based partitioning of term weight representations of the documents. This paper presents the theory underlying the standard spherical k-means problem and suitable extensions, and introduces the R extension package skmeans which provides a computational environment for spherical k-means clustering featuring several solvers: a fixed-point and genetic algorithm, and interfaces to two external solvers (CLUTO and Gmeans). Performance of these solvers is investigated by means of a large scale benchmark experiment. (authors' abstract)
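As a minimal Python sketch of the fixed-point solver idea (the skmeans package itself is written in R), the following fragment partitions unit-normalized document vectors by cosine similarity, which on the unit sphere reduces to a dot product:

```python
import numpy as np

def spherical_kmeans(X, k, iters=100, seed=0):
    """Fixed-point spherical k-means. X is (n_docs, n_terms) with nonzero rows,
    e.g. TF-IDF weights; rows are projected onto the unit sphere."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    prototypes = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        sims = X @ prototypes.T                  # cosine similarity = dot product
        labels = sims.argmax(axis=1)             # assign to most similar prototype
        new = np.vstack([X[labels == j].mean(axis=0) if np.any(labels == j)
                         else X[rng.integers(len(X))]   # reseed an empty cluster
                         for j in range(k)])
        new /= np.linalg.norm(new, axis=1, keepdims=True)  # back onto the sphere
        if np.allclose(new, prototypes):
            break
        prototypes = new
    return labels, prototypes
```

This is the fixed-point scheme only; the genetic algorithm and the CLUTO/Gmeans interfaces described in the paper are not sketched here.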
24

Intertextual Readings of the Nyāyabhūṣaṇa on Buddhist Anti-Realism

Neill, Tyler 13 December 2022 (has links)
This two-part dissertation has two goals: 1) a close philological reading of a 50-page section of a 10th-century Sanskrit philosophical work (Bhāsarvajña's Nyāyabhūṣaṇa), and 2) the creation and assessment of a novel intertextuality research system (Vātāyana) centered on the same work. The first half of the dissertation encompasses the philology project in four chapters: 1) background on the author, work, and key philosophical ideas in the passage; 2) descriptions of all known manuscript witnesses of this work and a new critical edition that substantially improves upon the editio princeps; 3) a word-for-word English translation richly annotated with both traditional explanatory material and novel digital links to not one but two interactive online research systems; and 4) a discussion of the Sanskrit author's dialectical strategy in the studied passage. The second half of the dissertation details the intertextuality research system in a further four chapters: 5) why it is needed and what can be learned from existing projects; 6) the creation of the system consisting of curated textual corpus, composite algorithm in natural language processing and information retrieval, and live web-app interface; 7) an evaluation of system performance measured against a small gold-standard dataset derived from traditional philological research; and 8) a discussion of the impact such new technology could have on humanistic research more broadly. System performance was assessed to be quite good, with a 'recall@5' of 80%, meaning that most previously known cases of mid-length quotation and even paraphrase could be automatically found and returned within the system's top five hits. Moreover, the system was also found to return a 34% surplus of additional significant parallels not found in the small benchmark. This assessment confirms that Vātāyana can be useful to researchers by aiding them in their collection and organization of intertextual observations, leaving them more time to focus on interpretation. 
Seventeen appendices illustrate both these efforts and a number of side projects, the latter of which span translation alignment, network visualization of an important database of South Asian prosopography (PANDiT), and a multi-functional Sanskrit text-processing web application (Skrutable).

Table of contents:
Preface (i); Table of Contents (ii); Abbreviations (v); Terms and Symbols (v); Nyāyabhūṣaṇa Witnesses (v); Main Sanskrit Editions (vi)
Introduction (vii): A Multi-Disciplinary Project in Intertextual Reading (vii); Main Object of Study: Nyāyabhūṣaṇa 104–154 (vii); Project Outline (ix)
Part I: Close Reading (1)
  1 Background (1)
    1.1 Bhāsarvajña (1)
    1.2 The Nyāyabhūṣaṇa (6)
      1.2.1 As One of Several Commentaries on Bhāsarvajña's Nyāyasāra (6)
      1.2.2 In Modern Scholarship, with Focus on NBhū 104–154 (8)
    1.3 Philosophical Context (11)
      1.3.1 Key Philosophical Concepts (12)
      1.3.2 Intra-Textual Context within the Nyāyabhūṣaṇa (34)
      1.3.3 Inter-Textual Context (36)
  2 Edition of NBhū 104–154 (39)
    2.1 Source Materials (39)
      2.1.1 Edition of Yogīndrānanda 1968 (E) (40)
      2.1.2 Manuscripts (P1, P2, V) (43)
      2.1.3 Diplomatic Transcripts (59)
    2.2 Notes on Using the Edition (60)
    2.3 Critical Edition of NBhū 104–154 with Apparatuses (62)
  3 Translation of NBhū 104–154 (108)
    3.1 Notes on Translation Method (108)
    3.2 Notes on Outline Headings (112)
    3.3 Annotated Translation of NBhū 104–154 (114)
  4 Discussion (216)
    4.1 Internal Structure of NBhū 104–154 (216)
    4.2 Critical Assessment of Bhāsarvajña's Argumentation (218)
Part II: Distant Reading with Digital Humanities (224)
  5 Background in Intertextuality Detection (224)
    5.1 Sanskrit Projects (225)
    5.2 Non-Sanskrit Projects (228)
    5.3 Operationalizing Intertextuality (233)
  6 Building an Intertextuality Machine (239)
    6.1 Corpus (Pramāṇa NLP) (239)
    6.2 Algorithm (Vātāyana) (242)
    6.3 User Interface (Vātāyana) (246)
  7 Evaluating System Performance (255)
    7.1 Previous Scholarship on NBhū 104–154 as Philological Benchmark (255)
    7.2 System Performance Relative to Benchmark (257)
  8 Discussion (262)
Conclusion (266)
Works Cited (269): Main Sanskrit Editions (269); Works Cited in Part I (271); Works Cited in Part II (281)
Appendices (285)
  Appendix 1: Correspondence of Joshi 1986 to Yogīndrānanda 1968 (286)
  Appendix 1D: Full-Text Alignment of Joshi 1986 to Yogīndrānanda 1968 (287)
  Appendix 2: Prosopographical Relations Important for NBhū 104–154 (288)
  Appendix 2D: Command-Line Tool “Pandit Grapher” (290)
  Appendix 3: Previous Suggestions to Improve Text of NBhū 104–154 (291)
  Appendix 4D: Transcript and Collation Data for NBhū 104–154 (304)
  Appendix 5D: Command-Line Tool “cte2cex” for Transcript Data Conversion (305)
  Appendix 6D: Deployment of Brucheion for Interactive Transcript Data (306)
  Appendix 7: Highlighted Improvements to Text of NBhū 104–154 (307)
  Appendix 7D: Alternate Version of Edition with Highlighted Improvements (316)
  Appendix 8D: Digital Forms of Translation of NBhū 104–154 (317)
  Appendix 9: Analytic Outline of NBhū 104–154 by Shodo Yamakami (318)
  Appendix 10.1: New Analytic Outline of NBhū 104–154 (Overall) (324)
  Appendix 10.2: New Analytic Outline of NBhū 104–154 (Detailed) (325)
  Appendix 11D: Skrutable Text Processing Library and Web Application (328)
  Appendix 12D: Pramāṇa NLP Corpus, Metadata, and LDA Modeling Info (329)
  Appendix 13D: Vātāyana Intertextuality Research Web Application (330)
  Appendix 14: Sample of Yamakami Citation Benchmark for NBhū 104–154 (331)
  Appendix 14D: Full Yamakami Citation Benchmark for NBhū 104–154 (333)
  Appendix 15: Vātāyana Recall@5 Scores for NBhū 104–154 (334)
  Appendix 16: PVA, PVin, and PVSV Vātāyana Search Hits for Entire NBhū (338)
  Appendix 17: Sample Listing of Vātāyana Search Hits for Entire NBhū (349)
  Appendix 17D: Full Listing of Vātāyana Search Hits for Entire NBhū (355)
Overview of Digital Appendices (356)
Zusammenfassung (Thesen zur Dissertation) (357)
Summary of Results (361)
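The 'recall@5' figure reported above can be read as follows: for each benchmark passage with a known parallel, does that parallel appear among the system's top five hits? A minimal sketch of the computation, with hypothetical data structures rather than Vātāyana's own code:

```python
def recall_at_k(gold: dict, system_hits: dict, k: int = 5) -> float:
    """gold maps each query passage to its known parallel;
    system_hits maps each query passage to a ranked list of retrieved passages."""
    found = sum(1 for query, parallel in gold.items()
                if parallel in system_hits.get(query, [])[:k])
    return found / len(gold)

# A score of 0.8 means 80% of known quotations or paraphrases
# surfaced within the top five hits.
```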
25

Into the Into of Earth Itself

Hodes, Amanda Kay 26 May 2023 (has links)
Into the Into of Earth Itself is a poetry collection that investigates the relationship between ecological violation and the violation of women, as well as toxicity and toxic masculinity. In doing so, it draws from the histories of two Pennsylvania towns: Palmerton and Centralia. The former is a Superfund site ravaged by zinc pollution and currently under threat of hydraulic fracturing and pipeline expansion. The latter is a nearby ghost town that was condemned and evacuated due to an underground mine fire, which will continue for another 200 years. The manuscript uses visual forms and digital text mining techniques to craft poetry about these extractive relationships to land and women. The speaker asks herself: As a woman, how have I also been mined and fracked by these same societal technologies? / Master of Fine Arts / Into the Into of Earth Itself is a poetry collection.
26

Using Dependency Parses to Augment Feature Construction for Text Mining

Guo, Sheng 18 June 2012 (has links)
With the prevalence of large data stored in the cloud, including unstructured information in the form of text, there is now an increased emphasis on text mining. A broad range of techniques are now used for text mining, including algorithms adapted from machine learning, NLP, computational linguistics, and data mining. Applications are likewise manifold, including classification, clustering, segmentation, relationship discovery, and practically any task that discovers latent information from written natural language. Classical mining algorithms have traditionally focused on shallow representations such as bag-of-words and similar feature-based models. With the advent of modern high-performance computing, deep sentence-level linguistic analysis of large-scale text corpora has become practical. In this dissertation, we evaluate the utility of dependency parses as textual features for different text mining applications. Dependency parsing is one form of syntactic parsing, based on the dependency grammar implicit in sentences. While dependency parsing has traditionally been used for text understanding, we investigate here its application to supplying features for text mining applications. We specifically focus on three methods of constructing textual features from dependency parses. First, we consider a dependency parse as a general feature akin to a traditional bag-of-words model. Second, we consider the dependency parse as the basis for building a feature-graph representation. Finally, we use dependency parses in a supervised collocation mining method for feature selection. To investigate these three methods, several applications are studied, including: (i) movie spoiler detection, (ii) text segmentation, (iii) query expansion, and (iv) recommender systems. / Ph. D.
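As an illustration of the first construction (a dependency parse used akin to a bag of words), one can emit head-relation-dependent triples as features. A sketch using spaCy, a library the dissertation itself does not use and which is assumed installed here:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def dependency_features(text: str) -> list[str]:
    """Emit head_relation_dependent triples as bag-of-features tokens."""
    doc = nlp(text)
    return [f"{tok.head.lemma_}_{tok.dep_}_{tok.lemma_}"
            for tok in doc if tok.dep_ not in ("punct", "ROOT")]

print(dependency_features("The reviewer spoiled the movie's ending."))
# e.g. ['spoil_nsubj_reviewer', 'spoil_dobj_ending', ...], varying by model version
```

Unlike plain bag-of-words, such features keep who-did-what-to-whom distinctions, which is the intuition behind using parses for tasks like spoiler detection.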
27

Intégration du web social dans les systèmes de recommandation / Social web integration in recommendation systems

Nana jipmo, Coriane 19 December 2017 (has links)
The social Web keeps growing and gives access to a wide variety of resources: sharing sites such as del.icio.us, message-exchange services such as Twitter, professionally oriented networks such as LinkedIn, and more broadly social networks such as Facebook and LiveJournal. The same individual may be registered and active on several social networks with potentially different purposes, where he or she publishes diverse and constantly growing information such as name, locality, communities, and activities. Given the international dimension of the Web, this textual information is inherently multilingual and intrinsically ambiguous, since it is written in natural language with a free vocabulary by individuals of different origins. It is also a valuable source of data, particularly for applications seeking to know their users in order to better understand their needs, activities, and interests. The objective of this research is to exploit, relying essentially on the Wikipedia encyclopedia, the textual resources extracted from a user's different social networks in order to build an extended profile characterizing the user, exploitable by applications such as recommender systems. In particular, a study was carried out to characterize the personality traits of users. Numerous experiments, analyses, and evaluations were conducted on real data collected from different social networks.
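As a rough illustration of the cross-network profile idea (the thesis's actual method, which uses Wikipedia to disambiguate and enrich terms, is considerably richer), one can aggregate a user's text from several accounts into a single term-frequency profile:

```python
from collections import Counter

def extended_profile(texts_by_network: dict[str, list[str]]) -> Counter:
    """Merge a user's posts from several networks into one term-frequency profile.
    A real system would disambiguate terms against Wikipedia; this toy skips that."""
    profile = Counter()
    for network, posts in texts_by_network.items():
        for post in posts:
            profile.update(w.lower() for w in post.split() if len(w) > 3)
    return profile

user = {"twitter": ["Deep learning tutorial thread"],
        "linkedin": ["Machine learning engineer, Paris"]}
print(extended_profile(user).most_common(5))
```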
28

Entity-Centric Text Mining for Historical Documents

Coll Ardanuy, Maria 07 July 2017 (has links)
No description available.
29

Information extraction from chemical patents

Jessop, David M. January 2011 (has links)
The automated extraction of semantic chemical data from the existing literature is demonstrated. For reasons of copyright, the work is focused on the patent literature, though the methods are expected to apply equally to other areas of the chemical literature. Hearst Patterns are applied to the patent literature in order to discover hyponymic relations describing chemical species. The acquired relations are manually validated to determine the precision of the determined hypernyms (85.0%) and of the asserted hyponymic relations (94.3%). It is demonstrated that the system acquires relations that are not present in the ChEBI ontology, suggesting that it could function as a valuable aid to the ChEBI curators. The relations discovered by this process are formalised using the Web Ontology Language (OWL) to enable re-use. PatentEye - an automated system for the extraction of reactions from chemical patents and their conversion to Chemical Markup Language (CML) - is presented. Chemical patents published by the European Patent Office over a ten-week period are used to demonstrate the capability of PatentEye - 4444 reactions are extracted with a precision of 78% and recall of 64% with regards to determining the identity and amount of reactants employed and an accuracy of 92% with regards to product identification. NMR spectra are extracted from the text using OSCAR3, which is developed to greatly increase recall. The resulting system is presented as a significant advancement towards the large-scale and automated extraction of high-quality reaction information. Extended Polymer Markup Language (EPML), a CML dialect for the description of Markush structures as they are presented in the literature, is developed. Software to exemplify and to enable substructure searching of EPML documents is presented. Further work is recommended to refine the language and code to publication-quality before they are presented to the community.
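Hearst patterns are lexico-syntactic templates such as "X such as Y1, Y2, and Y3", from which each Y is read off as a hyponym of X. A toy, regex-only sketch of the "such as" pattern follows; Jessop's system worked over parsed patent text with chemically aware tooling, not raw regexes:

```python
import re

# One classic Hearst pattern: "<hypernym> such as <list of hyponyms>"
SUCH_AS = re.compile(
    r"(?P<hypernym>\w[\w\- ]*?)\s+such as\s+"
    r"(?P<list>[\w\-]+(?:\s*,\s*[\w\-]+)*(?:\s*,?\s*(?:and|or)\s+[\w\-]+)?)"
)

def extract_hyponyms(sentence: str):
    """Return (hypernym, [hyponyms]) for one 'such as' match, else None."""
    m = SUCH_AS.search(sentence)
    if not m:
        return None
    hyponyms = re.split(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", m.group("list"))
    return m.group("hypernym").strip(), [h for h in hyponyms if h]

print(extract_hyponyms("solvents such as methanol, ethanol and acetonitrile"))
# ('solvents', ['methanol', 'ethanol', 'acetonitrile'])
```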
30

Extraction of chemical structures and reactions from the literature

Lowe, Daniel Mark January 2012 (has links)
The ever-increasing quantity of chemical literature necessitates the creation of automated techniques for extracting relevant information. This work focuses on two aspects: the conversion of chemical names to computer-readable structure representations, and the extraction of chemical reactions from text. Chemical names are a common way of communicating chemical structure information. OPSIN (Open Parser for Systematic IUPAC Nomenclature), an open-source, freely available algorithm for converting chemical names to structures, was developed. OPSIN employs a regular grammar to direct tokenisation and parsing, leading to the generation of an XML parse tree. Nomenclature operations are applied successively to the tree, with many requiring the manipulation of an in-memory connection-table representation of the structure under construction. Areas of nomenclature supported are described, with attention drawn to difficulties that may be encountered in name-to-structure conversion. Results on sets of generated names and names extracted from patents are presented. On generated names, recall of between 96.2% and 99.0% was achieved, with a lower bound of 97.9% on precision, all results being either comparable or superior to the tested commercial solutions. On the patent names, OPSIN's recall was 2-10% higher than that of the tested solutions when the patent names were processed as found in the patents. The uses of OPSIN as a web service and as a tool for identifying chemical names in text are shown to demonstrate the direct utility of this algorithm. A software system for extracting chemical reactions from the text of chemical patents was developed. The system relies on the output of ChemicalTagger, a tool for tagging words and identifying phrases of importance in experimental chemistry text. Improvements to this tool required to facilitate this task are documented. The structures of chemical entities are, where possible, determined using OPSIN in conjunction with a dictionary of name-to-structure relationships. Extracted reactions are atom-mapped to confirm that they are chemically consistent. 424,621 atom-mapped reactions were extracted from 65,034 organic chemistry USPTO patents. On a sample of 100 of these extracted reactions, chemical entities were identified with 96.4% recall and 88.9% precision, quantities could be associated with reagents in 98.8% of cases and with products in 64.9% of cases, and the correct role was assigned to chemical entities in 91.8% of cases. Qualitatively, the system captured the essence of the reaction in 95% of cases. This system is expected to be useful in the creation of searchable databases of reactions from chemical patents and in facilitating analysis of the properties of large populations of reactions.
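OPSIN is distributed as a Java jar and, to the best of my knowledge, its command-line mode reads one chemical name per line on stdin and writes SMILES to stdout by default; the jar filename below and the exact output string are assumptions to verify against the release you download. A minimal Python wrapper sketch:

```python
import subprocess

def name_to_smiles(name: str, jar: str = "opsin.jar") -> str:
    """Convert one IUPAC name to SMILES by shelling out to the OPSIN jar.
    Jar path is an assumption; releases are named like
    opsin-cli-*-jar-with-dependencies.jar."""
    result = subprocess.run(["java", "-jar", jar],
                            input=name, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()

print(name_to_smiles("ethanol"))
# expected output (assumption about OPSIN's exact string): CCO
```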
