About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

NewsFerret : supporting identity risk identification and analysis through text mining of news stories

Golden, Ryan Christian 18 December 2013 (has links)
Individuals, organizations, and devices are now interconnected to an unprecedented degree. This has forced identity risk analysts to redefine what “identity” means in such a context, and to explore new techniques for analyzing an ever-expanding threat landscape. Major hurdles to modeling in this field include the inherent lack of publicly available data, due to privacy and safety concerns, and the unstructured nature of incident reports. To address this gap, this report develops a system for strengthening an identity risk model through text mining of news stories. The system, called NewsFerret, collects and analyzes news stories on the topic of identity theft, establishes semantic relatedness measures between identity concept pairs, and supports analysis of those measures through reports, visualizations, and relevant news stories. Evaluation of the resulting analytical models shows where the system is effective in assisting the risk analyst to expand and validate identity risk models.
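The abstract does not specify how NewsFerret computes semantic relatedness between identity concept pairs. As one hedged illustration only, story-level co-occurrence scored with pointwise mutual information (an assumed stand-in, not the system's documented method) captures the general idea:

```python
import math
from collections import Counter
from itertools import combinations

# Toy stand-ins for collected identity-theft news stories, each reduced to
# the identity concepts it mentions (hypothetical data for illustration).
stories = [
    {"ssn", "phishing", "bank"},
    {"phishing", "email", "password"},
    {"ssn", "bank", "loan"},
]

n = len(stories)
concept_counts = Counter(c for s in stories for c in s)
pair_counts = Counter(frozenset(p) for s in stories for p in combinations(sorted(s), 2))

def relatedness(a, b):
    """Pointwise mutual information between two identity concepts."""
    joint = pair_counts[frozenset((a, b))] / n
    if joint == 0:
        return float("-inf")   # the concepts never co-occur in the corpus
    return math.log(joint / ((concept_counts[a] / n) * (concept_counts[b] / n)))

print(relatedness("ssn", "bank"))     # co-occur in two of three stories
print(relatedness("email", "loan"))   # never co-occur
```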
22

Document Clustering with Dual Supervision

Hu, Yeming 19 June 2012 (has links)
Academic researchers today maintain personal libraries of papers, which they would like to organize according to their needs, e.g., for research, projects, or courseware. Clustering techniques are often employed to achieve this goal by grouping the document collection into different topics. Unsupervised clustering does not require any user effort, but it produces only one universal output, with which users may not be satisfied. Document clustering therefore needs user input for guidance in order to generate personalized clusters for different users. Semi-supervised clustering incorporates prior information and has the potential to produce customized clusters. Traditional semi-supervised clustering is based on user supervision in the form of labeled instances or pairwise instance constraints, but alternative forms of user supervision exist, such as labeling features. For document clustering, document supervision involves labeling documents, while feature supervision involves labeling features; their joint use has been called dual supervision. In this thesis, we first explore and propose a framework that uses feature supervision for interactive feature selection, letting the user indicate whether a feature is useful for clustering. Second, we enhance semi-supervised clustering with feature supervision through feature reweighting. Third, we propose a unified framework that combines document supervision and feature supervision through seeding. The newly proposed algorithms are evaluated using oracles and shown to be more helpful in producing clusters that match a single user's point of view than document clustering with no supervision or with document supervision alone. Finally, we conduct a user study to confirm that different users have different understandings of the same document collection and prefer personalized clusters. At the same time, we demonstrate that document clustering with dual supervision is able to produce good personalized clusters even with noisy user input. Dual supervision is also demonstrated to be more effective for personalized clustering than no supervision or either form of supervision alone. We also analyze users' behaviors during the user study and present suggestions for the design of document management software.
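As a rough sketch of the feature-supervision-by-reweighting idea (the toy corpus, the multiplicative boost, and the use of k-means are illustrative assumptions; the thesis's actual reweighting scheme is not given in the abstract):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "neural networks for image recognition",
    "court ruling on the appeal",
    "deep learning models for vision",
    "judge issues final verdict",
]

vec = TfidfVectorizer()
X = vec.fit_transform(docs).toarray()

# Feature supervision: the user marks terms as useful for their desired view
# of the collection; reweighting boosts those dimensions before clustering.
accepted = {"learning", "verdict"}
idx = [i for term, i in vec.vocabulary_.items() if term in accepted]
X[:, idx] *= 3.0   # illustrative boost factor

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)      # cluster assignments for the four documents
```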
23

Modelling Deception Detection in Text

Gupta, Smita 29 November 2007 (has links)
As organizations and government agencies work diligently to detect financial irregularities, malfeasance, fraud, and criminal activity through intercepted communication, there is increasing interest in devising an automated model or tool for deception detection. We build on Pennebaker's empirical model, which suggests that deception in text leaves a linguistic signature characterised by changes in the frequency of four categories of words: first-person pronouns, exclusive words, negative-emotion words, and action words. By applying the model to the Enron email dataset and using an unsupervised matrix-decomposition technique, we explore the differential use of these cue words and categories in deception detection. Instead of focusing on the predictive power of individual cue words, we construct a descriptive model that helps us understand the multivariate profile of deception along several linguistic dimensions and highlights the qualitative differences between deceptive and truthful communication. This descriptive model can not only help detect unusual and deceptive communication, but may also rank messages along a scale of relative deceptiveness, for instance from strategic negotiation and spin to deception and blatant lying. The model is unintrusive, requires minimal human intervention and, by following the defined pre-processing steps, may be applied to new datasets from different domains. / Thesis (Master, Computing) -- Queen's University, 2007-11-28
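A minimal sketch of this profiling-plus-decomposition approach, with abbreviated hypothetical cue-word lists and SVD standing in for the unnamed unsupervised matrix-decomposition technique:

```python
import numpy as np

# Abbreviated, hypothetical cue-word lists standing in for Pennebaker's categories.
CATEGORIES = {
    "first_person": {"i", "me", "my", "mine"},
    "exclusive":    {"but", "except", "without", "although"},
    "neg_emotion":  {"hate", "angry", "worthless", "afraid"},
    "action":       {"go", "take", "make", "run"},
}

def cue_profile(message):
    """Relative frequency of each cue category in one message."""
    words = message.lower().split()
    total = max(len(words), 1)
    return [sum(w in vocab for w in words) / total for vocab in CATEGORIES.values()]

messages = [
    "i would go without it but take what you can",
    "the quarterly report is attached for review",
]
M = np.array([cue_profile(m) for m in messages])   # messages x categories

# An unsupervised decomposition (SVD here) exposes dominant multivariate
# profiles; projecting messages onto a deception-linked dimension supports
# ranking them by relative deceptiveness rather than hard classification.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
scores = U[:, 0] * S[0]
print(scores)
```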
24

概念を用いたHK Graphによるテキスト解析支援 (Supporting Text Analysis with HK Graphs Using Concepts)

FURUHASHI, Takeshi (古橋, 武), YOSHIKAWA, Tomohiro (吉川, 大弘), KOBAYASHI, Daisuke (小林, 大輔) 29 March 2012 (has links)
No description available.
25

Application of the Recommendation Architecture Model for Text Mining

Ratnayake, Hemali Uditha Wijewardane January 2004 (has links)
The Recommendation Architecture (RA) model is a new connectionist approach that simulates some aspects of the human brain. Applying the RA to a real-world problem is a novel research problem that has not previously been addressed in the literature. Research conducted with simulated data has shown much promise for the RA model's ability in pattern discovery and pattern recognition. This thesis investigates the application of the RA model to text mining, where pattern discovery and recognition play an important role. The clustering system of the RA model is examined in detail, and a formal notation for representing the fundamental components and algorithms is proposed for clarity of understanding. A software simulation of the clustering system of the RA model is built for empirical studies. To argue that the RA model is applicable to text mining, the following aspects of the model are examined. With its pattern recognition ability, the clustering system of the RA is adapted for text classification and text organization. As the core of the RA model is concerned with pattern discovery, or the identification of associative similarities in input, it is also used to discover unsuspected relationships within the content of documents. How the RA model can be applied to the problems of pattern discovery in text and classification of text is addressed by demonstrating results from a series of experiments. The difficulties in applying the RA model to real-life data are described, and several extensions to the RA model for optimal performance are proposed based on insights obtained from the experiments. Furthermore, the RA model can be extended to provide user-friendly interpretation of results. This research shows that, with the proposed extensions, the RA model can to a large extent be successfully applied to the problem of text mining. Some limitations exist when the RA model is applied to very noisy data; these are also demonstrated here.
26

Automatisierte Verfahren für die Themenanalyse nachrichtenorientierter Textquellen (Automated Methods for the Topic Analysis of News-Oriented Text Sources)

Niekler, Andreas 20 January 2016 (has links) (PDF)
Topic analysis is an important component of content analysis in media studies. For analyzing large digital text collections with respect to their thematic structures, it is therefore important to investigate the potential of automated, computer-assisted methods. In doing so, the methodological and analytical requirements of content analysis, which also apply to topic analysis, must be respected and reproduced. This thesis investigates the possibilities of automating topic analysis and the perspectives for its application. It draws on the theoretical and methodological foundations of content analysis and on linguistic theories of topic structure in order to derive requirements for an automatic analysis. The principal contribution is an investigation of the potentials and tools from the fields of data mining and text mining that can be employed helpfully and profitably for content-analytic work on text databases. An exemplary analysis is also carried out to show the applicability of automatic methods for topic analysis. The thesis further demonstrates possibilities for using interactive interfaces, formulates the idea and implementation of suitable software, and presents a possible workflow for topic analysis. By presenting the potentials of automated topic investigation in large digital text collections, the thesis contributes to research on automated content analysis. Starting from the requirements placed on a topic analysis, it shows which text mining methods and automation techniques can come close to meeting them. In summary, two requirements stand out, and fulfilling either one affects the other. On the one hand, a rapid thematic survey of the topics in a complex document collection is required, in order to map its content structure and to allow topics to be contrasted. On the other hand, topics must be representable at a sufficient level of detail to permit an analysis of the sense and meaning of their contents. These two demands are methodologically anchored in the quantitative and qualitative approaches of content analysis, respectively. The thesis discusses these parallels and relates automatic procedures and algorithms to the requirements. Methods can be identified, such as topic models and clustering procedures, that permit a semantic and thus thematic separation of the data and create an abstracted overview of large document collections. With the help of these algorithms it is possible to generate thematically coherent subsets of a document collection and to make their thematic content available for summarization. It is shown that the topics remain distinguishable despite this distant view, and that their frequencies and distributions in a text collection can be displayed diachronically. This preparation of the data permits the analysis of thematic trends and the selection of particular thematic aspects from a wealth of documents. Diachronic examination of thematically coherent document sets thereby becomes possible, and the temporal frequencies of topics can be analyzed.
For the detailed interpretation and summarization of topics, further representations and information must be generated from their contents. It can be shown that meanings, statements, and contexts can be made visible through a co-occurrence analysis of the documents belonging to a topic. In an application that takes reading direction and parts of speech into account, frequently occurring word sequences or statements within a topic can be captured statistically. The phrases generated in this way can be used to define categories or can be contrasted with other topics, publications, or theoretical assumptions. In addition, diachronic analyses of individual words, word groups, or proper names within a topic are suitable for identifying topic phases, key terms, or news factors. The information obtained in this way can be supplemented by a close reading of thematically relevant documents, which the thematic separation of the document sets makes possible. Beyond these methodological perspectives, the automated analyses can be used as empirical measuring instruments in the context of further communication-science theories not discussed here. The thesis also shows that graphical interfaces and software frameworks for conducting automated topic analyses can be realized and used in practice, demonstrating how the solutions and approaches discussed can be transferred into practice. The thesis makes substantial contributions to research on automated content analysis, documenting above all the scholarly engagement with automated topic analysis. In the course of this work, the author developed suitable procedures for applying text mining methods to content analysis in practice, including contributions to the visualization and ease of use of different methods. Procedures from topic modeling, clustering, and co-occurrence analysis had to be adapted so that they could be used in content-analytic applications. Further contributions were made in the methodological positioning of computer-assisted topic analysis and in the definition of innovative applications in this area. The experiments and investigations conducted for this thesis were carried out entirely in software developed specifically for the purpose, which has also been used successfully in other projects. Around this system, processing pipelines, data storage, visualization, graphical interfaces, facilities for data interaction, machine learning methods, and document retrieval components were implemented. The complex methods and procedures for automated topic analysis thereby become easy to apply and are available in user-friendly form for future projects and analyses. Social scientists, political scientists, and communication researchers can work with this software environment and conduct content analyses without having to master the details of the automation and computer support.
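The thematic separation described above rests on methods such as topic models. A minimal, illustrative sketch with scikit-learn's LDA on a toy corpus follows (the thesis's own software is not reproduced here; the corpus and parameters are assumptions):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy news-like corpus; the real analyses run over large diachronic collections.
docs = [
    "parliament passes the budget law",
    "team wins the championship final",
    "budget debate continues in parliament",
    "final match decides the season",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)   # rows: documents, columns: topic proportions

# Top terms per topic give the abstracted overview of the collection;
# aggregating doc_topics by publication date yields diachronic topic frequencies.
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = terms[weights.argsort()[::-1][:3]]
    print(f"topic {k}:", ", ".join(top))
```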
27

Applying computer-assisted assessment to auto-generating feedback on project proposals

Al-Yazeedi, Fatema January 2016 (has links)
Through different learning portals, computer-assisted assessment (CAA) tools have improved considerably over the past few decades. In the CAA community, these tools are categorised by type of question, type of testing, and type of assessment. Most provide assessment of multiple-choice questions, true/false questions, or matching questions. Other CAA tools evaluate short and long essay questions, each of which has different grading methods and techniques for style and content. However, due to the complexity involved in analysing free-text writing, the development and evaluation of accurate, easy-to-use, and effective tools is questionable. This research proposes a new contextual framework as a novel approach to the investigation of a new CAA tool which auto-generates feedback on project proposals. The research follows a Design Science Research paradigm to achieve and evaluate the accuracy, ease of use, and effectiveness of the new tool in the computer science domain in higher-education institutes. This is achieved in three interrelated cycles: (1) based on the existing literature on this topic and an exploratory study of the currently available approaches to providing feedback on final-year project proposals, a proposed framework to auto-generate feedback on any electronically submitted coursework is constructed in order to gain a clear understanding of how such a CAA tool might work; (2) a contextual framework for final-year project proposals, based on the proposed framework, is constructed by considering both the style and the content of the free text and using different text mining techniques; and (3) the accuracy, ease of use, and effectiveness of the implemented web-based CAA application, named Feedback Automated Tool (FEAT), is evaluated against the contextual framework. This research applies CAA and text mining techniques to identify and model the key elements of the framework and its components in order to enable the development and evaluation of a novel CAA contextual framework which can be utilised for auto-generating accurate, easy-to-use, and effective feedback on final-year project proposals.
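As a toy illustration only (FEAT's actual criteria and text mining techniques are not detailed in the abstract), auto-generated feedback of this general shape can combine simple style and content heuristics:

```python
import re

def proposal_feedback(text, min_words=150, required=("aim", "method", "evaluation")):
    """Hypothetical style-and-content checks, not FEAT's actual rules."""
    words = re.findall(r"[a-z']+", text.lower())
    feedback = []
    if len(words) < min_words:
        feedback.append(f"Proposal is short: {len(words)} words (expected at least {min_words}).")
    for section in required:          # crude content check against expected sections
        if section not in words:
            feedback.append(f"No mention of '{section}'; consider adding one.")
    sentences = max(text.count("."), 1)
    if len(words) / sentences > 30:   # crude style check on sentence length
        feedback.append("Average sentence length is high; consider shorter sentences.")
    return feedback or ["No issues found by these basic checks."]

print("\n".join(proposal_feedback("Our aim is to study clustering. The method is k-means.")))
```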
28

Spherical k-Means Clustering

Buchta, Christian, Kober, Martin, Feinerer, Ingo, Hornik, Kurt 09 1900 (has links) (PDF)
Clustering text documents is a fundamental task in modern data analysis, requiring approaches which perform well both in terms of solution quality and computational efficiency. Spherical k-means clustering is one approach to address both issues, employing cosine dissimilarities to perform prototype-based partitioning of term weight representations of the documents. This paper presents the theory underlying the standard spherical k-means problem and suitable extensions, and introduces the R extension package skmeans which provides a computational environment for spherical k-means clustering featuring several solvers: a fixed-point algorithm, a genetic algorithm, and interfaces to two external solvers (CLUTO and Gmeans). Performance of these solvers is investigated by means of a large-scale benchmark experiment. (authors' abstract)
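As a rough sketch of the fixed-point solver named above (the standard algorithm written out in NumPy, assuming dense float input; the skmeans package provides this and the other solvers):

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Fixed-point spherical k-means on rows of X (documents x term weights)."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)           # project docs onto the unit sphere
    prototypes = X[rng.choice(len(X), size=k, replace=False)]  # random documents as initial prototypes
    for _ in range(n_iter):
        # Maximizing the dot product of unit vectors minimizes cosine dissimilarity.
        labels = np.argmax(X @ prototypes.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):                                   # keep old prototype if cluster is empty
                mean = members.sum(axis=0)
                prototypes[j] = mean / np.linalg.norm(mean)    # renormalized mean stays on the sphere
    return labels, prototypes

X = np.random.default_rng(1).random((20, 5))   # toy term-weight matrix
labels, prototypes = spherical_kmeans(X, k=3)
```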
29

Intertextual Readings of the Nyāyabhūṣaṇa on Buddhist Anti-Realism

Neill, Tyler 13 December 2022 (has links)
This two-part dissertation has two goals: 1) a close philological reading of a 50-page section of a 10th-century Sanskrit philosophical work (Bhāsarvajña's Nyāyabhūṣaṇa), and 2) the creation and assessment of a novel intertextuality research system (Vātāyana) centered on the same work. The first half of the dissertation encompasses the philology project in four chapters: 1) background on the author, work, and key philosophical ideas in the passage; 2) descriptions of all known manuscript witnesses of this work and a new critical edition that substantially improves upon the editio princeps; 3) a word-for-word English translation richly annotated with both traditional explanatory material and novel digital links to not one but two interactive online research systems; and 4) a discussion of the Sanskrit author's dialectical strategy in the studied passage. The second half of the dissertation details the intertextuality research system in a further four chapters: 5) why it is needed and what can be learned from existing projects; 6) the creation of the system consisting of curated textual corpus, composite algorithm in natural language processing and information retrieval, and live web-app interface; 7) an evaluation of system performance measured against a small gold-standard dataset derived from traditional philological research; and 8) a discussion of the impact such new technology could have on humanistic research more broadly. System performance was assessed to be quite good, with a 'recall@5' of 80%, meaning that most previously known cases of mid-length quotation and even paraphrase could be automatically found and returned within the system's top five hits. Moreover, the system was also found to return a 34% surplus of additional significant parallels not found in the small benchmark. This assessment confirms that Vātāyana can be useful to researchers by aiding them in their collection and organization of intertextual observations, leaving them more time to focus on interpretation. 
Seventeen appendices illustrate both these efforts and a number of side projects, the latter of which span translation alignment, network visualization of an important database of South Asian prosopography (PANDiT), and a multi-functional Sanskrit text-processing web application (Skrutable).
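The recall@5 evaluation reported above can be illustrated with a small sketch; the passage identifiers and hit lists below are hypothetical, not the dissertation's benchmark data:

```python
def recall_at_k(gold, ranked_hits, k=5):
    """Fraction of gold-standard parallels found within the top-k returned hits."""
    found = sum(
        any(hit in parallels for hit in ranked_hits.get(query, [])[:k])
        for query, parallels in gold.items()
    )
    return found / len(gold)

# Hypothetical passage IDs in the style of the studied corpora:
gold = {"NBhu_104.3": {"PVSV_12.1"}, "NBhu_110.7": {"PVin_2.44"}}
hits = {"NBhu_104.3": ["PVSV_12.1", "PVA_3.9"], "NBhu_110.7": ["PVA_1.1"]}
print(recall_at_k(gold, hits))   # 0.5
```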
30

Into the Into of Earth Itself

Hodes, Amanda Kay 26 May 2023 (has links)
Into the Into of Earth Itself is a poetry collection that investigates the relationship between ecological violation and the violation of women, as well as toxicity and toxic masculinity. In doing so, it draws from the histories of two Pennsylvania towns: Palmerton and Centralia. The former is a Superfund site ravaged by zinc pollution and currently under threat of hydraulic fracturing and pipeline expansion. The latter is a nearby ghost town that was condemned and evacuated due to an underground mine fire, which will continue for another 200 years. The manuscript uses visual forms and digital text mining techniques to craft poetry about these extractive relationships to land and women. The speaker asks herself: As a woman, how have I also been mined and fracked by these same societal technologies? / Master of Fine Arts
