  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A statistical investigation into the provenance of De Doctrina Christiana, attributed to John Milton

Tweedie, Fiona Jane January 1997 (has links)
The aim of this study is to conduct an objective investigation into the provenance of De Doctrina Christiana, a theological treatise attributed to Milton since its discovery in 1823. This attribution was questioned in 1991, provoking a series of papers, one of which makes a plea for an objective analysis, which I aim to supply. I begin by reviewing critically some techniques that have recently been applied to stylometry, including methods from artificial intelligence, linguistics and statistics. The chapter concludes with an investigation into the QSUM technique, finding it to be invalid. As De Doctrina Christiana is written in neo-Latin, I examine previous stylometric work carried out in Latin, then turn to historical matters, including censorship and the physical characteristics of the manuscript. The text is the only theological work in the extant Milton canon. As genre as well as authorship affects style, I consider theories of genre, which influence the choice of suitable control texts. Chapter seven deals with the methodology used in the study. The analysis follows a hierarchical structure. I establish which techniques distinguish between Milton and the control texts while maintaining the internal consistency of each author, and find that the most frequently occurring words are good discriminators. I then use this technique to examine De Doctrina Christiana alongside the Milton and control texts. A clear difference is found between texts from the polemic and exegetical genres, and samples from De Doctrina Christiana fall into two groups. This heterogeneity forms the third part of the analysis. No apparent difference is found between sections of the text produced by different amanuenses, but the Epistle appears markedly more Miltonic than the rest. In addition, postulated insertions into chapter X of Book I appear to show a Miltonic influence. I conclude by examining the hypothesis of a Ramist ordering to the text.
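The frequent-word technique this abstract reports can be pictured with a small sketch (not Tweedie's actual procedure, only the general idea): profile each text by the relative frequencies of the corpus-wide most frequent words, then compare profiles with a spread-scaled distance in the spirit of Burrows's Delta. The file names, word-list size, and distance are illustrative assumptions.

```python
import statistics
from collections import Counter

def top_words(token_lists, k=50):
    """The k most frequent words across all texts pooled together."""
    total = Counter(w for tokens in token_lists for w in tokens)
    return [w for w, _ in total.most_common(k)]

def profile(tokens, vocab):
    """Relative frequency of each vocabulary word in one text."""
    counts, n = Counter(tokens), len(tokens)
    return [counts[w] / n for w in vocab]

# hypothetical file names; naive whitespace tokenisation
texts = {name: open(name + ".txt").read().lower().split()
         for name in ("milton", "control", "de_doctrina")}
vocab = top_words(texts.values())
profs = {name: profile(tokens, vocab) for name, tokens in texts.items()}

# per-word spread across the corpus, so no single word dominates
spreads = [max(statistics.pstdev(col), 1e-9) for col in zip(*profs.values())]

def delta(p, q):
    """Delta-like distance: mean spread-scaled absolute difference."""
    return sum(abs(a - b) / s for a, b, s in zip(p, q, spreads)) / len(p)

# a smaller distance to the Milton profile would count as more Miltonic
for name in ("milton", "control"):
    print(name, delta(profs["de_doctrina"], profs[name]))
```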
2

Photo-based Vendor Re-identification on Darknet Marketplaces using Deep Neural Networks

Wang, Xiangwen January 2018 (has links)
Darknet markets are online services behind Tor where cybercriminals trade illegal goods and stolen datasets. In recent years, security analysts and law enforcement have started to investigate darknet markets to study cybercriminal networks and predict future incidents. However, vendors in these markets often create multiple accounts (i.e., Sybils), making it challenging to infer the relationships between cybercriminals and identify coordinated crimes. In this thesis, we present a novel approach to linking the multiple accounts of the same darknet vendors through photo analytics. The core idea is that darknet vendors often have to take their own product photos to prove possession of the illegal goods, which can reveal their distinct photography styles. To fingerprint vendors, we construct a series of deep neural networks to model the photography styles. We apply transfer learning to the model training, which allows us to accurately fingerprint vendors with a limited number of photos. We evaluate the system using real-world datasets from three large darknet markets (7,641 vendors and 197,682 product photos). A ground-truth evaluation shows that the system achieves an accuracy of 97.5%, outperforming existing stylometry-based methods in both accuracy and coverage. In addition, our system identifies previously unknown Sybil accounts within the same markets (23) and across different markets (715 pairs). Further case studies reveal new insights into coordinated Sybil activities such as price manipulation, buyer scams, and product stocking and reselling. / Master of Science / Taking advantage of the high anonymity of the darknet, cybercriminals have set up underground trading websites such as darknet markets for trading illegal goods. To understand the relationships between cybercriminals and identify coordinated activities, it is necessary to identify the multiple accounts held by the same vendor. Apart from manual investigation, previous studies have proposed methods for linking multiple accounts by analyzing the writing styles hidden in users' online posts, but these methods face key challenges on darknet markets. In this thesis, we propose a novel approach to linking multiple identities within the same darknet market or across different markets by analyzing product photos. We develop a system in which a series of deep neural networks (DNNs) is used with transfer learning to automatically extract distinct features from a vendor's photos. Using real-world datasets from darknet markets, we evaluate the proposed system, which shows clear advantages over the writing-style-based system. Further analysis of the results reported by the proposed system reveals new insights into coordinated activities such as price manipulation, buyer scams, and product stocking and reselling among vendors who hold multiple accounts.
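As a rough illustration of the transfer-learning setup described above, the sketch below fine-tunes a pretrained image classifier so that each vendor is one output class. The ResNet-18 backbone, the frozen layers, and the class count are assumptions made for illustration; the abstract does not specify the exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_VENDORS = 100  # hypothetical number of vendor accounts to fingerprint

# Start from ImageNet weights so the network already encodes generic
# visual features; only the new classification head is trained here.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                      # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, NUM_VENDORS)  # new head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, vendor_ids):
    """One gradient step on a batch of product photos."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), vendor_ids)
    loss.backward()
    optimizer.step()
    return loss.item()
```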
3

Continuous Authentication using Stylometry

Brocardo, Marcelo Luiz 30 April 2015 (has links)
Static authentication, where user identity is checked once at login time, can be circumvented no matter how strong the authentication mechanism is. Through attacks such as man-in-the-middle and its variants, an authenticated session can be hijacked after the initial login process has been completed. In the last decade, continuous authentication (CA) using biometrics has emerged as a possible remedy against session hijacking. CA consists of testing the authenticity of the user repeatedly throughout the authenticated session as data becomes available. CA is expected to be carried out unobtrusively, due to its repetitive nature, which means that the authentication information must be collectible without any active involvement of the user and without using any special-purpose hardware devices (e.g. biometric readers). Stylometric analysis, which consists of checking whether or not a target document was written by a specific individual, could potentially be used for CA. Although stylometric techniques can achieve high accuracy rates for long documents, it is still challenging to identify the author of a short document, in particular when dealing with large author populations. In this dissertation, we propose a new framework for continuous authentication using authorship verification based on writing style. Authorship can be verified using stylometric techniques through the analysis of linguistic styles and writing characteristics of the authors. Unlike traditional authorship verification, which focuses on long texts, we tackle the use of short messages. A shorter authentication delay (i.e. a smaller data sample) is essential to reduce the window size of the re-authentication period in CA. We validate our method using different block sizes, including 140, 280, and 500 characters, and investigate shallow and deep learning architectures for machine learning classification. Experimental evaluation of the proposed authorship verification approach yields an Equal Error Rate (EER) of 8.21% on the Enron email dataset with 76 authors and 10.08% on a Twitter dataset with 100 authors. Evaluation of the approach using relatively smaller forgery samples with 10 authors yields an EER of 5.48%.
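The Equal Error Rate figures quoted above refer to the operating point where the false accept and false reject rates coincide. A minimal sketch of computing an EER from verification scores (the score distributions below are made up for illustration; real scores would come from the classifier):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep thresholds and return the error rate at the point where
    the false accept rate (impostors scoring at or above the
    threshold) meets the false reject rate (genuine users below it)."""
    best_gap, eer = float("inf"), None
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostors accepted
        frr = np.mean(genuine < t)     # genuine users rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 1000)    # scores for the true author
impostor = rng.normal(0.4, 0.1, 1000)   # scores for everyone else
print(f"EER ~ {equal_error_rate(genuine, impostor):.2%}")
```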
4

Towards profiles of periodic style: discourse organisation in modern English instructional writing

Lubbers, Thijs Hendrikus Johannes Bernardus January 2017 (has links)
A notorious challenge in the study of the diachrony of English is to determine whether developments in syntax, including changing frequencies of a particular construction, or word-order changes as suggested by perceived patterns in extant texts, represent genuine linguistic changes or are due to changes in conventions of writing. What is intuitively clear, however, even to a casual eye, is that a piece of English prose from, say, the 16th century differs markedly from texts from the 18th century. Yet such judgements cannot be based on syntactic changes alone, since essential grammatical features of Present-Day English are already in place by the end of the Late Middle English period. As a result, these differences are often simply ascribed to the notoriously elusive domain of style. The current study attempts to come to grips with the issue of period-specific conventions of writing by focusing on features of discourse structure and textual organisation from the Early Modern English period onwards. It can be positioned at the meso-level between large-scale quantitative approaches to sentence-level linguistic features and detailed, small-scale discourse-analytic studies of individual texts. The texts selected for the current purpose, manuals for equine care, derive from a sub-domain of instructional writing with a long history in the vernacular. As these texts share similar communicative purposes and deal with the same "global" topics of feeding and looking after a horse, any differences between them cannot be attributed to different genres or differences in subject matter. This permits us to zoom in on 'agnates', different ways of expressing the same meanings, and allows us to see how the stylistic options selected by authors achieve the various communicative goals that have to be negotiated, such as discourse coherence or the transition to new topics. The three main sections of this dissertation offer different ways of identifying developments in discourse organisation. The first section explores the traditional corpus-based approach that is frequently used to measure the parameter of "personal involvement", an indicator of periodic style. Initially, this approach restricts itself to measuring the contribution of frequencies of individual lexical items like first and second person pronouns. Next, this section focuses on the presence and linguistic realisation of the interlocutors of these instructional texts, i.e. the writer and the reader. The second main section tries to diagnose such varying styles by employing a completely data-driven, quantitative methodology which offers a linguistically unbiased and theory-independent perspective on the data in the corpus. This second approach offers cues as to how 'subliminal' patterns of grammar may affect perceptions of style, and how quantitative measures may aid in assessing whether the texts in our corpus cluster in expected or unexpected ways. The third section draws on theories of referential coherence and textual progression. By charting the variation with which texts from different periods in the history of English apply conventions for discourse organisation, it offers an insight into developments of hierarchical discourse structures (i.e., coordinated versus subordinated discourse relations) and practices of co-reference. Taken together, these three independent measures offer a novel, multi-angled approach to stylistic developments in prose writing. Combining features 'above the sentence level' which involve discourse and information-structural changes, this dissertation affords a glimpse into the emergence of written textual conventions, or 'grammars of prose', in the history of English.
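The "personal involvement" parameter explored in the first section can be sketched very simply: count first- and second-person pronouns per thousand words, a common proxy for involvement in corpus stylistics. The pronoun list (padded with archaic forms that older manuals would use) and the normalisation are illustrative, not the dissertation's exact feature set.

```python
import re

FIRST_SECOND = {"i", "me", "my", "mine", "we", "us", "our", "ours",
                "you", "your", "yours", "thou", "thee", "thy", "thine", "ye"}

def involvement(text):
    """First- and second-person pronouns per 1,000 words."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for w in words if w in FIRST_SECOND)
    return 1000 * hits / max(len(words), 1)

# invented example sentences in two period styles
early = "Thou shalt give thy horse sweet hay, and thou must walk him daily."
late = "The horse should be given sweet hay and should be walked daily."
print(involvement(early), involvement(late))
```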
5

Using Style Markers for Detecting Plagiarism in Natural Language Documents

Kimler, Marco January 2003 (has links)
Most existing plagiarism detection systems compare a text to a database of other texts. These external approaches, however, are vulnerable because texts not contained in the database cannot be detected as source texts. This paper examines an internal plagiarism detection method that uses style markers from authorship attribution studies in order to find stylistic changes in a text. These changes might pinpoint plagiarized passages. Additionally, a new style marker called specific words is introduced. A pre-study tests whether the style markers can fingerprint an author's style and whether they are constant with sample size. It is shown that vocabulary richness measures do not fulfil these prerequisites. The other style markers - simple ratio measures, readability scores, frequency lists, and entropy measures - have these characteristics and are, together with the new specific words measure, used in a main study with an unsupervised approach for detecting stylistic changes in plagiarized texts at sentence and paragraph levels. It is shown that at these small levels the style markers generally cannot detect plagiarized sections because of intra-authorial stylistic variations (i.e. noise), and that at larger levels the results are strongly affected by the sliding window approach. The specific words measure, however, can pinpoint single sentences written by another author.
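The sliding-window approach behind this internal method can be sketched as follows: compute a style marker (here average word length, one of the simple ratio measures mentioned) over overlapping windows and flag windows that deviate strongly from the document mean. The window size, step, marker, and threshold are illustrative choices, not Kimler's exact settings.

```python
import statistics

def avg_word_length(words):
    """A simple ratio-style marker: mean characters per word."""
    return sum(map(len, words)) / len(words)

def flag_outlier_windows(text, size=200, step=50, k=2.0):
    """Slide a window over the text and return the word offsets of
    windows whose marker lies more than k standard deviations from
    the document mean -- candidate stylistic breaks."""
    words = text.split()
    if not words:
        return []
    starts = range(0, max(len(words) - size, 0) + 1, step)
    scores = [avg_word_length(words[i:i + size]) for i in starts]
    mu, sd = statistics.mean(scores), statistics.pstdev(scores)
    return [i for i, s in zip(starts, scores)
            if sd > 0 and abs(s - mu) > k * sd]
```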
6

A Study on the Efficacy of Sentiment Analysis in Author Attribution

Schneider, Michael J 01 August 2015 (has links)
The field of authorship attribution seeks to characterize an author's writing style well enough to determine whether he or she has written a text of interest. One subfield of authorship attribution, stylometry, seeks to find the literary attributes necessary to quantify an author's writing style. The research presented here sought to determine the efficacy of sentiment analysis as a new stylometric feature by comparing its performance in attributing authorship against the performance of traditional stylometric features. Experimentation with a corpus of science-fiction texts found sentiment analysis to perform much worse at assigning authorship than the traditional stylometric features.
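As a sketch of what "sentiment as a stylometric feature" can look like in practice, the snippet below turns a text into a small sentiment feature vector using NLTK's VADER analyser, which could then be fed to a classifier alongside traditional features. VADER is a stand-in chosen for illustration; the thesis does not necessarily use this tool.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("punkt", quiet=True)
nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

def sentiment_features(text):
    """Mean positive/negative/neutral/compound sentence scores,
    used as a 4-dimensional stylometric feature vector."""
    sents = nltk.sent_tokenize(text)
    scores = [sia.polarity_scores(s) for s in sents]
    return [sum(s[k] for s in scores) / len(scores)
            for k in ("pos", "neg", "neu", "compound")]

print(sentiment_features("The ship was doomed. Yet hope flickered on."))
```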
7

Authorship Attribution Through Words Surrounding Named Entities

Jacovino, Julia Maureen 03 April 2014 (has links)
In text analysis, authorship attribution can be carried out in a variety of ways, and the field of computational linguistics grows more important as the need for authorship attribution and text analysis becomes more widespread. For this research, the pre-existing authorship attribution software Java Graphical Authorship Attribution Program (JGAAP) incorporates a named entity recognizer, specifically the Stanford Named Entity Recognizer, to probe texts of similar genre and to aid in identifying the correct author. This research specifically examines the words authors use around named entities in order to test the ability of these words to attribute authorship.
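A minimal sketch of the core feature extraction, the words surrounding named entities: spaCy stands in here for the Stanford Named Entity Recognizer used with JGAAP, and the two-token window is an illustrative choice.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with NER

def entity_contexts(text, window=2):
    """Collect the words within `window` tokens on either side of
    each named entity, excluding the entity itself."""
    doc = nlp(text)
    contexts = []
    for ent in doc.ents:
        left = [t.text for t in doc[max(ent.start - window, 0):ent.start]]
        right = [t.text for t in doc[ent.end:ent.end + window]]
        contexts.append((ent.text, left + right))
    return contexts

print(entity_contexts("He met Elizabeth Bennet at the ball in Netherfield."))
```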
8

Toward a scientific taxonomy of musical styles

Bellmann, Hector Guillermo January 2006 (has links)
The original aim of the research was to investigate the conceptual dimensions of style in tonal music in order to provide grounds for an objective, measurable categorization of the phenomenon that could be construed as the basis of a scientific taxonomy of musical styles. However, this is a formidable task that surpasses the practical possibilities of the project, which therefore concentrated on creating the tools needed for the following stage. A review of previous attempts to deal with style in music provided a number of guidelines for the process of dealing with the material. The project avoids the subjectivity of musical analysis by concentrating on observable features of the music. A database of 250 keyboard scores in MusicXML format was built for the purpose of covering the whole span of styles in tonal music, from which it should be possible to extract features to be used in style categorization. Early on, it became apparent that most meaningful pitch-related features are linked to scale degrees, thus essentially depending on functional labeling, which requires knowing the key of the music as a point function. Different proposed alternatives for determining the key were considered and a method decided upon. Software was written and its effectiveness tested. The method proved successful in determining the instantaneous key with as much precision as feasible. On this basis, it became possible to functionally label scale degrees and chords. This software constitutes the basic tool for the extraction of pitch-related features. As its first use, the software was applied to the score database in order to quantify the usage of scale degrees and chords. The results indisputably showed that tonal music can be characterized by specific proportions in the use of the different scale degrees, whereas the use of chords shows a constant increase in chromaticism. Part of the material of this work appeared in a 2006 volume of Springer-Verlag's Lecture Notes in Computer Science.
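Key determination of the kind described can be pictured with the classic Krumhansl-Schmuckler profile-matching method: correlate a passage's pitch-class duration histogram against the major and minor key profiles rotated to every tonic, and pick the best match. This standard method is offered for illustration only; it is not necessarily the alternative the thesis settled on.

```python
import numpy as np

# Krumhansl-Kessler key profiles (C major / C minor)
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def find_key(pc_durations):
    """pc_durations: total duration of each of the 12 pitch classes.
    Returns the major or minor key whose profile correlates best."""
    best, best_r = None, -2.0
    for tonic in range(12):
        for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
            r = np.corrcoef(np.roll(profile, tonic), pc_durations)[0, 1]
            if r > best_r:
                best, best_r = f"{NAMES[tonic]} {mode}", r
    return best

# toy histogram dominated by C, E and G -- should come out as C major
hist = np.zeros(12)
hist[[0, 4, 7]] = [5.0, 4.0, 4.0]
hist[[2, 5, 9, 11]] = 1.0
print(find_key(hist))
```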
9

Measuring Greekness: A novel computational methodology to analyze syntactical constructions and quantify the stylistic phenomenon of Attic oratory

Bozia, Eleni 18 October 2018 (has links)
This study is the result of a compilation and interpretation of data that derive from Classical studies but are analyzed using computational linguistics, treebank annotation, and the development and post-processing of metrics. More specifically, the purpose of this work is to employ computational methods to analyze a particular form of the Ancient Greek language, Attic Greek, "measure" its attributes, and explore the socio-political connotations that its usage had in the era of the High Roman Empire. During the first centuries CE, the landscape of the Roman Empire is polyvalent. It consists of native Romans who can be fluent in Latin and Greek, Greeks who are Roman citizens, other easterners who are potentially trilingual and have also assumed Roman citizenship, and even Christians, who identify themselves as Roman citizens but with a different religious identity. It comes as no surprise that language is politicized, and identity, both individual and civic, is constantly reshaped through it. The question I attempt to answer is whether we can quantify the Greekness of native and bilingual speakers based on an analytic computational study of the Attic dialect. Chapter 1 provides a discussion of the three scholarly fields pertinent to the study. I present the precepts of computational linguistics, corpus linguistics, and digital humanities so as to further explicate what prompts this work and how the confluence of the three methodologies significantly enhances our apprehension of the issue at hand. In Chapter 2, I approach Greekness, Latinity, and Atticism through the writings of Greek and Roman grammarians and lexicographers and provide a complete list of the occurrences of the aforementioned notions. Chapters 3 and 4 explicate the reasoning behind the usage of the Perseids framework and the Prague annotation system. They then relate the metrics developed, the computational methods, and their subsequent visualization to quantify and objectify the previously purely theoretical inferences. The metric system was developed after careful consideration of the stylistic attributes of Ancient Greek; each metric therefore "measures" something pertinent to the formation of the language. The visualizations then afford us a more understandable and interpretable format for the numerical results: for philologists, it is interesting to view the graphic presentation of humanistic ideas, and for computer scientists, the applicability of their methods to a topic that is predominantly philological and social. Finally, Chapter 5 recontextualizes the numerical results and their interpretations, as acquired in Chapters 3 and 4, and sets the parameters necessary to discuss them in conjunction with readings of literary texts of the period of the High Empire. My intention is to show how numbers are "translated" into a different "language," the language of the humanist.
Contents: Acknowledgments; 1. Introduction; 2. Linguistic Purity as Ethnic and Educational Marker, or Greek and Roman Grammarians on Greek and Latin; 3. Attic Oratory and its Imperial Revival: Quantifying Theory and Practice; 4. Experimental Results, Analysis, and Topological Haar Wavelets; 5. «Γαλάτης ὢν ἑλληνίζειν»: Greekness, Latinity, and Otherness in the World of the High Empire; 6. Conclusion; References; Appendix.
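The node-based sentence metrics built on the Perseids/Prague-style dependency annotation can be pictured with a small sketch: given a sentence as (token, head) pairs, compute the tree depth and the mean dependency distance, two typical syntactic complexity measures. These two measures are illustrative stand-ins, not the dissertation's actual metric system, and the example sentence and head assignments are invented.

```python
def tree_depth(heads):
    """heads[i] is the 1-based head of token i+1; 0 marks the root.
    Depth is the longest root-to-token path in the dependency tree."""
    def depth(i):
        d = 0
        while heads[i - 1] != 0:
            i, d = heads[i - 1], d + 1
        return d
    return max(depth(i) for i in range(1, len(heads) + 1))

def mean_dep_distance(heads):
    """Average linear distance between a token and its head."""
    pairs = [(i, h) for i, h in enumerate(heads, start=1) if h != 0]
    return sum(abs(i - h) for i, h in pairs) / len(pairs)

# "The orator praised the ancient city": praised(3) is the root,
# orator(2) -> praised, The(1) -> orator, city(6) -> praised,
# the(4) -> city, ancient(5) -> city
heads = [2, 3, 0, 6, 6, 3]
print(tree_depth(heads), mean_dep_distance(heads))
```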
