Global ETD Search

1	Authorship Attribution based on Grammar Signatures Gopalakrishnan, Sridharan 14 October 2013 No description available. Engineering; Individual and Family Studies; stylometric
2	Effective Authorship Attribution in Large Document Collections Zhao, Ying, ying.zhao@rmit.edu.au January 2008 Techniques that can effectively identify the authors of texts are of great importance in scenarios such as detecting plagiarism and identifying a source of information. A range of attribution approaches has been proposed in recent years, but none is particularly satisfactory; some are ad hoc, and most have defects in terms of scalability, effectiveness, and computational cost. Good test collections are critical for the evaluation of authorship attribution (AA) techniques. However, there are no standard benchmarks available in this area; researchers almost always use their own test collections. Furthermore, the collections that have been explored in AA are usually small, so it is unclear whether the existing approaches are reliable or scalable. We develop several AA collections that are substantially larger than those in the literature; machine learning methods are used to establish the value of using such corpora in AA. The results, also used as baseline results in this thesis, show that the developed text collections can be used as standard benchmarks and are able to clearly distinguish between different approaches. One of the major contributions is that we propose use of the Kullback-Leibler divergence, a measure of how different two distributions are, to identify authors based on elements of writing style. The results show that our approach is at least as effective as, if not always better than, the best existing attribution methods (that is, support vector machines) for two-class AA, and is superior for multi-class AA. Moreover, our proposed method has much lower computational cost and is cheaper to train. Style markers are the key elements of style analysis. We explore several approaches to tokenising documents to extract style markers, examining which marker type works best.
We also propose three systems that boost AA performance by combining evidence from various marker types, motivated by the observation that no single type of marker can satisfy all AA scenarios. To address the scalability of AA, we propose the novel task of authorship search (AS), inspired by document search and intended for large document collections. Our results show that AS is reasonably effective at finding documents by a particular author, even within a collection of half a million documents. Beyond search, we also propose an AS-based method to identify authorship. Our method is substantially more scalable than any method published in prior AA research, in terms of both the collection size and the number of candidate authors; the discrimination is scaled up to several hundred authors. Authorship attribution; authorship search; stylometric studies; style markers; language models; Kullback-Leibler divergence
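The Kullback-Leibler idea described in entry 2 can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the unigram token models, the add-one smoothing (`alpha`), and the function names `unigram_model`, `kl_divergence`, and `attribute` are all assumptions made for the example. The divergence itself is the standard definition, D(p || q) = sum over w of p(w) * log(p(w) / q(w)).

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Smoothed unigram distribution over a shared vocabulary.

    Additive smoothing (alpha) is an assumption of this sketch; it keeps
    every probability non-zero so the divergence stays finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """D(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def attribute(doc_tokens, author_tokens):
    """Return the candidate whose model diverges least from the document.

    author_tokens maps a (hypothetical) author label to that author's
    known training tokens.
    """
    vocab = set(doc_tokens)
    for tokens in author_tokens.values():
        vocab.update(tokens)
    p = unigram_model(doc_tokens, vocab)
    scores = {
        author: kl_divergence(p, unigram_model(tokens, vocab))
        for author, tokens in author_tokens.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data: two invented "authors" with different function-word habits.
    authors = {
        "A": "the of the of the and the of".split(),
        "B": "but yet but so but yet so but".split(),
    }
    best, scores = attribute("the of and the".split(), authors)
    print(best, scores)
```

In this sketch the author with the smallest divergence is taken as the most likely one; a real system would choose its style markers (function words, character n-grams, and so on) and its smoothing scheme with far more care.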
3	A Stylometric Analysis of Climate Change Fiction Lorenz, Nina 15 July 2020 This work sets out to analyze stylistic changes in Anthropocene fiction over the past 60 years. The starting point for the analysis is Rachel Carson and the presumed beginning of the Anthropocene in the 1960s. The primary insight gained reveals the connections among these novels and the relations between similar writing about climate change, thereby contributing to the field of Environmental Humanities in a fundamental way, since climate change fiction has so far been investigated only through a topic-centered focus. The corpus compiled for scrutiny here extends to over 84 novels from these years. These novels were selected using a dual approach that draws on the secondary literature as well as on the crowdsourced cli-fi lists on Goodreads. The resulting texts are then analyzed with stylo, an R package created specifically for stylometric analysis by humanists. The results are visualized in a network that allows easier interpretation and supports more detailed questions about the nature of the connections between works and about the inspiration and representation of a specific genre of writing. Moreover, the thesis looks diachronically at clustering based on time and topic. Understanding the ways in which authors address and have addressed climate change is one indicator of how climate change is and has been comprehended. The digital approach applied here is based on distant reading: rather than close reading a large number of novels, the task is to find patterns that extend throughout them. However, for a thorough analysis, scalable reading is applied to contextualize and investigate the results in more depth. Overall, the results are meant to establish a baseline for discussing climate change fiction in the Anthropocene, which, although gaining more scholarly attention, is still understudied.
The hope is not only to gain insight but also to generate visualizations that will provide a helpful resource for fellow scholars. Stylometric Analysis; Climate Change Fiction; Fiction in the Anthropocene; Digital Humanities; Environmental Humanities; Arts and Humanities
4	A Quantitative Study of the Ghostwriting and Writing-Style Problems of Yasunari Kawabata 孫昊, Hao Sun 22 March 2018 This study uses quantitative methods to address the ghostwriting and writing-style problems surrounding Yasunari Kawabata, and establishes the following. ① The novels Otome no minato and Hana nikki were written collaboratively by Kawabata and Tsuneko Nakazato. ② Kosumosu no tomo, The Old Capital, House of the Sleeping Beauties, and The Sound of the Mountain are unlikely to have been ghostwritten. ③ Compared with novels by Kyoka Izumi, Shusei Tokuda, and Riichi Yokomitsu, Kawabata has a distinct writing style of his own; around the end of the war in 1945, changes appear in his vocabulary richness and in his use of function words (postpositional particles, adverbs, and conjunctions). / Doctor of Culture and Information Science / Doshisha University Yasunari Kawabata; ghostwriting problem; stylometry; stylometric features; machine learning
5	Stylometry: Quantifying Classic Literature For Authorship Attribution : - A Machine Learning Approach Yousif, Jacob, Scarano, Donato January 2024 Classic literature is rich linguistically, historically, and culturally, which makes it valuable for future studies. This project therefore selected a set of 48 classic books for stylometric analysis. Following an approach used in related work, it divides the books into text segments, quantifies those segments, and uses the quantified values to analyze the linguistic attributes of the books. The project also carried out classification tasks with two further objectives. First, it used the quantified values of the text segments as input to advanced models such as LightGBM and TabNet, to assess how well this approach applies to authorship attribution. Second, it applied a state-of-the-art model, RoBERTa, directly to the segmented texts, to evaluate that model's performance in authorship attribution. The results uncovered the characteristics of the books to a reasonable degree. For the authorship attribution tasks, the results suggest that segmenting and quantifying text through stylometric analysis, combined with supervised machine learning algorithms, is practical. This approach shows promise but may still require further improvement to achieve optimal performance. Lastly, RoBERTa demonstrated high performance in authorship attribution tasks.
Authorship Attribution; Classic Literature Analysis; Clustering; Data Science; Deep Learning; Feature Engineering; Feature Extraction; Gradient Descent; K-Means; LightGBM; Machine Learning; Multiclass Classification; NLP; Neural Network; RoBERTa; Stylometric Analysis; Stylometry; TabNet; t-SNE; Text Mining; Transformer Models; Computer Sciences; Datavetenskap (datalogi); Computer and Information Sciences; Data- och informationsvetenskap
