  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Idiolekto požymiai elektroniniuose laiškuose / Features of Idiolect in E-mails

Žalkauskaitė, Gintarė 18 January 2012
This study aims to establish whether an author's idiolect can be recognized in the language of personal e-mails and to determine which lexical and graphical features can be linked to idiolect. The data come from a corpus of 65,000 words consisting of informal personal e-mails written in Lithuanian by six people. The WordSmith Tools software was used to generate frequency lists for six subcorpora, each representing one person's language, and the frequency data were then compared using the contrastive method. Lexical and graphical elements that one person used more or less often than the others, and whose use was not determined by topic, were linked to that author's idiolect. The analysis yields a classification of lexical and graphical elements that can help in recognizing idiolect.
The study shows that, on the lexical level, idiolects differ mainly in the use of words expressing modality and stance, and in the words and abbreviations chosen from among possible variants. On the graphical level, idiolects can be recognized from punctuation marks, emoticons and graphic symbols used at different frequencies. Based on the results, recommendations are given for experts carrying out forensic linguistic authorship examinations.
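The contrastive frequency comparison described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the thesis (which relied on WordSmith Tools): the function names, the tokenizer, and the `ratio` threshold are all hypothetical choices.

```python
import re
from collections import Counter


def relative_frequencies(text, per=1000):
    """Return each token's frequency per `per` tokens of `text`.

    Tokens are words or runs of punctuation, so emoticons such as ":)"
    are counted alongside lexical items (an assumed, simplistic tokenizer).
    """
    tokens = re.findall(r"\w+|[^\w\s]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {tok: per * n / total for tok, n in counts.items()}


def distinctive_items(subcorpora, author, ratio=2.0):
    """List items that `author` uses at least `ratio` times as often
    as every other author in `subcorpora` (a dict: author -> text)."""
    own = relative_frequencies(subcorpora[author])
    others = [
        relative_frequencies(text)
        for name, text in subcorpora.items()
        if name != author
    ]
    return sorted(
        tok
        for tok, freq in own.items()
        if all(freq >= ratio * other.get(tok, 0.0) for other in others)
    )
```

A real study would also need to exclude topic-dependent items, as the abstract notes; this sketch makes no attempt to do so.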
23

Stylometric Embeddings for Book Similarities / Stilometriska vektorer för likhet mellan böcker

Chen, Beichen January 2021
Stylometry is the field of research concerned with defining features that quantify writing style. Its most studied question is authorship attribution: given a set of texts with known authorship, determine the author of a new, unseen document. In this study, a number of lexical and syntactic stylometric feature sets were extracted for two datasets, a smaller one containing 27 books by 25 authors and a larger one containing 11,063 books by 316 authors. Neural networks were used to transform the features into embeddings, after which each text was attributed to the author of its nearest neighbor in the embedding space.
On the smaller dataset, the method achieved an accuracy of 91.25% using the frequencies of the 50 most common function words, dependency relations, and part-of-speech (POS) tags as features; on the larger dataset, it achieved 69.18% accuracy using a similar feature set with the 100 most common function words. Beyond author attribution, a user test showed the model's potential for capturing similarities between authors' styles, which could be applied to recommending books to readers based on style.
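The nearest-neighbor step can be illustrated with a minimal Python sketch. It deliberately omits the neural embedding stage described in the abstract and compares raw function-word frequency vectors by cosine similarity instead; the ten-word `FUNCTION_WORDS` list and all function names are assumptions made for the example, not taken from the thesis.

```python
import math
import re
from collections import Counter

# Illustrative list only; the thesis used the 50 or 100 most common function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "was"]


def feature_vector(text):
    """Relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_author(unknown_text, known_texts):
    """Return the author in `known_texts` (author -> text) whose
    function-word profile is most similar to `unknown_text`."""
    query = feature_vector(unknown_text)
    return max(
        known_texts,
        key=lambda author: cosine(query, feature_vector(known_texts[author])),
    )
```

With one text per author this reduces to picking the most similar profile; with several texts per author, the same comparison would be run against every known text and the author of the closest one returned.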
24

Atribuição automática de autoria de obras da literatura brasileira (Automatic Authorship Attribution of Works of Brazilian Literature)

Nobre Neto, Francisco Dantas 19 January 2010
Authorship attribution consists of assigning an unknown document to one of a set of previously selected author classes. Knowing who wrote a text is useful when plagiarism in a literary work must be detected or when the author of a book must be properly credited. The most intuitive way for a human to analyse a text is to select some of its characteristics. The study of selecting attributes in a written document, such as average word length and vocabulary richness, is known as stylometry. When done by hand, establishing the authorship of an unknown text can take months and is tiring work. Some computational tools extract such characteristics from the text and leave the subjective analysis to the researcher. Other computational methods go further and, in addition to extracting attributes, perform the attribution itself based on the characteristics gathered from the text. Techniques such as neural networks, decision trees and classification methods have been applied in this context, with results that make them relevant to the problem.
This work presents a data compression method, Prediction by Partial Matching (PPM), as a solution to the problem of attributing authorship of Brazilian literary works. The writers and works that make up the author database were selected mainly for their representativeness in the national literature; the availability of the books in electronic format was also taken into account. PPM identifies authorship without any subjective interference in the analysis of the text. Unlike other methods, it also does not rely on attributes extracted from the text.
The correct classification rate obtained with PPM in this work was approximately 93%, whereas related works report rates between 72% and 89%. Authorship attribution was also carried out with an SVM approach. For this, attributes of two kinds were selected from the text, one based on words and the other on function-word counts, yielding correct classification rates of 36.6% and 88.4%, respectively. / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
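The idea behind compression-based attribution can be sketched as follows. Note the substitution: Python's standard library has no PPM implementation, so this example uses `zlib` (DEFLATE) as a stand-in compressor, and it is not the method evaluated in the thesis. The function names are hypothetical. The unknown text is attributed to the author whose corpus "explains" it best, measured as the number of extra compressed bytes needed when the unknown text is appended to that corpus.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


def extra_bytes(author_corpus: str, unknown: str) -> int:
    """Additional compressed bytes needed to encode `unknown`
    once the compressor has already seen `author_corpus`."""
    base = author_corpus.encode("utf-8")
    combined = base + unknown.encode("utf-8")
    return compressed_size(combined) - compressed_size(base)


def attribute(unknown: str, corpora: dict) -> str:
    """Return the author in `corpora` (author -> text) whose corpus
    yields the smallest compression overhead for `unknown`."""
    return min(corpora, key=lambda author: extra_bytes(corpora[author], unknown))
```

One limitation of the stand-in: DEFLATE only looks back 32 KB, so for book-length corpora only the tail of each author's text influences the result. PPM-style models do not have this window limit, which is one reason a dedicated implementation would be preferable in practice.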
25

Personal information prediction from written texts

Bibi, Khalil 03 1900
Authorship attribution (AA) is a field of research that has existed since the 1960s. It consists of predicting the author of a text based on other texts whose authors are known. To do this, features describing writing style and content are extracted. In this master's thesis, two subproblems of AA were addressed: predicting the author's gender and age, using a corpus collected from online blogs. Several textual features were compared using machine learning methods, and deep learning methods were also applied. For gender classification, the best results were obtained by a majority vote over the predictions of several other models. For age classification, the best results were obtained with a classifier trained on TF-IDF features.
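The majority-vote scheme mentioned for gender classification can be sketched in Python as below. This is a generic illustration: the function names, the label values, and the tie-breaking rule (the label seen first wins) are assumptions, and the thesis does not state how its ensemble handled ties.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label in `votes`.

    Ties go to the label that appears first, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(per_classifier_predictions):
    """Combine predictions from several classifiers.

    `per_classifier_predictions` holds one list per classifier, each with
    one predicted label per document; the result has one label per document.
    """
    return [
        majority_vote(list(votes))
        for votes in zip(*per_classifier_predictions)
    ]
```

Using an odd number of classifiers for a two-class task such as this one avoids ties altogether.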
26

Investigating the use of forensic stylistic and stylometric techniques in the analyses of authorship on a publicly accessible social networking site (Facebook)

Michell, Colin Simon July 2013
This research study examines the forensic application of a selection of stylistic and stylometric techniques in a simulated authorship attribution case involving texts on the social networking site, Facebook. Eight participants each submitted 2,000 words of self-authored text from their personal Facebook messages, and one of them submitted an extra 2,000 words to act as the ‘disputed text’. The texts were analysed in terms of the first 1,000 words received and then at the 2,000-word level to determine what effect text length has on the effectiveness of the chosen style markers (keywords, function words, most frequently occurring words, punctuation, use of digitally mediated communication features and spelling). It was found that despite accurately identifying the author of the disputed text at the 1,000-word level, the results were not entirely conclusive but at the 2,000-word level the results were more promising, with certain style markers being particularly effective. / Linguistics / MA (Linguistics)
