  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

BNS informacinių žinučių analizė teminiu aspektu / Topic analysis in news items of BNS news agency

Grigaitytė, Justina 17 June 2010
The thesis addresses topic detection in BNS news reports, a task related to text classification: grouping text data into categories by subject. News agencies divide their reports into groups and subgroups by topic. This work is done manually: someone reads each text and assigns it to a topic. As the media develop and new news portals appear, sorting reports automatically rather than by hand becomes increasingly relevant; automating the process could help news portals distribute reports and save time and effort. The object of the thesis is the reports of the BNS press centre from 2007. The aim is to determine how individual words help identify the topic of a text. Three methods are applied to detect the topic: frequent words, bigrams (two-word combinations), and keywords. The thesis consists of three parts. The first part covers the theoretical basis: topic detection, text classification, and the language of news reports. A review of the genre's features shows that news reports differ from other informational genres in their concision and plain statement of facts. It is also hypothesized that the accuracy of topic detection depends on the length and topicality of the report. The second part describes how the frequent-word and bigram lists were compiled and how keywords were extracted. After a review of how news is sorted by topic, a list of topics was compiled and used to annotate the frequent-word and... [to full text]
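The frequent-word and bigram lists mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual procedure: the tokenizer, the function names `tokenize`, `frequent_words`, and `frequent_bigrams`, and the `top_n` parameter are assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of Unicode letters
    # (so Lithuanian letters such as "ė" and "ž" are preserved).
    return re.findall(r"[^\W\d_]+", text.lower())


def frequent_words(texts, top_n=10):
    # Count every token across all reports of one topic.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(top_n)


def frequent_bigrams(texts, top_n=10):
    # Count adjacent token pairs; pairs never span two reports.
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(top_n)


# Hypothetical reports belonging to a single topic.
reports = [
    "The central bank raised rates",
    "The central bank cut rates",
]
top_words = frequent_words(reports, top_n=4)
top_bigrams = frequent_bigrams(reports, top_n=2)
```

Building one such list per topic and comparing a new report against each list is one plausible way to use them for classification; the abstract is truncated before it describes how the lists were actually applied.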
2

Authorship Attribution with Function Word N-Grams

Johnson, Russell Clark 01 January 2013
Prior research has considered the sequential order of function words, after the contextual words of the text have been removed, as a stylistic indicator of authorship. This research describes an effort to enhance authorship attribution accuracy based on this same information source with alternate classifiers, alternate n-gram construction methods, and a genetically tuned configuration. The approach is original in that it is the first time that probabilistic versions of Burrows's Delta have been used. Instead of using z-scores as an input for a classifier, the z-scores were converted to probabilistic equivalents (since z-scores cannot be subtracted, added, or divided without the possibility of distorting their probabilistic meaning); this adaptation enhanced accuracy. Multiple versions of Burrows's Delta were evaluated; this includes a hybrid of the Probabilistic Burrows's Delta and the version proposed by Smith & Aldridge (2011); in this case accuracy was enhanced when individual frequent words were evaluated as indicators of style. Other novel aspects include alternate n-gram construction methods; a reconciliation process that allows texts of various lengths from different authors to be compared; and a GA selection process that determines which function (or frequent) words (see Smith & Rickards, 2008; see also Shaker, Corne, & Everson, 2007) may be used in the construction of function word n-grams.
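As a rough illustration of the two ingredients named in the abstract, the sketch below builds function-word n-grams and computes the classic (non-probabilistic) Burrows's Delta as the mean absolute difference of z-scores. It does not reproduce the probabilistic variants, the reconciliation process, or the genetic tuning described in the thesis. The function-word list, the tokenizer, the use of the population standard deviation, and all function names are assumptions.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Hypothetical, deliberately tiny function-word list (an assumption).
FUNCTION_WORDS = {"the", "of", "and", "a", "in", "to", "but", "it", "was"}


def function_word_ngrams(text, n=2):
    # Drop contextual words, keep function words in their original order,
    # then slide a window of size n over what remains.
    kept = [w for w in re.findall(r"[a-z']+", text.lower()) if w in FUNCTION_WORDS]
    return [tuple(kept[i:i + n]) for i in range(len(kept) - n + 1)]


def relative_freqs(items, features):
    # Relative frequency of each feature among the items of one text.
    counts = Counter(items)
    total = sum(counts.values()) or 1
    return [counts[f] / total for f in features]


def burrows_delta(test_freqs, candidate_freqs, corpus_freqs):
    # Classic Delta: mean absolute difference of per-feature z-scores,
    # standardized against the reference corpus (one row per text).
    diffs = []
    for i in range(len(test_freqs)):
        column = [row[i] for row in corpus_freqs]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # feature does not vary across the corpus; skip it
        z_test = (test_freqs[i] - mu) / sigma
        z_cand = (candidate_freqs[i] - mu) / sigma
        diffs.append(abs(z_test - z_cand))
    return mean(diffs) if diffs else 0.0


bigrams = function_word_ngrams("The cat sat in the hat and it was wet", n=2)
```

A lower Delta means the test text's feature profile is closer to the candidate author's; attribution picks the candidate with the smallest value.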
