Global ETD Search

491	Minerafórum : um recurso de apoio para análise qualitativa em fóruns de discussão / MineraFórum: a support resource for qualitative analysis in discussion forums Azevedo, Breno Fabrício Terra January 2011 Esta tese aborda o desenvolvimento, uso e experimentação do MineraFórum. Trata-se de um recurso para auxiliar o professor na análise qualitativa das contribuições textuais registradas por alunos em fóruns de discussão. A abordagem desta pesquisa envolveu técnicas de mineração de textos utilizando grafos. As interações proporcionadas pelas trocas de mensagens em um fórum de discussão representam uma importante fonte de investigação para o professor. A partir da análise das postagens, o docente pode identificar quais alunos redigiram contribuições textuais que contemplam conceitos relativos ao tema da discussão, e quais discentes não o fizeram. Desta forma, é possível ter subsídios para motivar a discussão dos conceitos importantes que fazem parte do tema em debate. Para atingir o objetivo do presente estudo, foi necessário realizar uma revisão da literatura onde foram abordados temas como: a Educação a Distância (EAD); Ambientes Virtuais de Aprendizagem; os principais conceitos da área de Mineração de Textos e, por último, trabalhos correlacionados a esta tese. A estratégia metodológica utilizada no processo de desenvolvimento do MineraFórum envolveu uma série de etapas: 1) a escolha de uma técnica de mineração de textos adequada às necessidades da pesquisa; 2) verificação da existência de algum software de mineração de textos que auxiliasse o professor a analisar qualitativamente as contribuições em um fórum de discussão; 3) realização de estudos preliminares para avaliar a técnica de mineração escolhida; 4) definição dos indicadores de relevância das mensagens; elaboração de fórmulas para calcular a relevância das postagens; 5) construção do sistema; 6) integração do MineraFórum a três Ambientes Virtuais de Aprendizagem e, por último, 7) a realização de experimentos com a ferramenta. 
/ This thesis presents the development, use and experimentation of the MineraFórum software. It is a resource that can help teachers do qualitative analyses of text contributions in discussion forums. This research used text mining techniques based on graphs. The messages exchanged in discussion forums are an important source of investigation for teachers. By analyzing students’ posts, teachers can identify which learners wrote contributions that address concepts related to the debate theme, and which students did not. This analysis can also give teachers the elements they need to encourage discussion of concepts relevant to the topic being debated. To accomplish the objectives of this study, a literature review was carried out on topics such as Distance Learning, Virtual Learning Environments, the main concepts of Text Mining, and studies related to this thesis. The methodological strategy used in the development of MineraFórum followed these steps: 1) choosing a text mining technique suited to the needs of the research; 2) checking whether any software was available to help teachers do qualitative analysis of contributions in discussion forums; 3) conducting preliminary studies to evaluate the selected mining technique; 4) defining indicators of message relevance and devising formulas to calculate the relevance of posts; 5) building the system; 6) integrating MineraFórum into three Virtual Learning Environments; and 7) carrying out experiments with the tool. Computador na educação Fórum de discussão Ambiente virtual Ambiente de aprendizagem Análise de dados Text mining Discussion forum Qualitative analysis Thematic relevance Virtual learning environments
492	Topological stability and textual differentiation in human interaction networks: statistical analysis, visualization and linked data / Estabilidade topológica e diferenciação textual em redes de interação humana: análise estatística, visualização e dados ligados Renato Fabbri 08 May 2017 This work reports on stable (or invariant) topological properties and textual differentiation in human interaction networks, with benchmarks derived from public email lists. Activity over time and topology were observed in snapshots along a timeline, and at different scales. Our analysis shows that activity is practically the same for all networks across timescales ranging from seconds to months. The principal components of the participants in the topological metrics space remain practically unchanged as different sets of messages are considered. The activity of participants follows the expected scale-free outline, thus yielding the hub, intermediary and peripheral classes of vertices by comparison against the Erdős-Rényi model. The relative sizes of these three sectors are essentially the same for all email lists and remain the same over time. Typically, 3-12% of the vertices are hubs, 15-45% are intermediary and 44-81% are peripheral vertices. Texts from each of these sectors are shown to be very different through direct measurements and through an adaptation of the Kolmogorov-Smirnov test. These properties are consistent with the literature and may be general for human interaction networks, which has important implications for establishing a typology of participants based on quantitative criteria. To guide and support this research, we also developed a visualization method for dynamic networks based on animations. To facilitate verification and further analyses, we supply a linked data representation of the data related to our results. 
/ Este trabalho relata propriedades topológicas estáveis (ou invariantes) e diferenciação textual em redes de interação humana, com referências derivadas de listas públicas de e-mail. A atividade ao longo do tempo e a topologia foram observadas em instantâneos ao longo de uma linha do tempo e em diferentes escalas. A análise mostra que a atividade é praticamente a mesma para todas as redes em escalas temporais de segundos a meses. As componentes principais dos participantes no espaço das métricas topológicas mantêm-se praticamente inalteradas quando diferentes conjuntos de mensagens são considerados. A atividade dos participantes segue o esperado perfil livre de escala, produzindo, assim, as classes de vértices dos hubs, dos intermediários e dos periféricos em comparação com o modelo Erdős-Rényi. Os tamanhos relativos destes três setores são essencialmente os mesmos para todas as listas de e-mail e ao longo do tempo. Normalmente, 3-12% dos vértices são hubs, 15-45% são intermediários e 44-81% são vértices periféricos. Os textos de cada um destes setores são considerados muito diferentes através de uma adaptação dos testes de Kolmogorov-Smirnov. Estas propriedades são consistentes com a literatura e podem ser gerais para redes de interação humana, o que tem implicações importantes para o estabelecimento de uma tipologia dos participantes com base em critérios quantitativos. De modo a guiar e apoiar esta pesquisa, também desenvolvemos um método de visualização para redes dinâmicas através de animações. Para facilitar a verificação e passos seguintes nas análises, fornecemos uma representação em dados ligados dos dados relacionados aos nossos resultados. Análise de redes sociais Dados ligados Mineração de texto Reconhecimento de padrões Redes complexas Complex networks Linked data Pattern recognition Social network analysis Text mining
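The Kolmogorov-Smirnov comparison mentioned in record 492 can be illustrated with the standard two-sample statistic D, which is the largest vertical gap between two empirical cumulative distribution functions. This is a minimal sketch and not the thesis's adaptation, which is not described here. The sample values are hypothetical per-message measurements for two sectors.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed values,
    # so the supremum of their difference is reached at one of those values.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical measurements (e.g. tokens per message) for two sectors.
hubs = [12, 15, 20, 22, 30]
periphery = [3, 4, 5, 6, 12]
print(ks_statistic(hubs, periphery))
```

A value of D close to 1 means the two distributions barely overlap. A value close to 0 means they are nearly identical. Turning D into a significance decision requires the KS distribution or a library routine, and that step is not shown here.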
493	Enxame de partículas aplicado ao agrupamento de textos / Particle swarm applied to text clustering Prior, Ana Karina Fontes 22 December 2010 Fundo Mackenzie de Pesquisa / The large amount of data generated by people and organizations has stimulated research on effective, automatic methods for extracting knowledge from databases. This dissertation proposes two new bioinspired techniques, named cPSC and oPSC, based on the Particle Swarm Optimization (PSO) algorithm, to solve data clustering problems. The proposed algorithms are applied to data and text clustering problems, and their performance is compared with that of a standard algorithm from the literature. The results allow us to conclude that the proposed algorithms are competitive with those already available in the literature. They also bring benefits such as automatic determination of the number of groups in the dataset and a search for the best partitioning of the dataset under an explicit cost function. / A grande quantidade de dados gerados por pessoas e organizações tem estimulado a pesquisa sobre métodos efetivos e automáticos de extração de conhecimentos a partir de bases de dados. Essa dissertação propõe duas novas técnicas bioinspiradas, denominadas cPSC e oPSC, baseadas no algoritmo de otimização por enxame de partículas (PSO - Particle Swarm Optimization) para resolver problemas de agrupamento de dados. Os algoritmos propostos são aplicados a problemas de agrupamento de dados e textos, e seus desempenhos são comparados com outros propostos na literatura específica. 
Os resultados obtidos nos permitem concluir que os algoritmos propostos são competitivos com aqueles já disponíveis na literatura, porém trazem outros benefícios como a determinação automática do número de grupos nas bases e a efetuação de uma busca pelo melhor particionamento possível da base considerando uma função de custo explícita. enxame de partículas mineração de textos mineração de dados agrupamento de textos agrupamento de dados particle swarms text mining data mining clustering CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
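PSO-based clustering (record 493) can be illustrated roughly as follows. In the sketch below, each particle encodes k centroids, and the swarm minimizes an explicit cost function: the sum of squared distances from each point to its nearest centroid. This is a generic PSO clustering scheme, not cPSC or oPSC. In particular, k is fixed here, whereas the proposed algorithms determine the number of groups automatically. The inertia and acceleration values (w, c1, c2) are assumed typical settings, and the points are hypothetical.

```python
import random


def sse(centroids, points):
    """Explicit cost: sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((p_d - c_d) ** 2 for p_d, c_d in zip(p, c)) for c in centroids)
        for p in points
    )


def assign(centroids, points):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)),
            key=lambda j: sum((p_d - c_d) ** 2 for p_d, c_d in zip(p, centroids[j])))
        for p in points
    ]


def pso_cluster(points, k, n_particles=20, iters=100, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Generic PSO clustering: each particle position is a list of k centroids."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialize every particle with k centroids copied from random data points.
    pos = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    best_pos = [[c[:] for c in particle] for particle in pos]
    best_cost = [sse(particle, points) for particle in pos]
    g = min(range(n_particles), key=lambda i: best_cost[i])
    g_pos, g_cost = [c[:] for c in best_pos[g]], best_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward the personal best + pull toward the global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (best_pos[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (g_pos[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            cost = sse(pos[i], points)
            if cost < best_cost[i]:
                best_cost[i], best_pos[i] = cost, [c[:] for c in pos[i]]
                if cost < g_cost:
                    g_cost, g_pos = cost, [c[:] for c in pos[i]]
    return g_pos, g_cost


points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, cost = pso_cluster(points, k=2)
print(cost, assign(centroids, points))
```

The returned cost is the best value the swarm has seen, so it never increases across iterations. Like any stochastic search, however, it is not guaranteed to reach the global optimum.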
494	Análise de sentimento e desambiguação no contexto da tv social / Sentiment analysis and disambiguation in the context of social TV Lima, Ana Carolina Espírito Santo 14 December 2012 Fundação de Amparo a Pesquisa do Estado de São Paulo / Social media have become a way of expressing collective interests. People are motivated by sharing information and by the feedback they receive from friends and colleagues. Among the many social media tools available, the Twitter microblog is gaining popularity as a platform for instantaneous communication. Every day, over 100 million users generate millions of messages on the most varied subjects. Because it is a rapid communication platform, this microblog spurred a phenomenon called television storytellers, in which Internet users comment on what they watch on TV while the programs are being broadcast. Social TV emerged from this integration between social media and television. The amount of data generated about TV shows is rich material for data analysis. Broadcasters may use such information to improve their programs and increase interaction with their audience. The main challenges in social media data analysis include sentiment analysis (determining the polarity of a text, for instance, positive or negative) and word sense disambiguation (determining the right context of polysemous words). This dissertation aims to use machine learning techniques to create a tool to support Social TV, contributing specifically to the automation of sentiment analysis and word sense disambiguation of Twitter messages. / As mídias sociais são uma forma de expressão dos interesses coletivos, as pessoas gostam de compartilhar informações e sentem-se valorizadas por causa disso. Entre as mídias sociais o microblog Twitter vem ganhando popularidade como uma plataforma para comunicação instantânea. 
São milhões de mensagens geradas todos os dias, por cerca de 100 milhões de usuários, carregadas dos mais diversos assuntos. Por ser uma plataforma de comunicação rápida esse microblog estimulou um fenômeno denominado narradores televisivos, em que os internautas comentam sobre o que assistem na TV no momento em que é transmitido. Dessa integração entre as mídias sociais e a televisão emergiu a TV Social. A quantidade de dados gerados sobre os programas de TV formam um rico material para análise de dados. Emissoras podem usar tais informações para aperfeiçoar seus programas e aumentar a interação com seu público. Dentre os principais desafios da análise de dados de mídias sociais encontram-se a análise de sentimento (determinação de polaridade em um texto, por exemplo, positivo ou negativo) e a desambiguação de sentido (determinação do contexto correto de palavras polissêmicas). Essa dissertação tem como objetivo usar técnicas de aprendizagem de máquina para a criação de uma ferramenta de apoio à TV Social com contribuições na automatização dos processos de análise de sentimento e desambiguação de sentido de mensagens postadas no Twitter. mineração de textos análise de sentimento desambiguação de sentido mídias sociais twitter aprendizagem de máquina text mining sentiment analysis word sense disambiguation social media twitter machine learning CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
495	Um método para extração de palavras-chave de documentos representados em grafos / A method for extracting keywords from documents represented as graphs Abilhoa, Willyan Daniel 05 February 2014 Fundação de Amparo a Pesquisa do Estado de São Paulo / Twitter is a microblog service that generates a huge amount of textual content daily. All this content needs to be explored by means of techniques such as text mining, natural language processing and information retrieval. In this context, automatic keyword extraction is a very useful task that can be applied to indexing, summarization and knowledge extraction from texts. A fundamental step in text mining consists of building a text representation model. The vector space model (VSM) is the best known and most widely used of these models. However, some difficulties and limitations of the VSM, such as scalability and sparsity, motivate the proposal of alternative approaches. This dissertation proposes a keyword extraction method for tweet collections, called TKG (Twitter Keyword Graph), that represents texts as graphs and applies centrality measures to find the relevant vertices (keywords). To assess the performance of the proposed approach, two different sets of experiments are performed and comparisons with TF-IDF and KEA are made, using human classifications as benchmarks. The experiments showed that some variations of TKG are invariably superior to others and to the algorithms used for comparison. / O Twitter é um serviço de microblog que gera um grande volume de dados textuais. Todo esse conteúdo precisa ser explorado por meio de técnicas de mineração de textos, processamento de linguagem natural e recuperação de informação com o objetivo de extrair um conhecimento que seja útil de alguma forma ou em algum processo. 
Nesse contexto, a extração automática de palavras-chave é uma tarefa que pode ser usada para a indexação, sumarização e compreensão de documentos. Um passo fundamental nas técnicas de mineração de textos consiste em construir um modelo de representação de documentos. O modelo chamado modelo de espaço vetorial, VSM, é o mais conhecido e utilizado dentre essas técnicas. No entanto, algumas dificuldades e limitações do VSM, tais como escalabilidade e esparsidade, motivam a proposta de abordagens alternativas. O presente trabalho propõe o método TKG (Twitter Keyword Graph) de extração de palavras-chave de coleções de tweets que representa textos como grafos e aplica medidas de centralidade para encontrar vértices relevantes, correspondentes às palavras-chave. Para medir o desempenho da abordagem proposta, dois diferentes experimentos são realizados e comparações com TF-IDF e KEA são feitas, tendo classificações humanas como referência. Os experimentos realizados mostraram que algumas variações do TKG são superiores a outras e também aos algoritmos usados para comparação. mineração de textos representação de textos em grafo extração de palavras-chave medidas de centralidade text mining text representation in graphs keyword extraction centrality measures CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
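The graph-based keyword extraction idea behind TKG (record 495) can be sketched under one simple set of assumptions. Each term is a vertex, consecutive non-stopword terms in a tweet are linked, and vertices are ranked by degree centrality. TKG's actual edge-building rules and centrality variants may differ. The stopword list and the tweets are hypothetical.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"a", "and", "in", "is", "of", "on", "the", "to"}


def build_graph(tweets):
    """Undirected term graph: an edge links consecutive non-stopword tokens in a tweet."""
    graph = defaultdict(set)
    for tweet in tweets:
        tokens = [t for t in tweet.lower().split() if t not in STOPWORDS]
        for u, v in zip(tokens, tokens[1:]):
            if u != v:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def degree_centrality(graph):
    """Degree of each vertex, normalized by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {v: 0.0 for v in graph}
    return {v: len(neighbors) / (n - 1) for v, neighbors in graph.items()}


def top_keywords(tweets, k=3):
    """Rank terms by degree centrality; ties are broken alphabetically."""
    centrality = degree_centrality(build_graph(tweets))
    ranked = sorted(centrality.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


tweets = [
    "election debate tonight",
    "watching the debate live",
    "debate results and election polls",
    "live coverage of election",
]
print(top_keywords(tweets, k=2))  # ['debate', 'election']
```

Degree centrality is used here only because it is the simplest option. Closeness or other centrality measures could be substituted in `degree_centrality` without changing the graph construction.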
496	Selecionando candidatos a descritores para agrupamentos hierárquicos de documentos utilizando regras de associação / Selecting candidate labels for hierarchical document clusters using association rules Fabiano Fernandes dos Santos 17 September 2010 Uma forma de extrair e organizar o conhecimento, que tem recebido muita atenção nos últimos anos, é por meio de uma representação estrutural dividida por tópicos hierarquicamente relacionados. Uma vez construída a estrutura hierárquica, é necessário encontrar descritores para cada um dos grupos obtidos, pois a interpretação destes grupos é uma tarefa complexa para o usuário, já que normalmente os algoritmos não apresentam descrições conceituais simples. Os métodos encontrados na literatura consideram cada documento como uma bag-of-words e não exploram explicitamente o relacionamento existente entre os termos dos documentos do grupo. No entanto, essas relações podem trazer informações importantes para a decisão dos termos que devem ser escolhidos como descritores dos nós, e poderiam ser representadas por regras de associação. Assim, o objetivo deste trabalho é avaliar a utilização de regras de associação para apoiar a identificação de descritores para agrupamentos hierárquicos. Para isto, foi proposto o método SeCLAR (Selecting Candidate Labels using Association Rules), que explora o uso de regras de associação para a seleção de descritores para agrupamentos hierárquicos de documentos. Este método gera regras de associação baseadas em transações construídas a partir de cada documento da coleção, e utiliza a informação de relacionamento existente entre os grupos do agrupamento hierárquico para selecionar candidatos a descritores. 
Os resultados da avaliação experimental indicam que é possível obter uma melhora significativa com relação à precisão e à cobertura dos métodos tradicionais. / One way to organize knowledge that has received much attention in recent years is to create a structural representation divided into hierarchically related topics. Once this structure is built, labels must be found for each of the obtained clusters. Most algorithms do not produce simple descriptions, and interpreting these clusters is a difficult task for users. Related works consider each document as a bag-of-words and do not explicitly explore the relationships between the terms of the documents. However, these relationships can provide important information for deciding which terms should be chosen as descriptors of the nodes, and they could be represented by association rules. This work aims to evaluate the use of association rules to support the identification of labels for hierarchical document clusters. To this end, it presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which uses association rules to select good label candidates for hierarchical clusters of documents. This method generates association rules based on transactions built from each document in the collection. It then uses information about the relationships between the nodes of the hierarchical clustering to select label candidates. The experimental results show that it is possible to obtain a significant improvement in precision and recall over traditional methods. Agrupamento hierárquico de documentos Mineração de texto Regras de associação Association rules Hierarchical document clustering Label hierarchical clustering Text mining
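The rule-generation step described for SeCLAR (record 496) can be sketched under a few assumptions. Each document becomes a transaction (its set of terms), and only single-term rules a → b are mined, a simplification of full frequent-itemset mining. A rule is kept when its support and confidence reach hypothetical thresholds. How SeCLAR then uses the cluster hierarchy to choose labels is not reproduced here.

```python
from itertools import permutations


def association_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Mine single-term rules a -> b as (a, b, support, confidence) tuples.

    support(a -> b)    = fraction of transactions containing both a and b
    confidence(a -> b) = count(a and b) / count(a)
    """
    n = len(transactions)
    item_count = {}
    for t in transactions:
        for item in t:
            item_count[item] = item_count.get(item, 0) + 1
    rules = []
    for a, b in permutations(sorted(item_count), 2):
        both = sum(1 for t in transactions if a in t and b in t)
        support = both / n
        if support < min_support:
            continue
        confidence = both / item_count[a]
        if confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical documents already reduced to term sets (transactions).
docs = [
    {"soccer", "goal", "team"},
    {"soccer", "team", "coach"},
    {"soccer", "goal", "referee"},
    {"tennis", "set", "team"},
]
for rule in association_rules(docs):
    print(rule)
```

Counting pair co-occurrences by brute force is fine for a toy example. A real collection would need an Apriori- or FP-growth-style miner to avoid testing every term pair.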
497	Extraction des relations de causalité dans les textes économiques par la méthode de l’exploration contextuelle / Extraction of causal relations in economic texts by the contextual exploration method Singh, Dory 21 October 2017 La thèse décrit un processus d’extraction d’informations causales dans les textes économiques qui, contrairement à l’économétrie, se fonde essentiellement sur des ressources linguistiques. En effet, l’économétrie appréhende la notion causale selon des modèles mathématiques et statistiques qui aujourd’hui sont sujets à controverses. Aussi, notre démarche se propose de compléter ou appuyer les modèles économétriques. Il s’agit d’annoter automatiquement des segments textuels selon la méthode de l’exploration contextuelle (EC). L’EC est une stratégie linguistique et computationnelle qui vise à extraire des connaissances selon un point de vue. Par conséquent, cette contribution adopte le point de vue discursif de la causalité où les catégories sont structurées dans une carte sémantique permettant l’élaboration des règles abductives implémentées dans les systèmes EXCOM2 et SEMANTAS. / The thesis describes a process for extracting causal information from economic texts which, unlike econometrics, is essentially based on linguistic knowledge. Econometrics relies on mathematical and statistical models, which are now the subject of controversy. Our approach is therefore intended to complement or support econometric models. It consists of automatically annotating textual segments according to the Contextual Exploration (CE) method. CE is a linguistic and computational strategy aimed at extracting knowledge according to a point of view. This contribution therefore adopts the discursive point of view of causality, in which the categories are structured in a semantic map. These categories make it possible to build abductive rules, which are implemented in the EXCOM2 and SEMANTAS systems. 
Causalité Exploration contextuelle Fouille de textes Recherche d'information Annotation sémantique Anticipation Économie Causality Contextual exploration Text mining Information retrieval Semantic annotation Anticipation Economy
498	文字探勘在學生評鑑教師教學之應用研究 / A Study of Students’ Evaluation on Teacher’s Teaching with Text Mining 彭英錡, Peng, Ying Chi Unknown Date 本研究旨在瞭解探討北部某C大學實施學生評鑑教師教學之現況，並探討大學生回答開放性問題對該課程的優點與建議，進行文字探勘分析。本研究利用問卷調查，在期末課程結束前，利用上網方式，對該課程進行填答。問卷所得資料進行敘述統計、因素分析、信度分析、獨立樣本t檢定、單因子變異數分析、皮爾森相關、多元迴歸與R軟體進行詞彙權重、文字雲、主題模型和群集分析。本研究結論如下：一、學生評鑑教師教學現況以教學態度感受程度最高。二、問卷各題項以「教師教學態度認真負責，且授足所需授課之時數」平均分數最高。三、回饋性建議肯定「教學目標明確」最高，最需改善「彈性調整教學內容」。四、學生評鑑教師教學因學生年級和課程類別不同而有顯著差異。五、學生評鑑教師教學成效與學習成績呈低相關，以「教學評量」有預測力。六、重要詞彙與文字雲發現「教學」、「內容」、「喜歡」及「同學」共同詞彙。七、各學院主題模型命名，主要有觀察，考試與教學內容。八、各學院集群分析結果，學生重視教學內容、學習過程與收穫及考試。根據上述結果提出建議，以供教育行政主管機關、教師及未來研究者之參考。 
/ The purpose of this study was to explore the current state of students’ evaluation of teaching at a university in the north (University C). It also applied text mining to students’ answers to open-ended questions about the strengths of each course and their suggestions for it. Before the end of the term, an online questionnaire survey was used to collect the data. The questionnaire data were analyzed using descriptive statistics, factor analysis, reliability analysis, independent-samples t tests, one-way ANOVA, Pearson correlation analysis and multiple regression. The open-ended answers were analyzed in R with term weighting, word clouds, topic models and cluster analysis. The conclusions of this study are as follows: 1. Student ratings of instruction were above average, and “teaching attitude” was perceived most strongly. 2. The questionnaire item with the highest average score was “The teacher’s teaching attitude is serious and responsible, and the teacher teaches the full number of required class hours.” 3. In the feedback, “clear teaching objectives” was the most affirmed aspect, and “flexible adjustment of teaching content” was the aspect most in need of improvement. 4. Student ratings of instruction differed significantly by student year and course type. 5. Student ratings of teaching effectiveness showed a low correlation with academic performance, and “teaching assessment” had predictive power. 6. The important terms and the word clouds shared the words “teaching”, “content”, “like” and “classmates”. 7. The topics named in each college’s topic model were mainly “observation”, “examinations” and “teaching content”. 8. The cluster analysis for each college showed that students valued teaching content, the learning process and its gains, and examinations. Based on these findings, suggestions are offered as a reference for educational administrative authorities, teachers and future researchers. 文字探勘 有效教學 學生評鑑教師教學 Text mining Effective teaching Student ratings of instruction
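Record 498 mentions term weighting alongside word clouds, but the exact weighting scheme is not stated. The sketch below assumes the common TF-IDF scheme, weight = tf × log(N / df), which record 495 also uses as a baseline. It also assumes whitespace-tokenized text. Chinese comments would first need word segmentation, which is not shown, and the example comments are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term at most once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


comments = ["clear teaching content", "like the teaching style", "exam too hard"]
for doc_weights in tfidf(comments):
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1])[:2])
```

A term that occurs in every document gets weight 0, which is why very frequent words tend to drop out of a TF-IDF-based word cloud.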
499	財報文字分析之句子風險程度偵測研究 / Risk-related Sentence Detection in Financial Reports 柳育彣, Liu, Yu-Wen Unknown Date 本論文的目標是利用文本情緒分析技巧，針對美國上市公司的財務報表進行以句子為單位的風險評估。過去的財報文本分析研究裡，大多關注於詞彙層面的風險偵測。然而財務文本中大多數的財務詞彙與前後文具有高度的語意相關性，僅靠閱讀單一詞彙可能無法完全理解其隱含的財務訊息。本文將研究層次由詞彙拉升至句子，根據基於嵌入概念的 fastText 與 Siamese CBOW 兩種句子向量表示法學習模型，利用基於嵌入概念模型中，使用目標詞與前後詞彙關聯性表示目標詞語意的特性，萃取出財報句子裡更深層的財務意涵，並學習出更適合用於財務文本分析的句向量表示法。實驗驗證部分，我們利用 10-K 財報資料與本文提出的財務標記資料集進行財務風險分類器學習，並以傳統詞袋模型（Bag-of-Word）作為基準，利用精確度（Accuracy）與準確度（Precision）等評估標準進行比較。結果證實基於嵌入概念模型的表示法在財務風險評估上比傳統詞袋模型有著更準確的預測表現。由於近年大數據時代的來臨，網路中的資訊量大幅成長，依賴少量人力在短期間內分析海量的財務資訊變得更加困難。因此如何協助專業人員進行有效率的財務判斷與決策，已成為一項重要的議題。為此，本文同時提出一個以句子為分析單位的財報風險語句偵測系統 RiskFinder，依照 fastText 與 Siamese CBOW 兩種模型，經由 10-K 財務報表與人工標記資料集學習出適當的風險語句分類器後，對 1996 至 2013 年的美國上市公司財務報表進行財報句子的自動風險預測，讓財務專業人士能透過系統的協助，有效率地由大量財務文本中獲得有意義的財務資訊。此外，系統會依照公司的財報發布日期動態呈現股票交易資訊與後設資料，以利使用者依股價的時間走勢比較財務文字型與數值型資料的關係。 / The main purpose of this thesis is to evaluate, at the sentence level, the risk expressed in the financial reports of US listed companies. Most past sentiment analysis studies of financial reports focused on word-level risk detection. However, most financial keywords are highly context-sensitive, so analyzing them in isolation is likely to yield biased results. Therefore, to advance the understanding of financial textual information, this thesis broadens the analysis from the word level to the sentence level. We use two sentence-level models, fastText and Siamese CBOW, to learn sentence embeddings for financial risk detection. In our experiments, we use the 10-K corpus and a financial sentiment dataset labeled by financial professionals to train our financial risk classifier. We adopt the bag-of-words model as a baseline and use accuracy, precision, recall and F1-score to evaluate the performance of financial risk prediction. The experimental results show that the embedding models lead to better performance than the bag-of-words model. 
In addition, this thesis proposes a web-based financial risk detection system, called RiskFinder, built on the fastText and Siamese CBOW models. The system contains a total of 40,708 financial reports, and each risk-related sentence is highlighted according to the different sentence embedding models. The system also provides metadata and a visualization of financial time-series data for the corresponding company, aligned with the release date of each financial report. The system is intended to facilitate case studies in finance and to help users capture valuable insights from large amounts of textual information. 文字探勘 財務風險 情緒分析 機器學習 Text mining Financial risk Sentiment analysis Machine learning
500	Statistické metody ve stylometrii / Statistical methods in stylometry Dupal, Pavel January 2017 The aim of this thesis is to provide an overview of some of the methods commonly used in authorship attribution (stylometry). The text begins with a brief history of the field from the end of the 19th century to the present, after which the required terminology from the field of text mining is presented and explained. This is followed by a selection of methods from multidimensional statistics (principal component analysis, cluster analysis) and machine learning (Support Vector Machines, Naive Bayes) and their application to stylometric problems, including several methods created specifically for this field (bootstrap consensus tree, contrast analysis). Finally, these methods are applied to a practical authorship verification problem based on a corpus built from the works of four internet writers.
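Record 500 lists Naive Bayes among the machine-learning methods applied to authorship problems. The sketch below is a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It assumes plain whitespace-separated words as features, which is an assumption because the thesis's feature set is not described here, and it uses a hypothetical two-author toy corpus.

```python
import math
from collections import Counter


def train_nb(docs_by_author):
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""
    n_docs = sum(len(docs) for docs in docs_by_author.values())
    counts = {a: Counter(w for text in docs for w in text.lower().split())
              for a, docs in docs_by_author.items()}
    vocab = set().union(*counts.values())
    log_prior = {a: math.log(len(docs) / n_docs) for a, docs in docs_by_author.items()}
    log_lik = {}
    for a, c in counts.items():
        denom = sum(c.values()) + len(vocab)  # add-one smoothing over the shared vocabulary
        log_lik[a] = {w: math.log((c[w] + 1) / denom) for w in vocab}
    return log_prior, log_lik


def predict_nb(model, text):
    """Return the author with the highest posterior log-probability."""
    log_prior, log_lik = model
    scores = {}
    for a in log_prior:
        # Tokens never seen in training are skipped (they carry no evidence here).
        evidence = sum(log_lik[a][w] for w in text.lower().split() if w in log_lik[a])
        scores[a] = log_prior[a] + evidence
    return max(scores, key=scores.get)


corpus = {
    "A": ["the cat sat on the mat", "the cat ran"],
    "B": ["a dog barked loudly", "a dog ran away"],
}
model = train_nb(corpus)
print(predict_nb(model, "the cat"), predict_nb(model, "a dog"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied over long texts.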
