81 |
Probabilistic Topic Models for Human Emotion Analysis. January 2015
Abstract: While discrete emotions such as joy, anger and disgust are popular, continuous emotion dimensions such as arousal and valence are gaining popularity within the research community as more datasets annotated with these dimensions become available. Unlike discrete emotions, continuous emotions allow subtle and complex affect dimensions to be modeled, but they are difficult to predict.
Dimension reduction techniques form the core of emotion recognition systems and help create a new feature space that is more useful for predicting emotions. However, these techniques do not necessarily guarantee better predictive capability, because most of them are unsupervised, especially in regression learning. Supervised dimension reduction techniques have not been explored much in the emotion recognition literature, and this work provides a solution through probabilistic topic models, which offer a strong probabilistic framework for embedding new learning paradigms and modalities.
This thesis explores the graphical structure of Latent Dirichlet Allocation (LDA) and builds new models tailored to emotion recognition and change detection. It shows that the double mixture structure of topic models helps (1) to visualize feature patterns and (2) to project features onto a topic simplex that is more predictive of human emotions than popular techniques such as PCA and Kernel PCA. Topic models have traditionally been applied to quantized features; this work instead proposes a continuous topic model, the Dirichlet Gaussian Mixture Model (DGMM). Evaluation of DGMM shows that, when modeling videos, the performance of LDA models can be replicated without quantizing the features. Because topic models had not previously been explored for supervised video analysis, a regularized supervised topic model (RSLDA) that models both video and audio features is introduced. The RSLDA learning algorithm performs dimension reduction and regularized linear regression simultaneously, and it outperforms supervised dimension reduction techniques such as SPCA and correlation-based feature selection algorithms. Finally, two new topic models, the first of their kind, are developed for predicting concept drift in time-series data: the Adaptive Temporal Topic Model (ATTM) and SLDA for Change Detection (SLDACD). These models do not assume that consecutive frames are independent, and they outperform traditional topic models in detecting local and global changes, respectively. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2015
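The projection of features onto a topic simplex can be illustrated with a minimal sketch, assuming quantized (bag-of-words style) features and topic-word distributions that have already been learned; the topics below are hypothetical toy values and `project_to_simplex` is an illustrative name. It estimates one document's topic proportions by EM with the topics held fixed (a standard LDA-style "fold-in"); it is not the thesis's DGMM or RSLDA procedure.

```python
def project_to_simplex(word_counts, topics, iters=200):
    """Estimate a document's topic proportions (a point on the topic simplex)
    by EM, holding the topic-word distributions fixed ("fold-in").

    word_counts: dict mapping word id -> count in the document
    topics: list of K topic-word distributions, each a list over the vocabulary
    """
    k = len(topics)
    theta = [1.0 / k] * k  # start at the centre of the simplex
    for _ in range(iters):
        expected = [0.0] * k
        for w, c in word_counts.items():
            # E-step: posterior responsibility of each topic for word w
            weights = [theta[j] * topics[j][w] for j in range(k)]
            total = sum(weights)
            if total > 0.0:
                for j in range(k):
                    expected[j] += c * weights[j] / total
        n = sum(expected)
        if n == 0.0:
            break  # no word has support under any topic; keep current theta
        # M-step: normalised expected counts are the new mixing proportions
        theta = [e / n for e in expected]
    return theta


# Toy example (hypothetical values): topic 0 favours words 0-1, topic 1 words 2-3.
topics = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
doc = {0: 5, 1: 3, 2: 1}
theta = project_to_simplex(doc, topics)
print([round(t, 3) for t in theta])
```

The returned vector is non-negative and sums to one, so it can be used directly as a low-dimensional feature vector for a downstream regressor.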
|
82 |
A identidade da União Europeia e a segurança internacional: análise de discurso da região euromediterrânea / The European Union's identity and international security: discourse analysis of the Euro-Mediterranean region. Guilherme Giuliano Nicolau. 26 November 2015
This dissertation maps the formation of the European Union's international identity through its international security architecture, one of whose nodes is the securitization of immigration, using non-traditional methodological tools to confirm our thesis. The first part of the work is a theoretical framework. We discuss the linguistic turn in international relations in order to understand the intersubjectivity between researcher and object, and we therefore choose reflexivity as our methodological approach. We then discuss the post-Cold War European schools of international security, such as the Copenhagen School, the Welsh Critical School and the Paris School, presenting concepts and objects studied by experts that are central to understanding our study and that place our research within its epistemic community. Finally, we discuss and incorporate concepts and approaches from Discourse Theory (the work of Ernesto Laclau, Chantal Mouffe and the Essex School) to build a chronological and geodiscursive construction of the Euro-Mediterranean region. In the second part, we reconstruct, historically and institutionally, the European international security architecture from the post-war period to the present vis-à-vis its migration policies, noting their correlations and analysing in detail the main official security documents. The final quantitative part (our original contribution) seeks to confirm the causality of the security-immigration link in the European architecture. For this we use computational linguistics for semi-automated semantic analysis, specifically topic modeling: we analyze around 20,000 official European Union security documents to identify the statistics, agents, institutions, agendas and discourses that confirm our thesis.
|
83 |
O processo de organização tópica em dissertações escolares: da análise à emergência de uma abordagem para o ensino do gênero / The process of topic organization in school essays: from analysis to the emergence of an approach to genre teaching. Valli, Mariana Veronezi [UNESP]. 30 August 2017
Previous issue date: 2017-08-30 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Given the reality of the texts written by high school students, their difficulties, and the demands of school and of the university entrance exam (vestibular), we identified the need to analyze the text-construction processes these students use, in order to diagnose and reduce these difficulties. This study, developed within the theoretical and methodological framework of Textual-Interactive Grammar, a strand of Text Linguistics, investigates the process of topic organization in school essays. Specifically, it aims to (i) analyze the topic organization of model essays (here represented by texts that received the maximum score in editions of the Exame Nacional do Ensino Médio, ENEM), demonstrating at the intertopic level the existence and degree of hierarchical complexity and the predominant type of linearization, and at the intratopic level the existence of a structuring pattern; and (ii) compare the topic organization of these model essays with that of essays written by third-year students at public high schools in São José do Rio Preto (SP, Brazil), identifying differences and similarities between the two groups of texts. To that end, the investigation follows the method of topic analysis (Jubran, 2006), which analyzes texts through the analytical category of the discourse topic. Our results showed more similarities than differences: every text in the corpus is composed of more than one discourse topic, and transitions between topics occur by continuity. Furthermore, a pattern of internal organization of utterances within topic segments runs through most of the texts: the alternation between units of Position and Support.
In light of these results and based on Travaglia (2011), we propose an approach to teaching and learning reading, comprehension and text production in the school-essay genre, grounded in the category of topic organization.
|
84 |
Marcadores discursivos e articulação tópica / Discourse markers and topic articulation. Penhavel, Eduardo. 15 August 2018
Advisor: Ingedore Grunfeld Villaça Koch / Doctoral thesis, Universidade Estadual de Campinas, Instituto de Estudos da Linguagem
Previous issue date: 2010 / Abstract: This dissertation is about discourse markers (DMs) and is organized into two parts. The first part reviews the current state of the art in research on DMs and proposes a classification of approaches based on Fischer (2006a), distinguishing three basic types: (i) approaches that analyze as DMs expressions integrated into a host utterance, with a connecting function and referring to some aspect of that utterance; (ii) approaches that take as DMs expressions constituting an independent utterance, with a conversation-management function and referring to planes of reference; and (iii) approaches that consider both types of expressions to be DMs. It then addresses the disorderly plurality of particular approaches that characterizes current research on DMs, arguing that this situation arises partly from the procedural nature of these expressions and partly from the unintegrated diversity of analytical models that characterizes linguistics today. The second part, within the Textual-Interactive Perspective (Jubran & Koch, 2006), shows how DMs contribute to the process of intratopic structuring. It argues that, in the genre Opinion Report, this process is based on the central-subsidiary relation.
More specifically, intratopic structuring consists of a potentially recursive combination of groups of utterances that construct central references to the core idea under discussion in a Topic Segment with groups of utterances that construct subsidiary references to that idea. DMs are assumed to operate on intratopic structuring with respect to this organizational scheme, and three aspects of this use are systematized: (i) the topic-sequencing feature is defined as the introduction of groups of utterances with central or subsidiary topic status; (ii) DMs are shown to operate in relation to intratopic structuring domains; and (iii) two basic patterns of DM use are distinguished, corresponding to the marking of all, or only some, of the component parts of a domain. / Doctorate / Doctor of Linguistics
|
85 |
When Does it Mean? Detecting Semantic Change in Historical Texts. Hengchen, Simon. 06 December 2017
Contrary to what has been done to date in the hybrid field of natural language processing (NLP), this doctoral thesis holds that the new approach developed here makes it possible to semi-automatically detect semantic changes in digitised, OCRed, historical corpora. We define the term semi-automatic as “making use of an advanced tool whilst remaining in control of key decisions regarding the processing of the corpus”. The tool used, topic modelling and more precisely Latent Dirichlet Allocation (LDA), is not unknown in NLP or computational historical semantics, where it is already used to follow words selected a priori and to detect when these words change meaning; to the best of our knowledge, however, it has never been used to detect which words change in a humanistically relevant way. In other words, our method does not study a word in context to gather information on that specific word; it studies the whole context (which we consider a witness to a potential evolution of reality) to gather more contextual information on one or several semantic-shift candidates. To detect these semantic changes, we use the algorithm to create lexical fields: groups of words that together define a subject to which they all relate. By comparing lexical fields over different time periods of the same corpus (that is, by adopting a diachronic approach), we try to determine whether words appear in them over time. We argue that if a word starts to be used in a certain context at a certain time, it is a likely candidate for semantic change. The method developed here, illustrated by a case study, applies to a specific context: that of digitised, OCRed, historical archives in Dutch. Nevertheless, this doctoral work also describes the advantages and disadvantages of the algorithm and postulates, on the basis of this evaluation, that the method is applicable to other fields, under other conditions.
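The lexical-field comparison can be sketched as follows, under simplifying assumptions: each period's lexical fields are given as lists of top words per LDA topic (the toy fields below are hypothetical), and a word counts as a semantic-change candidate when it enters a period's fields for the first time. This is only the coarsest reading of "a word starts to be used in a certain context"; `new_field_words` is an illustrative name, not the thesis's code.

```python
def new_field_words(fields_by_period):
    """Flag words that enter the lexical fields of a later period.

    fields_by_period: list of (period_label, fields) in chronological order,
    where each field is a list of words (e.g. the top words of one LDA topic).
    Returns {period_label: sorted words absent from all earlier periods' fields}.
    """
    seen = set()
    candidates = {}
    for label, fields in fields_by_period:
        words = {w for field in fields for w in field}
        if seen:  # the first period only serves as a baseline
            candidates[label] = sorted(words - seen)
        seen |= words
    return candidates


# Hypothetical lexical fields for two periods of the same corpus.
periods = [
    ("1900-1909", [["ship", "sail", "harbour"], ["bread", "oven"]]),
    ("1910-1919", [["ship", "engine", "harbour"], ["bread", "oven", "strike"]]),
]
candidates = new_field_words(periods)
print(candidates)
```

In keeping with the semi-automatic definition above, a list like this is a starting point for the researcher's own judgement rather than a final verdict on semantic change.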
By critically evaluating the tools available and used, this doctoral thesis invites the community to reproduce the method, while pointing out obvious limitations of the approach and proposing ways to address them. / Doctorate in Information and Communication / info:eu-repo/semantics/nonPublished
|
86 |
A Gamma-Poisson topic model for short text. Mazarura, Jocelyn Rangarirai. January 2020
Most topic models are built on the assumption that documents follow a multinomial distribution. The Poisson distribution is an alternative for describing the probability of count data; in topic modelling, it describes the number of occurrences of a word in documents of fixed length. The Poisson distribution has been applied successfully in text classification, but its application to topic modelling is not well documented, particularly in the context of a generative probabilistic model. Furthermore, the few Poisson topic models in the literature are admixture models, which assume that a document is generated from a mixture of topics.
In this study, we focus on short text. Many studies have shown that the simpler assumption of a mixture model fits short text better: with mixture models, as opposed to admixture models, the generative assumption is that each document is generated from a single topic. One topic model that makes this one-topic-per-document assumption is the Dirichlet-multinomial mixture model. The main contributions of this work are a new Gamma-Poisson mixture (GPM) model and a collapsed Gibbs sampler for it. The benefit of the collapsed Gibbs sampler derivation is that the model can automatically select the number of topics in the corpus. The results show that the Gamma-Poisson mixture model is better than the Dirichlet-multinomial mixture model at selecting the number of topics in labelled corpora. Furthermore, the Gamma-Poisson mixture model produces better topic coherence scores than the Dirichlet-multinomial mixture model, making it a viable option for the challenging task of topic modelling of short text.
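Collapsed Gibbs samplers for Gamma-Poisson models rely on Gamma-Poisson conjugacy: once a Poisson rate with a Gamma prior is integrated out, the count of a word in a new document, given the counts already assigned to a cluster, follows a negative binomial distribution. The sketch below computes that predictive probability for one word in one cluster, assuming a shape/rate parameterization of the Gamma prior; the function name and toy counts are illustrative, and the thesis's full conditional for cluster assignments (with whatever other factors it includes) is not reproduced here.

```python
import math


def gamma_poisson_predictive(y, counts, alpha, beta):
    """P(Y = y) for a new count after integrating out a Poisson rate.

    Assumed model: counts x_1..x_n ~ Poisson(lam), lam ~ Gamma(shape=alpha, rate=beta).
    The posterior is Gamma(alpha + sum(x), beta + n), and the predictive
    distribution of a new count is negative binomial.
    """
    r = alpha + sum(counts)  # posterior shape
    b = beta + len(counts)   # posterior rate
    log_p = (math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(b / (b + 1)) - y * math.log(b + 1))
    return math.exp(log_p)


# A word seen 2, 0 and 3 times in the three documents of one cluster (toy data).
probs = [gamma_poisson_predictive(y, [2, 0, 3], alpha=1.0, beta=1.0) for y in range(200)]
print(sum(probs))                               # close to 1
print(sum(y * p for y, p in enumerate(probs)))  # close to (1 + 5) / (1 + 3) = 1.5
```

Working in log space with `math.lgamma` keeps the computation stable when clusters accumulate large counts.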
The application of GPM was then extended to a further real-world task: distinguishing between semantically similar and dissimilar texts. The objective was to determine whether GPM could produce semantic representations that allow the user to judge the relevance of new, unseen documents to a corpus of interest. Addressing this problem for short text from small corpora was of key interest. Small corpora are not uncommon; for example, at the start of the Coronavirus pandemic, limited research was available on the topic. Short text is challenging not only because of its sparsity; some corpora, such as chats between people, also tend to be noisy. The performance of GPM was compared with that of word2vec under these challenging conditions on labelled corpora, and GPM produced better accuracy, precision and recall in most cases. In addition, unlike word2vec, GPM was shown to be applicable to unlabelled datasets, and a methodology for this was also presented. Finally, a relevance index metric was introduced, which translates the similarity distance between a corpus of interest and a test document into the probability that the test document is semantically similar to the corpus of interest. / Thesis (PhD (Mathematical Statistics))--University of Pretoria, 2020. / Statistics / PhD (Mathematical Statistics) / Unrestricted
|
87 |
Fifty Years of Information Management Research: A Conceptual Structure Analysis using Structural Topic Modeling. Sharma, A., Rana, Nripendra P., Nunkoo, R. 10 January 2021
Information management is the management of the organizational processes, technologies, and people that collectively create, acquire, integrate, organize, process, store, disseminate, access, and dispose of information. It is a vast, multidisciplinary domain that combines various subdomains and is closely intertwined with other domains. This study aims to provide a comprehensive overview of the information management domain from 1970 to 2019. Drawing on methods from statistical text analysis, it summarizes the evolution of knowledge in this domain by examining publication trends by author, institution, country, and other attributes. Further, it proposes a probabilistic generative model based on structural topic modeling to understand and extract the latent themes in research articles related to information management, and it visualizes how topic prevalence varies over the period from 1970 to 2019. The results highlight that the most common themes are data management, knowledge management, environmental management, project management, service management, and mobile and web management. The findings also identify knowledge management, environmental management, project management, and social communication as academic hotspots for future research.
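Structural topic modeling estimates topic prevalence as a function of document covariates, such as publication year, inside the model itself. As a simpler descriptive sketch, assuming fitted document-topic proportions are already available (the values below are hypothetical), the average prevalence of each topic per year, which could then be plotted over 1970 to 2019, can be computed as follows; `mean_prevalence_by_year` is an illustrative name.

```python
def mean_prevalence_by_year(doc_years, doc_topics):
    """Average the topic proportions of all documents published in each year.

    doc_years: one publication year per document
    doc_topics: one list of topic proportions per document (all of length K)
    Returns {year: [mean proportion of topic 0, ..., topic K-1]} in year order.
    """
    totals, counts = {}, {}
    for year, theta in zip(doc_years, doc_topics):
        if year not in totals:
            totals[year] = [0.0] * len(theta)
            counts[year] = 0
        totals[year] = [t + p for t, p in zip(totals[year], theta)]
        counts[year] += 1
    return {year: [t / counts[year] for t in totals[year]] for year in sorted(totals)}


# Hypothetical fitted proportions for three articles and two topics.
trend = mean_prevalence_by_year([1970, 1970, 2019], [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]])
print(trend)
```

Unlike STM's covariate effects, these raw averages carry no uncertainty estimates, so they are best read as a quick visual check on the fitted model.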
|
88 |
Some Measured Performance Bounds and Implementation Considerations for the Lempel-Ziv-Welch Data Compaction Algorithm. Jacobsen, H. D. October 1992
International Telemetering Conference Proceedings / October 26-29, 1992 / Town and Country Hotel and Convention Center, San Diego, California / The Lempel-Ziv-Welch (LZW) algorithm is a popular data compaction technique that has been adopted by the CCITT in its V.42bis recommendation and is often implemented together with the V.32 standard for 9600 bps modems. It has also been implemented as Microcom Networking Protocol (MNP) Level 7, where it is called Enhanced Data Compression. LZW compacts data by encoding frequently occurring input strings as a single output symbol. The algorithm builds its string dictionary automatically, adding an entry for each output symbol, at both ends of the transmission path. The amount of compaction achievable with LZW varies with the type of data being transmitted and the efficiency with which table entries can be indexed; table indexing is usually implemented with a hash table. Although some manufacturers advertise a 4-to-1 gain in throughput, this appears to be an extreme case. This paper documents an implementation of the exact LZW algorithm. The results presented here are significantly lower, typically on the order of 1-to-2 for ASCII text, with substantially less compaction for pre-compacted files or files containing random bit patterns.
The efficiency of the LZW algorithm on ASCII text is shown to be a function of dictionary size and block size. Although larger dictionary tables require fewer transmitted symbols, the additional bits needed for each symbol index marginally outweigh the efficiency gained. The net effect is that dictionary sizes beyond 2K are increasingly less efficient for input data block sizes of 10K or more. The author concludes that the algorithm could be implemented as a direct table look-up rather than through a hashing algorithm, which would allow LZW to be implemented with very simple firmware and maximum hardware efficiency.
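The direct table look-up suggested above can be sketched as an encoder whose dictionary is a flat array indexed by (prefix code, next byte). With a bounded dictionary of `max_dict_size` entries (2048 assumed here), the table has max_dict_size × 256 slots (524,288 for 2K entries), and each look-up is a single array index with no hash function or collision handling. This illustrates the idea under those assumptions; it is not the paper's firmware, and `lzw_encode_direct` is a hypothetical name.

```python
def lzw_encode_direct(data, max_dict_size=2048):
    """LZW encoder using a direct look-up table instead of a hash table.

    child[prefix * 256 + byte] holds the code for string(prefix) + byte,
    or 0 if that string is not yet in the dictionary (0 is never a child
    code, because codes 0..255 are the single-byte roots).
    """
    child = [0] * (max_dict_size * 256)
    next_code = 256
    codes = []
    if not data:
        return codes
    prefix = data[0]
    for byte in data[1:]:
        slot = prefix * 256 + byte
        if child[slot]:
            prefix = child[slot]         # string + byte already known: extend it
        else:
            codes.append(prefix)         # emit the longest known string
            if next_code < max_dict_size:
                child[slot] = next_code  # register string + byte
                next_code += 1
            prefix = byte
    codes.append(prefix)
    return codes


print(lzw_encode_direct(b"TOBEORNOTTOBEORTOBEORNOT"))
# → [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]
```

The trade-off is memory: the table grows linearly with the dictionary size, in exchange for constant-time look-ups that need no hashing.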
|
89 |
Studying Software Quality Using Topic Models. Chen, Tse-Hsun. 14 January 2013
Software is an integral part of our everyday lives, so the quality of software is very important. However, improving and maintaining high software quality is difficult, and substantial resources are spent fixing software defects. Previous studies have examined software quality using various measurable aspects of software, such as code size and code change history. However, these metrics do not capture all the factors that are related to defects. For instance, while lines of code may be a good general measure of defects, a large file responsible for simple I/O tasks is likely to have fewer defects than a small file responsible for complicated compiler implementation details. In this thesis, we address this issue by considering the conceptual concerns (or features) of the code. We use a statistical topic modelling approach to approximate these conceptual concerns as topics, and we then use the topics to study software quality along two dimensions: code quality and code testedness. We perform our studies using three versions of four large real-world software systems: Mylyn, Eclipse, Firefox, and NetBeans.
Our proposed topic metrics help improve the defect explanatory power (i.e., the goodness of fit of the regression model) of traditional static and historical metrics by 4–314%. We compare one of our metrics, which measures the cohesion of files, with other topic-based cohesion and coupling metrics in the literature and find that it gives the greatest improvement in explaining defects over traditional software quality metrics (i.e., lines of code), by 8–55%.
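The abstract does not define its file-cohesion metric. Purely as a hypothetical illustration of how a topic-based cohesion measure can be computed, one simple proxy is one minus the normalized entropy of a file's topic memberships: a file concentrated on a single topic scores 1, and a file spread evenly across all topics scores 0. The function and inputs below are assumptions, not the thesis's metric.

```python
import math


def topic_cohesion(theta):
    """Hypothetical cohesion proxy: 1 - normalised entropy of a file's topic mix.

    theta: the file's topic memberships (non-negative, summing to 1).
    Returns a value in [0, 1]; higher means the file focuses on fewer concerns.
    """
    k = len(theta)
    if k < 2:
        return 1.0  # a single topic is trivially cohesive
    entropy = -sum(p * math.log(p) for p in theta if p > 0)
    return 1.0 - entropy / math.log(k)


print(topic_cohesion([1.0, 0.0, 0.0, 0.0]))  # one concern: maximally cohesive
print(topic_cohesion([0.7, 0.1, 0.1, 0.1]))  # mostly one concern
print(topic_cohesion([0.25] * 4))            # evenly spread: about 0
```

A value like this could be added as an extra predictor next to lines of code in a defect regression model, which is the kind of comparison the abstract describes.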
We then study how topics can help improve testing processes. By training on previous releases of the subject systems, we can predict poorly tested, defect-prone topics in future releases with a precision of 0.77 and a recall of 0.75. These topics can be mapped back to files to help allocate code inspection and testing resources. We show that our approach outperforms traditional prediction-based resource allocation approaches in saving testing and code inspection effort.
The results of our studies show that topics can be used to study software quality and support traditional quality assurance approaches. / Thesis (Master, Computing) -- Queen's University, 2013-01-08
|
90 |
An exploration of social and cultural aspects of motorcycling during the interwar period. Potter, Christopher Thomas. January 2007
This thesis covers social and cultural aspects of the motorcycling movement during the interwar period of 1919 to 1939. Using contemporary written and oral records, it explores a diverse set of themes. It begins with the origins of motorcycle enthusiasm, from the machine's invention towards the end of the nineteenth century to the dawn of the twenties, when for a while the motorcycle held the dominant position in personal motorised transport, until economic processes such as the trickle-down theory of consumer goods ownership transferred that dominance to the motorcar. Next, the phenomenon of motorcycling clubs, including their composition, practices and distribution, is covered in detail. Turning to gender issues, the thesis discusses the place women held within the movement: despite a persistent element of male dominance within the pastime, some women held a prominent position, many achieving fame and acclaim at both a personal and a national level. The next chapter covers legislative processes, following government and police involvement in controlling the increasing numbers of motorists of all types; here, a special study of magistrates' records for the Darlington area provides a snapshot that complements the national trends. Social class issues regarding the choice of motorised transport are addressed in the following chapter, which discusses the wider national picture and concentrates on an analysis of the social structure of motorcyclists in the Darlington area, derived from registration records of 1920 machines. Chapter six discusses the motorcycle's place in art and related cultural themes, analysing artistic genres such as Futurism, Bauhaus and other forms of modernist interpretation. Literary links with motorcycling, through both enthusiast journals and mainstream literature, are explored together with film and music to provide an overview of motorcycling in these areas.
Overall, the thesis discusses a wide range of hitherto unexplored themes relating to motorcycling during this era, and attempts to shed new light upon an important set of elements within social and cultural history.
|