81

Visualização em multirresolução do fluxo de tópicos em coleções de texto [Multiresolution visualization of topic flow in text collections]

Schneider, Bruno 21 March 2014 (has links)
The combined use of topic discovery algorithms on document collections with topic flow visualization techniques allows the exploration of thematic patterns in large corpora through compact visual representations. This research investigated the requirements for visualizing the data on the thematic composition of documents obtained through topic modeling (data that is sparse and multi-attribute) at different levels of detail, comparing a purpose-built visualization technique with an open-source data visualization library. For the topic flow visualization problem studied, we observed conflicting display requirements at different data resolutions, which led to a detailed investigation of ways to manipulate and display such data. The hypothesis put forward was that the integrated use of more than one visualization technique, chosen according to the resolution of the data, expands the possibilities for exploring the object under study beyond what a single technique would allow. The main contribution of this work is mapping the limits of these techniques as a function of the resolution at which the data is explored, in order to support the development of new applications.
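The thesis' own visualization technique is not described here in enough detail to reproduce; as a minimal sketch of the underlying data handling, assuming each document already carries a timestamp and an LDA-derived topic mixture, aggregating topic flow at two temporal resolutions might look like this (all names and data are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per document with a timestamp and a sparse
# topic mixture obtained beforehand from a topic model such as LDA.
rng = np.random.default_rng(0)
n_docs, n_topics = 200, 5
doc_topics = rng.dirichlet(np.full(n_topics, 0.1), size=n_docs)  # sparse mixtures
dates = pd.date_range("2010-01-01", periods=n_docs, freq="W")

df = pd.DataFrame(doc_topics, columns=[f"topic_{k}" for k in range(n_topics)])
df["date"] = dates

# Coarse resolution: yearly mean topic share (compact overview of the flow).
yearly = df.set_index("date").resample("YS").mean()
# Fine resolution: monthly mean topic share (detailed view of the same flow).
monthly = df.set_index("date").resample("MS").mean()

print(yearly.round(3))
print(monthly.head().round(3))
```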
82

How Public Opinion/Discussion Reflect on W.H.O Covid19 Activities : Case study of W.H.O and covid19 Hashtagged tweets.

Ogbonnaya, Innocent Chukwuemeka January 2021 (has links)
We used tweets to collect public discussion of the organization's activities during the specified Covid-19 period. Through topic modeling, we identified the discussed topics and related them to the organization's activities. Our research focused on tweets carrying both the W.H.O (World Health Organization) hashtag and a coronavirus, covid19, or covid hashtag. We extracted five latent topics and explored their distribution and evolution over time. We identified public opinion on hot topics (the periods when a topic is most heavily discussed); these hot topics reflect activities on the W.H.O timeline during the specified period of the pandemic. Our results show that the key topics are identified and characterized by specific events that happened during the period covered by our data. They describe the events on the W.H.O timeline, showing public opinion in each period when a discussion is hot and how people's opinions evolve over time. These results can help identify public sentiment around events, show how opinions vary, and support understanding of the organization's activities in light of each event's aims and objectives.
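The abstract does not specify the implementation; a hedged sketch of the general approach it describes, extracting five latent topics from timestamped tweets with scikit-learn and tracking their monthly prevalence, could look like this (the tweet data and column names are invented):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for the collected tweets and their creation dates.
tweets = pd.DataFrame({
    "text": [
        "who declares covid19 a global pandemic",
        "who guidance on masks and social distancing",
        "vaccine trials reported to who this week",
        "lockdown measures discussed by who officials",
    ],
    "created_at": pd.to_datetime(
        ["2020-03-11", "2020-06-05", "2020-11-09", "2020-04-01"]
    ),
})

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets["text"])

# Five latent topics, as in the thesis.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
doc_topics = lda.fit_transform(X)  # per-tweet topic mixture

# Monthly topic prevalence: the periods in which each topic is "hot".
prevalence = (
    pd.DataFrame(doc_topics, index=tweets["created_at"])
    .resample("MS")
    .mean()
)
print(prevalence.round(2))
```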
83

New Computational Methods for Literature-Based Discovery

Ding, Juncheng 05 1900 (has links)
In this work, we leverage recent developments in computer science to address several challenges in current literature-based discovery (LBD) solutions. First, existing LBD solutions either cannot use semantics or are too computationally complex. To address this, we propose a generative model, OverlapLDA, based on topic modeling, which has been shown to be both effective and efficient at extracting semantics from a corpus. We also introduce an inference method for OverlapLDA. We conduct extensive experiments to show the effectiveness and efficiency of OverlapLDA in LBD. Second, we expand LBD to a more complex and realistic setting, in which more than one concept may connect the input concepts and the connectivity pattern between concepts can be more complex than a chain. Current LBD solutions can hardly complete the LBD task in this new setting. We simplify hypotheses to concept sets and propose LBDSetNet, based on graph neural networks, to solve this problem. We also introduce different training schemes based on self-supervised learning to train LBDSetNet without relying on comprehensively labeled hypotheses, which are extremely costly to obtain. Our comprehensive experiments show that LBDSetNet outperforms strong baselines on simple hypotheses and addresses complex hypotheses.
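OverlapLDA and LBDSetNet are the author's own models and are not reproduced here; as a rough illustration of the topic-modeling building block that the first part relies on for corpus semantics, a plain scikit-learn LDA pass might look like the following (the mini-corpus is invented, loosely echoing Swanson's fish oil / Raynaud's example):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented mini-corpus standing in for titles/abstracts in an LBD setting.
docs = [
    "fish oil reduces blood viscosity",
    "raynaud syndrome is linked to high blood viscosity",
    "magnesium deficiency associated with migraine attacks",
    "stress hormones influence migraine frequency",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Top words per topic: the corpus-level semantics a topic-based LBD
# pipeline reasons over when proposing connecting concepts.
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")
```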
84

Topic modeling on a classical Swedish text corpus of prose fiction : Hyperparameters’ effect on theme composition and identification of writing style

Apelthun, Catharina January 2021 (has links)
A topic modeling method, smoothed Latent Dirichlet Allocation (LDA), is applied to a text corpus of classical Swedish prose fiction. The thesis consists of two parts. In the first part, a smoothed LDA model is applied to the corpus, investigating how changes in hyperparameter values affect the topics in terms of the distribution of words within topics and of topics within novels. In the second part, two smoothed LDA models are applied to a reduced corpus consisting only of adjectives. The generated topics are examined to see whether they are more likely to occur in texts by a particular author and whether the model could be used to identify writing style. With this new approach, the ability of the smoothed LDA model to act as a writing style identifier is explored. While the texts analyzed in this thesis are unusually long, as they are unsegmented prose fiction, the effect of the hyperparameters on model performance was found to be similar to that reported in previous research. For the adjective corpus, the models did succeed in generating topics with a higher probability of occurring in novels by the same author. The smoothed LDA was shown to be a good model for identification of writing style. Keywords: Topic modeling, Smoothed Latent Dirichlet Allocation, Gibbs sampling, MCMC, Bayesian statistics, Swedish prose fiction.
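The thesis estimates the smoothed LDA with Gibbs sampling; scikit-learn's variational LDA is a different estimator, but it exposes the same Dirichlet hyperparameters, so a hedged sketch of varying them might look like this (documents and values are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for (segments of) classical Swedish prose fiction.
docs = [
    "han vandrade genom den gamla staden om natten",
    "hon mindes sin barndom pa landet om sommaren",
    "skeppet seglade mot den okanda kusten i stormen",
    "vinterns morker lag tungt over den lilla byn",
]
X = CountVectorizer().fit_transform(docs)

# doc_topic_prior (alpha) controls how many topics a novel mixes;
# topic_word_prior (eta/beta) controls how concentrated each topic's
# word distribution is; these are the two hyperparameters studied.
for alpha, beta in [(0.1, 0.01), (1.0, 0.1)]:
    lda = LatentDirichletAllocation(
        n_components=3,
        doc_topic_prior=alpha,
        topic_word_prior=beta,
        random_state=0,
    ).fit(X)
    print(alpha, beta, lda.transform(X).round(2))
```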
85

Unsupervised topic modeling for customer support chat : Comparing LDA and K-means

Andersson, Fredrik, Idemark, Alexander January 2021 (has links)
Fortnox receives many support requests via their support chat. Some of the questions can be hard to interpret, making it difficult to know where to delegate them. It would be beneficial if the process were automated, instead of requiring time to analyze each question in order to delegate it. The main task is therefore to find an unsupervised model that can group questions into topics. A literature review of NLP and clustering was conducted to find the most suitable models and techniques for the problem; these were then implemented and evaluated on support chat questions received by Fortnox. The unsupervised models tested in this thesis were LDA and K-means. The trained models are analyzed, and some of the clusters are given a label by the authors based on the most relevant words for each cluster. Three different sets of labels are analyzed and tested. The models are evaluated using five score metrics: Silhouette, Adjusted Rand Index, Recall, Precision, and F1 score. K-means scores best on these metrics, with an F1 score of 0.417, but cannot handle very short documents. LDA does not perform well, with an F1 score of 0.137, and is not able to group related documents together.
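The exact pipeline is not given in the abstract; a minimal sketch of the comparison it describes, clustering toy support questions with K-means on TF-IDF and with LDA, then scoring against known labels, might look like this (the questions, labels, and metric selection shown are illustrative, not Fortnox data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Toy support-chat questions with a known category (0 = invoicing, 1 = payroll).
questions = [
    "how do i create an invoice",
    "the invoice number is missing on the pdf",
    "payroll tax looks wrong this month",
    "how do i register payroll for a new employee",
]
true_labels = [0, 0, 1, 1]

# K-means on TF-IDF vectors.
tfidf = TfidfVectorizer().fit_transform(questions)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tfidf)
print("silhouette:", round(silhouette_score(tfidf, km.labels_), 3))
print("ARI (K-means):", round(adjusted_rand_score(true_labels, km.labels_), 3))

# LDA on raw counts; the most probable topic serves as the cluster label.
counts = CountVectorizer().fit_transform(questions)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda_labels = lda.fit_transform(counts).argmax(axis=1)
print("ARI (LDA):", round(adjusted_rand_score(true_labels, lda_labels), 3))
```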
86

Investigating topic modeling techniques for historical feature location.

Schulte, Lukas January 2021 (has links)
Software maintenance and the understanding of where in the source code features are implemented are two strongly coupled tasks that make up a large portion of the effort spent on developing applications. The concept of feature location investigated in this thesis can serve as a supporting factor in those tasks, as it facilitates the automation of otherwise manual searches for source code artifacts. Challenges in this subject area include the aggregation and composition of a training corpus from historical codebase data for the models, as well as the integration and optimization of qualified topic modeling techniques. Building on previous research, this thesis provides a comparison of two different techniques and introduces a toolkit that can be used to reproduce and extend the results discussed. Specifically, this thesis pursues a changeset-based approach to feature location and applies it to a large open-source Java project. The project is used to optimize and evaluate the performance of Latent Dirichlet Allocation models and Pachinko Allocation models, as well as to compare the accuracy of the two models with each other. As discussed at the end of the thesis, the results do not indicate a clear favorite between the models. Instead, the outcome of the comparison depends on the metric and viewpoint from which it is assessed.
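The thesis' toolkit is not reproduced here, and Pachinko Allocation is not available in scikit-learn; as a hedged sketch of the changeset-based idea using plain LDA, indexing commit messages and ranking them against a feature query might look like this (all changesets and names are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Invented changeset messages; in a changeset-based setup the training
# corpus is built from historical commits rather than current source files.
changesets = [
    "fix login token expiry in AuthService",
    "add password reset endpoint to AuthController",
    "refactor report export to stream csv files",
    "speed up csv report generation query",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(changesets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
changeset_topics = lda.transform(X)

# A feature query is projected into the same topic space; changesets (and,
# via them, the files they touched) are ranked by topic similarity.
query = vec.transform(["csv export of reports"])
scores = cosine_similarity(lda.transform(query), changeset_topics).ravel()
for msg, s in sorted(zip(changesets, scores), key=lambda p: -p[1]):
    print(round(s, 3), msg)
```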
87

Toward a Real-Time Recommendation for Online Social Networks

Albalawi, Rania 07 June 2021 (has links)
The Internet increases the demand for the development of commercial applications and services that can provide better shopping experiences for customers globally. It is full of information and knowledge sources that might confuse customers, requiring them to spend additional time and effort to find relevant information about specific topics or objects. Recommendation systems are an important method for solving this issue. Incorporating recommendation systems into online social networks led to a specific kind of recommendation system, the social recommendation system, which has become popular with the global explosion of social media and online networks. These systems apply many prediction algorithms, such as data mining techniques, to address the problem of information overload and to analyze vast amounts of data. We believe that offering a real-time social recommendation system that can dynamically understand the real context of a user's conversation is essential for defining and recommending interesting objects at the ideal time. In this thesis, we propose an architecture for a real-time social recommendation system that aims to improve word usage and understanding in social media platforms, advance the performance and accuracy of recommendations, and propose a possible solution to the user cold-start problem. Moreover, we aim to find out whether the user's social context can be used as an input source to offer personalized and improved recommendations that help users find valuable items immediately, without interrupting their conversation flow. The suggested architecture works as a third-party social recommendation system that could be incorporated with existing social networking sites (e.g. Facebook and Twitter). The novelty of our approach is the dynamic understanding of user-generated content, achieved by detecting topics in the user's extracted dialogue and then matching them with an appropriate task as a recommendation. Topic extraction is done through a modified Latent Dirichlet Allocation topic modeling method. We also developed a social chat app as a proof of concept to validate our proposed architecture. The results of our proposed architecture offer promising gains in enhancing real-time social recommendations.
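The proposed architecture and its modified LDA are not reproduced here; as a loose sketch of the core matching step, assuming topics detected in a conversation are compared against candidate items in the same topic space, the idea might look like this (conversation, items, and similarity choice are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Invented conversation snippet and candidate items.
conversation = ["we should go hiking this weekend and maybe camp by the lake"]
items = [
    "lightweight two person camping tent",
    "waterproof hiking boots",
    "wireless noise cancelling headphones",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(items + conversation)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

topic_space = lda.transform(X)
item_vecs, convo_vec = topic_space[: len(items)], topic_space[len(items):]

# Recommend the items whose topic mixture is closest to the conversation's.
scores = cosine_similarity(convo_vec, item_vecs).ravel()
for item, s in sorted(zip(items, scores), key=lambda p: -p[1]):
    print(round(s, 3), item)
```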
88

News media attention in Climate Action: Latent topics and open access

Karlsson, Kalle January 2020 (has links)
The purpose of the thesis is i) to discover the latent topics of SDG13 and their coverage in news media, ii) to investigate the share of OA and non-OA articles and reviews in each topic, and iii) to compare the share of different OA types (Green, Gold, Hybrid and Bronze) in each topic. It adopts a heuristic perspective and exploratory approach in reviewing the three concepts of open access, altmetrics, and climate action (SDG13). Data were collected from SciVal, Unpaywall, Altmetric.com, and Scopus, yielding a dataset of 70,206 articles and reviews published between 2014 and 2018. The retrieved documents are analyzed with descriptive statistics and topic modeling using scikit-learn's LDA (Latent Dirichlet Allocation) implementation in Python. The findings show an altmetric advantage for OA in the case of news media and SDG13, which fluctuates across topics. News media are shown to focus on subjects with "visible" effects, in line with previous research on media coverage; examples include topics concerning greenhouse gas emissions and melting glaciers. Gold OA is the most common type mentioned in news outlets. It also generates the highest number of news mentions, while the average number of news mentions was highest for documents published as Bronze. Moreover, the thesis is largely driven by the methods used, most notably the programming language Python. As such, it outlines future paths for research into the three concepts reviewed, as well as into the methods used for topic modeling and programming.
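The actual dataset and analysis code are not reproduced here; a hedged sketch of the analysis pattern described, fitting scikit-learn's LDA to abstracts and then breaking down OA types and news mentions per topic, might look like this (the records are invented):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for the SciVal/Unpaywall/Altmetric.com/Scopus dataset:
# abstract text, OA type, and news mentions per document.
df = pd.DataFrame({
    "abstract": [
        "greenhouse gas emissions from transport keep rising",
        "glacier melt accelerates global sea level rise",
        "carbon pricing policy and emission trading schemes",
        "arctic ice sheets and melting glaciers observed by satellite",
    ],
    "oa_type": ["gold", "closed", "green", "gold"],
    "news_mentions": [12, 3, 0, 25],
})

X = CountVectorizer(stop_words="english").fit_transform(df["abstract"])
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
df["topic"] = lda.transform(X).argmax(axis=1)

# Share of OA types and average news mentions per latent topic.
print(pd.crosstab(df["topic"], df["oa_type"], normalize="index").round(2))
print(df.groupby("topic")["news_mentions"].mean())
```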
89

Generating Thematic Maps from Hyperspectral Imagery Using a Bag-of-Materials Model

Park, Kyoung Jin 25 July 2013 (has links)
No description available.
90

Semantic Overflow of Powerful Feelings: Digital Humanities Approaches and the 1805 and 1850 Versions of Wordsworth's Prelude

Hansen, Dylan 25 April 2023 (has links) (PDF)
Scholars have repeatedly contrasted the 1805 and 1850 versions of William Wordsworth’s The Prelude since the discovery and publication of the former by Ernest De Sélincourt in 1926. Points of contention have included the 1850 poem’s grammatical revisions and shifts toward greater political and religious orthodoxy. While these discussions have waned in recent decades, digital humanities tools allow us to revisit oft-debated texts through new lenses. Wanting to examine scholarly claims about The Prelude from a digital humanities perspective, I collaborated with Dr. Billy Hall to enter both versions of the poem into a data analysis and visualization tool, which displayed the results in topic-modeling outputs and most-frequent-words lists. The 1805 and 1850 topic modeling outputs were essentially identical to one another, suggesting either that scholars have overstated differences between the versions or that the themes of the poem may have evolved in ways not easily captured by my digital humanities methods. On the other hand, the most-frequent-words lists revealed some notable discrepancies between the two Preludes. One set of lists included articles, conjunctions, pronouns, and linking verbs (otherwise known as “stop words”), demonstrating, for instance, that the word “was” appeared with significantly less frequency in the 1850 Prelude. I found that other linking verbs also decreased in the 1850 Prelude, and this discovery prompted me to conduct a stylistic analysis of said verbs. Knowing that a raw statistical count of linking verbs in both texts would reveal only an incomplete portrait of Wordsworth’s shifting verb usage, I divided the verb revisions into two primary categories: replacements of linking verbs with dynamic verbs and descriptors, and removals of lines containing linking verbs. While scholars have previously highlighted the replacement of linking verbs with dynamic verbs and descriptors in the 1850 Prelude, these revisions only account for 30% of the 1850 linking verb revisions. In fact, the majority of linking verb revisions consist of removed 1805 lines. Many of these lines are declarative statements—the removal of which suggests that Wordsworth preferred, in some cases, a less prescriptive approach in the 1850 Prelude.
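The data analysis and visualization tool used in the collaboration is not named in enough detail to reproduce; a minimal sketch of the most-frequent-words step, deliberately keeping stop words such as "was", might look like this (the text fragments are invented, not quoted from the poem):

```python
from collections import Counter
import re

# Invented fragments; in the study these would be the full 1805 and
# 1850 texts of The Prelude.
prelude_1805 = "there was a calm and there was silence on the hills"
prelude_1850 = "a calm hung there and silence brooded on the hills"

def top_words(text, n=5):
    # Stop words are kept on purpose: the analysis tracks words like "was".
    return Counter(re.findall(r"[a-z']+", text.lower())).most_common(n)

print("1805:", top_words(prelude_1805))
print("1850:", top_words(prelude_1850))
```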
