About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
571

New Computational Methods for Literature-Based Discovery

Ding, Juncheng 05 1900
In this work, we leverage recent developments in computer science to address several challenges in current literature-based discovery (LBD) solutions. First, existing LBD solutions either cannot use semantics or are too computationally complex. To address this, we propose OverlapLDA, a generative model based on topic modeling, which has been shown to be both effective and efficient in extracting semantics from a corpus. We also introduce an inference method for OverlapLDA. We conduct extensive experiments that show the effectiveness and efficiency of OverlapLDA in LBD. Second, we expand LBD to a more complex and realistic setting, in which more than one concept may connect the input concepts and the connectivity pattern between concepts can be more complex than a chain. Current LBD solutions can hardly complete the LBD task in this new setting. We represent hypotheses as concept sets and propose LBDSetNet, based on graph neural networks, to solve this problem. We also introduce different training schemes based on self-supervised learning to train LBDSetNet without relying on comprehensively labeled hypotheses, which are extremely costly to obtain. Our comprehensive experiments show that LBDSetNet outperforms strong baselines on simple hypotheses and addresses complex hypotheses.
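As background for readers unfamiliar with LBD, the classic A-B-C discovery pattern (Swanson-style) that this line of work generalizes can be sketched in a few lines. This is an illustrative toy, not the thesis's OverlapLDA or LBDSetNet models; the concepts and co-occurrence sets below are invented:

```python
# Classic A-B-C literature-based discovery: two concepts A and C that
# never co-occur in the literature may be linked through intermediate
# B-terms that co-occur with both. All data here are invented.

def abc_candidates(cooccur, a, c):
    """Return B-terms linking a and c when a and c never co-occur directly."""
    if c in cooccur.get(a, set()):
        return set()  # already directly connected, nothing to discover
    return cooccur.get(a, set()) & cooccur.get(c, set())

# Toy co-occurrence sets extracted from an imaginary corpus.
cooccur = {
    "fish_oil": {"blood_viscosity", "platelet_aggregation"},
    "raynaud": {"blood_viscosity", "vasoconstriction"},
    "blood_viscosity": {"fish_oil", "raynaud"},
}

print(abc_candidates(cooccur, "fish_oil", "raynaud"))
```

The thesis's complex-hypothesis setting replaces this single B-term chain with richer connectivity patterns, which is why a set-based, graph-neural formulation is needed.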
572

Topic modeling on a classical Swedish text corpus of prose fiction : Hyperparameters’ effect on theme composition and identification of writing style

Apelthun, Catharina January 2021
A topic modeling method, smoothed Latent Dirichlet Allocation (LDA), is applied to a text corpus of classical Swedish prose fiction. The thesis consists of two parts. In the first part, a smoothed LDA model is applied to the corpus, investigating how changes in hyperparameter values affect the topics in terms of the distribution of words within topics and of topics within novels. In the second part, two smoothed LDA models are applied to a reduced corpus consisting only of adjectives. The generated topics are examined to see whether they are more likely to occur in texts by a particular author and whether the model could be used for identification of writing style. With this new approach, the ability of the smoothed LDA model as a writing style identifier is explored. While the texts analyzed in this thesis are unusually long, as they are not segmented prose fiction, the effect of the hyperparameters on model performance was found to be similar to that found in previous research. For the adjective corpus, the models did succeed in generating topics with a higher probability of occurring in novels by the same author. Smoothed LDA was shown to be a good model for identification of writing style.

Keywords: topic modeling, smoothed Latent Dirichlet Allocation, Gibbs sampling, MCMC, Bayesian statistics, Swedish prose fiction.
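The "smoothing" the abstract refers to is the symmetric Dirichlet prior on topic-word distributions. A minimal sketch of its effect, under invented counts (not the thesis's corpus or its Gibbs sampler): the posterior-mean estimate adds the hyperparameter beta to every word count, so a larger beta pulls a topic's word distribution toward uniform.

```python
# Effect of the smoothing hyperparameter beta in smoothed LDA on one
# topic's word distribution: phi_hat(w) = (count_w + beta) / (N + V * beta).
# Larger beta flattens the distribution. Counts below are invented.

def topic_word_dist(counts, beta):
    v = len(counts)            # vocabulary size V
    total = sum(counts.values())  # total word count N in this topic
    return {w: (c + beta) / (total + v * beta) for w, c in counts.items()}

counts = {"hav": 50, "skepp": 30, "storm": 20, "kärlek": 0}  # V = 4, N = 100

sharp = topic_word_dist(counts, beta=0.01)    # close to raw frequencies
smooth = topic_word_dist(counts, beta=100.0)  # pulled toward uniform (0.25)
```

This is the same trade-off the thesis's hyperparameter study probes: small beta yields sharp, interpretable topics; large beta spreads probability mass over many words.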
573

Unsupervised topic modeling for customer support chat : Comparing LDA and K-means

Andersson, Fredrik, Idemark, Alexander January 2021
Fortnox receives many support requests via their support chat. Some of the questions can be hard to interpret, making it difficult to know where to delegate them. It would be beneficial if the process were automated, instead of requiring time to analyze the questions before they can be delegated. The main task is therefore to find an unsupervised model that can take questions and group them into topics. A literature review covering NLP and clustering was conducted to find the most suitable models and techniques for the problem, which were then implemented and evaluated on support chat questions received by Fortnox. The unsupervised models tested in this thesis were LDA and K-means. The resulting models are analyzed after training, and some of the clusters are given a label by the authors after inspecting the most relevant words for each cluster. Three different sets of labels are analyzed and tested. The models are evaluated using five score metrics: Silhouette, Adjusted Rand Index, recall, precision, and F1 score. K-means scores best on these metrics, with an F1 score of 0.417, but cannot handle very small documents. LDA does not perform well, with an F1 score of 0.137, and is not able to group related documents together.
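The clustering half of the comparison can be sketched with a bare-bones K-means over tiny bag-of-words vectors. This is a toy under invented data, not the thesis's pipeline (which would realistically use scikit-learn's KMeans on TF-IDF vectors of the Fortnox questions):

```python
# Minimal K-means: assign each point to its nearest centroid, then move
# each centroid to the mean of its cluster; repeat. Vectors are tiny
# invented term-count vectors, not real support-chat data.

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Two obvious groups: "invoice" questions vs. "login" questions.
docs = [(3, 0), (4, 1), (0, 3), (1, 4)]  # counts of (invoice, login) terms
cents, groups = kmeans(docs, centroids=[(3, 0), (0, 3)])
```

The thesis's observation that K-means struggles with very small documents follows from this picture: a one- or two-word question yields a nearly empty vector that sits close to no centroid in particular.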
574

Conversational Engine for Transportation Systems

Sidås, Albin, Sandberg, Simon January 2021
Today's communication between operators and professional drivers takes place through direct conversations between the parties. This thesis project explores the possibility of supporting the operators by classifying the topic of incoming communications and identifying which entities are affected, through the use of named entity recognition (NER) and topic classification. Using a synthetic training dataset, a NER model and a topic classification model were developed and evaluated, achieving F1-scores of 71.4 and 61.8 respectively. These results are explained by the low variance of the synthetic dataset in comparison to a transcribed real-world dataset, which included anomalies not represented in the synthetic data. The models were integrated into the dialogue framework Emora to seamlessly handle the back-and-forth communication and generate responses.
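The F1-scores reported above are the harmonic mean of precision and recall. A short worked example, with invented counts rather than the thesis's evaluation data:

```python
# F1 combines precision (how many predicted labels were right) and
# recall (how many true labels were found) via the harmonic mean.
# The counts below are invented for illustration.

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# With equal false positives and false negatives, precision = recall,
# and F1 equals both: 50/70 ~ 0.714, i.e. an F1-score of ~71.4.
p, r, f1 = precision_recall_f1(tp=50, fp=20, fn=20)
```

Because the harmonic mean punishes imbalance, a model with high precision but poor recall (or vice versa) cannot reach a high F1, which is why it is a common single-number summary for both NER and topic classification.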
575

Investigating topic modeling techniques for historical feature location

Schulte, Lukas January 2021
Software maintenance and understanding where in the source code features are implemented are two strongly coupled tasks that make up a large portion of the effort spent on developing applications. The concept of feature location investigated in this thesis can serve as a supporting factor in those tasks, as it facilitates the automation of otherwise manual searches for source code artifacts. Challenges in this subject area include the aggregation and composition of a training corpus from historical codebase data, as well as the integration and optimization of qualified topic modeling techniques. Building on previous research, this thesis provides a comparison of two different techniques and introduces a toolkit that can be used to reproduce and extend the results discussed. Specifically, a changeset-based approach to feature location is pursued and applied to a large open-source Java project. The project is used to optimize and evaluate the performance of Latent Dirichlet Allocation models and Pachinko Allocation models, as well as to compare the accuracy of the two models with each other. As discussed at the end of the thesis, the results do not indicate a clear favorite between the models; instead, the outcome of the comparison depends on the metric and viewpoint from which it is assessed.
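Whichever topic model is used, the retrieval step of topic-based feature location is the same: each source artifact gets a topic distribution, a feature query is inferred into the same topic space, and artifacts are ranked by similarity. A sketch under invented topic vectors (the file names and distributions are hypothetical, not from the thesis's Java project):

```python
# Ranking source artifacts against a feature query by cosine similarity
# of their topic distributions. All vectors and file names are invented.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

artifacts = {
    "AuthService.java": [0.8, 0.1, 0.1],     # mostly "login" topic
    "InvoicePrinter.java": [0.1, 0.8, 0.1],  # mostly "billing" topic
}
query = [0.7, 0.2, 0.1]  # topic mix inferred from a feature description

ranked = sorted(artifacts, key=lambda f: cosine(artifacts[f], query),
                reverse=True)
```

The changeset-based twist in the thesis lies in how the training corpus is built (from historical commits rather than file snapshots); the ranking step itself is unchanged.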
576

Toward a Real-Time Recommendation for Online Social Networks

Albalawi, Rania 07 June 2021
The Internet increases the demand for the development of commercial applications and services that can provide better shopping experiences for customers globally. It is full of information and knowledge sources that can confuse customers, requiring them to spend additional time and effort when trying to find relevant information about specific topics or objects. Recommendation systems are considered an important method for solving this issue. Incorporating recommendation systems into online social networks has led to a specific kind of recommendation system, the social recommendation system, which has become popular with the global explosion of social media and online networks; such systems apply prediction algorithms, including data mining techniques, to address the problem of information overload and to analyze vast amounts of data. We believe that offering a real-time social recommendation system that can dynamically understand the real context of a user's conversation is essential to identifying and recommending interesting objects at the ideal time. In this thesis, we propose an architecture for a real-time social recommendation system that aims to improve word usage and understanding in social media platforms, advance the performance and accuracy of recommendations, and offer a possible solution to the user cold-start problem. Moreover, we aim to find out whether the user's social context can be used as an input source to offer personalized and improved recommendations that help users find valuable items immediately, without interrupting their conversation flow. The suggested architecture works as a third-party social recommendation system that could be incorporated with existing social networking sites (e.g., Facebook and Twitter).

The novelty of our approach is the dynamic understanding of user-generated content, achieved by detecting topics from the user's extracted dialogue and then matching them with an appropriate task as a recommendation. Topic extraction is done through a modified Latent Dirichlet Allocation topic modeling method. We also develop a social chat app as a proof of concept to validate our proposed architecture. The results of our proposed architecture offer promising gains in enhancing real-time social recommendations.
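The matching step described above, detected topic words scored against a catalog, can be sketched as a simple overlap ranking. This is a toy stand-in, not the thesis's modified-LDA pipeline; the catalog items and keyword sets are invented:

```python
# Toy topic-to-item matching: score each catalog item by how many of
# its keywords overlap with the topic words detected in a conversation,
# and recommend the best match. All names here are invented.

catalog = {
    "running_shoes": {"run", "marathon", "shoe", "training"},
    "coffee_maker": {"coffee", "espresso", "brew", "morning"},
}

def recommend(topic_words, catalog):
    scores = {item: len(topic_words & kws) for item, kws in catalog.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # no overlap: stay silent

print(recommend({"marathon", "training", "nutrition"}, catalog))
```

Returning nothing on zero overlap reflects the real-time constraint discussed in the abstract: a recommendation that interrupts the conversation flow with an irrelevant item is worse than none.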
577

Comparing and contrasting the dissemination cascades of different topics in a social network : What are the lifetimes of different topics and how do they spread / Jämförelse av spridningskaskader för olika ämnen i ett socialt nätverk

Käll, Linus, Pertoft, Simon January 2021
The web has granted everyone the opportunity to freely share large amounts of data. Individuals, corporations, and communities have made the web an important tool in their arsenal. These entities spread information online, but not all of it is constructive: some spread misinformation to protect themselves or to attack other entities or ideas on the web. Checking the integrity of all the information online is a complex problem, and an ethical solution would be equally complex. Multiple latent factors decide how a topic spreads, and finding these factors is non-trivial. In this thesis, the patterns of different topics are compared with each other and with the generalized patterns of fake, true, and mixed news, using Latent Dirichlet Allocation (LDA) topic models. We look at how the dissemination of topics can be compared through different metrics, and how these can be calculated from networks related to the data. The analyzed data was collected using the Twitter API and news article scrapers. From this data, custom corpora were created through lemmatization and by filtering out unnecessary words and characters. The LDA models were built from these corpora, making it possible to extract the latent topics of the articles. By plotting the articles according to their most dominant topic, graphs of popularity, size, and other distribution statistics could easily be drawn. From these graphs, the topics could be compared to each other and categorized as fake, true, or mixed news by looking at their patterns and novelty. However, this raised the question of whether it would be ethical to generalize topics in this way: suppressing or censoring an article because it contains a lot of novel information might hide constructive novelties and violate freedom of speech. Finally, this thesis presents the means for further work, which could involve collecting one large, continuous dataset for a fair and accurate comparison between topics.
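One concrete dissemination metric of the kind discussed, a topic's lifetime and peak day, can be computed by bucketing articles by dominant topic and publication day. A minimal sketch with invented timestamps, not the thesis's Twitter/scraper dataset:

```python
# Bucket articles by (dominant topic, publication day), then derive each
# topic's lifetime (first to last day observed) and its peak day.
# Topic IDs and dates below are invented.
from collections import defaultdict

articles = [
    ("t1", "2021-03-01"), ("t1", "2021-03-01"), ("t1", "2021-03-05"),
    ("t2", "2021-03-02"), ("t2", "2021-03-03"),
]

daily = defaultdict(lambda: defaultdict(int))
for topic, day in articles:
    daily[topic][day] += 1

stats = {
    t: {"lifetime": (min(days), max(days)),   # ISO dates sort correctly
        "peak_day": max(days, key=days.get)}  # day with most articles
    for t, days in daily.items()
}
```

Per-topic series like `daily[t]` are also what the popularity and size graphs mentioned in the abstract would be plotted from.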
578

Differences in the application of international standards in sustainability reports : A comparative study of reporting under GRI Standards in the USA and Sweden / Skillnader i tillämpningen av internationella standarder i hållbarhetsrapporter : En komparativ studie om rapportering med GRI Standards i USA respektive Sverige

Gure, Imran, Hailu, Yonas January 2021
Several international standard-setters have emerged with different global standards and guidelines for how CSR-related issues should be presented and what should be included in sustainability reports. Several previous studies show that CSR reporting differs depending on the country a company operates in, with both the information reported about a company's activities and the reporting itself differing. The purpose of the study is to investigate the application of international standards, specifically GRI Standards, and whether that application differs internationally, by examining companies in Sweden and the USA. The study also investigates international differences in disclosures about the topic-specific GRI aspects. The study uses a deductive approach and performs a quantitative content analysis to compare 60 sustainability reports prepared in accordance with GRI Standards. The sample consists of companies from the USA and Sweden, and the analysis uses descriptive statistics, a t-test, and a z-test. Standardization efforts may have succeeded in part, but this study indicates that international differences in sustainability reporting persist despite the use of international standards by the Swedish and US companies studied. This is a sign that there is still more to do to address the challenges facing standardization efforts.
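The two test statistics named above can be written out directly: a Welch two-sample t statistic for comparing mean disclosure scores, and a z statistic for the difference between two disclosure proportions. The numbers are invented for illustration, not the study's data:

```python
# Welch's t statistic (unequal variances) for two sample means, and the
# pooled z statistic for two proportions. Inputs below are invented
# disclosure scores/counts, not the study's 60-report sample.
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

def two_prop_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion under H0: p1 == p2
    return (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Hypothetical: mean disclosure score per report, and the count of
# reports disclosing a given topic-specific aspect, in each country.
t = welch_t(mean1=7.2, var1=1.5, n1=30, mean2=6.4, var2=2.0, n2=30)
z = two_prop_z(x1=24, n1=30, x2=15, n2=30)
```

In practice the statistics would be compared against the t and standard-normal distributions for a p-value; the formulas above are only the numerator/denominator structure of each test.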
579

News media attention in Climate Action: Latent topics and open access

Karlsson, Kalle January 2020
The purpose of the thesis is i) to discover the latent topics of SDG13 and their coverage in news media, ii) to investigate the share of OA and non-OA articles and reviews in each topic, and iii) to compare the share of different OA types (Green, Gold, Hybrid, and Bronze) in each topic. It takes a heuristic perspective and an explorative approach in reviewing the three concepts open access, altmetrics, and climate action (SDG13). Data is collected from SciVal, Unpaywall, Altmetric.com, and Scopus, rendering a dataset of 70,206 articles and reviews published between 2014 and 2018. The documents retrieved are analyzed with descriptive statistics and topic modeling, using scikit-learn's LDA (Latent Dirichlet Allocation) implementation in Python. The findings show an altmetric advantage for OA in the case of news media and SDG13, which fluctuates over topics. News media are shown to focus on subjects with "visible" effects, in concordance with previous research on media coverage; examples were topics concerning greenhouse gas emissions and melting glaciers. Gold OA is the most common type mentioned in news outlets. It also generates the highest total number of news mentions, while the average number of news mentions was highest for documents published as Bronze. Moreover, the thesis is largely driven by the methods used, most notably the programming language Python. As such, it outlines future paths for research into the three concepts reviewed, as well as into the methods used for topic modeling and programming.
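The per-topic OA shares in objectives ii) and iii) are a straightforward descriptive-statistics computation. A sketch with a handful of invented records, not the 70,206-document SciVal/Unpaywall dataset:

```python
# Share of each OA type within each topic, from (topic, oa_type) records.
# The topic labels and records are invented for illustration.
from collections import Counter, defaultdict

records = [
    ("emissions", "Gold"), ("emissions", "Gold"), ("emissions", "Closed"),
    ("glaciers", "Bronze"), ("glaciers", "Gold"),
]

by_topic = defaultdict(Counter)
for topic, oa in records:
    by_topic[topic][oa] += 1

shares = {
    t: {oa: n / sum(c.values()) for oa, n in c.items()}
    for t, c in by_topic.items()
}
```

In the thesis, each document's topic would first be assigned from the fitted LDA model (e.g. its highest-probability topic) before the shares are tallied.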
580

Topic Modeling and Spam Detection for Short Text Segments in Web Forums

Sun, Yingcheng 28 January 2020
No description available.
