Global ETD Search

31	Improving information gathering for IT experts: Combining text summarization and individualized information recommendation. Bergenudd, Anton January 2022 Information gathering and information overload are a growing concern for Information Technology (IT) experts. The amount of information handled daily is large enough that sifting through it all to find what is relevant takes up valuable time. In IT, time is directly related to money: production time spent on information gathering and the allocation of human resources is a direct loss of profit for any company. This thesis mainly addresses two issues: texts are too lengthy, and relevant information is difficult to find. Using Natural Language Processing (NLP) methods such as topic modelling and text summarization, a proposed solution is constructed in the form of a technical basis that can be implemented in most business areas. An experiment and an evaluation session are set up to evaluate the performance of the technical basis and to address the research question of this paper, namely “How effective is text summarization combined with individualized information recommendation in improving information gathering of IT experts?”. Furthermore, the solution includes the construction of user profiles in an attempt to individualize content and, in theory, present more relevant information. The results are affected by the substandard quality and limited number of data points; however, positive trends are discovered. The use of user profiles is reported to increase the number of relevant articles presented by the model, with recall and precision increasing per iteration and accuracy increasing with the number of updates made per user. Too little time was spent on the evaluation process to state the validity of the results with confidence, beyond noting that they are inconsistent and insufficient in magnitude. However, the positive trends invite speculation about what the project could achieve given enough time and resources to reach its full potential. Essentially, one can theoretically improve information gathering by combining text summarization with individualization. Text summarization; information gathering; individualization; topic modelling; natural language processes; profiling. Computer Sciences; Datavetenskap (datalogi)
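The idea of individualizing recommendations through user profiles can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each article is already described by a topic distribution, that a user profile is a vector of topic weights, and that feedback nudges the profile toward or away from an article. The names `cosine`, `recommend`, `update_profile`, the `rate` parameter, and the sample data are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(profile, articles, top_n=2):
    """Return the ids of the top_n articles whose topic vectors best match the profile."""
    ranked = sorted(articles, key=lambda aid: cosine(profile, articles[aid]), reverse=True)
    return ranked[:top_n]


def update_profile(profile, article_topics, relevant, rate=0.5):
    """Shift the profile toward an article marked relevant, or away from one marked irrelevant."""
    sign = 1.0 if relevant else -1.0
    # Weights are clamped at zero so a profile never holds negative interest in a topic.
    return [max(0.0, p + sign * rate * t) for p, t in zip(profile, article_topics)]


# Hypothetical topic distributions over three topics: [security, networking, databases]
articles = {
    "a1": [0.8, 0.1, 0.1],
    "a2": [0.1, 0.8, 0.1],
    "a3": [0.1, 0.1, 0.8],
}

# Start from a neutral profile, then record that the user found article a3 relevant.
profile = [1 / 3, 1 / 3, 1 / 3]
profile = update_profile(profile, articles["a3"], relevant=True)
print(recommend(profile, articles, top_n=1))
```

After a single positive update the database-heavy article ranks first, which mirrors the abstract's observation that results improve with the number of updates made per user.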
32	The Information Value of Unstructured Analyst Opinions / Studies on the Determinants of Information Value and its Relationship to Capital Markets Eickhoff, Matthias 29 June 2017 No description available. 330; Analyst Opinion; Sentiment Analysis; Topic Modelling; Information Value; Media Richness Theory; Wisdom of Crowds; Capital Markets; Decision Support; Decision Support Systems; Text Mining; Data Mining; Unstructured Data; Financial Decision Support Systems. Wirtschaftswissenschaften (PPN621567140)
33	Exploration of an Automated Motivation Letter Scoring System to Emulate Human Judgement Munnecom, Lorenna; Pacheco, Miguel Chaves de Lemos January 2020 As the popularity of the master’s in data science at Dalarna University increases, so does the number of applicants. The aim of this thesis was to explore different approaches to an automated motivation letter scoring system that could emulate human judgement and automate the process of candidate selection. Several steps, such as image processing and text processing, were required to retrieve numerous features that could help identify the factors graded by the program managers. Grammar-based features and advanced textual features were extracted from the motivation letters, and Topic Modelling methods were then applied to extract the probability of each topic occurring within a motivation letter. Furthermore, correlation analysis was applied to quantify the association between the features and the different factors graded by the program managers, followed by Ordinal Logistic Regression and Random Forest to build models with the most impactful variables. Finally, the Naïve Bayes algorithm, Random Forest and Support Vector Machine were used, first for classification and then for prediction. The results were not promising, as the factors were not accurately identified. Nevertheless, the authors suspected that the factors may be strongly related to the emphasis placed on specific topics within a motivation letter, which can lead to further research. Natural Language Processing; Machine Learning; Supervised Learning; Unsupervised Learning; Automation; Feature Extraction; Image Processing; Text Processing; Text Exploration; Motivation Letter; Dalarna University; Student Application; Topic Modelling; Business Intelligence; Data Science. Computer and Information Sciences; Data- och informationsvetenskap
34	Semantic Topic Modeling and Trend Analysis Mann, Jasleen Kaur January 2021 This thesis focuses on finding an end-to-end unsupervised solution to a two-step problem: extracting semantically meaningful topics from a large temporal text corpus and analysing the trends of these topics. To achieve this, the focus is on using the latest developments in Natural Language Processing (NLP) related to pre-trained language models like Google’s Bidirectional Encoder Representations from Transformers (BERT) and other BERT-based models. These transformer-based pre-trained language models provide word and sentence embeddings based on the context of the words. The results are then compared with traditional machine learning techniques for topic modeling. This is done to evaluate whether the quality of topic models has improved and how dependent the techniques are on manually defined model hyperparameters and data preprocessing. These topic models provide a good mechanism for summarizing and organizing a large text corpus and give an overview of how the topics evolve with time. In the context of research publications or scientific journals, such analysis of the corpus can give an overview of research and scientific interest areas and how these interests have evolved over the years. The dataset used for this thesis consists of research articles and papers from one journal, the ‘Journal of Cleaner Production’. This journal had more than 24,000 research articles at the time of working on this project. We started by implementing Latent Dirichlet Allocation (LDA) topic modeling. In the next step, we implemented LDA along with document clustering to get topics within these clusters. This gave us an idea of the dataset and also gave us a benchmark. After obtaining some base results, we explored transformer-based contextual word and sentence embeddings to evaluate whether they lead to more meaningful, contextual, and semantic topics. For document clustering, we used K-means clustering. In this thesis, we also discuss methods to optimally visualize the topics and the trend changes of these topics over the years. Finally, we conclude with a method for leveraging contextual embeddings using BERT and Sentence-BERT to solve this problem and achieve semantically meaningful topics. We also discuss the results from traditional machine learning techniques and their limitations. NLP; unsupervised topic modelling; trend analysis; LDA; BERT; Sentence-BERT; TF-IDF; transformer based language models; document clustering. Computer Sciences; Datavetenskap (datalogi)
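The trend-analysis step described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's method: it assumes that some topic model (LDA, or clustering over BERT embeddings) has already assigned one dominant topic to each document, and it then measures how each topic's share of the corpus changes per year. The function name `topic_share_by_year` and the sample topic labels are hypothetical.

```python
from collections import Counter, defaultdict


def topic_share_by_year(assignments):
    """Map each year to {topic: fraction of that year's documents assigned to the topic}."""
    counts = defaultdict(Counter)
    for year, topic in assignments:
        counts[year][topic] += 1

    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        shares[year] = {topic: n / total for topic, n in counts[year].items()}
    return shares


# Hypothetical (publication year, dominant topic) pairs produced by an upstream topic model.
docs = [
    (2018, "recycling"), (2018, "recycling"), (2018, "energy"), (2018, "energy"),
    (2019, "recycling"), (2019, "energy"), (2019, "energy"), (2019, "energy"),
]

shares = topic_share_by_year(docs)
for year, dist in shares.items():
    print(year, dist)
```

Normalizing by the number of documents per year matters for a growing journal: raw counts would make every topic appear to rise simply because more articles are published each year.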
35	A Confirmatory Analysis for Automating the Evaluation of Motivation Letters to Emulate Human Judgment Mercado Salazar, Jorge Anibal; Rana, S M Masud January 2021 Manually reading, evaluating, and scoring motivation letters as part of the admissions process is a time-consuming and tedious task for Dalarna University's program managers. An automated scoring system would provide them with relief as well as the ability to make much faster decisions when selecting applicants for admission. The aim of this thesis was to analyse current human judgment and attempt to emulate it using machine learning techniques. We used various topic modelling methods, such as Latent Dirichlet Allocation and Non-Negative Matrix Factorization, to find the most interpretable topics, build a bridge between topics and human-defined factors, and finally evaluate model performance by predicting scoring values and measuring accuracy using logistic regression, discriminant analysis, and other classification algorithms. Although we were able to discover the meaning of almost all human factors on our own, the topic models' accuracy in predicting the overall score was unexpectedly low. Setting a threshold on the overall score to select applicants for admission yielded good overall accuracy, but did not yield consistently good precision or recall. During our investigation, we attempted to determine the possible causes of these unexpected results and found that not only are the limitations of topic modelling to blame, but human bias also plays a role. Motivation Letter; Natural Language Processing; Topic Modelling; Latent Dirichlet Allocation; Non-Negative Matrix Factorization; LDAVis; Topic Factors; Image Processing; Text Processing; Logistic Regression; Unsupervised Learning; Machine Learning. Other Social Sciences; Annan samhällsvetenskap
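The gap between good accuracy and poor precision or recall reported in the abstract above is typical when few applicants are admitted. The sketch below shows the arithmetic under assumed data; it is not taken from the thesis. The function name `threshold_metrics`, the scores, the admission labels, and the threshold value are all hypothetical.

```python
def threshold_metrics(scores, admitted, threshold):
    """Select applicants with score >= threshold and return (accuracy, precision, recall)."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and a for p, a in zip(predicted, admitted))          # selected and admitted
    fp = sum(p and not a for p, a in zip(predicted, admitted))      # selected, not admitted
    fn = sum(not p and a for p, a in zip(predicted, admitted))      # missed admissions
    tn = sum(not p and not a for p, a in zip(predicted, admitted))  # correctly rejected

    accuracy = (tp + tn) / len(scores)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical predicted overall scores for ten applicants, two of whom were actually admitted.
scores = [8, 5, 7, 3, 3, 3, 3, 3, 3, 3]
admitted = [True, True, False, False, False, False, False, False, False, False]

accuracy, precision, recall = threshold_metrics(scores, admitted, threshold=6)
print(accuracy, precision, recall)
```

In this made-up case the model is right for eight of ten applicants, mostly because rejections dominate, yet it finds only one of the two admitted applicants and only half of its selections are correct.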
36	From models to data: understanding biodiversity patterns from environmental DNA data / Des modèles aux données : comprendre la structure de la biodiversité à partir de l'ADN Sommeria-Klein, Guilhem 14 September 2017 Integrative patterns of biodiversity, such as the distribution of taxa abundances and the spatial turnover of taxonomic composition, have been under scrutiny from ecologists for a long time, as they offer insight into the general rules governing the assembly of organisms into ecological communities. Thanks to recent progress in high-throughput DNA sequencing, these patterns can now be measured in a fast and standardized fashion through the sequencing of DNA sampled from the environment (e.g. soil or water), instead of relying on tedious fieldwork and rare naturalist expertise. They can also be measured for the whole tree of life, including the vast and previously unexplored diversity of microorganisms. Taking full advantage of this new type of data is challenging, however: DNA-based surveys are indirect, and as such suffer from many potential biases; they also produce large and complex datasets compared to classical censuses. The first goal of this thesis is to investigate how statistical tools and models classically used in ecology or coming from other fields can be adapted to DNA-based data so as to better understand the assembly of ecological communities. The second goal is to apply these approaches to soil DNA data collected in French Guiana, in the Amazonian forest, the Earth's most diverse land ecosystem. Two broad types of mechanisms are classically invoked to explain the assembly of ecological communities: 'neutral' processes, i.e. the random birth, death and dispersal of organisms, and 'niche' processes, i.e. the interaction of the organisms with their environment and with each other according to their phenotype. Disentangling the relative importance of these two types of mechanisms in shaping taxonomic composition is a key ecological question, with many implications from estimating global diversity to conservation issues. In the first chapter, this question is addressed across the tree of life by applying the classical analytic tools of community ecology to soil DNA samples collected from various forest plots in French Guiana. The second chapter focuses on the neutral aspect of community assembly.[...] Spatial biodiversity patterns; Species abundance distribution; Beta-diversity; Environmental DNA; Metabarcoding; Soil biodiversity; Tropical forest; French Guiana; Statistical modeling of biodiversity; Neutral theory of biodiversity; Topic modelling
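The two integrative patterns named in the abstract above, the species abundance distribution and the turnover of composition between sites, can be made concrete with a small sketch. It is an illustration only: the abstract does not say which dissimilarity index the thesis uses, so the Bray-Curtis index is an assumption here, and the function names, taxon labels (`otu_1` and so on), and read counts are hypothetical.

```python
def rank_abundance(site):
    """Species abundance distribution of one site, as counts sorted from most to least abundant."""
    return sorted(site.values(), reverse=True)


def bray_curtis(site_a, site_b):
    """Bray-Curtis dissimilarity between two sites: 0 for identical composition, 1 for no shared taxa."""
    taxa = set(site_a) | set(site_b)
    shared = sum(min(site_a.get(t, 0), site_b.get(t, 0)) for t in taxa)
    total = sum(site_a.values()) + sum(site_b.values())
    if total == 0:
        return 0.0
    return 1 - 2 * shared / total


# Hypothetical read counts per taxon in two soil samples.
plot_1 = {"otu_1": 6, "otu_2": 3, "otu_3": 1}
plot_2 = {"otu_1": 2, "otu_3": 1, "otu_4": 7}

print(rank_abundance(plot_1))
print(round(bray_curtis(plot_1, plot_2), 2))
```

Computing such a dissimilarity for every pair of plots and relating it to the distance between them is one common way to quantify spatial turnover, often referred to as beta-diversity.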
