421

Storytelling vs. Dashboards – Wie Sie die richtige Methode zur Datenvisualisierung auswählen / Storytelling vs. dashboards – how to choose the right data visualization method

Sieben, Swen, Simmering, Paul 10 March 2022 (has links)
From the contents: "Data visualization is becoming ever more important in communication. Especially during the coronavirus pandemic, data visualization has played a central role in communicating the situation and its dynamics. As data is collected and analyzed with ever newer methods, it is important to prepare it in a way that suits its audience."
422

Impact Evaluation by Using Text Mining and Sentiment Analysis

Stuetzer, Cathleen M., Jablonka, Marcel, Gaaw, Stephanie 03 September 2020 (has links)
Web surveys in higher education are particularly important for assessing the quality of academic teaching and learning. Traditionally, mainly quantitative data is used for quality assessment. Increasingly, questions are being raised about the impact of the attitudes of the individuals involved. The analysis of open-ended text responses in web surveys therefore offers particular potential for impact evaluation. Although qualitative text mining and sentiment analysis are already established in other research areas, these instruments are only slowly finding their way into evaluation research. On the one hand, there is a lack of methodological expertise for dealing with large numbers of text responses (e.g. via semantic analysis, linguistically supported coding, etc.). On the other hand, deficits in interdisciplinary expertise make it difficult to contextualize the results. The following contribution aims to address these issues: it contributes to the field of impact evaluation and reveals methodological implications for the development of text mining and sentiment analysis in evaluation processes.
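To illustrate the kind of analysis the abstract refers to, the following minimal Python sketch scores open-ended survey responses with a small hand-made sentiment lexicon. The word lists, example responses, and scoring rule are invented for demonstration and are not taken from the study; a real analysis would use a validated lexicon or a trained model.

    # Minimal lexicon-based sentiment scoring of open-ended survey responses.
    # The word lists and example responses below are illustrative only.
    POSITIVE = {"clear", "helpful", "engaging", "excellent", "good"}
    NEGATIVE = {"confusing", "boring", "unclear", "poor", "slow"}

    def sentiment_score(text: str) -> float:
        """Return a score in [-1, 1]: positive minus negative hits, normalized."""
        tokens = [t.strip(".,!?").lower() for t in text.split()]
        pos = sum(t in POSITIVE for t in tokens)
        neg = sum(t in NEGATIVE for t in tokens)
        return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

    responses = [
        "The lectures were clear and the examples helpful.",
        "Online sessions were boring and the platform was slow.",
    ]
    for r in responses:
        print(f"{sentiment_score(r):+.2f}  {r}")
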
423

Fouille de données textuelles et systèmes de recommandation appliqués aux offres d'emploi diffusées sur le web / Text mining and recommender systems applied to job postings

Séguéla, Julie 03 May 2012 (has links)
L'expansion du média Internet pour le recrutement a entraîné ces dernières années la multiplication des canaux dédiés à la diffusion des offres d'emploi. Dans un contexte économique où le contrôle des coûts est primordial, évaluer et comparer les performances des différents canaux de recrutement est devenu un besoin pour les entreprises. Cette thèse a pour objectif le développement d'un outil d'aide à la décision destiné à accompagner les recruteurs durant le processus de diffusion d'une annonce. Il fournit au recruteur la performance attendue sur les sites d'emploi pour un poste à pourvoir donné. Après avoir identifié les facteurs explicatifs potentiels de la performance d'une campagne de recrutement, nous appliquons aux annonces des techniques de fouille de textes afin de les structurer et d'en extraire de l'information pertinente pour enrichir leur description au sein d'un modèle explicatif. Nous proposons dans un second temps un algorithme prédictif de la performance des offres d'emploi, basé sur un système hybride de recommandation, adapté à la problématique de démarrage à froid. Ce système, basé sur une mesure de similarité supervisée, montre des résultats supérieurs à ceux obtenus avec des approches classiques de modélisation multivariée. Nos expérimentations sont menées sur un jeu de données réelles, issues d'une base de données d'annonces publiées sur des sites d'emploi. / In recent years, the expansion of e-recruitment has led to a multiplication of web channels dedicated to job postings. In an economic context where cost control is fundamental, assessing and comparing the performance of recruitment channels has become necessary. The purpose of this work is to develop a decision-making tool intended to guide recruiters while they are posting a job on the Internet. This tool provides recruiters with the expected performance on job boards for a given job offer. First, we identify the potential predictors of the performance of a recruiting campaign. Then, we apply text mining techniques to the job offer texts in order to structure the postings and to extract information relevant to improving their description in a predictive model. The job offer performance prediction algorithm is based on a hybrid recommender system suited to the cold-start problem. This hybrid system, based on a supervised similarity measure, outperforms standard multivariate models. Our experiments are conducted on a real dataset drawn from a database of job postings published on job boards.
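The thesis' hybrid recommender and supervised similarity measure are not reproduced here, but a content-based sketch gives the flavour of the cold-start idea: predict the expected performance of a new, unseen posting from the text-similarity-weighted performance of historical postings. The example postings, performance figures, and TF-IDF/cosine choices are assumptions made for illustration.

    # Content-based cold-start sketch: expected performance of a new job posting
    # as a similarity-weighted average over historical postings.
    # Example postings and performance scores are invented for illustration.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    history = [
        "Data analyst, SQL and reporting, Paris office",
        "Java developer, backend services, remote",
        "Sales representative, B2B software, Lyon",
    ]
    performance = np.array([120.0, 85.0, 40.0])  # e.g. applications received

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(history)

    new_posting = ["Junior data analyst, SQL, dashboards, Paris"]
    sims = cosine_similarity(vectorizer.transform(new_posting), X)[0]

    # Fall back to a uniform average if the new posting shares no vocabulary.
    weights = sims / sims.sum() if sims.sum() > 0 else np.ones_like(sims) / len(sims)
    print("expected performance:", float(weights @ performance))
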
424

Analysing CSR reporting over the years, company size, region, and sector through dictionary-based text mining

Singhvi, Anuj, Jahangoshay Sarijlou, Dorna January 2023 (has links)
As Corporate Social Responsibility (CSR) reports become more prevalent and systematised, there is a strong need for approaches that analyse the contents of these reports. In this thesis, we present two contributions. Firstly, we share a rule-based approach that can serve as a foundation for future supervised learning methods to examine CSR reports and generate predictions. Secondly, we analyse the topic distributions of CSR reports across developing regions, which are hardly covered in the existing literature. The analysis was conducted on a large corpus of more than 500 million words over a sample of roughly 2,500 CSR reports gathered from the Global Reporting Initiative (GRI) database for 2012-17. Using reliable CSR business dictionaries, we determined the absolute and relative frequencies for four topics – Employee, Social Community, Environment and Human Rights. The percentage frequencies of all four topics showed a declining trend by 2017. In most cases, the Employee topic was reported most frequently, followed by Social Community, Environment and Human Rights. This ordering largely held, barring a few exceptions, even when analysed across different dimensions such as company size, region, and sector. We also compared our results with those of several previous studies. To determine whether the reports became easier or harder to understand, we assessed their readability with two indices but found no clear trend. Finally, although media attention to the Environment topic rose around 2016, we did not observe any corresponding increase in its frequency in CSR reporting. We believe dictionary-based text mining of CSR reports can be a powerful way to generate insights for different stakeholders. Companies and their management can use this approach to review their CSR communication strategies, and government and non-government agencies can use it to assess the effectiveness of their policies and to support future decision-making.
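A minimal sketch of the dictionary-based counting step described above follows. The keyword lists are illustrative stand-ins, not the CSR business dictionaries actually used in the thesis, and the input sentence is invented.

    # Dictionary-based topic frequency counting (illustrative keyword lists only).
    import re
    from collections import Counter

    DICTIONARIES = {
        "Employee":         {"employee", "training", "safety", "diversity"},
        "Social Community": {"community", "donation", "volunteer"},
        "Environment":      {"emission", "energy", "waste", "water"},
        "Human Rights":     {"rights", "labor", "discrimination"},
    }

    def topic_frequencies(report_text: str) -> dict:
        """Absolute and relative keyword frequencies per topic for one report."""
        tokens = re.findall(r"[a-z]+", report_text.lower())
        counts = Counter(tokens)
        total = len(tokens)
        result = {}
        for topic, words in DICTIONARIES.items():
            absolute = sum(counts[w] for w in words)
            result[topic] = {"absolute": absolute,
                             "relative": absolute / total if total else 0.0}
        return result

    print(topic_frequencies("Employee training and safety reduced waste and energy use."))
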
425

Bayesian Text Analytics for Document Collections

Walker, Daniel David 15 November 2012 (has links) (PDF)
Modern document collections are too large to annotate and curate manually. As increasingly large amounts of data become available, historians, librarians and other scholars increasingly need to rely on automated systems to efficiently and accurately analyze the contents of their collections and to find new and interesting patterns therein. Modern techniques in Bayesian text analytics are becoming widespread and have the potential to revolutionize the way that research is conducted. Much work has been done in the document modeling community towards this end, though most of it is focused on modern, relatively clean text data. We present research for improved modeling of document collections that may contain textual noise or that may include real-valued metadata associated with the documents. This class of documents includes many historical document collections. Indeed, our specific motivation for this work is to help improve the modeling of historical documents, which are often noisy and/or have historical context represented by metadata. Many historical documents are digitized by means of Optical Character Recognition (OCR) from document images of old and degraded original documents. Historical documents also often include associated metadata, such as timestamps, which can be incorporated in an analysis of their topical content. Many techniques, such as topic models, have been developed to automatically discover patterns of meaning in large collections of text. While these methods are useful, they can break down in the presence of OCR errors. We show the extent to which this performance breakdown occurs. The specific types of analyses covered in this dissertation are document clustering, feature selection, unsupervised and supervised topic modeling for documents with and without OCR errors, and a new supervised topic model that uses Bayesian nonparametrics to improve the modeling of document metadata. We present results in each of these areas, with an emphasis on studying the effects of noise on the performance of the algorithms and on modeling the metadata associated with the documents. In this research we effectively: improve the state of the art in both document clustering and topic modeling; introduce a useful synthetic dataset for historical document researchers; and present analyses that empirically show how existing algorithms break down in the presence of OCR errors.
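For readers unfamiliar with topic models, the following is a plain LDA baseline on a toy corpus, not the dissertation's nonparametric or noise-aware models: it discovers two topics and prints the top words of each. The corpus, topic count, and use of scikit-learn are assumptions made for illustration.

    # Baseline topic-model sketch (plain LDA) on a toy corpus.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the senate passed the tax bill after a long debate",
        "the governor signed the budget and tax reform law",
        "the team won the championship game in overtime",
        "the coach praised the players after the final game",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(X)

    terms = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = [terms[i] for i in weights.argsort()[::-1][:4]]
        print(f"topic {k}: {', '.join(top)}")
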
426

Visualisation de l'évolution d'un domaine scientifique par l'analyse des résumés de publication à l'aide de réseaux neuronaux / Visualizing the evolution of a scientific field by analysing publication abstracts with neural networks

Archambeault, Jean January 2002 (has links)
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
427

Development of an ETL-Pipeline for Automatic Sustainability Data Analysis

Janmontree, Jettarat, Mehta, Aditya, Zadek, Hartmut 14 June 2023 (has links)
As the scientific community and organizations increase their investments in sustainable development, the term is increasingly being used deceptively. To be sustainable, one must address all three aspects: environmental, social, and economic. The release of sustainability reports has generated a vast amount of data on company sustainability practices, and evaluating this data and extracting meaningful information from it demands time and effort. This research aims to create criteria, including lists of keywords, for analyzing sustainability reports. Using these criteria, an application based on the concepts of Extract, Transform, Load (ETL) was developed to automate the process of data analysis. The results generated by the ETL tool can be used to conduct qualitative and quantitative assessments of an organization's sustainability practices and to compare the transparency of sustainability reporting across different industries.
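A minimal ETL-style sketch of the kind of pipeline described above is shown here: extract report text from files, transform it into keyword counts per criterion, and load the results into a CSV file. The folder name, file format, keyword lists, and output file are placeholders, not details of the authors' tool.

    # Minimal ETL sketch: extract report text, transform into keyword counts,
    # load the results into a CSV file. All names and keywords are placeholders.
    import csv
    import re
    from pathlib import Path

    CRITERIA = {
        "environmental": ["emission", "recycling", "renewable"],
        "social":        ["employee", "community", "diversity"],
        "economic":      ["revenue", "investment", "growth"],
    }

    def extract(path: Path) -> str:
        return path.read_text(encoding="utf-8", errors="ignore")

    def transform(text: str) -> dict:
        tokens = re.findall(r"[a-z]+", text.lower())
        return {c: sum(tokens.count(w) for w in words) for c, words in CRITERIA.items()}

    def load(rows: list, out_file: str = "sustainability_scores.csv") -> None:
        with open(out_file, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["report"] + list(CRITERIA))
            writer.writeheader()
            writer.writerows(rows)

    rows = []
    for report in Path("reports").glob("*.txt"):  # assumed folder of plain-text reports
        rows.append({"report": report.name, **transform(extract(report))})
    load(rows)
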
428

Using Genetic Algorithms for Feature Set Selection in Text Mining

Rogers, Benjamin Charles 17 January 2014 (has links)
No description available.
429

Domain-Specific Document Retrieval Framework for Near Real-time Social Health Data

Soni, Swapnil 01 September 2015 (has links)
No description available.
430

Entity Information Extraction using Structured and Semi-structured resources

Sil, Avirup January 2014 (has links)
Among all the tasks that exist in Information Extraction, Entity Linking, also referred to as entity disambiguation or entity resolution, is a new and important problem which has recently caught the attention of many researchers in the Natural Language Processing (NLP) community. The task involves linking/matching a textual mention of a named entity (such as a person or a movie name) to an appropriate entry in a database (e.g. Wikipedia or IMDB). If the database does not contain the entity, the system should return a NIL (out-of-database) value. Existing techniques for linking named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. In this dissertation, we introduce a new framework, called Open-Database Entity Linking (Open-DB EL), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. In experiments on two domains, our Open-DB EL strategies outperform a state-of-the-art Wikipedia EL system by over 25% in accuracy. Existing approaches typically perform EL using a pipeline architecture: they use a Named-Entity Recognition (NER) system to find the boundaries of mentions in text, and an EL system to connect the mentions to entries in structured or semi-structured repositories like Wikipedia. However, the two tasks are tightly coupled, and each type of system can benefit significantly from the kind of information provided by the other. We propose and develop a joint model for NER and EL, called NEREL, that takes a large set of candidate mentions from typical NER systems and a large set of candidate entity links from EL systems, and ranks the candidate mention-entity pairs together to make joint predictions. In NER and EL experiments across three datasets, NEREL significantly outperforms or comes close to the performance of two state-of-the-art NER systems, and it outperforms 6 competing EL systems. On the benchmark MSNBC dataset, NEREL provides a 60% reduction in error over the next best NER system and a 68% reduction in error over the next-best EL system. We also extend the idea of using semi-structured resources to a relatively less explored area of entity information extraction. Most previous work on information extraction from text has focused on named-entity recognition, entity linking, and relation extraction. Much less attention has been paid to extracting the temporal scope for relations between named entities; for example, the relation president-Of (John F. Kennedy, USA) is true only in the time-frame (January 20, 1961 - November 22, 1963). In this dissertation we present a system for temporal scoping of relational facts, called TSRF, which is trained via distant supervision on the largest semi-structured resource available: Wikipedia. TSRF employs language models consisting of patterns automatically bootstrapped from sentences collected from Wikipedia pages that contain the main entity of a page and slot-fillers extracted from the infobox tuples. The proposed system achieves state-of-the-art results on 6 out of 7 relations on the benchmark Text Analysis Conference (TAC) 2013 dataset for the task of temporal slot filling (TSF). Overall, the system outperforms the next best system that participated in the TAC evaluation by 10 points on the TAC-TSF evaluation metric. / Computer and Information Science
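The Open-DB EL and NEREL systems themselves are not reproduced here, but a toy sketch of the core entity-linking step helps make the task concrete: link a textual mention to the best-matching entry in a small database, or return NIL when no candidate is similar enough. The database entries, the string-similarity ranking, and the threshold are assumptions made purely for illustration.

    # Toy entity-linking sketch: rank database entries against a mention by string
    # similarity and fall back to NIL below a threshold. Entities are illustrative.
    from difflib import SequenceMatcher

    DATABASE = {
        "e1": "John F. Kennedy",
        "e2": "John Kerry",
        "e3": "The Godfather",
    }

    def link(mention: str, threshold: float = 0.6) -> str:
        def sim(a: str, b: str) -> float:
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()
        best_id, best_score = max(
            ((eid, sim(mention, name)) for eid, name in DATABASE.items()),
            key=lambda pair: pair[1],
        )
        return best_id if best_score >= threshold else "NIL"

    print(link("J. F. Kennedy"))   # close match in the database -> e1
    print(link("Pulp Fiction"))    # no sufficiently similar entry -> NIL
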
