421 |
Concept Based Knowledge Discovery From Biomedical Literature. Radovanovic, Aleksandar, January 2009.
Philosophiae Doctor - PhD / Advances in biomedical research and the continuous growth of scientific literature available in electronic form call for innovative methods and tools for information management, knowledge discovery, and data integration. Many biomedical fields such as genomics, proteomics, metabolomics, and genetics, along with emerging disciplines like systems biology and conceptual biology, require synergy between experimental, computational, data mining, and text mining technologies. The large amount of biomedical information held in repositories such as the US National Library of Medicine Bibliographic Database emerges as a potential source of textual data for knowledge discovery. Text mining, the application of natural language processing and machine learning to problems of knowledge discovery, is one of the most challenging fields in bioinformatics. This thesis introduces novel methods for knowledge discovery and presents a software system that extracts information from biomedical literature, reveals interesting connections between various biomedical concepts and, in so doing, generates new hypotheses. The experimental results obtained with the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. The thesis shows how the technology presented can be integrated with researchers' own knowledge, experimentation and observations for optimal progression of scientific research.
|
422 |
Storytelling vs. Dashboards – Wie Sie die richtige Methode zur Datenvisualisierung auswählen / Storytelling vs. Dashboards – How to Choose the Right Data Visualization Method. Sieben, Swen; Simmering, Paul, 10 March 2022.
From the contents:
"Data visualization is becoming ever more important in communication. Especially during the Corona pandemic, data visualization has played a central role in communicating the situation and its dynamics. As data are collected and analyzed with ever newer methods, it is important to prepare these data in a way suited to their audience."
|
423 |
Impact Evaluation by Using Text Mining and Sentiment Analysis. Stuetzer, Cathleen M.; Jablonka, Marcel; Gaaw, Stephanie, 03 September 2020.
Web surveys in higher education are particularly important for assessing the quality of academic teaching and learning. Traditionally, mainly quantitative data is used for quality assessment. Increasingly, however, questions are being raised about the impact of the attitudes of the individuals involved. The analysis of open-ended text responses in web surveys therefore offers particular potential for impact evaluation. Although qualitative text mining and sentiment analysis have been introduced in other research areas, these instruments are only slowly finding their way into evaluation research. On the one hand, there is a lack of methodological expertise for dealing with large numbers of text responses (e.g. via semantic analysis, linguistically supported coding, etc.). On the other hand, deficiencies in interdisciplinary expertise make it difficult to contextualize the results. The following contribution aims to address these issues.
The presentation contributes to the field of impact evaluation and reveals methodological implications for the development of text mining and sentiment analysis in evaluation processes.
|
424 |
Fouille de données textuelles et systèmes de recommandation appliqués aux offres d'emploi diffusées sur le web / Text Mining and Recommender Systems Applied to Job Postings. Séguéla, Julie, 03 May 2012.
In recent years, the expansion of Internet recruitment has multiplied the channels dedicated to publishing job postings. In an economic context where cost control is paramount, assessing and comparing the performance of recruitment channels has become a necessity for companies. The purpose of this thesis is to develop a decision-support tool intended to guide recruiters while they post a job advertisement online: it provides the recruiter with the expected performance on job boards for a given vacancy. First, we identify the potential predictors of a recruiting campaign's performance. Then, we apply text mining techniques to the job advertisements in order to structure them and to extract information relevant to enriching their description in an explanatory model. We then propose an algorithm for predicting job posting performance, based on a hybrid recommender system suited to the cold-start problem. This system, built on a supervised similarity measure, outperforms standard multivariate modeling approaches. Our experiments are conducted on a real dataset drawn from a database of advertisements published on job boards.
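The abstract does not give the actual similarity measure or features, but the core idea of a similarity-based recommender for cold-start prediction can be sketched as follows: a new posting with no performance history is scored by a similarity-weighted average over its nearest past postings. All feature vectors and performance values below are hypothetical stand-ins for the text-mined posting attributes the thesis describes.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_performance(new_posting, history, k=2):
    """Cold-start prediction: similarity-weighted mean performance
    of the k past postings most similar to the new one."""
    ranked = sorted(history, key=lambda h: cosine(new_posting, h[0]), reverse=True)[:k]
    weights = [cosine(new_posting, vec) for vec, _ in ranked]
    total = sum(weights)
    if total == 0:
        # No similar posting: fall back to the global mean.
        return sum(perf for _, perf in history) / len(history)
    return sum(w * perf for w, (_, perf) in zip(weights, ranked)) / total

# Hypothetical binary features (e.g. contract type, sector, salary-band flags)
# paired with an observed performance score (e.g. applications received).
history = [([1, 0, 1, 0], 120.0), ([1, 1, 0, 0], 80.0), ([0, 0, 1, 1], 40.0)]
print(predict_performance([1, 0, 1, 1], history))  # → 80.0
```

In the thesis the similarity measure is itself supervised (learned from past campaign outcomes), which is what distinguishes it from the plain cosine used in this sketch.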
|
425 |
Analysing CSR reporting over the years, company size, region, and sector through dictionary-based text mining. Singhvi, Anuj; Jahangoshay Sarijlou, Dorna, January 2023.
As Corporate Social Responsibility (CSR) reports become more prevalent and systematised, there is a strong need for approaches that analyse the contents of these reports. In this thesis, we present two contributions. First, we share a rule-based approach that can serve as a foundation for future supervised learning methods for examining CSR reports and generating predictions. Second, we analyse the topic distributions of CSR reports across developing regions, which are rarely covered in the existing literature. The analysis was conducted on a large corpus of more than 500 million words, drawn from a sample of roughly 2,500 CSR reports gathered from the Global Reporting Initiative (GRI) database for 2012-17. Using established CSR business dictionaries, we determined the absolute and relative frequencies of four topics: Employee, Social Community, Environment and Human Rights. All four topics showed a declining trend in relative frequency by 2017. In most cases, the Employee topic was reported most frequently, followed by Social Community, Environment and Human Rights. This ordering largely held, barring a few exceptions, when analysed by company size, region, and sector. We also compared our results with those of several previous studies. To assess whether the reports became easier or harder to understand, we computed two readability indices but found no clear trend. Finally, although media attention to the Environment topic rose around 2016, we observed no corresponding increase in its frequency in CSR reporting. We believe dictionary-based text mining of CSR reports can be a powerful way to generate insights for different stakeholders. Companies and their management can use this approach to review their CSR communication strategies, 
and government and non-government agencies can use it to check the effectiveness of their policies and to inform future decision-making.
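The dictionary-based counting the thesis describes reduces to a small, mechanical core: tokenize each report, count hits against per-topic keyword lists, and report both absolute counts and frequencies relative to report length. A minimal sketch follows; the tiny dictionaries here are invented placeholders, as the actual CSR business dictionaries contain hundreds of terms per topic.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries; real CSR dictionaries are far larger.
TOPICS = {
    "Employee": {"employee", "training", "safety"},
    "Social Community": {"community", "donation", "volunteer"},
    "Environment": {"emission", "energy", "waste"},
    "Human Rights": {"rights", "labor", "discrimination"},
}

def topic_frequencies(text):
    """Absolute and relative keyword frequency per topic for one report."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    result = {}
    for topic, words in TOPICS.items():
        absolute = sum(counts[w] for w in words)
        result[topic] = (absolute, absolute / total if total else 0.0)
    return result

report = "Employee training and safety come first; we cut waste and energy use."
for topic, (abs_f, rel_f) in topic_frequencies(report).items():
    print(f"{topic}: {abs_f} ({rel_f:.1%})")
```

Relative frequencies are what make reports of different lengths (and hence the year, size, region, and sector comparisons in the thesis) commensurable.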
|
426 |
Bayesian Text Analytics for Document Collections. Walker, Daniel David, 15 November 2012.
Modern document collections are too large to annotate and curate manually. As increasingly large amounts of data become available, historians, librarians and other scholars increasingly need to rely on automated systems to efficiently and accurately analyze the contents of their collections and to find new and interesting patterns therein. Modern techniques in Bayesian text analytics are becoming widespread and have the potential to revolutionize the way that research is conducted. Much work has been done in the document modeling community towards this end, though most of it is focused on modern, relatively clean text data. We present research for improved modeling of document collections that may contain textual noise or that may include real-valued metadata associated with the documents. This class includes many historical document collections; indeed, our specific motivation is to help improve the modeling of historical documents, which are often noisy and/or have historical context represented by metadata. Many historical documents are digitized by means of Optical Character Recognition (OCR) from document images of old and degraded originals. Historical documents also often include associated metadata, such as timestamps, which can be incorporated in an analysis of their topical content. Many techniques, such as topic models, have been developed to automatically discover patterns of meaning in large collections of text. While these methods are useful, they can break down in the presence of OCR errors. We show the extent to which this performance breakdown occurs. The specific analyses covered in this dissertation are document clustering, feature selection, unsupervised and supervised topic modeling for documents with and without OCR errors, and a new supervised topic model that uses Bayesian nonparametrics to improve the modeling of document metadata.
We present results in each of these areas, with an emphasis on studying the effects of noise on the performance of the algorithms and on modeling the metadata associated with the documents. In this research we: improve the state of the art in both document clustering and topic modeling; introduce a useful synthetic dataset for historical document researchers; and present analyses that empirically show how existing algorithms break down in the presence of OCR errors.
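The dissertation measures how OCR errors degrade text-analysis algorithms; one common way to study this (and a plausible basis for the synthetic dataset it mentions, though the abstract gives no details) is to inject character-level confusions into clean text and watch the vocabulary drift. The confusion table and texts below are illustrative assumptions, not the dissertation's actual data.

```python
import random

# A few classic OCR character confusions (hypothetical, simplified table).
OCR_CONFUSIONS = {"e": "c", "l": "1", "o": "0", "m": "rn", "h": "b"}

def corrupt(text, error_rate, rng):
    """Replace confusable characters with OCR-style errors at a given rate."""
    out = []
    for ch in text:
        if ch in OCR_CONFUSIONS and rng.random() < error_rate:
            out.append(OCR_CONFUSIONS[ch])
        else:
            out.append(ch)
    return "".join(out)

def vocab_overlap(a, b):
    """Jaccard similarity between the word vocabularies of two texts."""
    va, vb = set(a.split()), set(b.split())
    return len(va & vb) / len(va | vb)

rng = random.Random(42)  # fixed seed for reproducibility
clean = "the model learns topics from clean historical documents"
for rate in (0.0, 0.2, 0.5):
    noisy = corrupt(clean, rate, rng)
    print(f"error rate {rate}: vocabulary Jaccard {vocab_overlap(clean, noisy):.2f}")
```

Because bag-of-words models such as topic models see each corrupted word as a brand-new vocabulary item, even modest character error rates fragment the vocabulary, which is the mechanism behind the performance breakdown the dissertation quantifies.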
|
427 |
Visualisation de l'évolution d'un domaine scientifique par l'analyse des résumés de publication à l'aide de réseaux neuronaux / Visualizing the Evolution of a Scientific Field by Analysing Publication Abstracts with Neural Networks. Archambeault, Jean, January 2002.
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
|
428 |
Development of an ETL-Pipeline for Automatic Sustainability Data Analysis. Janmontree, Jettarat; Mehta, Aditya; Zadek, Hartmut, 14 June 2023.
As the scientific community and organizations increase their investments in sustainable development, the term is increasingly being used deceptively. To be sustainable, an organization must address all three aspects: environmental, social, and economic. The release of sustainability reports has generated a vast amount of data on company sustainability practices, and evaluating this data to extract meaningful information demands time and effort. This research aims to create criteria, including a list of keywords, for analysing sustainability reports. Using these criteria, we developed an application based on the concepts of Extract, Transform, Load (ETL) to automate the data analysis process. The results generated by the ETL tool can be used to conduct qualitative and quantitative assessments of an organization's sustainability practices and to compare the transparency of sustainability reporting across industries.
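The abstract names the ETL pattern but not its implementation, so the pipeline's shape can only be sketched: extract raw report text, transform it into per-criterion keyword counts, and load the results into a tabular output for assessment. The criteria keywords, report names, and CSV output format below are all assumptions for illustration.

```python
import csv
import io
import re

# Hypothetical keyword criteria covering the three sustainability aspects.
CRITERIA = {
    "environmental": ["emission", "climate"],
    "social": ["community", "diversity"],
    "economic": ["revenue", "supplier"],
}

def extract(raw_reports):
    """Extract step: in practice this would parse report PDFs;
    plain strings stand in here."""
    for name, text in raw_reports.items():
        yield name, text

def transform(name, text):
    """Transform step: reduce one report to keyword counts per aspect."""
    tokens = re.findall(r"[a-z]+", text.lower())
    row = {"report": name}
    for aspect, keywords in CRITERIA.items():
        row[aspect] = sum(tokens.count(k) for k in keywords)
    return row

def load(rows):
    """Load step: emit the analysis as CSV (a database insert in a real tool)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["report"] + list(CRITERIA))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

reports = {"acme_2023": "We cut emission levels and support community diversity."}
print(load(transform(n, t) for n, t in extract(reports)))
```

Keeping the three stages as separate functions mirrors the ETL decomposition the paper's title refers to, and lets each stage be swapped out (e.g. a PDF parser in extract) without touching the others.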
|
429 |
Using Genetic Algorithms for Feature Set Selection in Text Mining. Rogers, Benjamin Charles, 17 January 2014.
No description available.
|
430 |
Domain-Specific Document Retrieval Framework for Near Real-time Social Health Data. Soni, Swapnil, 01 September 2015.
No description available.
|