91 |
[en] LER: ANNOTATION AND AUTOMATIC CLASSIFICATION OF ENTITIES AND RELATIONS / [pt] LER: ANOTAÇÃO E CLASSIFICAÇÃO AUTOMÁTICA DE ENTIDADES E RELAÇÕES. Grosman, Jonatas dos Santos. 30 November 2017 (has links)
[pt] Diversas técnicas para extração de informações estruturadas de dados em linguagem natural foram desenvolvidas e demonstraram resultados muito satisfatórios. Entretanto, para obterem tais resultados, requerem uma série de atividades que geralmente são feitas de modo isolado, como a anotação de textos para geração de corpora, etiquetamento morfossintático, engenharia e extração de atributos, treinamento de modelos de aprendizado de máquina etc., o que torna onerosa a extração dessas informações, dado o esforço e tempo a serem investidos. O presente trabalho propõe e desenvolve uma plataforma em ambiente web, chamada LER (Learning Entities and Relations) que integra o fluxo necessário para essas atividades, com uma interface que visa a facilidade de uso. Outrossim, o trabalho mostra os resultados da implementação e uso da plataforma proposta. / [en] Many techniques for extracting structured information from natural language data have been developed and have demonstrated their potential, yielding satisfactory results. Nevertheless, to obtain such results, they require a series of activities that are usually done separately, such as text annotation to generate corpora, Part-of-Speech tagging, feature engineering and extraction, machine learning model training, etc., making information extraction a costly activity due to the effort and time it demands. The present work proposes and develops a web-based platform called LER (Learning Entities and Relations) that integrates the workflow needed for these activities, with an interface designed for ease of use. The work also presents the implementation of the proposed platform and the results of its use.
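The abstract describes the individual steps the platform bundles behind one interface (corpus annotation, morphosyntactic tagging, feature extraction, model training). As a rough, hedged illustration of those steps done in isolation, the sketch below uses spaCy; it is not LER's code, and the example sentence and feature set are invented for illustration.

```python
# Hypothetical sketch (not LER's actual code): the kind of isolated steps --
# POS tagging, entity annotation, feature engineering -- that the platform
# integrates behind one interface. Assumes spaCy and its small English model
# are installed (pip install spacy; python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Marie Curie worked at the University of Paris.")

# Morphosyntactic tagging: token-level part-of-speech labels.
pos_tags = [(tok.text, tok.pos_) for tok in doc]

# Entity annotation: spans labeled with entity types.
entities = [(ent.text, ent.label_) for ent in doc.ents]

# Toy feature extraction for a relation candidate (a pair of entities):
# surface features a downstream relation classifier could consume.
def relation_features(e1, e2, doc):
    between = doc[e1.end:e2.start]
    return {
        "distance": e2.start - e1.end,
        "words_between": [t.lemma_ for t in between],
        "types": (e1.label_, e2.label_),
    }

print(pos_tags)
print(entities)
if len(doc.ents) >= 2:
    print(relation_features(doc.ents[0], doc.ents[1], doc))
```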
|
92 |
Bibliotekarien som access point : En undersökning av artificiell intelligens inom svenska bibliotek / The librarian as an access point : A survey of artificial intelligence in Swedish libraries. Borg, Stina; Ferlin, Michael. January 2021 (has links)
Introduction. Artificial intelligence is growing in society at large and within libraries specifically. There are both positive and negative consequences of this development. In this essay, ethical issues concerning bias, transparency and integrity are examined in a Library and Information Science context. Method and theory. Qualitative survey questionnaires, with questions about how the libraries work with AI, the informants' thoughts on ethical problems with it, and how they saw the library's future with AI, were created and sent to employees at research libraries in Sweden. Nine answers to the questionnaires and one article formed the data for analysis. Employing Anthony Giddens's structuration theory, the essay uses concepts such as access point, ontological security and reembedding of trust. Analysis. A qualitative content analysis was carried out on the data, using a thematic sectioning of the text; the themes were developed through content analysis of the data in relation to the previous research presented in the essay. Results. Five themes were identified in the data: bias, integrity, transparency, curation, and media and information literacy. The answers were grouped into these themes and compared to what the previous research said about each subject. The results are presented in a thematic overview where each section analyses the answers within the specific theme. Conclusion. When using and developing AI, libraries can use ethical guidelines and curation to become aware of and counteract bias being built into the systems. An important part of the libraries' work for the development of a democratic society is media and information literacy and teaching about information technology, of which AI and the way it is developed form a part. This is a two-year master's thesis in Library and Information Science.
|
93 |
Automation and Validation of Big Data Generation via Simulation Pipeline for Flexible Assemblies. Adrian, Alexander F. 26 October 2022 (has links)
No description available.
|
94 |
Between the Lines: Writing Ethics Pedagogy. May, Phillip W. "Cactus", IV. 03 July 2018 (has links)
No description available.
|
95 |
Large Language Models as Advanced Data Preprocessors : Transforming Unstructured Text into Fine-Tuning Datasets. Vangeli, Marius. January 2024 (has links)
The digital landscape increasingly generates vast amounts of unstructured textual data, valuable for analytics and various machine learning (ML) applications. These vast stores of data, often likened to digital gold, are challenging to process and utilize. Traditional text processing methods, lacking the ability to generalize, typically struggle with unstructured and unlabeled data. For many complex data management workflows, the solution involves human intervention in the form of manual curation and labeling, a time-consuming process. Large Language Models (LLMs) are AI models trained on vast amounts of text data. They have remarkable Natural Language Processing (NLP) capabilities and offer a promising alternative. This thesis serves as an empirical case study of LLMs as advanced data preprocessing tools. It explores the effectiveness and limitations of using LLMs to automate and refine traditionally challenging data preprocessing tasks, highlighting a critical area of research in data management. An LLM-based preprocessing pipeline, designed to clean and prepare raw textual data for use in ML applications, is implemented and evaluated. The pipeline was applied to a corpus of unstructured text documents extracted from PDFs, with the aim of transforming them into a fine-tuning dataset for LLMs. The efficacy of the LLM-based preprocessing pipeline was assessed by comparing its output against a manually curated benchmark dataset using two text similarity metrics: the Levenshtein distance and the ROUGE score. The findings indicate that although LLMs are not yet capable of fully replacing human curation in complex data management workflows, they substantially improve the efficiency and manageability of preprocessing unstructured textual data.
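As a concrete illustration of the evaluation step described above, the hedged sketch below compares a pipeline output string against a manually curated benchmark string using the two metrics named in the abstract. It is not the thesis code; the rapidfuzz and rouge-score packages and the example strings are assumptions.

```python
# A minimal sketch (under stated assumptions, not the thesis implementation)
# of scoring LLM-preprocessed text against a curated benchmark with the two
# metrics named in the abstract: Levenshtein distance and ROUGE.
from rapidfuzz.distance import Levenshtein
from rouge_score import rouge_scorer

def evaluate(pipeline_output: str, benchmark: str) -> dict:
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge_l = scorer.score(benchmark, pipeline_output)["rougeL"].fmeasure
    return {
        # Normalized edit distance: 0.0 means the strings are identical.
        "levenshtein": Levenshtein.normalized_distance(pipeline_output, benchmark),
        "rougeL_f1": rouge_l,
    }

# Illustrative strings only; real inputs would be document-length texts.
print(evaluate("cleaned text produced by the LLM pipeline",
               "manually curated reference text"))
```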
|
96 |
Text Mining for Pathway Curation. Weber-Genzel, Leon. 17 November 2023 (has links)
Biolog:innen untersuchen häufig Pathways, Netzwerke von Interaktionen zwischen Proteinen und Genen mit einer spezifischen Funktion. Neue Erkenntnisse über Pathways werden in der Regel zunächst in Publikationen veröffentlicht und dann in strukturierter Form in Lehrbüchern, Datenbanken oder mathematischen Modellen weitergegeben. Deren Kuratierung kann jedoch aufgrund der hohen Anzahl von Publikationen sehr aufwendig sein. In dieser Arbeit untersuchen wir wie Text Mining Methoden die Kuratierung unterstützen können. Wir stellen PEDL vor, ein Machine-Learning-Modell zur Extraktion von Protein-Protein-Assoziationen (PPAs) aus biomedizinischen Texten. PEDL verwendet Distant Supervision und vortrainierte Sprachmodelle, um eine höhere Genauigkeit als vergleichbare Methoden zu erreichen. Eine Evaluation durch Expert:innen bestätigt die Nützlichkeit von PEDLs für Pathway-Kurator:innen. Außerdem stellen wir PEDL+ vor, ein Kommandozeilen-Tool, mit dem auch Nicht-Expert:innen PPAs effizient extrahieren können. Drei Kurator:innen bewerten 55,6 % bis 79,6 % der von PEDL+ gefundenen PPAs als nützlich für ihre Arbeit. Die große Anzahl von PPAs, die durch Text Mining identifiziert werden, kann für Forscher:innen überwältigend sein. Um hier Abhilfe zu schaffen, stellen wir PathComplete vor, ein Modell, das nützliche Erweiterungen eines Pathways vorschlägt. Es ist die erste Pathway-Extension-Methode, die auf überwachtem maschinellen Lernen basiert. Unsere Experimente zeigen, dass PathComplete wesentlich genauer ist als existierende Methoden. Schließlich schlagen wir eine Methode vor, um Pathways mit komplexen Ereignisstrukturen zu erweitern. Hier übertrifft unsere neue Methode zur konditionalen Graphenmodifikation die derzeit beste Methode um 13-24% Genauigkeit in drei Benchmarks. Insgesamt zeigen unsere Ergebnisse, dass Deep Learning basierte Informationsextraktion eine vielversprechende Grundlage für die Unterstützung von Pathway-Kurator:innen ist. / Biological knowledge often involves understanding the interactions between molecules, such as proteins and genes, that form functional networks called pathways. New knowledge about pathways is typically communicated through publications and later condensed into structured formats such as textbooks, pathway databases or mathematical models. However, curating updated pathway models can be labour-intensive due to the growing volume of publications. This thesis investigates text mining methods to support pathway curation. We present PEDL (Protein-Protein-Association Extraction with Deep Language Models), a machine learning model designed to extract protein-protein associations (PPAs) from biomedical text. PEDL uses distant supervision and pre-trained language models to achieve higher accuracy than the state of the art. An expert evaluation confirms its usefulness for pathway curators. We also present PEDL+, a command-line tool that allows non-expert users to efficiently extract PPAs. When applied to pathway curation tasks, 55.6% to 79.6% of PEDL+ extractions were found useful by curators. The large number of PPAs identified by text mining can be overwhelming for researchers. To help, we present PathComplete, a model that suggests potential extensions to a pathway. It is the first method based on supervised machine learning for this task, using transfer learning from pathway databases. Our evaluations show that PathComplete significantly outperforms existing methods. Finally, we generalise pathway extension from PPAs to more realistic complex events. 
Here, our novel method for conditional graph modification outperforms the current best by 13-24% accuracy on three benchmarks. We also present a new dataset for event-based pathway extension.
Overall, our results show that deep learning-based information extraction is a promising basis for supporting pathway curators.
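To make the extraction task concrete, the sketch below shows a generic, hedged example of sentence-level protein-protein association classification with a pre-trained transformer from the Hugging Face transformers library. It is not PEDL or PathComplete: the BioBERT checkpoint, the entity-marker scheme and the two labels are illustrative assumptions, and without task-specific fine-tuning the scores carry no meaning.

```python
# Hedged illustration (not PEDL): mark two protein mentions in a sentence and
# let a sequence classifier decide whether the text asserts an association.
# The checkpoint and labels are placeholders; a real system would be
# fine-tuned on (e.g. distantly supervised) PPA data.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL = "dmis-lab/biobert-base-cased-v1.1"  # assumed available; not PEDL's checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

sentence = "We found that [P1] MEK1 [/P1] phosphorylates [P2] ERK2 [/P2] in vitro."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]

# The classification head here is randomly initialized, so these numbers are
# meaningless; the point is only the input/output shape of such a classifier.
print({"no_association": float(probs[0]), "association": float(probs[1])})
```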
|
97 |
A knowledgebase of stress responsive gene regulatory elements in Arabidopsis thaliana. Adam, Muhammed Saleem. January 2011 (has links)
Stress responsive genes play a key role in shaping the manner in which plants process and respond to environmental stress. Their gene products are linked to DNA transcription and its consequent translation into a response product. However, whilst these genes play a significant role in manufacturing responses to stressful stimuli, transcription factors coordinate access to these genes, specifically by accessing a gene's promoter region, which houses transcription factor binding sites. Here transcriptional elements play a key role in mediating responses to environmental stress, where each transcription factor binding site may constitute a potential response to a stress signal. Arabidopsis thaliana, a model organism, can be used to identify the mechanism of how transcription factors shape a plant's survival in a stressful environment. Whilst there are numerous plant stress research groups, globally there is a shortage of publicly available stress responsive gene databases. In addition, a number of previous databases, such as the Generation Challenge Programme's comparative plant stress-responsive gene catalogue, Stresslink and DRASTIC, have become defunct whilst others have stagnated. There is currently a single Arabidopsis thaliana stress response database, STIFDB, which was launched in 2008 and only covers abiotic stresses as handled by major abiotic stress responsive transcription factor families. Its data was sourced from microarray expression databases, contains numerous omissions as well as numerous erroneous entries, and has not been updated since its inception. The Dragon Arabidopsis Stress Transcription Factor database (DASTF) was developed in response to the current lack of stress response gene resources. A total of 2333 entries were downloaded from SWISSPROT, manually curated and imported into DASTF. The entries represent 424 transcription factor families. Each entry has a corresponding SWISSPROT, ENTREZ GENBANK and TAIR accession number. The 5' untranslated regions (UTR) of 417 families were scanned against TRANSFAC's binding site catalogue to identify binding sites. The relational database consists of two tables, namely a transcription factor table and a transcription factor family table, called DASTF_TF and TF_Family respectively. Using a two-tier client-server architecture, a webserver was built with PHP, APACHE and MYSQL, and the data was loaded into these tables with a PYTHON script. The DASTF database contains 60 entries which correspond to biotic stress and 167 which correspond to abiotic stress, while 2106 respond to biotic and/or abiotic stress. Users can search the database using text, family, chromosome and stress type search options. Online tools such as HMMER, CLUSTALW, BLAST and HYDROCALCULATOR have been integrated into the DASTF database. Users can upload sequences to identify which transcription factor family their sequences belong to by using HMMER. The website can be accessed at http://apps.sanbi.ac.za/dastf/ and two updates per year are envisaged.
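The abstract names the two tables (DASTF_TF and TF_Family) and a Python loading script, but not their schemas. The sketch below is therefore a hypothetical reconstruction: it uses SQLite only so it runs without a MySQL server, and every column name and row is invented for illustration.

```python
# A minimal, self-contained sketch of the loading step described above.
# The real system used MySQL behind PHP/Apache; sqlite3 is used here only so
# the example runs without a server, and the column names are assumptions --
# the abstract names the tables but not their schemas.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE TF_Family (
    family_id   INTEGER PRIMARY KEY,
    family_name TEXT NOT NULL
);
CREATE TABLE DASTF_TF (
    tf_id         INTEGER PRIMARY KEY,
    swissprot_acc TEXT,
    genbank_acc   TEXT,
    tair_acc      TEXT,
    chromosome    TEXT,
    stress_type   TEXT,   -- 'biotic', 'abiotic' or 'both'
    family_id     INTEGER REFERENCES TF_Family(family_id)
);
""")

# Illustrative placeholder rows only, not curated DASTF content.
cur.execute("INSERT INTO TF_Family VALUES (1, 'WRKY')")
cur.execute("INSERT INTO DASTF_TF VALUES (1, 'P00000', 'X00000', 'AT0G00000', '1', 'both', 1)")
con.commit()

# The kind of query behind the site's 'search by stress type' option.
for row in cur.execute("SELECT swissprot_acc, stress_type FROM DASTF_TF WHERE stress_type = 'both'"):
    print(row)
```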
|
99 |
Vzdělávací programy pro oblast digitálních knihoven a digitalizace na školách informační vědy a knihovnictví v USA / Educational programs for the digital libraries sphere and digitisation on information and library science schools in the USA. Jilečková, Šárka. January 2015 (has links)
(in English): The aim of the diploma thesis Educational Programs for the Digital Libraries Sphere and Digitisation on Information and Library Science Schools in the USA is to compare the study programmes of selected American schools of information science and librarianship in the field of digital libraries and digitisation and, based on this analysis, to make recommendations for ÚISK FF UK. The theoretical part introduces the competencies required in this field and describes their development in practice and in education. The practical part analyses selected American schools of information studies (the universities of Chapel Hill in North Carolina, Michigan, Boston, Illinois and Syracuse) where digitisation and digital libraries are taught. Based on this analysis, the work concludes with recommendations that ÚISK FF UK could adopt.
|
100 |
Curating news sections in a historical Swedish news corpus. Rekathati, Faton. January 2020 (has links)
The National Library of Sweden uses optical character recognition software to digitize its collections of historical newspapers. The purpose of such software is first to automatically segment text and images from scanned newspaper pages, and second to read the contents of the identified text regions. While the raw text is often digitized successfully, important contextual information regarding whether the text constitutes, for example, a header, a section title or the body text of an article is not captured. These characteristics are easy for a human to distinguish, yet they remain difficult for a machine to recognize. The main purpose of this thesis is to investigate how well section titles in the newspaper Svenska Dagbladet can be classified by using so-called image embeddings as features. A secondary aim is to examine whether section titles become harder to classify in older newspaper data. Lastly, we explore whether manual annotation work can be reduced by using the predictions of a semi-supervised classifier to help in the labeling process. The results indicate that the use of image embeddings helps quite substantially in classifying section titles. Datasets from three different time periods (1990-1997, 2004-2013, and 2017 onwards) were sampled and annotated. The best performing model (XGBoost) achieved macro F1 scores of 0.886, 0.936 and 0.980 for the respective time periods. The results also showed that classification became more difficult on older newspapers. Furthermore, a semi-supervised classifier managed an average precision of 83% with only single section title examples, showing promise as a way to speed up manual annotation of data.
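As an illustration of the core modelling setup (an XGBoost classifier over image-embedding features, scored with macro F1), here is a hedged sketch. The random placeholder embeddings, label scheme and hyperparameters are assumptions; it is not the thesis code or its data.

```python
# A hedged sketch of the experiment described above: train an XGBoost
# classifier on image-embedding features and report macro F1. The embeddings
# below are random placeholders standing in for vectors extracted from
# cropped newspaper text regions.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n, dim = 1000, 512                      # assumed embedding dimensionality
X = rng.normal(size=(n, dim))           # placeholder image embeddings
y = rng.integers(0, 2, size=n)          # 1 = section title, 0 = other region

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
clf.fit(X_tr, y_tr)

print("macro F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
```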
|