61

Towards the creation of a Clinical Summarizer

Gunnarsson, Axel January 2022
While Electronic Medical Records provide extensive information about patients, the sheer volume of data makes it difficult to quickly retrieve the information needed to make accurate assumptions and decisions that directly concern patients’ health. This search process is time-consuming and forces health professionals to focus on a labor-intensive task that diverts their attention from the main task of applying their knowledge to save lives. With the general aim of relieving professionals of the task of finding the information needed for an operational decision, this thesis explores the use of a general BERT model for extractive summarization of Swedish medical records and investigates its capability to extract sentences that convey important information to MRI physicists. To this end, a domain expert evaluation of medical histories was performed, producing the reference summaries used for model evaluation. Three implementations are included in the study. One is TextRank, a prominent unsupervised approach to extractive summarization; the other two are based on clustering and rely on BERT to encode the text. The implementations are evaluated using ROUGE metrics. The results support the use of a general BERT model for extractive summarization of medical records. The results are also discussed in relation to the collected reference summaries, leading to a discussion of potential improvements to the domain expert evaluation and of possibilities for future work on the summarization of clinical documents.
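The ROUGE evaluation mentioned in the abstract can be illustrated with a minimal sketch of ROUGE-1 (unigram overlap between a system summary and a reference summary). The whitespace tokenization, the function name `rouge_1`, and the example sentences are assumptions made for illustration; the abstract does not say which ROUGE variants or tooling the thesis used.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """ROUGE-1 precision, recall, and F1 on lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical system summary and expert reference summary.
scores = rouge_1(
    "patient has a pacemaker",
    "the patient has a pacemaker implant",
)
print(scores)
```

Recall is usually the headline figure for extractive summaries, since it measures how much of the expert reference the extracted sentences cover.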
62

Theater in Motion. An Inquiry into the 'Architectural' through Mobile and Temporary Performance Arrangements

Eitel, Verena Elisabet 06 July 2021
Theaters have long staged work in spaces and places beyond their own venues. What is the appeal of temporarily appropriating sites that are often not specific to theater? What demarcations from purpose-built theater buildings, and from the modes of behavior and action implicit in them, become visible? Five projects, "Rollende Road Schau" (Bert Neumann/Volksbühne Berlin), "The World Is Not Fair – Die Große Weltausstellung 2012" (HAU Berlin/raumlaborberlin), "Faust (to go)/Nathan (to go)" (Düsseldorfer Schauspielhaus) and "shabbyshabby Hotel" (Theater der Welt 2014), are used to show different varieties of a theater in motion. Mobile and temporary performance structures are set in relation to the "architectural" of theater, and the study asks about the shifting of the architectural threshold (Dirk Baecker), which can give rise to new approaches to engaging with and accessing the performing arts.
63

Argument Mining: Claim Annotation, Identification, Verification

Karamolegkou, Antonia January 2021
Researchers writing scientific articles summarize their work in abstracts that state the final outcome of their study. Argumentation mining can be used to extract the researchers' claim as well as the evidence that could support it. The rapid growth in the number of scientific articles demands automated tools that can help detect and evaluate the veracity of scientific claims. However, there are neither many studies focusing on claim identification and verification nor many annotated corpora available to effectively train deep learning models. For this reason, we annotated two argument mining corpora and performed several experiments with state-of-the-art BERT-based models aiming to identify and verify scientific claims. We find that SciBERT provides the best results regardless of the dataset. Furthermore, increasing the amount of training data can improve the performance of every model we used. These findings highlight the need for large-scale argument mining corpora, as well as domain-specific pre-trained models.
64

Analyzing the Anisotropy Phenomenon in Transformer-based Masked Language Models / En analys av anisotropifenomenet i transformer-baserade maskerade språkmodeller

Luo, Ziyang January 2021
In this thesis, we examine in detail the anisotropy phenomenon in two popular masked language models, BERT and RoBERTa, and propose a possible explanation for this unreasonable phenomenon. First, we demonstrate that the contextualized word vectors derived from pretrained masked language model-based encoders share a common, perhaps undesirable pattern across layers. Namely, we find cases of persistent outlier neurons within the hidden state vectors of BERT and RoBERTa that consistently bear the smallest or largest values in those vectors. To investigate the source of this information, we introduce a neuron-level analysis method, which reveals that the outliers are closely related to information captured by positional embeddings. Second, we find that a simple normalization method, whitening, can make the vector space isotropic. Lastly, we demonstrate that "clipping" the outliers or whitening makes it possible to distinguish word senses more accurately and leads to better sentence embeddings when mean pooling is used.
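As a rough illustration of the idea in this abstract, the sketch below measures anisotropy as the mean pairwise cosine similarity of a set of vectors and shows how clipping a single outlier dimension lowers it. Everything here is an assumption for illustration: the vectors are synthetic Gaussian data rather than BERT hidden states, the outlier dimension index and clipping limit are arbitrary, and the thesis may define both anisotropy and clipping differently.

```python
import math
import random
from itertools import combinations


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all vector pairs (higher = more anisotropic)."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def clip(vectors, limit):
    """Clamp every component to the interval [-limit, limit]."""
    return [[max(-limit, min(limit, x)) for x in v] for v in vectors]


rng = random.Random(0)
dim = 16
# Synthetic stand-ins for hidden states: standard normal components.
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
for v in vectors:
    v[3] = 20.0 + rng.gauss(0, 1)  # hypothetical persistent outlier dimension

before = mean_pairwise_cosine(vectors)
after = mean_pairwise_cosine(clip(vectors, 3.0))
print(f"mean cosine before clipping: {before:.3f}, after: {after:.3f}")
```

Because one dimension dominates every vector, all vectors point in nearly the same direction before clipping; bounding that dimension lets the remaining components contribute to the similarity again.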
65

Named-entity recognition with BERT for anonymization of medical records

Bridal, Olle January 2021
Sharing data is an important part of scientific progress in many fields. In natural language processing, a field largely dominated by deep learning, textual resources are in high demand. In certain domains, such as medical records, the sharing of data is limited by ethical and legal restrictions and therefore requires anonymization. Manual anonymization is tedious and expensive, so automated anonymization is of great value. Since medical records consist of unstructured text, pieces of sensitive information have to be identified before they can be masked for anonymization. Named-entity recognition (NER) is the subtask of information extraction in which named entities, such as person names or locations, are identified and categorized. Recently, models that leverage unsupervised training on large quantities of unlabeled data have performed impressively on the NER task, which shows promise for their use in anonymization. In this study, a small set of medical records was annotated with named-entity tags. Because no training data was available, a BERT model already fine-tuned for NER was evaluated on this annotated set. The aim was to find out how well the model would perform NER on medical records, and to explore the possibility of using the model to anonymize medical records. The most positive result was that the model was able to identify all person names in the dataset. The average accuracy across all entity types was, however, relatively low. The success in identifying person names shows promise for the model's application to anonymization. However, because the overall accuracy is significantly worse than that of models fine-tuned on domain-specific data, there may be better methods for anonymization in the absence of relevant training data.
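The masking step described in this abstract can be sketched as follows, assuming an NER model has already returned character spans with entity labels. The function name `mask_entities`, the span format, the label names, and the example record are all hypothetical; the abstract does not describe the thesis's actual masking procedure.

```python
def mask_entities(text, entities):
    """Replace each (start, end, label) character span with a [LABEL] placeholder.

    `end` is exclusive. Spans overlapping an earlier span are skipped.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        if start < cursor:
            continue  # overlaps a span that was already masked
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hypothetical record and hypothetical NER output (character offsets).
record = "Anna Svensson visited Uppsala on Monday."
spans = [(0, 13, "PER"), (22, 29, "LOC")]
masked = mask_entities(record, spans)
print(masked)
```

Replacing entities with their label rather than deleting them keeps the sentence readable, which matters if the anonymized text is to be reused as a language resource.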
66

Building a Personally Identifiable Information Recognizer in a Privacy Preserved Manner Using Automated Annotation and Federated Learning

Hathurusinghe, Rajitha 16 September 2020
This thesis explores the training of a deep neural network-based named entity recognizer in an end-to-end privacy-preserving setting, where dataset creation and model training happen in an environment with minimal manual intervention. As deep learning models become more accurate on practical tasks, a rising concern is how to satisfy their demand for training data amid concerns about data privacy. Several data protection measures have been proposed in the recent past in response to public concerns, along with legal guidelines to enforce them. A promising new development is decentralized model training on isolated datasets, which removes the privacy compromise involved in providing data to a centralized entity. However, in this federated setting, curating the data source is still a privacy risk, mostly for unstructured data sources such as text. We explore the feasibility of automatic dataset annotation for a Named Entity Recognition (NER) task and of training a deep learning model with it in two federated learning settings. We explore the feasibility of using a dataset created in this manner to fine-tune a state-of-the-art deep learning language model for the downstream task of named entity recognition. We also examine how this novel combination of a deep learning NLP model and federated learning deviates from the classical centralized setting. We created an automatically annotated dataset containing around 80,000 sentences, a manually annotated test set, and tools to extend the dataset with more manual annotations. We observed that the noise from automated annotation can be overcome to some extent by increasing the dataset size. We also contributed state-of-the-art NLP model developments to the federated learning framework. Overall, our NER model achieved an F1-score of around 0.80 for recognition of entities in sentences.
67

Automatic Recognition and Classification of Translation Errors in Human Translation / Automatisk igenkänning och klassificering av fel i mänsklig översättning

Dürlich, Luise January 2020
Grading assignments is a time-consuming part of teaching translation. Automatic tools that facilitate this task would allow teachers of professional translation to focus more on other aspects of their job. Within Natural Language Processing, error recognition has not been studied for human translation in particular. This thesis is a first attempt at both error recognition and classification with both mono- and bilingual models. BERT, a pre-trained monolingual language model, and NuQE, a model adapted from the field of Quality Estimation for Machine Translation, are trained on a relatively small hand-annotated corpus of student translations. Due to the nature of the task, errors are quite rare in relation to correctly translated tokens in the corpus. To account for this, we train the models with both under- and oversampled data. While both models detect errors with moderate success, the NuQE model adapts very poorly to the classification setting. Overall, scores are quite low, which can be attributed to class imbalance and the small amount of training data, as well as some general concerns about the corpus annotations. However, we show that powerful monolingual language models can detect formal, lexical and translational errors with some success and that, depending on the model, simple under- and oversampling approaches can already help a great deal to avoid pure majority class prediction.
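The oversampling approach mentioned in this abstract can be illustrated with a minimal random-oversampling sketch: minority-class examples are duplicated at random until every class matches the size of the largest one. The function name `oversample`, the `OK`/`ERR` labels, and the toy tokens are assumptions for illustration, and the thesis may sample at a different granularity (for example whole sentences rather than tokens).

```python
import random
from collections import Counter


def oversample(examples, labels, seed=0):
    """Duplicate minority-class examples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for example, label in zip(examples, labels):
        by_label.setdefault(label, []).append(example)

    target = max(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        # Draw with replacement to fill the gap to the largest class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((example, label) for example in items + extra)

    rng.shuffle(balanced)
    return balanced


# Toy data: one erroneous token among five correct ones (hypothetical labels).
tokens = ["the", "house", "are", "red", "and", "big"]
labels = ["OK", "OK", "ERR", "OK", "OK", "OK"]
balanced = oversample(tokens, labels)
counts = Counter(label for _, label in balanced)
print(counts)
```

Undersampling is the mirror image: instead of duplicating the minority class, majority-class examples are dropped until the classes match, which trades data volume for balance.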
68

Community Recommendation in Social Networks with Sparse Data

Rahmaniazad, Emad 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Recommender systems are widely used in many domains. In this work, the importance of a recommender system in an online learning platform is discussed. After explaining the concept of adding an intelligent agent to online education systems, some features of the Course Networking (CN) website are demonstrated. Finally, the relation between CN, the intelligent agent (Rumi), and the recommender system is presented, along with a discussion of three different approaches to building a community recommendation system. The results show that Neighboring Collaborative Filtering (NCF) outperforms both the transfer learning method and the continuous bag-of-words approach. The NCF algorithm has a general format with two different implementations that can be used for other recommendations, such as course, skill, major, and book recommendations.
69

Using Bert To Measure Objective Quality Of Rest-Api Specifications : Automated Approach For Quality Measurement

Eriksson, Fritz, Åkesson, Max January 2023
Each day, both the need for and the number of network-based applications grow, and with them the implementation of RESTful APIs. All these APIs need documentation of the API's behavior, its benefits, how it interacts with other APIs, and its expected result. To address this, an API specification is constructed. This is a document containing the design philosophy of the APIs, and it can act as a guideline for how they should be constructed. When designing API specifications, it is often difficult to understand what objective quality the API document upholds. To understand the objective quality of an API specification, it must first be understood what good objective quality is in this regard. We used static code tests (linter rules) that are mapped to three quality attributes matching the industry's consensus on the most important quality attributes that must be satisfied for a good-quality API. We then implemented an automatic process that splits API specifications into positive and negative training data using the linter results of the rules. The resulting data is used to train our BERT model. The model is then able to give an objective score to unseen API specifications. We then used a saliency map (textual heatmap) to understand BERT's decisions, which added the potential to generate new linter rules from the given results. After testing unseen API specifications on our BERT model, we saw that it was able to generate a reasonable quality score. However, when smaller features were inserted to generate a textual heatmap, the predictions of our model were not correct, so it was not possible to understand BERT's decisions through our implementation. This also meant that new rules could not be derived from reviewing BERT's results.
70

Towards Building a Versatile Tool for Social Media Spam Detection

Abdel Halim, Jalal 15 June 2023
No description available.
