Building a Personally Identifiable Information Recognizer in a Privacy Preserved Manner Using Automated Annotation and Federated Learning
Hathurusinghe, Rajitha, 16 September 2020
This thesis explores training a deep-neural-network-based named entity recognizer in an end-to-end privacy-preserving setting, where dataset creation and model training take place with minimal manual intervention. As deep learning models become more accurate on practical tasks, meeting their demand for training data while respecting data privacy is a growing concern. In response to public concern, several data protection schemes have been proposed in recent years, along with legal guidelines to enforce them.
A promising new development is decentralized model training on isolated datasets, which avoids the privacy compromise of handing data over to a centralized entity. However, even in this federated setting, curating the data source remains a privacy risk, particularly for unstructured data such as text.
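Decentralized training of this kind is commonly realized with federated averaging (FedAvg): each client trains on its own isolated data and sends back only model parameters, which a server combines weighted by each client's dataset size. The abstract does not state which aggregation rule the thesis uses, so the following is a minimal sketch of FedAvg-style aggregation only, assuming parameters are flattened into plain lists of floats; the function name `federated_average` is hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client parameter vectors into one global vector.

    Assumption: FedAvg-style weighting by local dataset size; this is an
    illustration, not necessarily the aggregation rule used in the thesis.
    Raw training sentences never leave the clients; only parameters do.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        # Each client's contribution is proportional to its share of the data.
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


if __name__ == "__main__":
    # Client B holds three times as much data, so it dominates the average.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a full training round, the server would broadcast the averaged parameters back to the clients, which would then continue local training from them.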
We explore the feasibility of automatically annotating a dataset for a Named Entity Recognition (NER) task and of training a deep learning model on it in two federated learning settings. In particular, we examine whether a dataset created in this manner can be used to fine-tune a state-of-the-art deep learning language model for the downstream task of named entity recognition. We also examine how this novel combination of a deep learning NLP model and federated learning deviates from the classical centralized setting.
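The abstract does not describe how the automatic annotation is performed. One common approach, shown here purely as a hypothetical illustration, is to match tokens against a dictionary (gazetteer) of known entities and emit BIO tags. The function `auto_annotate` and the gazetteer format (tuples of lowercase tokens mapped to labels) are assumptions. Matching of this kind misses entities that are not listed and can mislabel ambiguous strings, which is the sort of noise automated annotation introduces.

```python
from typing import Dict, List, Tuple


def auto_annotate(tokens: List[str],
                  gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest match against a gazetteer.

    Hypothetical scheme: gazetteer maps a tuple of lowercase tokens to an
    entity label, e.g. ("new", "york"): "LOC". Unmatched tokens get "O".
    """
    tags = ["O"] * len(tokens)
    max_len = max((len(k) for k in gazetteer), default=0)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        match_len, label = 0, None
        # Try the longest candidate span first so longer names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in gazetteer:
                match_len, label = n, gazetteer[key]
                break
        if label is None:
            i += 1
            continue
        # First token of the entity gets B-, the rest get I-.
        tags[i] = f"B-{label}"
        for j in range(i + 1, i + match_len):
            tags[j] = f"I-{label}"
        i += match_len
    return tags


if __name__ == "__main__":
    gaz = {("alice",): "PER", ("new", "york"): "LOC"}
    print(auto_annotate(["Alice", "moved", "to", "New", "York"], gaz))
```

Greedy longest-match is only one design choice; real pipelines often combine dictionaries with pattern rules or pretrained taggers to reduce missed entities.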
We created an automatically annotated dataset of around 80,000 sentences, a manually annotated test set, and tools for extending the dataset with further manual annotations. We observed that the noise introduced by automated annotation can be partially overcome by increasing the dataset size. We also contributed state-of-the-art NLP model developments to the federated learning framework. Overall, our NER model achieved an F1-score of around 0.80 for recognizing entities in sentences.
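For NER, an F1-score is usually computed at the entity level: a predicted entity counts as correct only if both its label and its boundaries exactly match a gold entity. The abstract does not specify the evaluation protocol behind the reported figure, so this is a minimal sketch of strict, exact-match entity-level F1 over BIO tags; the helper names `bio_spans` and `entity_f1` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end_exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from one sentence's BIO tags."""
    spans: Set[Span] = set()
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == label
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
            label = None
        if tag.startswith(("B-", "I-")):
            # A stray I- tag without a matching B- is treated as a new entity.
            label, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged, exact-match entity F1 across sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)   # same label and same boundaries
        fp += len(ps - gs)   # predicted but not in gold
        fn += len(gs - ps)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [["B-PER", "O", "B-LOC", "I-LOC"]]
    pred = [["B-PER", "O", "B-LOC", "O"]]
    print(entity_f1(gold, pred))
```

In this example the truncated location span counts as both a false positive and a false negative, so precision and recall are each 0.5 and the F1-score is 0.5.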