121 |
Accounting for Individual Speaker Properties in Automatic Speech Recognition
Elenius, Daniel (January 2010)
In this work, speaker characteristic modeling has been applied in the fields of automatic speech recognition (ASR) and automatic speaker verification (ASV). In ASR, a key problem is that acoustic mismatch between training and test conditions degrades classification performance. In this work, a child exemplifies a speaker not represented in the training data, and methods to reduce the spectral mismatch are devised and evaluated. To reduce the acoustic mismatch, predictive modeling based on spectral speech transformation is applied. Following this approach, a model suitable for a target speaker who is not well represented in the training data is estimated and synthesized by applying vocal tract predictive modeling (VTPM). In this thesis, the traditional static modeling on the utterance level is extended to dynamic modeling. This is accomplished by also operating on sub-utterance units, such as phonemes, phone realizations, sub-phone realizations and sound frames. Initial experiments show that adaptation of an acoustic model trained on adult speech significantly reduced the word error rate of ASR for children, but not to the level of a model trained on children’s speech. Multi-speaker-group training provided an acoustic model that performed recognition for both adults and children within the same model at almost the same accuracy as speaker-group dedicated models, with no added model complexity. In the analysis of the cause of errors, the body height of the child was shown to be correlated with word error rate. A further result is that the computationally demanding iterative recognition process in standard vocal tract length normalization (VTLN) can be replaced by synthetically extending the vocal tract length distribution in the training data. A multi-warp model is trained on the extended data and recognition is performed in a single pass. The accuracy is similar to that of the standard technique.
A concluding experiment in ASR shows that the word error rate can be reduced by extending a static vocal tract length compensation parameter into a temporal parameter track. A key component to reach this improvement was provided by a novel joint two-level optimization process. In the process, the track was determined as a composition of a static and a dynamic component, which were simultaneously optimized on the utterance and sub-utterance level respectively. This had the principal advantage of limiting the modulation amplitude of the track to what is realistic for an individual speaker. The recognition error rate was reduced by 10% relative compared with that of a standard utterance-specific estimation technique. The techniques devised and evaluated can also be applied to other speaker characteristic properties, which exhibit a dynamic nature.
An excursion into ASV led to the proposal of a statistical speaker population model. The model represents an alternative approach for determining the reject/accept threshold in an ASV system instead of the commonly used direct estimation on a set of client and impostor utterances. This is especially valuable in applications where a low false reject or false accept rate is required. In these cases, the number of errors is often too few to estimate a reliable threshold using the direct method. The results are encouraging but need to be verified on a larger database.
QC 20110502 / Pf-Star / KOBRA
|
122 |
Abstractive Summarization of Podcast Transcriptions
Karlbom, Hannes (January 2021)
As podcast episodes are automatically transcribed in this rapidly growing medium, the need has increased for good natural language summarization models that can handle the variety of obstacles presented by the transcriptions and the format. This thesis investigates transformer-based sequence-to-sequence models, where an attention mechanism keeps track of which words in the context are most important to the next-word prediction in the sequence. Different summarization models are investigated on a large-scale open-domain podcast dataset, which presents challenges such as transcription errors, multiple speakers, different genres and structures, as well as long texts. The results show that a sparse attention mechanism using a sliding window yields an increase of 21.6% in average ROUGE-2 F-measure over transformer models that use a short input length with fully connected attention layers.
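The contrast between fully connected and sliding-window attention can be illustrated with a small sketch. It is not the thesis's model: the function names and the `radius` parameter (how many neighbours on each side a position may attend to) are hypothetical, chosen only to show why a sliding window makes long inputs cheaper.

```python
def sliding_window_mask(seq_len, radius):
    """Boolean mask: position i may attend to j only if |i - j| <= radius.

    `radius` is an illustrative parameter, not a setting from the thesis.
    """
    return [[abs(i - j) <= radius for j in range(seq_len)] for i in range(seq_len)]


def attended_pairs(mask):
    """Count the (query, key) pairs that the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    seq_len, radius = 1000, 8
    sparse = attended_pairs(sliding_window_mask(seq_len, radius))
    full = seq_len * seq_len  # fully connected attention scores every pair
    print(f"sliding window: {sparse} pairs, full attention: {full} pairs")
```

With a fixed radius the number of attended pairs grows linearly with the input length, whereas full attention grows quadratically, which is one reason such models can accept longer transcripts.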
|
123 |
Word Vector Representations using Shallow Neural Networks
Adewumi, Oluwatosin (January 2021)
This work highlights some important factors for consideration when developing word vector representations and data-driven conversational systems. The neural network methods for creating word embeddings have gained more prominence than their older, count-based counterparts. However, there are still challenges, such as prolonged training time and the need for more data, especially with deep neural networks. Shallow neural networks, having less depth, appear to have the advantage of lower complexity; however, they also face challenges, such as sub-optimal combinations of hyper-parameters, which produce sub-optimal models. This work, therefore, investigates the following research questions: "How importantly do hyper-parameters influence word embeddings’ performance?" and "What factors are important for developing ethical and robust conversational systems?" In answering the questions, various experiments were conducted using different datasets in different studies. The first study investigates, empirically, various hyper-parameter combinations for creating word vectors and their impact on a few natural language processing (NLP) downstream tasks: named entity recognition (NER) and sentiment analysis (SA). The study shows that optimal performance of embeddings for downstream NLP tasks depends on the task at hand. It also shows that certain combinations give strong performance across the tasks chosen for the study. Furthermore, it shows that reasonably smaller corpora are sufficient, or even produce better models in some cases, and take less time to train and load. This is important, especially now that environmental considerations play a prominent role in ethical research. Subsequent studies build on the findings of the first and explore the hyper-parameter combinations for Swedish and English embeddings for the downstream NER task. The second study presents the new Swedish analogy test set for evaluation of Swedish embeddings.
Furthermore, it shows that character n-grams are useful for Swedish, a morphologically rich language. The third study shows that broad coverage of topics in a corpus appears to be important for producing better embeddings and that noise may be helpful in certain instances, though it is generally harmful. Hence, a relatively smaller corpus can show better performance than a larger one, as demonstrated in the work with the smaller Swedish Wikipedia corpus against the Swedish Gigaword. In the final study, in answering the second question, the argument is made from the point of view of the philosophy of science that the near-elimination of unwanted bias in training data, and the use of fora such as peer review, conferences, and journals to provide the necessary avenues for criticism and feedback, are instrumental for the development of ethical and robust conversational systems.
|
124 |
Argument Mining: Claim Annotation, Identification, Verification
Karamolegkou, Antonia (January 2021)
Researchers writing scientific articles summarize their work in the abstract, mentioning the final outcome of their study. Argumentation mining can be used to extract the researchers' claim as well as the evidence that could support it. The rapid growth of scientific articles demands automated tools that could help in the detection and evaluation of the veracity of scientific claims. However, there are neither many studies focusing on claim identification and verification nor many annotated corpora available to effectively train deep learning models. For this reason, we annotated two argument mining corpora and performed several experiments with state-of-the-art BERT-based models, aiming to identify and verify scientific claims. We find that using SciBERT provides optimal results regardless of the dataset. Furthermore, increasing the amount of training data can improve the performance of every model we used. These findings highlight the need for large-scale argument mining corpora, as well as domain-specific pre-trained models.
|
125 |
Analyzing the Anisotropy Phenomenon in Transformer-based Masked Language Models / En analys av anisotropifenomenet i transformer-baserade maskerade språkmodeller
Luo, Ziyang (January 2021)
In this thesis, we examine in detail the anisotropy phenomenon in two popular masked language models, BERT and RoBERTa. We propose a possible explanation for this unreasonable phenomenon. First, we demonstrate that the contextualized word vectors derived from pretrained masked language model-based encoders share a common, perhaps undesirable pattern across layers. Namely, we find cases of persistent outlier neurons within the hidden state vectors of BERT and RoBERTa that consistently bear the smallest or largest values in said vectors. In an attempt to investigate the source of this information, we introduce a neuron-level analysis method, which reveals that the outliers are closely related to information captured by positional embeddings. Second, we find that a simple normalization method, whitening, can make the vector space isotropic. Lastly, we demonstrate that "clipping" the outliers or whitening makes it possible to distinguish word senses more accurately and leads to better sentence embeddings when mean pooling.
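A toy sketch can show why a persistent outlier neuron makes a vector space look anisotropic and why clipping helps. Everything here is an assumption for illustration: the abstract does not state how anisotropy was measured or how clipping was implemented, so the sketch uses mean pairwise cosine similarity as the measure and clamps one hypothetical outlier coordinate to a fixed range.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all pairs; values near 1 suggest anisotropy."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def clip_dimension(vectors, dim, limit):
    """Clamp coordinate `dim` of every vector to [-limit, limit] (illustrative clipping)."""
    return [
        [max(-limit, min(limit, x)) if i == dim else x for i, x in enumerate(vec)]
        for vec in vectors
    ]


# Toy vectors: dimension 2 plays the role of an outlier neuron that is large everywhere.
vectors = [
    [1.0, 0.0, 20.0],
    [0.0, 1.0, 21.0],
    [-1.0, 0.0, 19.0],
    [0.0, -1.0, 20.0],
]

if __name__ == "__main__":
    print("before clipping:", mean_pairwise_cosine(vectors))
    print("after clipping: ", mean_pairwise_cosine(clip_dimension(vectors, dim=2, limit=0.5)))
```

In this toy data the shared large coordinate dominates every dot product, so all vectors appear to point the same way; once it is clamped, the remaining coordinates determine the similarities again.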
|
126 |
Named-entity recognition with BERT for anonymization of medical records
Bridal, Olle (January 2021)
Sharing data is an important part of the progress of science in many fields. In natural language processing, a field largely dominated by deep learning, textual resources are in high demand. In certain domains, such as that of medical records, the sharing of data is limited by ethical and legal restrictions and therefore requires anonymization. Manual anonymization is tedious and expensive, so automated anonymization is of great value. Since medical records consist of unstructured text, pieces of sensitive information have to be identified in order to be masked for anonymization. Named-entity recognition (NER) is the subtask of information extraction in which named entities, such as person names or locations, are identified and categorized. Recently, models that leverage unsupervised training on large quantities of unlabeled training data have performed impressively on the NER task, which shows promise for their use in anonymization. In this study, a small set of medical records was annotated with named-entity tags. Because of the lack of any training data, a BERT model already fine-tuned for NER was then evaluated on the evaluation set. The aim was to find out how well the model would perform on NER on medical records, and to explore the possibility of using the model to anonymize medical records. The most positive result was that the model was able to identify all person names in the dataset. The average accuracy for identifying all entity types was, however, relatively low. The success in identifying person names is argued to show promise for the model’s application to anonymization. However, because the overall accuracy is significantly worse than that of models fine-tuned on domain-specific data, it is suggested that there might be better methods for anonymization in the absence of relevant training data.
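The masking step that follows entity recognition can be sketched as below. This is only an illustration of the idea, not the study's pipeline: the `(start, end, label)` span format is a hypothetical stand-in for whatever a NER model returns, the spans are assumed not to overlap, and the example sentence is invented.

```python
def mask_entities(text, spans):
    """Replace each (start, end, label) character span with a [LABEL] placeholder.

    Spans are assumed to be non-overlapping; the format is hypothetical.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # keep the text before the entity
        pieces.append(f"[{label}]")        # substitute the entity itself
        cursor = end
    pieces.append(text[cursor:])           # keep whatever follows the last entity
    return "".join(pieces)


if __name__ == "__main__":
    record = "Anna Svensson visited Uppsala."
    predicted = [(0, 13, "PER"), (22, 29, "LOC")]  # invented model output
    print(mask_entities(record, predicted))
```

Because any entity the model misses stays in the text, recall on sensitive entity types matters more for anonymization than overall accuracy, which is consistent with the emphasis the abstract places on person names.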
|
127 |
Improving Solr search with Natural Language Processing : An NLP implementation for information retrieval in Solr / Att förbättra Solr med Natural Language Processing
Lager, Adam (January 2021)
The field of AI is emerging fast, and institutions and companies are pushing the limits of what is possible. Natural Language Processing (NLP) is a branch of AI where the goal is to understand human speech and/or text. In this work, the technology is used to improve Solr, a full-text search engine built on an inverted index. Solr is open source and has integrated OpenNLP, making it a suitable choice for these kinds of operations. NLP-enabled Solr showed strong results compared with the Solr instance currently running on the systems: although it was slightly worse in terms of precision, it excelled at recall and at returning the correct documents.
|
128 |
Automated Digitization and Summarization of Analog Archives : Comparing summaries made by GPT-3 and a human
Linderholm, Maja (January 2022)
This thesis aimed to create a tool that could assist climate researchers in their fieldwork. Through dialog with researchers at Stockholm University, a need for, and interest in, automated digitization and summarization of their handwritten notes was identified. Climate research may require work conducted in the field, and during fieldwork many researchers prefer to take handwritten notes, which can generate large physical archives. A downside of purely physical archives is that the data and knowledge stored in them become less available, which creates a threshold for researchers to use the data, since manually digitizing handwritten texts can be very time-consuming. By the end of the thesis, a software program had been created that could automatically digitize and summarize handwritten texts to save time for researchers. The tool consists of (1) the Google Cloud Vision API, used to digitize a photo of handwritten text by using a convolutional neural network (CNN), and (2) the transformer-based algorithm GPT-3, used to summarize the digitized text. The GPT-3 algorithm provided two different engines, Davinci and Curie. The performance of the algorithms was evaluated with a data set consisting of handwritten texts provided by Stockholm University. The results indicated that the performance of the Google Cloud Vision API was highly correlated with the quality of the image and the style of handwriting. Unusual handwriting led to poor classification of letters, since the algorithm performed badly on shapes that were unfamiliar. A survey was used to evaluate the performance of GPT-3. The survey received 73 responses, in which the subjects graded five summaries of the same text produced by a human and by the GPT-3 engines Davinci and Curie, respectively. The results from the survey indicated that the performance of the Davinci engine was comparable to that of a human, while Curie was not a preferable option.
|
129 |
Fine-grained sentiment analysis of product reviews in Swedish
Westin, Emil (January 2020)
In this study we gather customer reviews from Prisjakt, a Swedish price comparison site, with the goal of studying the relationship between review and rating, known as sentiment analysis. The purpose of the study is to evaluate three different supervised machine learning models on a fine-grained dependent variable representing the review rating. For classification, a binary and a multinomial model are used with the one-versus-one strategy implemented in the Support Vector Machine (SVM), with a linear kernel, evaluated with F1, accuracy, precision and recall scores. We use Support Vector Regression (SVR) by approximating the fine-grained variable as continuous, evaluated using mean squared error (MSE). Furthermore, the three models are evaluated on a balanced and an unbalanced dataset in order to investigate the effects of class imbalance. The results show that the SVR performs better on unbalanced fine-grained data, with the best fine-grained model reaching an MSE of 4.12, compared to the balanced SVR (6.84). The binary SVM model reaches an accuracy of 86.37% and a weighted F1 macro of 86.36% on the unbalanced data, while the balanced binary SVM model reaches approximately 80% for both measures. The multinomial model shows the worst performance due to its inability to handle class imbalance, despite the implementation of class weights. Furthermore, results from feature engineering show that SVR benefits marginally from certain regex conversions, and that tf-idf weighting shows better performance on the balanced sets than on the unbalanced sets.
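The tf-idf weighting mentioned above can be sketched in a few lines. This is the textbook variant, tf × log(N / df), with whitespace tokenization; it is an assumption for illustration and not necessarily the variant or library used in the study, and the two example reviews are invented.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Textbook formulation; library implementations often add smoothing.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {term: (count / len(tokens)) * math.log(n_docs / df[term])
             for term, count in tf.items()}
        )
    return weights


if __name__ == "__main__":
    reviews = ["bra produkt", "dålig produkt"]  # invented examples
    for review, vector in zip(reviews, tfidf(reviews)):
        print(review, vector)
```

A term that occurs in every review, such as "produkt" here, receives weight zero, while terms that separate reviews from one another keep a positive weight; the resulting sparse vectors are the kind of input a linear SVM or SVR would be trained on.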
|
130 |
Automatic Recognition and Classification of Translation Errors in Human Translation / Automatisk igenkänning och klassificering av fel i mänsklig översättning
Dürlich, Luise (January 2020)
Grading assignments is a time-consuming part of teaching translation. Automatic tools that facilitate this task would allow teachers of professional translation to focus more on other aspects of their job. Within Natural Language Processing, error recognition has not been studied for human translation in particular. This thesis is a first attempt at both error recognition and classification with both mono- and bilingual models. BERT (a pre-trained monolingual language model) and NuQE (a model adapted from the field of Quality Estimation for Machine Translation) are trained on a relatively small hand-annotated corpus of student translations. Due to the nature of the task, errors are quite rare in relation to correctly translated tokens in the corpus. To account for this, we train the models with both under- and oversampled data. While both models detect errors with moderate success, the NuQE model adapts very poorly to the classification setting. Overall, scores are quite low, which can be attributed to class imbalance and the small amount of training data, as well as some general concerns about the corpus annotations. However, we show that powerful monolingual language models can detect formal, lexical and translational errors with some success and that, depending on the model, simple under- and oversampling approaches can already help a great deal to avoid pure majority class prediction.
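Random under- and oversampling, the simple rebalancing approaches referred to above, can be sketched as follows. The function, its parameters and the toy labels are hypothetical; in particular, the abstract does not say whether sampling was applied to tokens or to sentences, so the unit of resampling here is an assumption.

```python
import random
from collections import Counter, defaultdict


def resample(examples, labels, mode="over", seed=0):
    """Balance classes by random oversampling ("over") or undersampling ("under").

    Returns a shuffled list of (example, label) pairs with equal class sizes.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example, label in zip(examples, labels):
        by_label[label].append(example)
    sizes = [len(items) for items in by_label.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    balanced = []
    for label, items in by_label.items():
        if mode == "over":
            # Duplicate random minority examples until the class reaches the target size.
            picked = items + [rng.choice(items) for _ in range(target - len(items))]
        else:
            # Keep only a random subset of each class.
            picked = rng.sample(items, target)
        balanced.extend((example, label) for example in picked)
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    examples = list(range(10))
    labels = ["OK"] * 8 + ["ERR"] * 2  # errors are the rare class
    print(Counter(label for _, label in resample(examples, labels, "over")))
    print(Counter(label for _, label in resample(examples, labels, "under")))
```

Oversampling keeps all majority-class data at the cost of repeated minority examples, while undersampling discards data; with a small corpus that trade-off is one plausible reason the two strategies can behave differently depending on the model.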
|