Global ETD Search

11	Nested Noun Phrase Detection in English Text with BERT
Misra, Shweta (January 2023)
In this project, we address the task of identifying nested noun phrases in English sentences, where a phrase is defined as a group of words functioning as one unit in a sentence. Prior research has extensively explored the identification of various phrase types for language understanding and text generation tasks. Our aim is to tackle the novel challenge of identifying nested noun phrases within sentences. To accomplish this, we first review existing work on related topics such as partial parsing and noun phrase identification. We then propose a novel approach based on transformer models to recursively identify nested noun phrases. We fine-tune a pre-trained uncased BERT model to detect phrase structures in a sentence and determine whether they represent noun phrases. The recursive approach merges relevant segments of a sentence and assigns labels to the noun phrases at each step, which allows nested structures to be identified. In evaluation, the model achieves an accuracy of up to 93.6% when all noun phrases are considered in isolation and 90.9% when the predicted phrase structure of the sentence is taken into account. Its recall at these two levels is 83.5% and 81.2%, respectively. Overall, the model proves effective at identifying nested noun phrases, showing the potential of transformer-based models for phrase structure identification. Future research should explore further applications and enhancements of such models in this domain.
Keywords: Phrase detection; nested noun phrase identification; phrase structure identification; sentence parsing; transformer models; machine learning; natural language processing
Subject: Computer and Information Sciences
12	A Comparative Analysis of Whisper and VoxRex on Swedish Speech Data
Fredriksson, Max; Ramsay Veljanovska, Elise (January 2024)
With the constant development of more advanced speech recognition models, it becomes increasingly important to determine which models perform better in specific areas and for specific purposes. This is even more true for low-resource languages such as Swedish, which depend on the progress of models built for the large international languages. Lagerlöf (2022) conducted a comparative analysis of Google's speech-to-text model and NLoS's VoxRex B, concluding that VoxRex was the best for Swedish audio. Since then, OpenAI has released its Automatic Speech Recognition model Whisper, prompting a reassessment of the preferred choice for transcribing Swedish. In this comparative analysis, which uses data from Swedish radio news segments, Whisper performs better than VoxRex in tests on the raw output, a result driven largely by its more proficient sentence construction. It is not possible to conclude which model is better at pure word prediction. The results do, however, favor VoxRex in one respect: it displays lower variability. So although Whisper predicts full text better, the choice of model should be determined by the user's needs.
Keywords: ASR; Automatic Speech Recognition; Swedish Speech Recognition; Speech Recognition Models; Speech-to-Text; Whisper; VoxRex; Wav2Vec; Model Comparison; Transformer Models; Neural Networks; Machine Learning; WER; Word Error Rate; Transcription
Subject: Probability Theory and Statistics
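For reference, the word error rate (WER) listed among the keywords is conventionally defined as the word-level edit distance between a reference transcript and a model's output, divided by the number of words in the reference. The abstract does not say how the authors computed it, so the following is only a minimal sketch of that standard definition. The function name and the plain whitespace tokenization are assumptions made for illustration, and the sketch applies no text normalization (casing, punctuation), which in practice affects the score considerably.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is a plain whitespace split (an assumption),
    and no normalization of case or punctuation is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis (deletion)
                curr[j - 1] + 1,     # extra word in hypothesis (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.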
13	Stylometry: Quantifying Classic Literature for Authorship Attribution: A Machine Learning Approach
Yousif, Jacob; Scarano, Donato (January 2024)
Classic literature is rich linguistically, historically, and culturally, which makes it valuable for future studies. This project therefore selected 48 classic books for a stylometric analysis, adopting an approach from related work: divide the books into text segments, quantify the segments, and analyze the books through the quantified values to understand their linguistic attributes. Beyond this analysis, the project ran classification tasks with two further aims. First, it used the quantified values of the text segments in classification tasks with models such as LightGBM and TabNet to assess how well this approach works for authorship attribution. Second, it applied a state-of-the-art model, RoBERTa, to the segmented texts themselves to evaluate that model's performance in authorship attribution. The results uncovered the characteristics of the books to a reasonable degree. For the authorship attribution tasks, the results suggest that segmenting and quantifying text through stylometric analysis, combined with supervised machine learning algorithms, is practical, although the approach may still require further improvement to achieve optimal performance. Lastly, RoBERTa demonstrated high performance in authorship attribution tasks.
Keywords: Authorship Attribution; Classic Literature Analysis; Clustering; Data Science; Deep Learning; Feature Engineering; Feature Extraction; Gradient Descent; K-Means; LightGBM; Machine Learning; Multiclass Classification; NLP; Neural Network; RoBERTa; Stylometric Analysis; Stylometry; TabNet; t-SNE; Text Mining; Transformer Models
Subject: Computer Sciences; Computer and Information Sciences
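The segment-then-quantify step described in the abstract can be pictured roughly as follows. The abstract does not name the segment size or the stylometric measures the authors used, so everything here is a hypothetical illustration: the function names, the fixed word-count segmentation, and the two features (mean word length and type-token ratio) are assumptions chosen only because they are common in stylometry.

```python
import re


def segment_words(text: str, segment_size: int) -> list[list[str]]:
    """Split a text into consecutive segments of `segment_size` words.

    Words are lowercased runs of letters (an assumed tokenization).
    The final segment may be shorter than `segment_size`.
    """
    if segment_size < 1:
        raise ValueError("segment_size must be positive")
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [words[i:i + segment_size] for i in range(0, len(words), segment_size)]


def quantify(segment: list[str]) -> dict[str, float]:
    """Turn one segment into numeric features (hypothetical feature set)."""
    if not segment:
        raise ValueError("segment must contain at least one word")
    return {
        # Average number of letters per word.
        "mean_word_length": sum(len(w) for w in segment) / len(segment),
        # Distinct words divided by total words: a simple vocabulary-richness measure.
        "type_token_ratio": len(set(segment)) / len(segment),
    }
```

Each segment's feature dictionary could then serve as one row of a tabular dataset, with the author as the label, for a tabular classifier of the kind the abstract mentions.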
