1

Investigating the selection of example sentences for unknown target words in ICALL reading texts for L2 German

Segler, Thomas M. January 2007
This thesis considers possible criteria for the selection of example sentences for difficult or unknown words in reading texts for students of German as a Second Language (GSL). The examples are intended to be provided within the context of an Intelligent Computer-Aided Language Learning (ICALL) Vocabulary Learning System, where students can choose among several explanation options for difficult words. Some of these options (e.g. glosses) have received a good deal of attention in the ICALL/Second Language (L2) Acquisition literature; in contrast, literature on examples has been the near-exclusive province of lexicographers.

The selection of examples is explored from an educational, L2-teaching point of view: the thesis is intended as a first exploration of what makes an example helpful to the L2 student, from the perspective of L2 teachers. An important motivation for this work is that selecting examples from a dictionary, or randomly from a corpus, has several drawbacks: first, the number of available dictionary examples is limited; second, the examples fail to take into account the context in which the word was encountered; and third, the rationale and precise principles behind the selection of dictionary examples are usually less than clear.

Central to this thesis is the hypothesis that a random selection of example sentences from a suitable corpus can be improved by a guided selection process that takes into account characteristics of helpful examples. This is investigated in an empirical study conducted with teachers of L2 German. The teacher data show that four dimensions are significant criteria amenable to analysis: (a) reduced syntactic complexity, (b) sentence similarity, and provision of (c) significant co-occurrences and (d) semantically related words. Models based on these dimensions are developed using logistic regression analysis and evaluated through two further empirical studies with teachers and students of L2 German.
The results of the teacher evaluation are encouraging: they indicate that, for one of the models, the top-ranked selections perform on the same level as dictionary examples. In addition, the model provides a ranking of potential examples that roughly corresponds to that of experienced teachers of L2 German. The student evaluation confirms and notably improves on the teacher evaluation: the best-performing model of the teacher evaluation significantly outperforms both random corpus selections and dictionary examples (when a penalty for missing entries is included).
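The abstract names four dimensions fed into a logistic regression that predicts how helpful an example sentence is. The thesis's fitted coefficients are not reproduced in this listing, so the sketch below is a minimal illustration only: the weights, feature values, and German sentences are hypothetical, chosen to show how such a model would rank candidate examples.

```python
import math

# Hypothetical weights for the four dimensions from the teacher study.
# The thesis fits these by logistic regression; the values here are invented.
WEIGHTS = {
    "syntactic_complexity": -1.2,   # simpler sentences are preferred
    "sentence_similarity": 0.8,     # similarity to the original reading context
    "significant_cooccurrence": 0.9,
    "related_words": 0.6,
}
BIAS = -0.5

def helpfulness(features: dict) -> float:
    """Logistic model: estimated probability that teachers rate an example helpful."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_examples(candidates: list) -> list:
    """Rank candidate example sentences by predicted helpfulness, best first."""
    return [s for s, f in sorted(candidates, key=lambda c: -helpfulness(c[1]))]

# Two invented candidates: a short sentence and a syntactically complex one.
candidates = [
    ("Der Hund bellt laut.",
     {"syntactic_complexity": 0.2, "sentence_similarity": 0.7,
      "significant_cooccurrence": 0.8, "related_words": 0.5}),
    ("Obwohl der Hund, den wir gestern sahen, laut bellte, blieb das Kind ruhig.",
     {"syntactic_complexity": 0.9, "sentence_similarity": 0.4,
      "significant_cooccurrence": 0.3, "related_words": 0.2}),
]
print(rank_examples(candidates))  # the simpler sentence ranks first
```

With these assumed weights, the guided selection favours the syntactically simpler candidate, mirroring the thesis's finding that reduced syntactic complexity is a significant criterion.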
2

Self-Supervised Fine-Tuning of sentence embedding models using a Smooth Inverse Frequency model : Automatic creation of labels with a Smooth Inverse Frequency model

Pellegrini, Vittorio January 2023
Sentence embedding models play a key role in Natural Language Processing. They can be exploited for several tasks such as sentence paraphrasing, sentence similarity, and sentence clustering. Fine-tuning pre-trained models for sentence embedding extraction is a common practice that allows them to reach state-of-the-art performance on downstream tasks. Nevertheless, this practice usually requires labeled data sets. This thesis project aims to overcome this issue by introducing a novel technique for the automatic creation of a target set for fine-tuning sentence embedding models for a specific downstream task. The technique is evaluated on three distinct tasks: sentence paraphrasing, sentence similarity, and sentence clustering.

The results demonstrate a significant improvement in sentence embedding models when employing the Smooth Inverse Frequency technique for automatic extraction and labeling of sentence pairs. In the paraphrasing task, the proposed technique yields an enhancement of 2.3% in F1-score over the baseline results, and a 0.2% improvement in F1-score compared to the ideal scenario in which real labels are used. For the sentence similarity task, the proposed method achieves a Pearson score of 0.71, surpassing the baseline model's score of 0.476, though it falls short of the ideal model trained with human annotations, which attains a Pearson score of 0.845. Regarding the clustering task, from a quantitative standpoint the best model achieves a harmonic mean of DBCV and cophenetic score of 0.693, outperforming the baseline score of 0.671. The qualitative assessment, however, did not demonstrate a substantial improvement for the clustering task, highlighting the need to explore alternative techniques to enhance performance in this area.
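The Smooth Inverse Frequency technique at the core of this thesis computes a sentence embedding as a frequency-weighted average of word vectors, with each word weighted by a/(a + p(w)), followed by removal of the common component shared across sentences. The sketch below shows that general recipe only; the word vectors, unigram probabilities, and toy sentences are invented stand-ins, not the models or data the thesis actually uses.

```python
import numpy as np

# Toy word vectors and unigram probabilities; in practice these come from a
# pre-trained embedding model and corpus counts (values here are hypothetical).
rng = np.random.default_rng(0)
vocab = ["the", "cat", "dog", "sat", "ran"]
vectors = {w: rng.normal(size=8) for w in vocab}
word_prob = {"the": 0.30, "cat": 0.05, "dog": 0.05, "sat": 0.02, "ran": 0.02}
A = 1e-3  # SIF smoothing parameter

def sif_embed(sentences):
    """Smooth Inverse Frequency embeddings: a/(a + p(w))-weighted average of
    word vectors, then removal of the first principal component shared
    across all sentences."""
    emb = np.array([
        np.mean([A / (A + word_prob[w]) * vectors[w] for w in s.split()], axis=0)
        for s in sentences
    ])
    # Common-component removal: project out the first right-singular vector.
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    pc = vt[0]
    return emb - np.outer(emb @ pc, pc)

def cosine(u, v):
    """Cosine similarity between two embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sents = ["the cat sat", "the dog ran", "cat sat"]
E = sif_embed(sents)
print(E.shape)  # (3, 8)
```

Pairwise cosine similarities over such embeddings are what a self-labeling scheme of this kind would threshold to create pseudo-labeled sentence pairs for fine-tuning; the exact pairing and thresholding procedure used in the thesis is not reproduced here.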
