About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
251

Lexis, Discourse Prosodies and the Taking of Stance : A Corpus Study of the Meaning of ‘Self-proclaimed’

Altemark, Mikael January 2011 (has links)
This study describes the semantic and pragmatic characteristics of the attributive adjective self-proclaimed, employing corpus-linguistic methodology to explore its meaning from user-based data. The initial query provided the material from which a lexical profile of the target word was constructed, systematically describing collocational data, semantic preferences, semantic associations and discourse prosodies. Qualitative analysis of sample concordances illustrated the role of the target word in expressing different kinds of meaning-bearing stances. The results demonstrate the importance of context and communicative functionality as constraints determining meaning, identifying the discourse prosodies of self-proclaimed as one of negation, accepted-positive or accepted-negative. Further, the analysis of self-proclaimed as a stance marker indicates that linking the evaluative meanings of extended lexical units to the linguistic description of intersubjective stancetaking is a potentially fruitful avenue for research.
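The lexical profile described above rests on collocation counts: how often other words occur within a small window around the target word. A minimal sketch of that counting step, using invented example text (the corpus line and window size are illustrative assumptions, not the thesis's actual data):

```python
from collections import Counter

def collocates(tokens, target, window=3):
    """Count words co-occurring with `target` within +/- `window` tokens."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

# toy corpus line, invented for illustration
corpus = ("the self-proclaimed expert spoke while another "
          "self-proclaimed guru listened").split()
print(collocates(corpus, "self-proclaimed").most_common(3))
```

Real profiles would rank collocates with an association measure (e.g. MI or log-likelihood) rather than raw frequency, but the windowed counting is the common first step.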
252

Video Game Vocabulary : The effect of video games on Swedish learners' word comprehension

Laveborn, Joel January 2009 (has links)
Video games are very popular among children in the Western world. This study was done in order to investigate whether video games had an effect on 49 Swedish students' comprehension of English words (grades 7-8). The investigation was based on questionnaire and word test data. The questionnaire aimed to measure the frequency with which students were playing video games, and the word test aimed to measure their word comprehension in general. In addition, data from the word test were used to investigate how students explained the words. Depending on their explanations, students were categorized as using either a "video game approach" or a "dictionary approach". The results showed a gender difference, both with regard to the frequency of playing and which types of games were played. Playing video games seemed to increase the students' comprehension of English words, though there was no clear connection between the frequency with which students played video games and the choice of a dictionary or video game approach as an explanation.
253

Semi-Automatic Translation of Medical Terms from English to Swedish : SNOMED CT in Translation / Semiautomatisk översättning av medicinska termer från engelska till svenska : Översättning av SNOMED CT

Lindgren, Anna January 2011 (has links)
The Swedish National Board of Health and Welfare has been overseeing translations of the international clinical terminology SNOMED CT from English to Swedish. This study was performed to find whether semi-automatic methods of translation could produce a satisfactory translation while requiring fewer resources than manual translation. Using the medical English-Swedish dictionary TermColl, translations of select subsets of SNOMED CT were produced by way of translation memory and statistical translation. The resulting translations were evaluated via BLEU score, using translations provided by the Swedish National Board of Health and Welfare as reference, before being compared with each other. The results showed a strong advantage for statistical translation over use of a translation memory; however, overall translation results were far from satisfactory. / The international clinical terminology SNOMED CT has been translated from English to Swedish under the responsibility of the Swedish National Board of Health and Welfare. This study was performed to determine whether semi-automatic translation methods could produce a sufficiently good translation with fewer resources than manual translation. The English-Swedish medical dictionary TermColl was used as the basis for translating subsets of SNOMED CT via translation memory and statistical translation. With the Board's translations as reference, the semi-automatic translations were scored via BLEU. The results showed that statistical translation gave considerably better results than translation with a translation memory, but overall the results were too poor for semi-automatic translation to be recommended in this case.
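BLEU, used above to score the candidate translations against the reference, combines clipped n-gram precisions with a brevity penalty. A simplified sentence-level version (the thesis likely used a standard corpus-level implementation; this hand-rolled sketch only illustrates the metric's shape):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions, times a brevity penalty for short candidates."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(1, len(cand) - n + 1)
        # floor at a tiny value so a zero precision doesn't blow up the log
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(1, len(cand)))
    return bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 3))  # 1.0
```

Production implementations add smoothing for zero n-gram counts instead of the crude floor used here.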
254

Automatic Java Code Generator for Regular Expressions and Finite Automata

Memeti, Suejb January 2012 (has links)
No description available.
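Although no abstract is available, the title describes generating code from regular expressions and finite automata. The typical output of such a generator is a table-driven matcher. A sketch of that idea (in Python rather than the thesis's Java; the DFA here is the classic hand-derived automaton for (a|b)*abb, used purely as an illustration):

```python
# Transition table for the regex (a|b)*abb over the alphabet {a, b} --
# the kind of table a regex-to-code generator might emit.
DFA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
ACCEPT = {3}

def matches(text):
    """Run the DFA over `text`; accept iff we end in an accepting state."""
    state = 0
    for ch in text:
        state = DFA.get((state, ch))
        if state is None:  # symbol outside the alphabet: reject
            return False
    return state in ACCEPT

print(matches("ababb"), matches("abab"))  # True False
```

A full generator would first build the automaton from the regex (e.g. Thompson construction plus subset construction) and then emit source code containing a table like this one.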
255

Clustering the Web : Comparing Clustering Methods in Swedish / Webbklustring : En jämförelse av klustringsmetoder på svenska

Hinz, Joel January 2013 (has links)
Clustering -- automatically sorting -- web search results has been the focus of much attention but is by no means a solved problem, and there is little previous work in Swedish. This thesis studies the performance of three clustering algorithms -- k-means, agglomerative hierarchical clustering, and bisecting k-means -- on a total of 32 corpora, as well as whether clustering web search previews, called snippets, instead of full texts can achieve reasonably decent results. Four internal evaluation metrics are used to assess the data. Results indicate that k-means performs worse than the other two algorithms, and that snippets may be good enough to use in an actual product, although there is ample opportunity for further research on both issues; however, results are inconclusive regarding bisecting k-means vis-à-vis agglomerative hierarchical clustering. Stop word and stemmer usage results are not significant, and do not appear to affect the clustering to any considerable degree.
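The k-means setup evaluated above can be sketched in miniature: represent each snippet as a bag-of-words vector and iterate assignment and centroid updates. Everything below is invented for illustration (toy Swedish snippets, a deterministic two-cluster seeding), not the thesis's corpora or configuration:

```python
from collections import Counter

def vectorize(docs):
    """Bag-of-words count vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    return [[Counter(d.split())[w] for w in vocab] for d in docs]

def kmeans2(vectors, iters=10):
    """Toy 2-means, seeded with the first vector and the vector farthest
    from it (real runs would use random restarts or k-means++)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    centroids = [list(vectors[0]),
                 list(max(vectors, key=lambda v: d2(v, vectors[0])))]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min((0, 1), key=lambda c: d2(v, centroids[c])) for v in vectors]
        for c in (0, 1):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# invented snippets: two about football, two about the stock market
snippets = ["fotboll match mål", "mål fotboll lag", "börs aktie kurs", "aktie kurs vinst"]
print(kmeans2(vectorize(snippets)))  # [0, 0, 1, 1]
```

Bisecting k-means repeatedly applies this two-way split to the largest remaining cluster, which is one reason it often behaves differently from flat k-means.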
256

Exploring NMF and LDA Topic Models of Swedish News Articles

Svensson, Karin, Blad, Johan January 2020 (has links)
The ability to automatically analyze and segment news articles by their content is a growing research field. This thesis explores the unsupervised machine learning method of topic modeling, applied to Swedish news articles, for generating topics that describe and segment articles. Specifically, the algorithms non-negative matrix factorization (NMF) and latent Dirichlet allocation (LDA) are implemented and evaluated. Their usefulness in the news media industry is assessed by their ability to serve as a uniform categorization framework for news articles. This thesis fills a research gap by studying the application of topic modeling to Swedish news articles and contributes by showing that this can yield meaningful results. It is shown that Swedish text data requires extensive data preparation for successful topic models and that nouns, especially common nouns, are the most suitable words to use. Furthermore, the results show that both NMF and LDA are valuable as content analysis tools and categorization frameworks, but they have different characteristics and are hence optimal for different use cases. Lastly, the conclusion is that topic models have issues, since they can generate unreliable topics that could be misleading for news consumers, but that they nonetheless can be powerful methods for organizations to analyze and segment articles efficiently on a grand scale internally. The thesis project is a collaboration with one of Sweden's largest media groups, and its results have led to a topic modeling implementation for large-scale content analysis to gain insight into readers' interests.
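NMF, one of the two algorithms above, factorizes a document-term matrix V into non-negative document-topic (W) and topic-term (H) factors. A minimal multiplicative-update sketch on an invented toy matrix (the thesis presumably used a library implementation with proper preprocessing; LDA is omitted here as it needs considerably more machinery):

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, k, iters=200):
    """Multiplicative-update NMF: V (docs x terms) ≈ W (docs x k) @ H (k x terms),
    minimizing squared Frobenius error with all factors kept non-negative."""
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# toy doc-term counts: docs 0-1 use terms 0-1, docs 2-3 use terms 2-3
V = np.array([[3, 2, 0, 0],
              [4, 1, 0, 0],
              [0, 0, 2, 3],
              [0, 0, 1, 4]], dtype=float)
W, H = nmf(V, k=2)
print(W.argmax(axis=1))  # dominant topic per document
```

In a real pipeline the rows of H, sorted by weight, give the top terms per topic, which is what makes NMF useful as an interpretable categorization framework.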
257

Context matters : Classifying Swedish texts using BERT's deep bidirectional word embeddings

Holmer, Daniel January 2020 (has links)
When classifying texts using a linear classifier, the texts are commonly represented as feature vectors. Previous methods of representing features as vectors have been unable to capture the context of individual words in the texts, in theory leading to a poor representation of natural language. Bidirectional Encoder Representations from Transformers (BERT) uses a multi-headed self-attention mechanism to create deep bidirectional feature representations, able to model the whole context of all words in a sequence. A BERT model uses a transfer learning approach, where it is pre-trained on a large amount of data and can be further fine-tuned for several downstream tasks. This thesis uses one multilingual and two dedicated Swedish BERT models for the task of classifying Swedish texts as being of either easy-to-read or standard complexity in their respective domains. The performance on the text classification task using the different models is then compared both with feature representation methods used in earlier studies and with the other BERT models. The results show that all models performed better on the classification task than the previous methods of feature representation. Furthermore, the dedicated Swedish models show better performance than the multilingual model, with the Swedish model pre-trained on more diverse data outperforming the other.
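The key contrast above is that a static feature vector gives a word one fixed representation, while self-attention mixes each word's vector with its context. A toy single-head attention computation in numpy (identity query/key/value projections, invented 2-dimensional vectors; a deliberately stripped-down illustration, not BERT itself):

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention with identity projections: each output row
    is a softmax-weighted mix of all input rows."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

# the same word vector placed in two different contexts
word = np.array([1.0, 0.0])
ctx_a = np.stack([word, np.array([0.9, 0.1])])
ctx_b = np.stack([word, np.array([-1.0, 0.5])])
out_a = self_attention(ctx_a)[0]
out_b = self_attention(ctx_b)[0]
print(np.allclose(out_a, out_b))  # False: the representation depends on context
```

A static embedding table would return the same vector for `word` in both cases; this context dependence is what "deep bidirectional" representations buy for classification.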
258

Detecting Lexical Semantic Change Using Probabilistic Gaussian Word Embeddings

Moss, Adam January 2020 (has links)
In this work, we test two novel methods of using word embeddings to detect lexical semantic change, attempting to overcome limitations associated with conventional approaches to this problem. Using a diachronic corpus spanning over a hundred years, we generate word embeddings for each decade with the intention of evaluating how meaning changes are represented in embeddings for the same word across time. Our approach differs from previous works in this field in that we encode words as probabilistic Gaussian distributions and bimodal probabilistic Gaussian mixtures, rather than conventional word vectors. We provide a discussion and analysis of our results, comparing the approaches we implemented with those used in previous works. We also conduct further analysis on whether additional information regarding the nature of semantic change can be discerned from particular qualities of the embeddings we generated for our experiments. In our results, we find that encoding words as probabilistic Gaussian embeddings can provide an enhanced degree of reliability with regard to detecting lexical semantic change. Furthermore, we are able to represent additional information regarding the nature of such changes through the variance of these embeddings. Encoding words as bimodal Gaussian mixtures, however, is generally unsuccessful for this task, proving not reliable enough at distinguishing between discrete senses to effectively detect and measure such changes. We provide potential explanations for the results we observe, and propose improvements to our approach that could potentially improve performance.
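With words encoded as Gaussians, change between decades can be measured by a divergence between the two distributions rather than a cosine between two vectors. A sketch using the closed-form KL divergence for diagonal Gaussians (the means and variances below are invented placeholders for one word's embedding in two decades; the thesis's actual comparison measure may differ):

```python
import numpy as np

def kl_diag_gauss(m0, v0, m1, v1):
    """KL(N(m0, diag(v0)) || N(m1, diag(v1))), closed form for
    diagonal-covariance Gaussians; v0 and v1 are variance vectors."""
    return 0.5 * np.sum(v0 / v1 + (m1 - m0) ** 2 / v1 - 1 + np.log(v1 / v0))

# hypothetical embeddings of one word in two decades: a mean shift suggests
# a sense shift, and a larger variance suggests broader/less certain usage
m_1900, v_1900 = np.array([1.0, 0.0]), np.array([0.2, 0.2])
m_2000, v_2000 = np.array([0.2, 0.9]), np.array([0.5, 0.5])
print(float(kl_diag_gauss(m_1900, v_1900, m_2000, v_2000)))
```

The variance term is what lets this representation carry the extra information noted in the abstract: two embeddings with identical means but different variances still diverge.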
259

Improving Transformer-Based Molecular Optimization Using Reinforcement Learning

Chang, PoChun January 2021 (has links)
By formulating the task of property-based molecular optimization as a neural machine translation problem, researchers have been able to apply the Transformer model from the field of natural language processing to generate molecules with desirable properties by making a small modification to a given starting molecule. These results verify the capability of Transformer models in capturing the connection between properties and structural changes in molecular pairs. However, the current research only proposes a Transformer model with fixed parameters that can produce a limited number of optimized molecules. Additionally, the trained Transformer model does not always successfully generate optimized output for every molecule and desirable property constraint given. To push the Transformer model into real applications, where different sets of desirable property constraints in combination with a variety of molecules might need to be optimized, these obstacles need to be overcome first. In this work, we present a framework that uses reinforcement learning to fine-tune the pre-trained Transformer, inducing varied output and leveraging the prior knowledge of the model for challenging data points. Our results show that, depending on the definition of the scoring function, the Transformer model can generate a much larger number of optimized molecules for a data point that is considered challenging to the pre-trained model. Meanwhile, we also show the relation between the sampling size and the efficiency of the framework in yielding desirable outputs, to demonstrate the optimal configuration for future users. Furthermore, we had chemists inspect the generated molecules and found that the reinforcement learning fine-tuning causes a catastrophic forgetting problem that leads our model to generate unstable molecules. Through maintaining the prior knowledge or applying a rule-based scoring component, we demonstrate two strategies that successfully reduce the effect of catastrophic forgetting, as a reference for future research.
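The fine-tuning loop above follows the usual policy-gradient pattern: sample an output, score it, and nudge the model toward high-scoring samples. A toy REINFORCE sketch over a 4-token categorical "policy" (the scoring function, vocabulary, and learning rate are all invented stand-ins for the molecular setting):

```python
import numpy as np

rng = np.random.default_rng(1)

def score(token):
    """Hypothetical scoring function: reward token 2, standing in for a
    molecule that satisfies the desirable property constraint."""
    return 1.0 if token == 2 else 0.0

# toy 'policy': logits over 4 output tokens, fine-tuned with REINFORCE
logits = np.zeros(4)
for step in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    tok = rng.choice(4, p=probs)
    reward = score(tok)
    # gradient of log p(tok) for a softmax policy: one-hot(tok) - probs
    grad = -probs
    grad[tok] += 1.0
    logits += 0.1 * reward * grad

probs = np.exp(logits) / np.exp(logits).sum()
print(probs.argmax())  # the policy now prefers the rewarded token
```

This also makes the catastrophic-forgetting failure mode visible: nothing in the loop anchors the policy to its pre-trained distribution, which is why adding a prior-preserving term or rule-based scoring component, as the thesis does, helps.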
260

Lost in Transcription : Evaluating Clustering and Few-Shot learning for transcription of historical ciphers

Magnifico, Giacomo January 2021 (has links)
While there has been steady development of Optical Character Recognition (OCR) techniques for printed documents, the instruments that provide good-quality transcriptions of handwritten manuscripts through Handwritten Text Recognition (HTR) methods are still some steps behind. With its main focus on historical ciphers (i.e. encrypted documents from the past with various types of symbol sets), this thesis examines the performance of two machine learning architectures developed within the DECRYPT project framework: a clustering-based unsupervised algorithm and a semi-supervised few-shot deep-learning model. Both models are tested on seen and unseen scribes to evaluate the difference in performance and the shortcomings of the two architectures, with the secondary goal of determining the influence of the datasets on performance. An in-depth analysis of the transcription results is performed, with particular focus on the Alchemic and Zodiac symbol sets and on model performance relative to character shape and size. The results show the promising performance of the few-shot architecture compared to the clustering algorithm, with respective average SERs of 0.336 (0.15 and 0.104 on seen data / 0.754 on unseen data) and 0.596 (0.638 and 0.350 on seen data / 0.8 on unseen data).
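The SER figures reported above are, like word error rate in speech recognition, typically computed as edit distance between the hypothesis and reference symbol sequences, normalized by reference length. A minimal sketch (assuming that standard definition; the toy sequences are invented):

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences,
    computed with a single rolling DP row."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (x != y))  # substitution/match
    return dp[-1]

def ser(hypothesis, reference):
    """Symbol Error Rate: edit operations needed per reference symbol."""
    return edit_distance(hypothesis, reference) / len(reference)

ref = list("abcde")
hyp = list("abxde")   # one substitution among five symbols
print(ser(hyp, ref))  # 0.2
```

An SER of 0.336 thus means roughly one in three reference symbols would need to be corrected, which puts the seen/unseen gap in the results into concrete terms.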
