71 |
Efficient Sentiment Analysis and Topic Modeling in NLP using Knowledge Distillation and Transfer Learning / Effektiv sentimentanalys och ämnesmodellering inom NLP med användning av kunskapsdestillation och överföringsinlärning
Malki, George January 2023
This abstract presents a study in which knowledge distillation techniques were applied to a Large Language Model (LLM) to create smaller, more efficient models without sacrificing performance. Three configurations of the RoBERTa model were selected as ”student” models to gain knowledge from a pre-trained ”teacher” model. Several steps were taken to improve the knowledge distillation process, such as copying some weights from the teacher to the student model and defining a custom loss function. The task selected for the knowledge distillation process was sentiment analysis on the Amazon Reviews for Sentiment Analysis dataset. The resulting student models showed promising performance on the sentiment analysis task, capturing sentiment-related information from text. The smallest of the student models obtained 98% of the performance of the teacher model while being 45% lighter and taking less than a third of the time to analyze the entire IMDB Dataset of 50K Movie Reviews. However, the student models struggled to produce meaningful results on the topic modeling task. These results were consistent with the topic modeling results from the teacher model. In conclusion, the study showcases the efficacy of knowledge distillation techniques in enhancing the performance of LLMs on specific downstream tasks. While the models excelled in sentiment analysis, further improvements are needed to achieve desirable outcomes in topic modeling. These findings highlight the complexity of language understanding tasks and emphasize the importance of ongoing research and development to further advance the capabilities of NLP models. / Denna sammanfattning presenterar en studie där kunskapsdestilleringstekniker tillämpades på en stor språkmodell (Large Language Model, LLM) för att skapa mindre och mer effektiva modeller utan att kompromissa på prestandan. 
Tre konfigurationer av RoBERTa-modellen valdes som ”student”-modeller för att inhämta kunskap från en förtränad ”teacher”-modell. Studien mäter även modellernas prestanda på två ”downstream”-uppgifter, sentimentanalys och ämnesmodellering. Flera steg användes för att förbättra kunskapsdestilleringsprocessen, såsom att kopiera vissa vikter från lärarmodellen till studentmodellen och definiera en anpassad förlustfunktion. Uppgiften som valdes för kunskapsdestilleringen var sentimentanalys på datamängden Amazon Reviews for Sentiment Analysis. De resulterande studentmodellerna visade lovande prestanda på sentimentanalysuppgiften genom att fånga upp information relaterad till sentiment från texten. Den minsta av studentmodellerna lyckades erhålla 98% av prestandan hos lärarmodellen samtidigt som den var 45% lättare och tog mindre än en tredjedel av tiden att analysera hela datamängden IMDB Dataset of 50K Movie Reviews. Dock hade studentmodellerna svårt att producera meningsfulla resultat på ämnesmodelleringsuppgiften. Dessa resultat överensstämde med ämnesmodelleringsresultaten från lärarmodellen.
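The abstract mentions a custom loss function for distillation but does not specify it. As a rough illustration only, the sketch below shows one commonly used formulation: a weighted sum of the hard-label cross-entropy and a temperature-scaled KL divergence between teacher and student distributions. The function names, the `temperature` and `alpha` parameters, and their default values are assumptions made for this example, not details taken from the thesis.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student).

    Hypothetical helper: the weighting and temperature are illustrative
    choices, not the loss used in the thesis.
    """
    # Hard-label term: cross-entropy of the student against the gold label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: KL divergence between the softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# A student that matches the teacher pays no soft penalty.
matched = distillation_loss([3.0, -3.0], [3.0, -3.0], label=0)
undecided = distillation_loss([0.0, 0.0], [3.0, -3.0], label=0)
```

In this formulation the factor `temperature ** 2` keeps the soft term on a comparable scale to the hard term as the temperature changes; other weightings are equally plausible.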
|
72 |
Towards a Language Model for Stenography : A Proof of Concept
Langstraat, Naomi Johanna January 2022
The availability of the stenographic manuscripts of Astrid Lindgren has sparked an interest in the creation of a language model for stenography. By its very nature, stenography is low-resource, and the lack of stenographic data calls for a tool that makes ordinary orthographic data usable. The tool presented in this thesis creates stenographic data by manipulating orthographic data. Stenographic data differs from orthographic data in three ways, each corresponding to a type of manipulation. First, stenography is based on a phonetic version of the language; second, it uses its own alphabet, distinct from the normal orthographic one; and third, it uses several techniques to compress the data. The first type of manipulation is carried out with a grapheme-to-phoneme converter. The second uses an orthographic representation of a stenographic alphabet. The third applies compression at the subword, word and phrase levels. Different combinations of these manipulations yield different datasets. Results are measured both as perplexity on a GPT-2 language model and as compression rate on the different datasets. These results show a general decrease in perplexity scores and a slight compression rate across the board. The lower perplexity scores are possibly due to the growth of ambiguity.
|
73 |
The Impact of the Retrieval Text Set for Text Sentiment Classification With the Retrieval-Augmented Language Model REALM / Effekten av hämtningstextsetet för sentimenttextklassificering med den hämtningsförstärkta språkmodellen REALM
Blommegård, Oscar January 2023
Large Language Models (LLMs) have demonstrated impressive results across various language technology tasks. By training on large corpora of diverse text collections from the internet, these models learn to process text effectively, allowing them to acquire comprehensive world knowledge. However, this knowledge is stored implicitly in the parameters of the model, and it is necessary to train ever-larger networks to capture more information. Retrieval-augmented language models have been proposed as a way of improving the interpretability and adaptability of conventional language models by utilizing a separate retrieval text set at prediction time. These models have demonstrated state-of-the-art results on knowledge-intensive tasks such as question-answering and fact-checking. However, their effectiveness in text classification remains unexplored. This study investigates the impact of the retrieval text set on the performance of the retrieval-augmented language model REALM for sentiment text classification tasks. The results indicate that the addition of retrieval text data fails to improve the prediction capabilities of REALM for sentiment text classification tasks. This outcome is mainly due to the difference in functionality of the retrieval mechanisms during pre-training and fine-tuning. During pre-training, the neural knowledge retriever focuses on retrieving factual knowledge such as dates, cities and names to enhance the prediction of the model. During fine-tuning, the retriever aims to retrieve texts that can strengthen the prediction of the text sentiment classification task. The findings suggest that retrieval models may hold limited potential to enhance performance for text sentiment classification tasks. / Stora språkmodeller har visat imponerande resultat inom många olika språkteknologiska uppgifter. 
Genom att träna på stora textmängder från internet lär sig dessa modeller att effektivt processa text, vilket gör att de kan förvärva omfattande världskunskap. Denna kunskap lagras emellertid implicit i modellernas parametrar, och det är nödvändigt att träna allt större nätverk för att fånga mer information. Hämtningsförstärkta språkmodeller (retrieval-augmented language models) har föreslagits som ett sätt att förbättra tolknings- och anpassningsförmågan hos språkmodeller genom att använda en separat hämtningstextmängd (retrieval text set) vid prediktion. Dessa modeller har visat imponerande resultat på kunskapsintensiva uppgifter som frågebesvarande (question-answering) och faktakontroll. Deras effektivitet för textklassificering är dock outforskad. Denna studie undersöker effekten av hämtningstextmängden på prestandan för den hämtningsförstärkta språkmodellen REALM för sentimenttextklassificeringsuppgifter. Resultaten indikerar att användning av hämtningstextmängd vid predicering inte lyckas förbättra REALM:s prediktionsförmåga för sentimenttextklassificeringsuppgifter. Detta beror främst på skillnaden i funktionalitet hos hämtningsmekanismen under förträning och finjustering. Under förträningen fokuserar hämtningsmekanismen på att hämta fakta som datum, städer och namn för att förbättra modellens predicering. Under finjusteringen syftar hämtningsmekanismen till att hämta texter som kan stärka förutsägelsen av sentimenttextklassificeringsuppgiften. Resultaten tyder på att hämtningsförstärkta modeller kan ha begränsad potential att förbättra prestandan för sentimenttextklassificeringsuppgifter.
|
74 |
(A)I want to start a podcast : En designbaserad & kvalitativ studie om AI verktyg i podcastproduktion
Grimberg, Vilhelm, Kenez, Xander January 2024
This study investigates the application and implications of AI-generated content in podcast production. The research particularly explores the use of text-to-speech (TTS) systems and AI language models to simulate authentic-sounding conversations. This study analyzes listener responses to different AI-generated and human-edited podcast episodes through a series of prototypes and interviews with listeners. Findings suggest that listeners often perceive AI-generated conversations as less authentic and natural than human-made ones, especially due to issues like unnatural intonation and a lack of natural discourse markers. Despite these challenges, improvements were noted in later prototypes where manual editing was combined with AI-generated content. This highlights the potential for AI to complement human creativity in podcast production. The study concludes that for AI-generated content to achieve the desired level of authenticity, further involvement of human intuition is necessary. Future research should explore refining AI models to better simulate natural conversation flow and focus on enhancing the nuances of human-like speech. The findings also underline the potential of AI tools to revolutionize podcast production workflows. / Denna studie undersöker användningen och implikationerna av AI-genererat innehåll i podcastproduktion. Forskningen utforskar särskilt användningen av text-till-tal-system (TTS) och AI-språkmodeller för att simulera samtal som låter autentiska. Studien analyserar lyssnarreaktioner på olika AI-genererade och mänskligt redigerade poddavsnitt genom en serie prototyper och intervjuer med lyssnare. Resultaten visar att lyssnare ofta upplever AI-genererade samtal som mindre autentiska och naturliga än de som skapats av människor. Särskilt på grund av problem som onaturliga betoningar och brist på naturliga diskurspartiklar. 
Trots dessa utmaningar märktes förbättringar i senare prototyper där manuell redigering kombinerades med AI-genererat innehåll, vilket belyser potentialen för AI att komplettera mänsklig kreativitet i podcastproduktion. Genom forskningen dras slutsatsen att AI-genererat innehåll kräver ytterligare integration av mänsklig intuition för att uppnå önskad nivå av autenticitet. Framtida forskning bör utforska hur AI-modeller kan förfinas för att bättre simulera naturligt samtalsflöde och fokusera på att förbättra nyanserna i mänskligt tal. Resultaten understryker också potentialen hos AI-verktyg att revolutionera arbetsflödena för podcastproduktion.
|
75 |
Word Classes in Language Modelling
Erikson, Emrik, Åström, Marcus January 2024
This thesis concerns itself with word classes and their application to language modelling. Using a purely statistical Markov model trained on sequences of word classes in the Swedish language, different problems in language engineering are examined. The problems considered are part-of-speech tagging, evaluating text modifiers such as translators with the help of probability measurements and matrix norms, and lastly detecting different types of text using the Fourier transform of cross entropy sequences of word classes. The results show that the word class language model is quite weak by itself but that it is able to improve part-of-speech tagging for 1- and 2-letter models. There are indications that a stronger word class model could aid 3-letter and potentially even stronger models. For evaluating modifiers, the model is often able to distinguish shuffled text, and sometimes translated text, from the original, as well as to assign a score for how much a text has been modified. Future work on this should however take better care to ensure large enough test data. The results from the Fourier approach indicate that a Fourier analysis of the cross entropy sequence between word classes may allow the model to distinguish both AI-generated and translated text from human-written text. Future work on machine learning word class models could be carried out to get further insights into the role of word class models in modern applications. The results could also give interesting insights in linguistic research regarding word classes.
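To make the idea of a Markov model over word classes more concrete, here is a minimal sketch of a first-order (bigram) model over part-of-speech tags, together with the per-transition surprisal sequence whose mean is the cross entropy. The tag set, the toy training data, the add-one smoothing and all function names are assumptions chosen for illustration; the thesis does not state which order, smoothing or tag inventory it uses.

```python
import math
from collections import Counter


def train_bigram(tag_sequences, tagset):
    """Estimate add-one-smoothed P(next_tag | tag) from tag sequences."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in tag_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            context_counts[prev] += 1
    size = len(tagset)

    def prob(prev, nxt):
        return (pair_counts[(prev, nxt)] + 1) / (context_counts[prev] + size)

    return prob


def surprisal_sequence(prob, seq):
    """Surprisal -log2 P(tag_i | tag_{i-1}) for each transition in seq."""
    return [-math.log2(prob(p, n)) for p, n in zip(seq, seq[1:])]


def cross_entropy(prob, seq):
    """Mean surprisal in bits per transition (seq needs at least two tags)."""
    values = surprisal_sequence(prob, seq)
    return sum(values) / len(values)


# Toy data: a hypothetical three-tag inventory, not the thesis's tag set.
TAGSET = ["DET", "NOUN", "VERB"]
TRAINING = [["DET", "NOUN", "VERB"],
            ["DET", "NOUN", "VERB", "DET", "NOUN"]]
model = train_bigram(TRAINING, TAGSET)

ordered = cross_entropy(model, ["DET", "NOUN", "VERB"])
shuffled = cross_entropy(model, ["VERB", "NOUN", "DET"])
```

On this toy data the shuffled tag sequence receives a higher cross entropy than the ordered one, which is the kind of signal the abstract describes using to detect modified text. The list returned by `surprisal_sequence` is the sort of sequence to which a Fourier transform could then be applied.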
|
76 |
Weighted Parsing Formalisms Based on Regular Tree Grammars
Mörbitz, Richard 06 November 2024
This thesis is situated at the boundary between formal language theory, algebra, and natural language processing (NLP).
NLP knows a wide range of language models:
from the simple n-gram models to the recently successful large language models (LLM).
Formal approaches to NLP view natural languages as formal languages, i.e., infinite sets of strings, where each phrase is seen as a string, and they seek finite descriptions of these sets.
Beyond language modeling, NLP deals with tasks such as syntactic analysis (or parsing), translation, information retrieval, and many others.
Solving such tasks using language models involves two steps:
Given a phrase of natural language, the model first builds a representation of the phrase and then computes the solution from that representation.
Formal language models usually employ trees or similar structures as representations, whose evaluation to output values can be elegantly described using algebra.
Chomsky introduced phrase structure grammars, which describe a process of generating strings using rewriting rules.
For modeling natural language, these rules follow an important aspect of its syntax: constituency, i.e., the hierarchical structure of phrases.
The best known grammar formalism is given by context-free grammars (CFG).
However, CFG fail to model discontinuities in constituency, where several non-adjacent parts of a phrase form a subphrase.
For instance, the German sentence “ich war auch einkaufen” can be understood so that “ich auch” is a noun phrase; it is discontinuous because it is interrupted by the verb “war”.
This problem can be solved by employing more expressive grammar formalisms such as linear context-free rewriting systems (LCFRS).
There are also grammar formalisms that generate sets of trees, e.g., regular tree grammars (RTG).
A similar formalism is given by finite-state tree automata (FTA), whose semantics is defined in terms of accepting an input rather than generating it; FTA and RTG have the same expressiveness.
Universal algebra lets us view trees as elements of a term algebra, which can be evaluated to values in another algebra by applying a unique homomorphism.
For instance, the strings generated by a CFG can be obtained by evaluating trees over the rules of the CFG in this way.
Parsing is the problem of computing the constituency structure of a given phrase. Due to the ambiguity of natural language, several such structures may exist.
This problem can be extended by weights such as probabilities in order to compute, for instance, the best constituency structure.
The framework of semiring parsing abstracts from particular weights and is instead parameterized by a semiring, whereby many NLP problems can be obtained by plugging in an appropriate semiring.
However, the semiring parsing algorithm is only applicable to some problem instances. Weighted deductive parsing is a similar framework that employs a different algorithm, and thus its applicability differs.
We introduce a very general language model in the form of the RTG-based language model (RTG-LM) which consists of an RTG and a language algebra.
The RTG generates the constituency structures of a language and, inspired by the initial algebra semantics, the language algebra evaluates these structures to elements of the modeled language; we call these elements syntactic objects.
Through the free choice of the language algebra, many common grammar formalisms, such as CFG and LCFRS, are covered.
We add multioperator monoids, a generalization of semirings, as a weight algebra to RTG-LM and obtain weighted RTG-based language models (wRTG-LM).
This lets us define an abstract weighted parsing problem, called the M-monoid parsing problem.
Its inputs are a wRTG-LM 𝐺 and a syntactic object 𝑎, and the task is to compute all representations that 𝐺 has for 𝑎 using the language algebra.
Then, these representations are evaluated to values in the weight algebra, and the values of all these representations are summed to a single output value.
We propose the M-monoid parsing algorithm to solve this problem. It generalizes both the semiring parsing algorithm and the weighted deductive parsing algorithm in a way that is inspired by Mohri's single-source shortest distance algorithm.
We prove two sufficient conditions for the termination and correctness of our algorithm.
We show that our framework covers semiring parsing, weighted deductive parsing, and other problems from NLP and beyond.
In the second part of this thesis, we explore constituent tree automata (CTA), a generalization of FTA, as a language model that is tailored towards modeling discontinuity.
We show several properties of CTA, including that their constituency parsing problem is an instance of our M-monoid parsing problem and can, for a large class of CTA, be solved by the M-monoid parsing algorithm.
This thesis aims to contribute a unifying formal framework for the specification of language models and NLP tasks.
Through our general M-monoid parsing algorithm, we also provide a means of investigating the algorithmic solvability of problems within this field.
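The semiring parsing framework mentioned in this abstract can be illustrated with a small sketch: one chart-parsing routine, parameterized by a semiring, answers different questions depending on which semiring is plugged in. The code below is a plain CYK parser for grammars in Chomsky normal form. It is not the M-monoid parsing algorithm of the thesis, and the toy grammar, rule weights and all names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
COUNTING = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)


def cyk(words, lexical, binary, start, sr, weight):
    """Semiring-parameterized CYK for a grammar in Chomsky normal form.

    lexical: list of (A, word, p) for rules A -> word
    binary:  list of (A, B, C, p) for rules A -> B C
    weight:  maps a rule's number p to an element of the semiring
    """
    n = len(words)
    chart = {}  # (i, j, A) -> semiring value for A spanning words[i:j]

    def add(key, value):
        chart[key] = sr.plus(chart.get(key, sr.zero), value)

    for i, w in enumerate(words):
        for a, word, p in lexical:
            if word == w:
                add((i, i + 1, a), weight(p))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, b, c, p in binary:
                    left = chart.get((i, k, b), sr.zero)
                    right = chart.get((k, j, c), sr.zero)
                    add((i, j, a), sr.times(weight(p), sr.times(left, right)))
    return chart.get((0, n, start), sr.zero)


# Hypothetical ambiguous toy grammar: S -> S S (0.4) | a (0.6).
LEXICAL = [("S", "a", 0.6)]
BINARY = [("S", "S", "S", 0.4)]
SENTENCE = ["a", "a", "a"]

recognized = cyk(SENTENCE, LEXICAL, BINARY, "S", BOOLEAN, lambda p: True)
n_parses = cyk(SENTENCE, LEXICAL, BINARY, "S", COUNTING, lambda p: 1)
best_prob = cyk(SENTENCE, LEXICAL, BINARY, "S", VITERBI, lambda p: p)
```

With the Boolean semiring the routine acts as a recognizer, with the counting semiring it returns the number of constituency structures, and with the Viterbi semiring it returns the probability of the best one. Only the `Semiring` instance and the `weight` mapping change between the three calls.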
|
77 |
Contextual short-term memory for LLM-based chatbot / Kontextuellt korttidsminne för en LLM-baserad chatbot
Lauri Aleksi Törnwall, Mikael January 2023
The evolution of Language Models (LMs) has enabled building chatbot systems that are capable of human-like dialogues without the need for fine-tuning the chatbot for a specific task. LMs are stateless, which means that an LM-based chatbot does not have a recollection of the past conversation unless it is explicitly included in the input prompt. LMs have limitations in the length of the input prompt, and longer input prompts require more computational and monetary resources, so for longer conversations it is often infeasible to include the whole conversation history in the input prompt. In this project a short-term memory module is designed and implemented to provide the chatbot with the context of the past conversation. We introduce two methods, LimContext and FullContext, for producing an abstractive summary of the conversation history, which captures much of the relevant conversation history in a compact form that can then be supplied with the input prompt in a resource-efficient way. To test these short-term memory implementations in practice, a user study is conducted in which the two methods are introduced to 9 participants. Data is collected during the user study and each participant answers a survey after the conversation. The results are analyzed to assess the user experience of each method, to compare the user experience between the two methods, and to assess the effectiveness of the prompt design for both the answer generation and abstractive summarization tasks. According to the statistical analysis, the FullContext method produced a better user experience, and this finding was in line with the user feedback. / Utvecklingen av LMs har gjort det möjligt att bygga chatbotsystem kapabla till mänskliga dialoger utan behov av att finjustera chatboten för ett specifikt uppdrag. LMs är stateless, vilket betyder att en chatbot baserad på en LM inte sparar tidigare delar av konversationen om de inte uttryckligen ingår i prompten. 
LMs begränsar längden av prompten, och längre prompter kräver mer beräknings- och monetära resurser. Således är det ofta omöjligt att inkludera hela konversationshistoriken i prompten. I detta projekt utarbetas och implementeras en korttidsminnesmodul, vars syfte är att tillhandahålla chatboten kontexten av den tidigare konversationen. Vi introducerar två metoder, LimContext och FullContext, för att ta fram en abstrakt sammanfattning av konversationshistoriken. Sammanfattningen omfattar mycket av det relevanta samtalet i en kompakt form, och kan sedan resurseffektivt förses med den påföljande prompten. För att testa dessa korttidsminnesimplementationer i praktiken genomförs en användarstudie där de två metoderna introduceras för 9 deltagare. Data samlas in under användarstudien. Varje deltagare svarar på en enkät efter samtalet. Resultaten analyseras för att bedöma användarupplevelsen av de två metoderna och användarupplevelsen mellan de två metoderna, och för att bedöma effektiviteten av promptdesignen för både svarsgenerering och abstrakta summeringsuppgifter. Enligt den statistiska analysen gav metoden FullContext en bättre användarupplevelse. Detta fynd var även i linje med användarnas feedback.
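The general idea of a summary-based short-term memory can be sketched as follows: older turns are folded into a running summary, the most recent turns are kept verbatim, and both are prepended to each new prompt. This is a generic illustration and does not reproduce the LimContext or FullContext methods, whose details the abstract does not give. The class, the `keep_last` parameter and the stand-in summarizer are hypothetical; in a real system the summarizer would call a language model.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, text)


class SummaryMemory:
    """Keeps a running summary plus the most recent turns verbatim."""

    def __init__(self, summarize: Callable[[str, List[Turn]], str],
                 keep_last: int = 2):
        self.summarize = summarize
        self.keep_last = keep_last
        self.summary = ""
        self.recent: List[Turn] = []

    def add_turn(self, speaker: str, text: str) -> None:
        self.recent.append((speaker, text))
        excess = len(self.recent) - self.keep_last
        if excess > 0:
            # Fold the oldest turns into the summary and drop them.
            old, self.recent = self.recent[:excess], self.recent[excess:]
            self.summary = self.summarize(self.summary, old)

    def build_prompt(self, user_message: str) -> str:
        lines = []
        if self.summary:
            lines.append("Summary of earlier conversation: " + self.summary)
        lines += [f"{speaker}: {text}" for speaker, text in self.recent]
        lines.append("user: " + user_message)
        return "\n".join(lines)


def naive_summarizer(previous: str, turns: List[Turn]) -> str:
    """Stand-in for an LLM summarization call: concatenate and truncate."""
    joined = " ".join(text for _, text in turns)
    return (previous + " " + joined).strip()[-200:]


memory = SummaryMemory(naive_summarizer, keep_last=2)
memory.add_turn("user", "hello")
memory.add_turn("bot", "hi there")
memory.add_turn("user", "my name is Ada")
prompt = memory.build_prompt("what is my name?")
```

Because the prompt contains only the summary and the last `keep_last` turns, its length stays roughly bounded no matter how long the conversation runs, which is the resource argument made in the abstract.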
|
78 |
Concept oriented biomedical information retrieval
Shen, Wei 08 1900
Le domaine biomédical est probablement le domaine où il y a les ressources les plus riches. Dans ces ressources, on regroupe les différentes expressions exprimant un concept, et définit des relations entre les concepts. Ces ressources sont construites pour faciliter l’accès aux informations dans le domaine. On pense généralement que ces ressources sont utiles pour la recherche d’information biomédicale. Or, les résultats obtenus jusqu’à présent sont mitigés : dans certaines études, l’utilisation des concepts a pu augmenter la performance de recherche, mais dans d’autres études, on a plutôt observé des baisses de performance. Cependant, ces résultats restent difficilement comparables étant donné qu’ils ont été obtenus sur des collections différentes. Il reste encore une question ouverte si et comment ces ressources peuvent aider à améliorer la recherche d’information biomédicale. Dans ce mémoire, nous comparons les différentes approches basées sur des concepts dans un même cadre, notamment l’approche utilisant les identificateurs de concept comme unité de représentation, et l’approche utilisant des expressions synonymes pour étendre la requête initiale. En comparaison avec l’approche traditionnelle de "sac de mots", nos résultats d’expérimentation montrent que la première approche dégrade toujours la performance, mais la seconde approche peut améliorer la performance. En particulier, en appariant les expressions de concepts comme des syntagmes stricts ou flexibles, certaines méthodes peuvent apporter des améliorations significatives non seulement par rapport à la méthode de "sac de mots" de base, mais aussi par rapport à la méthode de Champ Aléatoire Markov (Markov Random Field) qui est une méthode de l’état de l’art dans le domaine. Ces résultats montrent que quand les concepts sont utilisés de façon appropriée, ils peuvent grandement contribuer à améliorer la performance de recherche d’information biomédicale. 
Nous avons participé au laboratoire d’évaluation ShARe/CLEF 2014 eHealth. Notre résultat était le meilleur parmi tous les systèmes participants. / The health and biomedical domain is probably the one with the richest domain resources. In these resources, different expressions are clustered into well-defined concepts. They are designed to facilitate public access to health information and are widely believed to be useful for biomedical information retrieval. However, the results of previous work are mixed: in some studies, concepts slightly improve retrieval performance, while in others degradations are observed. These results are difficult to compare directly because they were obtained on different test collections. It is still unclear whether and how medical information retrieval can benefit from these knowledge resources. In this thesis we compare, within the same framework, two families of approaches to exploiting concepts: using concept IDs as the representation units, or using synonymous concept expressions to expand the original query. Compared to a traditional bag-of-words (BOW) baseline, our experiments on test collections show that using concept IDs always degrades retrieval effectiveness, whereas the second approach can lead to some improvements. In particular, by matching the concept expressions as either strict or flexible phrases, some methods can lead to significant improvements over the BOW baseline and even over the Markov Random Field (MRF) model on most query sets. This study shows experimentally that when concepts are used in a suitable way, they can help improve the effectiveness of medical information retrieval. We participated in the ShARe/CLEF 2014 eHealth Evaluation Lab. Our result was the best among all the participating systems.
|
79 |
Continuous space models with neural networks in natural language processing / Modèles neuronaux pour la modélisation statistique de la langue
Le, Hai Son 20 December 2012
Les modèles de langage ont pour but de caractériser et d'évaluer la qualité des énoncés en langue naturelle. Leur rôle est fondamentale dans de nombreux cadres d'application comme la reconnaissance automatique de la parole, la traduction automatique, l'extraction et la recherche d'information. La modélisation actuellement état de l'art est la modélisation "historique" dite n-gramme associée à des techniques de lissage. Ce type de modèle prédit un mot uniquement en fonction des n-1 mots précédents. Pourtant, cette approche est loin d'être satisfaisante puisque chaque mot est traité comme un symbole discret qui n'a pas de relation avec les autres. Ainsi les spécificités du langage ne sont pas prises en compte explicitement et les propriétés morphologiques, sémantiques et syntaxiques des mots sont ignorées. De plus, à cause du caractère éparse des langues naturelles, l'ordre est limité à n=4 ou 5. Sa construction repose sur le dénombrement de successions de mots, effectué sur des données d'entrainement. Ce sont donc uniquement les textes d'apprentissage qui conditionnent la pertinence de la modélisation n-gramme, par leur quantité (plusieurs milliards de mots sont utilisés) et leur représentativité du contenu en terme de thématique, époque ou de genre. L'usage des modèles neuronaux ont récemment ouvert de nombreuses perspectives. Le principe de projection des mots dans un espace de représentation continu permet d'exploiter la notion de similarité entre les mots: les mots du contexte sont projetés dans un espace continu et l'estimation de la probabilité du mot suivant exploite alors la similarité entre ces vecteurs. Cette représentation continue confère aux modèles neuronaux une meilleure capacité de généralisation et leur utilisation a donné lieu à des améliorations significative en reconnaissance automatique de la parole et en traduction automatique. Pourtant, l'apprentissage et l'inférence des modèles de langue neuronaux à grand vocabulaire restent très couteux. 
Ainsi par le passé, les modèles neuronaux ont été utilisés soit pour des tâches avec peu de données d'apprentissage, soit avec un vocabulaire de mots à prédire limité en taille. La première contribution de cette thèse est donc de proposer une solution qui s’appuie sur la structuration de la couche de sortie sous forme d’un arbre de classification pour résoudre ce problème de complexité. Le modèle se nomme Structured OUtput Layer (SOUL) et allie une architecture neuronale avec les modèles de classes. Dans le cadre de la reconnaissance automatique de la parole et de la traduction automatique, ce nouveau type de modèle a permis d'obtenir des améliorations significatives des performances pour des systèmes à grande échelle et à l'état de l'art. La deuxième contribution de cette thèse est d'analyser les représentations continues induites et de comparer ces modèles avec d'autres architectures comme les modèles récurrents. Enfin, la troisième contribution est d'explorer la capacité de la structure SOUL à modéliser le processus de traduction. Les résultats obtenus montrent que les modèles continus comme SOUL ouvrent des perspectives importantes de recherche en traduction automatique. / The purpose of language models is in general to capture and to model regularities of language, thereby capturing morphological, syntactical and distributional properties of word sequences in a given language. They play an important role in many successful applications of Natural Language Processing, such as Automatic Speech Recognition, Machine Translation and Information Extraction. The most successful approaches to date are based on the n-gram assumption and the adjustment of statistics from the training data by applying smoothing and back-off techniques, notably the Kneser-Ney technique, introduced twenty years ago. In this way, language models predict a word based on its n-1 previous words. 
In spite of their prevalence, conventional n-gram based language models still suffer from several limitations that could be intuitively overcome by consulting human expert knowledge. One critical limitation is that, ignoring all linguistic properties, they treat each word as one discrete symbol with no relation to the others. Another is that, even with a huge amount of data, data sparsity always has an important impact, so the optimal value of n in the n-gram assumption is often 4 or 5, which is insufficient in practice. This kind of model is constructed from the counts of n-grams in the training data. Therefore, the pertinence of these models is conditioned only on the characteristics of the training text (its quantity, and how well it represents the content in terms of theme and date). Recently, one of the most successful attempts to directly learn word similarities has been to use distributed word representations in language modeling, where distributionally similar words, which have semantic and syntactic similarities, are expected to be represented as neighbors in a continuous space. These representations and the associated objective function (the likelihood of the training data) are jointly learned using a multi-layer neural network architecture. In this way, word similarities are learned automatically. This approach has shown significant and consistent improvements when applied to automatic speech recognition and statistical machine translation tasks. A major difficulty with the continuous space neural network based approach remains the computational burden, which does not scale well to the massive corpora that are nowadays available. For this reason, the first contribution of this dissertation is the definition of a neural architecture based on a tree representation of the output vocabulary, namely Structured OUtput Layer (SOUL), which makes such models well suited for large scale frameworks. 
The SOUL model combines the neural network approach with the class-based approach. It achieves significant improvements on both state-of-the-art large scale automatic speech recognition and statistical machine translation tasks. The second contribution is to provide several insightful analyses of these models: their performance, their pros and cons, and their induced word space representation. Finally, the third contribution is the successful adoption of the continuous space neural network into a machine translation framework. New translation models are proposed and reported to achieve significant improvements over state-of-the-art baseline systems.
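The combination of a neural model with a class-based output can be illustrated with a two-level factorization, P(w | h) = P(c | h) · P(w | c, h), where c is the class of word w and h is the context. The sketch below is a simplification made for this listing: SOUL uses a deeper tree over the vocabulary, and the scores here are hand-written numbers standing in for the outputs of a neural network. All names and values are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def factorized_word_probs(class_scores, word_scores_by_class):
    """Return P(w | h) = P(c | h) * P(w | c, h) for every word.

    class_scores:         one score per class, indexed by class id
    word_scores_by_class: dict mapping class id -> {word: score}
    """
    class_probs = softmax(class_scores)
    probs = {}
    for cls, word_scores in word_scores_by_class.items():
        words = list(word_scores)
        within_class = softmax([word_scores[w] for w in words])
        for word, p in zip(words, within_class):
            probs[word] = class_probs[cls] * p
    return probs


# Hypothetical scores for one context h; a real model would compute them.
CLASS_SCORES = [1.0, 0.0]
WORD_SCORES = {
    0: {"cat": 0.5, "dog": 0.5},
    1: {"runs": 2.0, "sleeps": 0.0, "eats": 1.0},
}
word_probs = factorized_word_probs(CLASS_SCORES, WORD_SCORES)
```

The computational point is that scoring a single word only needs one softmax over the classes and one over the words of its class, instead of a single softmax over the whole vocabulary. The function above computes all words only to show that the factorized probabilities still sum to one.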
|
80 |
Dynamický dekodér pro rozpoznávání řeči / Dynamic Decoder for Speech Recognition
Veselý, Michal January 2017
The result of this work is a fully working and significantly optimized implementation of a dynamic decoder. This decoder is based on dynamic recognition network generation and decoding by a modified version of the Token Passing algorithm. The implemented solution provides results very similar to those of the original static decoder from BSCORE (an API of the company Phonexia). Compared to BSCORE, this implementation offers a significant reduction in memory usage. This makes the use of more complex language models possible. It also facilitates integrating speech recognition into mobile devices and dynamically adding new words to the system.
|