• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 30
  • 2
  • 1
  • 1
  • Tagged with
  • 41
  • 41
  • 26
  • 16
  • 14
  • 12
  • 11
  • 10
  • 10
  • 9
  • 9
  • 8
  • 8
  • 8
  • 8
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

PLPrepare: A Grammar Checker for Challenging Cases

Hoyos, Jacob 01 May 2021 (has links)
This study investigates one of the Polish language’s most arbitrary cases: the genitive masculine inanimate singular. It collects and ranks several guidelines to help language learners discern its proper usage and also introduces a framework to provide detailed feedback regarding arbitrary cases. The study tests this framework by implementing and evaluating a hybrid grammar checker called PLPrepare. PLPrepare performs similarly to other grammar checkers and is able to detect genitive case usages and provide feedback based on a number of error classifications.
32

Attention Mechanisms for Transition-based Dependency Parsing

Gontrum, Johannes January 2019 (has links)
Transition-based dependency parsing is known to compute the syntactic structure of a sentence efficiently, but is less accurate to predict long-distance relations between tokens as it lacks global information about the sentence. Our main contribution is the integration of attention mechanisms to replace the static token selection with a dynamic approach that takes the complete sequence into account. Though our experiments confirm that our approach fundamentally works, our models do not outperform the baseline parser. We further present a line of follow-up experiments to investigate these results. Our main conclusion is that the BiLSTM of the traditional parser is already powerful enough to encode the required global information into each token, eliminating the need for an attention-driven approach. Our secondary results indicate that the attention models require a neural network with a higher capacity to potentially extract more latent information from the word embeddings and the LSTM than the traditional parser. We further show that positional encodings are not useful for our attention models, though BERT-style positional embeddings slightly improve the results. Finally, we experiment with replacing the LSTM with a Transformer-encoder to test the impact of self-attention. The results are disappointing, though we think that more future research should be dedicated to this. For our work, we implement a UUParser-inspired dependency parser from scratch in PyTorch and extend it with, among other things, full GPU support and mini-batch processing. We publish the code under a permissive open source license at https://github.com/jgontrum/parseridge.
33

Exploring source languages for Faroese in single-source and multi-source transfer learning using language-specific and multilingual language models

Fischer, Kristóf January 2024 (has links)
Cross-lingual transfer learning has been the driving force of low-resource natural language processing in recent years, relying on massively multilingual language models with hopes of solving the data scarcity issue for languages with a limited digital presence. However, this "one-size-fits-all" approach is not equally applicable to all low-resource languages, suggesting limitations of such models in cross-lingual transfer. Besides, known similarities and phylogenetic relationships between source and target languages are often overlooked. In this work, the emphasis is placed on Faroese, a low-resource North Germanic language with several closely related resource-rich sibling languages. The cross-lingual transfer potential from these strong Scandinavian source candidates, as well as from additional genetically related, geographically proximate, and syntactically similar source languages is studied in single-source and multi-source experiments, in terms of Faroese syntactic parsing and part-of-speech tagging. In addition, the effect of task-specific fine-tuning on monolingual, linguistically informed smaller multilingual, and massively multilingual pre-trained language models is explored. The results suggest Icelandic as a strong source candidate, however, only when fine-tuning a monolingual model. With multilingual models, task-specific fine-tuning in Norwegian and Swedish seems even more beneficial. Although they do not surpass fully Scandinavian fine-tuning, models trained on genetically related and syntactically similar languages produce good results. Additionally, the findings indicate that multilingual models outperform models pre-trained on a single language, and that even better results can be achieved using a smaller, linguistically informed model, compared to a massively multilingual one.
34

Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision

Täckström, Oscar January 2013 (has links)
Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared to both unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a wide number of target languages, in the setting where no annotated training data is available in the target language.
35

節境界単位での漸進的な独話係り受け解析

Inagaki, Yasuyoshi, Kato, Naoto, Kashioka, Hideki, Matsubara, Shigeki, Ohno, Tomohiro, 稲垣, 康善, 加藤, 直人, 柏岡, 秀紀, 松原, 茂樹, 大野, 誠寛 05 February 2005 (has links)
No description available.
36

同時的な独話音声要約に基づくリアルタイム字幕生成

大野, 誠寛, 松原, 茂樹, 柏岡, 秀紀, 稲垣, 康善 07 1900 (has links) (PDF)
ここに掲載した著作物の利用に関する注意 本著作物の著作権は(社)情報処理学会に帰属します。 本著作物は著作権者である情報処理学会の許可のもとに掲載するものです。 ご利用に当たっては「著作権法」ならびに「情報処理学会倫理綱領」 に従うことをお願いいたします。 Notice for the use of this material The copyright of this material is retained by the Information Processing Society of Japan (IPSJ). This material is published on this web site with the agreement of the author (s) and the IPSJ. Please be complied with Copyright Law of Japan and the Code of Ethics of the IPSJ if any users wish to reproduce, make derivative work, distribute or make available to the public any part or whole thereof. All Rights Reserved, Copyright (C) Information Processing Society of Japan. Comments are welcome. Mail to address:  editj<at>ipsj.or.jp, please.
37

Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision

Täckström, Oscar January 2013 (has links)
Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared to both unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a wide number of target languages, in the setting where no annotated training data is available in the target language.
38

Training parsers for low-resourced languages : improving cross-lingual transfer with monolingual knowledge / Apprentissage d'analyseurs syntaxiques pour les langues peu dotées : amélioration du transfert cross-lingue grâce à des connaissances monolingues

Aufrant, Lauriane 06 April 2018 (has links)
Le récent essor des algorithmes d'apprentissage automatique a rendu les méthodes de Traitement Automatique des Langues d'autant plus sensibles à leur facteur le plus limitant : la qualité des systèmes repose entièrement sur la disponibilité de grandes quantités de données, ce qui n'est pourtant le cas que d'une minorité parmi les 7.000 langues existant au monde. La stratégie dite du transfert cross-lingue permet de contourner cette limitation : une langue peu dotée en ressources (la cible) peut être traitée en exploitant les ressources disponibles dans une autre langue (la source). Les progrès accomplis sur ce plan se limitent néanmoins à des scénarios idéalisés, avec des ressources cross-lingues prédéfinies et de bonne qualité, de sorte que le transfert reste inapplicable aux cas réels de langues peu dotées, qui n'ont pas ces garanties. Cette thèse vise donc à tirer parti d'une multitude de sources et ressources cross-lingues, en opérant une combinaison sélective : il s'agit d'évaluer, pour chaque aspect du traitement cible, la pertinence de chaque ressource. L'étude est menée en utilisant l'analyse en dépendance par transition comme cadre applicatif. Le cœur de ce travail est l'élaboration d'un nouveau méta-algorithme de transfert, dont l'architecture en cascade permet la combinaison fine des diverses ressources, en ciblant leur exploitation à l'échelle du mot. L'approche cross-lingue pure n'étant en l'état pas compétitive avec la simple annotation de quelques phrases cibles, c'est avant tout la complémentarité de ces méthodes que souligne l'analyse empirique. Une série de nouvelles métriques permet une caractérisation fine des similarités cross-lingues et des spécificités syntaxiques de chaque langue, de même que de la valeur ajoutée de l'information cross-lingue par rapport au cadre monolingue. L'exploitation d'informations typologiques s'avère également particulièrement fructueuse. Ces contributions reposent largement sur des innovations techniques en analyse syntaxique, concrétisées par la publication en open source du logiciel PanParser, qui exploite et généralise la méthode dite des oracles dynamiques. Cette thèse contribue sur le plan monolingue à plusieurs autres égards, comme le concept de cascades monolingues, pouvant traiter par exemple d'abord toutes les dépendances faciles, puis seulement les difficiles. / As a result of the recent blossoming of Machine Learning techniques, the Natural Language Processing field faces an increasingly thorny bottleneck: the most efficient algorithms entirely rely on the availability of large training data. These technological advances remain consequently unavailable for the 7,000 languages in the world, out of which most are low-resourced. One way to bypass this limitation is the approach of cross-lingual transfer, whereby resources available in another (source) language are leveraged to help building accurate systems in the desired (target) language. However, despite promising results in research settings, the standard transfer techniques lack the flexibility regarding cross-lingual resources needed to be fully usable in real-world scenarios: exploiting very sparse resources, or assorted arrays of resources. This limitation strongly diminishes the applicability of that approach. This thesis consequently proposes to combine multiple sources and resources for transfer, with an emphasis on selectivity: can we estimate which resource of which language is useful for which input? This strategy is put into practice in the frame of transition-based dependency parsing. To this end, a new transfer framework is designed, with a cascading architecture: it enables the desired combination, while ensuring better targeted exploitation of each resource, down to the level of the word. Empirical evaluation dampens indeed the enthusiasm for the purely cross-lingual approach -- it remains in general preferable to annotate just a few target sentences -- but also highlights its complementarity with other approaches. Several metrics are developed to characterize precisely cross-lingual similarities, syntactic idiosyncrasies, and the added value of cross-lingual information compared to monolingual training. The substantial benefits of typological knowledge are also explored. The whole study relies on a series of technical improvements regarding the parsing framework: this work includes the release of a new open source software, PanParser, which revisits the so-called dynamic oracles to extend their use cases. Several purely monolingual contributions complete this work, including an exploration of monolingual cascading, which offers promising perspectives with easy-then-hard strategies.
39

[pt] APRENDIZADO ESTRUTURADO COM INDUÇÃO E SELEÇÃO INCREMENTAIS DE ATRIBUTOS PARA ANÁLISE DE DEPENDÊNCIA EM PORTUGUÊS / [en] STRUCTURED LEARNING WITH INCREMENTAL FEATURE INDUCTION AND SELECTION FOR PORTUGUESE DEPENDENCY PARSING

YANELY MILANES BARROSO 09 November 2016 (has links)
[pt] O processamento de linguagem natural busca resolver várias tarefas de complexidade crescente que envolvem o aprendizado de estruturas complexas, como grafos e sequências, para um determinado texto. Por exemplo, a análise de dependência envolve o aprendizado de uma árvore que descreve a estrutura sintática de uma sentença dada. Um método amplamente utilizado para melhorar a representação do conhecimento de domínio em esta tarefa é considerar combinações de atributos usando conjunções lógicas que codificam informação útil com um padrão não-linear. O número total de todas as combinações possíveis para uma conjunção dada cresce exponencialmente no número de atributos e pode resultar em intratabilidade computacional. Também, pode levar a overfitting. Neste cenário, uma técnica para evitar o superajuste e reduzir o conjunto de atributos faz-se necessário. Uma abordagem comum para esta tarefa baseia-se em atribuir uma pontuação a uma árvore de dependência, usando uma função linear do conjunto de atributos. Sabe-se que os modelos lineares esparsos resolvem simultaneamente o problema de seleção de atributos e a estimativa de um modelo linear, através da combinação de um pequeno conjunto de atributos. Neste caso, promover a esparsidade ajuda no controle do superajuste e na compactação do conjunto de atributos. Devido a sua exibilidade, robustez e simplicidade, o algoritmo de perceptron é um método linear discriminante amplamente usado que pode ser modificado para produzir modelos esparsos e para lidar com atributos não-lineares. Propomos a aprendizagem incremental da combinação de um modelo linear esparso com um procedimento de indução de variáveis não-lineares, num cénario de predição estruturada. O modelo linear esparso é obtido através de uma modificação do algoritmo perceptron. O método de indução é Entropy-Guided Feature Generation. A avaliação empírica é realizada usando o conjunto de dados para português da CoNLL 2006 Shared Task. O analisador resultante alcança 92,98 por cento de precisão, que é um desempenho competitivo quando comparado com os sistemas de estado- da-arte. Em sua versão regularizada, o analizador alcança uma precisão de 92,83 por cento , também mostra uma redução notável de 96,17 por cento do número de atributos binários e, reduz o tempo de aprendizagem em quase 90 por cento, quando comparado com a sua versão não regularizada. / [en] Natural language processing requires solving several tasks of increasing complexity, which involve learning to associate structures like graphs and sequences to a given text. For instance, dependency parsing involves learning of a tree that describes the dependency-based syntactic structure of a given sentence. A widely used method to improve domain knowledge representation in this task is to consider combinations of features, called templates, which are used to encode useful information with nonlinear pattern. The total number of all possible feature combinations for a given template grows exponentialy in the number of features and can result in computational intractability. Also, from an statistical point of view, it can lead to overfitting. In this scenario, it is required a technique that avoids overfitting and that reduces the feature set. A very common approach to solve this task is based on scoring a parse tree, using a linear function of a defined set of features. It is well known that sparse linear models simultaneously address the feature selection problem and the estimation of a linear model, by combining a small subset of available features. In this case, sparseness helps control overfitting and performs the selection of the most informative features, which reduces the feature set. Due to its exibility, robustness and simplicity, the perceptron algorithm is one of the most popular linear discriminant methods used to learn such complex representations. This algorithm can be modified to produce sparse models and to handle nonlinear features. We propose the incremental learning of the combination of a sparse linear model with an induction procedure of non-linear variables in a structured prediction scenario. The sparse linear model is obtained through a modifications of the perceptron algorithm. The induction method is the Entropy-Guided Feature Generation. The empirical evaluation is performed using the Portuguese Dependency Parsing data set from the CoNLL 2006 Shared Task. The resulting parser attains 92.98 per cent of accuracy, which is a competitive performance when compared against the state-of-art systems. On its regularized version, it accomplishes an accuracy of 92.83 per cent, shows a striking reduction of 96.17 per cent in the number of binary features and reduces the learning time in almost 90 per cent, when compared to its non regularized version.
40

Aspektbaserad Sentimentanalys för Business Intelligence inom E-handeln / Aspect-Based Sentiment Analysis for Business Intelligence in E-commerce

Eriksson, Albin, Mauritzon, Anton January 2022 (has links)
Many companies strive to make data-driven decisions. To achieve this, they need to explore new tools for Business Intelligence. The aim of this study was to examine the performance and usability of aspect-based sentiment analysis as a tool for Business Intelligence in E-commerce. The study was conducted in collaboration with Ellos Group AB which supplied anonymous customer feedback data. The implementation consists of two parts, aspect extraction and sentiment classification. The f irst part, aspect extraction, was implemented using dependency parsing and various aspect grouping techniques. The second part, sentiment classification, was implemented using the language model KB-BERT, a Swedish version of the BERT model. The method for aspect extraction achieved a satisfactory precision of 79,5% but only a recall of 27,2%. Moreover, the result for sentiment classification was unsatisfactory with an accuracy of 68,2%. Although the results underperform expectations, we conclude that aspect-based sentiment analysis in general is a great tool for Business Intelligence. Both as a means of generating customer insights from previously unused data and to increase productivity. However, it should only be used as a supportive tool and not to replace existing processes for decision-making. / Många företag strävar efter att fatta datadrivna beslut. För att åstadkomma detta behöver de utforska nya metoder för Business Intelligence. Syftet med denna studie var att undersöka prestandan och användbarheten av aspektbaserad sentimentanalys som ett verktyg för Business Intelligence inom e-handeln. Studien genomfördes i samarbete med Ellos Group AB som tillhandahöll data bestående av anonym kundfeedback. Implementationen består av två delar, aspektextraktion och sentimentklassificering. Aspektextraktion implementerades med hjälp av dependensparsning och olika aspektgrupperingstekniker. Sentimentklassificering implementerades med hjälp av språkmodellen KB-BERT, en svensk version av BERT. Metoden för aspektextraktion uppnådde en tillfredsställande precision på 79,5% men endast en recall på 27,2%. Resultatet för sentimentklassificering var otillfredsställande med en accuracy på 68,2%. Även om resultaten underpresterar förväntningarna drar vi slutsatsen att aspektbaserad sentimentanalys i allmänhet är ett bra verktyg för Business Intelligence. Både som ett sätt att generera kundinsikter från tidigare oanvända data och som ett sätt att öka produktiviteten. Det bör dock endast användas som ett stödjande verktyg och inte ersätta befintliga processer för beslutsfattande.

Page generated in 0.0538 seconds