1 |
Using word representations for semantic role labelling in the FrameNet paradigm. Léchelle, William, 01 1900.
In Fillmore's frame semantics, words take their meaning from the event or situational context in which they occur: the semantic frame is what gives them context. FrameNet, a lexical resource for English, defines about 1,000 semantic frames covering most such contexts, along with the semantic roles to be filled by the arguments of the predicate that evokes a frame in a sentence (for example: Victim, Manner, Recipient, Speaker). Our task is to label these argument roles automatically, given their position, the frame, and the predicate, a task commonly referred to as semantic role labelling. To this end, we train a machine learning classifier on arguments whose role is known so that it generalises to arguments whose role is unknown. We rely in particular on distributed word representations of the lexicon, which capture the semantic proximity of the words most representative of each argument and improve generalisation over the few training examples available for each frame. A maximum entropy classifier using common features of the arguments serves as a strong baseline to be improved upon.
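To make the baseline concrete, here is a minimal sketch, in Python with scikit-learn, of a maximum entropy (multinomial logistic regression) role classifier that combines a simple positional indicator with averaged word vectors. The Theft frame instances, the feature set, and the random embedding table are invented for illustration; they are not the thesis's actual features or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

EMB_DIM = 50
rng = np.random.default_rng(0)

# Stand-in for pretrained word vectors (the thesis uses real distributed
# representations learned from a large corpus).
embeddings = {w: rng.normal(size=EMB_DIM)
              for w in ["the", "thief", "stole", "a", "wallet", "quietly"]}

def avg_embedding(tokens):
    """Average the vectors of an argument's tokens; OOV words are skipped."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMB_DIM)

def featurise(arg):
    """One hand-picked indicator (position relative to the predicate)
    concatenated with the argument's averaged embedding."""
    position = np.array([1.0 if arg["before_predicate"] else 0.0])
    return np.concatenate([position, avg_embedding(arg["tokens"])])

# Toy FrameNet-style training instances for a Theft-like frame (invented).
train = [
    ({"tokens": ["the", "thief"], "before_predicate": True}, "Perpetrator"),
    ({"tokens": ["a", "wallet"], "before_predicate": False}, "Goods"),
    ({"tokens": ["quietly"], "before_predicate": False}, "Manner"),
] * 4  # repeated so the classifier sees a few examples per role

X = np.stack([featurise(arg) for arg, _ in train])
y = [role for _, role in train]

# Multinomial logistic regression is the maximum entropy classifier.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))
```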
|
2 |
Word Representations and Machine Learning Models for Implicit Sense Classification in Shallow Discourse Parsing. Callin, Jimmy, January 2017.
CoNLL 2015 featured a shared task on shallow discourse parsing. In 2016, the efforts continued with an increasing focus on sense classification. In the case of implicit sense classification, submissions used an interesting mix of traditional and modern machine learning classifiers built on word representation models. In this thesis, we explore a number of these classifiers and investigate how they perform with a variety of word representation models. We show that there are large performance differences between word representation models for certain machine learning classifiers, while others are more robust to the choice of word representation model. We also show that, with the right choice of word representation model, simple and traditional machine learning classifiers can reach competitive scores even when compared with modern neural network approaches.
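To make the comparison concrete, the following hypothetical sketch trains two traditional classifiers on the same toy instances under two different embedding tables and compares cross-validated scores; with real pretrained models, the score gaps would illustrate the sensitivity described above. The sense labels, instances, and randomly initialised embedding "models" are invented stand-ins, not the CoNLL data or the models evaluated in the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def pair_features(arg1_tokens, arg2_tokens, emb, dim):
    """Represent an implicit discourse relation by the concatenated
    average embeddings of its two arguments."""
    def avg(tokens):
        vecs = [emb[t] for t in tokens if t in emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
    return np.concatenate([avg(arg1_tokens), avg(arg2_tokens)])

# Toy stand-ins for two pretrained embedding models (e.g. word2vec vs. GloVe).
rng = np.random.default_rng(1)
vocab = ["but", "because", "rain", "cancelled", "game", "happy"]
models = {
    "emb_A": {w: rng.normal(size=50) for w in vocab},
    "emb_B": {w: rng.normal(size=50) for w in vocab},
}
# Toy instances: (arg1 tokens, arg2 tokens, sense label).
data = [
    (["rain"], ["cancelled", "game"], "Contingency"),
    (["happy"], ["game"], "Comparison"),
    (["game"], ["rain"], "Contingency"),
    (["cancelled"], ["happy"], "Comparison"),
] * 5  # repeated so cross-validation has enough instances

for emb_name, emb in models.items():
    X = np.stack([pair_features(a1, a2, emb, 50) for a1, a2, _ in data])
    y = [label for _, _, label in data]
    for clf in (LogisticRegression(max_iter=1000),
                RandomForestClassifier(n_estimators=50)):
        score = cross_val_score(clf, X, y, cv=2).mean()
        print(emb_name, type(clf).__name__, round(score, 2))
```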
|
3 |
Clustering of Distributed Word Representations and its Applicability for Enterprise Search. Korger, Christina, 18 August 2016.
Machine learning of distributed word representations with neural embeddings is a state-of-the-art approach to modelling semantic relationships hidden in natural language. The thesis “Clustering of Distributed Word Representations and its Applicability for Enterprise Search” covers different aspects of how such a model can be applied to knowledge management in enterprises. A review of distributed word representations and related language modelling techniques, combined with an overview of applicable clustering algorithms, constitutes the basis for the practical studies. The latter have two goals: firstly, they examine the quality of German embedding models trained with gensim under a selection of parameter configurations. Secondly, clusterings conducted on the resulting word representations are evaluated against the objective of retrieving immediate semantic relations for a given term. The application of the final results to company-wide knowledge management is subsequently outlined by the example of the platform intergator and conceptual extensions. (A toy sketch of the training-and-clustering pipeline is given after the outline below.)

1 Introduction
1.1 Motivation
1.2 Thesis Structure
2 Related Work
3 Distributed Word Representations
3.1 History
3.2 Parallels to Biological Neurons
3.3 Feedforward and Recurrent Neural Networks
3.4 Learning Representations via Backpropagation and Stochastic Gradient Descent
3.5 Word2Vec
3.5.1 Neural Network Architectures and Update Frequency
3.5.2 Hierarchical Softmax
3.5.3 Negative Sampling
3.5.4 Parallelisation
3.5.5 Exploration of Linguistic Regularities
4 Clustering Techniques
4.1 Categorisation
4.2 The Curse of Dimensionality
5 Training and Evaluation of Neural Embedding Models
5.1 Technical Setup
5.2 Model Training
5.2.1 Corpus
5.2.2 Data Segmentation and Ordering
5.2.3 Stopword Removal
5.2.4 Morphological Reduction
5.2.5 Extraction of Multi-Word Concepts
5.2.6 Parameter Selection
5.3 Evaluation Datasets
5.3.1 Measurement Quality Concerns
5.3.2 Semantic Similarities
5.3.3 Regularities Expressed by Analogies
5.3.4 Construction of a Representative Test Set for Evaluation of Paradigmatic Relations
5.3.5 Metrics
5.4 Discussion
6 Evaluation of Semantic Clustering on Word Embeddings
6.1 Qualitative Evaluation
6.2 Discussion
6.3 Summary
7 Conceptual Integration with an Enterprise Search Platform
7.1 The intergator Search Platform
7.2 Deployment Concepts of Distributed Word Representations
7.2.1 Improved Document Retrieval
7.2.2 Improved Query Suggestions
7.2.3 Additional Support in Explorative Search
8 Conclusion
8.1 Summary
8.2 Further Work
Bibliography
List of Figures
List of Tables
Appendix
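As a companion to the outline above, here is a minimal sketch of the kind of pipeline chapters 5 and 6 evaluate: training a word2vec model with gensim on a German corpus, clustering the resulting vectors with k-means, and querying immediate semantic relations for a given term. The toy corpus, parameter values, and cluster count are placeholders rather than the thesis's configurations, and the code uses the current gensim 4 API rather than the 2016-era one.

```python
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Toy German corpus; the thesis trains on a much larger company corpus.
sentences = [
    ["der", "vertrag", "wurde", "unterschrieben"],
    ["die", "rechnung", "wurde", "bezahlt"],
    ["der", "vertrag", "und", "die", "rechnung"],
    ["das", "dokument", "wurde", "gespeichert"],
] * 50  # repeated so word2vec has some co-occurrence statistics

# Parameters are illustrative; chapter 5 explores the actual configurations.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1,
                 sg=1, epochs=10)

words = list(model.wv.index_to_key)
vectors = model.wv[words]

# Cluster the embedding space; nearby words should share a cluster.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)
for word, label in sorted(zip(words, kmeans.labels_), key=lambda x: x[1]):
    print(label, word)

# Immediate semantic relations for a given term, as in the evaluation goal.
print(model.wv.most_similar("vertrag", topn=3))
```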
|
4 |
A study of continuous word representations applied to the automatic detection of speech recognition errors. Ghannay, Sahar, 20 September 2017.
This thesis is a study of continuous word representations (word embeddings) applied to the automatic detection of errors in speech recognition transcriptions. We focus on a neural approach that exploits word embeddings to improve the detection of errors in automatic transcriptions. The use of embeddings is motivated by the idea that error detection consists in locating possible linguistic or acoustic incongruities within automatic transcriptions; the aim is therefore to find a word representation that captures the information relevant to detecting these anomalies. The thesis makes several contributions. First, in a preliminary study, we propose a neural architecture able to integrate different types of descriptors, including word embeddings. Second, we carry out an in-depth study of continuous word representations, covering on the one hand the evaluation and combination of different types of linguistic word embeddings, so as to exploit their complementarity, and on the other hand acoustic word embeddings. We then present an analysis of classification errors, aimed at identifying the errors that are hardest to detect; perspectives for improving system performance by modelling errors at the sentence level are also proposed. Finally, we exploit the linguistic and acoustic embeddings, as well as the information provided by our error detection system, in several downstream applications.
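The abstract does not reproduce the architecture itself; the following is a hypothetical PyTorch sketch of the general idea of a neural classifier that integrates word embeddings with other descriptors, such as ASR confidence scores, to label each hypothesised word as correct or erroneous. All dimensions, feature choices, and the random training data are invented for illustration.

```python
import torch
import torch.nn as nn

class ErrorDetector(nn.Module):
    """Tiny feed-forward detector combining a word's embedding with
    scalar ASR descriptors (e.g. confidence score, duration)."""
    def __init__(self, emb_dim=50, n_scalar=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim + n_scalar, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # classes: correct vs. error
        )

    def forward(self, emb, scalars):
        return self.net(torch.cat([emb, scalars], dim=-1))

# Toy batch: 8 hypothesised words with random embeddings and descriptors.
torch.manual_seed(0)
emb = torch.randn(8, 50)            # pretrained word embeddings would go here
scalars = torch.rand(8, 2)          # e.g. ASR confidence, normalised duration
labels = torch.randint(0, 2, (8,))  # 0 = correct, 1 = error

model = ErrorDetector()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    optimiser.zero_grad()
    loss = loss_fn(model(emb, scalars), labels)
    loss.backward()
    optimiser.step()

print(model(emb, scalars).argmax(dim=-1))
```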
|