
Clustering of Distributed Word Representations and its Applicability for Enterprise Search

Machine learning of distributed word representations with neural embeddings is a state-of-the-art approach to modelling semantic relationships hidden in natural language. The thesis “Clustering of Distributed Word Representations and its Applicability for Enterprise Search” covers different aspects of how such a model can be applied to knowledge management in enterprises. A review of distributed word representations and related language modelling techniques, combined with an overview of applicable clustering algorithms, forms the basis for the practical studies. These studies have two goals: first, they examine the quality of German embedding models trained with gensim using a selection of parameter configurations. Second, clusterings computed on the resulting word representations are evaluated against the objective of retrieving immediate semantic relations for a given term. Finally, the application of the results to company-wide knowledge management is outlined using the example of the intergator platform and conceptual extensions.

1 Introduction
1.1 Motivation
1.2 Thesis Structure

2 Related Work

3 Distributed Word Representations
3.1 History
3.2 Parallels to Biological Neurons
3.3 Feedforward and Recurrent Neural Networks
3.4 Learning Representations via Backpropagation and Stochastic Gradient Descent
3.5 Word2Vec
3.5.1 Neural Network Architectures and Update Frequency
3.5.2 Hierarchical Softmax
3.5.3 Negative Sampling
3.5.4 Parallelisation
3.5.5 Exploration of Linguistic Regularities

4 Clustering Techniques
4.1 Categorisation
4.2 The Curse of Dimensionality

5 Training and Evaluation of Neural Embedding Models
5.1 Technical Setup
5.2 Model Training
5.2.1 Corpus
5.2.2 Data Segmentation and Ordering
5.2.3 Stopword Removal
5.2.4 Morphological Reduction
5.2.5 Extraction of Multi-Word Concepts
5.2.6 Parameter Selection
5.3 Evaluation Datasets
5.3.1 Measurement Quality Concerns
5.3.2 Semantic Similarities
5.3.3 Regularities Expressed by Analogies
5.3.4 Construction of a Representative Test Set for Evaluation of Paradigmatic Relations
5.3.5 Metrics
5.4 Discussion

6 Evaluation of Semantic Clustering on Word Embeddings
6.1 Qualitative Evaluation
6.2 Discussion
6.3 Summary

7 Conceptual Integration with an Enterprise Search Platform
7.1 The intergator Search Platform
7.2 Deployment Concepts of Distributed Word Representations
7.2.1 Improved Document Retrieval
7.2.2 Improved Query Suggestions
7.2.3 Additional Support in Explorative Search

8 Conclusion
8.1 Summary
8.2 Further Work

Bibliography

List of Figures

List of Tables

Appendix
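The abstract's central idea, clustering word vectors and treating the other members of a term's cluster as its immediate semantic relations, can be illustrated with a minimal sketch. It assumes tiny, invented three-dimensional vectors in place of a German model trained with gensim, and uses plain k-means (Lloyd's algorithm with a deterministic farthest-point initialisation) purely as one familiar example; the thesis surveys several clustering algorithms, and nothing here reproduces its actual data, parameters or method. All names (`EMBEDDINGS`, `kmeans`, `related_terms` and the helpers) are hypothetical.

```python
import math

# Hypothetical toy vectors standing in for word embeddings; a real model
# (e.g. one trained with gensim) would have hundreds of dimensions.
EMBEDDINGS = {
    "Hund": [0.90, 0.10, 0.00],
    "Katze": [0.85, 0.15, 0.05],
    "Maus": [0.80, 0.20, 0.10],
    "Auto": [0.10, 0.90, 0.05],
    "Fahrrad": [0.15, 0.85, 0.00],
    "Zug": [0.05, 0.80, 0.15],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    # Unit length, so squared Euclidean distance equals 2 - 2 * cosine similarity.
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    # Deterministic farthest-point initialisation: start with the first vector,
    # then repeatedly add the vector farthest from all chosen centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iter=100):
    # Lloyd's algorithm: alternate assignment and centroid update until stable.
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def related_terms(term, embeddings, k):
    # Cluster all words, then return the query's cluster mates,
    # ranked by cosine similarity to the query.
    words = list(embeddings)
    vectors = [normalise(embeddings[w]) for w in words]
    labels = kmeans(vectors, k)
    q = words.index(term)
    mates = [i for i in range(len(words)) if i != q and labels[i] == labels[q]]
    mates.sort(key=lambda i: dot(vectors[i], vectors[q]), reverse=True)
    return [words[i] for i in mates]


if __name__ == "__main__":
    print(related_terms("Hund", EMBEDDINGS, 2))  # ['Katze', 'Maus']
```

With two clusters the toy vocabulary splits into animals and vehicles, so the query "Hund" yields its cluster mates "Katze" and "Maus". Normalising the vectors first makes Euclidean k-means rank neighbours the same way cosine similarity does, which is the usual similarity measure for word2vec-style embeddings.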

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:29739
Date: 18 August 2016
Creators: Korger, Christina
Contributors: Demuth, Birgit; Crenze, Uwe; Aßmann, Uwe; Technische Universität Dresden
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: doc-type:masterThesis, info:eu-repo/semantics/masterThesis, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
