Clustering of Distributed Word Representations and its Applicability for Enterprise Search

Machine learning of distributed word representations with neural embeddings is a state-of-the-art approach to modelling semantic relationships hidden in natural language. The thesis “Clustering of Distributed Word Representations and its Applicability for Enterprise Search” covers different aspects of how such a model can be applied to knowledge management in enterprises. A review of distributed word representations and related language modelling techniques, combined with an overview of applicable clustering algorithms, constitutes the basis for practical studies. These studies have two goals. Firstly, they examine the quality of German embedding models trained with gensim under a selection of parameter configurations. Secondly, clusterings conducted on the resulting word representations are evaluated against the objective of retrieving immediate semantic relations for a given term. The application of the final results to company-wide knowledge management is subsequently outlined, using the platform intergator and conceptual extensions as an example.

1 Introduction
1.1 Motivation
1.2 Thesis Structure

2 Related Work

3 Distributed Word Representations
3.1 History
3.2 Parallels to Biological Neurons
3.3 Feedforward and Recurrent Neural Networks
3.4 Learning Representations via Backpropagation and Stochastic Gradient Descent
3.5 Word2Vec
3.5.1 Neural Network Architectures and Update Frequency
3.5.2 Hierarchical Softmax
3.5.3 Negative Sampling
3.5.4 Parallelisation
3.5.5 Exploration of Linguistic Regularities

4 Clustering Techniques
4.1 Categorisation
4.2 The Curse of Dimensionality

5 Training and Evaluation of Neural Embedding Models
5.1 Technical Setup
5.2 Model Training
5.2.1 Corpus
5.2.2 Data Segmentation and Ordering
5.2.3 Stopword Removal
5.2.4 Morphological Reduction
5.2.5 Extraction of Multi-Word Concepts
5.2.6 Parameter Selection
5.3 Evaluation Datasets
5.3.1 Measurement Quality Concerns
5.3.2 Semantic Similarities
5.3.3 Regularities Expressed by Analogies
5.3.4 Construction of a Representative Test Set for Evaluation of Paradigmatic Relations
5.3.5 Metrics
5.4 Discussion

6 Evaluation of Semantic Clustering on Word Embeddings
6.1 Qualitative Evaluation
6.2 Discussion
6.3 Summary

7 Conceptual Integration with an Enterprise Search Platform
7.1 The intergator Search Platform
7.2 Deployment Concepts of Distributed Word Representations
7.2.1 Improved Document Retrieval
7.2.2 Improved Query Suggestions
7.2.3 Additional Support in Explorative Search

8 Conclusion
8.1 Summary
8.2 Further Work

Bibliography

List of Figures

List of Tables

Appendix

info:eu-repo/classification/ddc/004

ddc:004

Identifier	oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:29739
Date	18 August 2016
Creators	Korger, Christina
Contributors	Demuth, Birgit, Crenze, Uwe, Aßmann, Uwe, Technische Universität Dresden
Source Sets	Hochschulschriftenserver (HSSS) der SLUB Dresden
Language	English
Detected Language	English
Type	doc-type:masterThesis, info:eu-repo/semantics/masterThesis, doc-type:Text
Rights	info:eu-repo/semantics/openAccess
