
Multi-Layer Web Services Discovery using Word Embedding and Clustering Techniques

Web services discovery is the process of finding the Web services that best match the end-users’ functional and non-functional requirements. Researchers have applied artificial intelligence, natural language processing, data mining, and text mining techniques to Web services discovery to facilitate matchmaking. This thesis contributes to the area of Web services discovery and recommendation, adopting the Design Science Research Methodology to guide the development of useful knowledge, including design theory and artifacts.
The lack of a comprehensive review of Web services discovery and recommendation in the literature motivated us to conduct a systematic literature review. Its main purpose was to identify and systematically compare current clustering and association rule techniques for Web services discovery and recommendation by answering a set of research questions, investigating prior knowledge, and identifying gaps in the related literature.
We then propose a conceptual model and a typology of Web services discovery systems. The conceptual model provides a high-level representation of Web services discovery systems, including their various elements, tasks, and relationships. The proposed typology is composed of five groups of characteristics: storage and location characteristics, formalization characteristics, matchmaking characteristics, automation characteristics, and selection characteristics. We use the typology to compare Web services discovery methods and architectures from the extant literature by linking them to the five proposed characteristics.
We employ the proposed conceptual model and its specified characteristics to design and develop the multi-layer data mining architecture for Web services discovery using word embedding and clustering techniques. The proposed architecture consists of five layers: Web services description and data preprocessing; word embedding and representation; syntactic similarity; semantic similarity; and clustering. In the first layer, we identify the steps to parse and preprocess Web services documents. In the second layer, Bag of Words with Term Frequency–Inverse Document Frequency and three word embedding models are employed to represent Web services. In the third layer, four distance measures, namely Cosine, Euclidean, Minkowski, and Word Mover’s Distance, are studied to measure the similarity between Web services documents. In the fourth layer, WordNet and Normalized Google Distance are employed to represent Web services documents semantically and to measure their similarity. Finally, in the fifth layer, three clustering algorithms, namely affinity propagation, K-means, and hierarchical agglomerative clustering, are investigated to cluster Web services based on the observed document similarities. We demonstrate how each component of the five layers is employed in the process of Web services clustering using randomly selected Web services documents.
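Parts of the first four layers can be illustrated with a small, self-contained Python sketch. It is not the thesis implementation: it assumes a simplified preprocessing step (camelCase splitting, lowercasing, and a short hypothetical stop-word list), plain Term Frequency–Inverse Document Frequency weighting with idf = log(N / df), and Cosine, Euclidean, and Minkowski measures over the resulting vectors. The Minkowski order p = 3 is an arbitrary illustrative default, not a value taken from the thesis. The standard Normalized Google Distance formula is included and applied to hypothetical hit counts. Word embedding models, Word Mover’s Distance, and WordNet are left out because they need external libraries and pre-trained resources. All function names and example documents are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept short for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "for", "in", "on", "by", "with", "is"}


def preprocess(text):
    """Layer 1 sketch: split camelCase names, lowercase, keep alphabetic tokens, drop stop words."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # getWeatherForecast -> get Weather Forecast
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Layer 2 sketch: Bag of Words weighted by TF-IDF with idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([tf[t] / length * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    """Layer 3 sketch: cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def minkowski_distance(u, v, p=3):
    """Layer 3 sketch: Minkowski distance of order p (p=3 is an arbitrary default)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)


def euclidean_distance(u, v):
    """Euclidean distance is the Minkowski distance with p = 2."""
    return minkowski_distance(u, v, p=2)


def normalized_google_distance(fx, fy, fxy, n):
    """Layer 4 sketch: NGD from hit counts f(x), f(y), f(x, y) > 0 and index size n."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


if __name__ == "__main__":
    # Hypothetical service descriptions, not taken from the thesis dataset.
    raw = [
        "getWeatherForecast returns the weather forecast for a city",
        "CityWeatherReport provides current weather for a city",
        "BookFlightTicket books an airline ticket for a passenger",
    ]
    docs = [preprocess(d) for d in raw]
    _, vecs = tfidf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))  # > 0: shared terms "weather" and "city"
    print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: no shared terms
    print(euclidean_distance(vecs[0], vecs[1]), minkowski_distance(vecs[0], vecs[1]))
```

With this weighting, a term that occurs in every document gets an idf of zero, so only terms that discriminate between services contribute to the similarity scores.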
We conduct an experimental analysis in which we cluster a collected dataset of Web services documents and evaluate the clustering performance. Using a ground truth for evaluation, we observe that clusters built from the word embedding models outperformed those built with the Bag of Words with Term Frequency–Inverse Document Frequency model. Among the three word embedding models, the pre-trained Word2Vec skip-gram model achieved the highest performance in clustering Web services. Among the three semantic similarity measures, path-based WordNet similarity yielded the highest clustering performance. Across the different word representation models and syntactic and semantic similarity measures, the affinity propagation clustering technique performed best in discovering similarities among Web services.
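The clustering and ground-truth evaluation steps can be sketched in the same spirit. The abstract names hierarchical agglomerative clustering but not its linkage criterion, and it evaluates against a ground truth without naming the metric, so this sketch assumes average linkage over a precomputed distance matrix and uses cluster purity as an illustrative external measure. Affinity propagation and K-means are not shown. The function names, distances, and labels are hypothetical.

```python
from collections import Counter


def average_linkage_clustering(dist, k):
    """Agglomerative clustering with average linkage on a precomputed distance matrix.

    Starts from singleton clusters and repeatedly merges the pair of clusters
    with the smallest mean pairwise distance until k clusters remain.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge b into a; b > a, so deleting b keeps index a valid
        del clusters[b]
    return clusters


def purity(clusters, labels):
    """Share of items that carry the majority ground-truth label of their cluster."""
    n = sum(len(c) for c in clusters)
    majority = sum(Counter(labels[i] for i in c).most_common(1)[0][1] for c in clusters)
    return majority / n


if __name__ == "__main__":
    # Hypothetical distances between four services (e.g., 1 - cosine similarity).
    dist = [
        [0.0, 0.1, 0.9, 0.8],
        [0.1, 0.0, 0.85, 0.9],
        [0.9, 0.85, 0.0, 0.2],
        [0.8, 0.9, 0.2, 0.0],
    ]
    labels = ["weather", "weather", "travel", "travel"]  # hypothetical ground truth
    clusters = average_linkage_clustering(dist, k=2)
    print(clusters, purity(clusters, labels))  # [[0, 1], [2, 3]] 1.0
```

Working from a precomputed distance matrix keeps the clustering step independent of the representation, so the same routine can compare clusters produced from different word representation models and similarity measures.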

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/41840
Date: 25 February 2021
Creators: Obidallah, Waeal
Contributors: Raahemi, Bijan
Publisher: Université d'Ottawa / University of Ottawa
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
