Spelling suggestions: "subject:"similarity 1earning"" "subject:"similarity c1earning""
1 |
Sequence queries on temporal graphsZhu, Haohan 21 June 2016 (has links)
Graphs that evolve over time are called temporal graphs. They can be used to describe and represent real-world networks, including transportation networks, social networks, and communication networks, with higher fidelity and accuracy. However, research is still limited on how to manage large scale temporal graphs and execute queries over these graphs efficiently and effectively. This thesis investigates the problems of temporal graph data management related to node and edge sequence queries. In temporal graphs, nodes and edges can evolve over time. Therefore, sequence queries on nodes and edges can be key components in managing temporal graphs. In this thesis, the node sequence query decomposes into two parts: graph node similarity and subsequence matching. For node similarity, this thesis proposes a modified tree edit distance that is metric and polynomially computable and has a natural, intuitive interpretation. Note that the proposed node similarity works even for inter-graph nodes and therefore can be used for graph de-anonymization, network transfer learning, and cross-network mining, among other tasks. The subsequence matching query proposed in this thesis is a framework that can be adopted to index generic sequence and time-series data, including trajectory data and even DNA sequences for subsequence retrieval. For edge sequence queries, this thesis proposes an efficient storage and optimized indexing technique that allows for efficient retrieval of temporal subgraphs that satisfy certain temporal predicates. For this problem, this thesis develops a lightweight data management engine prototype that can support time-sensitive temporal graph analytics efficiently even on a single PC.
|
2 |
Hybrid Recommender Systems via Spectral Learning and a Random ForestWilliams, Alyssa 01 December 2019 (has links)
We demonstrate spectral learning can be combined with a random forest classifier to produce a hybrid recommender system capable of incorporating meta information. Spectral learning is supervised learning in which data is in the form of one or more networks. Responses are predicted from features obtained from the eigenvector decomposition of matrix representations of the networks. Spectral learning is based on the highest weight eigenvectors of natural Markov chain representations. A random forest is an ensemble technique for supervised learning whose internal predictive model can be interpreted as a nearest neighbor network. A hybrid recommender can be constructed by first deriving a network model from a recommender's similarity matrix then applying spectral learning techniques to produce a new network model. The response learned by the new version of the recommender can be meta information. This leads to a system capable of incorporating meta data into recommendations.
|
3 |
Academic Recommendation System Based on the Similarity Learning of the Citation Network Using Citation ImpactAlshareef, Abdulrhman M. 29 April 2019 (has links)
In today's significant and rapidly increasing amount of scientific publications, exploring recent studies in a given research area and building an effective scientific collaboration has become more challenging than any time before. Scientific production growth has been increasing the difficulties for identifying the most relevant papers to cite or to find an appropriate conference or journal to submit a paper to publish. As a result, authors and publishers rely on different analytical approaches in order to measure the relationship among the citation network. Different parameters have been used such as the impact factor, number of citations, co-citation to assess the impact of the produced research publication. However, using one assessing factor considers only one level of relationship exploration, since it does not reflect the effect of the other factors. In this thesis, we propose an approach to measure the Academic Citation Impact that will help to identify the impact of articles, authors, and venues at their extended nearby citation network. We combine the content similarity with the bibliometric indices to evaluate the citation impact of articles, authors, and venues in their surrounding citation network. Using the article metadata, we calculate the semantic similarity between any two articles in the extended network. Then we use the similarity score and bibliometric indices to evaluate the impact of the articles, authors, and venues among their extended nearby citation network.
Furthermore, we propose an academic recommendation model to identify the latent preferences among the citation network of the given article in order to expose the concealed connection between the academic objects (articles, authors, and venues) at the citation network of the given article. To reveal the degree of trust for collaboration between academic objects (articles, authors, and venues), we use the similarity learning to estimate the collaborative confidence score that represents the anticipation of a prospect relationship between the academic objects among a scientific community. We conducted an offline experiment to measure the accuracy of delivering personalized recommendations, based on the user’s selection preferences; real-world datasets were used. Our evaluation results show a potential improvement to the quality of the recommendation when compared to baseline recommendation algorithms that consider co-citation information.
|
4 |
Product Matching Using Image SimilarityForssell, Melker, Janér, Gustav January 2020 (has links)
PriceRunner is an online shopping comparison company. To maintain up-todate prices, PriceRunner has to process large amounts of data every day. The processing of the data includes matching unknown products, referred to as offers, to known products. Offer data includes information about the product such as: title, description, price and often one image of the product. PriceRunner has previously implemented a textual-based machine learning (ML) model, but is also looking for new approaches to complement the current product matching system. The objective of this master’s thesis is to investigate the potential of using an image-based ML model for product matching. Our method uses a similarity learning approach where the network learns to recognise the similarity between images. To achieve this, a siamese neural network was trained with the triplet loss function. The network is trained to map similar images closer together and dissimilar images further apart in a vector space. This approach is often used for face recognition, where there is an extensive amount of classes and a limited amount of images per class, and new classes are frequently added. This is also the case for the image data used in this thesis project. A general model was trained on images from the Clothing and Accessories hierarchy, one of the 16 toplevel hierarchies at PriceRunner, consisting of 17 product categories. The results varied between each product category. Some categories proved to be less suitable for image-based classification while others excelled. The model handles new classes relatively well without any, or with briefer, retraining. It was concluded that there is potential in using images to complement the current product matching system at PriceRunner.
|
5 |
Unsupervised anomaly detection for structured data - Finding similarities between retail productsFockstedt, Jonas, Krcic, Ema January 2021 (has links)
Data is one of the most contributing factors for modern business operations. Having bad data could therefore lead to tremendous losses, both financially and for customer experience. This thesis seeks to find anomalies in real-world, complex, structured data, causing an international enterprise to miss out on income and the potential loss of customers. By using graph theory and similarity analysis, the findings suggest that certain countries contribute to the discrepancies more than other countries. This is believed to be an effect of countries customizing their products to match the market’s needs. This thesis is just scratching the surface of the analysis of the data, and the number of opportunities for future work are therefore many.
|
Page generated in 0.0903 seconds