91 |
Probabilistic Weighting and Deferred Acceptance in Reciprocal Recommendations : An A/B Test Evaluation of Tenant-to-Landlord Recommendation Systems on a Digital Rental Marketplace / Statistisk Viktning och Deferred Acceptance i Reciprok rekommendation : En A/B-testutvärdering av Hyresgäst-till-Hyresvärd Rekommendationssystem på en Digital Hyresmarknad
Byström, Julia. January 2024
As the amount of available information grows, recommendation systems help users navigate and filter the many options. The home rental market has been identified as one of the unexplored areas for recommendation systems. This project examines the effects of incorporating historical data for probabilistic weighting, and of matching algorithms for increased recommendation diversity, in a tenant-to-landlord recommendation system. Two new recommendation systems were implemented. The first uses probabilistic weighting to measure the similarity between tenants and landlords' homes. The second combines this probabilistic weighting with a variant of the Deferred Acceptance algorithm to enhance recommendation diversity. These two systems were A/B tested together with the existing tenant recommendation system on the Qasa platform, a digital end-to-end rental apartment marketplace in Sweden. Because the objective was for the recommendation system to increase landlord engagement, a good recommendation was defined as one where the landlord chose to contact the tenant. After the A/B test period, the three recommendation variants were evaluated on Coverage@N, Gini-Index@K, Precision@K and Recall@K. The results showed that the Deferred Acceptance algorithm did increase recommendation diversity, but it reduced precision in the top recommendations compared to the first new implementation, which used only probabilistic weighting. However, incorporating historical data into the probabilistic weighting for similarity, as both new recommendation systems did, yielded higher precision and a larger number of contacted tenants than the existing tenant recommendation model on the Qasa platform.
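The evaluation metrics named in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the data layout (one ranked list of tenant IDs per landlord, plus the set of tenants that landlord contacted) and the particular Gini formula are assumptions chosen for illustration.

```python
from collections import Counter


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def coverage(rec_lists, catalog_size):
    """Share of the catalog that appears in at least one list."""
    seen = set()
    for recs in rec_lists:
        seen.update(recs)
    return len(seen) / catalog_size


def gini_index(rec_lists, catalog):
    """Gini index of recommendation counts per catalog item.

    0 means every item is recommended equally often; values near 1
    mean recommendations concentrate on a few items.
    """
    counts = Counter(item for recs in rec_lists for item in recs)
    freqs = sorted(counts.get(item, 0) for item in catalog)
    n = len(freqs)
    total = sum(freqs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * f for i, f in enumerate(freqs, start=1))
    return weighted / (n * total)


if __name__ == "__main__":
    # Hypothetical data: tenants recommended to one landlord, and the
    # tenants that landlord actually contacted.
    recs = ["t1", "t2", "t3", "t4"]
    contacted = {"t2", "t4", "t9"}
    print(precision_at_k(recs, contacted, 2))
    print(recall_at_k(recs, contacted, 4))
```

Lower Gini and higher coverage both indicate that recommendations are spread over more of the catalog, which is the sense of "diversity" the abstract refers to.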
|
92 |
Papyres : un système de gestion et de recommandation d’articles de recherche
Naak, Amine. 07 1900
Graduate students and professors (researchers, in general) regularly access, review, and use large numbers of research papers, yet none of the existing tools and solutions provides the wide range of functionalities required to manage these resources properly. Bibliography management systems manage references and citations but fail to help researchers handle and locate resources. Research paper recommendation systems and specialized search engines, on the other hand, help researchers locate new resources but again fail to help manage them. Finally, Enterprise Content Management systems offer the functionalities required to manage resources and knowledge but are not designed for research literature. Consequently, we suggest a new class of management systems: the Research Paper Management and Recommendation System. We illustrate our approach through our system Papyres (Naak, Hage, & Aïmeur, 2008, 2009), which combines bibliography functionalities with recommendation techniques and content management tools in order to provide a set of functionalities to locate research papers, handle and maintain bibliographies, and manage and share knowledge related to the research literature. Additionally, we propose a novel research paper recommendation technique, used within Papyres. Its uniqueness lies in the multicriteria aspect introduced into the collaborative filtering process, which allows researchers to indicate their interest in specific parts of articles. Moreover, we propose to test and compare several approaches to determining the neighbourhood in the Multicriteria Collaborative Filtering process, so as to increase the accuracy of the recommendation. Finally, we report on the implementation and validation of Papyres.
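To make the idea of a neighbourhood in multicriteria collaborative filtering concrete, here is a minimal Python sketch of one possible approach: compute a cosine similarity per criterion over the papers two researchers have both rated, then average the per-criterion values. This is an assumption for illustration only; the abstract does not say which approaches Papyres compares, and the criterion names ("method", "results") and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def multicriteria_similarity(ratings_a, ratings_b, criteria):
    """Average per-criterion cosine similarity over co-rated papers.

    ratings_x maps paper_id -> {criterion: rating}.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if not common or not criteria:
        return 0.0
    sims = []
    for criterion in criteria:
        vec_a = [ratings_a[p][criterion] for p in common]
        vec_b = [ratings_b[p][criterion] for p in common]
        sims.append(cosine(vec_a, vec_b))
    return sum(sims) / len(sims)


def nearest_neighbours(target, others, criteria, k):
    """Names of the k researchers most similar to the target."""
    scored = [
        (multicriteria_similarity(target, ratings, criteria), name)
        for name, ratings in others.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    criteria = ["method", "results"]  # hypothetical parts of an article
    target = {"p1": {"method": 5, "results": 1},
              "p2": {"method": 1, "results": 5}}
    others = {
        "u1": {"p1": {"method": 5, "results": 1},
               "p2": {"method": 1, "results": 5}},
        "u2": {"p1": {"method": 1, "results": 5},
               "p2": {"method": 5, "results": 1}},
    }
    print(nearest_neighbours(target, others, criteria, k=1))
```

Averaging per-criterion similarities is only one way to aggregate; taking the minimum across criteria, or a distance over the full rating vectors, would give different neighbourhoods.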
|
93 |
Trustworthiness, diversity and inference in recommendation systems
Chen, Cheng. 28 September 2016
Recommendation systems are information filtering systems that help users explore large amounts of information effectively and efficiently and identify items of interest. Accurate predictions of users' interests improve user satisfaction and benefit businesses and service providers. Researchers have made tremendous efforts to improve the accuracy of recommendations. Emerging technologies and application scenarios, however, pose challenges beyond accuracy for recommendation systems. Three such challenges are: (1) opinion spam produces untrustworthy content and makes recommendations deceptive; (2) users prefer diversified content; (3) in some applications, user behavior data may not be available to infer users' preferences.
This thesis tackles these challenges. We identify features of untrustworthy commercial campaigns on a question-and-answer website and adopt machine-learning-based techniques to implement an adaptive system that detects commercial campaigns automatically. We incorporate diversity requirements into a classic theoretical model and develop efficient algorithms with performance guarantees. We propose a novel and robust approach to infer user preference profiles from recommendations using copula models. The proposed approach can offer in-depth business intelligence for physical stores that depend on Wi-Fi hotspots for mobile advertisement. / Graduate / 0984 / cchenv@uvic.ca
|
94 |
Uma abordagem híbrida para sistemas de recomendação de notícias / A hybrid approach to news recommendation systems
Pagnossim, José Luiz Maturana. 09 April 2018
Recommendation Systems (RS) are software capable of suggesting items to users based on the history of user interactions or on similarity metrics that can be compared by item, by user, or both. There are different types of RS; those of most interest in this work are content-based, knowledge-based and collaborative-filtering RS. Meeting users' expectations is a hard goal because of the inherent subjectivity of human behavior. RS therefore need efficient and effective solutions for: modeling the data that will support the recommendation; retrieving the information that describes the data; combining this information within similarity, popularity or suitability metrics; creating descriptive models of the items under recommendation; and evolving the system's intelligence so that it can learn from interaction with the user.
Decision-making by an RS is a complex task that can be implemented from the viewpoint of fields such as artificial intelligence and data mining. In artificial intelligence there are studies on case-based reasoning, which works on the principle that if something worked in the past, it may work again in a new situation similar to the past one. Case-based recommendation works with structured items, represented by a set of attributes and their respective values (within a "case" model), providing known and adapted solutions. Data mining can build descriptive models for RS and can also handle, manipulate and analyze textual data, which is one way to create the elements that compose a recommendation. One way to minimize the weaknesses of any single approach is to adopt a hybrid solution, which in this work means: taking advantage of the different types of RS; using problem-solving techniques; and combining resources from different sources into a unified metric used to rank recommendations by relevance. Among RS application areas, news recommendation stands out, as it serves a broad, heterogeneous public that demands relevance. In this context, this work presents a hybrid approach to news recommendation built on an architecture implemented as a proof of concept of a recommendation system. The architecture was validated using a news corpus and an online experiment. The experiment made it possible to observe the architecture's capacity with respect to the requirements of a news recommendation system and to confirm the hypothesis about favoring recommendations based on similarity, popularity, diversity, novelty and serendipity. The indicators of reading, likes, acceptance and serendipity were also observed to improve as the system accumulated a history of preferences and solutions.
Through the analysis of the unified metric for ranking, it was possible to confirm its efficacy by verifying that the best-ranked news items were the ones most accepted by users.
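The idea of a unified ranking metric can be sketched as a weighted sum of per-item signals. The sketch below is an assumption for illustration: the abstract does not specify how the signals are combined, and the `NewsItem` fields, the weights and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    title: str
    similarity: float  # match to the reader's profile, in [0, 1]
    popularity: float  # normalised read count, in [0, 1]
    novelty: float     # recency or unfamiliarity, in [0, 1]


# Hypothetical weights; a real system would tune or learn them.
WEIGHTS = {"similarity": 0.5, "popularity": 0.3, "novelty": 0.2}


def unified_score(item, weights=WEIGHTS):
    """Combine the signals into a single relevance score."""
    return (
        weights["similarity"] * item.similarity
        + weights["popularity"] * item.popularity
        + weights["novelty"] * item.novelty
    )


def rank(items, weights=WEIGHTS):
    """Return items ordered from highest to lowest unified score."""
    return sorted(items, key=lambda it: unified_score(it, weights), reverse=True)


if __name__ == "__main__":
    items = [
        NewsItem("a", similarity=0.9, popularity=0.1, novelty=0.2),
        NewsItem("b", similarity=0.4, popularity=0.9, novelty=0.5),
        NewsItem("c", similarity=0.1, popularity=0.2, novelty=0.9),
    ]
    for item in rank(items):
        print(item.title, round(unified_score(item), 2))
```

A linear combination keeps each signal's contribution easy to inspect; diversity and serendipity, which depend on the rest of the list rather than on a single item, would need a re-ranking step that this sketch omits.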
|
97 |
Τεχνικές εξόρυξης δεδομένων και εφαρμογές σε προβλήματα διαχείρισης πληροφορίας και στην αξιολόγηση λογισμικού / Data mining techniques and their applications in data management problems and in software systems evaluation
Τσιράκης, Νικόλαος. 20 April 2011
In recent years the need to exploit the digital data that are collected and stored in various databases has become ever more pressing. This, combined with the rapid growth in the volume of these data, calls for computational methods whose ultimate aim is to help people extract useful information and knowledge from them.
Data mining techniques have attracted particular interest in recent years in cases where the data source is data streams or other forms such as XML documents. Modern systems and applications, such as those of communities of practice, need such mining techniques to help their members. Finally, there is also interest in software evaluation, where the data source is the source code files and the purpose is to improve their maintainability.
On the one hand, data streams are transient data that pass through an "observer" system continuously and in large volume. Many applications handle data in the form of streams, such as sensor data, network traffic flows, stock market data and telecommunications. Unlike static data in databases, data streams are large in volume and are characterized by a continuous flow of information with no beginning or end. They change dynamically and require fast responses. They may be the only source of knowledge for data mining and analysis when an application's needs are constrained by response time and storage space. These unique characteristics make the analysis of data streams very interesting, particularly on the World Wide Web.
Another area of interest for the use of new data mining techniques is communities of practice. Communities of practice are groups of people who take part in a process of collective learning. They share an interest or an idea and interact to learn more about it. These communities may be small or large, local or global, face to face or online, formally recognized, informal or even invisible. In other words, they exist everywhere, and almost all of us take part in dozens of them. Well-known discussion forums are one example. Our aim was to design new algorithms for mining data from communities of practice, with the ultimate goal of finding the relationships among their members and analyzing the extracted data with social network metrics, so that the whole constitutes a methodology for analyzing such communities.
In addition, the eXtensible Markup Language (XML) is the standard for representing data on the World Wide Web. The rapid growth in the volume of data represented in XML created the need to search within the tree structure of an XML document for specific information. This need, together with the need for fast access to the nodes of the XML tree, led to various specialized indexes. To cope with these dynamic data, the indexes must be able to change dynamically. At the same time, because of the demand for searching for specific information, a set of XML data must be filtered through stored patterns and rules in order to find the data that match them.
On the other hand, the dimensions of internal and external quality in the use of a software product change during its lifetime. For example, quality as defined at the start of the software life cycle places more emphasis on external quality and differs from internal quality, as in design, which refers to internal quality and concerns software engineers. The data mining techniques that can be used to achieve the necessary level of quality, such as quality specification and evaluation, must take these different dimensions into account at each stage of the product life cycle.
In this doctoral dissertation, in-depth research was carried out on data mining techniques and their applications both to the problem of information management and to the problem of software evaluation. / The World Wide Web has gradually transformed into a large data repository consisting of vast amounts of data of many different types. The volume of data doubles roughly every year, but the share of useful information seems to be decreasing. The area of data mining has arisen over the last decade to address this problem. It has become not only an important research area, but also one with large potential in the real world. Data mining has many directions and handles various types of data.
When the data are, for example, data streams or XML data, the problems become especially important and interesting. Contemporary systems and applications related to communities of practice also seek appropriate data mining techniques and algorithms to help their members. Finally, software evaluation is of great interest, where data mining is used to facilitate the comprehension and maintainability evaluation of a software system’s source code. Source code artifacts and measurement values can be used as input to data mining algorithms in order to provide insights into a system’s structure or to create groups of artifacts with similar software measurements.
First, data streams are large volumes of data arriving continuously. Data mining techniques have been proposed and studied to help users better understand and analyze the information. Clustering is a useful and ubiquitous tool in data analysis. With the rapid increase in web traffic and e-commerce, understanding user behavior from users' interaction with a website is becoming more and more important for website owners, and clustering, combined with personalization techniques for this information space, has become a necessity. The knowledge obtained by learning users' preferences can help improve web content, find usability issues related to this content and its structure, ensure the security of provided data, analyze the different groups of users that can be derived from the web access logs, and extract patterns, profiles and trends. This thesis investigates the application of a new model for clustering and analyzing click-stream data in the World Wide Web with two different approaches.
The next part of the thesis deals with data mining techniques regarding communities of practice. These are groups of people taking part in a collaborative way of learning and exchanging ideas. Systems for supporting argumentative collaboration have become more and more popular in the digital world. There are many research attempts regarding collaborative filtering and recommendation systems. Depending on the system and its needs, different problems arise, and developers have to deal with special cases in order to provide a useful service to users. Data mining can play an important role in collaboration systems that want to provide decision support functionality. Data mining in these systems can be defined as the effort to generate actionable models through automated analysis of their databases. It can only be deployed successfully when it generates insights that are substantially deeper than what a simple view of the data can give. This thesis introduces a framework that can be applied to a wide range of software platforms aiming at facilitating collaboration and learning among users. More precisely, it presents an approach that integrates techniques from the Data Mining and Social Network Analysis disciplines.
The next part of the thesis deals with XML data and ways to handle the huge volumes of data they may hold. Lately, data written in more sophisticated markup languages such as XML have made great strides in many domains. Processing and management of XML documents have already become popular research issues, with the main problem in this area being the need to optimally index them for storage and retrieval purposes. This thesis first presents a unified clustering algorithm for both homogeneous and heterogeneous XML documents. Then, using this algorithm, it presents an XML P2P system that efficiently distributes a set of clustered XML documents in a P2P network in order to speed up user queries.
Ultimately, data mining and its ability to handle large amounts of data and uncover hidden patterns has the potential to facilitate the comprehension and maintainability evaluation of a software system. This thesis investigates the applicability and suitability of data mining techniques to facilitate the comprehension and maintainability evaluation of a software system’s source code. What is more, this thesis focuses on the ability of data mining either to produce overviews of a software system (thus supporting a top-down approach) or to point out specific parts of this system that require further attention (thus supporting a bottom-up approach).
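The idea of grouping source code artifacts with similar software measurements can be illustrated with a small sketch. It assumes plain k-means as the clustering technique, which the abstract does not name, and the class names and metric values (lines of code, cyclomatic complexity) are hypothetical.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

# Hypothetical per-class measurements: (lines of code, cyclomatic complexity).
metrics = {
    "Parser": (1200, 45), "Lexer": (1100, 40),
    "Token": (80, 3), "Span": (60, 2), "Config": (90, 4),
}
names = list(metrics)
labels, _ = kmeans([metrics[n] for n in names], k=2)

# Collect artifact names per cluster label.
groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
```

On this toy input the large, complex classes end up in one group and the small ones in another. With real measurements the metrics would normally be rescaled first, since raw line counts would otherwise dominate the distance.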
|
98 |
Sur deux problèmes d’apprentissage automatique : la détection de communautés et l’appariement adaptatif / On two problems in machine learning : community detection and adaptive matching
Gulikers, Lennart 13 November 2017
Dans cette thèse, nous étudions deux problèmes d'apprentissage automatique : (I) la détection des communautés et (II) l'appariement adaptatif. I) Il est bien connu que beaucoup de réseaux ont une structure en communautés. La détection de ces communautés nous aide à comprendre et exploiter des réseaux de tout genre. Cette thèse considère principalement la détection des communautés par des méthodes spectrales utilisant des vecteurs propres associés à des matrices choisies avec soin. Nous faisons une analyse de leur performance sur des graphes artificiels. Au lieu du modèle classique connu sous le nom de « Stochastic Block Model » (dans lequel les degrés sont homogènes) nous considérons un modèle où les degrés sont plus variables : le « Degree-Corrected Stochastic Block Model » (DC-SBM). Dans ce modèle les degrés de tous les nœuds sont pondérés - ce qui permet de générer des suites de degrés hétérogènes. Nous étudions ce modèle dans deux régimes : le régime dense et le régime « épars », ou « dilué ». Dans le régime dense, nous prouvons qu'un algorithme basé sur une matrice d'adjacence normalisée réussit à classifier correctement tous les nœuds sauf une fraction négligeable. Dans le régime épars il existe un seuil en termes de paramètres du modèle en dessous duquel n'importe quel algorithme échoue par manque d'information. En revanche, nous prouvons qu'un algorithme utilisant la matrice « non-backtracking » réussit jusqu'au seuil - cette méthode est donc très robuste. Pour montrer cela nous caractérisons le spectre des graphes qui sont générés selon un DC-SBM dans son régime épars. Nous concluons cette partie par des tests sur des réseaux sociaux. II) Les marchés d'intermédiation en ligne tels que des plateformes de Question-Réponse et des plateformes de recrutement nécessitent un appariement basé sur une information incomplète des deux parties. 
Nous développons un modèle de système d'appariement entre tâches et serveurs représentant le comportement de telles plateformes. Pour ce modèle nous donnons une condition nécessaire et suffisante pour que le système puisse gérer un certain flux de tâches. Nous introduisons également une politique de « back-pressure » sous laquelle le débit gérable par le système est maximal. Nous prouvons que cette politique atteint un débit strictement plus grand qu'une politique naturelle « gloutonne ». Nous concluons en validant nos résultats théoriques avec des simulations entrainées par des données de la plateforme Stack-Overflow. / In this thesis, we study two problems of machine learning: (I) community detection and (II) adaptive matching. I) It is well-known that many networks exhibit a community structure. Finding those communities helps us understand and exploit general networks. In this thesis we focus on community detection using so-called spectral methods based on the eigenvectors of carefully chosen matrices. We analyse their performance on artificially generated benchmark graphs. Instead of the classical Stochastic Block Model (which does not allow for much degree-heterogeneity), we consider a Degree-Corrected Stochastic Block Model (DC-SBM) with weighted vertices, which is able to generate a wide class of degree sequences. We consider this model in both a dense and a sparse regime. In the dense regime, we show that an algorithm based on a suitably normalized adjacency matrix correctly classifies all but a vanishing fraction of the nodes. In the sparse regime, we show that the availability of only a small amount of information entails the existence of an information-theoretic threshold below which no algorithm performs better than random guessing. On the positive side, we show that an algorithm based on the non-backtracking matrix works all the way down to the detectability threshold in the sparse regime, showing the robustness of the algorithm. 
This follows from a precise characterization of the non-backtracking spectrum of sparse DC-SBMs. We further perform tests on well-known real networks. II) Online two-sided matching markets such as Q&A forums and online labour platforms critically rely on the ability to propose adequate matches based on imperfect knowledge of the two parties to be matched. We develop a model of a task / server matching system for (efficient) platform operation in the presence of such uncertainty. For this model, we give a necessary and sufficient condition for an incoming stream of tasks to be manageable by the system. We further identify a so-called back-pressure policy under which the throughput that the system can handle is optimized. We show that this policy achieves strictly larger throughput than a natural greedy policy. Finally, we validate our model and confirm our theoretical findings with experiments based on user-contributed content on an online platform.
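As a rough illustration of spectral community detection with a normalized adjacency matrix, the sketch below splits a small graph in two using the sign of the second eigenvector of D^-1/2 A D^-1/2. This is the textbook symmetric normalization, assumed here for illustration; the abstract does not specify the thesis's normalization, and the function name and toy graph are hypothetical.

```python
from math import sqrt

def spectral_bisection(adj, iterations=500):
    """Two-way split of a connected graph from the second eigenvector
    of the symmetrically normalized adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # The top eigenvector (eigenvalue 1) is proportional to sqrt(degree).
    norm = sqrt(sum(deg))
    top = [sqrt(d) / norm for d in deg]
    # Deterministic start vector that is not parallel to `top`.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        # Deflation: remove the component along the top eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        # Power step with (M + I); the shift makes all eigenvalues
        # non-negative, so the iteration converges to the second largest.
        w = [v[i] + sum(adj[i][j] * v[j] / sqrt(deg[i] * deg[j])
                        for j in range(n))
             for i in range(n)]
        length = sqrt(sum(x * x for x in w))
        v = [x / length for x in w]
    # Community label = sign of the eigenvector entry.
    return [0 if x >= 0 else 1 for x in v]

# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

labels = spectral_bisection(adj)
```

On this toy graph the two triangles receive different labels. Which triangle gets label 0 depends on the sign the iteration converges to, so only the grouping is meaningful.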
|
99 |
Recommending digital books to children : A comparative study of different state-of-the-art recommendation system techniques / Att rekommendera digitala böcker till barn : En jämförelsestudie av olika moderna tekniker för rekommendationssystem
Lundqvist, Malvin January 2023
Collaborative filtering is a popular technique to use behavior data in the form of users’ interactions with, or ratings of, items in a system to provide personalized recommendations of items to the user. This study compares three different state-of-the-art Recommendation System models that implement this technique, Matrix Factorization, Multi-layer Perceptron and Neural Matrix Factorization, using behavior data from a digital book platform for children. The field of Recommendation Systems is growing, and many platforms can benefit from personalizing the user experience and simplifying the use of the platforms. To perform a more complex comparison and introduce a new take on the models, this study proposes a new way to represent the behavior data as input to the models, i.e., to use the Term Frequency-Inverse Document Frequency (TF-IDF) of occurrences of interactions between users and books, as opposed to the traditional binary representation (positive if there has been any interaction and negative otherwise). The performance is measured by extracting the last book read for each user, and evaluating how the models would rank that book for recommendations to the user. To assess the value of the models for the children’s reading platform, the models are also compared to the existing Recommendation System on the digital book platform. The results indicate that the Matrix Factorization model performs best out of the three models when using children’s reading behavior data. However, due to the long training process and larger set of hyperparameters to tune for the other two models, these may not have reached an optimal hyperparameter tuning, thereby affecting the comparison among the three state-of-the-art models. This limitation is further discussed in the study. All three models perform significantly better than the current system on the digital book platform. 
The models with the proposed representation using TF-IDF values show notable promise, performing better than the binary representation in almost all numerical metrics for all models. These results suggest future research on further ways of representing behavior data as input to these types of models. / Kollaborativ filtrering är en populär teknik för att använda beteendedata från användare i form av t.ex. interaktioner med, eller betygsättning av, objekt i ett system för att ge användaren personliga rekommendationer om objekt. I den här studien jämförs tre olika modeller av moderna rekommendationssystem som tillämpar denna teknik, matrisfaktorisering, flerlagersperceptron och neural matrisfaktorisering, med hjälp av beteendedata från en digital läsplattform för barn. Rekommendationssystem är ett växande område, och många plattformar kan dra nytta av att anpassa användarupplevelsen utifrån individen och förenkla användningen av plattformen. För att utföra en mer komplex jämförelse och introducera en ny variant av modellerna, föreslår denna studie ett nytt sätt att representera beteendedata som indata till modellerna, d.v.s. att använda termfrekvens med omvänd dokumentfrekvens (TF-IDF) av förekomster av interaktioner mellan användare och böcker, i motsats till den traditionella binära representationen (positiv om en tidigare interaktion existerar och negativ i annat fall). Prestandan mäts genom att extrahera den senaste boken som lästs för varje användare, och utvärdera hur högt modellerna skulle rangordna den boken i rekommendationer till användaren. För att värdesätta modellerna för plattformen med digitala böcker, så jämförs modellerna också med det befintliga rekommendationssystemet på plattformen. Resultaten tyder på att matrisfaktorisering-modellen presterar bäst utav de tre modellerna när man använder data från barns läsbeteende. 
På grund av den långa träningstiden och fler hyperparametrar att optimera för de andra två modellerna, kan det dock vara så att de inte har nått en optimal hyperparameterinställning, vilket påverkar jämförelsen mellan de tre moderna modellerna. Denna begränsning diskuteras ytterligare i studien. Alla tre modellerna presterar betydligt bättre än det nuvarande systemet på läsplattformen. Modellerna med den föreslagna representationen av TF-IDF-värden visar sig mycket lovande och presterar bättre än den binära representationen i nästan alla numeriska mått för alla modeller. Dessa resultat kan ge skäl för framtida forskning av fler sätt att representera beteendedata som indata till denna typ av modeller.
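The contrast between the TF-IDF and binary representations of user-book interactions can be sketched as follows. The weighting formula is an assumption (users play the role of documents and books the role of terms, with tf = share of a user's interactions and idf = log of users over readers); the abstract does not give the thesis's exact definition, and the function name and interaction counts are hypothetical.

```python
from math import log

def tfidf_interactions(counts):
    """counts: {user: {book: number of interactions}} -> {user: {book: weight}}.

    Assumed weighting: tf = share of the user's interactions spent on the
    book, idf = log(number of users / number of users who read the book).
    """
    n_users = len(counts)
    # "Document frequency": how many users interacted with each book.
    df = {}
    for books in counts.values():
        for book in books:
            df[book] = df.get(book, 0) + 1
    weights = {}
    for user, books in counts.items():
        total = sum(books.values())
        weights[user] = {
            book: (c / total) * log(n_users / df[book])
            for book, c in books.items()
        }
    return weights

# Hypothetical interaction counts.
counts = {
    "u1": {"b1": 3, "b2": 1},
    "u2": {"b1": 1},
    "u3": {"b1": 2, "b3": 2},
}
weights = tfidf_interactions(counts)

# Traditional binary representation: 1 for any interaction.
binary = {u: {b: 1 for b in books} for u, books in counts.items()}
```

Under this assumed formula, a book that every user has read (here `b1`) gets weight zero, while rarer books keep a positive weight that grows with the share of the user's activity spent on them. The binary representation treats all three books alike. A smoothed idf would avoid the zero weights if that behavior is unwanted.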
|