Global ETD Search

1	A Study of Fairness and Information Heterogeneity in Recommendation Systems Altaf, Basmah 21 November 2019 Recommender systems are an integral and successful application of machine learning in the e-commerce industry and in the everyday lives of online users. Recommendation algorithms are used extensively for news, music, books, points of interest, and travel recommendation, as well as in many other domains. Although much attention has been paid to improving recommendation quality, some real-world aspects are not considered: How can we ensure that top-n recommendations are fair and not biased by popularity-boosting events, such as awards for movies or songs? How can we recommend items to entities by explicitly considering information from heterogeneous sources? What is the best way to model sequential recommendation systems as a heterogeneous context-aware design that learns on the fly from spatial, temporal, and social contexts? Can we model attributes and heterogeneous relations in a heterogeneous information network? The goal of this thesis is to pave the way towards the next generation of real-world recommendation systems that tackle fairness and information heterogeneity challenges to improve the user experience while giving good recommendations. This thesis bridges recommendation techniques and deep-learning techniques for representation learning by proposing novel techniques to address the above real-world problems. We focus on four directions: (1) modeling the effect of popularity bias over time on the consumption of items, (2) modeling the heterogeneous information associated with the sequential history of users and their social links for sequential recommendation, (3) modeling the heterogeneous links and rich content of nodes in an academic heterogeneous information network, and (4) learning semantics using topic modeling for nodes based on their content and heterogeneous links in a heterogeneous information network.
Recommendation Systems (RS); Fairness; Information Heterogeneity; Dataset Recommendation; Sequential Recommendation; Movie Award Bias
2	La recommandation des jeux de données basée sur le profilage pour le liage des données RDF / Profile-based Dataset Recommendation for RDF Data Linking Ben Ellefi, Mohamed 01 December 2016 With the emergence of the Web of Data, most notably Linked Open Data (LOD), an abundance of data has become available on the web. However, LOD datasets and their inherent subgraphs vary heavily with respect to their size, topic and domain coverage, their schemas, and their data dynamicity (of schemas and metadata, respectively) over time. As a result, identifying suitable datasets that meet specific criteria has become an increasingly important, yet challenging, task in support of issues such as entity retrieval, semantic search, and data linking. Particularly with respect to the interlinking issue, the current topology of the LOD cloud underlines the need for practical and efficient means to recommend suitable datasets: currently, only well-known reference graphs such as DBpedia (the most obvious target), YAGO, or Freebase show a high number of in-links, while there exists a long tail of potentially suitable yet under-recognized datasets. This problem is due to the semantic web tradition in dealing with "finding candidate datasets to link to", where data publishers are accustomed to identifying target datasets for interlinking themselves. While an understanding of the nature of the content of specific datasets is a crucial prerequisite for the mentioned issues, we adopt in this dissertation the notion of "dataset profile": a set of features that describe a dataset and allow the comparison of different datasets with regard to their represented characteristics.
Our first research direction was to implement a collaborative-filtering-like dataset recommendation approach, which exploits both existing dataset topic profiles and traditional dataset connectivity measures in order to link LOD datasets into a global dataset-topic graph. This approach relies on the LOD graph in order to learn the connectivity behaviour between LOD datasets. However, experiments have shown that the current topology of the LOD cloud is far from complete enough to be considered a ground truth and, consequently, learning data. Facing the limits of the current LOD topology as learning data, our research led us to break away from the topic-profile representation and the "learning to rank" approach, and to adopt a new approach to candidate dataset identification in which the recommendation is based on the overlap of intensional profiles between different datasets. By intensional profile, we understand the formal representation of a set of schema concept labels that best describe a dataset and that can potentially be enriched by retrieving the corresponding textual descriptions. This representation provides richer contextual and semantic information and allows similarities between profiles to be computed efficiently and inexpensively. We identify schema overlap with the help of a semantico-frequential concept similarity measure and a ranking criterion based on the tf-idf cosine similarity. The experiments, conducted over all available linked datasets on the LOD cloud, show that our method achieves an average precision of up to 53% for a recall of 100%.
Furthermore, our method returns the mappings between the schema concepts across datasets, a particularly useful input for the data linking step. In order to ensure high-quality, representative dataset schema profiles, we introduce Datavore, a tool oriented towards metadata designers that provides ranked lists of vocabulary terms to reuse in the data modeling process, together with additional metadata and cross-term relations. The tool relies on the Linked Open Vocabulary (LOV) ecosystem for acquiring vocabularies and metadata and is made available to the community.
Linked Data; RDF dataset; Semantic Web; Dataset Profiling; Dataset Recommendation
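The ranking criterion mentioned in this abstract, tf-idf cosine similarity between dataset profiles, can be illustrated with a minimal sketch. It is not the thesis implementation: the profile contents, the function names, and the smoothed IDF formula are all assumptions chosen for illustration, and the semantico-frequential concept similarity measure is not modeled here.

```python
import math
from collections import Counter


def tfidf_vectors(profiles):
    """Build a sparse TF-IDF vector (term -> weight) for each profile.

    `profiles` maps a dataset name to a list of schema concept labels.
    The smoothed IDF used here is an assumption, not the thesis formula.
    """
    n = len(profiles)
    # Document frequency: number of profiles in which each label occurs.
    df = Counter(t for terms in profiles.values() for t in set(terms))
    vectors = {}
    for name, terms in profiles.items():
        tf = Counter(terms)
        vectors[name] = {
            t: (count / len(terms)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source, profiles):
    """Rank every other dataset by profile similarity to `source`."""
    vectors = tfidf_vectors(profiles)
    scores = [
        (name, cosine(vectors[source], vec))
        for name, vec in vectors.items()
        if name != source
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical intensional profiles (labels invented for this example).
profiles = {
    "source": ["person", "author", "publication", "title"],
    "biblio": ["author", "publication", "journal", "title"],
    "geo": ["city", "country", "river"],
}
ranking = rank_candidates("source", profiles)
```

In this toy input, the dataset sharing schema labels with the source is ranked ahead of the one with no overlapping labels, which receives a similarity of zero.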
3	[en] GENERALIZATION OF THE DEEP LEARNING MODEL FOR NATURAL GAS INDICATION IN 2D SEISMIC IMAGE BASED ON THE TRAINING DATASET AND THE OPERATIONAL HYPER PARAMETERS RECOMMENDATION / [pt] GENERALIZAÇÃO DO MODELO DE APRENDIZADO PROFUNDO PARA INDICAÇÃO DE GÁS NATURAL EM DADOS SÍSMICOS 2D COM BASE NO CONJUNTO DE DADOS DE TREINAMENTO E RECOMENDAÇÃO DE HIPERPARÂMETROS OPERACIONAIS LUIS FERNANDO MARIN SEPULVEDA 21 March 2024 [en] Interpreting seismic images is an essential task in diverse fields of geosciences, and it is a widely used method in hydrocarbon exploration. However, interpretation requires a significant investment of resources, and obtaining a satisfactory result is not always possible. The literature shows an increasing number of Deep Learning (DL) methods to detect horizons, faults, and potential hydrocarbon reservoirs. Nevertheless, models that detect gas reservoirs have difficulty generalizing, i.e., their performance is compromised when they are used on seismic images from new exploration campaigns. This problem is especially pronounced for 2D land surveys, where the acquisition process varies and the images are very noisy. This work presents three methods to improve the generalization performance of DL models for natural gas indication in 2D seismic images, using approaches from both Machine Learning (ML) and DL.
The research focuses on data analysis to recognize patterns within the seismic images, so that training sets for the gas inference model can be selected based on patterns in the target images. This approach improves generalization performance without altering the architecture of the gas inference DL model or transforming the original seismic traces. The experiments were carried out using a database of different exploration fields located in the Parnaíba basin, in northeastern Brazil. The results show an increase of up to 39 percent in the correct indication of natural gas according to the recall metric. This improvement varies in each field and depends on the proposed method used and on the existence of representative patterns within the training set of seismic images. Overall, the generalization performance of the DL gas inference model improves by up to 21 percent according to the F1 score and by up to 15 percent according to the IoU metric. These results demonstrate that it is possible to find patterns within the seismic images using an unsupervised approach, and that these patterns can be used to recommend the DL training set according to the pattern in the target seismic image. Furthermore, they demonstrate that the training set directly affects the generalization performance of the DL model for seismic images.
GROUPING; DEEP LEARNING; 2D SEISMIC ONSHORE IMAGE; GAS INDICATION; TRAINING DATASET RECOMMENDATION
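The three metrics this abstract reports (recall, F1 score, and IoU) can be computed from pixel-wise confusion counts, as in the minimal sketch below. It assumes binary gas / no-gas masks flattened into lists; the function names and the sample masks are hypothetical and are not taken from the thesis.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    for two equally long binary masks (1 = gas, 0 = no gas)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def recall(tp, fp, fn):
    """Fraction of true gas pixels that the model indicated."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in count form."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou(tp, fp, fn):
    """Intersection over union of the predicted and true gas regions."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


# Hypothetical flattened masks, invented for this example.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
tp, fp, fn = confusion_counts(pred, truth)
scores = {
    "recall": recall(tp, fp, fn),
    "f1": f1_score(tp, fp, fn),
    "iou": iou(tp, fp, fn),
}
```

For the same counts, IoU can never exceed F1, because its denominator counts each error once against a single count of the true positives.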
