Global ETD Search

11	Learning deep embeddings by learning to rank He, Kun 05 February 2019 We study the problem of embedding high-dimensional visual data into low-dimensional vector representations. This is an important component in many computer vision applications involving nearest neighbor retrieval, as embedding techniques not only perform dimensionality reduction, but can also capture task-specific semantic similarities. In this thesis, we use deep neural networks to learn vector embeddings, and develop a gradient-based optimization framework that is capable of optimizing ranking-based retrieval performance metrics, such as the widely used Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We apply our framework in three applications. First, we study Supervised Hashing, which is concerned with learning compact binary vector embeddings for fast retrieval, and propose two novel solutions. The first solution optimizes Mutual Information as a surrogate ranking objective, while the other directly optimizes AP and NDCG, based on the discovery of their closed-form expressions for discrete Hamming distances. These optimization problems are NP-hard, so we derive continuous relaxations of them to enable gradient-based optimization with neural networks. Our solutions establish the state of the art on several image retrieval benchmarks. Next, we train deep neural networks to extract Local Feature Descriptors from image patches. Local features are used universally in low-level computer vision tasks that involve sparse feature matching, such as image registration and 3D reconstruction, and their matching is a nearest neighbor retrieval problem. We leverage our AP optimization technique to learn both binary and real-valued descriptors for local image patches. Compared to competing approaches, our solution eliminates complex heuristics and performs more accurately in the tasks of patch verification, patch retrieval, and image matching.
Lastly, we tackle Deep Metric Learning, the general problem of learning real-valued vector embeddings using deep neural networks. We propose a learning-to-rank solution that optimizes a novel quantization-based approximation of AP. For downstream tasks such as retrieval and clustering, we demonstrate promising results on standard benchmarks, especially in the few-shot learning scenario, where the number of labeled examples per class is limited. Computer science Average precision Computer vision Deep learning Learning to rank Nearest neighbor retrieval Vector embedding
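The abstract above names Average Precision (AP) and NDCG as the ranking metrics being optimized. As a reminder of what those metrics measure, here is a minimal Python sketch of their standard, non-differentiable definitions for a single ranked list. It is not the thesis's relaxed or quantized formulation; the function names are illustrative, and the DCG variant with linear gain and a log2 discount is an assumption (other variants use a 2^gain - 1 numerator).

```python
import math


def average_precision(relevance):
    """AP of one ranked list of binary labels (1 = relevant), best rank first."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def dcg(gains):
    """Discounted cumulative gain with linear gain and log2 discount (assumed variant)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

Both functions depend on the ranking only through sorted positions, which is why they cannot be optimized directly by gradient descent and need a relaxation of the kind the abstract describes.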
12	Learning to rank para busca em Comércio Eletrônico Fonseca, Roberto Cidade 28 August 2018 Methods that use machine learning (ML) to generate ranking functions have been widely used in web search systems, such as those used by Google and Bing. Nonetheless, such methods have not been widely employed or studied in other contexts. One example is electronic commerce (e-commerce), where user interaction with online stores produces data such as when a user first landed on a page, which queries they submitted, which products they clicked, and what they bought.
In this work, we propose to leverage ML to learn ranking functions for the e-commerce context. We studied alternative ways to estimate the relevance of a result for a given query and ran experiments using data mined from e-commerce shops. We ran experiments in setups we call offline, where a dataset is built in the traditional way by splitting it into training, validation, and test subsets, as well as in setups we call online, where distinct versions of the system were deployed to shops serving users in real purchase situations. We present our conclusions regarding these experiments.
Learning to Rank Machine Learning Information Retrieval E-commerce A/B Testing Exact and Earth Sciences
13	Recuperação de documentos e pessoas em ambientes empresariais através de árvores de decisão. / Documents and people retrieval in enterprises using decision tree. Fabrício Jailson Barth 29 May 2009 This work evaluates the performance of decision trees as ranking functions for documents and people in enterprise environments. To that end, relevant attributes of the entities to be retrieved were identified by analyzing: (i) the dynamics of information production and consumption in an enterprise; (ii) existing algorithms in the literature for document and people retrieval; and (iii) concepts used in ranking functions for generic domains. An evaluation environment was set up, using the CERC reference collection, to assess the applicability of the C4.5 algorithm to obtaining ranking functions for the enterprise domain. Using C4.5 to build ranking functions proved partially effective. For the document retrieval task, it did not produce good results. However, it was found that the way the ranking function is built can be controlled in order to optimize either precision at the top positions of the ranking or mean average precision (MAP). For the people retrieval task, C4.5 produced a decision tree that achieves better results than all the other ranking functions evaluated. The MAP obtained by the decision tree was 0.83, while the average MAP of the other ranking functions was 0.74. The decision tree used to represent the ranking function also helps in understanding how the various attributes used to characterize documents and people are combined.
Analysis of the decision tree used as the ranking function for people showed that a person is considered an expert on a topic if they appear in many documents, appear many times in those documents, and the documents in which they appear are highly relevant to the query. Information management Information retrieval Learning to rank Machine learning
14	A Query Dependent Ranking Approach for Information Retrieval Lee, Lian-Wang 28 August 2009 Ranking model construction is an important topic in information retrieval. Recently, many approaches based on the idea of "learning to rank" have been proposed for this task, and most of them attempt to score all documents of different queries with a single function. In this thesis, we propose a novel framework of query-dependent ranking. A simple similarity measure is used to calculate similarities between queries. An individual ranking model is constructed for each training query with its corresponding documents. When a new query is asked, the documents retrieved for it are ranked according to the scores given by a ranking model that combines the models of similar training queries. A mechanism for determining the combining weights is also provided. Experimental results show that this query-dependent ranking approach is more effective than other approaches. information retrieval Ranking model model combination query similarity learning to rank query dependent ranking
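The query-dependent scheme described in the abstract above can be pictured with a small Python sketch. The abstract does not say which similarity measure, model family, or weighting mechanism the thesis uses, so everything concrete here is an assumption made for illustration: cosine similarity between query vectors, one linear model per training query, the `k` nearest training queries, and combining weights proportional to similarity. The names `cosine` and `rank_documents` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, docs, train_queries, k=2):
    """Rank docs for a new query by mixing the models of similar training queries.

    train_queries: list of (query_vector, model_weights) pairs (hypothetical format).
    docs: list of document feature vectors.
    Returns document indices, best first.
    """
    # Similarity of the new query to every training query, clamped at zero.
    scored = [(max(0.0, cosine(query_vec, q)), w) for q, w in train_queries]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in nearest)
    # Combining weights: normalized similarities, or uniform if all are zero.
    mix = [(sim / total if total else 1.0 / len(nearest), w) for sim, w in nearest]

    def score(doc):
        # Weighted sum of each selected linear model's score for this document.
        return sum(m * sum(wi * xi for wi, xi in zip(w, doc)) for m, w in mix)

    return sorted(range(len(docs)), key=lambda i: score(docs[i]), reverse=True)
```

With two training queries whose models each favor a different feature, a new query close to the first one ranks documents by the first feature, and a query close to the second ranks them by the second.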
15	APPLICATION OF RANDOM INDEXING TO MULTI LABEL CLASSIFICATION PROBLEMS: A CASE STUDY WITH MESH TERM ASSIGNMENT AND DIAGNOSIS CODE EXTRACTION Lu, Yuan 01 January 2015 Many manual biomedical annotation tasks can be categorized as instances of the typical multi-label classification problem, where several categories or labels from a fixed set need to be assigned to an input instance. MeSH term assignment to biomedical articles and diagnosis code extraction from medical records are two such tasks. To address this problem automatically, in this thesis, we present a way to utilize latent associations between labels based on output label sets. We use random indexing to determine latent associations and use these associations as a novel feature in a learning-to-rank algorithm that reranks candidate labels selected by either a k-NN or a binary relevance approach. Using this new feature alongside other features, for MeSH term assignment, we train our ranking model on a set of 200 documents, test it on two public datasets, and obtain new state-of-the-art results in precision, recall, and mean average precision. In diagnosis code extraction, we reach an average micro F-score of 0.478 on a large EMR dataset from the University of Kentucky Medical Center, the first study of its kind to our knowledge. Our study shows the advantages and potential of the random indexing method in determining and utilizing implicit relationships between labels in multi-label classification problems. MeSH ICD-9 Random Indexing Learning-to-Rank Coordinate Ascent Information Retrieval Other Computer Engineering
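The abstract above reports a micro F-score for diagnosis code extraction. For readers unfamiliar with the metric, the following Python sketch shows the standard micro-averaged F1 for multi-label predictions: true positives, false positives and false negatives are pooled over all instances before computing the score. The function name and the label values in the usage note are made up for illustration and are not taken from the thesis.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over instances, each with a set of true and predicted labels."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not true
        fn += len(truth - pred)   # true labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, with true label sets `[{"A", "B"}, {"C"}]` and predictions `[{"A"}, {"C", "D"}]`, the pooled counts are 2 true positives, 1 false positive and 1 false negative, giving 4/6, about 0.667.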
16	Context-Aware Rank-Oriented Recommender Systems January 2012 Recommender systems are a type of information filtering system that suggests items that may be of interest to a user. Most information retrieval systems have an overwhelmingly large number of entries. Most users would experience information overload if they were forced to explore the full set of results. The goal of recommender systems is to overcome this limitation by predicting how users will value certain items and returning the items that should be of the highest interest to the user. Most recommender systems collect explicit user feedback, such as a rating, and attempt to optimize their model to this rating value. However, there is potential for a system to collect implicit user feedback, such as user purchases and clicks, to learn user preferences. Additionally, with implicit user feedback, it is possible for the system to remember the context of user feedback in terms of which other items a user was considering when making their decisions. When considering implicit user feedback, only a subset of all evaluation techniques can be used. Currently, sufficient techniques for evaluating implicit user feedback do not exist. In this thesis, I introduce a new model for recommendation that borrows the idea of opportunity cost from economics. There are two variations of the model, one that considers context and one that does not. Additionally, I propose a new evaluation measure that works specifically for the case of implicit user feedback. / Dissertation/Thesis / M.S. Computer Science 2012 Computer science Information science context-aware implicit feedback learning to rank recommender systems
17	Learning to rank: combinação de algoritmos aplicando stacking e análise dos resultados Paris, Bruno Mendonça 07 November 2017 The amount of available information has grown in recent years and will continue to grow due to the increase in users, devices, and information shared over the internet, so the desired information should be accessible quickly, without spending too much time searching for it. When searching with engines such as Google, Yahoo, or Bing, users expect the first results to contain the desired information. The field that aims to bring relevant documents to the user is known as Information Retrieval, and it can be aided by Learning to Rank algorithms, which apply machine learning to try to present important documents to users in the best possible order. This work investigates a way to obtain an even better ordering of documents using Stacking, a technique for combining algorithms. To do so, it uses RankLib, a tool from the Lemur Project written in Java that contains several Learning to Rank algorithms, together with datasets from LETOR, a collection maintained by the Microsoft Research Group. information retrieval ranking learning to rank stacking
18	A framework for finding and summarizing product defects, and ranking helpful threads from online customer forums through machine learning Jiao, Jian 05 June 2013 The Internet has revolutionized the way users share and acquire knowledge. As important and popular Web-based applications, online discussion forums provide interactive platforms for users to exchange information and report problems. With the rapid growth of social networks and an ever-increasing number of Internet users, online forums have accumulated a huge amount of valuable user-generated data and have accordingly become a major information source for business intelligence. This study focuses specifically on product defects, which are one of the central concerns of manufacturing companies and service providers, and proposes a machine learning method to automatically detect product defects in the context of online forums. To complement the detection of product defects, we also present a product feature extraction method to summarize defect threads and a thread ranking method to search for troubleshooting solutions. We collected different data sets to test these methods experimentally, and the results show that our methods are very promising: in most cases, they outperformed the current state-of-the-art methods. / Ph. D. product defect detection product feature extraction summarization clustering learning to rank thread ranking
19	Practical Web-scale Recommender Systems / 実用的なWebスケール推薦システム Tagami, Yukihiro 25 September 2018 Kyoto University / 0048 / Doctorate by coursework (new system) / Doctor of Informatics / 甲第21390号 / 情博第676号 / 新制||情||117 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 鹿島久嗣, Professor 山本章博, Professor 下平英寿 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM Recommender systems Online advertising Extreme multi-label classification Learning-to-rank Approximate nearest neighbor search 007
20	Robust learning to rank models and their biomedical applications Sotudian, Shahabeddin 24 May 2023 There exist many real-world applications, such as recommendation systems, document retrieval, and computational biology, where the correct ordering of instances is of equal or greater importance than predicting the exact value of some discrete or continuous outcome. Learning-to-Rank (LTR) refers to a group of algorithms that apply machine learning techniques to tackle these ranking problems. Despite their empirical success, most existing LTR models are not built to be robust to errors in labeling or annotation, distributional data shift, or adversarial data perturbations. To fill this gap, we develop four LTR frameworks that are robust to various types of perturbations. First, Pairwise Elastic Net Regression Ranking (PENRR) is an elastic-net-based regression method for drug sensitivity prediction. PENRR infers robust predictors of drug responses from patient genomic information. The special design of this model (comparing each drug with other drugs in the same cell line and comparing that drug with itself in other cell lines) significantly enhances the accuracy of the drug prediction model under limited data. This approach is also able to solve the problem of fitting on the insensitive drugs that is commonly encountered in regression-based models. Second, Regression-based Ranking by Pairwise Cluster Comparisons (RRPCC) is a ridge-regression-based method for ranking clusters of similar protein complex conformations generated by an underlying docking program (i.e., ClusPro). Rather than using regression to predict scores, which would equally penalize deviations for both low-quality and high-quality clusters, we seek to predict the difference of scores for any pair of clusters corresponding to the same complex. RRPCC combines these pairwise assessments to form a ranked list of clusters, from higher to lower quality.
We apply RRPCC to clusters produced by the automated docking server ClusPro and, depending on the training/validation strategy, we show improvements of 24%–100% in ranking acceptable or better quality clusters first, and of 15%–100% in ranking medium or better quality clusters first. Third, Distributionally Robust Multi-Output Regression Ranking (DRMRR) is a listwise LTR model that introduces robustness into LTR problems using the Distributionally Robust Optimization (DRO) framework. In contrast to existing methods, the scoring function of DRMRR is designed as a multivariate mapping from a feature vector to a vector of deviation scores, which captures local context information and cross-document interactions. DRMRR employs ranking metrics (i.e., NDCG) in its output. In particular, we use the notion of position deviation to define a vector of relevance scores instead of a scalar one. We then adopt the DRO framework to minimize a worst-case expected multi-output loss function over a probabilistic ambiguity set defined by the Wasserstein metric. We also present an equivalent convex reformulation of the DRO problem, which is shown to be tighter than those proposed in previous studies. Fourth, Inversion Transformer-based Neural Ranking (ITNR) is a Transformer-based model that predicts drug responses using RNAseq gene expression profiles, drug descriptors, and drug fingerprints. It uses a Context-Aware-Transformer architecture as its scoring function, which ensures the modeling of inter-item dependencies. We also introduce a new loss function using the concept of inversion and approximate permutation matrices. The accuracy and robustness of these LTR models are verified through three medical applications, namely cluster ranking in protein-protein docking, medical document retrieval, and drug response prediction. Bioinformatics Drug discovery Learning to rank Personalized medicine Protein docking Recommendation systems Transformer
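The RRPCC description in the abstract above says that predicted pairwise score differences are combined into a ranked list, but it does not say how. One simple way to picture that step is sketched below in Python. The aggregation rule (summing each item's predicted differences against all other items) is an assumption chosen for illustration, not the thesis's method, and `predict_diff` is a hypothetical stand-in for a trained regression model.

```python
def rank_by_pairwise_differences(items, predict_diff):
    """Order items from highest to lowest quality using pairwise predictions.

    items: list of hashable identifiers (for example, cluster ids).
    predict_diff(a, b): predicted quality(a) - quality(b); hypothetical model call.
    """
    # Each item's total margin over all other items (assumed aggregation rule).
    totals = {
        a: sum(predict_diff(a, b) for b in items if b != a)
        for a in items
    }
    return sorted(items, key=lambda a: totals[a], reverse=True)
```

As a usage example, if a toy `predict_diff` returns the difference of known qualities 0.2, 0.9 and 0.5 for items `c1`, `c2` and `c3`, the function returns `["c2", "c3", "c1"]`.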

Search results