Global ETD Search

1	Statistical Consistency of Ranking: Bipartite and Multipartite Cases Uematsu, Kazuki 30 August 2012 (has links) No description available. Statistics Learning to rank
2	Learning to rank in supervised and unsupervised settings using convexity and monotonicity Acharyya, Sreangsu 10 September 2013 (has links) This dissertation addresses the task of learning to rank, in both the supervised and unsupervised settings, by exploiting the interplay of convex functions, monotonic mappings and their fixed points. In the supervised setting, one wishes to learn from examples of correctly ordered items, whereas in the unsupervised setting one tries to maximize some quantitatively defined characteristic of a "good" ranking. A ranking method selects one permutation from among the combinatorially many permutations defined on the items to rank. Accomplishing this optimally in the supervised setting, with minimal loss of generality, if any, is challenging. This dissertation addresses the problem by optimizing, globally and efficiently, a statistically consistent loss functional over the class of compositions of a linear function by an arbitrary, strictly monotonic, separable mapping with large margins. This capability also enables learning the parameters of a generalized linear model with an unknown link function. The method can handle infinite-dimensional feature spaces if the corresponding kernel function is known. In the unsupervised setting, a popular ranking approach is link analysis over a graph of recommendations, as exemplified by PageRank. This dissertation shows that PageRank may be viewed as an instance of an unsupervised consensus optimization problem. It then solves a more general problem of unsupervised consensus over noisy, directed recommendation graphs that have uncertainty over the set of "out" edges that emanate from a vertex. The proposed consensus rank is essentially the PageRank over the expected edge-set, where the expectation is computed over the distribution that achieves the most agreeable consensus. 
This consensus is measured geometrically by a suitable Bregman divergence between the consensus rank and the ranks induced by item-specific distributions. Real-world deployed ranking methods need to be resistant to spam, a particularly sophisticated type of which is link-spam. A popular class of countermeasures "de-spam" the corrupted webgraph by removing abusive pages identified by supervised learning. Since exhaustive detection and neutralization is infeasible, there is a need for ranking functions that can, on the one hand, attenuate the effects of link-spam without supervision and, on the other hand, counter spam more aggressively when supervision is available. A family of non-linear, iteratively defined monotonic functions is proposed that propagates "rank" and "trust" scores through the webgraph. It relies on non-linearity, monotonicity and Schur-convexity to provide resistance against spam. Learning to rank Convexity Monotonicity Bregman divergence
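For orientation, the link-analysis baseline that entry 2 builds on can be sketched as a power iteration. This is the textbook PageRank recurrence only, not the dissertation's consensus formulation; the function name `pagerank`, the damping factor 0.85, the iteration count and the toy graph are all assumptions made for illustration.

```python
def pagerank(out_edges, damping=0.85, iters=100):
    """Textbook power-iteration PageRank on a dict {node: [successors]}.

    Every successor must also appear as a key of `out_edges`.
    """
    nodes = sorted(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by dangling nodes (no out-edges) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_edges[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_edges[v]:
                # Each node splits its rank equally among its out-edges.
                new[w] += damping * rank[v] / len(out_edges[v])
        rank = new
    return rank


# Hypothetical three-page graph: a links to b and c, b to c, c to a.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph)
```

On this toy graph the page with the most incoming rank, `c`, ends up ranked highest and `b` lowest, and the scores sum to one.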
3	Using reinforcement learning to learn relevance ranking of search queries Sandupatla, Hareesh 05 1900 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Web search has become a part of everyday life for hundreds of millions of users around the world. However, the effectiveness of a user's search depends vitally on the quality of search result ranking. Even though enormous efforts have been made to improve ranking quality, there is still significant misalignment between search engine ranking and an end user's preference order. This is evident from the fact that, for many search results on major search and e-commerce platforms, many users ignore the top-ranked results and click on lower-ranked ones. Finding a ranking that suits all users is difficult because every user's need is different, so an ideal ranking is one that is preferred by the majority of users. This emphasizes the need for an automated approach that improves the search engine ranking dynamically by incorporating user clicks in the ranking algorithm. Existing search result ranking methodologies have not explored this direction in depth. A key challenge in using user clicks in search result ranking is that the relevance feedback learnt from click data is imperfect: a user is more likely to click a top-ranked result than a lower-ranked result, irrespective of the actual relevance of those results. This phenomenon, known as position bias, poses a major difficulty in obtaining an automated method for dynamic update of search rank orders. In my thesis, I propose a set of methodologies which incorporate user clicks for dynamic update of search rank orders. The updates are based on adaptive randomization of results using a reinforcement learning strategy that treats user click activity as the reinforcement signal. 
Beginning at any rank order of the search results, the proposed methodologies are guaranteed to converge to a ranking which is close to the ideal rank order. In addition, the use of a reinforcement learning strategy enables the proposed methods to overcome the position bias phenomenon. To measure the effectiveness of the proposed method, I perform experiments under a simplified user behavior model which I call the color ball abstraction model. I evaluate the quality of the proposed methodologies using standard information retrieval metrics: Precision at n (P@n), Kendall tau rank correlation, Discounted Cumulative Gain (DCG) and Normalized Discounted Cumulative Gain (NDCG). The experimental results clearly demonstrate the success of the proposed methodologies. Information Retrieval Learning to Rank Web Search
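The four evaluation metrics named in entry 3 can be sketched in a few lines. These are the common textbook definitions, not code from the thesis: DCG here uses the raw relevance grade as gain with a log2(position + 1) discount (other gain functions exist), and the function names and sample relevance list are assumptions.

```python
import math


def precision_at_n(relevances, n):
    """Fraction of the top-n results that are relevant (grade > 0)."""
    return sum(1 for r in relevances[:n] if r > 0) / n


def dcg(relevances, n=None):
    """Discounted cumulative gain with a log2(position + 1) discount."""
    top = relevances if n is None else relevances[:n]
    return sum(r / math.log2(i + 2) for i, r in enumerate(top))


def ndcg(relevances, n=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), n)
    return dcg(relevances, n) / ideal if ideal > 0 else 0.0


def kendall_tau(order_a, order_b):
    """Kendall tau between two rankings of the same distinct items."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for i in range(len(order_a)):
        for j in range(i + 1, len(order_a)):
            if pos_b[order_a[i]] < pos_b[order_a[j]]:
                concordant += 1
            else:
                discordant += 1
    pairs = len(order_a) * (len(order_a) - 1) / 2
    return (concordant - discordant) / pairs


# Hypothetical graded relevance of a returned list, top result first.
graded = [3, 2, 0, 1]
p_at_2 = precision_at_n(graded, 2)
score = ndcg(graded)
tau_reversed = kendall_tau(["a", "b", "c", "d"], ["d", "c", "b", "a"])
```

A perfectly ordered list gives an NDCG of 1, and a fully reversed ranking gives a Kendall tau of -1, which is a quick sanity check on any implementation.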
4	Learning to Rank with Contextual Information Han, Peng 15 November 2021 (has links) Learning to rank is used in many scenarios, such as disease-gene association, information retrieval and recommender systems. Improving the prediction accuracy of the ranking model is the main target of existing work. Contextual information has a significant influence on the ranking problem and has proven effective in increasing the prediction performance of ranking models. We therefore construct similarities for different types of entities so that contextual information can be used uniformly and in an extensible way. Once the similarities have been constructed from contextual information, the remaining task is how to use them in different types of ranking models. In this thesis, we propose four algorithms for learning to rank with contextual information. To refine the framework of matrix factorization, we propose an area under the ROC curve (AUC) loss to overcome the sparsity problem. Clustering and sampling methods are used to exploit the contextual information from a global perspective, and an objective function with an optimal solution is proposed to exploit it from a local perspective. Then, for the deep learning framework, we apply the graph convolutional network (GCN) to the ranking problem in combination with matrix factorization. Contextual information is used to generate the input embeddings and graph kernels for the GCN. The third method in this thesis exploits the contextual information for ranking directly. A Laplacian loss is used to solve the ranking problem, which optimizes the ranking matrix directly. With this loss, entities with similar contextual information will have similar ranking results. Finally, we propose a two-step method to solve the ranking problem for sequential data. 
The first step in this two-step method is to generate the embeddings for all entities with a new sampling strategy. A graph neural network (GNN) and long short-term memory (LSTM) are combined to generate the representation of sequential data. Once we have this representation, we can solve the ranking problem with a pairwise loss and a sampling strategy. learning to rank matrix factorization deep learning Laplacian regularization
5	Exploring Entity Relationship in Pairwise Ranking: Adaptive Sampler and Beyond Yu, Lu 12 1900 (has links) Living in the booming age of information, we have to rely on powerful information retrieval tools, such as personalized search engines and recommendation systems, to find the desired piece of knowledge in a world of big data. As one of their core components, ranking models appear almost everywhere a relative order of desired or relevant entities is needed. Based on the general and intuitive assumption that entities without user actions (e.g., clicks, purchases, comments) are of less interest than those with user actions, the objective function of pairwise ranking models is formulated by measuring the contrast between positive (with actions) and negative (without actions) entities. This contrastive relationship is the core of pairwise ranking models. The construction of these positive-negative pairs has great influence on the model inference accuracy. It is especially challenging to explore entity relationships in a heterogeneous information network. In this thesis, we aim at advancing the methodologies and principles of mining heterogeneous information networks by learning entity relations from a pairwise learning-to-rank optimization perspective. More specifically, we first show the connections between different relation learning objectives derived from different ranking metrics, including both pairwise and list-wise objectives. We prove that most popular ranking metrics can be optimized through the same lower bound. Secondly, we identify the class-imbalance problem imposed by entity relation comparison in ranking objectives, and prove that class imbalance can lead to frequency clustering and vanishing gradient problems. In response, we point out that developing a fast adaptive sampling method is essential to boost the pairwise ranking model. 
To model dynamic dependencies between entities, we propose to unify individual-level and union-level interactions, resulting in a multi-order attentive ranking model that improves preference inference from multiple views. Deep Learning Contrastive Learning Recommender Systems Learning to Rank
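The contrast between positive and negative entities described in entry 5 is often written as a logistic pairwise loss on the score margin. The sketch below is that generic formulation, assumed here for illustration; it is not the thesis's own objective or adaptive sampler, and the function names and sample scores are hypothetical. It also shows why uninformative negatives matter: once the positive item already outscores the negative by a wide margin, the gradient is close to zero.

```python
import math


def pairwise_logistic_loss(score_pos, score_neg):
    """-log(sigmoid(score_pos - score_neg)) in a numerically stable form."""
    margin = score_pos - score_neg
    # log(1 + exp(-margin)) rewritten to avoid overflow for large |margin|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def loss_gradient_wrt_margin(score_pos, score_neg):
    """d(loss)/d(margin) = -1 / (1 + exp(margin))."""
    margin = score_pos - score_neg
    return -1.0 / (1.0 + math.exp(margin))


# An "easy" negative (already scored far below the positive item)
# contributes almost no gradient; a "hard" negative contributes a lot.
easy_grad = loss_gradient_wrt_margin(5.0, 0.0)
hard_grad = loss_gradient_wrt_margin(0.0, 5.0)
```

Under this reading, a sampler that keeps drawing easy negatives yields updates that are nearly zero, which is one common motivation for adaptive negative sampling.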
6	Query-Dependent Selection of Retrieval Alternatives Balasubramanian, Niranjan 01 September 2011 (has links) The main goal of this thesis is to investigate query-dependent selection of retrieval alternatives for Information Retrieval (IR) systems. Retrieval alternatives include choices in representing queries (query representations) and choices in the methods used for scoring documents. For example, an IR system can represent a user query without any modification, automatically expand it to include more terms, or reduce it by dropping some terms. The main motivation for this work is that no single query representation or retrieval model performs best for all queries. This suggests that selecting the best representation or retrieval model for each query can yield improved performance. The key research question in selecting between alternatives is how to estimate the performance of the different alternatives. We treat query-dependent selection as a general problem of selecting between the result sets of different alternatives. We develop a relative effectiveness estimation technique that uses retrieval-based features and a learning formulation to directly predict differences between the result sets. The main idea behind this technique is to aggregate the scores and features used for retrieval (retrieval-based features) as evidence of the effectiveness of a result set. We apply this general technique to select between alternative reduced versions of long queries and to combine multiple ranking algorithms. Then, we investigate the extension of query-dependent selection under specific efficiency constraints. Specifically, we consider the black-box meta-search scenario, where querying all available search engines can be expensive and the features and scores used by the search engines are not available. We develop easy-to-compute features based on the results page alone to predict when querying an alternate search engine can be useful. 
Finally, we present an analysis of selection performance to better understand when query-dependent selection can be useful. Information Retrieval Learning to Rank Machine Learning Computer Sciences
7	Query Expansion Study for Clinical Decision Support Zhuang, Wenjie 12 February 2018 (has links) Information retrieval is widely used for retrieving relevant information from a variety of data, such as text documents, images, audio and videos. Since the first medical batch retrieval system was developed in the mid-1960s, significant research efforts have focused on applying information retrieval to medical data. However, despite the vast developments in medical information retrieval and accompanying technologies, the actual promise of this area remains unfulfilled due to the properties of medical data and the huge volume of medical literature. Specifically, the recall and precision on the selected dataset from the TREC clinical decision support (CDS) track are low. The overriding objective of this thesis is to improve the performance of information retrieval techniques applied to biomedical text documents. We have focused on improving recall and precision among the top retrieved results. To that end, we have removed redundant words and then expanded queries by adding MeSH terms to TREC CDS topics. We have also used other external data sources and domain knowledge to implement the expansion. In addition, we have considered using the doc2vec model to optimize retrieval. Finally, we have applied learning to rank, which sorts documents by relevance and puts relevant documents ahead of irrelevant ones, so that the relevant retrieved data is returned at the top. We have found that queries expanded with external data sources and domain knowledge perform better than applying the TREC topic information directly. / Master of Science / Information retrieval is widely used for retrieving relevant information from a variety of data. Since the first medical batch retrieval system was developed in the mid-1960s, significant research efforts have focused on applying information retrieval to medical data. 
However, the actual promise of this area remains unfulfilled due to certain properties of medical data and the sheer volume of medical literature. The overriding objective of this thesis is to improve the performance of information retrieval techniques applied to biomedical text documents. This thesis presents several ways to implement query expansion to make retrieval more efficient. It then discusses some approaches for placing documents relevant to the queries at the top. Query Expansion Information Retrieval Doc2Vec MeSH Term Learning to Rank
8	Seleção e geração de características utilizando regras de associação para o problema de ordenação de resultados de máquinas de buscas / Feature selection and generation using assossiation rules for the ranking problem of searches machines Araujo, Carina Calixto Ribeiro de 29 August 2014 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Information retrieval is the area of computing that deals with storing documents and retrieving information from them. With the advent of the Internet, the number of documents produced has increased, as has the need to retrieve information more accurately. Many approaches have been proposed to meet this need, and one of them is learning to rank (L2R). Despite major advances in the accuracy of retrieved documents, there is still considerable room for improvement. This master's thesis proposes using feature selection and generation based on association rules to improve the accuracy of L2R methods. Information retrieval Learning to rank Association rules
9	Projeto e avaliação de algoritmos paralelos para sistemas Multicore e Manycore aplicados no processamento de documentos / Design and evaluation of parallel algorithms for Multicore and Manycore systems applied on document processing Freitas, Mateus Ferreira e 30 August 2017 (has links) Several applications process documents in different ways, aiming to filter them, organize them or learn from them. Nowadays, great computational power is necessary to do that efficiently, due to the large and increasing number of documents. Documents are usually independent of each other, which facilitates the use of parallelism to speed up this processing. This work explores three problems: active learning, learning to rank (L2R) and top-k search. Using parallelism on multicore CPUs and manycore GPUs (Graphics Processing Units), parallel algorithms were proposed and evaluated for each problem and implemented with the OpenMP and CUDA APIs. For the active learning problem, a multicore algorithm was proposed, which obtained a speedup of 10.8x in the best case with 12 threads. The proposed manycore version obtained a speedup of 128x over the serial version, and a solution with 4 GPUs achieved a speedup of 3.5x over 1 GPU. For the L2R problem, a manycore algorithm was proposed, which follows a thread-block approach using the concept of combinadic and uses a cache with fingerprints to speed up the processing. The best-case speedups were 508x over the serial version, 9x over a GPU baseline, and 4x over our single-GPU solution when using 4 GPUs. Compared with a version without combinadic, the speedup was 4.4x with both versions using 1 GPU and 3.9x with 4 GPUs. These solutions used bitmap structures to speed up the creation of association rules. For top-k search, serial and multicore solutions were implemented from a state-of-the-art manycore algorithm for exact search. These implementations served as baselines for our extension of this algorithm, which adds multi-GPU support, group searches and intra-block load balancing. The speedups were 2.7x over the original algorithm, 17x over the serial version, 4x over the multicore version, and 4x over our version when using 4 GPUs. Parallelism Association rules Active learning Top-k search Learning to rank GPU
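The combinadic (combinatorial number system) mentioned in entry 9 maps an integer index to a unique k-combination, which plausibly lets each thread block derive its own combination from its block index without coordination; that reading of the abstract is an assumption. The sketch below shows the standard unranking procedure in Python rather than CUDA, with a hypothetical function name, and is not the thesis's implementation.

```python
from math import comb  # available in Python 3.8+


def combinadic_unrank(index, k):
    """Return the k-combination at position `index`, largest element first.

    Uses the combinatorial number system, in which every index has a unique
    form index = C(c_k, k) + ... + C(c_1, 1) with c_k > ... > c_1 >= 0.
    """
    combo = []
    for i in range(k, 0, -1):
        c = i - 1  # C(i - 1, i) == 0, so this is always a valid start
        # Greedily take the largest c with C(c, i) <= remaining index.
        while comb(c + 1, i) <= index:
            c += 1
        combo.append(c)
        index -= comb(c, i)
    return combo


# All C(4, 2) = 6 pairs drawn from items {0, 1, 2, 3}, one per index.
all_pairs = [combinadic_unrank(i, 2) for i in range(6)]
```

Indices 0 through 5 enumerate each pair exactly once, so a pool of workers numbered 0..5 could each process a distinct pair.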
10	Recuperação de documentos e pessoas em ambientes empresariais através de árvores de decisão. / Documents and people retrieval in enterprises using decision tree. Barth, Fabrício Jailson 29 May 2009 (has links) This work evaluates the performance of decision trees as ranking functions for documents and people in enterprise environments. To that end, relevant attributes of the entities to be retrieved were identified from an analysis of: (i) the dynamics of information production and consumption in an enterprise; (ii) existing algorithms in the literature for document and people retrieval; and (iii) concepts used in ranking functions for generic domains. An evaluation environment was set up, using the CERC reference collection, to assess the applicability of the C4.5 algorithm for obtaining ranking functions for the enterprise domain. Using C4.5 to build ranking functions proved partially effective. For the document retrieval task, it did not produce good results. However, it was found that the construction of the ranking function can be controlled to optimize either precision at the top positions of the ranking or mean average precision (MAP). For the people retrieval task, C4.5 produced a decision tree that achieves better results than all the other ranking functions evaluated. The MAP obtained by the decision tree was 0.83, while the average MAP of the other ranking functions was 0.74. The decision tree used to represent the ranking function also helps in understanding how the various attributes used to characterize documents and people combine. Analysis of the decision tree used as the ranking function for people showed that a person is considered an expert on a topic if he or she appears in many documents, appears many times in those documents, and the documents in which he or she appears are highly relevant to the query. Machine learning Information management Information retrieval Learning to rank
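Mean average precision (MAP), the measure reported in entry 10, can be sketched as follows. This is the standard definition, assumed here for illustration; the function names and the two toy result lists are hypothetical and unrelated to the CERC experiments.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Average of precision@k over the positions k that hold a relevant item.

    `ranked_relevance` is a list of booleans (or 0/1), top result first.
    `total_relevant` lets the caller count relevant items never retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for position, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / position
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(runs):
    """Mean of average precision over several queries."""
    return sum(average_precision(run) for run in runs) / len(runs)


# Two hypothetical queries: relevant items at ranks 1 and 4, then at rank 2.
query_runs = [[1, 0, 0, 1], [0, 1]]
map_score = mean_average_precision(query_runs)
```

Here the first query scores (1/1 + 2/4) / 2 = 0.75 and the second 0.5, giving a MAP of 0.625.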
