Global ETD Search

21	Features for Ranking Tweets Based on Credibility and Newsworthiness Ross, Jacob W. 11 May 2015 (has links) No description available. Computer Science
22	Learning to Rank Algorithms and Their Application in Machine Translation Xia, Tian January 2015 (has links) No description available. Computer Engineering learning to rank boosting regression tree algorithm analysis machine translation training
23	Automated Software Defect Localization Ye, Xin 23 September 2016 (has links) No description available. Computer Science Software maintenance bug reports learning to rank word embeddings API documents
24	Shoppin’ in the Rain : An Evaluation of the Usefulness of Weather-Based Features for an ML Ranking Model in the Setting of Children’s Clothing Online Retailing / Handla i regnet : En utvärdering av användbarheten av väderbaserade variabler för en ML-rankningsmodell inom onlineförsäljning av barnkläder Lorentz, Isac January 2023 (has links) Online shopping offers numerous benefits, but large product catalogs make it difficult for shoppers to understand the existence and characteristics of every item for sale. To simplify the decision-making process, online retailers use ranking models to recommend products relevant to each individual user. Contextual user data, such as location, time, or local weather conditions, can serve as valuable features for ranking models, enabling personalized real-time recommendations. Little research has been published on the usefulness of weather-based features for ranking models in online clothing retailing, which makes additional research into this topic worthwhile. Using Swedish sales and customer data from Babyshop, an online retailer of children’s fashion, this study examined possible correlations between local weather data and sales. This was done by comparing differences in daily weather and differences in daily shares of sold items per clothing category for two cities: Stockholm and Göteborg. With Malmö as an additional city, historical observational weather data from one location each in the three cities Stockholm, Göteborg, and Malmö was then featurized and used along with the customers’ postal towns, sales features, and sales trend features to train and evaluate the ranking relevancy of a gradient boosted decision trees learning to rank LightGBM ranking model with weather features. The ranking relevancy was compared against a LightGBM baseline that omitted the weather features and a naive baseline: a popularity-based ranker. 
Several possible correlations were found between a clothing category, such as shorts, rainwear, shell jackets, or winter wear, and a weather variable, such as feels-like temperature, solar energy, wind speed, precipitation, snow, or snow depth. The ranking relevancy was evaluated using the mean reciprocal rank and the mean average precision @ 10, both on a small dataset consisting only of customer data from the postal towns Stockholm, Göteborg, and Malmö and on a larger dataset in which customers in postal towns from larger geographical areas had their home locations approximated as Stockholm, Göteborg, or Malmö. The LightGBM rankers beat the naive baseline in three out of four configurations, and the ranker with weather features outperformed the LightGBM baseline by 1.1 to 2.2 percent across all configurations. The findings can potentially help online clothing retailers create more relevant product recommendations. / Online shopping offers several benefits, but large product assortments make it difficult for consumers to be aware of the existence and characteristics of every product offered for sale. To simplify the decision process, online retailers use ranking models to recommend relevant products to each individual user. Contextual user data such as the time of day, the user's location, or the local weather can be valuable features for ranking models, since they enable personalized real-time recommendations. Not much research has been published on the usefulness of weather-based features for product recommendation systems in online clothing retail, which makes further studies in this area worthwhile. Using Swedish sales and customer data from Babyshop, an online retailer of children's clothing, this study examined possible correlations between local weather data and sales. 
This was done by comparing the differences in daily weather and the differences in daily shares of items sold per clothing category for two cities: Stockholm and Göteborg. With Malmö as an additional city, historical meteorological observations from one location each in Stockholm, Göteborg, and Malmö were turned into features and used together with the customers' postal towns, sales features, and sales-trend features to train and evaluate the ranking relevance of a gradient-boosted decision trees learning-to-rank LightGBM ranking model with weather features. The ranking relevance was compared against a LightGBM baseline model that lacked weather features and a naive baseline: a popularity-based ranking model. Several possible correlations were discovered between a clothing category, such as shorts, rainwear, shell jackets, or winter wear, and a daily weather variable, such as feels-like temperature, solar energy, wind speed, precipitation, snow, or snow depth. The ranking relevance was evaluated using mean reciprocal rank and mean average precision @ 10 on a smaller dataset consisting only of customer data from the postal towns Stockholm, Göteborg, and Malmö and also on a larger dataset in which customers with postal towns from larger geographical areas had their home locations approximated as Stockholm, Göteborg, or Malmö. The LightGBM ranking models beat the naive baseline in three of four configurations, and the ranking model with weather features beat the LightGBM baseline by 1.1 to 2.2 percent in all configurations. The results can potentially help online fashion retailers create better product recommendation systems. 
Statistical analysis regression analysis recommender systems ensemble learning electronic commerce LightGBM learning to rank feature selection weather-based features fashion Statistisk analys regressionsanalys rekommendationssystem ensemble-inlärning näthandel LightGBM learning to rank variabelselektion väderbaserade variabler mode Computer and Information Sciences Data- och informationsvetenskap
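The two evaluation metrics named in the abstract above, mean reciprocal rank and mean average precision @ 10, can be sketched as follows. This is a minimal illustration and not the thesis's evaluation code: the function names are hypothetical, and normalizing average precision by min(number of relevant items, k) is one common convention that is assumed here.

```python
from typing import Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1 / position of the first relevant item, or 0.0 if none appears."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Average of precision values at each relevant hit within the top k.

    Assumption: the sum is normalized by min(len(relevant), k).
    """
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    denominator = min(len(relevant), k)
    return precision_sum / denominator if denominator else 0.0


def mean_over_queries(metric, queries: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average a per-query metric over (ranked list, relevant set) pairs."""
    values = [metric(ranked, relevant) for ranked, relevant in queries]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Hypothetical product rankings for two shoppers.
    queries = [
        (["rain jacket", "shorts", "boots"], {"shorts"}),
        (["shorts", "rain jacket", "boots"], {"shorts", "boots"}),
    ]
    print("MRR:", mean_over_queries(reciprocal_rank, queries))
    print("MAP@10:", mean_over_queries(average_precision_at_k, queries))
```

Both metrics reward placing purchased or relevant items near the top of the list, which is why they suit a comparison between a ranker with weather features and one without.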
25	Learning information retrieval functions and parameters on unlabeled collections / Apprentissage des fonctions de la recherche d'information et leurs paramètres sur des collections non-étiquetées Goswami, Parantapa 06 October 2014 (has links) In this thesis, we are interested in (a) estimating the parameters of standard Information Retrieval (IR) models and (b) learning new IR functions. We first explore several methods for estimating, a priori, the collection parameter of information models (chapter. Until now, this parameter was set to the average number of documents in which a given word appeared. We present here several methods for estimating this parameter and show that the performance of the information retrieval system can be improved when this parameter is estimated adequately. To this end, we propose an approach based on transfer learning that can predict the parameter values of any IR model. This approach uses relevance judgments from an existing source collection to learn a regression function that predicts the optimal parameters of an IR model on a new, unlabeled target collection. With these predicted parameters, the IR models outperform not only the same models with their default parameters but also those with parameters optimized using the relevance judgments of the target collection. We then study a transfer technique for inducing pseudo-relevance judgments for pairs of documents with respect to a given query of a target collection. These relevance judgments are obtained through an information grid that summarizes the main characteristics of a collection. 
These pseudo-relevance judgments are then used to learn a ranking function using any existing ranking algorithm. In the many experiments we conducted, this technique builds a ranking function that outperforms others proposed in the state of the art. In the last chapter of this thesis, we propose an exhaustive technique for searching for IR functions in the space of existing functions, using a grammar that restricts the search space and respects the constraints of IR. Some of the functions obtained outperform standard IR models. / The present study focuses on (a) predicting parameters of already existing standard IR models and (b) learning new IR functions. We first explore various statistical methods to estimate the collection parameter of the family of information-based models (Chapter 2). This parameter determines the behavior of a term in the collection. In earlier studies, it was set to the average number of documents where the term appears, without full justification. We introduce here a fully formalized estimation method which leads to improved versions of these models over the original ones. However, the method developed is applicable only to estimating the collection parameter under the information model framework. To alleviate this, we propose a transfer learning approach which can predict values for any parameter of any IR model (Chapter 3). This approach uses relevance judgments on a past collection to learn a regression function which can infer parameter values for each single query on a new unlabeled target collection. The proposed method not only outperforms the standard IR models with their default parameter values, but also performs better than or on par with popular parameter tuning methods which use relevance judgments on the target collection. 
We then investigate the application of transfer-learning-based techniques to directly transfer relevance information from a source collection to derive a "pseudo-relevance" judgment on an unlabeled target collection (Chapter 4). From this derived pseudo-relevance, a ranking function that can rank documents in the target collection is learned using any standard learning algorithm. In various experiments the learned function outperformed standard IR models as well as other state-of-the-art transfer-learning-based algorithms. Although a ranking function learned through a learning algorithm is effective, it still has a predefined form determined by the learning algorithm used. We thus introduce an exhaustive discovery approach to search for ranking functions in a space of simple functions (Chapter 5). Through experimentation we found that some of the discovered functions are highly competitive with standard IR models. Recherche d'information Apprentissage automatique Apprentissage de transfert Information retrieval Machine learning Learning to rank Transfer learning 004
26	Bipartite RankBoost+: An Improvement to Bipartite RankBoost Zhang, Ganqin 22 January 2021 (has links) No description available. Computer Science Information Science AdaBoost RankBoost Bipartite RankBoost learning to rank pairwise ranking Recommendation System Machine Learning
27	Structured Stochastic Bandits Magureanu, Stefan January 2016 (has links) In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlated arms. In particular, we investigate the case in which the expected rewards are a Lipschitz function of the arm, and the learning-to-rank problem as viewed from a MAB perspective. For the former, we derive a problem-specific lower bound and propose both an asymptotically optimal algorithm (OSLB) and a Pareto-optimal algorithm (POSLB). For the latter, we construct the regret lower bound and determine its closed form for some particular settings, and we propose two asymptotically optimal algorithms, PIE and PIE-C. For all algorithms mentioned above, we present performance analysis in the form of theoretical regret guarantees and numerical evaluation on artificial datasets and, in the case of PIE and PIE-C, on real-world datasets. Multi-armed bandits Learning to rank reinforcement learning Lipschitz Bandits Annan elektroteknik och elektronik
28	Generating an Interpretable Ranking Model: Exploring the Power of Local Model-Agnostic Interpretability for Ranking Analysis Galera Alfaro, Laura January 2023 (has links) Machine learning has revolutionized recommendation systems by employing ranking models for personalized item suggestions. However, the complexity of learning-to-rank (LTR) models poses challenges in understanding the underlying reasons contributing to the ranking outcomes. This lack of transparency raises concerns about potential errors, biases, and ethical implications. To address these issues, interpretable LTR models have emerged as a solution. Currently, the state-of-the-art for interpretable LTR models is led by generalized additive models (GAMs). However, ranking GAMs face limitations in terms of computational intensity and handling high-dimensional data. To overcome these drawbacks, post-hoc methods, including local interpretable model-agnostic explanations (LIME), have been proposed as potential alternatives. Nevertheless, a quantitative evaluation comparing the efficacy of post-hoc methods with that of state-of-the-art ranking GAMs remains largely unexplored. This study aims to investigate the capabilities and limitations of LIME in an attempt to approximate a complex ranking model using a surrogate model. The proposed methodology for this study is an experimental approach. The neural ranking GAM, trained on two benchmark information retrieval datasets, serves as the ground truth for evaluating LIME’s performance. The study adapts LIME to the context of ranking by translating the problem into a classification task and assesses three different sampling strategies against the prevalence of imbalanced data and their influence on the correctness of LIME’s explanations. The findings of this study contribute to understanding the limitations of LIME in the context of ranking. 
It analyzes the low similarity between the explanations of LIME and those generated by the ranking model, highlighting the need to develop more robust sampling strategies specific to ranking. Additionally, the study emphasizes the importance of developing appropriate evaluation metrics for assessing the quality of explanations in ranking tasks. Explainable Artificial Intelligence Learning To Rank Local Model-Agnostic Interpretability Ranking Generalized Additive Models Computer Sciences Datavetenskap (datalogi)
29	非對稱性加權之排名學習機制 / Leaning to rank with asymmetric discordant penalty 王榮聖, Wang, Rung Sheng Unknown Date (has links) In an age of abundant information, the ways and channels for obtaining information are more convenient and diverse than before, but the sheer volume of data also means that we often struggle to find the data we actually need, so the problem of ranking data has become very important. The purpose of this study is to use learning to rank to find good rankings: ranking rules are derived from the rank orders that people assign for a specific topic and are applied to data mining, so that a computer can automatically score data and produce a correct ordering, which will help data search. This study is divided into two parts. The first part is the design of a ranking algorithm: we improve an existing ranking method (RankBoost) to design a new algorithm (RealRankBoost) and test it on the LETOR benchmark, both as a comparison with other methods and as evidence of improved performance. The second part proposes the concept of asymmetric weighting: we take into account that the probability of an item being viewed differs with its rank position and assign different weights accordingly, so that the ranking results come closer to the human perspective. / With the innovation in computer technology, we have easier ways to access information. But the huge amount of data also makes it hard for us to find what we really want. This is why ranking is important to us. Ranking is a central issue in many applications, such as document retrieval, expert finding, and anti-spam. The objective of this thesis is to discover a good ranking function according to specific ranking orders that reflect human perception. We employ the learning-to-rank approach to automatically score items and generate a ranking order that helps data searching. This thesis is divided into two parts. Firstly, we design a new learning-to-rank algorithm named RealRankBoost based on an existing method (RankBoost). We investigate the efficacy of the proposed method by performing comparative analysis using the LETOR benchmark. Secondly, we propose to assign asymmetric weightings for ranking in the sense that incorrect placement of top-ranked items should yield a higher penalty. Incorporating the asymmetric weighting technique further enables our system to mimic human ranking strategy. 排名 排名學習 資料探勘 非對稱加權 Ranking Learning to rank Information retrieval Asymmetric weight RealRankBoost
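The idea of penalizing misplaced top-ranked items more heavily can be illustrated with a position-weighted count of discordant pairs. This is only a sketch under stated assumptions: the abstract does not give the weighting scheme, so the 1 / log2(position + 1) discount and the function name below are hypothetical choices for illustration, not the RealRankBoost formulation.

```python
import math
from typing import Sequence


def asymmetric_discordant_penalty(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Sum a position-dependent weight over every discordant pair.

    A pair is discordant when the item placed higher by the model has a
    lower true relevance label than the item placed below it.
    Assumption: the weight of a pair is 1 / log2(p + 1), where p is the
    1-based position of the higher-placed item, so errors near the top of
    the list cost more than errors near the bottom.
    """
    # Indices of items ordered by predicted score, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    penalty = 0.0
    for upper_pos in range(len(order)):
        for lower_pos in range(upper_pos + 1, len(order)):
            upper, lower = order[upper_pos], order[lower_pos]
            if labels[upper] < labels[lower]:
                penalty += 1.0 / math.log2(upper_pos + 2)
    return penalty


if __name__ == "__main__":
    labels = [2, 1, 0]  # true relevance of three items
    print(asymmetric_discordant_penalty([3.0, 2.0, 1.0], labels))  # correct order
    print(asymmetric_discordant_penalty([2.0, 3.0, 1.0], labels))  # top two swapped
    print(asymmetric_discordant_penalty([3.0, 1.0, 2.0], labels))  # bottom two swapped
```

With this weighting, swapping the top two items costs more than swapping the bottom two, which is the asymmetry the abstract describes.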
30	Cumulative Distribution Networks: Inference, Estimation and Applications of Graphical Models for Cumulative Distribution Functions Huang, Jim C. 01 March 2010 (has links) This thesis presents a class of graphical models for directly representing the joint cumulative distribution function (CDF) of many random variables, called cumulative distribution networks (CDNs). Unlike graphical models for probability density and mass functions, in a CDN, the marginal probabilities for any subset of variables are obtained by computing limits of functions in the model. We will show that the conditional independence properties in a CDN are distinct from the conditional independence properties of directed, undirected and factor graph models, but include the conditional independence properties of bidirected graphical models. As a result, CDNs are a parameterization for bidirected models that allows us to represent complex statistical dependence relationships between observable variables. We will provide a method for constructing a factor graph model with additional latent variables for which graph separation of variables in the corresponding CDN implies conditional independence of the separated variables in both the CDN and in the factor graph with the latent variables marginalized out. This will then allow us to construct multivariate extreme value distributions for which both a CDN and a corresponding factor graph representation exist. In order to perform inference in such graphs, we describe the `derivative-sum-product' (DSP) message-passing algorithm where messages correspond to derivatives of the joint cumulative distribution function. We will then apply CDNs to the problem of learning to rank, or estimating parametric models for ranking, where CDNs provide a natural means with which to model multivariate probabilities over ordinal variables such as pairwise preferences. 
We will show that many previous probability models for rank data, such as the Bradley-Terry and Plackett-Luce models, can be viewed as particular types of CDN. Applications of CDNs will be described for the problems of ranking players in multiplayer team-based games, document retrieval and discovering regulatory sequences in computational biology using the above methods for inference and estimation of CDNs. Graphical models Cumulative distribution function Inference Message-passing Learning to rank Information retrieval Computational biology Bioinformatics Genomics Extreme value distribution microRNA Gene regulation Copula
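For reference, the Bradley-Terry and Plackett-Luce models mentioned above can be written in their standard textbook form as follows. This sketch shows only the usual definitions with positive item weights; it does not implement the CDN formulation or the DSP algorithm from the thesis, and the function names are hypothetical.

```python
from typing import Sequence


def bradley_terry(weight_i: float, weight_j: float) -> float:
    """Probability that item i is preferred to item j under Bradley-Terry."""
    return weight_i / (weight_i + weight_j)


def plackett_luce(weights_in_rank_order: Sequence[float]) -> float:
    """Probability of observing a full ranking under Plackett-Luce.

    The ranking is built top-down: at each step the next item is chosen
    with probability proportional to its weight among the items not yet
    placed.
    """
    probability = 1.0
    remaining = sum(weights_in_rank_order)
    for weight in weights_in_rank_order:
        probability *= weight / remaining
        remaining -= weight
    return probability


if __name__ == "__main__":
    # With two items, Plackett-Luce reduces to Bradley-Terry.
    print(bradley_terry(2.0, 1.0), plackett_luce([2.0, 1.0]))
    # Probability that items with weights 3, 2, 1 are ranked in that order.
    print(plackett_luce([3.0, 2.0, 1.0]))
```

The two-item case makes the relationship between the models visible: a Plackett-Luce ranking of two items is exactly a Bradley-Terry pairwise preference.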

Search results