1. Performance evaluation of latent factor models for rating prediction
Zheng, Lan, 24 April 2015
Since the Netflix Prize competition, latent factor models (LFMs) have become the comparison "staples" for many recent recommender methods. Yet the impact of data preprocessing and updating algorithms on LFMs remains unclear. The performance improvement of LFMs over baseline approaches, moreover, amounts to only a few percent. It is therefore time for a better understanding of their real power beyond the overall root mean square error (RMSE), which, as it happens, falls within a very narrow range and offers little room for deeper insight.
We introduce an experiment-based handbook of LFMs and reveal the influence of data preprocessing and updating algorithms. We perform a detailed experimental study of the performance of classical staple LFMs on a classical dataset, MovieLens 1M, which sheds light on a much more pronounced advantage of LFMs for particular categories of users and items, under RMSE as well as other measures. In particular, LFMs show surprising and substantial advantages when handling several difficult user and item categories. By comparing the distributions of test ratings and predicted ratings, we show that the performance of LFMs is influenced by the rating distribution. We then propose a method to estimate the performance of LFMs for a given rating dataset. Finally, we provide a very simple, open-source library that implements staple LFMs, performs comparably to some very recent (2013) LFM developments, and is more transparent than some other widely used libraries. / Graduate
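To make the notion of a "staple" LFM concrete, the sketch below trains a biased matrix factorization model (prediction = global mean + user bias + item bias + dot product of user and item latent factors) with stochastic gradient descent and reports RMSE against a global-mean baseline. It is a minimal sketch under stated assumptions: the helper names `train_mf` and `rmse`, the hyperparameters, and the synthetic rank-2 ratings are hypothetical and are not taken from the thesis or its library; no MovieLens data is used.

```python
import numpy as np


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=60, seed=0):
    """Biased matrix factorization fitted by SGD (hypothetical helper).

    ratings: list of (user, item, rating) triples.
    Prediction: mu + b_u[u] + b_i[i] + P[u] . Q[i]
    """
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in ratings]))  # global mean rating
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    P = 0.1 * rng.standard_normal((n_users, k))  # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, k))  # item latent factors
    order = np.arange(len(ratings))
    for _ in range(epochs):
        rng.shuffle(order)  # visit ratings in a new random order each epoch
        for idx in order:
            u, i, r = ratings[idx]
            err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            p_old = P[u].copy()  # use pre-update user factors for the item step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return mu, b_u, b_i, P, Q


def rmse(ratings, model):
    """Root mean square error of the model on (user, item, rating) triples."""
    mu, b_u, b_i, P, Q = model
    errs = [r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i]) for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errs))))


# Synthetic 1-5 ratings generated from a rank-2 model (illustrative only).
rng = np.random.default_rng(42)
n_users, n_items = 30, 20
U = rng.standard_normal((n_users, 2))
V = rng.standard_normal((n_items, 2))
full = np.clip(3 + U @ V.T + 0.3 * rng.standard_normal((n_users, n_items)), 1, 5)
observed = [(u, i, float(full[u, i])) for u in range(n_users)
            for i in range(n_items) if rng.random() < 0.6]
perm = rng.permutation(len(observed))  # shuffle before the train/test split
observed = [observed[j] for j in perm]
split = int(0.8 * len(observed))
train, test = observed[:split], observed[split:]

model = train_mf(train, n_users, n_items)
mu = model[0]
baseline_train = float(np.sqrt(np.mean([(r - mu) ** 2 for _, _, r in train])))
baseline_test = float(np.sqrt(np.mean([(r - mu) ** 2 for _, _, r in test])))
print(f"train RMSE: MF {rmse(train, model):.3f} vs global mean {baseline_train:.3f}")
print(f"test RMSE:  MF {rmse(test, model):.3f} vs global mean {baseline_test:.3f}")
```

A single overall RMSE like the one printed here is exactly the aggregate the abstract argues is too compressed to be informative; splitting `test` by user or item category (for example, by number of ratings) is one way to look beyond it.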
2. On Bayesian Analyses of Functional Regression, Correlated Functional Data and Non-homogeneous Computer Models
Montagna, Silvia, January 2013
Current frontiers in complex stochastic modeling of high-dimensional processes place major emphasis on so-called functional data: problems in which the data are snapshots of curves and surfaces representing fundamentally important scientific quantities. This thesis explores new Bayesian methodologies for functional data analysis.
The first part of the thesis focuses on the role of factor models in functional data analysis. Data reduction becomes mandatory when dealing with such high-dimensional data, all the more so when data are available on a large number of individuals. In Chapter 2 we present a novel Bayesian framework that employs a latent factor construction to represent each variable by a low-dimensional summary. Further, we explore the important issue of modeling and analyzing the relationship of functional data with other covariate and outcome variables simultaneously measured on the same subjects.
The second part of the thesis is concerned with the analysis of circadian data. The focus is on identifying circadian genes, that is, genes whose expression levels appear to be rhythmic through time with a period of approximately 24 hours. In addressing this goal, most of the current literature does not account for potential dependence across genes. In Chapter 4, we propose a Bayesian approach that employs latent factors to accommodate dependence and verify patterns and relationships between genes, while representing the true gene expression trajectories in the Fourier domain allows inference on the period, phase, and amplitude of the signal.
The third part of the thesis is concerned with the statistical analysis of computer models (simulators). The heavy computational demand of these input-output maps calls for statistical techniques that quickly estimate the output surface at untried inputs given a few preliminary runs of the simulator at a set of design points. To this end, we propose a Bayesian methodology based on a non-stationary Gaussian process. Relying on a model-based assessment of uncertainty, we envision a sequential design technique that helps choose the input points at which the simulator should be run so as to optimally reduce the uncertainty in the posterior surface estimate. The proposed non-stationary approach adapts well to output surfaces of unconstrained shape. / Dissertation
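The general idea of model-based sequential design can be sketched with a Gaussian process surrogate: fit the GP to the simulator runs made so far, then choose the next input where the posterior variance is largest. As a simplifying assumption, this sketch uses a stationary squared-exponential kernel with fixed hyperparameters and a hypothetical one-dimensional simulator `toy_simulator`; it does not implement the non-stationary model proposed in the thesis, and maximum posterior variance is only one of several possible design criteria.

```python
import numpy as np


def sq_exp_kernel(a, b, length=0.2, var=1.0):
    """Stationary squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_train, y_train, x_test, jitter=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_test (noise-free runs plus jitter)."""
    K = sq_exp_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_s = sq_exp_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(sq_exp_kernel(x_test, x_test))
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative values from round-off


def next_design_point(x_train, y_train, candidates):
    """Pick the candidate input with the largest posterior variance."""
    _, var = gp_posterior(x_train, y_train, candidates)
    return float(candidates[np.argmax(var)])


def toy_simulator(x):
    """Hypothetical cheap stand-in for an expensive computer model."""
    return np.sin(6.0 * x) + 0.5 * x


design = np.array([0.1, 0.5, 0.9])  # preliminary runs
outputs = toy_simulator(design)
candidates = np.linspace(0.0, 1.0, 101)
for _ in range(5):
    x_new = next_design_point(design, outputs, candidates)
    design = np.append(design, x_new)
    outputs = np.append(outputs, toy_simulator(x_new))  # "run" the simulator
print("design points:", np.round(design, 2))
```

With a stationary kernel the posterior variance does not depend on the observed outputs, so this criterion simply spreads points out; a non-stationary model, as proposed in the thesis, can instead concentrate runs where the surface behaves differently.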
3. Essays on Child Development
January 2018
This dissertation comprises three chapters.
In chapter one, using a rich dataset for the United States, I estimate a series of models to document birth order effects on cognitive outcomes, non-cognitive outcomes, and parental investments. I estimate a model that allows birth order effects to be heterogeneous in unobservables in order to examine how these effects vary across households. I find that first-born children score 0.2 of a standard deviation higher on cognitive and non-cognitive outcomes than their later-born siblings. They also receive 10% more parental time, which accounts for more than half of the differences in outcomes. I document that birth order effects range from 0.1 to 0.4 of a standard deviation across households, being smaller in households with certain characteristics, such as high income.
In chapter two, I build a model of intra-household resource allocation that endogenously generates birth order effects that decrease with household income, with the aim of using the model for counterfactual policy experiments. The model has a life-cycle framework in which a household with two children faces a sequence of time constraints and a lifetime monetary constraint, and divides the available time and monetary resources between consumption and investment. The counterfactual experiment shows that an annual income transfer of 10,000 USD to low-income households decreases the birth order effects on cognitive and non-cognitive skills by one-sixth, an effect five times larger than in high-income households.
In chapter three, with Francesco Agostinelli and Matthew Wiswall, we examine the relative importance of investments at home and at school during an important transition for many children, entering formal schooling at kindergarten. Moreover, our framework allows for complementarities between children's skills and investments from schools. We find that investments from schools are an important determinant of children's skills at the end of kindergarten, whereas parental investments, although strongly correlated with end-of-kindergarten outcomes, have smaller effects. In addition, we document a negative complementarity between children's skills at kindergarten entry and investments from schools, implying that low-skill children benefit the most from an increase in the quality of schools. / Dissertation/Thesis / Doctoral Dissertation Economics 2018
4. Modèles à facteurs latents pour les études d'association écologique en génétique des populations / Latent factor models for ecological association studies in population genetics
Frichot, Eric, 26 September 2014
We introduce a set of latent factor models dedicated to landscape genomics and ecological association tests. It includes statistical methods for correcting principal component maps for the effects of spatial autocorrelation (spFA); methods for quickly and efficiently estimating individual ancestry coefficients from large genotypic matrices and evaluating the number of ancestral populations (sNMF); and methods for identifying genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures (LFMM). We also developed a set of open-source software packages associated with these methods, based on optimized C programs that scale to very large data sets, to run analyses of population structure and genome scans for local adaptation.
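The ancestry-estimation idea can be illustrated with a plain nonnegative matrix factorization: a matrix of genotype frequencies (individuals by loci) is approximated by a product Q F of nonnegative factors, where the rows of Q are read as ancestry coefficients over K ancestral populations and F holds ancestral allele frequencies. This is a hedged sketch only: it uses Lee and Seung's multiplicative updates rather than the sparse nonnegative matrix factorization used by sNMF, the row normalization of Q is a heuristic post-processing step, and the helper name `nmf_ancestry` and the simulated genotypes are hypothetical.

```python
import numpy as np


def nmf_ancestry(X, K, n_iter=300, seed=0, eps=1e-9):
    """Factor X (individuals x loci, values in [0, 1]) as Q @ F with Q, F >= 0.

    Plain NMF with multiplicative updates for the Frobenius loss;
    not the sparse algorithm of sNMF.
    """
    rng = np.random.default_rng(seed)
    n, L = X.shape
    Q = rng.random((n, K)) + 0.1  # strictly positive start keeps updates well defined
    F = rng.random((K, L)) + 0.1
    errors = []
    for _ in range(n_iter):
        F *= (Q.T @ X) / (Q.T @ Q @ F + eps)
        Q *= (X @ F.T) / (Q @ F @ F.T + eps)
        errors.append(float(np.linalg.norm(X - Q @ F)))
    ancestry = Q / Q.sum(axis=1, keepdims=True)  # heuristic: rescale rows to proportions
    return ancestry, F, errors


# Simulated admixed individuals from K = 2 ancestral populations (illustrative only).
rng = np.random.default_rng(1)
n_ind, n_loci, K = 60, 200, 2
freqs = rng.uniform(0.05, 0.95, size=(K, n_loci))  # ancestral allele frequencies
q_true = rng.dirichlet(np.ones(K), size=n_ind)  # true admixture proportions
genotypes = rng.binomial(2, q_true @ freqs)  # allele counts 0, 1, 2
X = genotypes / 2.0

ancestry, F, errors = nmf_ancestry(X, K)
print("estimated proportions (first 3 individuals):\n", np.round(ancestry[:3], 2))
print(f"reconstruction error: {errors[0]:.2f} -> {errors[-1]:.2f}")
```

Because the simulated individuals are mixtures of K = 2 sources, the rows of `ancestry` can be compared with `q_true`, up to a permutation of the columns.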