Global ETD Search

1	Effect fusion using model-based clustering Malsiner-Walli, Gertraud, Pauger, Daniela, Wagner, Helga 01 April 2018 (has links) (PDF) In social and economic studies many of the collected variables are measured on a nominal scale, often with a large number of categories. The definition of categories can be ambiguous and different classification schemes using either a finer or a coarser grid are possible. Categorization has an impact when such a variable is included as a covariate in a regression model: too fine a grid will result in imprecise estimates of the corresponding effects, whereas with too coarse a grid important effects will be missed, resulting in biased effect estimates and poor predictive performance. To achieve an automatic grouping of the levels of a categorical covariate with essentially the same effect, we adopt a Bayesian approach and specify the prior on the level effects as a location mixture of spiky Normal components. Model-based clustering of the effects during MCMC sampling makes it possible to simultaneously detect categories that have essentially the same effect size and identify variables with no effect at all. Fusion of level effects is induced by a prior on the mixture weights which encourages empty components. The properties of this approach are investigated in simulation studies. Finally, the method is applied to analyse effects of high-dimensional categorical predictors on income in Austria.
2	What Men Want, What They Get and How to Find Out Wolf, Alexander 12 July 2017 (has links) (PDF) This thesis is concerned with a fundamental unit of the economy: households. Even in advanced economies, upwards of 70% of the population live in households composed of multiple people. A large number of decisions are taken at the level of the household, that is to say, they are taken jointly by household members: how to raise children, how much and when to work, how many cartons of milk to purchase. How these decisions are made is therefore of great importance for the people who live in these households and for their well-being. But precisely because household members make decisions jointly, it is hard to know how they come about and to what extent they benefit individual members. This is why households are often viewed as single decision makers in economics. Even if they contain multiple people, they are treated as though they were a single person with a single set of preferences. This unitary approach is often sufficient and can be a helpful simplification. But in many situations it does not deliver an adequate description of household behavior. For instance, the unitary model does not permit the study of individual well-being and inequality inside the household. In addition, implications of the unitary model have been rejected repeatedly in the demand literature. Bargaining models offer an alternative where household members have individual preferences and come to joint decisions in various ways. There are by now a great number of such models, all of which allow for the study of bargaining power, a measure of the influence a member has in decision making. This concept is important because it has implications for the welfare of individuals. If one household member’s bargaining power increases, the household’s choices will be more closely aligned with that member’s preferences, ceteris paribus. The three chapters below can be divided into two parts.
The first part consists of Chapter 1, which looks to detect the influence of intra-household bargaining in a specific set of consumption choices: consumption of the arts. The research in this chapter is designed to measure aspects of the effect of bargaining power in this domain, but does not seek to quantify bargaining power itself or to infer economic well-being of household members. Precisely this last point, however, is the focus of the second part of the thesis, consisting of Chapters 2 and 3. These focus specifically on the recovery of one measure of bargaining power, the resource share. Resource shares have the advantage of being interpretable in terms of economic well-being, which is not true of all such measures. They are estimated as part of structural models of household demand. These models are versions of the collective model of household decision making. Pioneered by Chiappori (1988) and Apps and Rees (1988), the collective model has become the go-to alternative to unitary approaches, where the household is seen as a single decision-making unit with a single well-behaved utility function. Instead, the collective model allows for individual utility functions for each member of the household. The model owes much of its success to the simplicity of its most fundamental assumption: that whatever the structure of the intra-household bargaining process, outcomes are Pareto-efficient. This means that no member can be made better off without making another worse off. Though the model nests unitary models as special cases, it does have testable implications. The first chapter of the thesis is entitled “Household Decisions on Arts Consumption” and is joint work with Caterina Mauri, who has also collaborated with me on many other projects in her capacity as my girlfriend. In it, we explore the role of intra-household bargaining in arts consumption.
We do this by estimating demand for various arts and cultural events such as the opera or dance performances using a large number of explanatory variables. One of these variables plays a special role. This variable is a distribution factor, meaning that it can be reasonably assumed to affect consumption only through the bargaining process, and not by modifying preferences. Such variables play an important role in the household bargaining literature. Here, three such variables are used. Among them is the share of household income that is contributed by the husband, the canonical distribution factor. The chapter fits into a literature on drivers of arts consumption, which has shown that in addition to such factors as age, income and education, spousal preferences and characteristics are important in determining how much and which cultural goods are consumed. Gender differences in preferences in arts consumption have also been shown to be important and to persist after accounting for class, education and other socio-economic factors (Bihagen and Katz-Gerro, 2000). We explore to what extent this difference in preferences can be used to shed light on the decision process in couples’ households. Using three different distribution factors, we infer whether changes in the relative bargaining power of spouses induce changes in arts consumption. Using a large sample from the US Current Population Survey which includes data on the frequency of visits to various categories of cultural activities, we regress attendance rates on a range of socio-economic variables using a suitable count data model. We find that attendance by men at events such as the opera, ballet and other dance performances, which are more frequently attended by women than by men, shows a significant influence of the distribution factors. This significant effect persists irrespective of which distribution factor is used.
We conclude that more influential men tend to participate in these activities less frequently than less influential men, conditionally on a host of controls notably including hours worked. The second chapter centers around the recovery of resource shares. This chapter is joint work with Denni Tommasi, a fellow PhD student at ECARES. It relies on the collective model of the household, which assumes simply that household decisions are Pareto-efficient. From this assumption, a relatively simple household problem can be formulated. Households can be seen as maximizers of weighted sums of their members’ utility functions. Importantly, the weights, known as bargaining weights (or bargaining power), may depend on many factors, including prices. The household problem in turn implies structure for household demand, which is observed in survey data. Collective demand systems do not necessarily identify measures of bargaining power, however. In fact, the ability to recover such a measure, and especially one that is useful for welfare analysis, was an important milestone in the literature. It was reached by Browning et al. (2013) (henceforth BCL), with a collective model capable of identifying resource shares (also known as a sharing rule). These shares provide a measure of how resources are allocated in the household and so can be used to study intra-household consumption inequality. They also take into account that households generate economies of scale for their members, a phenomenon known as a consumption technology: by sharing goods such as housing, members of households can generate savings that can be used elsewhere. Estimation of these resource shares involves expressing household budget shares as functions of preferences, a consumption technology and a sharing rule, each of which is a function of observables, and letting the resulting system loose on the data. But obtaining such a demand system is not free.
In addition to the usual empirical specifications of the various parts of the system, an identifying assumption has to be made to ensure that resource shares can be recovered in estimation. In BCL, this is the assumption that singles and adult members of households share the same preferences. In Chapter 2, however, an alternative assumption is used. In a recent paper, Dunbar et al. (2013) (hereafter DLP) develop a collective model based on BCL that allows resource shares to be identified using assumptions on the similarity of preferences within and between households. The model uses demand only for assignable goods, a favorite of household economists. These are goods such as men’s clothing and women’s clothing for which it is known who in a household consumes them. In this chapter, we show why, especially when the data exhibit relatively flat Engel curves, the model is weakly identified and induces high variability and an implausible pattern in least squares estimates. We propose an estimation strategy nested in their framework that greatly reduces this practical impediment to recovery of individual resource shares. To achieve this, we follow an empirical Bayes method that incorporates additional (or out-of-sample) information on singles and relies on mild assumptions on preferences. We show the practical usefulness of this strategy through a series of Monte Carlo simulations and by applying it to Mexican data. The results show that our approach is robust, gives a plausible picture of the household decision process, and is particularly beneficial for the practitioner who wishes to apply the DLP framework. Our welfare analysis of the PROGRESA program in Mexico is the first to include separate poverty rates for men and women in a CCT program. The third chapter addresses a problem similar to the one discussed in Chapter 2. The goal, again, is to estimate resource shares and to remedy issues of imprecision and instability in the demand systems that can deliver them.
Here, the collective model used is based on Lewbel and Pendakur (2008), and uses data on the entire basket of goods that households consume. The identifying assumption is similar to that used by BCL, although I allow for some differences in preferences between singles and married individuals. I set out to improve the precision and stability of the resulting estimates, and so to make the model more useful for welfare analysis. In order to do so, this chapter approaches, for the first time, the estimation of a collective household demand system from a Bayesian perspective. Using prior information on equivalence scales, as well as restrictions implied by theory, tight credible intervals are found for resource shares, a measure of the distribution of economic well-being in a household. A modern MCMC sampling method provides a complete picture of the high-dimensional parameter vector’s posterior distribution and allows for reliable inference. The share of household earnings generated by a household member is estimated to have a positive effect on her share of household resources in a sample of couples from the US Consumer Expenditure Survey. An increase in the earnings share of one percentage point is estimated to result in a shift of between 0.05% and 0.14% of household resources in the same direction, meaning that spouses partially insure one another against such shifts. The estimates imply an expected shift of 0.71% of household resources from the average man to the average woman in the same sample between 2008 and 2012, when men lost jobs at a greater rate than women. Both Chapters 2 and 3 explore unconventional ways to achieve gains in estimator precision and reliability at relatively little cost. This represents a valuable contribution to a literature that, for all its merits in complexity and ingenious modeling, has not yet seriously endeavored to make itself empirically useful.
/ Doctorat en Sciences économiques et de gestion / Family economics
3	Learning 3-D Models of Object Structure from Images Schlecht, Joseph January 2010 (has links) Recognizing objects in images is an effortless task for most people. Automating this task with computers, however, presents a difficult challenge attributable to large variations in object appearance, shape, and pose. The problem is further compounded by ambiguity from projecting 3-D objects into a 2-D image. In this thesis we present an approach to resolve these issues by modeling object structure with a collection of connected 3-D geometric primitives and a separate model for the camera. From sets of images we simultaneously learn a generative, statistical model for the object representation and parameters of the imaging system. By learning 3-D structure models we are going beyond recognition towards quantifying object shape and understanding its variation. We explore our approach in the context of microscopic images of biological structure and single view images of man-made objects composed of block-like parts, such as furniture. We express detected features from both domains as statistically generated by an image likelihood conditioned on models for the object structure and imaging system. Our representation of biological structure focuses on Alternaria, a genus of fungus comprising ellipsoid and cylinder shaped substructures. In the case of man-made furniture objects, we represent structure with spatially contiguous assemblages of blocks arbitrarily constructed according to a small set of design constraints. We learn the models with Bayesian statistical inference over structure and camera parameters per image, and for man-made objects, across categories, such as chairs. We develop a reversible-jump MCMC sampling algorithm to explore topology hypotheses, and a hybrid of Metropolis-Hastings and stochastic dynamics to search within topologies.
Our results demonstrate that we can infer both 3-D object and camera parameters simultaneously from images, and that doing so improves understanding of structure in images. We further show how 3-D structure models can be inferred from single view images, and that learned category parameters capture structure variation that is useful for recognition. 3-D Object recognition Computer vision Machine learning MCMC sampling Statistical inference
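The Metropolis-Hastings ingredient named in entry 3 can be illustrated with a generic random-walk sampler. This is only a minimal sketch and not the thesis's algorithm, which combines reversible-jump moves with stochastic dynamics. The function name `random_walk_metropolis`, the Gaussian proposal, the step size and the standard-normal target are all assumptions chosen for illustration.

```python
import numpy as np


def random_walk_metropolis(log_target, x0, n_steps, step_size, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal.

    log_target: function returning the log density (up to a constant).
    Returns the chain of states and the fraction of accepted proposals.
    """
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    log_p = log_target(x)
    samples = np.empty((n_steps, x.size))
    accepted = 0
    for i in range(n_steps):
        # Propose a move by adding Gaussian noise to the current state.
        proposal = x + step_size * rng.standard_normal(x.shape)
        log_p_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = proposal, log_p_prop
            accepted += 1
        samples[i] = x
    return samples, accepted / n_steps


# Illustrative target (an assumption): a one-dimensional standard normal.
rng = np.random.default_rng(0)
samples, accept_rate = random_walk_metropolis(
    lambda x: -0.5 * np.sum(x ** 2), [0.0], 20_000, 2.0, rng
)
```

Because the proposal is symmetric, the acceptance ratio reduces to a ratio of target densities, which is why only the log-density difference appears in the test.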
4	Efficient Computational and Statistical Models of Hepatic Metabolism Kuceyeski, Amy Frances 02 April 2009 (has links) No description available. Biochemistry Biomedical Research Mathematics Bayesian inference MCMC sampling Prior comparison Correlation Study Gluconeogenesis
5	Theoretical contributions to Monte Carlo methods, and applications to Statistics / Contributions théoriques aux méthodes de Monte Carlo, et applications à la Statistique Riou-Durand, Lionel 05 July 2019 (has links) The first part of this thesis concerns the inference of un-normalized statistical models. We study two methods of inference based on sampling, known as Monte-Carlo MLE (Geyer, 1994), and Noise Contrastive Estimation (Gutmann and Hyvarinen, 2010). The latter method was supported by numerical evidence of improved stability, but no theoretical results had yet been proven. We prove that Noise Contrastive Estimation is more robust to the choice of the sampling distribution. We assess the gain in accuracy as a function of the computational budget. The second part of this thesis concerns approximate sampling for high dimensional distributions. The performance of most samplers deteriorates quickly as the dimension increases, but several methods have proven their effectiveness (e.g. Hamiltonian Monte Carlo, Langevin Monte Carlo). Building on recent work (Eberle et al., 2017; Cheng et al., 2018), we study some discretizations of the kinetic Langevin diffusion process and establish explicit rates of convergence towards the sampling distribution that depend polynomially on the dimension. Our work improves and extends the results established by Cheng et al. for log-concave densities. MCMC sampling M-Estimators Geometric ergodicity Mixing time Couplings Wasserstein distance 510
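To make the phrase "discretizations of the kinetic Langevin diffusion" in entry 5 concrete, here is a minimal sketch of one simple scheme. It is a plain Euler-type discretization chosen for brevity, not the scheme analysed in the thesis or by Cheng et al. The names `kinetic_langevin_step` and `sample`, the friction `gamma`, the step size `h` and the standard Gaussian target are all assumptions. The continuous-time dynamics being approximated are dx = v dt, dv = -(gamma v + grad U(x)) dt + sqrt(2 gamma) dW, whose invariant law has x-marginal proportional to exp(-U(x)).

```python
import numpy as np


def kinetic_langevin_step(x, v, grad_U, h, gamma, rng):
    """One Euler-type step of kinetic (underdamped) Langevin dynamics."""
    noise = rng.standard_normal(x.shape)
    # Velocity update: friction, potential gradient, and injected noise.
    v_new = v - h * (gamma * v + grad_U(x)) + np.sqrt(2.0 * gamma * h) * noise
    # Position update uses the freshly updated velocity.
    x_new = x + h * v_new
    return x_new, v_new


def sample(grad_U, x0, n_steps, h, gamma, seed=0):
    """Run the chain from x0 with zero initial velocity; return positions."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    out = np.empty((n_steps, x.size))
    for i in range(n_steps):
        x, v = kinetic_langevin_step(x, v, grad_U, h, gamma, rng)
        out[i] = x
    return out


# Illustrative target (an assumption): standard Gaussian, U(x) = |x|^2 / 2,
# so grad U(x) = x. The first 2,000 draws are discarded as burn-in.
draws = sample(lambda x: x, x0=[3.0, -3.0], n_steps=40_000, h=0.05, gamma=2.0)
kept = draws[2_000:]
```

Any fixed step size introduces a discretization bias in the sampled distribution; quantifying how such errors and the resulting convergence rates depend on the dimension is the kind of question the abstract describes.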
6	Mélanges bayésiens de modèles d'extrêmes multivariés : application à la prédétermination régionale des crues avec données incomplètes / Bayesian model mergings for multivariate extremes : application to regional predetermination of floods with incomplete data Sabourin, Anne 24 September 2013 (has links) Univariate extreme value theory extends to the multivariate case, but the absence of a natural parametric framework for the joint distribution of extremes complicates inference. Available non-parametric estimators of the dependence structure do not come with tractable uncertainty intervals for problems of dimension greater than three. However, uncertainty estimation is all the more important in applications because data scarcity is a recurrent issue, particularly in the field of hydrology. The purpose of this thesis is to develop modeling tools for the dependence structure between extremes, in a Bayesian framework that allows uncertainty assessment. Chapter 2 explores the properties of the model obtained by combining existing ones, in a Bayesian Model Averaging framework. A semi-parametric Dirichlet mixture model is studied next: a new parametrization is introduced in order to relax a moments constraint which characterizes the dependence structure. The re-parametrization significantly improves convergence and mixing properties of the reversible-jump algorithm used to sample the posterior. The last chapter is motivated by a hydrological application, which consists in estimating the dependence structure of floods recorded at four neighboring stations in the ‘Gardons’ region, southern France, using historical data. The historical data increase the sample size, but most of them are censored. The lack of an explicit expression for the likelihood in the Dirichlet mixture model is handled by using a data augmentation framework. Multivariate extremes Threshold excesses Bayesian model averaging Mixture models MCMC sampling Data augmentation Predetermination of floods 519.5
7	Estimation de distribution de tailles de particules par techniques d'inférence bayésienne / Particle size distribution estimation using Bayesian inference techniques Boualem, Abdelbassit 06 December 2016 (has links) This research work addresses the inverse problem of particle size distribution (PSD) estimation from dynamic light scattering (DLS) data. Current DLS data analysis methods suffer from poor repeatability of the estimation results and a limited ability to separate the components (resolution) of a multimodal sample of particles. This thesis aims to develop new and more efficient estimation methods based on Bayesian inference techniques by taking advantage of the angular diversity of the DLS data. First, we proposed a non-parametric method based on a free-form model, which has the disadvantage of requiring a priori knowledge of the PSD support. To avoid this problem, we then proposed a parametric method based on modelling the PSD using a Gaussian mixture model. The two proposed Bayesian methods use Markov chain Monte Carlo simulation algorithms. The results obtained on simulated and real DLS data show the capability of the proposed methods to estimate multimodal PSDs with high resolution and better repeatability. We also computed the Cramér-Rao bounds of the Gaussian mixture model. The results show that there are preferred angle values ensuring minimum error on the PSD estimation. Dynamic light scattering Particle size distribution Estimation Inverse problem Bayesian inference MCMC sampling methods Cramér-Rao bounds 621.382 2
8	Generative models : from data generation to representation learning Zhang, Ruixiang 08 1900 (has links) Generative modeling is a rapidly advancing field in machine learning, with models demonstrating impressive capabilities for high-dimensional data synthesis across modalities including images, text, and audio. However, significant challenges remain in enhancing sample quality and model controllability, as well as in developing more principled and effective methods for learning structured feature representations with generative models. This dissertation conducts a comprehensive two-part investigation into pushing the frontiers of generative modeling, with a focus on improving sample quality and steerability, as well as on enabling the learning of high-quality latent representations. The first part of the dissertation proposes novel techniques to boost sample quality and enable fine-grained control for generative models. First, a new perspective is introduced to reformulate pretrained generative adversarial networks as energy-based models, enabling more effective sampling that leverages both the generator and the discriminator. Second, an information-theoretic framework is developed to incorporate explicit inductive biases into latent variable models through Bayesian networks and multivariate information bottleneck theory. This provides a unified view for learning structured representations tailored to different applications such as multi-modal modeling and algorithmic fairness. The second part of the dissertation focuses on learning and extracting high-quality features from generative models in a fully unsupervised manner. First, an energy-based approach is presented for unsupervised learning of object-centric scene representations with permutation invariance. Compositionality of the energy function also enables controllable scene manipulation. Second, neural Fisher kernels are proposed to extract compact and useful representations from pretrained generative models. It is shown that low-rank approximations of the Fisher kernel provide a unified representation extraction technique competitive with common baselines. Together, the contributions advance generative modeling and representation learning along complementary fronts. They improve sample quality and steerability through new training objectives and inference techniques. They also enable extracting structured latent features from generative models using information-theoretic and neural kernel perspectives. The thesis provides a comprehensive investigation into the interconnected challenges of data synthesis and representation learning for modern generative models. Generative models Representation Learning Energy-based models Generative adversarial networks Variational auto-encoders Unsupervised learning Object-centric learning Scene-understanding MCMC sampling Bayesian networks Variational inference Fisher kernel
