41 |
Dynamic Programming Approaches for Estimating and Applying Large-scale Discrete Choice Models
Mai, Anh Tien 12 1900
People go through life making all kinds of decisions, and some of these decisions affect their demand for transportation, for example, their choices of where to live and where to work, how and when to travel, and which route to take. Transport-related choices are typically time-dependent and characterized by a large number of alternatives that can be spatially correlated. This thesis deals with models that can be used to analyze and predict discrete choices in large-scale networks. The proposed models and methods are highly relevant for, but not limited to, transport applications.
We model decisions as sequences of choices within the dynamic discrete choice framework, also known as parametric Markov decision processes. Such models are known to be difficult to estimate and to apply for prediction because dynamic programming problems need to be solved in order to compute choice probabilities. In this thesis we show that it is possible to exploit the network structure and the flexibility of dynamic programming so that the dynamic discrete choice modeling approach is not only useful for modeling time-dependent choices, but also makes it easier to model large-scale static choices.
The thesis consists of seven articles containing a number of models and methods for estimating, applying and testing large-scale discrete choice models. In the following we group the contributions under three themes: route choice modeling, large-scale multivariate extreme value (MEV) model estimation and nonlinear optimization algorithms.
Five articles are related to route choice modeling. We propose different dynamic discrete choice models that allow paths to be correlated, based on the MEV and mixed logit models. The resulting route choice models become expensive to estimate, and we deal with this challenge by proposing methods that reduce the estimation cost. For example, we propose a decomposition method that not only opens up the possibility of mixing, but also speeds up the estimation of simple logit models, which also has implications for traffic simulation. Moreover, we compare the utility maximization and regret minimization decision rules, and we propose a misspecification test for logit-based route choice models.
The second theme is related to the estimation of static discrete choice models with large choice sets.
We establish that a class of MEV models can be reformulated as dynamic discrete choice models on the networks of correlation structures. These dynamic models can then be estimated quickly using dynamic programming techniques and an efficient nonlinear optimization algorithm.
Finally, the third theme focuses on structured quasi-Newton techniques for estimating discrete choice models by maximum likelihood. We examine and adapt switching methods that can be easily integrated into usual optimization algorithms (line search and trust region) to accelerate the estimation process.
The proposed dynamic discrete choice models and estimation methods can be used in various discrete choice applications. In the area of big data analytics, models that can deal with large choice sets and sequential choices are important.
Our research can therefore be of interest in various demand analysis applications (predictive analytics) or can be integrated with optimization models (prescriptive analytics). Furthermore, our studies indicate the potential of dynamic programming techniques in this context, even for static models, which opens up a variety of future research directions. / People spend a significant part of their lives making various decisions that can affect their demand for transportation, for example the choice of where to live and work, modes of transport, departure times, the number and type of cars in the household, routes ... Transport-related choices are generally time-dependent and characterized by a large number of alternatives that can be spatially correlated. This thesis deals with models that can be used to analyze and predict discrete choices in applications involving large-scale networks. The proposed models and methods are particularly relevant for transport applications, without being limited to them.
We model decisions as sequences of choices within the dynamic discrete choice framework, also known as parametric Markov decision processes. These models are known to be difficult to estimate and to apply for prediction, since computing choice probabilities requires solving dynamic programming problems. We show in this thesis that it is possible to exploit the network structure and the flexibility of dynamic programming so that the dynamic discrete choice modeling approach is not only useful for representing time-dependent choices, but also makes it easier to model static choices within very large choice sets.
The thesis consists of seven articles presenting various models and estimation methods, their application, and numerical experiments on large-scale discrete choice models. We group the contributions into three main themes: route choice modeling, estimation of large-scale multivariate extreme value (MEV) models, and nonlinear optimization algorithms.
Five articles relate to route choice modeling. We propose different dynamic discrete choice models that allow path utilities to be correlated, based on MEV and mixed logit formulations.
Because the resulting models become expensive to estimate, we present new approaches that reduce the computational effort. For example, we propose a decomposition method that not only makes it possible to estimate mixed logit models efficiently, but also speeds up the estimation of simple models such as multinomial logit models, which also has implications for traffic simulation. In addition, we compare decision rules based on utility maximization with those based on regret minimization for this type of model. Finally, we propose a statistical test for specification errors in route choice models based on the multinomial logit.
The second theme concerns the estimation of static discrete choice models with large choice sets. We establish that certain types of MEV models can be reformulated as dynamic discrete choice models built on networks of correlation structures. These models can then be estimated quickly using dynamic programming techniques in combination with an efficient nonlinear optimization algorithm.
The third and final theme concerns nonlinear optimization algorithms for estimating complex discrete choice models by maximum likelihood. We examine and adapt structured quasi-Newton methods that can be easily integrated into standard optimization algorithms (line search and trust region) in order to accelerate the estimation process.
The proposed dynamic discrete choice models and optimization methods can be used in various discrete choice applications. In the field of data science, models that can handle large choice sets and sequential choice sets are important. Our research can therefore be of interest in various demand analysis applications (predictive analytics) or can be integrated into optimization models (prescriptive analytics). Moreover, our studies highlight the potential of dynamic programming techniques in this context, including for static models, opening the way to many future research directions.
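As a hedged illustration of why dynamic programming is needed to obtain choice probabilities in the abstract above, the sketch below computes a logsum value function by backward recursion on a small acyclic network and derives link choice probabilities from it. The toy network, its utilities, and all function names are assumptions made for this example and are not taken from the thesis. The recursion is the standard logit (logsum) form and may differ from the formulations used in the seven articles.

```python
import math

# Hypothetical toy network (not from the thesis): node -> [(next_node, link_utility)].
# Utilities are negative travel costs; "D" is the destination and has no exits.
NETWORK = {
    "O": [("A", -1.0), ("B", -1.5)],
    "A": [("B", -0.5), ("D", -2.0)],
    "B": [("D", -1.0)],
    "D": [],
}


def value_function(network, destination):
    """Expected maximum utility V(k) from every node to the destination."""
    values = {destination: 0.0}

    def solve(node):
        if node not in values:
            # Logsum over outgoing links: V(k) = log sum_a exp(v(a|k) + V(a)).
            values[node] = math.log(sum(
                math.exp(utility + solve(nxt)) for nxt, utility in network[node]
            ))
        return values[node]

    for node in network:
        solve(node)
    return values


def link_probabilities(network, values):
    """Logit probability of choosing each outgoing link at each node."""
    return {
        (node, nxt): math.exp(utility + values[nxt] - values[node])
        for node, links in network.items()
        for nxt, utility in links
    }


def path_probability(path, probs):
    """Probability of a path as the product of its link choice probabilities."""
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= probs[(a, b)]
    return p


VALUES = value_function(NETWORK, "D")
PROBS = link_probabilities(NETWORK, VALUES)
PATHS = [["O", "A", "D"], ["O", "A", "B", "D"], ["O", "B", "D"]]
PATH_PROBS = [path_probability(p, PROBS) for p in PATHS]
```

On this toy network the product of link probabilities along a path coincides with a multinomial logit over the three enumerated paths, so the value function stands in for explicit path enumeration.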
|
42 |
Analyzing the Negative Log-Likelihood Loss in Generative Modeling / Analys av log-likelihood-optimering inom generativa modeller
Espuña I Fontcuberta, Aleix January 2022
Maximum-Likelihood Estimation (MLE) is a classic model-fitting method from probability theory. However, it has been argued repeatedly that MLE is inappropriate for synthesis applications, since its priorities are at odds with important principles of human perception, and that, e.g. Generative Adversarial Networks (GANs) are a more appropriate choice. In this thesis, we put these ideas to the test, and explore the effect of MLE in deep generative modelling, using image generation as our example application. Unlike previous studies, we apply a new methodology that allows us to isolate the effects of the training paradigm from several common confounding factors of variation, such as the model architecture and the properties of the true data distribution. The thesis addresses two main questions. First, we ask if models trained via Non-Saturating Generative Adversarial Networks (NSGANs) are capable of producing more realistic images than the exact same architecture trained by directly minimizing the Negative Log-Likelihood (NLL) loss function instead (which is equivalent to MLE). We compare the two training paradigms using the MNIST dataset and a normalizing-flow architecture known as Real NVP, which can explicitly represent a very broad family of density functions. We use the Fréchet Inception Distance (FID) as an algorithmic estimate of subjective image quality. Second, we also analyze how the NLL loss behaves in the presence of model misspecification, which is when the model architecture is not capable of representing the true data distribution, and compare the resulting training curves and performance to those produced by models without misspecification. In order to control for and study different degrees of model misspecification, we create a realistic-looking – but actually synthetic – toy version of the classic MNIST dataset. 
By this we mean that we create a machine-learning problem where the examples in the dataset look like MNIST, but in fact they have been generated by a Real NVP architecture with known weights, and therefore the true distribution that generated the image data is known. We are not aware of this type of large-scale, realistic-looking toy problem having been used in prior work. Our results show that, first, models trained via NLL perform unexpectedly well in terms of FID, and that a Real NVP trained via an NSGAN approach is unstable during training – even at the Nash equilibrium, which is the global optimum onto which the NSGAN training updates are supposed to converge. Second, the experiments on synthetic data show that models with different degrees of misspecification reach different NLL losses on the training set, but all of them exhibit qualitatively similar convergence behavior. However, looking at the validation NLL loss reveals an important overfitting effect due to the finite size of the synthetic dataset: the models that in theory are able to perfectly describe the true data distribution achieve worse validation NLL losses in practice than some misspecified models, whose reduced complexity acts as a regularizer that helps them generalize better. At the same time, we observe that overfitting has a much stronger negative effect on the validation NLL loss than on the image quality as measured by the FID score. We also conclude that models with too many parameters and degrees of freedom (overparameterized models) should be avoided, as they are not only slow and frequently unstable to train, even using the NLL loss, but also overfit heavily and produce poorer images. Throughout the thesis, our results highlight the complex and non-intuitive relationship between the NLL loss and the perceptual image quality as measured by the FID score. / The maximum likelihood method is a classical parameter estimation method from probability theory.
However, it is often claimed that maximum likelihood is an unsuitable choice for applications such as audio and image synthesis, since the method's priorities conflict with important principles of human perception, and that e.g. Generative Adversarial Networks (GANs) are a perceptually more suitable choice. In this thesis we test these hypotheses and explore the effect of maximum likelihood in deep generative models, with image synthesis as our example application. Unlike previous studies, we use a new methodology that allows us to isolate the effects of the training paradigm from several common confounding factors, such as the model architecture and how well this architecture matches the true distribution of the data. The thesis addresses two main questions. First, we ask whether models trained via NSGAN (Non-Saturating Generative Adversarial Networks) produce more realistic images than when the exact same architecture is instead trained to directly minimize the Negative Log-Likelihood (NLL) objective function. (Minimizing NLL is equivalent to the maximum likelihood method.) To compare the two training paradigms we use the MNIST dataset and a normalizing-flow architecture called Real NVP, which can explicitly represent a very broad family of continuous distribution functions. We also use the Fréchet Inception Distance (FID) as a measure for algorithmically estimating the quality of synthesized images. Second, we also analyze how the NLL objective function behaves for misspecified models, which is the case when the model architecture cannot perfectly represent the true probability distribution of the data, and compare the resulting training curves and performance with the corresponding results when we train models without misspecification. To study and control different degrees of model misspecification, we create a realistic, but in fact synthetic, toy version of MNIST.
By this we mean that we create a machine-learning problem where the examples in the dataset are visually very similar to those in MNIST, but are in fact all randomly generated from a Real NVP architecture with known model parameters (weights), and thus the true distribution of this synthetic image data is known. We are not aware of any previous research having used a realistic, large-scale toy problem following this recipe. Our results show, first, that models trained via NLL perform unexpectedly well in terms of FID, and that NSGAN-based training of Real NVP models is unstable, even if we start training at the Nash equilibrium, which is the global optimum toward which NSGAN is meant to converge. Second, the experiments on synthetic data show that models with different degrees of misspecification reach different NLL values on the training data, but they all exhibit qualitatively similar convergence behavior. Looking at the NLL values on validation data, however, reveals an overfitting effect that stems from the finite size of the synthetic training data; specifically, we see that the models that in theory can perfectly describe the true data distribution in practice achieve worse NLL values on validation data than some misspecified models. The reduced complexity of the latter evidently regularizes the models and helps them generalize better. At the same time, we note that overfitting has a much more pronounced negative effect on the validation NLL than on the FID image quality measure. We also conclude that models with too many parameters and degrees of freedom (overparameterized models) should be avoided, since they are not only slow and often unstable to train, even when training based on NLL, but also exhibit heavy overfitting and poorer image quality.
Taken as a whole, the results of this degree project highlight the complex and non-intuitive relationship between NLL/maximum likelihood and perceptual image quality as evaluated using FID.
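The abstract's two technical premises, that minimizing the NLL is equivalent to maximum likelihood and that a misspecified model cannot reach the NLL of the true distribution, can be sketched on a one-dimensional toy problem. This is only an illustrative assumption-laden example: the bimodal data, the single-Gaussian model family, the seed, and all names are hypothetical, and it does not reproduce the thesis's Real NVP or MNIST experiments.

```python
import math
import random


def gaussian_nll(data, mean, std):
    """Average negative log-likelihood of data under N(mean, std^2)."""
    var = std * std
    return sum(
        0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)
        for x in data
    ) / len(data)


def gaussian_mle(data):
    """Closed-form maximum-likelihood estimates (mean, std) for a Gaussian."""
    mean = sum(data) / len(data)
    var = sum((x - mean) ** 2 for x in data) / len(data)
    return mean, math.sqrt(var)


def mixture_nll(data):
    """Average NLL under the assumed true density: equal mix of N(-2,1), N(2,1)."""
    def density(x):
        return 0.5 * sum(
            math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)
            for m in (-2.0, 2.0)
        )
    return sum(-math.log(density(x)) for x in data) / len(data)


# Hypothetical synthetic data with a known generating distribution.
rng = random.Random(0)
DATA = [rng.gauss(rng.choice((-2.0, 2.0)), 1.0) for _ in range(5000)]

# A single Gaussian is misspecified here: it cannot represent two modes.
MEAN, STD = gaussian_mle(DATA)
NLL_FIT = gaussian_nll(DATA, MEAN, STD)
NLL_TRUE = mixture_nll(DATA)
```

Within the Gaussian family the closed-form MLE is also the NLL minimizer, so perturbing `MEAN` or `STD` can only raise `NLL_FIT`; the remaining gap to `NLL_TRUE` is attributable to misspecification rather than to optimization.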
|
43 |
Naturalism & Objectivity: Methods and Meta-methods
Miller, Jean Anne 19 August 2011
The error statistical account provides a basic account of evidence and inference. Formally, the approach is a re-interpretation of standard frequentist (Fisherian, Neyman-Pearson) statistics. Informally, it gives an account of inductive inference based on arguing from error, an analog of frequentist statistics, which keeps the concept of error probabilities central to the evaluation of inferences and evidence. Error statistical work at present tends to remain distinct from other approaches of naturalism and social epistemology in philosophy of science and, more generally, Science and Technology Studies (STS). My goal is to employ the error statistical program in order to address a number of problems facing approaches in philosophy of science, which fall under two broad headings: (1) naturalistic philosophy of science and (2) social epistemology. The naturalistic approaches that I am interested in seek to provide us with an account of scientific and meta-scientific methodologies that will avoid extreme skepticism, relativism and subjectivity, and claim to teach us something about scientific inferences and evidence produced by experiments (broadly construed). I argue that these accounts fail to identify a satisfactory program for achieving those goals and, moreover, that to the extent that they succeed, it is by latching on to the more general principles and arguments from error statistics. In sum, I will apply the basic ideas from error statistics and use them to examine (and improve upon) an area to which they have not yet been applied, namely in assessing and pushing forward these interdisciplinary pursuits involving naturalistic philosophies of science that appeal to cognitive science, psychology, the scientific record and a variety of social epistemologies. / Ph. D.
|