About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1. Applications of continuum quantum Monte Carlo methods
Leung, Wing-Kai, January 2001
No description available.
2. On the analysis of Monte Carlo and quasi-Monte Carlo methods
Dickinson, Andrew Samuel, January 2004
No description available.
3. Using the autocorrelation time and auto-validating methods to improve the performance of Monte Carlo algorithms
Everitt, Richard G., January 2008
In this thesis we investigate two alternative ways of improving Monte Carlo methods: first, by monitoring the autocorrelation time in a Markov chain Monte Carlo algorithm; and second, through the use of auto-validating methods.
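As a rough illustration of the first of these ideas (the thesis itself is not reproduced here), the sketch below estimates the integrated autocorrelation time of a one-dimensional MCMC trace from its empirical autocorrelation function. The cutoff rule, the AR(1) test chain and all constants are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def integrated_autocorrelation_time(chain, max_lag=None):
    """Estimate the integrated autocorrelation time of a 1-D MCMC trace.

    Uses the empirical autocorrelation function with a simple cutoff:
    summation stops at the first lag whose autocorrelation drops below zero.
    """
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    if max_lag is None:
        max_lag = min(n - 1, 1000)
    # Empirical autocovariance at lags 0..max_lag.
    acov = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])
    rho = acov / acov[0]
    tau = 1.0
    for k in range(1, max_lag + 1):
        if rho[k] <= 0.0:          # simple cutoff rule
            break
        tau += 2.0 * rho[k]
    return tau

if __name__ == "__main__":
    # Toy example: a Gaussian AR(1) chain with coefficient 0.9, for which
    # the exact integrated autocorrelation time is (1 + 0.9) / (1 - 0.9) = 19.
    rng = np.random.default_rng(0)
    x = np.empty(100_000)
    x[0] = 0.0
    for t in range(1, len(x)):
        x[t] = 0.9 * x[t - 1] + rng.normal()
    print("estimated IACT:", integrated_autocorrelation_time(x))
```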
4. Monitoring in survival analysis and rare event simulation
Phinikettos, Ioannis, January 2012
Monte Carlo methods are a fundamental tool in many areas of statistics. In this thesis, we examine these methods with a particular focus on rare event simulation. We are mainly interested in the computation of multivariate normal probabilities and in constructing hitting thresholds in survival analysis models.

Firstly, we develop an algorithm for computing high-dimensional normal probabilities, which arise in many statistical applications. The new algorithm exploits the diagonalisation of the covariance matrix and uses various variance reduction techniques; it is designed for computing small exceedance probabilities, and its performance is evaluated via a simulation study.

Secondly, we introduce a new omnibus cumulative sum chart for monitoring in survival analysis models. By omnibus we mean that it is able to detect any change. This chart exploits the absolute differences between the Kaplan-Meier estimator and the in-control distribution over specific time intervals. A simulation study is presented that evaluates the performance of the proposed chart and compares it to existing methods.

Thirdly, we apply the method of adaptive multilevel splitting to the estimation of hitting probabilities and hitting thresholds for the survival analysis cumulative sum charts. Simulation results are presented evaluating the benefits of adaptive multilevel splitting.

Finally, we extend the idea of adaptive multilevel splitting by estimating not just a hitting probability, but the whole distribution function up to a certain point. A theoretical result is proved that is used to construct confidence bands for the distribution function conditioned on lying in a closed interval.
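As a toy illustration of the splitting idea behind the last two parts of this abstract, the following "last particle" adaptive multilevel splitting sketch estimates a one-dimensional normal tail probability. It is not the survival-analysis application of the thesis, and the target distribution, particle count and threshold are assumptions chosen purely for demonstration.

```python
import numpy as np
from scipy.stats import norm

def last_particle_ams(u, n_particles=100, seed=0):
    """Toy adaptive multilevel splitting ('last particle') estimate of
    P(X > u) for X ~ N(0, 1).

    At each step the worst particle is killed and resampled from N(0, 1)
    conditioned to exceed the killed level; every step multiplies the
    probability estimate by (1 - 1/N).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)
    n_killed = 0
    while x.min() < u:
        i = np.argmin(x)
        level = x[i]
        # Resample conditionally on exceeding the current level via the
        # survival function: if X > level then sf(X) / sf(level) ~ U(0, 1).
        x[i] = norm.isf(rng.uniform() * norm.sf(level))
        n_killed += 1
    return (1.0 - 1.0 / n_particles) ** n_killed

if __name__ == "__main__":
    u = 4.0
    print("AMS estimate :", last_particle_ams(u))
    print("exact P(X>u) :", norm.sf(u))   # ~3.17e-5
```

A crude Monte Carlo estimator would need millions of samples to see even a handful of exceedances of this threshold, which is why splitting methods are attractive for rare events.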
5. Formalization and study of statistical problems in credit scoring: reject inference, discretization and pairwise interactions, logistic regression trees
Ehrhardt, Adrien, 30 September 2019
This manuscript deals with model-based statistical learning in the binary classification setting, with credit scoring as the application of interest. Proposed and existing approaches are illustrated on real data from Crédit Agricole Consumer Finance, a financial institution specialising in consumer loans which financed this PhD through a CIFRE grant. First, we consider the so-called reject inference problem, which aims at exploiting the information collected on rejected credit applicants, for whom no repayment behaviour can be observed (i.e. unlabelled observations). Reinterpreting these unlabelled observations as missing data places this classical industrial problem in a rigorous statistical framework. This interpretation sheds new light on existing reject inference methods and leads to the conclusion that none of them can be recommended, since they lack the modelling assumptions needed to apply standard statistical model selection tools. Next, we tackle another classical industrial problem: the discretization of continuous features and the grouping of levels of categorical features before any modelling step, motivated by both practical (interpretability) and theoretical (predictive performance) considerations. These quantizations are usually performed with ad hoc, manual and time-consuming heuristics. We recast them as a latent variable problem, which reduces to model selection; the high combinatorics of this model space then calls for cost-effective, automatic exploration strategies, based either on a particular neural network architecture trained by stochastic gradient descent or on a stochastic EM algorithm, with statistical guarantees. Third, as an extension of the preceding problem, interactions between covariates may be introduced to improve predictive performance. This task is again usually carried out manually, is highly combinatorial, and carries an increased risk of selecting a poor model. We perform it with a Metropolis-Hastings sampling procedure that searches for the best interactions automatically while retaining its standard convergence properties, and hence good predictive performance. Finally, instead of a single scorecard, we consider the acceptance system as a whole. It typically takes the form of a tree of scorecards, each relative to a particular population segment, which is often not optimized but imposed by the company's culture and history, and the ad hoc procedures in use lead to suboptimal performance. We propose a global approach that optimizes this logistic regression tree, yielding good empirical performance and illustrating the predictive strength and interpretability of a mix of parametric and non-parametric models. The manuscript concludes with a discussion of upcoming challenges in credit scoring, many of which are related to high dimensionality (in the number of predictors): the financial industry is investing heavily in the storage of massive, unstructured data that remains largely unused for credit scoring, and exploiting it in prediction rules will require theoretical guarantees if the hoped-for gains in predictive performance are to be achieved.
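To make the quantization step concrete, here is a minimal sketch of the common manual practice the abstract describes: quantile binning of a continuous feature followed by a logistic regression on the one-hot bin indicators. It is not the latent-variable model selection approach proposed in the thesis; the synthetic data, bin count and variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic credit data: one continuous feature with a non-monotone effect on default.
n = 20_000
income = rng.gamma(shape=2.0, scale=1.5, size=n)
p_default = 1.0 / (1.0 + np.exp(-(1.0 - np.abs(income - 3.0))))
y = rng.binomial(1, p_default)

# Manual quantization: cut the feature at its empirical quantiles, then fit a
# logistic regression on the one-hot encoded bin indices.
n_bins = 5
edges = np.quantile(income, np.linspace(0, 1, n_bins + 1)[1:-1])
bin_idx = np.digitize(income, edges)                        # values in 0..n_bins-1
X = (bin_idx[:, None] == np.arange(n_bins)).astype(float)   # one-hot encoding

clf = LogisticRegression().fit(X, y)
print("per-bin log-odds coefficients:", np.round(clf.coef_.ravel(), 3))
```

The choice of five equal-frequency bins is exactly the kind of arbitrary, hand-tuned decision that the thesis proposes to replace with a model selection criterion.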
6. Multilevel Monte Carlo methods and uncertainty quantification
Teckentrup, Aretha Leonore, January 2013
We consider the application of multilevel Monte Carlo methods to elliptic partial differential equations with random coefficients. Such equations arise, for example, in stochastic groundwater flow modelling. Models for random coefficients frequently used in these applications, such as log-normal random fields with exponential covariance, lack uniform coercivity and boundedness with respect to the random parameter and have only limited spatial regularity. To give a rigorous bound on the cost of the multilevel Monte Carlo estimator to reach a desired accuracy, one needs to quantify the bias of the estimator. The bias, in this case, is the spatial discretisation error in the numerical solution of the partial differential equation. This thesis is concerned with establishing bounds on this discretisation error in the practically relevant and technically demanding case of coefficients which are not uniformly coercive or bounded with respect to the random parameter. Under mild assumptions on the regularity of the coefficient, we establish new results on the regularity of the solution for a variety of model problems. The most general case is that of a coefficient which is piecewise Hölder continuous with respect to a random partitioning of the domain. The established regularity of the solution is then combined with tools from classical discretisation error analysis to provide a full convergence analysis of the bias of the multilevel estimator for finite element and finite volume spatial discretisations. Our analysis covers as quantities of interest several spatial norms of the solution, as well as point evaluations of the solution and its gradient and any continuously Fréchet differentiable functional. Lastly, we extend the idea of multilevel Monte Carlo estimators to the framework of Markov chain Monte Carlo simulations. We develop a new multilevel version of a Metropolis-Hastings algorithm, and provide a full convergence analysis.
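The thesis treats elliptic PDEs with random coefficients; as a much simpler illustration of the multilevel telescoping idea itself, the sketch below applies a standard multilevel Monte Carlo estimator to an Euler-Maruyama discretisation of geometric Brownian motion, with coupled coarse and fine paths sharing the same Brownian increments. All model parameters and per-level sample sizes are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def coupled_payoff(level, n_samples, rng, T=1.0, s0=1.0, r=0.05, sigma=0.2):
    """Euler-Maruyama estimates of S_T on level l (2**l steps) and level l-1,
    driven by the same Brownian increments so the two estimates are coupled."""
    n_fine = 2 ** level
    dt = T / n_fine
    dw = rng.normal(scale=np.sqrt(dt), size=(n_samples, n_fine))
    s_fine = np.full(n_samples, s0)
    for k in range(n_fine):
        s_fine = s_fine + r * s_fine * dt + sigma * s_fine * dw[:, k]
    if level == 0:
        return s_fine, np.zeros(n_samples)
    dw_coarse = dw[:, 0::2] + dw[:, 1::2]          # sum pairs of fine increments
    s_coarse = np.full(n_samples, s0)
    for k in range(n_fine // 2):
        s_coarse = s_coarse + r * s_coarse * (2 * dt) + sigma * s_coarse * dw_coarse[:, k]
    return s_fine, s_coarse

def mlmc_estimate(max_level, n_samples_per_level, seed=0):
    """Telescoping-sum MLMC estimator: E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for level, n in zip(range(max_level + 1), n_samples_per_level):
        fine, coarse = coupled_payoff(level, n, rng)
        total += np.mean(fine - coarse)
    return total

if __name__ == "__main__":
    # More samples on the cheap coarse levels, fewer on the expensive fine levels.
    est = mlmc_estimate(5, [200_000, 100_000, 50_000, 25_000, 12_000, 6_000])
    print("MLMC estimate of E[S_T]:", est)
    print("exact value s0*exp(r*T):", np.exp(0.05))
```

Because the coarse and fine paths share the same driving noise, the variance of the level corrections decays with the level, which is what makes the multilevel estimator cheaper than standard Monte Carlo run entirely at the finest level.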
7. New methods for mode jumping in Markov chain Monte Carlo algorithms
Ibrahim, Adriana Irawati Nur, January 2009
Standard Markov chain Monte Carlo (MCMC) sampling methods can experience problems when sampling from multi-modal distributions, and a variety of sampling methods have been introduced to overcome this. The mode jumping method of Tjelmeland & Hegstad (2001) tries to find a mode and propose a value from that mode in each mode jumping attempt. This approach is inefficient in that the work needed to find each mode, and to model the distribution in a neighbourhood of the mode, is carried out repeatedly during the sampling process. We propose a new mode jumping approach which retains features of the Tjelmeland & Hegstad (2001) method but differs in that it finds the modes in an initial search and then uses this information to jump between modes effectively during the sampling run. Although this approach does not allow a second chance to find modes in the sampling run, we show that the overall probability of missing a mode remains low. We apply our methods to sample from distributions with continuous variables, discrete variables, a mixture of discrete and continuous variables, and variable dimension. We show that our methods work well in each case and are, in general, better than the MCMC sampling methods commonly used in these settings, and better than the Tjelmeland & Hegstad (2001) method in particular.
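The following sketch illustrates the general flavour of mode jumping, not the specific two-stage method of the thesis or of Tjelmeland & Hegstad (2001): a standard random-walk Metropolis sampler is mixed with an occasional independence proposal centred on modes assumed to have been located by an initial search. The bimodal target, mode locations and tuning constants are assumptions.

```python
import numpy as np
from scipy.stats import norm

# Bimodal target: an equal mixture of N(-10, 1) and N(10, 1).
def target_pdf(x):
    return 0.5 * norm.pdf(x, -10, 1) + 0.5 * norm.pdf(x, 10, 1)

# Modes assumed to have been located by an initial (e.g. optimisation-based) search.
MODES = np.array([-10.0, 10.0])

def jump_proposal_pdf(x):
    """Density of the independence proposal built around the pre-located modes."""
    return np.mean(norm.pdf(x, MODES, 1.0))

def sample_jump_proposal(rng):
    return rng.choice(MODES) + rng.normal()

def mode_jumping_mh(n_iter=50_000, jump_prob=0.1, rw_scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = MODES[0]                      # start from one of the located modes
    samples = np.empty(n_iter)
    for i in range(n_iter):
        if rng.uniform() < jump_prob:
            # Independence Metropolis-Hastings step using the mode-based proposal.
            x_new = sample_jump_proposal(rng)
            log_alpha = (np.log(target_pdf(x_new)) + np.log(jump_proposal_pdf(x))
                         - np.log(target_pdf(x)) - np.log(jump_proposal_pdf(x_new)))
        else:
            # Ordinary symmetric random-walk step for local exploration.
            x_new = x + rw_scale * rng.normal()
            log_alpha = np.log(target_pdf(x_new)) - np.log(target_pdf(x))
        if np.log(rng.uniform()) < log_alpha:
            x = x_new
        samples[i] = x
    return samples

if __name__ == "__main__":
    s = mode_jumping_mh()
    print("fraction of samples in right-hand mode:", np.mean(s > 0))  # should be near 0.5
```

Mixing the two kernels with a state-independent probability keeps the target distribution invariant, since each kernel is a valid Metropolis-Hastings update on its own.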
8. Testing for unit roots and cointegration in heterogeneous panels
Sethapramote, Yuthana, January 2005
This thesis undertakes a Monte Carlo study to investigate the finite sample properties of several panel unit root and cointegration tests. To this end, we consider a number of different experimental designs which potentially affect the properties of the tests. We first consider panel unit root tests in heterogeneous panels. The panel tests of Im, Pesaran and Shin (2003) (IPS) and Maddala and Wu (1999) (MW) have greater power than the standard ADF test, but their power is significantly diminished when the panel is dominated by non-stationary series. Neglecting the presence of cross-sectional dependence results in serious size distortions, so a variety of methods are applied to correct them; however, the power of all tests is reduced, as the cross-correlations diminish the amount of independent information in the panel. The simulation results for the panel cointegration tests extend the findings of the unit root tests to multivariate cases. The likelihood-based panel rank test of Larsson, Lyhagen and Lothgren (2001) is found to be more powerful than the residual-based panel tests of IPS and MW, but slightly oversized in moderate sample sizes. The effects of a mixed panel and of cross-correlations in the errors are similar to those for the panel unit root tests, so we again use the bootstrap method and the cross-sectionally augmented IPS (CIPS) test of Pesaran (2003) to correct the size distortions. The presence of structural breaks affects the size and power properties of any panel unit root test which fails to account for it. When the break dates are known, the exogenous break panel LM test is applied to control for the effect of structural shifts; otherwise, endogenous break selection procedures are used to estimate the break points. The endogenous break panel LM test also performs well in terms of size, power and the accuracy with which the true break points are estimated. Finally, application of the panel unit root and cointegration tests provides some evidence in support of the existence of long-run PPP and the monetary model in Asia Pacific countries, and the presence of structural breaks reflecting the impact of the currency crisis is also detected. However, the evidence is sensitive to the choice of deterministic terms (intercepts, trends), the methods used to estimate the panel test statistic (e.g. SUR and CIPS) and the break-point selection criteria.
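For concreteness, here is a minimal sketch of the Fisher-type combination underlying the Maddala and Wu (1999) panel test: individual ADF p-values are combined as -2 * sum(log p_i), which is chi-squared with 2N degrees of freedom under the null of a unit root in every series and cross-sectional independence. The simulated stationary panel is an assumption, and the sketch does not address the cross-sectional dependence, structural break or cointegration issues studied in the thesis.

```python
import numpy as np
from scipy.stats import chi2
from statsmodels.tsa.stattools import adfuller

def maddala_wu_test(panel):
    """Fisher-type panel unit root test (Maddala & Wu, 1999): combine the
    individual ADF p-values as -2 * sum(log p_i) ~ chi2(2N) under H0."""
    pvalues = np.array([adfuller(series, autolag="AIC")[1] for series in panel])
    stat = -2.0 * np.sum(np.log(np.maximum(pvalues, 1e-300)))  # floor guards against log(0)
    return stat, chi2.sf(stat, df=2 * len(pvalues))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N = 200, 10
    # Heterogeneous panel of stationary AR(1) series (so H0 of a unit root is false).
    panel = []
    for i in range(N):
        phi = rng.uniform(0.3, 0.8)
        x = np.zeros(T)
        for t in range(1, T):
            x[t] = phi * x[t - 1] + rng.normal()
        panel.append(x)
    stat, pval = maddala_wu_test(panel)
    print(f"MW statistic = {stat:.2f}, p-value = {pval:.4f}")
```

As the abstract notes, this combination relies on independence across the cross-section; cross-correlated errors invalidate the chi-squared reference distribution and motivate bootstrap or CIPS-type corrections.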
