181 |
Bayesian methods and machine learning in astrophysics
Higson, Edward John, January 2019
This thesis is concerned with methods for Bayesian inference and their applications in astrophysics. We principally discuss two related themes: advances in nested sampling (Chapters 3 to 5), and Bayesian sparse reconstruction of signals from noisy data (Chapters 6 and 7). Nested sampling is a popular method for Bayesian computation which is widely used in astrophysics. Following the introduction and background material in Chapters 1 and 2, Chapter 3 analyses the sampling errors in nested sampling parameter estimation and presents a method for estimating them numerically for a single nested sampling calculation. Chapter 4 introduces diagnostic tests for detecting when software has not performed the nested sampling algorithm accurately, for example due to missing a mode in a multimodal posterior. The uncertainty estimates and diagnostics in Chapters 3 and 4 are implemented in the nestcheck software package, and both chapters describe an astronomical application of the techniques introduced. Chapter 5 describes dynamic nested sampling: a generalisation of the nested sampling algorithm which can produce large improvements in computational efficiency compared to standard nested sampling. We have implemented dynamic nested sampling in the dyPolyChord and perfectns software packages. Chapter 6 presents a principled Bayesian framework for signal reconstruction, in which the signal is modelled by basis functions whose number (and form, if required) is determined by the data themselves. This approach is based on a Bayesian interpretation of conventional sparse reconstruction and regularisation techniques, in which sparsity is imposed through priors via Bayesian model selection. We demonstrate our method for noisy 1- and 2-dimensional signals, including examples of processing astronomical images. The numerical implementation uses dynamic nested sampling, and uncertainties are calculated using the methods introduced in Chapters 3 and 4.
Chapter 7 applies our Bayesian sparse reconstruction framework to artificial neural networks, where it allows the optimum network architecture to be determined by treating the number of nodes and hidden layers as parameters. We conclude by suggesting possible areas of future research in Chapter 8.
|
182 |
Analyse statistique des modèles de croissance-fragmentation / Statistical analysis of growth-fragmentation models
Olivier, Adelaïde, 27 November 2015
This theoretical work is closely tied to an application: modelling the growth of a population of cells that divide according to an unknown division rate, which depends on a structuring variable, age and size being the two paradigmatic examples studied. The mathematical framework lies at the interface of statistics of processes, nonparametric estimation, and the analysis of partial differential equations. The work has three objectives: to obtain a nonparametric estimate of the division rate (as a function of age or size) under different observation schemes (in genealogical or continuous time); to study the transmission of a general biological trait from one cell to another, and the trait of a typical cell; and to compare the growth of different cell populations through their Malthus parameter, which governs the global growth (for instance, after introducing variability in the growth rate among cells).
|
183 |
A nonparametric Bayesian perspective for machine learning in partially-observed settings
Akova, Ferit, 31 July 2014
Indiana University-Purdue University Indianapolis (IUPUI) / Robustness and generalizability of supervised learning algorithms depend on how well the labeled data set represents the real-life problem. In many real-world domains, however, we may not have full knowledge of the underlying data-generating mechanism, which may even evolve over time, continually introducing new classes. This constitutes a partially-observed setting, where it would be impractical to obtain a labeled data set exhaustively defined by a fixed set of classes. Traditional supervised learning algorithms, which assume an exhaustive training library, would misclassify a future sample of an unobserved class with probability one, leading to an ill-defined classification problem. Our goal is to address situations where this assumption is violated by a non-exhaustive training library, a very realistic yet overlooked issue in supervised learning.
In this dissertation we pursue a new direction for supervised learning by defining self-adjusting models that relax the fixed model assumption imposed on classes and their distributions. We let the model adapt itself to the prospective data by dynamically adding new classes/components as the data demand, which in turn gradually makes the model more representative of the entire population. In this framework, we first employ suitably chosen nonparametric priors to model class distributions for observed as well as unobserved classes, and then use new inference methods to classify samples from observed classes and to discover and model novel classes for samples from unobserved classes.
This thesis presents the initiating steps of an ongoing effort to address one of the most overlooked bottlenecks in supervised learning and indicates the potential for taking new perspectives in some of the most heavily studied areas of machine learning: novelty detection, online class discovery and semi-supervised learning.
|
184 |
Multivariate semiparametric regression models for longitudinal data
Li, Zhuokai, January 2014
Multiple-outcome longitudinal data are abundant in clinical investigations. For example, infections with different pathogenic organisms are often tested concurrently, and assessments are usually taken repeatedly over time. It is therefore natural to consider a multivariate modeling approach to accommodate the underlying interrelationship among the multiple longitudinally measured outcomes. This dissertation proposes a multivariate semiparametric modeling framework for such data. Relevant estimation and inference procedures as well as model selection tools are discussed within this modeling framework. The first part of this research focuses on the analytical issues concerning binary data. The second part extends the binary model to a more general situation for data from the exponential family of distributions. The proposed model accounts for the correlations across the outcomes as well as the temporal dependency among the repeated measures of each outcome within an individual. An important feature of the proposed model is the addition of a bivariate smooth function for the depiction of concurrent nonlinear and possibly interacting influences of two independent variables on each outcome. For model implementation, a general approach for parameter estimation is developed by using the maximum penalized likelihood method. For statistical inference, a likelihood-based resampling procedure is proposed to compare the bivariate nonlinear effect surfaces across the outcomes. The final part of the dissertation presents a variable selection tool to facilitate model development in practical data analysis. Using the adaptive least absolute shrinkage and selection operator (LASSO) penalty, the variable selection tool simultaneously identifies important fixed effects and random effects, determines the correlation structure of the outcomes, and selects the interaction effects in the bivariate smooth functions. 
Model selection and estimation are performed through a two-stage procedure based on an expectation-maximization (EM) algorithm. Simulation studies are conducted to evaluate the performance of the proposed methods. The utility of the methods is demonstrated through several clinical applications.
|
185 |
Dimension Flexible and Adaptive Statistical Learning
Khowaja, Kainat, 02 March 2023
As interdisciplinary research, this thesis couples statistical learning with current advanced methods to deal with high dimensionality and nonstationarity. Chapter 2 provides tools for statistical inference (uniformly over the covariate space) on the parameter functions of Generalized Random Forests, identified as the solution of a local moment condition. This is done either through high-dimensional Gaussian approximation theory or via the multiplier bootstrap. The theoretical aspects of both approaches are discussed in detail alongside extensive simulations and real-life applications. In Chapter 3, we extend the local parametric approach to time-varying Poisson processes, providing a tool to find intervals of homogeneity within time series of count data in a nonstationary setting. The methodology involves recursive likelihood ratio tests, and the resulting test statistic is a maximum whose distribution is unknown. To approximate this distribution and find the critical value, we use the multiplier bootstrap and demonstrate the utility of this algorithm on German M&A data. Chapter 4 is concerned with creating low-dimensional approximations of high-dimensional data from dynamical systems. Using various resampling methods, Principal Component Analysis, and interpolation techniques, we construct reduced-dimension surrogate models that produce output faster than the original high-fidelity models. In Chapter 5, we aim to link the distributional characteristics of cryptocurrencies to their underlying mechanisms. We use characteristic-based spectral clustering to group cryptocurrencies with similar behaviour in terms of price, block time, and block size, and examine these clusters to find mechanisms shared across them.
|
186 |
Tail Risk Protection via reproducible data-adaptive strategies
Spilak, Bruno, 15 February 2024
This dissertation examines the potential of machine learning methods for managing tail risk in a non-stationary and high-dimensional setting. To this end, we robustly compare data-dependent approaches from parametric or non-parametric statistics with data-adaptive methods. As data-driven methods need to be reproducible to ensure trust and transparency, we start by proposing a new platform called Quantinar, which aims to set a new standard for academic publications. The second chapter turns to the core subject of the thesis: it compares various parametric, local parametric, and non-parametric methods for building a dynamic trading strategy that protects against tail risk in Bitcoin. In the third chapter, we propose a new portfolio allocation method, called NMFRB, that handles high dimensions through a dimension reduction technique, convex Non-negative Matrix Factorization. This technique allows us to find latent interpretable portfolios that are diversified out-of-sample. We show in two universes that the proposed method outperforms other classical machine learning-based methods, such as Hierarchical Risk Parity (HRP), in terms of risk-adjusted returns. We also test the robustness of our results via Monte Carlo simulation. Finally, the last chapter combines our previous approaches to develop a tail-risk protection strategy for portfolios. We extend NMFRB to tail-risk measures; we address the non-linear relationships between assets during tail events by developing a specific non-linear latent factor model; and we develop a dynamic tail-risk protection strategy that deals with the non-stationarity of asset returns using classical econometric models. We show that our strategy successfully reduces large drawdowns and outperforms other modern tail-risk protection strategies such as the Value-at-Risk-spread strategy. We verify our findings by performing various data snooping tests.
|
187 |
Contribution à la statistique spatiale et l'analyse de données fonctionnelles / Contribution to spatial statistics and functional data analysis
Ahmed, Mohamed Salem, 12 December 2017
This thesis is about statistical inference for spatial and/or functional data. We are interested in estimating the unknown parameters of certain models from random or non-random (stratified) samples composed of independent or spatially dependent variables. The specificity of the proposed methods is that they take into account the nature of the sample under study (stratified or composed of spatially dependent data).
We begin by studying data valued in a space of infinite dimension, so-called "functional data". First, we study a functional binary choice model in the context of endogenous stratified sampling (case-control or choice-based sampling). The specificity of this study is that the proposed method takes the sampling scheme into account. We describe a conditional likelihood function under the sampling distribution and a dimension reduction strategy, which together define a feasible conditional maximum likelihood estimator of the model. We give the asymptotic properties of the proposed estimators as well as their application to simulated and real data. Secondly, we explore a functional linear spatial autoregressive model whose particularity lies in the functional nature of the explanatory variable and the structure of the spatial dependence. The estimation procedure consists of reducing the infinite dimension of the functional explanatory variable and maximizing a quasi-likelihood function associated with the model. We establish the consistency and asymptotic normality of the estimator. The usefulness of the methodology is illustrated via simulations and an application to real data.
In the second part of the thesis, we address regression and prediction problems for real-valued, spatially dependent variables. We start by generalizing the k-nearest neighbors method (k-NN) to predict a spatial process at unobserved locations in the presence of spatial covariates. The specificity of the proposed k-NN predictor is that it allows for heterogeneity in the covariate. We establish the almost complete convergence, with rates, of the spatial predictor and illustrate its performance on simulated and environmental data. In addition, we generalize the partially linear probit model for independent data to the spatial case. We use a linear spatial process to model the disturbances, which gives more flexibility and covers several types of spatial dependence, and we propose a semiparametric estimation approach based on weighted likelihood and the generalized method of moments. We establish the consistency and asymptotic distribution of the proposed estimators and investigate their finite-sample performance on simulated data. We end with an application of spatial binary choice models to identifying risk factors for UADT (upper aerodigestive tract) cancer in the northern region of France, which has the country's highest incidence and mortality rates for this cancer.
|