Global ETD Search

11	Algorithmes stochastiques pour la statistique robuste en grande dimension / Stochastic algorithms for robust statistics in high dimension Godichon-Baggioni, Antoine 17 June 2016
This thesis studies stochastic algorithms in high dimension and their application to robust statistics. In what follows, the expression "high dimension" may mean either that the sample size is large or that the variables under consideration take values in high-dimensional (not necessarily finite-dimensional) spaces. To analyze this kind of data, it is advantageous to use algorithms that are fast, that do not need to store all the data, and that allow the estimates to be updated easily. In large samples of high-dimensional data, automatic outlier detection is often difficult. Yet these outliers, even when they are few, can strongly disturb simple indicators such as the mean and the covariance. We therefore focus on robust estimators, which are not overly sensitive to outliers.
The first part deals with the recursive estimation of the geometric median, a robust location indicator that can be preferred to the mean when part of the data is contaminated. For this purpose, we introduce a Robbins-Monro algorithm and its averaged version, then build non-asymptotic confidence balls for these estimates and establish their $L^{p}$ and almost sure rates of convergence.
The second part focuses on the estimation of the Median Covariation Matrix (MCM), a robust dispersion indicator linked to the geometric median. If the variable under study has a symmetric distribution, this indicator has the same eigenvectors as the covariance matrix. This property makes the MCM particularly interesting for Robust Principal Component Analysis. We therefore introduce a recursive algorithm that simultaneously estimates the geometric median, the MCM, and the $q$ main eigenvectors of the MCM. We first prove the strong consistency of the MCM estimators, then establish their rates of convergence in quadratic mean.
In the third part, building on the work on the estimators of the median and of the Median Covariation Matrix, we establish the almost sure and $L^{p}$ rates of convergence of stochastic gradient algorithms and of their averaged versions in Hilbert spaces, under less restrictive assumptions than those found in the literature. Two applications in robust statistics are then given: estimation of geometric quantiles and robust logistic regression.
In the last part, we aim to fit a sphere to a noisy point cloud spread around a complete or truncated sphere. More precisely, we consider a random variable with a truncated spherical distribution and seek to estimate its center and its radius. To this end, we introduce a projected stochastic gradient algorithm and its averaged version. Under reasonable assumptions, we establish the strong consistency of these estimators and their rates of convergence in quadratic mean, as well as the asymptotic normality of the averaged algorithm.
High Dimension; Functional Data; Stochastic Algorithms; Recursive Algorithms; Stochastic Gradient Algorithms; Averaging; Robust Statistics; Geometric Median 519
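The recursive median estimation described in entry 11 can be illustrated with a minimal sketch. It assumes the common textbook form of the Robbins-Monro update for the geometric median, with step sizes c / n^alpha; the function name, the default values of `c` and `alpha`, and the initialization at the first sample are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def averaged_geometric_median(samples, c=1.0, alpha=0.66):
    """Sketch of a Robbins-Monro estimate of the geometric median
    and of its averaged version (hypothetical helper).

    Assumptions: step sizes c / n**alpha with alpha in (1/2, 1),
    and the first sample used as the starting point.
    """
    samples = np.asarray(samples, dtype=float)
    z = samples[0].copy()      # Robbins-Monro iterate
    z_bar = z.copy()           # running average of the iterates
    for n, x in enumerate(samples[1:], start=1):
        diff = x - z
        norm = np.linalg.norm(diff)
        if norm > 0:
            # Move one step along the unit vector pointing to the new sample.
            z = z + (c / n**alpha) * diff / norm
        # Update the average of z_0, ..., z_n without storing past iterates.
        z_bar = z_bar + (z - z_bar) / (n + 1)
    return z, z_bar
```

Each observation is used once and then discarded, which matches the storage constraint mentioned in the abstract. Because every step has bounded length, a single gross outlier can only shift the estimate by at most the current step size.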
12	Development Of Deterministic And Stochastic Algorithms For Inverse Problems Of Optical Tomography Gupta, Saurabh 07 1900 Stable and computationally efficient reconstruction methodologies are developed to solve two important medical imaging problems that use near-infrared (NIR) light as the source of interrogation, namely diffuse optical tomography (DOT) and one of its variations, ultrasound-modulated optical tomography (UMOT). Since in both imaging modalities the system matrices are ill-conditioned owing to insufficient and noisy data, the emphasis of this work is on developing robust stochastic filtering algorithms that can handle measurement noise and also account for inaccuracies in forward models through an appropriate assignment of process noise. We first demonstrate, however, how to speed up a Gauss-Newton (GN) algorithm for DOT so that video-rate reconstruction from data recorded on a CCD camera becomes feasible. To this end, a computationally efficient linear iterative scheme is proposed to invert the normal equation of the Gauss-Newton scheme for recovering the absorption coefficient distribution from DOT data; the scheme involves the singular value decomposition (SVD) of the Jacobian matrix appearing in the update equation. This speeds up the inversion enough that video-rate recovery of a time-evolving absorption coefficient distribution is demonstrated from experimental data. The SVD-based algorithm brings the number of operations in image reconstruction to $O(NN^{2})$ rather than $O(NN^{3})$. The rest of the algorithms are based on different forms of stochastic filtering, in which we arrive at a mean-square estimate of the parameters by computing their joint probability distributions conditioned on the measurements up to the current instant. Within this framework, the first algorithm developed uses a bootstrap (BS) particle filter that also incorporates a quasi-Newton direction. 
Since keeping track of the Newton direction requires repeated computation of the Jacobian for all particle locations and all time steps, we devised a faster update of the Jacobian to make the recovery computationally feasible. It is demonstrated, through analytical reasoning and numerical simulations, that the proposed scheme not only accelerates convergence but also yields substantially reduced sample variance in the estimates compared with the conventional BS filter. Both accelerated convergence and reduced sample variance in the estimates are demonstrated in DOT optical parameter recovery using simulated and experimental data. In the next demonstration, a derivative-free variant of the pseudo-dynamic ensemble Kalman filter (PD-EnKF) is developed for DOT, in which the number of unknown parameters is reduced by representing the inhomogeneities through simple geometrical shapes. The optical parameter fields within the inhomogeneities are also approximated via an expansion based on circular harmonics (CH) (Fourier basis functions). The EnKF is then used to recover the coefficients in the expansion from both simulated and experimentally obtained photon fluence data on phantoms with inhomogeneous inclusions. The process and measurement equations in the PD-EnKF yield a parsimonious representation of the filter variables, which consist of only the Fourier coefficients and the constant scalar parameter value within the inclusion. Using fictitious, low-intensity Wiener noise processes in suitably constructed "measurement" equations, the filter variables are treated as pseudo-stochastic processes so that their recovery within a stochastic filtering framework becomes possible. 
In our numerical simulations we have considered both elliptical inclusions (two inhomogeneities) and inclusions with more complex shapes (such as an annular ring and a dumbbell) in 2-D objects that are cross-sections of a cylinder, with background absorption and (reduced) scattering coefficients chosen as $\mu_a = 0.01$ mm$^{-1}$ and $\mu_s' = 1.0$ mm$^{-1}$, respectively. We also assume $\mu_a = 0.02$ mm$^{-1}$ within the inhomogeneity (for the single-inhomogeneity case) and $\mu_a = 0.02$ and $0.03$ mm$^{-1}$ (for the two-inhomogeneity case). The reconstruction results obtained with the PD-EnKF are shown to be consistently superior to those obtained with a deterministic, explicitly regularized Gauss-Newton algorithm. We have also estimated the unknown parameters from experimentally gathered fluence data and verified the reconstruction by matching the experimental data with the computed data. The superiority of a modified version of the PD-EnKF, which uses an ensemble square root filter, is also demonstrated in the context of UMOT by recovering the distribution of the mean-squared amplitude of vibration, which is related to Young's modulus, in the ultrasound focal volume. Since the ability of a coherent light probe to pick up the overall optical path-length change is limited to modulo an optical wavelength, the individual displacements caused by the ultrasound (US) forcing should be very small, say within a few angstroms. The sensitivity of the modulation depth to changes in these small displacements can be very small, especially when the region of interest (ROI) is far from the source and detector. In such cases, the contrast recovery of the unknown distribution can be seriously impaired when using a quasi-Newton scheme (e.g. the GN scheme), which relies crucially on derivative information. The derivative-free, gain-based Monte Carlo filter not only remedies this deficiency but also provides a regularization-insensitive and computationally competitive alternative to the GN scheme. 
The inherent ability of a stochastic filter to accommodate the model error arising from a diffusion approximation of the correlation transport may be cited as an added advantage in the context of the UMOT inverse problem. Finally, to speed up the forward solve of the partial differential equation (PDE) modeling photon transport in UMOT, in which time enters the PDE as a parameter, a spectral decomposition of the PDE operator is demonstrated. This allows the time-dependent forward solution to be computed in terms of the eigenfunctions of the PDE operator, which speeds up the forward solution and in turn makes the UMOT parameter recovery computationally efficient. Optical Tomography; Diffuse Optical Tomography (DOT); Inverse Problems; Gauss-Newton Algorithm; Pseudo-Dynamic Ensemble Kalman Filter; Stochastic Algorithms; Stochastic Filtering Algorithms; Medical Imaging; Medical Instrumentation
13	Algorithmes de poursuite stochastiques et inégalités de concentration empiriques pour l'apprentissage statistique / Stochastic pursuit algorithms and empirical concentration inequalities for machine learning Peel, Thomas 29 November 2013
The first part of this thesis introduces new algorithms for the sparse decomposition of signals. Based on Matching Pursuit (MP), they address the following problem: how to reduce the computation time of the selection step of MP, which is often very costly. In response, we sub-sample the dictionary in rows and columns at each iteration. We show that this theoretically grounded approach performs well in practice. We then propose an iterative block coordinate gradient descent algorithm for feature selection in multiclass classification. It relies on error-correcting output codes, which turn the task into a simultaneous sparse representation problem for signals.
The second part presents new empirical Bernstein-type concentration inequalities. The first ones concern the theory of U-statistics and are used to derive generalization bounds for ranking algorithms. These bounds take advantage of a variance estimator, for which we propose an efficient computation algorithm. We then present an empirical version of the Bernstein-type inequality for martingales proposed by Freedman [1975]. Here again, the strength of our bound lies in a variance estimator that can be computed from the data. This allows us to propose generalization bounds for online learning algorithms that improve on the state of the art and pave the way for a new family of learning algorithms that exploit this empirical information.
Matching Pursuit; Stochastic Algorithms; Feature Selection; Multiclass Classification; Empirical Bernstein Inequalities; U-Statistics; Martingales; Ranking; Online Learning; Generalization Bounds
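The idea in entry 13 of sub-sampling the dictionary in rows and columns during the MP selection step can be sketched as follows. This is only an illustration under stated assumptions: uniform sampling without replacement, unit-norm atoms, and a full-atom residual update are choices made here, and the function name and the parameters `row_frac`, `col_frac` and `seed` are hypothetical rather than taken from the thesis.

```python
import numpy as np

def subsampled_matching_pursuit(D, y, n_iter=50, row_frac=0.5,
                                col_frac=0.5, seed=0):
    """Sketch of Matching Pursuit with a randomly sub-sampled selection step.

    Assumptions: the columns (atoms) of D have unit Euclidean norm, and rows
    and columns are drawn uniformly without replacement at each iteration.
    """
    rng = np.random.default_rng(seed)
    m, k = D.shape
    n_rows = max(1, int(row_frac * m))
    n_cols = max(1, int(col_frac * k))
    coef = np.zeros(k)
    residual = np.array(y, dtype=float)
    for _ in range(n_iter):
        rows = rng.choice(m, size=n_rows, replace=False)
        cols = rng.choice(k, size=n_cols, replace=False)
        # Selection step: correlations computed on the sub-block only.
        scores = D[np.ix_(rows, cols)].T @ residual[rows]
        j = cols[np.argmax(np.abs(scores))]
        # Update step: project the residual on the full selected atom.
        step = D[:, j] @ residual
        coef[j] += step
        residual -= step * D[:, j]
    return coef, residual
```

With unit-norm atoms, each update can only shrink the residual norm, and `D @ coef + residual` always reproduces `y`, whatever atom the sub-sampled selection picks. The cost of the selection step drops from `m * k` to roughly `n_rows * n_cols` multiplications per iteration.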
14	Využití distribuovaných a stochastických algoritmů v síti / Application of distributed and stochastic algorithms in network. Yarmolskyy, Oleksandr January 2018 This thesis deals with distributed and stochastic algorithms, including tests of their convergence in networks. The theoretical part briefly describes these algorithms, including their classification, problems, advantages, and disadvantages. Two distributed algorithms and two stochastic algorithms are then chosen. In the practical part, their speed of convergence is compared on various network topologies in MATLAB.
15	Využití distribuovaných a stochastických algoritmů v síti / Application of distributed and stochastic algorithms in network. Yarmolskyy, Oleksandr January 2018 This thesis deals with distributed and stochastic algorithms, including tests of their convergence in networks. The theoretical part briefly describes these algorithms, including their classification, problems, advantages, and disadvantages. Furthermore, two distributed algorithms and two stochastic algorithms are chosen. In the practical part, their speed of convergence is compared on various network topologies in MATLAB.
16	Aide à la décision médicale et télémédecine dans le suivi de l’insuffisance cardiaque / Medical decision support and telemedecine in the monitoring of heart failure Duarte, Kevin 10 December 2018
This thesis is part of the "Prendre votre cœur en mains" ("Handle your heart") project, which aims to develop a medical device that assists drug prescription for heart failure patients.
In the first part, a study was conducted to highlight the prognostic value of an estimate of plasma volume, or of its variations, for predicting major short-term cardiovascular events. Two classification rules were used, logistic regression and linear discriminant analysis, each preceded by a stepwise variable selection phase. Three indices were used to measure the improvement in discrimination ability obtained by adding the biomarker of interest.
In the second part, in order to identify patients at short-term risk of dying or of being hospitalized for progression of heart failure, an event risk score was constructed by an ensemble method using two classification rules (logistic regression and linear discriminant analysis of mixed data), bootstrap samples, and random selection of predictors. We define a measure of event risk by an odds ratio, and a measure of the importance of variables and groups of variables using standardized coefficients. We show a property of linear discriminant analysis of mixed data.
This methodology for constructing a risk score can be implemented in an online learning setting, using stochastic gradient algorithms to update the predictors online. We address the problem of sequential multidimensional linear regression, particularly in the case of a data stream, using a stochastic approximation process. To avoid the phenomenon of numerical explosion and to reduce computing time so that as much incoming data as possible is taken into account, we propose to use a process with online standardized data instead of raw data, and to use several observations per step or all observations up to the current step without having to store them. We define three processes and study their almost sure convergence: one with a variable step-size, an averaged process with a constant step-size, and a process with a constant or variable step-size that uses all observations up to the current step. These processes are compared to classical processes on 11 datasets. The third process with a constant step-size typically yields the best results.
Artificial intelligence; Heart failure; Discriminant analysis; Supervised classification; Variable selection; Event score; Ensemble predictor; Big data; Stochastic algorithms; Online learning 519.535 610.151 95
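The combination in entry 16 of online standardization with an averaged, constant step-size stochastic approximation process for linear regression might look roughly like the sketch below. It is not one of the three processes defined in the thesis: the Welford-style running moments, the one-observation-per-step update, the default `step` value and the function name are all assumptions made for illustration.

```python
import numpy as np

def online_standardized_sgd(stream, dim, step=0.05):
    """Sketch of averaged constant-step SGD for linear regression
    on data standardized online (hypothetical helper).

    `stream` yields (x, y) pairs with x of length `dim`; no observation
    is stored. Assumes the stream holds at least two observations.
    """
    mean = np.zeros(dim + 1)    # running means of (x, y)
    m2 = np.zeros(dim + 1)      # running sums of squared deviations
    theta = np.zeros(dim)       # coefficients on the standardized scale
    theta_bar = np.zeros(dim)   # running average of theta
    n = 0
    for x, y in stream:
        n += 1
        v = np.append(np.asarray(x, dtype=float), float(y))
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)        # Welford update of the moments
        if n < 2:
            continue                    # no variance estimate yet
        std = np.sqrt(m2 / (n - 1))
        std[std == 0] = 1.0
        z = (v - mean) / std            # standardize with current moments
        xs, ys = z[:dim], z[dim]
        theta += step * (ys - xs @ theta) * xs
        theta_bar += (theta - theta_bar) / (n - 1)
    std = np.sqrt(m2 / (n - 1))
    std[std == 0] = 1.0
    # Map the averaged coefficients back to the raw data scale.
    coef = theta_bar * std[dim] / std[:dim]
    intercept = mean[dim] - coef @ mean[:dim]
    return coef, intercept
```

Standardizing each observation with the current running moments keeps the gradient steps on a comparable scale even when the raw predictors differ by several orders of magnitude, which is the situation where a fixed step on raw data tends to diverge.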
