Global ETD Search

1	Simulations and applications of large-scale k-determinantal point processes / Simulations et applications des k-processus ponctuels déterminantaux Wehbe, Diala 03 April 2019 (has links) Avec la croissance exponentielle de la quantité de données, l’échantillonnage est une méthode pertinente pour étudier les populations. Parfois, nous avons besoin d’échantillonner un grand nombre d’objets d’une part pour exclure la possibilité d’un manque d’informations clés et d’autre part pour générer des résultats plus précis. Le problème réside dans le fait que l’échantillonnage d’un trop grand nombre d’individus peut constituer une perte de temps.Dans cette thèse, notre objectif est de chercher à établir des ponts entre la statistique et le k-processus ponctuel déterminantal(k-DPP) qui est défini via un noyau. Nous proposons trois projets complémentaires pour l’échantillonnage de grands ensembles de données en nous basant sur les k-DPPs. Le but est de sélectionner des ensembles variés qui couvrent un ensemble d’objets beaucoup plus grand en temps polynomial. Cela peut être réalisé en construisant différentes chaînes de Markov où les k-DPPs sont les lois stationnaires.Le premier projet consiste à appliquer les processus déterminantaux à la sélection d’espèces diverses dans un ensemble d’espèces décrites par un arbre phylogénétique. En définissant le noyau du k-DPP comme un noyau d’intersection, les résultats fournissent une borne polynomiale sur le temps de mélange qui dépend de la hauteur de l’arbre phylogénétique.Le second projet vise à utiliser le k-DPP dans un problème d’échantillonnage de sommets sur un graphe connecté de grande taille. La pseudo-inverse de la matrice Laplacienne normalisée est choisie d’étudier la vitesse de convergence de la chaîne de Markov créée pour l’échantillonnage de la loi stationnaire k-DPP. Le temps de mélange résultant est borné sous certaines conditions sur les valeurs propres de la matrice Laplacienne.Le troisième sujet porte sur l’utilisation des k-DPPs dans la planification d’expérience avec comme objets d’étude plus spécifiques les hypercubes latins d’ordre n et de dimension d. La clé est de trouver un noyau positif qui préserve le contrainte de ce plan c’est-à-dire qui préserve le fait que chaque point se trouve exactement une fois dans chaque hyperplan. Ensuite, en créant une nouvelle chaîne de Markov dont le n-DPP est sa loi stationnaire, nous déterminons le nombre d’étapes nécessaires pour construire un hypercube latin d’ordre n selon le n-DPP. / With the exponentially growing amount of data, sampling remains the most relevant method to learn about populations. Sometimes, larger sample size is needed to generate more precise results and to exclude the possibility of missing key information. The problem lies in the fact that sampling large number may be a principal reason of wasting time.In this thesis, our aim is to build bridges between applications of statistics and k-Determinantal Point Process(k-DPP) which is defined through a matrix kernel. We have proposed different applications for sampling large data sets basing on k-DPP, which is a conditional DPP that models only sets of cardinality k. The goal is to select diverse sets that cover a much greater set of objects in polynomial time. This can be achieved by constructing different Markov chains which have the k-DPPs as their stationary distribution.The first application consists in sampling a subset of species in a phylogenetic tree by avoiding redundancy. By defining the k-DPP via an intersection kernel, the results provide a fast mixing sampler for k-DPP, for which a polynomial bound on the mixing time is presented and depends on the height of the phylogenetic tree.The second application aims to clarify how k-DPPs offer a powerful approach to find a diverse subset of nodes in large connected graph which authorizes getting an outline of different types of information related to the ground set. A polynomial bound on the mixing time of the proposed Markov chain is given where the kernel used here is the Moore-Penrose pseudo-inverse of the normalized Laplacian matrix. The resulting mixing time is attained under certain conditions on the eigenvalues of the Laplacian matrix. The third one purposes to use the fixed cardinality DPP in experimental designs as a tool to study a Latin Hypercube Sampling(LHS) of order n. The key is to propose a DPP kernel that establishes the negative correlations between the selected points and preserve the constraint of the design which is strictly confirmed by the occurrence of each point exactly once in each hyperplane. Then by creating a new Markov chain which has n-DPP as its stationary distribution, we determine the number of steps required to build a LHS with accordance to n-DPP. Algorithme Métropolis-Hastings Matrice laplacienne Échantillonnage par hypercube latin 519.233
2	Plans prédictifs à taille fixe et séquentiels pour le krigeage / Fixed-size and sequential designs for kriging Abtini, Mona 30 August 2018 (has links) La simulation numérique est devenue une alternative à l’expérimentation réelle pour étudier des phénomènes physiques. Cependant, les phénomènes complexes requièrent en général un nombre important de simulations, chaque simulation étant très coûteuse en temps de calcul. Une approche basée sur la théorie des plans d’expériences est souvent utilisée en vue de réduire ce coût de calcul. Elle consiste à partir d’un nombre réduit de simulations, organisées selon un plan d’expériences numériques, à construire un modèle d’approximation souvent appelé métamodèle, alors beaucoup plus rapide à évaluer que le code lui-même. Traditionnellement, les plans utilisés sont des plans de type Space-Filling Design (SFD). La première partie de la thèse concerne la construction de plans d’expériences SFD à taille fixe adaptés à l’identification d’un modèle de krigeage car le krigeage est un des métamodèles les plus populaires. Nous étudions l’impact de la contrainte Hypercube Latin (qui est le type de plans les plus utilisés en pratique avec le modèle de krigeage) sur des plans maximin-optimaux. Nous montrons que cette contrainte largement utilisée en pratique est bénéfique quand le nombre de points est peu élevé car elle atténue les défauts de la configuration maximin-optimal (majorité des points du plan aux bords du domaine). Un critère d’uniformité appelé discrépance radiale est proposé dans le but d’étudier l’uniformité des points selon leur position par rapport aux bords du domaine. Ensuite, nous introduisons un proxy pour le plan minimax-optimal qui est le plan le plus proche du plan IMSE (plan adapté à la prédiction par krigeage) et qui est coûteux en temps de calcul, ce proxy est basé sur les plans maximin-optimaux. Enfin, nous présentons une procédure bien réglée de l’optimisation par recuit simulé pour trouver les plans maximin-optimaux. Il s’agit ici de réduire au plus la probabilité de tomber dans un optimum local. La deuxième partie de la thèse porte sur un problème légèrement différent. Si un plan est construit de sorte à être SFD pour N points, il n’y a aucune garantie qu’un sous-plan à n points (n 6 N) soit SFD. Or en pratique le plan peut être arrêté avant sa réalisation complète. La deuxième partie est donc dédiée au développement de méthodes de planification séquentielle pour bâtir un ensemble d’expériences de type SFD pour tout n compris entre 1 et N qui soient toutes adaptées à la prédiction par krigeage. Nous proposons une méthode pour générer des plans séquentiellement ou encore emboités (l’un est inclus dans l’autre) basée sur des critères d’information, notamment le critère d’Information Mutuelle qui mesure la réduction de l’incertitude de la prédiction en tout point du domaine entre avant et après l’observation de la réponse aux points du plan. Cette approche assure la qualité des plans obtenus pour toutes les valeurs de n, 1 6 n 6 N. La difficulté est le calcul du critère et notamment la génération de plans en grande dimension. Pour pallier ce problème une solution a été présentée. Cette solution propose une implémentation astucieuse de la méthode basée sur le découpage par blocs des matrices de covariances ce qui la rend numériquement efficace. / In recent years, computer simulation models are increasingly used to study complex phenomena. Such problems usually rely on very large sophisticated simulation codes that are very expensive in computing time. The exploitation of these codes becomes a problem, especially when the objective requires a significant number of evaluations of the code. In practice, the code is replaced by global approximation models, often called metamodels, most commonly a Gaussian Process (kriging) adjusted to a design of experiments, i.e. on observations of the model output obtained on a small number of simulations. Space-Filling-Designs which have the design points evenly spread over the entire feasible input region, are the most used designs. This thesis consists of two parts. The main focus of both parts is on construction of designs of experiments that are adapted to kriging, which is one of the most popular metamodels. Part I considers the construction of space-fillingdesigns of fixed size which are adapted to kriging prediction. This part was started by studying the effect of Latin Hypercube constraint (the most used design in practice with the kriging) on maximin-optimal designs. This study shows that when the design has a small number of points, the addition of the Latin Hypercube constraint will be useful because it mitigates the drawbacks of maximin-optimal configurations (the position of the majority of points at the boundary of the input space). Following this study, an uniformity criterion called Radial discrepancy has been proposed in order to measure the uniformity of the points of the design according to their distance to the boundary of the input space. Then we show that the minimax-optimal design is the closest design to IMSE design (design which is adapted to prediction by kriging) but is also very difficult to evaluate. We then introduce a proxy for the minimax-optimal design based on the maximin-optimal design. Finally, we present an optimised implementation of the simulated annealing algorithm in order to find maximin-optimal designs. Our aim here is to minimize the probability of falling in a local minimum configuration of the simulated annealing. The second part of the thesis concerns a slightly different problem. If XN is space-filling-design of N points, there is no guarantee that any n points of XN (1 6 n 6 N) constitute a space-filling-design. In practice, however, we may have to stop the simulations before the full realization of design. The aim of this part is therefore to propose a new methodology to construct sequential of space-filling-designs (nested designs) of experiments Xn for any n between 1 and N that are all adapted to kriging prediction. We introduce a method to generate nested designs based on information criteria, particularly the Mutual Information criterion. This method ensures a good quality forall the designs generated, 1 6 n 6 N. A key difficulty of this method is that the time needed to generate a MI-sequential design in the highdimension case is very larg. To address this issue a particular implementation, which calculates the determinant of a given matrix by partitioning it into blocks. This implementation allows a significant reduction of the computational cost of MI-sequential designs, has been proposed. Krigeage Plan d’expériences Plan d’expériences séquentiel Hypercube Latin Maximin Minimax IMSE Proxy Information Mutuelle Entropie Recuit simulé Discrépance Kriging Design of Experiments (DoE) Sequential design Latin Hypercube Maximin Minimax IMSE Mutual Information Entropy Simulated annealing Discrepancy

Search results

Simulations and applications of large-scale k-determinantal point processes / Simulations et applications des k-processus ponctuels déterminantaux

Plans prédictifs à taille fixe et séquentiels pour le krigeage / Fixed-size and sequential designs for kriging