About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
351

Multivariate analysis of high-throughput sequencing data

Durif, Ghislain 13 December 2016
The statistical analysis of next-generation sequencing (NGS) data raises many computational challenges regarding modeling and inference, especially because of the high dimensionality of genomic data. The research work in this manuscript concerns hybrid dimension-reduction methods that rely on both compression (representation of the data in a lower-dimensional space) and variable selection. Developments are made concerning the sparse Partial Least Squares (PLS) regression framework for supervised classification, and the sparse matrix factorization framework for unsupervised exploration. In both situations, our main purpose is the reconstruction and visualization of the data. First, we present a new sparse PLS approach, based on an adaptive sparsity-inducing penalty, that is suitable for logistic regression to predict the label of a discrete outcome, for instance the fate of patients or the type of unidentified single cells, from gene expression profiles. The main issue in such a framework is to account for the response in order to discard irrelevant variables. We highlight the direct link between the derivation of the algorithms and the reliability of the results. Then, motivated by questions regarding single-cell data analysis, we propose a flexible model-based approach for the factorization of count matrices that accounts for over-dispersion as well as zero-inflation (both characteristic of single-cell data), for which we derive an estimation procedure based on variational inference. In this scheme, we consider probabilistic variable selection based on a spike-and-slab model suitable for count data. The interest of our procedure for data reconstruction, visualization, and clustering is illustrated by simulation experiments and by preliminary results on single-cell data analysis.
All proposed methods are implemented in two R packages, "plsgenomics" and "CMF", with a focus on high-performance computing.
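As a rough illustration of an adaptive sparsity-inducing penalty of this kind (a sketch under our own assumptions, not code from the plsgenomics package; the function name and weighting scheme are invented): adaptive soft-thresholding shrinks small loading weights exactly to zero while penalizing large, informative weights less.

```python
import numpy as np

def adaptive_soft_threshold(w, lam, gamma=1.0):
    """Shrink loading weights w with adaptive weights lam / |w|^gamma.

    Larger initial weights are penalized less, so relevant variables
    survive while small, noisy loadings are set exactly to zero.
    """
    adaptive = lam / (np.abs(w) ** gamma + 1e-12)
    return np.sign(w) * np.maximum(np.abs(w) - adaptive, 0.0)

# Toy loading vector: two strong variables, three weak/noisy ones.
w = np.array([2.0, -0.3, 0.05, 1.5, -0.01])
sparse_w = adaptive_soft_threshold(w, lam=0.1)
```

With these (invented) numbers the weak loadings are zeroed while the two dominant variables keep most of their weight.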
352

Modelling and experimental analysis of frequency dependent MIMO channels

García Ariza, Alexis Paolo 04 December 2009
The integration of ultra-wideband, cognitive-radio, and MIMO technologies is a powerful tool for improving the spectral efficiency of wireless communication systems. In this direction, new strategies for MIMO channel modelling and characterization are needed in order to investigate how the centre frequency and the bandwidth affect the performance of MIMO systems. Previous research has paid less attention to how these parameters affect the characteristics of the MIMO channel. A characterization of the MIMO channel as a function of frequency is presented, addressing both experimental and theoretical points of view. The problems treated cover five main areas: measurements, data post-processing, synthetic channel generation, multivariate statistics for the data, and channel modelling. A measurement system based on a vector network analyzer was designed and validated, and measurements between 2 and 12 GHz were carried out under static conditions, in both line-of-sight and non-line-of-sight scenarios. A reliable procedure for post-processing, synthetic channel generation, and experimental analysis based on frequency-domain measurements was proposed and validated. The experimental procedure focused on channel transfer matrices for frequency-non-selective cases; the complex covariance matrices (CCMs) were also estimated, the Cholesky factorization was applied to the CCMs, and the colouring matrices of the system were finally obtained. A correction procedure (CP) for synthetic channel generation is presented, applicable to large-dimension MIMO cases and to cases where the CCM is indefinite; this CP enables the Cholesky factorization of such CCMs. The multivariate characteristics of the experimental data were investigated by means of a multivariate complex normality test. / García Ariza, AP. (2009). 
Modelling and experimental analysis of frequency dependent MIMO channels [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/6563 / Palancia
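The Cholesky-based synthetic channel generation, together with a correction for indefinite CCMs, can be sketched as follows (a simplified illustration, not the thesis procedure; eigenvalue clipping is one common correction and is an assumption here):

```python
import numpy as np

def correct_ccm(R, floor=1e-10):
    """Clip negative eigenvalues of an estimated (possibly indefinite)
    complex covariance matrix so that a Cholesky factor exists."""
    eigval, eigvec = np.linalg.eigh((R + R.conj().T) / 2)  # enforce Hermitian
    eigval = np.maximum(eigval.real, floor)
    return eigvec @ np.diag(eigval) @ eigvec.conj().T

def synthetic_channel(R, rng):
    """Draw vec(H) = L w with w ~ CN(0, I), so E[vec(H) vec(H)^H] = R."""
    L = np.linalg.cholesky(correct_ccm(R))
    n = R.shape[0]
    w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return L @ w

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R = A @ A.conj().T                                             # a valid CCM
R_ind = R - (np.min(np.linalg.eigvalsh(R)) + 0.1) * np.eye(4)  # made indefinite
h = synthetic_channel(R_ind, rng)  # correction makes the factorization possible
```

The colouring matrix L imposes the measured spatial correlation on the white complex Gaussian vector w.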
353

Deep Book Recommendation

Gráca, Martin January 2018
This thesis deals with recommendation systems based on deep neural networks and their use in book recommendation. The main traditional recommender systems are analysed and their representations summarized, along with systems based on more advanced machine learning techniques. The core of the thesis is to use convolutional neural networks for natural language processing and to create a hybrid book recommendation system. The proposed system includes matrix factorization and makes recommendations based on user ratings and book metadata, including text descriptions. I designed two models, one using the bag-of-words technique and one using a convolutional neural network; both beat the baseline methods. On the dataset created from Goodreads, the CNN model beats the BOW model.
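The matrix-factorization component of such a hybrid system can be sketched in a few lines (illustrative only; the thesis models additionally use CNN-based text features, which are omitted here):

```python
import numpy as np

def factorize(R, mask, k=2, steps=2000, lr=0.05, reg=0.01, seed=0):
    """Factor a ratings matrix R ~ U @ V.T by SGD on observed entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    rows, cols = np.nonzero(mask)
    for _ in range(steps):
        for u, i in zip(rows, cols):
            err = R[u, i] - U[u] @ V[i]          # prediction error on one rating
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V

# Toy ratings: zeros mark unobserved entries to be predicted.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 1.0, 5.0]])
mask = R > 0
U, V = factorize(R, mask)
pred = U @ V.T  # dense prediction matrix, including the unobserved cells
```

The learned U @ V.T reproduces the observed ratings and fills in the missing cells, which is the signal a hybrid recommender combines with content features.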
354

Deep Book Recommendation

Gráca, Martin January 2018
This thesis deals with recommendation systems using deep neural networks and their use in book recommendation. The main traditional recommender systems are analysed and their representations summarized, along with systems based on more advanced machine learning techniques. The core of the thesis is the use of convolutional neural networks for natural language processing and the creation of a book recommendation system. The proposed system makes recommendations based on user data, including user reviews, and book data, including full texts.
355

Fast methods for hyperspectral image processing. Application to the real-time characterization of wood material

Nus, Ludivine 12 December 2019
This PhD dissertation addresses the on-line unmixing of hyperspectral images acquired by a pushbroom imaging system, for the real-time characterization of wood. The first part of this work proposes an on-line mixing model based on non-negative matrix factorization. Based on this model, three algorithms for on-line sequential unmixing are developed, using multiplicative update rules, the Nesterov optimal gradient, and ADMM (Alternating Direction Method of Multipliers) optimization, respectively. These algorithms are specially designed to perform the unmixing in real time, at the acquisition rate of the pushbroom imager. In order to regularize the (generally ill-posed) estimation problem, two types of constraints on the endmembers are used: a minimum-dispersion constraint and a minimum-volume constraint. A method for the unsupervised estimation of the regularization parameter is also proposed, by reformulating the on-line hyperspectral unmixing problem as a bi-objective optimization. In the second part of this manuscript, we propose an approach for handling variation in the number of sources, i.e. the rank of the decomposition, during processing. The previously developed on-line algorithms are modified by introducing a hyperspectral library learning stage, as well as sparsity penalties that select only the active sources. Finally, the third part of this work applies these approaches to the detection and classification of the singularities of wood.
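The multiplicative-update variant can be sketched in its classical batch form (a toy illustration, not the on-line pushbroom implementation): each factor is rescaled by a ratio of non-negative terms, so non-negativity is preserved automatically.

```python
import numpy as np

def nmf_multiplicative(X, k, iters=1000, seed=0):
    """Factor X ~ W @ H (W: endmembers, H: abundances) with the classical
    multiplicative update rules; both factors stay non-negative throughout."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    eps = 1e-12  # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic "pixels": non-negative mixtures of k = 2 sources.
rng = np.random.default_rng(1)
W_true = rng.random((6, 2))
H_true = rng.random((2, 20))
X = W_true @ H_true
W, H = nmf_multiplicative(X, 2)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

On this exactly rank-2 non-negative data the relative reconstruction error drops to a small value; the on-line algorithms in the thesis update W incrementally as each pushbroom line arrives.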
356

Symbolic methods for linear differential systems with irregular singularity

Saade, Joelle 05 November 2019
This thesis is devoted to symbolic methods for the local resolution of linear differential systems with coefficients in K = C((x)), the field of Laurent series over an effective field C. More specifically, we are interested in effective algorithms for formal reduction. During the reduction, we are led to introduce algebraic extensions of the field of coefficients K (algebraic extensions of C, ramifications of the variable x) in order to obtain a finer structure. From an algorithmic point of view, it is preferable to delay the introduction of these extensions as much as possible. To this end, we develop a new algorithm for formal reduction that uses the ring of endomorphisms of the system, called the "eigenring", to reduce to the case of a system that is indecomposable over K. Using the formal classification given by Balser-Jurkat-Lutz, we deduce the structure of the eigenring of an indecomposable system. These theoretical results allow us to construct a decomposition over the base field K that separates the different exponential parts of the system, and thus to isolate, in subsystems that are indecomposable over K, the different field extensions that can appear, in order to treat them separately. In a second part, we are interested in Miyake's algorithm for formal reduction, which is based on the computation of the weight and of a Volevic sequence of the valuation matrix of the system. We give interpretations of the weight and Volevic sequences in graph theory and tropical algebra, and thus obtain practically efficient computation methods based on linear programming. This completes a fundamental step in Miyake's reduction algorithm. These algorithms are implemented as libraries for the computer algebra system Maple.
Finally, we present a discussion on the performance of the reduction algorithm using the eigenring as well as a comparison in terms of timing between our implementation of Miyake’s reduction algorithm by linear programming and the algorithms of Barkatou and Pflügel.
357

The Main Diagonal of a Permutation Matrix

Lindner, Marko, Strang, Gilbert January 2011
By counting 1's in the "right half" of 2w consecutive rows, we locate the main diagonal of any doubly infinite permutation matrix with bandwidth w. Then the matrix can be correctly centered and factored into block-diagonal permutation matrices. Part II of the paper discusses the same questions for the much larger class of band-dominated matrices. The main diagonal is determined by the Fredholm index of a singly infinite submatrix. Thus the main diagonal is determined "at infinity" in general, but from only 2w rows for banded permutations.
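For the special case of a pure shift, the counting rule is easy to check numerically (a toy sketch with our own indexing conventions, not the paper's general algorithm): a permutation with ones at positions (i, i+s) has bandwidth w ≥ |s|, and the number of ones falling in columns ≥ w among rows 0, …, 2w−1 equals w + s, so the shift locating the main diagonal is recovered from just 2w rows.

```python
def shift_from_counts(w, ones):
    """Given (row, col) positions of the ones in rows 0..2w-1 of a banded
    permutation matrix, count those in the 'right half' (col >= w) and
    return the inferred diagonal shift."""
    count = sum(1 for (i, j) in ones if 0 <= i < 2 * w and j >= w)
    return count - w

def shift_matrix_ones(s, w):
    """Ones of the pure-shift permutation (i, i+s), restricted to rows 0..2w-1."""
    return [(i, i + s) for i in range(2 * w)]

# The count recovers every shift s with |s| <= w from only 2w rows.
w = 3
recovered = [shift_from_counts(w, shift_matrix_ones(s, w)) for s in range(-w, w + 1)]
```

For a general banded permutation the same count plays the role of the Fredholm index discussed in Part II.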
358

Block SOR Preconditioned Projection Methods for Kronecker Structured Markovian Representations

Buchholz, Peter, Dayar, Tuğrul 15 January 2013
Kronecker structured representations are used to cope with the state space explosion problem in Markovian modeling and analysis. A currently open research problem is that of devising strong preconditioners to be used with projection methods for computing the stationary vector of Markov chains (MCs) underlying such representations. This paper proposes a block SOR (BSOR) preconditioner for hierarchical Markovian models (HMMs) that are composed of multiple low-level models and a high-level model that defines the interaction among the low-level models. The Kronecker structure of an HMM yields nested block partitionings in its underlying continuous-time MC which may be used in the BSOR preconditioner. The computation of the BSOR preconditioned residual in each iteration of a preconditioned projection method becomes the problem of solving multiple nonsingular linear systems whose coefficient matrices are the diagonal blocks of the chosen partitioning. The proposed BSOR preconditioner solves these systems using sparse LU or real Schur factors of the diagonal blocks. The fill-in of sparse LU-factorized diagonal blocks is reduced using the column approximate minimum degree (COLAMD) algorithm. A set of numerical experiments is presented to show the merits of the proposed BSOR preconditioner.
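One application of a BSOR preconditioner reduces to a forward block sweep in which each diagonal block system is solved directly (a schematic dense sketch; the HMM setting would use sparse LU or real Schur factors of the Kronecker-structured blocks):

```python
import numpy as np

def bsor_apply(A, blocks, r, omega=1.0):
    """Solve M z = r for the BSOR preconditioner
    M = (1/omega) D + L, where D is the block diagonal of A and
    L its strictly lower block-triangular part."""
    z = np.zeros_like(r)
    for (lo, hi) in blocks:
        # Subtract contributions from already-computed lower blocks.
        rhs = r[lo:hi] - A[lo:hi, :lo] @ z[:lo]
        # In practice this solve reuses a precomputed LU/Schur factor.
        z[lo:hi] = np.linalg.solve(A[lo:hi, lo:hi] / omega, rhs)
    return z

# Toy 4x4 matrix partitioned into two 2x2 diagonal blocks.
A = np.array([[4.0, 1.0, 0.0, 0.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 4.0]])
blocks = [(0, 2), (2, 4)]
r = np.ones(4)
z = bsor_apply(A, blocks, r)
```

Each preconditioned residual computation inside the projection method is exactly one such sweep over the diagonal blocks of the chosen partitioning.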
359

Block SOR for Kronecker structured representations

Buchholz, Peter, Dayar, Tuğrul 15 January 2013
Hierarchical Markovian Models (HMMs) are composed of multiple low-level models (LLMs) and a high-level model (HLM) that defines the interaction among the LLMs. The essence of the HMM approach is to model the system at hand in the form of interacting components, so that its (larger) underlying continuous-time Markov chain (CTMC) is not generated but implicitly represented as a sum of Kronecker products of (smaller) component matrices. The Kronecker structure of an HMM induces nested block partitionings in its underlying CTMC. These partitionings may be used in block versions of classical iterative methods based on splittings, such as block SOR (BSOR), to solve the underlying CTMC for its stationary vector. The problem therein becomes that of solving multiple nonsingular linear systems whose coefficient matrices are the diagonal blocks of a particular partitioning. This paper shows that in each HLM state there may be diagonal blocks with identical off-diagonal parts and diagonals differing from each other by a multiple of the identity matrix. Such diagonal blocks are named candidate blocks. The paper explains how candidate blocks can be detected and how they can mutually benefit from a single real Schur factorization. It gives sufficient conditions for the existence of diagonal blocks with real eigenvalues and shows how these conditions can be checked using component matrices. It describes how the sparse real Schur factors of candidate blocks satisfying these conditions can be constructed from component matrices and their real Schur factors. It also demonstrates how the fill-in of LU-factorized (non-candidate) diagonal blocks can be reduced by using the column approximate minimum degree (COLAMD) algorithm. It then presents a three-level BSOR solver in which the diagonal blocks at the first level are solved using block Gauss-Seidel (BGS) at the second level, and the methods of real Schur and LU factorizations at the third level.
Finally, on a set of numerical experiments it shows how these ideas can be used to reduce the storage required by the factors of the diagonal blocks at the third level and to improve the solution time compared to an all LU factorization implementation of the three-level BSOR solver.
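The saving offered by candidate blocks can be illustrated in a small dense setting (using a symmetric eigendecomposition as a stand-in for the sparse real Schur factorization; the matrices are invented): blocks of the form C + dᵢI share a single factorization of C, and each system is solved by shifting the factored spectrum.

```python
import numpy as np

# Shared off-diagonal structure C (symmetric here, so eigh gives a real
# orthogonal factorization playing the role of the real Schur factors).
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, Q = np.linalg.eigh(C)  # factor C once: C = Q diag(lam) Q^T

def solve_candidate(d, b):
    """Solve (C + d I) x = b, reusing the single factorization of C:
    (C + d I)^{-1} = Q diag(1 / (lam + d)) Q^T."""
    return Q @ ((Q.T @ b) / (lam + d))

b = np.array([1.0, 2.0, 3.0])
# Three candidate blocks, one factorization: only the shift d changes.
solutions = [solve_candidate(d, b) for d in (0.5, 1.0, 2.0)]
```

This is the storage/time saving the paper exploits: candidate blocks within an HLM state never need their own factorizations.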
360

A Confirmatory Analysis for Automating the Evaluation of Motivation Letters to Emulate Human Judgment

Mercado Salazar, Jorge Anibal, Rana, S M Masud January 2021
Manually reading, evaluating, and scoring motivation letters as part of the admissions process is a time-consuming and tedious task for Dalarna University's programme managers. An automated scoring system would provide them with relief as well as the ability to make much faster decisions when selecting applicants for admission. The aim of this thesis was to analyse current human judgment and attempt to emulate it using machine learning techniques. We used various topic modelling methods, such as Latent Dirichlet Allocation and Non-Negative Matrix Factorization, to find the most interpretable topics, build a bridge between topics and human-defined factors, and finally evaluate model performance by predicting scoring values and measuring accuracy using logistic regression, discriminant analysis, and other classification algorithms. Although we were able to recover the meaning of almost all human factors, the topic models' accuracy in predicting the overall score was unexpectedly low. Setting a threshold on the overall score to select applicants for admission yielded good overall accuracy, but not consistently good precision or recall. During our investigation, we attempted to determine the possible causes of these unexpected results and found that not only are the limitations of topic modelling to blame; human bias also plays a role.
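The final classification step of such a pipeline can be sketched with a minimal logistic model on topic weights (purely illustrative; the data and dimensions are invented, and the thesis used standard library implementations):

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Gradient-descent logistic regression: predict an admit/reject label
    from per-letter topic weights."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)          # gradient of log-loss
        b -= lr * np.mean(p - y)
    return w, b

# Toy topic-weight matrix: letters loading on topic 0 are admitted (label 1).
X = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.9],
              [0.1, 0.7], [0.7, 0.2], [0.3, 0.8]])
y = np.array([1, 1, 0, 0, 1, 0])
w, b = fit_logistic(X, y)
pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
```

On this separable toy data the classifier recovers the labels; the thesis's finding is precisely that real topic weights were far less predictive of the human score.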
