201

Canonical Correlation and Clustering for High Dimensional Data

Ouyang, Qing January 2019
Multi-view datasets arise naturally in statistical genetics when the genetic and trait profiles of an individual are portrayed by two feature vectors. A motivating problem concerning the Skin Intrinsic Fluorescence (SIF) study on the Diabetes Control and Complications Trial (DCCT) subjects is presented. A widely applied quantitative method for exploring the correlation structure between the two domains of a multi-view dataset is Canonical Correlation Analysis (CCA), which seeks the canonical loading vectors such that the transformed canonical covariates are maximally correlated. In the high-dimensional case, regularization of the dataset is required before CCA can be applied. Furthermore, the nature of genetic research suggests that sparse output is more desirable. In this thesis, two regularized CCA (rCCA) methods and a sparse CCA (sCCA) method are presented. When correlation sub-structure exists, a stand-alone CCA method will not perform well. To tackle this limitation, a mixture of local CCA models can be employed. In this thesis, I review a correlation clustering algorithm proposed by Fern, Brodley and Friedl (2005), which seeks to group subjects into clusters such that features are identically correlated within each cluster. An evaluation study is performed to assess the effectiveness of the CCA and correlation clustering algorithms using artificial multi-view datasets. Both sCCA and sCCA-based correlation clustering exhibited superior performance compared to rCCA and rCCA-based correlation clustering. The sCCA and sCCA-based clustering methods are applied to the multi-view dataset consisting of PrediXcan-imputed gene expression and SIF measurements of DCCT subjects. The stand-alone sparse CCA method identified 193 of 11,538 genes as correlated with SIF#7. Further investigation of these 193 genes with simple linear regression and t-tests revealed that only two genes, ENSG00000100281.9 and ENSG00000112787.8, were significantly associated with SIF#7. No plausible clustering scheme was detected by the sCCA-based correlation clustering method. / Thesis / Master of Science (MSc)
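For illustration, the following sketch fits a one-component classical CCA on a synthetic two-view dataset using scikit-learn. It shows only the basic maximal-correlation step the abstract describes, not the thesis's rCCA/sCCA implementations; all data and settings are invented for the example.

```python
# A minimal sketch of classical CCA on a synthetic two-view dataset.
# The thesis's rCCA/sCCA variants add regularization and sparsity on top
# of this basic formulation.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 1))            # shared signal across both views
X = latent @ rng.normal(size=(1, 10)) + 0.5 * rng.normal(size=(n, 10))
Y = latent @ rng.normal(size=(1, 8)) + 0.5 * rng.normal(size=(n, 8))

cca = CCA(n_components=1)
U, V = cca.fit_transform(X, Y)              # canonical covariates
r = np.corrcoef(U[:, 0], V[:, 0])[0, 1]     # first canonical correlation
print(f"first canonical correlation: {r:.3f}")
```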
202

Semiparametric and Nonparametric Methods for Complex Data

Kim, Byung-Jun 26 June 2020
A variety of complex data has emerged in many research fields, such as epidemiology, genomics, and analytical chemistry, with the development of science, technology, and design schemes over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary outcomes of disease and a covariate measured with error within a certain period by stratifying subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data are required to identify important genes and their interaction effects on disease. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. Owing to this great diversity, we encounter three problems in analyzing such complex data in this dissertation, and we provide several contributions to semiparametric and nonparametric methods for dealing with them: the first is to propose a method for testing the significance of a functional association under the matched study; the second is to develop a method to simultaneously identify important variables and build a network in HCHD data; the third is to propose a multi-class dynamic model for recognizing patterns in time-trend analysis. For the first topic, we propose a semiparametric omnibus test for the significance of a functional association between clustered binary outcomes and covariates with measurement error, taking into account the effect modification of matching covariates. We develop a flexible omnibus test that does not require a specific alternative form of the hypothesis. The advantages of our omnibus test are demonstrated through simulation studies and 1-4 bidirectional matched data analyses from an epidemiology study. For the second topic, we propose a joint semiparametric kernel machine network approach that connects variable selection and network estimation. Our approach is a unified and integrated method that can simultaneously identify important variables and build a network among them. We develop our approach under a semiparametric kernel machine regression framework, which allows for the possibility that each variable might be nonlinear and is likely to interact with the others in a complicated way. We demonstrate our approach using simulation studies and a real application to genetic pathway analysis. Lastly, for the third project, we propose a Bayesian focal-area detection method for a multi-class dynamic model under a Bayesian hierarchical framework. Two-step Bayesian sequential procedures are developed to estimate patterns and detect focal intervals, which can be used for gas chromatography. We demonstrate the performance of our proposed method using a simulation study and a real application to gas chromatography on the Fast Odor Chromatographic Sniffer (FOX) system. / Doctor of Philosophy / A variety of complex data has emerged in many research fields, such as epidemiology, genomics, and analytical chemistry, with the development of science, technology, and design schemes over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary outcomes of disease and a covariate measured with error within a certain period by stratifying subjects' conditions.
In genomics, highly correlated and high-dimensional (HCHD) data are required to identify important genes and their interaction effects on disease. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. Given this diversity, we encounter three problems in analyzing the following three types of data: (1) matched case-crossover data, (2) HCHD data, and (3) time-series data. We contribute to the development of statistical methods to deal with such complex data. First, under the matched study, we discuss hypothesis testing to effectively determine the association between observed factors and the risk of a disease of interest. Because, in practice, we do not know the specific form of the association, it can be challenging to set a specific alternative hypothesis. To reflect reality, we allow for the possibility that some observations are measured with error, and, accounting for these measurement errors, we develop a testing procedure under the matched case-crossover framework. This testing procedure has the flexibility to make inferences under various hypothesis settings. Second, we consider data where the number of variables is very large compared to the sample size and the variables are correlated with each other. In this case, our goal is to identify the variables important for the outcome among the large set of variables and to build their network. For example, identifying the few genes in the whole genome associated with diabetes can be used to develop biomarkers. With the approach proposed in the second project, we can identify differentially expressed and important genes and their network structure while accounting for the outcome. Lastly, we consider the scenario of patterns of interest changing over time, with application to gas chromatography. We propose an efficient detection method to effectively distinguish the patterns of multi-level subjects in time-trend analysis. Our proposed method can give valuable guidance for efficiently searching for distinguishable patterns, reducing the burden of examining all observations in the data.
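For illustration, a minimal kernel machine regression sketch in Python, using scikit-learn's kernel ridge regression as a generic stand-in for the framework the second project builds on (not the dissertation's joint selection-and-network method): an RBF kernel captures nonlinear effects and interactions without a specified functional form. The data and settings are invented for the example.

```python
# A minimal sketch of kernel machine regression: an RBF kernel lets each
# variable act nonlinearly and interact with others without specifying a
# functional form. Generic illustration, not the dissertation's method.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))
# Outcome with a nonlinear main effect and an interaction, plus noise.
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.3 * rng.normal(size=150)

model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X, y)
print("in-sample R^2:", round(model.score(X, y), 3))
```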
203

Change Detection and Analysis of Data with Heterogeneous Structures

Chu, Shuyu 28 July 2017
Heterogeneous data with different characteristics are ubiquitous in the modern digital world. For example, observations collected from a process may change in mean or variance. In numerous applications, data are often of mixed types, including both discrete and continuous variables. Heterogeneity also commonly arises when the underlying model varies across different segments of the data. In addition, the underlying pattern of the data may change along different dimensions, such as time and space. The diversity of heterogeneous data structures makes statistical modeling and analysis challenging. Detection of change-points in heterogeneous data has attracted great attention from a variety of application areas, such as quality control in manufacturing, protest event detection in social science, purchase likelihood prediction in business analytics, and organ state change in biomedical engineering. However, due to the extraordinary diversity of heterogeneous data structures and the complexity of the underlying dynamic patterns, change detection and analysis of such data are quite challenging. This dissertation aims to develop novel statistical modeling methodologies to analyze four types of heterogeneous data and to find change-points efficiently. The proposed approaches have been applied to solve real-world problems and can potentially be applied to a broad range of areas. / Ph. D. / Heterogeneous data with different characteristics are ubiquitous in the modern digital world. Detection of change-points in heterogeneous data has attracted great attention from a variety of application areas, such as quality control in manufacturing, protest event detection in social science, purchase likelihood prediction in business analytics, and organ state change in biomedical engineering. However, due to the extraordinary diversity of heterogeneous data structures and the complexity of the underlying dynamic patterns, change detection and analysis of such data are quite challenging. This dissertation focuses on modeling and analysis of data with heterogeneous structures. In particular, four types of heterogeneous data are analyzed, and different techniques are proposed to find change-points efficiently. The proposed approaches have been applied to solve real-world problems and can potentially be applied to a broad range of areas.
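For illustration, a textbook CUSUM-style scan for a single change in the mean, written in Python; this is a generic baseline for the change-point problem the abstract describes, not one of the dissertation's four proposed methods. All data are synthetic.

```python
# A minimal sketch of mean change-point detection: scan candidate split
# points and compare standardized pre/post means. Textbook baseline only.
import numpy as np

def best_mean_change(x: np.ndarray) -> tuple[int, float]:
    """Return the split index maximizing the standardized mean difference."""
    n = len(x)
    best_k, best_stat = -1, -np.inf
    for k in range(2, n - 2):
        left, right = x[:k], x[k:]
        pooled = np.sqrt(x.var() * (1 / k + 1 / (n - k)))
        stat = abs(left.mean() - right.mean()) / pooled
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(1.5, 1, 100)])
k, stat = best_mean_change(x)
print(f"estimated change-point: {k} (statistic {stat:.2f})")  # near 100
```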
204

Adaptive Nonparametric Inference for Poissonian Interactions

Sansonnet, Laure 14 June 2013
The subject of this thesis is the study of some adaptive nonparametric statistical problems in the framework of a Poisson interactions model. Such models are used, for instance, in neuroscience to analyze interactions between two neurons through their spike emissions during recordings of brain activity, or in genomics to study favored or avoided distances between two motifs along a genome. In this setting, we naturally introduce a so-called reproduction function that quantifies the favored positions of the motifs and which is modeled as the intensity of a Poisson process. Our first interest is the estimation of this function, assumed to be well localized. We propose a data-driven wavelet thresholding estimation procedure that is optimal from the oracle and minimax points of view. Simulations and an application to genomic data from the bacterium E. coli allow us to show the good practical behavior of our procedure. Then, we deal with the associated testing problems, which consist in testing the nullity of the reproduction function. For this purpose, we build a minimax optimal testing procedure on weak Besov spaces and provide simulations showing its good practical performance. Finally, we extend this work with the study of a high-dimensional discrete version of the previous model, proposing an adaptive Lasso-type procedure.
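For illustration, a generic wavelet soft-thresholding sketch in Python (PyWavelets) for a localized intensity observed through binned Poisson counts; the universal threshold used here is a standard default, not the thesis's data-driven, oracle/minimax-optimal rule, and all data are synthetic.

```python
# A minimal sketch of wavelet thresholding for a well-localized intensity:
# decompose binned counts, soft-threshold the detail coefficients with the
# universal threshold, and reconstruct. Generic rule, not the thesis's.
import numpy as np
import pywt

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 512)
intensity = 50 * np.exp(-((t - 0.3) ** 2) / 0.001)       # localized peak
counts = rng.poisson(intensity)                           # binned Poisson data

coeffs = pywt.wavedec(counts.astype(float), "db4", level=5)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745            # noise scale, finest level
thresh = sigma * np.sqrt(2 * np.log(len(counts)))         # universal threshold
denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
estimate = pywt.waverec(denoised, "db4")                  # estimated intensity
```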
205

Sparse Representation and Multiple Testing Procedures: Application to Metabolomics

Tardivel, Patrick 24 November 2017
Let Y be a Gaussian vector distributed according to N(m, sigma² Id_n) and let X be a matrix of dimension n x p, with Y observed, m unknown, and sigma and X known. In the linear model, m is assumed to be a linear combination of the columns of X. In small dimension, when n ≥ p and ker(X) = {0}, there exists a unique parameter Beta* such that m = X Beta*; we can then rewrite Y as Y = X Beta* + Epsilon. In this small-dimensional Gaussian linear model framework, we construct a new multiple testing procedure controlling the FWER to test the null hypotheses Beta*_i = 0 for i in [[1,p]]. This procedure is applied in metabolomics through the freeware ASICS, which is available online. ASICS identifies and quantifies metabolites via the analysis of NMR spectra. In high dimension, when n < p, we have ker(X) ≠ {0}, so the parameter Beta* described above is no longer unique. In the noiseless case, when sigma = 0 and hence Y = m, we show that the solutions of the linear system of equations Y = X Beta having a minimal number of non-zero components are obtained by minimizing the l_alpha "norm" with alpha small enough.
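The final claim can be restated compactly; the following LaTeX is a reconstruction from the abstract's wording, not an excerpt from the thesis:

```latex
% Reconstructed statement: in the noiseless high-dimensional case, the
% l_alpha quasi-norm minimizers over the solution set of Y = X Beta are
% among the sparsest solutions once alpha is small enough.
\[
  \operatorname*{arg\,min}_{\beta \,:\, X\beta = Y} \|\beta\|_\alpha^\alpha
  \;\subseteq\;
  \operatorname*{arg\,min}_{\beta \,:\, X\beta = Y} \|\beta\|_0
  \qquad \text{for } \alpha > 0 \text{ small enough},
\]
where $\|\beta\|_\alpha^\alpha = \sum_{i=1}^{p} |\beta_i|^\alpha$ and
$\|\beta\|_0$ counts the non-zero components of $\beta$.
```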
206

New trends in dairy cattle genetic evaluation

NICOLAZZI, EZEQUIEL LUIS 24 February 2011
Genetic evaluation systems are in rapid development worldwide. In most countries, "traditional" breeding programs based on phenotypes and relationships between animals are currently being integrated with, and in the future might be replaced by, molecular information. This thesis stands in this transition period, and therefore covers research on both types of genetic evaluation: from the assessment of the accuracy of (traditional) international genetic evaluations to the study of statistical methods used to integrate genomic information into breeding (genomic selection). Three chapters investigate and evaluate approaches for estimating genetic values from genomic data while reducing the number of independent variables: Bonferroni correction and a permutation test combined with single-marker regression (Chapter III), principal component analysis combined with BLUP (Chapter IV), and Fst across breeds combined with BayesA (Chapter VI). In addition, Chapter V analyzes the accuracy of direct genomic values with BLUP, BayesA and Bayesian LASSO including all available variables. The results of this thesis indicate that the genetic gains expected from the analysis of simulated data can be obtained on real data. Still, further research is needed to optimize the use of genome-wide information and obtain the best possible estimates for all traits under selection.
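For illustration, a minimal Python sketch of the Chapter III variable-reduction idea: single-marker regressions with a Bonferroni-corrected significance threshold. The genotype data here are invented, not the thesis's dairy cattle datasets.

```python
# A minimal sketch of single-marker regression with Bonferroni correction:
# regress the phenotype on each marker separately and keep markers whose
# p-value clears the corrected threshold 0.05 / p.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 500, 2000
genotypes = rng.integers(0, 3, size=(n, p)).astype(float)  # 0/1/2 allele counts
effects = np.zeros(p)
effects[:5] = 0.5                                          # 5 truly associated markers
phenotype = genotypes @ effects + rng.normal(size=n)

pvals = np.array([stats.linregress(genotypes[:, j], phenotype).pvalue
                  for j in range(p)])
keep = np.flatnonzero(pvals < 0.05 / p)                    # Bonferroni threshold
print("markers retained:", keep)
```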
207

Variable Selection for Linear and Smooth Transition Models via LASSO: Comparisons, Applications and New Methodology

CAMILA ROSA EPPRECHT 10 June 2016
Variable selection in statistical models is an important problem, for which many different solutions have been proposed. Traditionally, one can choose the set of explanatory variables using information criteria or prior information, but the total number of models to evaluate increases exponentially as the number of candidate variables increases. An additional problem is the presence of more candidate variables than observations. In this thesis we study several aspects of the variable selection problem. First, we compare two procedures for linear regression: Autometrics, which is a general-to-specific (GETS) approach based on statistical tests, and LASSO, a shrinkage method. Different scenarios were contemplated for the comparison in a simulation experiment, varying the sample size, the number of relevant variables and the number of candidate variables. In a real data application, we compare the methods for GDP forecasting. In a second part, we introduce a variable selection methodology for smooth transition regressive (STR) and autoregressive (STAR) models based on LASSO regularization. We present a direct and a stepwise approach. Both methods are tested with extensive simulation exercises and an application to genetic data. Finally, we introduce a penalized least squares criterion based on the LASSO l1-penalty and the CVaR (Conditional Value at Risk) of the out-of-sample regression errors. This is a quadratic optimization problem solved by interior point methods. In a simulation study in a linear regression framework, we show that the proposed method outperforms the LASSO when the data is contaminated by outliers, proving to be a robust method of estimation and variable selection.
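For illustration, a minimal Python sketch of LASSO variable selection in a linear model, the shrinkage method compared throughout the thesis (not its Autometrics, STR/STAR, or CVaR procedures); data and settings are invented for the example.

```python
# A minimal sketch of LASSO variable selection: the l1 penalty shrinks
# irrelevant coefficients exactly to zero, selecting a sparse model.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n, p = 120, 60
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[[0, 3, 7]] = [2.0, -1.5, 1.0]                # only three relevant variables
y = X @ beta + rng.normal(size=n)

lasso = LassoCV(cv=5).fit(X, y)                   # penalty chosen by cross-validation
selected = np.flatnonzero(lasso.coef_ != 0)
print("selected variables:", selected)            # ideally {0, 3, 7}
```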
208

Theoretical and Numerical Analysis of Super-Resolution Without Grid

Denoyelle, Quentin 09 July 2018
This thesis studies the noisy sparse spikes super-resolution problem for positive measures using the BLASSO, an infinite-dimensional convex optimization problem generalizing the LASSO to measures. First, we show that the support stability of the BLASSO for N clustered spikes is governed by an object called the (2N-1)-vanishing-derivatives pre-certificate. When it is non-degenerate, solving the BLASSO leads to exact support recovery of the initial measure, in a low-noise regime whose size is controlled by the minimal separation distance of the spikes. In a second part, we propose the Sliding Frank-Wolfe algorithm, a variant of the Frank-Wolfe algorithm with an added step that continuously moves the amplitudes and positions of the spikes, to solve the BLASSO. We show that, under mild assumptions, it converges in a finite number of iterations. We apply this algorithm to the 3D fluorescence microscopy problem by comparing three models based on the PALM/STORM techniques.
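For reference, a standard statement of the BLASSO mentioned above, reconstructed with generic notation from the abstract rather than copied from the thesis:

```latex
% The BLASSO replaces the LASSO's l1 penalty with the total-variation norm
% of a measure mu on the domain X; Phi is the measurement operator, y the
% observations, and lambda > 0 the regularization parameter (notation ours).
\[
  \min_{\mu \in \mathcal{M}(\mathcal{X})}
  \ \frac{1}{2}\,\| y - \Phi \mu \|^2 \;+\; \lambda\, |\mu|(\mathcal{X}),
\]
where $|\mu|(\mathcal{X})$ denotes the total variation of $\mu$, which plays
the role of the $\ell_1$ norm: for a sum of spikes
$\mu = \sum_i a_i \delta_{x_i}$, one has $|\mu|(\mathcal{X}) = \sum_i |a_i|$.
```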
209

Contributions to Unsupervised and Nonlinear Unmixing of Hyperspectral Data

Ammanouil, Rita 13 October 2016
Spectral unmixing has been an active field of research since the earliest days of hyperspectral remote sensing. It is concerned with the case where various materials are found in the spatial extent of a pixel, resulting in a spectrum that is a mixture of the signatures of those materials. Unmixing then reduces to estimating the pure spectral signatures and their corresponding proportions in every pixel. In the hyperspectral unmixing jargon, the pure signatures are known as the endmembers and their proportions as the abundances. This thesis focuses on spectral unmixing of remotely sensed hyperspectral data. In particular, it is aimed at improving the accuracy of the extraction of compositional information from hyperspectral data. This is done through the development of new unmixing techniques in two main contexts, namely the unsupervised and nonlinear cases. In particular, we propose a new technique for blind unmixing, we incorporate spatial information in (linear and nonlinear) unmixing, and we finally propose a new nonlinear mixing model. More precisely, first, an unsupervised unmixing approach based on collaborative sparse regularization is proposed, where the library of endmember candidates is built from the observations themselves. This approach is then extended to take into account the presence of noise among the endmember candidates. Second, within the unsupervised unmixing framework, two graph-based regularizations are used to incorporate prior local and nonlocal contextual information. Next, within a supervised nonlinear unmixing framework, a new nonlinear mixing model based on vector-valued functions in a reproducing kernel Hilbert space (RKHS) is proposed. The aforementioned model allows one to consider different nonlinear functions at different bands, regularize the discrepancies between these functions, and account for neighboring nonlinear contributions. Finally, the vector-valued kernel framework is used to promote spatial smoothness of the nonlinear part in a kernel-based nonlinear mixing model. Simulations on synthetic and real data show the effectiveness of all the proposed techniques.
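For illustration, a minimal Python sketch of the linear mixing baseline underlying the abstract's terminology: a pixel spectrum modeled as a nonnegative combination of endmember signatures, solved with nonnegative least squares. This is a generic baseline, not the thesis's unsupervised or nonlinear methods; all signatures are synthetic.

```python
# A minimal sketch of supervised linear unmixing per pixel: given known
# endmember signatures E, estimate nonnegative abundances with NNLS.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(6)
n_bands, n_endmembers = 50, 3
E = np.abs(rng.normal(size=(n_bands, n_endmembers)))   # endmember signatures
true_abund = np.array([0.6, 0.3, 0.1])                 # true proportions
pixel = E @ true_abund + 0.01 * rng.normal(size=n_bands)

abund, residual = nnls(E, pixel)                       # nonnegativity constraint
abund /= abund.sum()                                   # sum-to-one, a posteriori
print("estimated abundances:", np.round(abund, 3))
```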
210

Forecasting American Industrial Production with High-Dimensional Environments from Financial Markets, Sentiments, Expectations, and Economic Variables

EDUARDO OLIVEIRA MARINHO 20 February 2020
This thesis presents 6 different forecasting techniques for the monthly variation of the American Industrial Production Index in 3 different environments, totaling 18 models. In the first environment, the explanatory variables are the lags of the monthly variation of the industrial production index and 55 other market and expectation variables, such as sector returns, the market risk premium, implied volatility, interest rate risk premiums (corporate and long-term), consumer sentiment and an uncertainty index. In the second environment, the FRED database with 130 economic variables was used as the set of explanatory variables. In the third environment, the most relevant variables of environments 1 and 2 were used. An improvement in predicting IP against an AR model was observed, along with some interpretations regarding the behavior of the American economy over the last 45 years (the importance of sectors, uncertainty periods, and changes in the response to the risk premium, volatility and interest rates).
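For illustration, a minimal Python sketch of the kind of exercise described above: forecasting a monthly series from its own lag (AR benchmark) versus the lag plus many candidate predictors shrunk by LASSO. All data and variable names are invented for the example, not the thesis's actual predictor set.

```python
# A minimal sketch: AR benchmark vs. LASSO on a high-dimensional predictor
# set, compared by out-of-sample mean squared error on a holdout period.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(7)
T, k = 300, 40
predictors = rng.normal(size=(T, k))
ip = np.zeros(T)
for t in range(1, T):                                  # AR(1) + two useful predictors
    ip[t] = (0.4 * ip[t - 1] + 0.5 * predictors[t - 1, 0]
             - 0.3 * predictors[t - 1, 1] + rng.normal(scale=0.5))

y = ip[1:]                                             # target: next month's variation
X_ar = ip[:-1].reshape(-1, 1)                          # benchmark: own lag only
X_big = np.hstack([X_ar, predictors[:-1]])             # lag + 40 candidate predictors

split = 240
ar = LinearRegression().fit(X_ar[:split], y[:split])
lasso = LassoCV(cv=5).fit(X_big[:split], y[:split])
mse_ar = np.mean((y[split:] - ar.predict(X_ar[split:])) ** 2)
mse_lasso = np.mean((y[split:] - lasso.predict(X_big[split:])) ** 2)
print(f"out-of-sample MSE  AR: {mse_ar:.3f}  LASSO: {mse_lasso:.3f}")
```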
