201
Canonical Correlation and Clustering for High Dimensional Data. Ouyang, Qing. January 2019.
Multi-view datasets arise naturally in statistical genetics when the genetic and trait profiles of an individual are portrayed by two feature vectors. A motivating problem concerning the Skin Intrinsic Fluorescence (SIF) study on the Diabetes Control and Complications Trial (DCCT) subjects is presented. A widely applied quantitative method to explore the correlation structure between the two domains of a multi-view dataset is Canonical Correlation Analysis (CCA), which seeks the canonical loading vectors such that the transformed canonical covariates are maximally correlated. In the high dimensional case, regularization of the dataset is required before CCA can be applied. Furthermore, the nature of genetic research suggests that sparse output is more desirable. In this thesis, two regularized CCA (rCCA) methods and a sparse CCA (sCCA) method are presented. When correlation sub-structure exists, a stand-alone CCA method will not perform well. To tackle this limitation, a mixture of local CCA models can be employed. In this thesis, I review a correlation clustering algorithm proposed by Fern, Brodley and Friedl (2005), which seeks to group subjects into clusters such that features are identically correlated within each cluster. An evaluation study is performed to assess the effectiveness of the CCA and correlation clustering algorithms using artificial multi-view datasets. Both sCCA and sCCA-based correlation clustering exhibited superior performance compared to rCCA and rCCA-based correlation clustering. The sCCA and sCCA-clustering methods are applied to the multi-view dataset consisting of PrediXcan-imputed gene expression and SIF measurements of DCCT subjects. The stand-alone sparse CCA method identified 193 of 11,538 genes as correlated with SIF#7. Further investigation of these 193 genes with simple linear regression and t-tests revealed that only two genes, ENSG00000100281.9 and ENSG00000112787.8, were significantly associated with SIF#7. No plausible clustering scheme was detected by the sCCA-based correlation clustering method. / Thesis / Master of Science (MSc)
202
Semiparametric and Nonparametric Methods for Complex Data. Kim, Byung-Jun. 26 June 2020.
A variety of complex data has emerged in research fields such as epidemiology, genomics, and analytical chemistry with the development of science, technology, and study designs over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a covariate measured with error within a certain period, stratifying on subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data must be analyzed to identify important genes and their interaction effects on disease. In analytical chemistry, multiple time series are generated, requiring the recognition of complex patterns among multiple classes. Given this diversity, this dissertation addresses three problems that arise in analyzing such complex data, contributing several semiparametric and nonparametric methods: the first is a method for testing the significance of a functional association under the matched study design; the second is a method that simultaneously identifies important variables and builds a network among them in HCHD data; the third is a multi-class dynamic model for recognizing patterns in time-trend analysis.
For the first topic, we propose a semiparametric omnibus test for the significance of a functional association between clustered binary outcomes and covariates measured with error, taking into account effect modification by the matching covariates. The omnibus test is flexible in that it requires no specific alternative form of the hypothesis. Its advantages are demonstrated through simulation studies and analyses of 1-4 bidirectional matched data from an epidemiological study.
For the second topic, we propose a joint semiparametric kernel machine network approach that connects variable selection and network estimation. Our approach is a unified, integrated method that simultaneously identifies important variables and builds a network among them. We develop it under a semiparametric kernel machine regression framework, which allows each variable's effect to be nonlinear and allows the variables to interact with each other in complicated ways. We demonstrate our approach using simulation studies and a real application to genetic pathway analysis.
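As an illustration of the kernel machine regression idea underlying this topic (not the proposed joint variable-selection/network method itself), the following minimal sketch fits a kernel ridge regression, which captures nonlinear effects and interactions without specifying their functional form. All data and parameter values below are assumptions for illustration.

```python
# A minimal sketch of the kernel machine regression idea: model the joint
# effect of a group of variables through a kernel, allowing nonlinearity
# and interactions without an explicit parametric form. Kernel ridge
# regression is used as a stand-in here; synthetic data throughout.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))
# Nonlinear signal with an interaction between variables 0 and 1.
y = np.sin(X[:, 0]) + X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=150)

model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
model.fit(X, y)
print("training R^2:", round(model.score(X, y), 3))
```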
Lastly, for the third topic, we propose a Bayesian focal-area detection method for a multi-class dynamic model under a Bayesian hierarchical framework. A two-step Bayesian sequential procedure is developed to estimate patterns and detect focal intervals, with application to gas chromatography. We demonstrate the performance of our proposed method using a simulation study and a real application to gas chromatography data from a Fast Odor Chromatographic Sniffer (FOX) system. / Doctor of Philosophy / A variety of complex data has emerged in research fields such as epidemiology, genomics, and analytical chemistry with the development of science, technology, and study designs over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a covariate measured with error within a certain period, stratifying on subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data must be analyzed to identify important genes and their interaction effects on disease. In analytical chemistry, multiple time series are generated, requiring the recognition of complex patterns among multiple classes. Given this diversity, we encounter three problems in analyzing the following three types of data: (1) matched case-crossover data, (2) HCHD data, and (3) time series data. We contribute to the development of statistical methods to deal with such complex data.
First, under the matched study design, we discuss hypothesis testing to effectively determine the association between observed factors and the risk of the disease of interest. Because, in practice, we do not know the specific form of the association, it can be challenging to set a specific alternative hypothesis. To reflect reality, we also consider the possibility that some observations are measured with error. Accounting for these measurement errors, we develop a testing procedure under the matched case-crossover framework that has the flexibility to make inferences under various hypothesis settings.
Second, we consider data in which the number of variables is very large compared to the sample size and the variables are correlated with each other. In this case, our goal is to identify the variables important for the outcome among the large number of candidates and to build their network. For example, identifying the few genes across the whole genome that are associated with diabetes can be used to develop biomarkers. With our proposed approach in the second project, we can identify differentially expressed, important genes and their network structure while taking the outcome into consideration.
Lastly, we consider patterns of interest that change over time, with application to gas chromatography. We propose an efficient detection method to effectively distinguish the patterns of multi-level subjects in time-trend analysis. Our proposed method provides valuable guidance for efficiently finding distinguishable patterns, reducing the burden of examining all observations in the data.
203
Change Detection and Analysis of Data with Heterogeneous Structures. Chu, Shuyu. 28 July 2017.
Heterogeneous data with different characteristics are ubiquitous in the modern digital world. For example, the observations collected from a process may change in mean or variance. In numerous applications, data are of mixed types, including both discrete and continuous variables. Heterogeneity also commonly arises when the underlying model varies across different segments of the data. Moreover, the underlying pattern of the data may change along different dimensions, such as time and space. The diversity of heterogeneous data structures makes statistical modeling and analysis challenging.
Detection of change-points in heterogeneous data has attracted great attention in a variety of application areas, such as quality control in manufacturing, protest event detection in social science, purchase likelihood prediction in business analytics, and organ state change in biomedical engineering. However, due to the extraordinary diversity of heterogeneous data structures and the complexity of the underlying dynamic patterns, change detection and analysis of such data are quite challenging.
This dissertation aims to develop novel statistical modeling methodologies to analyze four types of heterogeneous data and to find change-points efficiently. The proposed approaches have been applied to solve real-world problems and can potentially be applied to a broad range of areas. / Ph. D. / Heterogeneous data with different characteristics are ubiquitous in the modern digital world. Detection of change-points in heterogeneous data has attracted great attention in a variety of application areas, such as quality control in manufacturing, protest event detection in social science, purchase likelihood prediction in business analytics, and organ state change in biomedical engineering. However, due to the extraordinary diversity of heterogeneous data structures and the complexity of the underlying dynamic patterns, change detection and analysis of such data are quite challenging.
This dissertation focuses on the modeling and analysis of data with heterogeneous structures. In particular, four types of heterogeneous data are analyzed, and different techniques are proposed to find change-points efficiently. The proposed approaches have been applied to solve real-world problems and can potentially be applied to a broad range of areas.
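As a point of reference for the change-detection problem discussed in this abstract (not one of the dissertation's own methods), the following minimal sketch detects a single mean shift with a CUSUM-type scan statistic on synthetic data; the change location and noise level are illustrative assumptions.

```python
# A minimal sketch of mean-shift change-point detection via a CUSUM-type
# scan statistic, a standard building block in the change-detection
# literature. Synthetic data with a known change at t = 100.
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.5, 1.0, 100)])

# For each candidate split t, compare the means before and after t,
# scaled so the statistic is comparable across split points.
n = len(x)
stats = np.array([
    abs(x[:t].mean() - x[t:].mean()) * np.sqrt(t * (n - t) / n)
    for t in range(10, n - 10)
])
t_hat = 10 + int(np.argmax(stats))
print("estimated change-point:", t_hat)  # should be near 100
```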
204
Devesa de l'Albufera: el cambio de paradigma en el turismo de masas de la ciudad de Valencia / Devesa de l'Albufera: the paradigm shift in mass tourism in the city of Valencia. Martínez Lloréns, Felipe. 27 May 2019.
The Spanish Mediterranean coast has undergone an intense process of urban development since the 1960s, driven by the growth of economic activity based on the expansion of cities and the development of tourism. Today, in the Valencian Community, sixty percent of the land adjacent to the maritime-terrestrial public domain is classified as urban land, while only twenty percent is covered by some form of environmental protection.
The urbanization of the "Devesa de l'Albufera" (Valencia) was one of the first urban prototypes implemented in Spain in response to the phenomenon of mass tourism, but unlike other contemporary prototypes (such as La Manga del Mar Menor, in Murcia) it was never completely built. The project was halted as a consequence of a citizen mobilization in defense of this territory, and the estate initially destined for the construction of a tourist city ended up protected and integrated into the first natural park of Valencia: the Natural Park of L'Albufera (PNLA). This work undertakes a historical reconstruction of the Devesa's paradigm shift, from the planning and construction of the tourist city to the restoration and protection of the territory through the declaration of the PNLA, identifying the relevant historical milestones in the evolution of the case, the causes of the paradigm shift, and the characteristics that define the alternative paradigm derived from this change. / Martínez Lloréns, F. (2019). Devesa de l'Albufera: el cambio de paradigma en el turismo de masas de la ciudad de Valencia [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/121137
205
Inférence non-paramétrique pour des interactions poissoniennes / Adaptive nonparametric inference for Poissonian interactions. Sansonnet, Laure. 14 June 2013.
The subject of this thesis is the study of adaptive nonparametric statistical problems in the framework of a model of Poissonian interactions. Such models are used, for instance, in neuroscience to analyze interactions between two neurons through their spike emissions during recordings of brain activity, or in genomics to study favored or avoided distances between two motifs along a genome. In this setting, we introduce a so-called reproduction function that quantifies the favored positions of the motifs and is modeled as the intensity of a Poisson process. Our first interest is the estimation of this function, assumed to be well localized. We propose a data-driven wavelet thresholding estimation procedure that is optimal from the oracle and minimax points of view. Simulations and an application to genomic data from the bacterium E. coli show the good practical behavior of our procedure. We then deal with the associated testing problems, which consist in testing the nullity of the reproduction function. For this purpose, we build a testing procedure that is minimax optimal over weak Besov spaces and show its good practical performance in simulations. Finally, we extend this work to a high-dimensional discrete version of the previous model, proposing an adaptive Lasso-type procedure.
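The wavelet thresholding idea at the heart of the estimation procedure can be sketched as follows. This is a generic soft-thresholding example with PyWavelets on a noisy, well-localized signal, not the thesis's Poisson-intensity procedure; the signal, wavelet, and threshold rule are illustrative assumptions.

```python
# A minimal sketch of estimation by wavelet coefficient thresholding:
# decompose, soft-threshold the detail coefficients, reconstruct.
import numpy as np
import pywt

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 512)
signal = np.exp(-((t - 0.3) ** 2) / 0.001)  # a well-localized bump
noisy = signal + 0.1 * rng.normal(size=t.size)

coeffs = pywt.wavedec(noisy, "db4", level=5)
# Universal threshold sigma * sqrt(2 log n), estimating sigma from the
# finest-scale coefficients via the median absolute deviation.
sigma = np.median(np.abs(coeffs[-1])) / 0.6745
thr = sigma * np.sqrt(2 * np.log(noisy.size))
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
estimate = pywt.waverec(coeffs, "db4")[: noisy.size]
print("max abs error:", float(np.max(np.abs(estimate - signal))))
```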
206
Représentation parcimonieuse et procédures de tests multiples : application à la métabolomique / Sparse representation and multiple testing procedures: application to metabolomics. Tardivel, Patrick. 24 November 2017.
Let Y be a Gaussian vector distributed according to N(m, sigma^2 Id_n), and let X be an n x p matrix, with Y observed, m unknown, and sigma and X known. In the linear model, m is assumed to be a linear combination of the columns of X. In small dimension, when n ≥ p and ker(X) = {0}, there exists a unique parameter Beta* such that m = X Beta*; we can then rewrite Y = X Beta* + Epsilon. In this small-dimensional Gaussian linear model framework, we construct a new multiple testing procedure controlling the FWER to test the null hypotheses Beta*_i = 0 for i in {1, ..., p}. This procedure is applied in metabolomics through the freeware ASICS, available online, which identifies and quantifies metabolites via the analysis of NMR spectra. In high dimension, when n < p, we have ker(X) ≠ {0}, so the parameter Beta* described above is no longer unique. In the noiseless case, when sigma = 0 and thus Y = m, we show that the solutions of the linear system Y = X Beta having a minimal number of non-zero components are obtained by minimizing the l_alpha "norm" for alpha small enough.
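The classical baseline that this kind of multiple testing procedure is measured against can be sketched as follows: a Bonferroni-corrected test of each null hypothesis Beta*_i = 0 in a small-dimensional Gaussian linear model. The data are synthetic and the thesis's own procedure is sharper than this; the sketch only illustrates FWER control.

```python
# A minimal sketch of FWER control by Bonferroni correction in a Gaussian
# linear model: reject H0_i: beta_i = 0 only if its p-value < alpha / p.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[0] = 1.0  # only the first effect is real
y = X @ beta + rng.normal(size=n)

beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)  # residual variance estimate
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
pvals = 2 * stats.t.sf(np.abs(beta_hat / se), df=n - p)

alpha = 0.05
rejected = pvals < alpha / p  # Bonferroni: FWER <= alpha
print("rejected hypotheses:", np.where(rejected)[0])
```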
207
New trends in dairy cattle genetic evaluation. Nicolazzi, Ezequiel Luis. 24 February 2011.
Genetic evaluation systems are in rapid development worldwide. In most countries, "traditional" breeding programs based on phenotypes and relationships between animals are currently being integrated with, and in the future might be replaced by, molecular information. This thesis stands in this transition period and therefore covers research on both types of genetic evaluation: from assessing the accuracy of (traditional) international genetic evaluations to studying the statistical methods used to integrate genomic information into breeding (genomic selection). Three chapters investigate and evaluate approaches for estimating genetic values from genomic data while reducing the number of independent variables: Bonferroni correction and the permutation test combined with single-marker regression (Chapter III), principal component analysis combined with BLUP (Chapter IV), and Fst across breeds combined with BayesA (Chapter VI). In addition, Chapter V analyzes the accuracy of direct genomic values with BLUP, BayesA, and the Bayesian LASSO including all available variables.
The results of this thesis indicate that the genetic gains expected from the analysis of simulated data can be obtained on real data. Still, further research is needed to optimize the use of genome-wide information and obtain the best possible estimates for all traits under selection.
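A minimal sketch of the genomic prediction setting discussed above: ridge regression on SNP genotypes (the SNP-BLUP view of GBLUP) to obtain direct genomic values on synthetic data. BayesA and the Bayesian LASSO differ mainly in the prior placed on marker effects; none of the chapter-specific methods are reproduced here, and all sizes and effect patterns are assumptions.

```python
# A minimal sketch of genomic prediction by ridge regression on SNP
# genotype codes (0/1/2), evaluated by correlation with the phenotype.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)
n_animals, n_snps = 300, 2000
M = rng.binomial(2, 0.3, size=(n_animals, n_snps)).astype(float)  # genotypes
effects = np.zeros(n_snps)
effects[:20] = rng.normal(0, 0.5, 20)  # a few causal SNPs
y = M @ effects + rng.normal(size=n_animals)

train, test = slice(0, 250), slice(250, None)
model = Ridge(alpha=100.0).fit(M[train], y[train])
gebv = model.predict(M[test])  # direct genomic values for test animals
print("accuracy (corr):", round(np.corrcoef(gebv, y[test])[0, 1], 3))
```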
208
[en] VARIABLE SELECTION FOR LINEAR AND SMOOTH TRANSITION MODELS VIA LASSO: COMPARISONS, APPLICATIONS AND NEW METHODOLOGY / [pt] SELEÇÃO DE VARIÁVEIS PARA MODELOS LINEARES E DE TRANSIÇÃO SUAVE VIA LASSO: COMPARAÇÕES, APLICAÇÕES E NOVA METODOLOGIA. Epprecht, Camila Rosa. 10 June 2016.
Variable selection in statistical models is an important problem for which many different solutions have been proposed. Traditionally, one can choose the set of explanatory variables using information criteria or prior information, but the total number of models to evaluate increases exponentially with the number of candidate variables. An additional problem is the presence of more candidate variables than observations. In this thesis we study several aspects of the variable selection problem. First, we compare two procedures for linear regression: Autometrics, a general-to-specific (GETS) approach based on statistical tests, and LASSO, a shrinkage method. Different scenarios are contemplated in a simulation experiment, varying the sample size, the number of relevant variables, and the number of candidate variables; in a real data application, we compare the methods for GDP forecasting. Second, we introduce a variable selection methodology for smooth transition regressive (STR) and autoregressive (STAR) models based on LASSO regularization, presenting a direct and a stepwise approach; both methods are tested with extensive simulation exercises and an application to genetic data. Finally, we introduce a penalized least squares criterion based on the LASSO l1 penalty and the CVaR (Conditional Value at Risk) of the out-of-sample regression errors. This is a quadratic optimization problem solved by interior point methods. In a simulation study in a linear regression framework, we show that the proposed method outperforms the LASSO when the data are contaminated by outliers, making it a robust method of estimation and variable selection.
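For concreteness, a minimal LASSO variable selection sketch in the p > n linear regression setting follows, with the penalty chosen by cross-validation. It illustrates the baseline that the thesis compares against and extends, not the new methodology itself; dimensions and coefficients are illustrative assumptions.

```python
# A minimal sketch of LASSO variable selection with more candidate
# variables than observations, penalty chosen by cross-validation.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)
n, p = 80, 200  # p > n
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2, -1.5, 1, 1, -2]  # only 5 relevant variables
y = X @ beta + 0.5 * rng.normal(size=n)

lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # should recover (most of) the first 5
print("selected variables:", selected)
print("chosen penalty:", round(lasso.alpha_, 4))
```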
209
Theoretical and Numerical Analysis of Super-Resolution Without Grid / Analyse numérique et théorique de la super-résolution sans grille. Denoyelle, Quentin. 09 July 2018.
This thesis studies the noisy sparse spikes super-resolution problem for positive measures using the BLASSO, an infinite-dimensional convex optimization problem generalizing the LASSO to measures. First, we show that the support stability of the BLASSO for N clustered spikes is governed by an object called the (2N-1)-vanishing-derivatives pre-certificate. When it is non-degenerate, solving the BLASSO leads to exact support recovery of the initial measure, in a low noise regime whose size is controlled by the minimal separation distance of the spikes. In a second part, we propose the Sliding Frank-Wolfe algorithm, based on the Frank-Wolfe algorithm with an added step that continuously moves the amplitudes and positions of the spikes, to solve the BLASSO. We show that, under mild assumptions, it converges in a finite number of iterations. We apply this algorithm to the 3D fluorescence microscopy problem, comparing three models based on the PALM/STORM techniques.
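A finite-dimensional analogue of the algorithm can be sketched with the classical Frank-Wolfe method on a LASSO-type problem: each iteration activates at most one coordinate, mirroring how the Sliding Frank-Wolfe adds one Dirac mass per iteration. The thesis's extra sliding step, which moves amplitudes and positions continuously, is omitted here, and all problem sizes are assumptions.

```python
# A minimal sketch of Frank-Wolfe for min ||Ax - y||^2 over the l1-ball
# of radius tau; the linear minimization oracle selects one coordinate.
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(50, 100))
x_true = np.zeros(100)
x_true[[10, 40, 75]] = [1.0, -2.0, 1.5]  # three "spikes"
y = A @ x_true + 0.01 * rng.normal(size=50)

tau, x = 5.0, np.zeros(100)
for k in range(200):
    grad = 2 * A.T @ (A @ x - y)
    i = int(np.argmax(np.abs(grad)))  # linear minimization oracle over the ball
    s = np.zeros(100)
    s[i] = -tau * np.sign(grad[i])  # vertex of the l1-ball
    gamma = 2.0 / (k + 2)  # standard step size
    x = (1 - gamma) * x + gamma * s

print("largest recovered entries:", np.argsort(-np.abs(x))[:3])  # near 10, 40, 75
```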
210
Contributions au démélange non-supervisé et non-linéaire de données hyperspectrales / Contributions to unsupervised and nonlinear unmixing of hyperspectral data. Ammanouil, Rita. 13 October 2016.
Spectral unmixing has been an active field of research since the earliest days of hyperspectral remote sensing. It is concerned with the case where various materials are found within the spatial extent of a pixel, resulting in a spectrum that is a mixture of the signatures of those materials. Unmixing then reduces to estimating the pure spectral signatures and their corresponding proportions in every pixel. In the hyperspectral unmixing jargon, the pure signatures are known as the endmembers and their proportions as the abundances. This thesis focuses on the spectral unmixing of remotely sensed hyperspectral data. In particular, it aims at improving the accuracy of the extraction of compositional information from hyperspectral data, through the development of new unmixing techniques in two main contexts, namely the unsupervised and the nonlinear case: we propose a new technique for blind unmixing, we incorporate spatial information in (linear and nonlinear) unmixing, and we propose a new nonlinear mixing model. More precisely, first, an unsupervised unmixing approach based on collaborative sparse regularization is proposed, in which the library of endmember candidates is built from the observations themselves; this approach is then extended to take into account the presence of noise among the endmember candidates. Second, within the unsupervised unmixing framework, two graph-based regularizations are used to incorporate prior local and nonlocal contextual information. Next, within a supervised nonlinear unmixing framework, a new nonlinear mixing model based on vector-valued functions in a reproducing kernel Hilbert space (RKHS) is proposed. This model allows one to consider different nonlinear functions at different bands, to regularize the discrepancies between these functions, and to account for neighboring nonlinear contributions. Finally, the vector-valued kernel framework is used to promote spatial smoothness of the nonlinear part in a kernel-based nonlinear mixing model. Simulations on synthetic and real data show the effectiveness of all the proposed techniques.
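As a baseline illustration of the unmixing problem itself (not the thesis's contributions, which go well beyond this), the following minimal sketch estimates abundances for one pixel by nonnegative least squares, given known endmember signatures; the spectra are synthetic and the sum-to-one constraint is applied afterwards rather than inside the solver.

```python
# A minimal sketch of supervised linear unmixing: given endmember spectra
# E, estimate nonnegative abundances for a pixel by nonnegative least
# squares, then renormalize to satisfy the sum-to-one constraint.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(8)
n_bands, n_end = 100, 3
E = np.abs(rng.normal(size=(n_bands, n_end)))  # endmember signatures
a_true = np.array([0.6, 0.3, 0.1])             # true abundances, sum to 1
pixel = E @ a_true + 0.01 * rng.normal(size=n_bands)

a_hat, _ = nnls(E, pixel)
a_hat /= a_hat.sum()  # enforce sum-to-one after the fact
print("estimated abundances:", np.round(a_hat, 3))
```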