Global ETD Search

1	A study of the prediction performance and multivariate extensions of the horseshoe estimator
Yunfan Li (6624032), 14 May 2019
The horseshoe prior has been shown to successfully handle high-dimensional sparse estimation problems. It both adapts to sparsity efficiently and provides nearly unbiased estimates for large signals. In addition, efficient sampling algorithms have been developed and successfully applied to a vast array of high-dimensional sparse estimation problems. In this dissertation, we investigate the prediction performance of the horseshoe prior in sparse regression, and extend the horseshoe prior to two multivariate settings.
We begin with a study of the finite sample prediction performance of shrinkage regression methods, where the risk can be unbiasedly estimated using Stein's approach. We show that the horseshoe prior achieves an improved prediction risk over global shrinkage rules by using a component-specific local shrinkage term that is learned from the data under a heavy-tailed prior, in combination with a global term providing shrinkage towards zero. We demonstrate improved prediction performance in a simulation study and in a pharmacogenomics data set, confirming our theoretical findings.
We then shift to extending the horseshoe prior to handle two high-dimensional multivariate problems. First, we develop a new estimator of the inverse covariance matrix for high-dimensional multivariate normal data. The proposed graphical horseshoe estimator has attractive properties compared to other popular estimators. The most prominent benefit is that when the true inverse covariance matrix is sparse, the graphical horseshoe estimator provides estimates with small information divergence from the sampling model. The posterior mean under the graphical horseshoe prior can also be almost unbiased under certain conditions. In addition to these theoretical results, we provide a full Gibbs sampler for implementation. The graphical horseshoe estimator compares favorably to existing techniques in simulations and in a human gene network data analysis.
In our second setting, we apply the horseshoe prior to the joint estimation of regression coefficients and the inverse covariance matrix in normal models. The computational challenge in this problem is due to the dimensionality of the parameter space, which routinely exceeds the sample size. We show that the advantages of the horseshoe prior in estimating a mean vector, or an inverse covariance matrix, separately are also present when addressing both simultaneously. We propose a full Bayesian treatment, with a sampling algorithm that is linear in the number of predictors. Extensive performance comparisons are provided with both frequentist and Bayesian alternatives, and both estimation and prediction performances are verified on a genomic data set.
Statistics; Bayesian statistical model; multivariate analysis; Gaussian graphical model (GGM)
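For orientation, the local and global shrinkage terms mentioned in the abstract above can be written out explicitly. This is a sketch of the standard horseshoe hierarchy for a normal means model as it usually appears in the general literature, not notation taken from the dissertation; the symbols $\beta_j$, $\lambda_j$, $\tau$, $\kappa_j$ and the unit noise variance are assumptions introduced here.

```latex
% Assumed setting: y_j ~ N(beta_j, 1), j = 1, ..., p (unit noise variance).
% lambda_j is the component-specific local term, tau the global term.
\begin{align}
\beta_j \mid \lambda_j, \tau &\sim \mathcal{N}\!\left(0,\ \lambda_j^2 \tau^2\right), \\
\lambda_j &\sim \mathrm{C}^{+}(0, 1), \\
\kappa_j &= \frac{1}{1 + \lambda_j^2 \tau^2}, \qquad
\mathbb{E}\!\left[\beta_j \mid y_j, \lambda_j, \tau\right] = (1 - \kappa_j)\, y_j .
\end{align}
```

Under these assumptions, a large $\lambda_j$ drives $\kappa_j$ toward 0, so a large signal is left almost unshrunk, while a small $\tau$ pulls most $\kappa_j$ toward 1 and shrinks the remaining components toward zero.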
2	Gaussian Graphical Model Selection for Gene Regulatory Network Reverse Engineering and Function Prediction
Kontos, Kevin, 02 July 2009
One of the most important and challenging "knowledge extraction" tasks in bioinformatics is the reverse engineering of gene regulatory networks (GRNs) from DNA microarray gene expression data. Indeed, as a result of the development of high-throughput data-collection techniques, biology is experiencing a data flood that pushes biologists toward a new view of biology, systems biology, which aims at a system-level understanding of biological systems.
Unfortunately, even for small model organisms such as the yeast Saccharomyces cerevisiae, the number p of genes is much larger than the number n of expression data samples. The dimensionality issue induced by this "small n, large p" data setting renders standard statistical learning methods inadequate. Restricting the complexity of the models makes it possible to deal with this serious impediment. Indeed, by introducing (a priori undesirable) bias in the model selection procedure, one reduces the variance of the selected model, thereby increasing its accuracy.
Gaussian graphical models (GGMs) have proven to be a very powerful formalism for inferring GRNs from expression data. Unfortunately, standard GGM selection techniques cannot be used in the "small n, large p" data setting. One way to overcome this issue is to resort to regularization. In particular, shrinkage estimators of the covariance matrix (which is required to infer GGMs) have proven to be very effective. Our first contribution is a new shrinkage estimator that improves upon existing ones through the use of a Monte Carlo (parametric bootstrap) procedure.
Another approach to GGM selection in the "small n, large p" data setting is to reverse engineer limited-order partial correlation graphs (q-partial correlation graphs) to approximate GGMs. Our second contribution is an inference algorithm, the q-nested procedure, that builds a sequence of nested q-partial correlation graphs and uses the topology of the lower-order graphs to infer higher-order graphs. This allows us to significantly speed up the inference of such graphs and to avoid problems related to multiple testing. Consequently, we are able to consider higher-order graphs, thereby increasing the accuracy of the inferred graphs.
Another important challenge in bioinformatics is the prediction of gene function. An example of such a prediction task is the identification of genes that are targets of the nitrogen catabolite repression (NCR) selection mechanism in the yeast Saccharomyces cerevisiae. The study of model organisms such as Saccharomyces cerevisiae is indispensable for the understanding of more complex organisms. Our third contribution extends the standard two-class classification approach by enriching the set of variables and comparing several feature selection techniques and classification algorithms.
Finally, our fourth contribution formulates the prediction of NCR target genes as a network inference task. We use GGM selection to infer multivariate dependencies between genes and, starting from a set of genes known to be sensitive to NCR, we classify the remaining genes. We hence avoid problems related to the choice of a negative training set and take advantage of the robustness of GGM selection techniques in the "small n, large p" data setting.
"small n, large p"; bioinformatics; machine learning; Gaussian graphical model (GGM)
3	Geoid Model of Tanzania from Sparse and Varying Gravity Data Density by the KTH method
Ulotu, Prosper, January 2009
Developed countries are striving to achieve a centimetre-level geoid model. Most developing countries and regions assume that conditions in their areas do not allow even a geoid model accurate to a few decimetres. GNSS, which provides us with position, is one of the greatest achievements of the present time. Converting ellipsoidal height to the more useful orthometric height requires an accurate geoid model. In spite of sparse terrestrial gravity data of variable density, distribution and quality (a typical situation in developing countries), this study set out to develop as accurate and high-quality a geoid model of Tanzania as achievable. A literature review of three preferred geoid determination methods concluded that the Royal Institute of Technology of Sweden (KTH) method of least squares modification of Stokes formula (LSMS) with additive corrections (AC) is the most suitable for this research. However, even with a good method, the accuracy and quality of a geoid model depend heavily on the quality of the data.
In this study, a procedure has been developed to create a gravity database (GDB) out of sparse data with varying density, distribution and quality. This GDB is of high density and full coverage, which ensures the presence of high and low gravity frequencies, with medium frequencies ranging between fair and excellent. An alternative local/regional Global Gravitational Model (GGM) validation method based on quality terrestrial point surface gravity anomalies has also been developed. Validating a GGM with the new terrestrial point gravity approach and with GPS/levelling gave the same results. Once satisfactorily proved, the method has extra advantages. The limits of the Tanzania GDB (TGDB) are latitudes 15° S to 4° N and longitudes 26° E to 44° E. Cleaning and quality control of the TGDB was based on cross validation (XV) by the Kriging method and on the Gaussian distribution of the XV residuals.
The data used in the LSMS with AC to develop the new Tanzania gravimetric geoid model 2008, TZG08, are 1′ × 1′ clean and statistically tested surface gravity anomalies. 39,677 point gravity values on land and 57,723 in the ocean were used. The pure satellite GGM ITG-GRACE03S to degree 120 was used to determine the modification parameters and the long-wavelength component of the geoid model. The 3″ Shuttle Radar Topographic Mission (SRTM) Digital Elevation Model (DEM), ITG-GRACE03S to degree 120 and the combined GGM EIGEN-CG03C to degree 360 qualified to patch the data voids in accordance with the method of this research. TZG08 is referred to the Geodetic Reference System 1980 (GRS80), and its extents are latitudes 12° S to 1° N and longitudes 29° E to 41° E. Nineteen GPS/levelling points qualified for assessing the overall accuracy of TZG08, which is 29.7 cm; after approximate removal of GPS and orthometric systematic effects, the accuracy of TZG08 is 27.8 cm. A corrector surface (CS) for converting GPS height to orthometric height referred to the Tanzania National Height Datum (TNHD) has been created for a part of TZG08. Using the CS and TZG08, the orthometric height of Mt. Kilimanjaro is re-established at 5,895 m above the TNHD, the same value as in 1952 and still the official height of the mountain.
QC 20100813
Geoid; sparse gravity data; gravity database; GGM validation by gravity; corrector surface; hybrid geoid; KTH-LSMS with AC; Mt. Kilimanjaro; Tanzania; Earth and Related Environmental Sciences
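The height conversion that motivates the geoid model above can be illustrated with the standard relation H = h - N, where h is the ellipsoidal (GNSS) height, N the geoid height and H the orthometric height. The sketch below is only an illustration of that relation; the function name and the numeric values are hypothetical and are not taken from the thesis or from TZG08.

```python
def orthometric_height(ellipsoidal_height_m: float, geoid_height_m: float) -> float:
    """Return the orthometric height H = h - N, with all values in metres."""
    return ellipsoidal_height_m - geoid_height_m


# Hypothetical example values (not from the thesis):
h = 1250.00   # ellipsoidal height from GNSS, in metres
N = -17.50    # geoid height read from a geoid model, in metres

H = orthometric_height(h, N)
print(f"H = {H:.2f} m")
```

Because N enters the relation directly, an error in the geoid model carries over one-to-one into the converted height, which is why accuracy figures such as the 27.8 cm quoted above matter for practical GNSS levelling.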
4	Projeto de um modulador sigma-delta de baixo consumo para sinais de áudio / Low power audio sigma delta modulator design
Alarcón Cubas, Heiner Grover, 23 May 2013
This work describes the design of a low-power, 16-bit Sigma-Delta analog-to-digital (A/D) modulator (98 dB SNR) in CMOS technology for the acquisition of audio signals. The modulator was designed with a top-down methodology, which proceeds from the system level down to the basic blocks at transistor level. The system was analyzed and designed using equations and behavioral models to obtain the specifications of each block of the modulator. With low power consumption in mind, a third-order, four-bit CIFF (Chain of Integrators with FeedForward) topology implemented with switched capacitors was chosen. The modulator is composed of three switched-capacitor integrators, one analog adder, one weighted DAC and one four-bit quantizer. The chopper technique is included to reduce flicker noise at the modulator input. The blocks with the highest consumption within the modulator are the OTAs; they were therefore designed using the gm/ID methodology to reduce power consumption. The design was carried out in IBM 0.18 µm technology using the Cadence Spectre simulator. The Sigma-Delta modulator achieves an SNR of 98 dB over a 20 kHz bandwidth with a power consumption of 2.4 mW from a 1.8 V power supply.
Ggm/ID methodology; Low power; Sigma-delta modulators; Top-down methodology
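The pairing of "16 bits" with "98 dB SNR" in the abstract above matches the textbook relation for an ideal quantizer driven by a full-scale sine wave, SNR ≈ 6.02·N + 1.76 dB. The sketch below only illustrates that arithmetic; the function names are hypothetical, and it does not model the modulator's noise shaping or anything else specific to the thesis.

```python
import math


def ideal_snr_db(bits: int) -> float:
    """Peak SNR (dB) of an ideal N-bit quantizer with a full-scale sine input."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


def effective_bits(snr_db: float) -> float:
    """Invert the relation above: equivalent resolution for a given SNR in dB."""
    return (snr_db - 10 * math.log10(1.5)) / (20 * math.log10(2))


# A 16-bit target corresponds to roughly 98 dB of SNR.
print(f"Ideal SNR for 16 bits: {ideal_snr_db(16):.1f} dB")
# Conversely, 98 dB of SNR corresponds to just under 16 effective bits.
print(f"Effective bits at 98 dB: {effective_bits(98.0):.2f}")
```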
5	Methods for modelling human functional brain networks with MEG and fMRI
Colclough, Giles, January 2016
MEG and fMRI offer complementary insights into connected human brain function. Evidence from the use of both techniques in the study of networked activity indicates that functional connectivity reflects almost every measurable aspect of human reality, being indicative of ability and deteriorating with disease. Functional network analyses may offer improved prediction of dysfunction and characterisation of cognition. Three factors holding back progress are the difficulty in synthesising information from multiple imaging modalities; a need for accurate modelling of connectivity in individual subjects, not just average effects; and a lack of scalable solutions to these problems that are applicable in a big-data setting. I propose two methodological advances that tackle these issues.
A confound to network analysis in MEG, the artificial correlations induced across the brain by the process of source reconstruction, prevents the transfer of connectivity models from fMRI to MEG. The first advance is a fast correction for this confound, allowing comparable analyses to be performed in both modalities. A comparative study demonstrates that this new approach for MEG shows better repeatability for connectivity estimation, both within and between subjects, than a wide range of alternative models in popular use. A case-study analysis uses both fMRI and MEG recordings from a large dataset to determine the genetic basis for functional connectivity in the human brain. Genes account for 20% to 65% of the variation in connectivity, and outweigh the influence of the developmental environment.
The second advance is a Bayesian hierarchical model for sparse functional networks that is applicable to both modalities. By sharing information over a group of subjects, more accurate estimates can be constructed for individuals' connectivity patterns. The approach scales to large datasets, outperforms state-of-the-art methods, and can provide a 50% noise reduction in MEG resting-state networks.
6	Projeto de um modulador sigma-delta de baixo consumo para sinais de áudio / Low power audio sigma delta modulator design
Heiner Grover Alarcón Cubas, 23 May 2013
This work describes the design of a low-power, 16-bit Sigma-Delta analog-to-digital (A/D) modulator (98 dB SNR) in CMOS technology for the acquisition of audio signals. The modulator was designed with a top-down methodology, which proceeds from the system level down to the basic blocks at transistor level. The system was analyzed and designed using equations and behavioral models to obtain the specifications of each block of the modulator. With low power consumption in mind, a third-order, four-bit CIFF (Chain of Integrators with FeedForward) topology implemented with switched capacitors was chosen. The modulator is composed of three switched-capacitor integrators, one analog adder, one weighted DAC and one four-bit quantizer. The chopper technique is included to reduce flicker noise at the modulator input. The blocks with the highest consumption within the modulator are the OTAs; they were therefore designed using the gm/ID methodology to reduce power consumption. The design was carried out in IBM 0.18 µm technology using the Cadence Spectre simulator. The Sigma-Delta modulator achieves an SNR of 98 dB over a 20 kHz bandwidth with a power consumption of 2.4 mW from a 1.8 V power supply.
Ggm/ID methodology; Low power; Sigma-delta modulators; Top-down methodology
7	Gaussian graphical model selection for gene regulatory network reverse engineering and function prediction
Kontos, Kevin, 02 July 2009
One of the most important and challenging "knowledge extraction" tasks in bioinformatics is the reverse engineering of gene regulatory networks (GRNs) from DNA microarray gene expression data. Indeed, as a result of the development of high-throughput data-collection techniques, biology is experiencing a data flood that pushes biologists toward a new view of biology, systems biology, which aims at a system-level understanding of biological systems.
Unfortunately, even for small model organisms such as the yeast Saccharomyces cerevisiae, the number p of genes is much larger than the number n of expression data samples. The dimensionality issue induced by this "small n, large p" data setting renders standard statistical learning methods inadequate. Restricting the complexity of the models makes it possible to deal with this serious impediment. Indeed, by introducing (a priori undesirable) bias in the model selection procedure, one reduces the variance of the selected model, thereby increasing its accuracy.
Gaussian graphical models (GGMs) have proven to be a very powerful formalism for inferring GRNs from expression data. Unfortunately, standard GGM selection techniques cannot be used in the "small n, large p" data setting. One way to overcome this issue is to resort to regularization. In particular, shrinkage estimators of the covariance matrix (which is required to infer GGMs) have proven to be very effective. Our first contribution is a new shrinkage estimator that improves upon existing ones through the use of a Monte Carlo (parametric bootstrap) procedure.
Another approach to GGM selection in the "small n, large p" data setting is to reverse engineer limited-order partial correlation graphs (q-partial correlation graphs) to approximate GGMs. Our second contribution is an inference algorithm, the q-nested procedure, that builds a sequence of nested q-partial correlation graphs and uses the topology of the lower-order graphs to infer higher-order graphs. This allows us to significantly speed up the inference of such graphs and to avoid problems related to multiple testing. Consequently, we are able to consider higher-order graphs, thereby increasing the accuracy of the inferred graphs.
Another important challenge in bioinformatics is the prediction of gene function. An example of such a prediction task is the identification of genes that are targets of the nitrogen catabolite repression (NCR) selection mechanism in the yeast Saccharomyces cerevisiae. The study of model organisms such as Saccharomyces cerevisiae is indispensable for the understanding of more complex organisms. Our third contribution extends the standard two-class classification approach by enriching the set of variables and comparing several feature selection techniques and classification algorithms.
Finally, our fourth contribution formulates the prediction of NCR target genes as a network inference task. We use GGM selection to infer multivariate dependencies between genes and, starting from a set of genes known to be sensitive to NCR, we classify the remaining genes. We hence avoid problems related to the choice of a negative training set and take advantage of the robustness of GGM selection techniques in the "small n, large p" data setting.
Doctorate in Sciences / info:eu-repo/semantics/nonPublished
General computer science; Exact and natural sciences; Bioinformatics; DNA microarrays; Genetic regulation -- Data processing; machine learning; "small n, large p"; Gaussian graphical model (GGM)
