  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

Multivariate Quality Control Using Loss-Scaled Principal Components

Murphy, Terrence Edward 24 November 2004 (has links)
We consider a principal components based decomposition of the expected value of the multivariate quadratic loss function (MQL). The principal components are formed by scaling the original data by the contents of the loss constant matrix, which defines the economic penalty associated with specific variables being off their desired target values. We demonstrate the extent to which a subset of these "loss-scaled principal components" (LSPC) accounts for the two components of expected MQL, namely the trace-covariance term and the off-target vector product. We employ the LSPC to solve a robust design problem of full and reduced dimensionality with deterministic models that approximate the true solution and demonstrate comparable results in less computational time. We also employ the LSPC to construct a test statistic called loss-scaled T^2 for multivariate statistical process control. We show for one case that the proposed test statistic detects shifts in location faster than Hotelling's T^2 for variables with high weighting in the MQL. In addition, we introduce a principal component based decomposition of Hotelling's T^2 to diagnose the variables responsible for driving the location and/or dispersion of a subgroup of multivariate observations out of statistical control. We demonstrate the accuracy of this diagnostic technique on a data set from the literature and show its potential for diagnosing the loss-scaled T^2 statistic as well.
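The decomposition above can be illustrated with a small numerical sketch. The snippet below (Python/NumPy, an assumption of this page, not code from the thesis) forms loss-scaled principal components by scaling centered data with a symmetric square root of a hypothetical loss constant matrix `L`, and computes the classical Hotelling T^2 statistic for comparison; the weights in `L` are invented for illustration.

```python
import numpy as np

def loss_scaled_components(X, L, k):
    """Sketch of loss-scaled PCA: scale centered data by a symmetric
    square root of the loss constant matrix L, then take the leading
    k principal components of the scaled data."""
    Xc = X - X.mean(axis=0)
    # symmetric square root of L (assumed positive definite)
    w, V = np.linalg.eigh(L)
    L_half = V @ np.diag(np.sqrt(w)) @ V.T
    Z = Xc @ L_half                            # loss-scaled data
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T, Vt[:k]                # scores, loadings

def hotelling_t2(X):
    """Classical Hotelling T^2 for each observation."""
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum('ij,jk,ik->i', Xc, S_inv, Xc)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
L = np.diag([4.0, 1.0, 0.25])                  # heavier penalty on variable 1
scores, load = loss_scaled_components(X, L, 2)
t2 = hotelling_t2(X)
print(scores.shape, load.shape, t2.shape)      # (200, 2) (2, 3) (200,)
```

A loss-scaled T^2 in the spirit of the thesis would apply the T^2 construction to the LSPC scores rather than to the raw variables.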
142

Investigation of probabilistic principal component analysis compared to proper orthogonal decomposition methods for basis extraction and missing data estimation

Lee, Kyunghoon 21 May 2010 (has links)
The identification of flow characteristics and the reduction of high-dimensional simulation data have capitalized on an orthogonal basis obtained by proper orthogonal decomposition (POD), also known as principal component analysis (PCA) or the Karhunen-Loeve transform (KLT). In the realm of aerospace engineering, an orthogonal basis is versatile for diverse applications, especially those associated with reduced-order modeling (ROM), such as a low-dimensional turbulence model, an unsteady aerodynamic model for aeroelasticity and flow control, and a steady aerodynamic model for airfoil shape design. When a given data set lacks parts of its data, POD must adopt a least-squares formulation, leading to gappy POD, which uses a gappy norm, a variant of the L2 norm that deals only with known data. Although gappy POD was originally devised to restore marred images, its application has spread to aerospace engineering because various engineering problems can be reformulated as missing data estimation to exploit gappy POD. Similar to POD, gappy POD has a broad range of applications such as optimal flow sensor placement, experimental and numerical flow data assimilation, and impaired particle image velocimetry (PIV) data restoration. Apart from POD and gappy POD, both of which are deterministic formulations, probabilistic principal component analysis (PPCA), a probabilistic generalization of PCA, has been used in the pattern recognition field for speech recognition and in oceanography for empirical orthogonal functions in the presence of missing data. In its formulation, PPCA presumes a linear latent variable model relating an observed variable to a latent variable that is inferred only from the observed variable through a linear mapping called the factor loading.
To evaluate the maximum likelihood estimates (MLEs) of PPCA parameters such as a factor-loading, PPCA can invoke an expectation-maximization (EM) algorithm, yielding an EM algorithm for PPCA (EM-PCA). By virtue of the EM algorithm, the EM-PCA is capable of not only extracting a basis but also restoring missing data through iterations whether the given data are intact or not. Therefore, the EM-PCA can potentially substitute for both POD and gappy POD inasmuch as its accuracy and efficiency are comparable to those of POD and gappy POD. In order to examine the benefits of the EM-PCA for aerospace engineering applications, this thesis attempts to qualitatively and quantitatively scrutinize the EM-PCA alongside both POD and gappy POD using high-dimensional simulation data. In pursuing qualitative investigations, the theoretical relationship between POD and PPCA is transparent such that the factor-loading MLE of PPCA, evaluated by the EM-PCA, pertains to an orthogonal basis obtained by POD. By contrast, the analytical connection between gappy POD and the EM-PCA is nebulous because they distinctively approximate missing data due to their antithetical formulation perspectives: gappy POD solves a least-squares problem whereas the EM-PCA relies on the expectation of the observation probability model. To juxtapose both gappy POD and the EM-PCA, this research proposes a unifying least-squares perspective that embraces the two disparate algorithms within a generalized least-squares framework. As a result, the unifying perspective reveals that both methods address similar least-squares problems; however, their formulations contain dissimilar bases and norms. Furthermore, this research delves into the ramifications of the different bases and norms that will eventually characterize the traits of both methods. 
To this end, two hybrid algorithms of gappy POD and the EM-PCA are devised and compared to the original algorithms for a qualitative illustration of the different basis and norm effects. Ultimately, the norm, which reflects the curve-fitting method, is found to affect estimation error reduction more significantly than the basis for two example test data sets: one is missing data at only a single snapshot and the other is missing data across all the snapshots. From a numerical performance standpoint, the EM-PCA is computationally less efficient than POD for intact data since it suffers from the slow convergence inherited from the EM algorithm. For incomplete data, this thesis found quantitatively that the number of data-missing snapshots predetermines whether the EM-PCA or gappy POD outperforms the other, because of the computational cost of coefficient evaluation resulting from the norm selection. For instance, gappy POD demands computational effort in proportion to the number of data-missing snapshots as a consequence of the gappy norm. In contrast, the computational cost of the EM-PCA is invariant to the number of data-missing snapshots thanks to the L2 norm. In general, the higher the number of data-missing snapshots, the wider the gap between the computational costs of gappy POD and the EM-PCA. Based on the numerical experiments reported in this thesis, the following criterion is recommended for selecting between gappy POD and the EM-PCA for computational efficiency: gappy POD for an incomplete data set containing a few data-missing snapshots, and the EM-PCA for an incomplete data set involving many data-missing snapshots. Lastly, the EM-PCA is applied to two aerospace applications in comparison to gappy POD as a proof of concept: one with an emphasis on basis extraction and the other with a focus on missing data reconstruction for a given incomplete data set with scattered missing data.
The first application exploits the EM-PCA to efficiently construct reduced-order models of engine deck responses obtained by the numerical propulsion system simulation (NPSS), some of whose results are absent due to failed analyses caused by numerical instability. Model-prediction tests validate that engine performance metrics estimated by the reduced-order NPSS model agree closely with those directly obtained by NPSS. Similarly, the second application illustrates that the EM-PCA is significantly more cost effective than gappy POD at repairing spurious PIV measurements obtained from acoustically excited, bluff-body jet flow experiments. The EM-PCA reduces computational cost by factors of 8 to 19 compared to gappy POD while generating the same restoration results as those evaluated by gappy POD. All in all, through comprehensive theoretical and numerical investigation, this research establishes that the EM-PCA is an efficient alternative to gappy POD for an incomplete data set with missing data scattered across the entire data set.
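For readers unfamiliar with EM-PCA, the iteration the abstract refers to can be sketched in a few lines. The following Python/NumPy code (a hedged illustration in the style of Roweis-type EM-PCA, not the thesis implementation) alternates a latent-coordinate E-step with a factor-loading M-step on intact data and checks that the learned subspace matches the SVD/POD subspace; the data and dimensions are synthetic.

```python
import numpy as np

def em_pca(Y, k, iters=200, seed=0):
    """EM algorithm for PCA (sketch): alternate a latent-coordinate
    E-step and a factor-loading M-step; on intact data this converges
    to the leading principal subspace without forming the covariance."""
    rng = np.random.default_rng(seed)
    Yc = Y - Y.mean(axis=0)
    W = rng.normal(size=(Y.shape[1], k))
    for _ in range(iters):
        Z = Yc @ W @ np.linalg.inv(W.T @ W)        # E-step: latent coords
        W = Yc.T @ Z @ np.linalg.inv(Z.T @ Z)      # M-step: factor loading
    # orthonormalize the learned subspace for comparison with POD/SVD
    Q, _ = np.linalg.qr(W)
    return Q

rng = np.random.default_rng(1)
Y = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 6)) \
    + 0.01 * rng.normal(size=(500, 6))             # rank-2 signal + noise
Q = em_pca(Y, k=2)
# compare with the top-2 right singular vectors of the centered data
_, _, Vt = np.linalg.svd(Y - Y.mean(axis=0), full_matrices=False)
overlap = np.linalg.norm(Vt[:2] @ Q)               # ~ sqrt(2) if same subspace
print(round(float(overlap), 2))
```

Missing-data handling in the real algorithm replaces absent entries by their conditional expectations inside each iteration, which is what makes its cost invariant to the number of data-missing snapshots.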
143

Target tracking using residual vector quantization

Aslam, Salman Muhammad 18 November 2011 (has links)
In this work, our goal is to track visual targets using residual vector quantization (RVQ). We compare our results with tracking based on principal component analysis (PCA) and tree-structured vector quantization (TSVQ). This work is significant since PCA is commonly used in the pattern recognition, machine learning, and computer vision communities, while TSVQ is commonly used in the signal processing and data compression communities. RVQ with more than two stages has not received much attention due to the difficulty of producing stable designs. We bring these different approaches together into an integrated tracking framework and show that RVQ tracking performs best according to multiple criteria on publicly available datasets. Moreover, an advantage of our approach is a learning-based tracker that builds the target model while it tracks, thus avoiding the costly step of building target models prior to tracking.
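A minimal sketch of the residual vector quantization idea the abstract builds on: each stage fits a small codebook (plain k-means here) to the residual left by the previous stages, so the sum of stage codewords approximates the input increasingly well. This Python/NumPy toy is illustrative only; the thesis applies RVQ to visual target appearance models, not to random vectors, and its stage design procedure is more involved.

```python
import numpy as np

def train_rvq(X, stages=3, codebook_size=8, iters=20, seed=0):
    """Toy residual vector quantizer: at each stage, run k-means on the
    current residual and subtract the assigned codewords."""
    rng = np.random.default_rng(seed)
    residual = X.copy()
    codebooks = []
    for _ in range(stages):
        # initialize the stage codebook from random residual vectors
        C = residual[rng.choice(len(residual), codebook_size, replace=False)]
        for _ in range(iters):
            d = np.linalg.norm(residual[:, None, :] - C[None, :, :], axis=2)
            assign = d.argmin(axis=1)
            for j in range(codebook_size):
                if np.any(assign == j):
                    C[j] = residual[assign == j].mean(axis=0)
        codebooks.append(C.copy())
        residual = residual - C[assign]        # pass residual to next stage
    return codebooks, residual

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 4))
codebooks, residual = train_rvq(X)
print(len(codebooks), residual.shape)          # 3 (400, 4)
```

The stability issue the abstract mentions shows up when later stages chase residuals produced by earlier, still-changing codebooks; practical RVQ designs iterate or jointly optimize the stages.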
144

Modeling of linkage disequilibrium in whole genome genetic association studies

Johnson, Randall 19 December 2014 (has links)
GWAS is an essential tool for disease gene discovery, but it suffers severe losses of statistical power when it is impractical to genetically sample tens of thousands of subjects. The results presented here (ALDsuite, a program implementing a novel, effective correction for local ancestral population LD that permits the use of dense markers in MALD, and the demonstration that the simpleM method provides an optimal Bonferroni correction for multiple comparisons in GWAS) reiterate the value of principal component analysis (PCA) for capturing the essential complexity of high-dimensional systems. PCA is already standard for correcting for population substructure in GWAS; my results point to its broader applicability as a general strategy for dealing with the high dimensionality of genomic association data.
145

Super-resolution image processing with application to face recognition

Lin, Frank Chi-Hao January 2008 (has links)
Subject identification from surveillance imagery has become an important task for forensic investigation. Good quality images of the subjects are essential for the surveillance footage to be useful. However, surveillance videos are of low resolution due to data storage requirements. In addition, subjects typically occupy a small portion of a camera's field of view. Faces, which are of primary interest, occupy an even smaller array of pixels. For reliable face recognition from surveillance video, there is a need to generate higher resolution images of the subject's face from low-resolution video. Super-resolution image reconstruction is a signal processing based approach that aims to reconstruct a high-resolution image by combining a number of low-resolution images. Low-resolution images that differ by a sub-pixel shift contain complementary information, as they are different "snapshots" of the same scene. Once geometrically registered onto a common high-resolution grid, they can be merged into a single image with higher resolution. As super-resolution is a computationally intensive process, traditional reconstruction-based super-resolution methods simplify the problem by restricting the correspondence between low-resolution frames to global motion such as translational and affine transformations. Surveillance footage, however, consists of independently moving non-rigid objects such as faces. Applying global registration methods results in registration errors that lead to artefacts that adversely affect recognition. The human face also presents additional problems, such as self-occlusion and reflectance variation, that even local registration methods find difficult to model. In this dissertation, a robust optical flow-based super-resolution technique was proposed to overcome these difficulties.
Real surveillance footage and the Terrascope database were used to compare the reconstruction quality of the proposed method against interpolation and existing super-resolution algorithms. Results show that the proposed robust optical flow-based method consistently produced more accurate reconstructions. This dissertation also outlines a systematic investigation of how super-resolution affects automatic face recognition algorithms with an emphasis on comparing reconstruction- and learning-based super-resolution approaches. While reconstruction-based super-resolution approaches like the proposed method attempt to recover the aliased high frequency information, learning-based methods synthesise them instead. Learning-based methods are able to synthesise plausible high frequency detail at high magnification ratios but the appearance of the face may change to the extent that the person no longer looks like him/herself. Although super-resolution has been applied to facial imagery, very little has been reported elsewhere on measuring the performance changes from super-resolved images. Intuitively, super-resolution improves image fidelity, and hence should improve the ability to distinguish between faces and consequently automatic face recognition accuracy. This is the first study to comprehensively investigate the effect of super-resolution on face recognition. Since super-resolution is a computationally intensive process it is important to understand the benefits in relation to the trade-off in computations. A framework for testing face recognition algorithms with multi-resolution images was proposed, using the XM2VTS database as a sample implementation. Results show that super-resolution offers a small improvement over bilinear interpolation in recognition performance in the absence of noise and that super-resolution is more beneficial when the input images are noisy since noise is attenuated during the frame fusion process.
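The reconstruction-based principle described above, registering sub-pixel-shifted low-resolution frames onto a common high-resolution grid and fusing them, can be shown in a toy form. The Python/NumPy sketch below assumes the shifts are known exactly and skips blur modeling and the optical-flow registration the thesis actually proposes; with four ideal quarter-phase frames of a 2x downsampling, shift-and-add recovers the original grid exactly.

```python
import numpy as np

def shift_and_add(frames, shifts, scale):
    """Toy reconstruction-based super-resolution: place each low-res
    frame on a high-res grid at its (known) integer sub-pixel shift
    and average overlapping samples."""
    h, w = frames[0].shape
    hi = np.zeros((h * scale, w * scale))
    count = np.zeros_like(hi)
    for f, (dy, dx) in zip(frames, shifts):
        ys = np.arange(h) * scale + dy
        xs = np.arange(w) * scale + dx
        hi[np.ix_(ys, xs)] += f
        count[np.ix_(ys, xs)] += 1
    return hi / np.maximum(count, 1)

rng = np.random.default_rng(5)
truth = rng.normal(size=(8, 8))
scale = 2
# four low-res frames, one per phase of the 2x downsampling
frames = [truth[dy::scale, dx::scale] for dy in (0, 1) for dx in (0, 1)]
shifts = [(dy, dx) for dy in (0, 1) for dx in (0, 1)]
sr = shift_and_add(frames, shifts, scale)
print(np.allclose(sr, truth))                  # True
```

Real footage never provides such ideal phases: the shifts must be estimated (the thesis uses robust optical flow), and registration error is exactly what produces the artefacts discussed above.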
146

Application of factor analysis to the detection and description of alcoholic beverage consumption in the Greek population

Ρεκούτη, Αγγελική 21 October 2011 (has links)
The purpose of this paper is to apply factor analysis to our sample in order to detect and describe patterns in the consumption of 9 categories of alcoholic beverages by the Greek population. For the application of the method, we use the statistical program SPSS. The first chapter presents the available methods for solving this problem, and the second presents the chosen method, factor analysis. We specify the objective of the analysis, the design stages and the critical assumptions of the method, as well as the criteria for evaluating the results. In the third chapter we present the source of our data and how the sampling was performed. Furthermore, we identify the missing values and apply Missing Values Analysis to determine their type and restore them in the sample. We also present our sample using descriptive statistics, and then create and describe the final data matrix to be factor analyzed. In the fourth and last chapter we investigate the suitability of our sample for factor analysis by checking that the assumptions of the method are satisfied. We then carry out a parallel study of the sample, both including and excluding the outliers we identified. Concluding that the outliers do not affect the results of the method, we apply factor analysis using the principal components extraction method and describe in detail all steps leading to our final conclusions.
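The extraction step named at the end of the abstract (principal components extraction, as in the SPSS workflow) can be sketched outside SPSS. The Python/NumPy snippet below eigendecomposes the correlation matrix and keeps factors with eigenvalue greater than 1 (the Kaiser criterion, the usual SPSS default); the synthetic data with one common factor are an assumption for illustration.

```python
import numpy as np

def pc_factor_extraction(X):
    """Factor extraction by principal components: eigendecompose the
    correlation matrix and keep factors with eigenvalue > 1."""
    R = np.corrcoef(X, rowvar=False)
    w, V = np.linalg.eigh(R)
    order = np.argsort(w)[::-1]                # sort eigenvalues descending
    w, V = w[order], V[:, order]
    keep = w > 1.0                             # Kaiser criterion
    loadings = V[:, keep] * np.sqrt(w[keep])   # factor loadings
    return loadings, w

rng = np.random.default_rng(4)
common = rng.normal(size=(300, 1))
# four variables driven by one common factor, plus two pure-noise variables
X = np.hstack([common + 0.3 * rng.normal(size=(300, 1)) for _ in range(4)]
              + [rng.normal(size=(300, 1)) for _ in range(2)])
loadings, eigvals = pc_factor_extraction(X)
print(loadings.shape[0], eigvals[0] > 1.0)     # 6 True
```

The retained loadings are what one would then rotate and interpret as consumption patterns, as the thesis does for the nine beverage categories.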
147

Application of multivariate statistical techniques to data on red ceramics produced in the central region of Rio Grande do Sul

Saad, Danielle de Souza 10 September 2009 (has links)
This work applied multivariate statistical techniques, using the software STATISTICA 7.0 for Windows, to the analysis of data on red ceramics produced in the central region of the state of Rio Grande do Sul. The variables used were: total monthly production, number of ceramic plants, solid bricks, sealing blocks, and structural blocks. The techniques employed were cluster analysis, factor analysis, and principal component analysis. Cluster analysis determines the degree of similarity between the variables; factor analysis reduces the number of variables analyzed, in agreement with the cluster analysis; and principal component analysis identifies the degree to which each variable contributes to the formation of the factors. The work concluded that these techniques can be applied to data on ceramic products, since the results obtained confirmed the results and conclusions of previous studies, and that the techniques employed are pertinent to the proposed objectives.
148

Pollination systems in Cerrado fragments in the Alto Taquari region (GO, MS, MT), Brazil

Martins, Fernanda Quintas 25 February 2005 (has links)
The Cerrado Domain originally occupied 23% of the Brazilian territory (ca. 2 million km2), especially in the Central Plateau, making it the second largest phytogeographic province of Brazil. The cerrado vegetation is not uniform in physiognomy, ranging from grassland to tall woodland, but most of its physiognomies lie within the range defined as tropical savanna. It is estimated that 3,000 to 7,000 vascular plant species occur in this vegetation type, of which 1,000 to 2,000 belong to the woody component. Different authors have attempted to use reproductive features to explain the general patterns of diversity and community structure found in tropical woodlands, with the underlying idea that plant diversity and spatial distribution depend on reproductive processes. Studies on the reproductive biology of cerrado plant species have shown a great diversity of pollination systems, similar to those found in Neotropical forests. The data emerging on the reproductive biology of plants have important consequences for conservation and for understanding the organization of cerrado communities. We sampled woody individuals in five cerrado fragments in the Brazilian Central Plateau. Using the floristic data from all our field trips, we sampled 2,280 individuals, representing 121 species and 38 families. The richest families were Fabaceae and Myrtaceae, and Davilla elliptica A. St.-Hil. and Myrcia bella Triana were the best represented species. Most species presented open flowers, with diurnal anthesis, pale colors, and pollen as the floral reward. In the cerrado vegetation, species with flowers visited mainly by bees and small insects were the main groups ecologically related to pollination.
Of the 121 species, 65 were pollinated mainly by bees; 30 by small insects; 15 by moths; five by bats; three by beetles; two by hummingbirds; and one by wind. The ordination analysis of floral characteristics and plant species showed a grouping of species with certain pollination systems, for which inferences based on floral characteristics are recommended, such as the species pollinated by bats, moths, and birds. For the species pollinated mainly by bees and small insects, however, these inferences are not recommended due to their great dispersion along the ordination axes and large overlap. This dispersion and overlap probably occurred due to the absence of specificity between plants and pollinators. For four of the five pollination systems with at least ten individuals, we found no significant variation in relation to distance from the fragment edge; the exception was plants pollinated by beetles, whose frequency decreased toward the fragment interior. Similarly, we found significant variation in relation to height only for plants pollinated by bats, whose frequency increased with tree height. In general, we found no horizontal or vertical variations in the pollination systems, contrary to what has been found in forests, probably as a consequence of the more open physiognomy of the cerrado fragments.
150

Relationship between the principal components of the Brazilian term structure of interest rates and macroeconomic variables

Obara, Victor Hideki 11 February 2014 (has links)
This paper observes how macroeconomic variables (inflation expectations, the real interest rate, the output gap, and the exchange rate) influence the dynamics of the Term Structure of Interest Rates (TSIR). This dynamic was examined by applying Principal Component Analysis (PCA) to capture the effect of the most important components of the TSIR (level, slope, and curvature). Using ordinary least squares estimation and the generalized method of moments, a statistically significant relationship was found between the macroeconomic variables and the principal components of the TSIR.
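The PCA step described in the abstract, extracting level, slope, and curvature factors from the term structure, can be sketched as follows. This Python/NumPy example is an illustration under synthetic data, not the paper's Brazilian data set: it runs PCA on yield changes across maturities, and with a single common "level" shock driving the synthetic curve, the first component dominates the explained variance.

```python
import numpy as np

def term_structure_pcs(rates, k=3):
    """PCA on changes in yields across maturities; the first three
    components are conventionally read as level, slope, curvature."""
    d = np.diff(rates, axis=0)                 # daily changes per maturity
    dc = d - d.mean(axis=0)
    cov = np.cov(dc, rowvar=False)
    w, V = np.linalg.eigh(cov)
    order = np.argsort(w)[::-1]                # eigenvalues descending
    w, V = w[order], V[:, order]
    explained = w[:k] / w.sum()
    return dc @ V[:, :k], V[:, :k], explained

# synthetic curve: a common random-walk "level" shock plus small noise
rng = np.random.default_rng(3)
level = np.cumsum(rng.normal(scale=0.05, size=300))
maturities = np.linspace(0.25, 10, 8)
rates = 0.05 + level[:, None] * np.ones_like(maturities) \
        + 0.002 * rng.normal(size=(300, 8))
scores, loadings, expl = term_structure_pcs(rates)
print(scores.shape, expl[0] > 0.9)             # (299, 3) True
```

The paper's regressions then relate such component scores to the macroeconomic series by OLS and GMM.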
