Global ETD Search

21	Statistical Methods for the Analysis of Mass Spectrometry-based Proteomics Data Wang, Xuan May 2012 Proteomics plays an important role in the systems-level understanding of biological function. Mass spectrometry (MS) proteomics has become the tool of choice for identifying and quantifying the proteome of an organism. In the most widely used bottom-up approach to MS-based high-throughput quantitative proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage; the resulting peptide products are separated based on chemical or physical properties and then analyzed using a mass spectrometer. The three fundamental challenges in the analysis of bottom-up MS-based proteomics are: (i) identifying the proteins that are present in a sample; (ii) aligning different samples on elution (retention) time, mass, peak area (intensity), and so on; and (iii) quantifying the abundance levels of the identified proteins after alignment. Each of these challenges requires knowledge of the biological and technological context that gives rise to the observed data, as well as the application of sound statistical principles for estimation and inference. In this dissertation, we present a set of statistical methods for protein identification, alignment and quantification in bottom-up proteomics. We describe a fully Bayesian hierarchical modeling approach to peptide and protein identification on the basis of MS/MS fragmentation patterns in a unified framework. Our major contribution is to allow for dependence among the list of top candidate peptide-spectrum matches (PSMs), which we accomplish with a Bayesian multiple-component mixture model that incorporates decoy search results and jointly estimates the accuracy of a list of peptide identifications for each MS/MS fragmentation spectrum.
We also propose an objective criterion for evaluating the false discovery rate (FDR) associated with a list of identifications at the peptide level, which yields more accurate FDR estimates than existing methods such as PeptideProphet. Several alignment algorithms have been developed using different warping functions. However, all existing alignment approaches lack a useful metric for scoring an alignment between two data sets, and hence provide no quantitative score for how good an alignment is. Our alignment approach uses "anchor points" to align all the individual scans in the target sample and provides a framework to quantify the alignment, that is, to assign a p-value to a set of aligned LC-MS runs that assesses the correctness of the alignment. After alignment with our algorithm, the p-values from Wilcoxon signed-rank tests on elution (retention) time, m/z and peak area become non-significant. Quantitative MS-based proteomics involves statistical inference on protein abundance, based on the intensities of each protein's associated spectral peaks. However, typical MS-based proteomics data sets have substantial proportions of missing observations, due at least in part to censoring of low intensities. This complicates intensity-based differential expression analysis. We outline a statistical method for protein differential expression based on a simple Binomial likelihood. By modeling peak intensities as binary, in terms of "presence / absence", we enable the selection of proteins not typically amenable to quantitative analysis, e.g., "one-state" proteins that are present in one condition but absent in another. In addition, we present an analysis protocol that combines quantitative and presence / absence analysis of a given data set in a principled way, resulting in a single list of selected proteins with a single associated FDR.
Proteomics Mass spectrometry Bottom-up Identification Alignment Quantitation LC-MS PSM Bayesian Hierarchical Model FDR Anchor Points Missingness
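The presence / absence idea can be illustrated with a minimal sketch, assuming each protein is summarized only by the number of runs in which it was detected in each of two conditions. It uses a two-sample binomial likelihood-ratio test followed by Benjamini-Hochberg adjustment to produce one FDR-controlled list; the function names and counts are hypothetical, and the thesis's actual model and FDR procedure may differ. With few runs, the chi-square reference distribution is only approximate.

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import chi2


def binomial_loglik(k, n, p):
    """Binomial log-likelihood up to the constant log C(n, k), which cancels in the LRT.

    xlogy(0, 0) == 0, so proteins detected in all or none of the runs
    (the "one-state" case) are handled without log(0) errors."""
    return xlogy(k, p) + xlogy(n - k, 1.0 - p)


def presence_absence_test(k1, n1, k2, n2):
    """Likelihood-ratio test of equal detection probability in two conditions.

    k1, k2: number of runs in which the protein was detected ("present")
    n1, n2: number of runs per condition
    Returns a p-value from the chi-square(1) reference distribution."""
    p1, p2 = k1 / n1, k2 / n2
    p0 = (k1 + k2) / (n1 + n2)  # pooled detection probability under H0
    ll_alt = binomial_loglik(k1, n1, p1) + binomial_loglik(k2, n2, p2)
    ll_null = binomial_loglik(k1, n1, p0) + binomial_loglik(k2, n2, p0)
    stat = max(2.0 * (ll_alt - ll_null), 0.0)
    return float(chi2.sf(stat, df=1))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values for a single FDR-controlled list."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # step-up rule: running minimum taken from the largest p-value downwards
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


# Hypothetical counts: (detected in A, runs in A, detected in B, runs in B)
counts = [(6, 6, 0, 6), (3, 6, 3, 6), (5, 6, 1, 6)]
pvals = [presence_absence_test(*c) for c in counts]
print(benjamini_hochberg(pvals))
```

The first hypothetical protein is a "one-state" protein (always present in A, never in B), which an intensity-based test could not handle because condition B has no intensities at all.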
22	The Relative Effects of Functional Diversity and Structural Complexity on Carbon Dynamics in Late-Successional, Northeastern Mixed Hardwood Forests Myers, Samantha 03 April 2023 Late-successional forests provide a unique opportunity to explore adaptive management approaches that mitigate atmospheric carbon dioxide levels through carbon storage while also enhancing ecological resilience to novel climate conditions and disturbances. Typical benchmarks for adaptive forest management include species diversity and structural complexity, which are widely considered to increase ecosystem stability and productivity. However, the role of functional trait diversity (e.g., variation in leaf and stem traits) in driving forest productivity and ecosystem resilience remains underexplored. We leveraged existing continuous forest inventory (CFI) data and collected local functional trait observations from CFI plots within late-successional forests in western Massachusetts to explore links between aboveground carbon storage and different types of forest diversity. We then fit a linear model within a Bayesian hierarchical framework, using functional diversity, species diversity, and structural complexity as predictors of live aboveground biomass (AGB) within CFI plots. Our framework integrates local functional trait information with database species mean trait values using a multivariate structure to account for inherent trait syndromes and to estimate functional diversity in each plot. Across 626 plot-timepoints, we found that integrating individual functional trait information from co-located plots yielded the best predictions of live AGB. Contrary to expectations, functional diversity had a negative relationship with live AGB. Plots with low functional diversity and higher AGB were dominated by mid-to-late successional hardwood species, whereas plots with high functional diversity had more shade-intolerant species and lower AGB, mediated by recent small-scale disturbances.
Our results reveal an ontogenetic shift in the effects of functional diversity on AGB productivity over the course of succession in northeastern temperate forests. Consistent with classical models of biomass development in late-successional northern hardwood forests, our findings support the need for adaptive forest carbon management that fosters a mosaic of different forest successional stages across the landscape to maximize live aboveground carbon benefits in northeastern mixed hardwood forests.
forest carbon structural complexity functional trait diversity late-successional hardwood forests Bayesian hierarchical model Forest Management Natural Resources and Conservation
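The abstract does not name the functional diversity index that the multivariate trait model estimates. Purely as an illustration of what an abundance-weighted functional diversity value for one plot looks like, here is a sketch of Rao's quadratic entropy, assuming Euclidean distances on standardized trait values; both the index and the distance are assumptions, not the thesis's method, and the trait values are hypothetical.

```python
import numpy as np


def raos_q(abundances, traits):
    """Rao's quadratic entropy: Q = sum_i sum_j d_ij * p_i * p_j.

    abundances: per-species abundance or biomass in one plot (length S)
    traits: S x T matrix of (ideally standardized) trait values
    d_ij is the Euclidean distance between species i and j in trait space."""
    p = np.asarray(abundances, dtype=float)
    p = p / p.sum()  # relative abundances
    t = np.asarray(traits, dtype=float)
    diff = t[:, None, :] - t[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return float(p @ d @ p)


# Hypothetical plot with three species and two standardized traits
traits = np.array([[0.0, 0.0], [1.0, 0.5], [-1.0, 2.0]])
print(raos_q([10, 5, 1], traits))
```

A plot dominated by one species gives Q = 0, and Q grows as abundance is spread over species that differ more in their traits, which is the sense in which "high functional diversity" is used above.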
23	Bayesian hierarchical approaches to analyze spatiotemporal dynamics of fish populations Bi, Rujia 03 September 2020 The study of spatiotemporal dynamics of fish populations is important for both stock assessment and fishery management. I explored the impacts of environmental and anthropogenic factors on spatiotemporal patterns of fish populations, and contributed to stock assessment and management by incorporating the inherent spatial structure. Hierarchical models were developed to specify spatial and temporal variation, and Bayesian methods were used to fit the models. Yellow perch (Perca flavescens) supports one of the most important commercial and recreational fisheries in Lake Erie, which is currently managed using four management units (MUs), each assessed by a spatially-independent, stock-specific assessment model. This assessment assumes that movement of yellow perch among MUs in Lake Erie is statistically negligible and biologically insignificant. I investigated whether this assumption is violated and how it affects the assessment. I first explored the spatiotemporal patterns of yellow perch abundance in Lake Erie based on data from a 27-year gillnet survey, and analyzed the impacts of environmental factors on the spatiotemporal dynamics of the population. The yellow perch relative biomass index displayed clear temporal variation and spatial heterogeneity; however, the two middle MUs displayed spatial similarities. I then developed a state-space model based on 7 years of tag-recovery data to explore movements of yellow perch among MUs, and performed a simulation analysis to evaluate the impact of sample size on movement estimates. The results suggested substantial movement between the two stocks in the central basin, and the accuracy and precision of movement estimates increased with increasing sample size.
These results demonstrate that the assumption of negligible movement among MUs is violated, and that regional connectivity must be incorporated into stock assessment. I thus developed a tag-integrated multi-region model that incorporates movement into a spatial stock assessment by integrating the tag-recovery data with 45 years of fisheries data. I then compared population projections such as recruitment and abundance derived from the tag-integrated multi-region model and from the current spatially-independent, stock-specific assessment model to assess the influence of assuming movement versus no movement among MUs. Differences between the population projections from the two models suggested that integrating regional stock dynamics has a significant influence on stock estimates. American Shad (Alosa sapidissima), Hickory Shad (A. mediocris) and river herrings, including Alewife (A. pseudoharengus) and Blueback Herring (A. aestivalis), are anadromous pelagic fishes that spend most of the annual cycle at sea and enter coastal rivers in spring to spawn. Alosa fisheries were once among the most valuable along the Atlantic coast, but have declined in recent decades due to pollution, overfishing and dam construction. Management actions have been implemented to restore the populations, and stocks in different river systems have displayed different recovery trends. I developed a Bayesian hierarchical spatiotemporal model to identify the population trends of these species among rivers in the Chesapeake Bay basin and to identify environmental and anthropogenic factors influencing their distribution and abundance. The results demonstrated river-specific heterogeneity in the spatiotemporal dynamics of these species and indicated river-specific impacts of multiple factors, including water temperature, river flow, chlorophyll a concentration and total phosphorus concentration, on their population dynamics.
Given the importance of these two case studies, analyses that diagnose the factors influencing population dynamics and models that account for spatial complexity are highly valuable to practical fisheries management. Models incorporating spatiotemporal variation describe population dynamics more accurately, improve the accuracy of stock assessments, and would provide better recommendations for management purposes. / Doctor of Philosophy / Many fish populations exhibit complex spatial structure, but spatial patterns have been incorporated into stock assessment in only a few cases. A full understanding of the spatial structure of fish populations is needed to manage them better. Stock assessment and management strategies should depend on the inherent spatial structure of the target fish population. Many approaches have been developed to analyze the spatial structure of fish populations. In this dissertation, I developed quantitative models to analyze fish demographic data and tagging data to explore the spatial structure of fish populations, taking as examples yellow perch (Perca flavescens) in Lake Erie and the Alosa group, including American Shad (Alosa sapidissima), Hickory Shad (A. mediocris) and river herrings (Alewife A. pseudoharengus and Blueback Herring A. aestivalis), in selected tributaries of the Chesapeake Bay. Fishery-independent data for yellow perch displayed spatial similarities in the central basin of Lake Erie. Distinct temporal trends were observed in relative abundance data for Alosa spp. in different tributaries of the Chesapeake Bay. Tagging data showed substantial yellow perch movement within the central basin of the lake. Ignoring the inherent spatial structure may cause fish to be overfished in some regions and underfished in others.
To maximize the effectiveness of management in all regions, I strongly recommend incorporating spatial structure into stock assessment and management, using approaches such as those developed in this dissertation.
Yellow perch Lake Erie Alosa Chesapeake Bay Bayesian hierarchical model spatiotemporal dynamics tag-recovery sample size Simulation movement stock assessment
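The sample-size simulation can be illustrated with a deliberately simplified sketch: fish tagged in one MU are recovered in each MU according to fixed movement probabilities, and those probabilities are estimated by their multinomial maximum-likelihood estimate. The probabilities and sample sizes below are hypothetical, and the sketch assumes perfect tag reporting and ignores survival, fishing mortality and tag loss, all of which the thesis's state-space tag-recovery model accounts for.

```python
import numpy as np


def movement_simulation(true_probs, n_recoveries, n_reps=2000, seed=1):
    """Bias and RMSE of multinomial movement-probability estimates.

    true_probs: probability that a fish tagged in one MU is recovered in each MU
    n_recoveries: number of recovered tags per simulated study (sample size)
    Simplifying assumptions: perfect reporting, no mortality or tag loss."""
    rng = np.random.default_rng(seed)
    true_probs = np.asarray(true_probs, dtype=float)
    counts = rng.multinomial(n_recoveries, true_probs, size=n_reps)
    estimates = counts / n_recoveries  # multinomial MLE in each replicate
    bias = estimates.mean(axis=0) - true_probs
    rmse = np.sqrt(((estimates - true_probs) ** 2).mean(axis=0))
    return bias, rmse


# Hypothetical movement probabilities from one MU to three MUs
probs = [0.7, 0.2, 0.1]
for n in (25, 100, 400):
    _, rmse = movement_simulation(probs, n)
    print(n, np.round(rmse, 3))
```

In this idealized setting the RMSE of each estimate shrinks roughly as sqrt(p(1 - p)/n), which mirrors the qualitative finding that precision improves with sample size.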
24	Bayesian Hierarchical Latent Model for Gene Set Analysis Chao, Yi 13 May 2009 A pathway is a predefined set of genes that serve a particular cellular or physiological function. Ranking pathways by their relevance to a particular phenotype can help researchers focus on a few sets of genes. In this thesis, a Bayesian hierarchical latent model was proposed using a generalized linear random effects model. The advantage of the approach is that it can easily incorporate prior knowledge when the sample size is small and the number of genes is large. For the covariance matrix of a set of random variables, two Gaussian random processes were considered to construct the dependencies among genes in a pathway: one based on the polynomial kernel and the other based on the Gaussian kernel. These two kernels were then compared with a constant covariance matrix of the random effect using a ratio based on the joint posterior distribution under each model. For mixture models, log-likelihood values were computed at different values of the mixture proportion and compared among mixtures of the selected kernels and a point-mass density (or constant covariance matrix). The approach was applied to a data set (Mootha et al., 2003) containing the expression profiles of type II diabetes, where the motivation was to identify pathways that can discriminate between normal patients and patients with type II diabetes. / Master of Science
Pathway based analysis Point-mass density Probit regression model Bayesian hierarchical model Latent variable Generalized linear mixed model
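The two kernel-based covariance structures can be sketched as follows, assuming each gene in a pathway is described by a feature vector and the gene-level random effects b follow a zero-mean multivariate normal whose covariance is a scaled kernel matrix. The hyperparameters (c, degree, length_scale, tau2), the jitter term, the identity matrix standing in for the "constant covariance", and the random data are all illustrative assumptions. The thesis compares models through a ratio based on the joint posterior; this sketch only evaluates the prior log-density of a given b under each covariance.

```python
import numpy as np
from scipy.stats import multivariate_normal


def polynomial_kernel(X, c=1.0, degree=2):
    """K_ij = (x_i . x_j + c)^degree, elementwise power (PSD for c >= 0)."""
    return (X @ X.T + c) ** degree


def gaussian_kernel(X, length_scale=1.0):
    """K_ij = exp(-||x_i - x_j||^2 / (2 * length_scale^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * length_scale ** 2))


def random_effect_logpdf(b, K, tau2=1.0, jitter=1e-6):
    """Log density of gene-level random effects b ~ N(0, tau2 * K)."""
    cov = tau2 * K + jitter * np.eye(K.shape[0])  # jitter keeps cov positive definite
    return multivariate_normal(mean=np.zeros(len(b)), cov=cov).logpdf(b)


# Hypothetical pathway: 4 genes, each summarized by 3 standardized features
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
for name, K in [("constant (identity)", np.eye(4)),
                ("polynomial", polynomial_kernel(X)),
                ("gaussian", gaussian_kernel(X))]:
    print(name, random_effect_logpdf(b, K))
```

Both kernels encode the idea that genes with similar feature vectors should have correlated random effects; the identity matrix treats genes in the pathway as independent.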
25	Modélisation spatio-temporelle pour l'esca de la vigne à l'échelle de la parcelle / Spatio-temporal modelling of esca grapevine disease at vineyard scale Li, Shuxian 16 December 2015 Esca is one of the incurable grapevine dieback diseases, and its etiology is not yet fully elucidated. It represents one of the major threats to viticulture around the world. The general objective of this thesis is to improve understanding of the epidemic processes and risk factors of the disease. To this end, we carried out quantitative analyses of the spatio-temporal development of esca at vineyard scale. To detect spatial correlation among diseased vines, non-parametric statistical tests were applied to spatio-temporal data on esca foliar symptom expression for 15 vineyards in the Bordeaux region. Across vineyards, a wide range of spatial patterns, from random to strongly structured, was found. In the vineyards with strongly aggregated patterns, the tests showed neither a significant increase in cluster size nor local secondary spread from symptomatic vines, suggesting that the environment explains this aggregation. To model foliar symptom occurrence, we developed hierarchical logistic regression models that integrate environment-related exogenous covariates, covariates describing neighboring vines that are already diseased, and a latent process with spatio-temporal autocorrelation. Bayesian inference for these models was performed with the INLA (Integrated Nested Laplace Approximation) approach.
The results confirmed the significant effect of environmental factors on the risk of esca symptom occurrence. Secondary local spread of esca from symptomatic vines located in the same row or in other rows was not shown. A two-step centered autologistic regression model, which integrates the spatio-temporal neighborhood structure more explicitly, was also developed. Finally, a geostatistical method was proposed to interpolate data with an atypical anisotropic structure. It allows the ancillary variable, soil electrical resistivity, to be interpolated in order to estimate the available soil water content for each vine in the vineyard. The geostatistical and spatio-temporal statistical methods developed in this thesis open prospects for identifying risk factors and predicting the development of esca grapevine disease in different agronomic contexts.
Grapevine Wood trunk disease Join count test Bayesian hierarchical model Auto-logistic model
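The non-parametric spatial tests, including the join count test listed in the keywords, can be illustrated with a minimal permutation version. The sketch assumes a rectangular grid of vines with rook neighbours (adjacent positions within a row and adjacent rows) and counts only symptomatic-symptomatic (BB) joins; the thesis may use other neighbourhood definitions, join types or analytical moments, and the plot below is hypothetical.

```python
import numpy as np


def bb_joins(grid):
    """Number of rook-adjacent pairs in which both vines are symptomatic (BB joins).

    grid: 2-D 0/1 array, rows = vine rows, columns = vine position within the row."""
    g = np.asarray(grid, dtype=int)
    within_row = np.sum(g[:, 1:] * g[:, :-1])
    across_rows = np.sum(g[1:, :] * g[:-1, :])
    return int(within_row + across_rows)


def join_count_test(grid, n_perm=999, seed=0):
    """One-sided permutation join count test for aggregation of diseased vines.

    Null hypothesis: the observed number of symptomatic vines is placed at random."""
    rng = np.random.default_rng(seed)
    g = np.asarray(grid, dtype=int)
    observed = bb_joins(g)
    flat = g.ravel()
    null_counts = np.array([bb_joins(rng.permutation(flat).reshape(g.shape))
                            for _ in range(n_perm)])
    p_value = (1 + np.sum(null_counts >= observed)) / (n_perm + 1)
    return observed, float(p_value)


# Hypothetical 10 x 10 plot where the first three vine rows are all symptomatic
plot = np.zeros((10, 10), dtype=int)
plot[:3, :] = 1
print(join_count_test(plot))
```

A small p-value indicates more adjacent diseased pairs than random placement would produce, i.e. aggregation; as the abstract notes, aggregation alone does not distinguish vine-to-vine spread from a shared environmental cause.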
26	Método de partição produto aplicado à Krigagem / Product partition method applied to kriging Almeida, Maria de Fátima Ferreira January 2019 Advisor: José Sílvio Govone / Abstract: Random variables in space are defined as random functions subject to the theory of regionalized variables. To assume spatial continuity with a limited number of realizations of the random variable, stationarity hypotheses are needed, which involve different degrees of spatial homogeneity. Formally, a regionalized variable Z is stationary if the statistical moments of Z(s + h) are the same for any vector h. The first-order stationarity hypothesis is the hypothesis that the first-order moment of the distribution of the random function Z(s) is constant throughout the area. The intrinsic hypothesis is based on computing global means of the semivariances, assuming first-order stationarity and stationarity of the variance of the increments. Although many variables are subject to double or multiple stationarity, these spatial structures are not taken into account by the usual semivariogram, which consequently causes accuracy problems in kriging maps. To address this problem, we sought to identify the locations of change points in the mean that define more than one semivariance structure, with the objective of improving the quality and accuracy of ordinary kriging maps. For this purpose, the Product Partition Method (MPP) was applied with a spatial focus, called the Spatial Product Partition Method (MPPs): to separate the groups, a function to search for change points in the mean was created using the Bayesian hierarchical model. Two databases were used to test the model's potential to separate spatially dependent groups; in the first, a change in the mean was suspected, while in the second, “Data2”, there... / Doctor
Change point Kriging of the parts Map accuracy Bayesian hierarchical model MPPs Ordinary kriging Accuracy
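Under the intrinsic hypothesis described above, the semivariogram is estimated as half the mean squared difference between pairs of observations whose separation falls in each lag class. The sketch below implements this classical (Matheron) estimator; the coordinates, values and lag bins are hypothetical, and it does not implement the thesis's product partition change-point search.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Matheron's estimator: gamma(h) = sum (Z(s_i) - Z(s_j))^2 / (2 N(h)),
    over the N(h) pairs whose separation distance falls in each lag bin."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(z), k=1)  # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (z[i] - z[j]) ** 2
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    n_pairs = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (dist > bin_edges[b]) & (dist <= bin_edges[b + 1])
        n_pairs[b] = in_bin.sum()
        if n_pairs[b] > 0:
            gamma[b] = sq_diff[in_bin].mean() / 2.0
    return gamma, n_pairs


# Four points on a transect whose mean increases linearly (non-constant mean)
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
gamma, n_pairs = empirical_semivariogram(coords, [0.0, 1.0, 2.0, 3.0], [0.0, 1.5, 2.5, 3.5])
print(gamma, n_pairs)
```

Because the mean in this toy transect is not constant, the estimated semivariance keeps growing with distance (0.5, 2.0, 4.5 for lags 1, 2, 3). This is the kind of mean non-stationarity that a single semivariogram fitted to the whole area cannot separate from genuine spatial dependence.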
27	Making Models with Bayes Olid, Pilar 01 December 2017 Bayesian statistics is an important approach to modern statistical analysis. It allows us to use prior knowledge of the unknown parameters to construct a model for a data set. The foundation of Bayesian analysis is Bayes' rule, which in its proportional form states that the posterior is proportional to the prior times the likelihood. We demonstrate how Bayesian statistical techniques can be applied to fit a linear regression model and a hierarchical linear regression model to a data set. We show how to apply different distributions in Bayesian analyses and how the choice of prior affects the model. We also compare the Bayesian approach with the traditional frequentist approach to data analysis.
Bayesian linear regression Bayesian hierarchical model Bayes' rule Applied Statistics Multivariate Analysis Other Applied Mathematics Other Mathematics Other Statistics and Probability Probability Statistical Models
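"Posterior proportional to prior times likelihood" has a closed form in the conjugate case, which also makes the effect of the prior easy to see. A minimal sketch, assuming a known noise variance sigma2 and a normal prior on the regression coefficients; the data are hypothetical, and the thesis may use other priors and also estimate the variance.

```python
import numpy as np


def bayes_linear_regression(X, y, sigma2, prior_mean, prior_cov):
    """Posterior of beta for y = X beta + e, e ~ N(0, sigma2 I), beta ~ N(m0, S0).

    With a normal prior and known sigma2, the posterior (proportional to
    prior times likelihood) is again normal, with precision S0^-1 + X'X / sigma2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m0 = np.asarray(prior_mean, dtype=float)
    S0_inv = np.linalg.inv(np.asarray(prior_cov, dtype=float))
    post_precision = S0_inv + X.T @ X / sigma2
    post_cov = np.linalg.inv(post_precision)
    post_mean = post_cov @ (S0_inv @ m0 + X.T @ y / sigma2)
    return post_mean, post_cov


X = np.column_stack([np.ones(4), np.arange(4.0)])  # intercept and one predictor
y = np.array([1.0, 3.0, 5.0, 7.0])                   # exactly y = 1 + 2x
vague, _ = bayes_linear_regression(X, y, 1.0, np.zeros(2), 1e6 * np.eye(2))
strong, _ = bayes_linear_regression(X, y, 1.0, np.zeros(2), 0.01 * np.eye(2))
print(vague, strong)
```

With a vague prior the posterior mean essentially reproduces the frequentist least-squares fit (intercept 1, slope 2), while a strong prior centred at zero pulls both coefficients towards zero, which is one concrete way the choice of prior affects the model.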
28	Bayesian Uncertainty Quantification for Large Scale Spatial Inverse Problems Mondal, Anirban August 2011 We considered a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a high-dimensional spatial field. The Bayesian approach contains a natural mechanism for regularization in the form of prior information, can incorporate information from heterogeneous sources, and provides a quantitative assessment of uncertainty in the inverse solution. The Bayesian setting casts the inverse solution as a posterior probability distribution over the model parameters. The Karhunen-Loève expansion and the discrete cosine transform were used for dimension reduction of the random spatial field. Furthermore, we used a hierarchical Bayes model to incorporate multiscale data into the modeling framework. In this Bayesian framework, we showed that the inverse problem is well-posed by proving that the posterior measure is Lipschitz continuous with respect to the data in the total variation norm. The need for multiple evaluations of the forward model on a high-dimensional spatial field (e.g., in the context of MCMC), together with the high dimensionality of the posterior, results in many computational challenges. We developed a two-stage reversible jump MCMC method that can screen out poor proposals in an inexpensive first stage. Channelized spatial fields were represented by facies boundaries and by variogram-based spatial fields within each facies. Using a level-set approach, the shape of the channel boundaries was updated with dynamic data using a Bayesian hierarchical model in which the number of points representing the channel boundaries is assumed to be unknown. Statistical emulators on a large-scale spatial field were introduced to avoid the expensive likelihood calculation, which involves the forward simulator, at each MCMC iteration.
To build the emulator, the original spatial field was represented by a low-dimensional parameterization using the discrete cosine transform (DCT), and the Bayesian approach to multivariate adaptive regression splines (BMARS) was then used to emulate the simulator. Various numerical results were presented from analyses of simulated as well as real data.
Bayesian Hierarchical Model Karhunen-Loève Expansion Discrete Cosine Transform Emulator
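The two-stage screening idea can be illustrated with a fixed-dimension delayed-acceptance Metropolis-Hastings sketch: a cheap approximation (such as an emulator) filters proposals, only the survivors pay for the expensive posterior evaluation, and a second acceptance ratio corrects for the approximation so the chain still targets the true posterior. This is a simplification under stated assumptions: the thesis's sampler is reversible-jump (the dimension changes), and the one-dimensional Gaussian targets below are placeholders rather than a forward model.

```python
import numpy as np


def two_stage_mh(log_post, log_approx, x0, n_iter, step, seed=0):
    """Two-stage (delayed-acceptance) random-walk Metropolis-Hastings.

    log_post: expensive log posterior (e.g. requires a forward simulation)
    log_approx: cheap approximation used to screen proposals in stage 1
    Returns the chain and the number of expensive evaluations performed."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp_x, la_x = log_post(x), log_approx(x)
    chain = np.empty((n_iter, x.size))
    n_expensive = 0
    for t in range(n_iter):
        y = x + step * rng.standard_normal(x.size)  # symmetric proposal
        la_y = log_approx(y)
        # Stage 1: accept/reject using only the cheap approximation
        if np.log(rng.uniform()) < la_y - la_x:
            lp_y = log_post(y)  # expensive step, only for proposals that survive
            n_expensive += 1
            # Stage 2: this ratio cancels the approximation, so the chain
            # keeps log_post as its stationary distribution
            if np.log(rng.uniform()) < (lp_y - lp_x) - (la_y - la_x):
                x, lp_x, la_x = y, lp_y, la_y
        chain[t] = x
    return chain, n_expensive


def log_post(x):
    # placeholder "true" posterior: N(1, 1)
    return -0.5 * np.sum((x - 1.0) ** 2)


def log_approx(x):
    # placeholder cheap approximation: N(1.2, 1.5^2), deliberately biased
    return -0.5 * np.sum((x - 1.2) ** 2) / 1.5 ** 2


chain, n_exp = two_stage_mh(log_post, log_approx, 0.0, 20000, 2.0, seed=1)
print(chain[5000:].mean(), n_exp)
```

Even though the approximation is centred in the wrong place, the chain's mean settles near 1, while the expensive function is evaluated for only the fraction of proposals that pass stage 1.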
29	Analysis of forklift data – A process for decimating data and analyzing fork positioning functions Sternelöv, Gustav January 2017 This thesis investigates the possibilities and effects of reducing CAN data collected from forklifts. The purpose of reducing the data was to make it possible to export and manage data for multiple forklifts over a relatively long period of time. To this end, an autoregressive filter was implemented for filtering and decimating the data. A related aim of the decimation was to generate a data set that could be used to analyze lift sequences, and in particular the use of fork adjustment functions during lift sequences. The report finds that an AR(18) model works well for filtering and decimating the data. Information losses are unavoidable but kept at a relatively low level, and the size of the data becomes manageable. Each row in the decimated data is labeled as belonging or not belonging to a lift sequence, given a manually specified definition of the lift sequence event. From the lift sequences, information about each lift is gathered, such as the number of uses of each fork adjustment function, the load weight and the fork height. The analysis of the lift sequences showed that the lift/lower function is used on average 4.75 times per lift sequence and the reach function 3.23 times. For the side shift the mean is 0.35 uses per lift sequence, and for the tilt the mean is 0.10. Moreover, struggling time was found to make up about 17% of the total lift sequence time on average. The proportion of the lift that is struggling time also differed between drivers, with the lowest mean proportion being 7% and the highest 30%.
Statistics CAN data autoregressive AR decimation signal processing bayesian hierarchical model Probability Theory and Statistics Other Computer and Information Science
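The abstract does not spell out how the AR filter is applied. One plausible reading, sketched here purely as an assumption: fit an AR(p) model by least squares, replace each sample by its fitted one-step prediction (a smoothed version of the signal built from past values), and then keep every factor-th value. Only the AR order 18 comes from the thesis; the helper names, decimation factor and signal are hypothetical.

```python
import numpy as np


def lag_matrix(x, p):
    """Rows t = p..n-1; column k holds x[t - k - 1] (lags 1..p)."""
    n = len(x)
    return np.column_stack([x[p - k - 1 : n - k - 1] for k in range(p)])


def fit_ar(x, p):
    """Least-squares AR(p) fit with intercept: returns [c, phi_1, ..., phi_p]."""
    x = np.asarray(x, dtype=float)
    A = np.column_stack([np.ones(len(x) - p), lag_matrix(x, p)])
    coef, *_ = np.linalg.lstsq(A, x[p:], rcond=None)
    return coef


def ar_filter_decimate(x, p, factor):
    """Replace the signal by its fitted one-step AR(p) predictions,
    then keep every `factor`-th value."""
    x = np.asarray(x, dtype=float)
    coef = fit_ar(x, p)
    fitted = coef[0] + lag_matrix(x, p) @ coef[1:]
    return fitted[::factor]


# Hypothetical signal: an AR(1) process standing in for one CAN channel
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
print(fit_ar(x, 1), ar_filter_decimate(x, p=18, factor=10).shape)
```

A common alternative design is a conventional low-pass anti-aliasing filter followed by downsampling (for example `scipy.signal.decimate`); which approach loses less information for lift-sequence analysis is an empirical question the thesis addresses for its own data.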
30	Hnutí ANO před parlamentními volbami 2017 / Political Party ANO before parliamentary elections 2017 Měska, Ondřej January 2018 The main objective of my diploma thesis is to analyze and evaluate the positioning of the Political Movement ANO within the party system of the Czech Republic using a methodological framework approach. The thesis provides an analysis of voter shifts and of the manifestos of selected political parties, as well as their comparison with the Political Movement ANO. The focus is on the period prior to the 2017 election to the Chamber of Deputies of the Parliament of the Czech Republic. For the analysis, hierarchical Bayesian modeling was used. This statistical model helps to estimate the relevant values and to show the changes in electoral votes between the last two elections to the Chamber of Deputies. The author uses quantitative and qualitative research to compare and analyze programmatic convergence as defined in the election manifestos of various political parties. To quantify the manifestos, the coding scheme of the Comparative Manifesto Project was applied, because it provides a structured methodology for quantifying the domains on which political parties focus most in their manifestos. The aim of the analytical part of the thesis is to define how and especially from where the Movement ANO...
