Statistical identification of metabolic reactions catalyzed by gene products of unknown function
Zheng, Lianqing January 1900
Doctor of Philosophy / Department of Statistics / Gary L. Gadbury / High-throughput metabolite analysis is an approach used by biologists seeking to identify the functions of genes. A mutation in a gene encoding an enzyme is expected to alter the levels of the metabolites that serve as the enzyme’s reactant(s) (also known as substrates) and product(s). To find the function of a mutated gene, metabolite data from a wild-type organism and a mutant are compared, and candidate reactants and products are identified. The screening principle is that the concentration of reactants will be higher and the concentration of products will be lower in the mutant than in the wild type, because the mutation reduces the conversion of reactant to product in the mutant organism.
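The screening principle can be illustrated with a minimal sketch. It assumes each lipid is summarized by a single mean concentration per group; the function name `screen_pairs`, the lipid labels, and the numbers are hypothetical and are not taken from the dissertation, which works with test statistics rather than raw comparisons of means.

```python
from itertools import permutations


def screen_pairs(wild_type, mutant):
    """Return ordered (reactant, product) candidates.

    A pair qualifies when the candidate reactant has a higher mean in the
    mutant than in the wild type and the candidate product has a lower mean.
    Both arguments map a lipid name to its mean concentration.
    """
    candidates = []
    for reactant, product in permutations(wild_type, 2):
        reactant_up = mutant[reactant] > wild_type[reactant]
        product_down = mutant[product] < wild_type[product]
        if reactant_up and product_down:
            candidates.append((reactant, product))
    return candidates


# Hypothetical mean concentrations, for illustration only.
wild_type = {"lipid_A": 10.0, "lipid_B": 8.0, "lipid_C": 5.0}
mutant = {"lipid_A": 14.0, "lipid_B": 3.0, "lipid_C": 5.2}

print(screen_pairs(wild_type, mutant))
```

A raw comparison like this ignores sampling variability, so small differences such as the one for `lipid_C` also pass the screen; that is the gap a formal test statistic is meant to close.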
Based upon this principle, we suggest a method to screen the possible lipid reactant and product pairs related to a mutation affecting an unknown reaction. Some numerical facts are given for the treatment means for the lipid pairs in each treatment group, and relations between the means are found for the paired lipids. A set of statistics from the relations between the means of the lipid pairs is derived. Reactant and product lipid pairs associated with specific mutations are used to assess the results.
We have explored four methods using the test statistics to obtain a list of potential reactant-product pairs affected by the mutation. The first method uses the parametric bootstrap to obtain an empirical null distribution of the test statistic and a technique to identify a family of distributions and corresponding parameter estimates for modeling the null distribution. The second method uses a mixture of normal distributions to model the empirical bootstrap null. The third method uses a normal mixture model with multiple components to model the entire distribution of test statistics from all pairs of lipids. The argument is made that, in some cases, one model component represents the lipid pairs affected by the mutation while the other components model the null distribution. The fourth method uses a two-way ANOVA model with an interaction term to find the relations between the mean concentrations and the role of a lipid as a reactant or product in a specific lipid pair. The goal of all methods is to identify a list of findings by false discovery techniques. Finally, a simulation technique is proposed to evaluate properties of statistical methods for identifying candidate reactant-product pairs.
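As a rough illustration of the first method's general shape, the sketch below builds a parametric bootstrap null for one lipid pair and then applies a false discovery procedure across p-values. Everything specific here is an assumption: the statistic in `pair_statistic`, the normal model fitted to pooled replicates, the one-sided p-value, and the use of the Benjamini-Hochberg step-up rule are illustrative choices, not the statistics or techniques derived in the dissertation.

```python
import random
import statistics


def pair_statistic(wt_r, mut_r, wt_p, mut_p):
    # Hypothetical statistic: reactant change minus product change.
    # Large positive values favor a reactant-product relationship.
    reactant_change = statistics.mean(mut_r) - statistics.mean(wt_r)
    product_change = statistics.mean(mut_p) - statistics.mean(wt_p)
    return reactant_change - product_change


def bootstrap_null(wt_r, mut_r, wt_p, mut_p, n_boot=2000, seed=0):
    """Simulate the statistic under 'no mutation effect'.

    Assumption: under the null, both groups of a lipid share one normal
    distribution, fitted here to the pooled replicates.
    """
    rng = random.Random(seed)

    def fit(x, y):
        pooled = list(x) + list(y)
        return statistics.mean(pooled), statistics.stdev(pooled)

    def draw(mu, sd, n):
        return [rng.gauss(mu, sd) for _ in range(n)]

    mu_r, sd_r = fit(wt_r, mut_r)
    mu_p, sd_p = fit(wt_p, mut_p)
    null = []
    for _ in range(n_boot):
        null.append(pair_statistic(
            draw(mu_r, sd_r, len(wt_r)), draw(mu_r, sd_r, len(mut_r)),
            draw(mu_p, sd_p, len(wt_p)), draw(mu_p, sd_p, len(mut_p)),
        ))
    return null


def bootstrap_p_value(observed, null):
    # One-sided p-value with the usual +1 correction to avoid zero.
    exceed = sum(1 for t in null if t >= observed)
    return (exceed + 1) / (len(null) + 1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical replicate concentrations for one lipid pair.
wt_r, mut_r = [10, 11, 9, 10], [15, 16, 14, 15]
wt_p, mut_p = [8, 9, 7, 8], [3, 4, 2, 3]

observed = pair_statistic(wt_r, mut_r, wt_p, mut_p)
null = bootstrap_null(wt_r, mut_r, wt_p, mut_p)
p_value = bootstrap_p_value(observed, null)
print(observed, p_value)
```

In a full screen, one p-value per lipid pair would be collected and passed to `benjamini_hochberg` to obtain the list of findings. Pooling the two groups inflates the fitted standard deviation when a real effect is present, which makes this particular null conservative.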