Global ETD Search

391	Statistical Methods to Account for Gene-Level Covariates in Normalization of High-Dimensional Read-Count Data Lenz, Lauren Holt 01 December 2018 The goal of genetic-based cancer research is often to identify which genes behave differently in cancerous and healthy tissue. This difference in behavior, referred to as differential expression, may lead researchers to more targeted preventative care and treatment. One way to measure the expression of genes is through a process called RNA-Seq, which takes physical tissue samples and maps gene products and fragments in the sample back to the genes that created them, resulting in a large read-count matrix with a row for each gene and a column for each sample. The read-counts for tumor and normal samples are then compared in a process called differential expression analysis. However, normalization of these read-counts is a necessary pre-processing step, in order to account for differences in the read-count values due to non-expression-related variables. It is common in recent RNA-Seq normalization methods to also account for gene-level covariates, namely gene length in base pairs and GC-content, the proportion of bases in the gene that are guanine or cytosine. Here a colorectal cancer RNA-Seq read-count data set comprising 30,220 genes and 378 samples is examined. Two of the normalization methods that account for gene length and GC-content, CQN and EDASeq, are extended to account for protein coding status as a third gene-level covariate. The binary nature of protein coding status results in unique computational issues. The normalized read-counts from CQN, EDASeq, and four new normalization methods are used for differential expression analysis via the nonparametric Wilcoxon Rank-Sum Test as well as the lme4 pipeline that produces per-gene models based on a negative binomial distribution. 
The resulting differential expression results are compared for two genes of interest in colorectal cancer, APC and CTNNB1, both of the WNT signaling pathway. RNA-Seq normalization gene-level covariates high-dimensional read-counts Statistics and Probability
392	Psychometric Properties of Postsecondary Students' Course Evaluations Drysdale, Michael J. 01 December 2010 Several experts in the area of postsecondary student evaluations of courses have concluded that they are stable or reliable measures as well as being measures that provide ways of making valid inferences regarding teacher effectiveness. Often these experts have offered these conclusions without supporting evidence. Surprisingly, a thorough review of the literature revealed very few reported test-retest reliability studies of course evaluations, and the results from these studies are contradictory. In the area of validity, the conclusions offered by scholars who conducted meta-analyses of multisection course studies are inconsistent. This leads to the following two research questions: 1. What is the test-retest reliability over a 3-week period of the course evaluation currently employed at Utah State University? 2. Can results of the course evaluation employed at Utah State University be used to make valid inferences about a teacher's effectiveness? Two parts of a study were conducted to answer these questions. First, a test-retest reliability part was conducted with students from courses at Utah State University, employing a 3-week time lapse between administrations of the course evaluations. Second, a multisection course validity part was conducted using existing student ratings data and final examination scores for 100 sections of MATH 1010 over a 5-year period. Correlational analyses were conducted on the resulting data from both parts. Test-retest reliability coefficients ranging from 0.64 to 0.94 were found. In the validity part, the correlation coefficients ranged from -0.39 to 0.71, with mean coefficients of 0.14 and 0.11 for final examination score by instructor rating and final examination score by course rating, respectively. 
Results from both parts of the study suggest that the course evaluation used at USU is not reliable and that results of the course evaluation do not provide information that can be used to make valid inferences regarding teacher effectiveness. course evaluation reliability student ratings validity Educational Psychology Statistics and Probability
393	Annotation Tools for Multivariate Gene Set Testing of Non-Model Organisms Banks, Russell K. 01 May 2015 Many researchers across a wide range of disciplines have turned to gene expression analysis to aid in predicting and understanding biological outcomes and mechanisms. Because genes are known to work in a dependent manner, it’s common for researchers to first group genes in biologically meaningful sets and then test each gene set for differential expression. Comparisons are made across different treatment/condition groups. The meta-analytic method for testing differential activity of gene sets, termed multivariate gene set testing (mvGST), will be used to provide context for two persistent and problematic issues in gene set testing. These are: 1) gathering organism-specific annotation for non-model organisms and 2) handling gene annotation ambiguities. The primary purpose of this thesis is to explore different gene annotation gathering methods in the building of gene set lists and to address the problem of gene annotation ambiguity. Using an example study, three different annotation gathering methods are proposed to construct GO gene set lists. These lists are directly compared, as are the subsequent results from mvGST analysis. In a separate study, an optimization algorithm is proposed as a solution for handling gene annotation ambiguities. gene expression gene naming translation GO meta-analytic non-model organisms Genetics and Genomics Statistics and Probability
394	A Comparative Analysis of the Use of a Markov Chain Versus a Binomial Probability Model in Estimating the Probability of Consecutive Rainless Days Homeyer, Jack Wilfred 01 May 1974 The Markov chain process for predicting the occurrence of a sequence of rainless days, a standard technique, is critically examined in light of the basic underlying assumptions that must be made each time it is used. This is then compared to a simple binomial model wherein an event is defined to be a series of rainless days of desired length. Computer programs to perform the required calculations are then presented and compared as to complexity and operating characteristics. Finally, an example of applying both programs to real data is presented and further comparisons are drawn between the two techniques. comparative analysis markov chain binomial probability model probability estimation consecutive rainless days Statistics and Probability
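The contrast drawn in entry 394 can be illustrated with a minimal sketch. It is not the thesis's programs: the function names, the 0/1 coding of days, and the specific estimators (a two-state first-order chain, and a proportion of all-dry windows treated as binomial successes) are assumptions made here for illustration only.

```python
def markov_run_probability(days, k):
    """Estimate P(k consecutive dry days) with a two-state first-order Markov chain.

    days: sequence of 0 (dry) / 1 (wet) daily observations (hypothetical coding).
    """
    p_dry = sum(1 for d in days if d == 0) / len(days)
    # Dry-to-dry transitions among all transitions that start on a dry day.
    dry_starts = sum(1 for a in days[:-1] if a == 0)
    dry_to_dry = sum(1 for a, b in zip(days, days[1:]) if a == 0 and b == 0)
    p_dd = dry_to_dry / dry_starts if dry_starts else 0.0
    # First day dry, then k - 1 dry-to-dry transitions.
    return p_dry * p_dd ** (k - 1)


def window_run_probability(days, k):
    """Estimate the same probability as a binomial proportion.

    Each k-day window is one trial; a success is a window with no rain.
    """
    windows = [days[i:i + k] for i in range(len(days) - k + 1)]
    hits = sum(1 for w in windows if all(d == 0 for d in w))
    return hits / len(windows)


record = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]  # made-up ten-day record
print(markov_run_probability(record, 2))
print(window_run_probability(record, 2))
```

The Markov estimate depends on the assumption that tomorrow's weather depends only on today's, while the window estimate makes no such assumption but needs a separate count for every run length of interest.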
395	Physically Based Preconditioning Techniques Applied to the First Order Particle Transport and to Fluid Transport in Porous Media Rigley, Michael 01 May 2014 Physically based preconditioning is applied to linear systems resulting from solving the first order formulation of the particle transport equation and from solving the homogenized form of the simple flow equation for porous media flows. The first order formulation of the particle transport equation is solved two ways. The first uses a least squares finite element method resulting in a symmetric positive definite linear system which is solved by a preconditioned conjugate gradient method. The second uses a discontinuous finite element method resulting in a non-symmetric linear system which is solved by a preconditioned biconjugate gradient stabilized method. The flow equation is solved using a mixed finite element method. Specifically, four levels of improvement are applied: homogenization of the porous media domain, a projection method for the mixed finite element method which simplifies the linear system, physically based preconditioning, and implementation of the linear solver in parallel on graphics processing units. The conjugate gradient linear solver for the least squares finite element method is also applied in parallel on graphics processing units. The physically based preconditioner is shown to perform well in each case, in relation to speed-ups gained and as compared with several algebraic preconditioners. Preconditioning Techniques Applied Particle Transport Fluid Porous Media Applied Statistics Physical Sciences and Mathematics Statistics and Probability
396	Rational Arithmetic as a Means of Matrix Inversion Peterson, Jay Roland 01 May 1967 The solution to a set of simultaneous equations is of the form $A^{-1}B = X$, where $A^{-1}$ is the inverse of $A$ in the equation $AX = B$. The purpose of this study is to obtain an exact $A^{-1}$ through the use of rational arithmetic, and to study the behavior of rational numbers when used in arithmetic calculations. This study describes a matrix inversion program written in SPS II, utilizing the concept of rational arithmetic. This program, using the Gaussian elimination matrix inversion method, is compared to the same method written in Fortran. Gaussian elimination was used by this study because of its simplicity and speed of inversion. The Adjoint method was ruled out because of its complexity and relative lack of speed when compared with Gaussian elimination. The Fortran program gives only an approximate inverse due to rounding error, while the rational arithmetic program gives an exact inverse. matrix inversion rational arithmetic fortran matrix inversion Computer Sciences Statistics and Probability
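The idea in entry 396 of inverting a matrix exactly with rational arithmetic can be sketched as follows. This is only an illustration in Python, using the standard-library `fractions.Fraction` type in place of the SPS II program the abstract describes; the function name `invert_exact` and the Gauss-Jordan variant of elimination are choices made here, not details taken from the thesis.

```python
from fractions import Fraction


def invert_exact(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination over rationals."""
    n = len(matrix)
    # Build the augmented matrix [A | I] with every entry as an exact Fraction.
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Any nonzero pivot will do: exact arithmetic needs no size tolerance.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


inverse = invert_exact([[2, 1], [1, 3]])
print(inverse)
```

Because every intermediate value is a ratio of integers, no rounding occurs; the cost is that numerators and denominators can grow large, which is presumably part of the "behavior of rational numbers" the abstract mentions.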
397	Generation of Random Numbers Eberhard, Keith H. 01 May 1969 Subroutines are written to generate random numbers on the computer. Depending on the subroutine used, the generated random numbers follow the uniform, binomial, normal, chi-square, t, F, or gamma distribution. Each subroutine is tested using the chi-square goodness of fit test to verify that the random numbers generated by each subroutine follow the statistical distribution for which it is written. The interpretation of the test results indicates that each subroutine generates random numbers which closely approximate the theoretical distribution for which it is designed. The approach used in the subroutine which generates gamma distributed random numbers involves the use of numerical integration, whereas simpler techniques are used in all the other subroutines. Each subroutine is documented with a description of how to use it and an explanation of the methods used to obtain the random numbers which it is designed to generate. (77 pages) chi-square density gamma density function random numbers generation Mathematics Statistics and Probability
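The testing approach described in entry 397 can be sketched for the uniform case. This is an illustrative Python example, not one of the thesis's subroutines: the function name, the choice of ten equal-width bins, and the seed are all assumptions made here.

```python
import random


def chi_square_uniform(samples, bins=10):
    """Chi-square goodness-of-fit statistic for samples in [0, 1) vs. Uniform(0, 1)."""
    counts = [0] * bins
    for u in samples:
        # Clamp the index so a value at the upper edge falls in the last bin.
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(samples) / bins
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts


rng = random.Random(1969)  # arbitrary seed, used only for reproducibility
samples = [rng.random() for _ in range(10_000)]
stat, counts = chi_square_uniform(samples)
# With 10 bins there are 9 degrees of freedom; the 5% critical value is about
# 16.92, so a statistic above that would cast doubt on uniformity.
print(stat, counts)
```

The same statistic applies to the other distributions listed in the abstract once the expected count in each bin is computed from the corresponding theoretical distribution.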
398	Family-Wise Error Rate Control in Quantitative Trait Loci (QTL) Mapping and Gene Ontology Graphs with Remarks on Family Selection Saunders, Garrett 01 May 2014 One of the great aims of statistics, the science of collecting, analyzing, and interpreting data, is to protect against the probability of falsely rejecting an accepted claim, or hypothesis, given observed data stemming from some experiment. This is generally known as protecting against a Type I Error, or controlling the Type I Error rate. The extension of this protection against Type I Errors to the situation where thousands upon thousands of hypotheses are examined simultaneously is known as multiple hypothesis testing. This dissertation presents an improvement to an existing multiple hypothesis testing approach, the Focus Level method, specific to gene set testing (a branch of genomics) on Gene Ontology graphs. This improvement resolves a long-standing computational difficulty of the Focus Level method, providing more than a 15,000-fold increase in computational efficiency. This dissertation also presents a solution to a multiple testing problem in genetics where a specific approach to mapping genes underlying quantitative traits of interest requires a multiplicity adjustment approach that both corrects for the number of tests and ensures logical consistency. The power advantage of the solution is demonstrated over the current standard approach to the problem. A side issue of this model framework led to the development of a new bivariate approach to quantitative trait marker detection, which is presented herein. The overall contribution of this dissertation to the statistics literature is that it provides novel solutions that meet real needs of practitioners in genetics and genomics with the aim of ensuring both that truth is discovered and that discoveries are actually true. 
family-wise error rate quantitative trait loci gene ontology family selection Mathematics Statistics and Probability
399	Empirical Properties of Functional Regression Models and Application to High-Frequency Financial Data Zhang, Xi 01 May 2013 Functional data analysis (FDA) has grown into a substantial field of statistical research, with new methodology, numerous useful applications and interesting novel theoretical developments. My dissertation focuses on the empirical properties of functional regression models and their application to financial data. We start by testing the empirical properties of forecasts with the functional autoregressive models based on simulated and real data. We define intraday returns and consider their prediction from such returns on a market index. This is an extension to intraday data of the Capital Asset Pricing Model. Finally, we investigate multifactor functional models and assess their suitability for the prediction of intraday returns for various financial assets, including stock and commodity futures. Empirical Study Financial Data Functional Data Analysis Functional Regression Models Finance and Financial Management Statistics and Probability
400	SHAMAT: A Matrix Manipulation Program Dadkhah, Shahriyar 01 May 1987 This report is both a user's guide and a programmer's manual for running and modifying the program SHAMAT, an interactive matrix calculator. The program is written in Turbo Pascal version 3.0 for MS-DOS computers. This software enables the user to type in matrix equations for solving statistical problems such as multiple regression, analysis of variance, etc. All matrix operations necessary for linear models analysis are included in this program. Since each operation uses a separate subroutine, program enhancement, modification and updating are demonstrated to be easy. matrix manipulation interactive matrix calculator matrix equations statistics Computer Sciences Statistics and Probability
