About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Sparse functional regression models: minimax rates and contamination

Xiong, Wei January 2012
In functional linear regression and functional generalized linear regression models, the effect of the predictor function is usually assumed to be spread across the index space. In this dissertation we consider the sparse functional linear model and the sparse functional generalized linear models (GLM), where the impact of the predictor process on the response is only via its value at one point in the index space, defined as the sensitive point. We are particularly interested in estimating the sensitive point. The minimax rate of convergence for estimating the parameters in sparse functional linear regression is derived. It is shown that the optimal rate for estimating the sensitive point depends on the roughness of the predictor function, which is quantified by a "generalized Hurst exponent". The least squares estimator (LSE) is shown to attain the optimal rate. Also, a lower bound is given on the minimax risk of estimating the parameters in sparse functional GLM, which also depends on the generalized Hurst exponent of the predictor process. The order of the minimax lower bound is the same as that of the weak convergence rate of the maximum likelihood estimator (MLE), given that the functional predictor behaves like a Brownian motion.
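For orientation, the model described above can be written schematically as follows (a minimal sketch; the symbol τ for the sensitive point, the slope β, and the error term are illustrative notation, not taken from the dissertation):

```latex
% Sparse functional linear model: the predictor curve X_i(t) affects the
% response only through its value at an unknown sensitive point \tau.
Y_i = \alpha + \beta\, X_i(\tau) + \varepsilon_i, \qquad i = 1, \dots, n.
% Sparse functional GLM analogue, with link function g:
g\bigl(\mathbb{E}[Y_i \mid X_i]\bigr) = \alpha + \beta\, X_i(\tau).
```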
12

Sequential Quantile Estimation Using Continuous Outcomes with Applications in Dose Finding

Hu, Chih-Chi January 2014
We consider dose finding studies where a binary outcome is obtained by dichotomizing a continuous measurement. While the majority of existing dose finding designs work with dichotomized data, two procedures that operate on continuous measurements have been proposed. One is based on stochastic approximation and the other on least square recursion. In both cases, estimating the variance of the continuous measurement is an integral part of the design. In their originally proposed forms, variance estimation is based on data from the most current cohort only. This raises the question of whether performance of the two designs can be improved by incorporating better variance estimators. To this end, we propose estimators that pool data across cohorts. Asymptotic properties of both designs with the proposed estimators are derived. Operating characteristics are also investigated via simulations in the context of a real Phase I trial. Results show that performance of the least square recursion-based procedure can be substantially improved by pooling data in variance estimation, while performance of the stochastic approximation-based procedure is only marginally improved. The second problem considered in this dissertation deals with a limitation shared by both designs: complete follow-up of all current patients is required before new patients can be enrolled, which may result in an impractically long trial duration. We consider situations where, besides the final measurement on which the outcome of the study is defined, each patient has an additional intermediate continuous measurement. By extending least square recursion to incorporate intermediate measurements, continual patient accrual is allowed. Simulation results show that under a reasonable patient accrual rate, the proposed procedure is comparable to the original in terms of accuracy while shortening the trial duration considerably.
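The pooling idea can be illustrated with a small sketch (the function names and the simple within-cohort pooled estimator below are illustrative assumptions, not the dissertation's exact estimator):

```python
import numpy as np

def current_cohort_variance(cohorts):
    """Variance estimate based only on the most recent cohort
    (the originally proposed form described in the abstract)."""
    return np.var(cohorts[-1], ddof=1)

def pooled_variance(cohorts):
    """Pool within-cohort sums of squares across all cohorts so far.

    Centering within each cohort removes the dose-specific mean, so cohorts
    treated at different doses can still contribute to a common estimate of
    the measurement variance."""
    ss = sum(np.sum((np.asarray(c) - np.mean(c)) ** 2) for c in cohorts)
    df = sum(len(c) - 1 for c in cohorts)
    return ss / df

# Toy illustration: three cohorts of continuous measurements at different doses.
cohorts = [[1.2, 0.8, 1.5], [2.1, 1.9, 2.4], [3.0, 2.6, 3.3]]
print(current_cohort_variance(cohorts), pooled_variance(cohorts))
```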
13

Two-stage Continual Reassessment Method and Patient Heterogeneity for Dose-finding Studies

Jia, Xiaoyu January 2014
The continual reassessment method (CRM) is a widely used model-based design in Phase I dose-finding studies. This dissertation examines two extensions of the CRM: one is a two-stage method and the other is a method that accounts for patient heterogeneity. Originally proposed in the Bayesian framework, the CRM starts by testing the first patient at the prior guess of the maximum tolerated dose (MTD). However, there are safety concerns with this approach, as practitioners often prefer to start from the lowest dose level and are reluctant to escalate to higher dose levels without testing the lower ones on a sufficient number of patients. This calls for a two-stage design, in which the model-based phase is preceded by a pre-specified dose escalation phase and the design transitions between phases when any dose-limiting toxicity (DLT) occurs. In the first part of this dissertation, I propose a theoretical framework for building a two-stage CRM based on the coherence principle and prove the unique existence of the most conservative and coherent initial design. An accompanying calibration algorithm is formulated to facilitate design implementation. Using real trial examples, we demonstrate that the algorithm yields designs with performance competitive with the conventional design, which relies on a much more labor-intensive trial-and-error approach. Furthermore, we show that this algorithm can be applied in a timely and reproducible manner. In addition to the two-stage method, we also take into account patient heterogeneity in drug metabolism rate, which can result in different susceptibility to drug toxicity. This leads to a risk-adjusting design for identifying patient-specific MTDs. Existing dose-finding designs that incorporate patient heterogeneity deal either with only a categorical risk factor or with a continuous risk factor using models based on strong parametric assumptions. We propose a method that uses a flexible semi-parametric model to identify patient-specific MTDs, adjusting for either a categorical or a continuous risk factor. Initially, our method assigns doses to patients using the aforementioned two-stage CRM, ignoring any patient heterogeneity, and tests for a risk effect as the trial proceeds. It transitions to a risk-adjusting stage only if a sufficient risk effect on the toxicity outcome is observed. The performance of this multi-stage design is evaluated under various scenarios, using dosing accuracy measures calculated from the final model estimate at the end of a trial and from the intra-trial dose allocation. The results are compared to the conventional two-stage CRM that does not consider patient heterogeneity. Simulation results demonstrate a substantial improvement in dosing accuracy in scenarios where there are true risk effects on toxicity probability; in situations where the risk factor has no effect, the performance of the proposed method remains comparable to that of the conventional design.
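The model-based stage of the CRM can be sketched with a common one-parameter power ("empiric") working model (the skeleton, normal prior, target DLT rate, and grid-based posterior below are illustrative choices, not the calibration produced by the dissertation's algorithm):

```python
import numpy as np

def crm_next_dose(doses_given, tox, skeleton, target=0.25,
                  prior_sd=1.34, grid=np.linspace(-4, 4, 801)):
    """Select the next dose with a one-parameter power-model CRM.

    Working model: P(DLT at dose d) = skeleton[d] ** exp(beta), with a
    normal(0, prior_sd) prior on beta; the posterior is computed on a grid.
    doses_given, tox : dose indices and 0/1 DLT outcomes observed so far.
    """
    skeleton = np.asarray(skeleton)
    p = skeleton[np.asarray(doses_given)][:, None] ** np.exp(grid)[None, :]
    loglik = np.sum(np.where(np.asarray(tox)[:, None] == 1,
                             np.log(p), np.log1p(-p)), axis=0)
    log_post = loglik - 0.5 * (grid / prior_sd) ** 2
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    # Posterior mean toxicity probability at each dose level.
    post_tox = (skeleton[:, None] ** np.exp(grid)[None, :] * w).sum(axis=1)
    return int(np.argmin(np.abs(post_tox - target)))

# Example: three patients treated so far, one DLT at dose level 1 (0-indexed).
next_dose = crm_next_dose([0, 0, 1], [0, 0, 1],
                          skeleton=[0.05, 0.12, 0.25, 0.40], target=0.25)
print(next_dose)
```

In a two-stage design of the kind described above, a rule-based escalation sequence would assign doses until the first DLT is observed, after which dose assignment would switch to a model-based update like this one.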
14

Graph structure inference for high-throughput genomic data

Zhou, Hui January 2014
Recent advances in high-throughput sequencing technologies enable us to study a large number of biomarkers and use their information collectively. Based on high-throughput experiments, many genome-wide networks have been constructed to characterize the complex physical or functional interactions between biomarkers. To identify outcome-related biomarkers, it is often advantageous to make use of the known relational structure, because graph-structured inference introduces smoothness and reduces complexity in modelling. In this dissertation, we propose models for high-dimensional epigenetic and genomic data that incorporate the network structure and update it based on empirical evidence. In the first part of this dissertation, we propose a penalized conditional logistic regression model for high-dimensional DNA methylation data. DNA methylation at CpG sites within a gene is often correlated, and the number of CpG sites typically far exceeds the sample size. The new penalty function combines the truncated lasso penalty and a graph fused-lasso penalty to induce parsimonious and consistent models and to incorporate the CpG site network structure without introducing extra bias. An efficient minorization-maximization algorithm that utilizes difference-of-convex programming and the alternating direction method of multipliers is presented. Extensive simulations demonstrate the superior performance of the proposed method compared to several existing methods in both model selection consistency and parameter estimation accuracy. We also applied the proposed method to a matched case-control breast invasive carcinoma methylation data set from The Cancer Genome Atlas (TCGA), generated from both the Illumina Infinium HumanMethylation27 (HM27) and HumanMethylation450 (HM450) BeadChips. The proposed method identified several outcome-related CpG sites that were missed by existing methods. In the latter part of this dissertation, we propose a Bayesian hierarchical graph-structured model that integrates a priori network information with empirical evidence. Empirical data may suggest modifications to the given network structure, which can lead to new and interesting biological findings when prior knowledge of the graphical structure among the variables is limited or partial. We present the full hierarchical model along with a Markov chain Monte Carlo sampling inference procedure. Using both simulations and brain-aging gene pathway data, we show that the new method can identify discrepancies between the data and a prior known graph structure and suggest modifications and updates. Motivated by methylation and gene expression data, the two models proposed in this thesis make use of the available structure in the data and produce better inferential results. The proposed methods can be applied to a wider range of problems.
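To make the combined penalty concrete, one plausible form of the objective is sketched below (reconstructed from the abstract's description; the exact truncation, weighting, and tuning-parameter conventions used in the dissertation may differ):

```latex
% Penalized conditional logistic regression with a truncated lasso penalty
% and a graph fused-lasso penalty over the CpG-site network G = (V, E):
\hat{\beta} = \arg\min_{\beta}\; -\ell_c(\beta)
  + \lambda_1 \sum_{j \in V} \min\!\bigl(|\beta_j|, \tau\bigr)
  + \lambda_2 \sum_{(j,k) \in E} |\beta_j - \beta_k|,
% where \ell_c is the conditional (matched case-control) log-likelihood,
% \tau is the truncation level, and \lambda_1, \lambda_2 are tuning parameters.
```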
15

Empirical likelihood tests for stochastic ordering based on censored and biased data

Chang, Hsin-wen January 2014
In the classical two-sample comparison problem, it is often of interest to examine whether the distribution function is uniformly higher in one group than in the other. This can be framed in terms of the notion of stochastic ordering. We consider testing for stochastic ordering based on two types of data: (1) right-censored data and (2) size-biased data. We derive our procedures using the empirical likelihood method, and the proposed tests are based on maximally selected local empirical likelihood statistics. For (1), a simulation study shows that the proposed test has superior power to the commonly used log-rank test under crossing-hazard alternatives. The approach is illustrated using data from a randomized clinical trial involving the treatment of severe alcoholic hepatitis. For (2), simulations show that the proposed test outperforms the Wald test and a test that overlooks the size bias in all the cases considered. The approach is illustrated with a real data example of alcohol concentration in fatal driving accidents.
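For reference, one standard formulation of the two-sample stochastic-ordering testing problem is shown below (the direction of the inequality and the exact hypotheses may be stated differently in the dissertation):

```latex
% Two-sample stochastic ordering: with distribution functions F_1 and F_2,
H_0:\; F_1 = F_2
\qquad \text{versus} \qquad
H_1:\; F_1(t) \ge F_2(t) \ \text{for all } t,
      \ \text{with strict inequality for some } t.
```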
16

AN EASY ACCESS INTERACTIVE STATISTICAL SYSTEM FOR USE AND TRAINING IN BIOMETRY

Slaymaker, Amos Addison, Jr. 01 January 1971
One of the most important tools of the applied statistician is the digital computer. It is natural, therefore, for the instructor in applied statistics to want his students to become familiar with the use of computers. If his students are going to get actual experience in using a computer for statistical analysis, he often has only two alternatives. The students can be required to write their own statistical programs, or they can use programs already available through a computer facility. If the course is taught such that each student is responsible for his own programs, the instructor must either require that the students have previous programming experience or be prepared to spend a portion of his class time teaching a programming language. Neither of these seems satisfactory. First, making knowledge of programming a prerequisite will often reduce the number of people interested in the course. Many students who would otherwise enroll might be completely unfamiliar with programming and have no real interest in becoming programmers. Spending a portion of the class time teaching a programming language and associated programming techniques would often mean that the emphasis of the class could easily shift from statistical methods to computer programming. This would result in a significant reduction in the amount of material the class could cover. The alternative to having each student write his own programs is to use prepared programs available through a computer facility. In most instances, this would mean that each time a student wished to use the computer for a statistical analysis he would have to prepare the data for card input, send the cards to the computer facility, wait, and finally have his results returned. Again, either the instructor would have to assign a particular program and lead the class through the data preparation, or he would expect each student to be responsible for reading the program documentation and preparing the data himself. In many statistical analyses the investigator might wish to run several different programs. For each of these the student might have to review the relevant documentation, punch a new set of data cards, and wait. Unfortunately, rather than repeat this procedure several times, a student may become satisfied with running only the primary analysis without spending time, for instance, verifying the underlying assumptions. An example of the type of situation which might require several computer runs would be data on which an Analysis of Variance is to be performed. Consider the problem of a student who has data from patients being treated with several different drugs. He wishes to test the null hypothesis of no significant differences between the treatment means. He might first wish to run Bartlett's test for homogeneity of variances. If transformations of the data are necessary, he will wish to try them. If he is satisfied that the variances are not significantly different, he will compute the Analysis of Variance, possibly following that with Duncan's multiple range test. Since each method is probably handled by a different program, the data might have to be punched three or four different times. Rather than doing all the extra work, the student might simply run the Analysis of Variance and be satisfied with a less than complete data analysis. The problems introduced here give the necessary background for the discussion of the APL Statistical System which follows.
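The multi-step analysis described in the preceding paragraph can be expressed compactly with modern tools, purely to illustrate the workflow (the thesis itself implemented these steps in APL, and the drug-response numbers below are made up):

```python
from scipy import stats

# Hypothetical measurements for patients on three different drugs.
drug_a = [4.1, 5.0, 4.7, 5.3]
drug_b = [6.2, 5.8, 6.5, 6.1]
drug_c = [4.9, 5.1, 5.4, 4.8]

# Step 1: Bartlett's test for homogeneity of variances.
bart_stat, bart_p = stats.bartlett(drug_a, drug_b, drug_c)

# Step 2: if the variances look homogeneous, run the one-way Analysis of Variance
# (Duncan's multiple range test could then follow as a third step).
if bart_p > 0.05:
    f_stat, p_value = stats.f_oneway(drug_a, drug_b, drug_c)
    print(f"Bartlett p={bart_p:.3f}, ANOVA F={f_stat:.2f}, p={p_value:.4f}")
else:
    print(f"Bartlett p={bart_p:.3f}: consider transforming the data first.")
```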
This discussion is divided into three sections. The first section includes two chapters and discusses broadly the characteristics of the APL Statistical System that help overcome some of the problems involved in utilizing a computer in statistical instruction. The second chapter describes two basic utilizations of the Statistical System. The second section describes the computer hardware configuration on which the system is currently implemented. It also describes some of the important characteristics of the programming language used. A description of the actual Statistical System, with a list of the statistical methods available to the user, is also included in the third chapter. The third section is essentially a user's manual giving the operating procedures for the system, an explanation of the keyboard, data entry, and a few of the basic APL operators. To make it an independent part of the thesis, so that it may be used alone as a manual, a more complete description of how to use each of the statistical methods is given. For each method an example is shown which can, in most cases, be verified against the reference source listed in the example. A complete listing of all the programs, or functions, used in this system can be found in the Appendix.
17

Biometric system evaluation

Micheals, Ross J., January 2003
Thesis (Ph. D.)--Lehigh University, 2004. / Includes vita. Includes bibliographical references (leaves 253-264).
18

An Assortment of Unsupervised and Supervised Applications to Large Data

Agne, Michael Robert January 2015
This dissertation presents several methods that can be applied to large datasets with an enormous number of covariates. It is divided into two parts. In the first part of the dissertation, a novel approach to pinpointing sets of related variables is introduced. In the second part, several new methods and modifications of current methods designed to improve prediction are outlined. These methods can be considered extensions of the very successful I Score suggested by Lo and Zheng in a 2002 paper and refined in many papers since. In Part I, unsupervised data (with no response) is addressed. In chapter 2, the novel unsupervised I score and its associated procedure are introduced and some of its unique theoretical properties are explored. In chapter 3, several simulations consisting of generally hard-to-wrangle scenarios demonstrate promising behavior of the approach. The method is applied to the complex field of market basket analysis, with a specific grocery data set used to show it in action in chapter 4, where it is compared to a natural competitor, the Apriori algorithm. The main contribution of this part of the dissertation is the unsupervised I score, but we also suggest several ways to leverage the variable sets the I score locates in order to mine for association rules. In Part II, supervised data is addressed. Though the I Score has been used in reference to these types of data in the past, several interesting ways of leveraging it (and the modules of covariates it identifies) are investigated. Though much of this methodology adopts procedures which are individually well-established in the literature, the contribution of this dissertation is the organization and implementation of these methods in the context of the I Score. Several module-based regression and voting methods are introduced in chapter 7, including a new LASSO-based method for optimizing voting weights. These methods can be considered intuitive and readily applicable to a huge number of datasets of sometimes colossal size. In particular, in chapter 8, a large dataset on Hepatitis and another on Oral Cancer are analyzed. The results for some of the methods are quite promising and competitive with existing methods, especially with regard to prediction. A flexible and multifaceted procedure is suggested in order to provide a thorough arsenal when dealing with the problem of prediction in these complex data sets. Ultimately, we highlight some benefits and future directions of the method.
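For readers unfamiliar with the partition-based statistic this work builds on, a rough sketch is given below (the unnormalized form, function names, and toy data are assumptions; scaling conventions for the I score vary across the literature, and the dissertation's own definitions, including its unsupervised variant, may differ):

```python
import numpy as np
from collections import defaultdict

def i_score(X, y):
    """Partition-based influence score for a set of discrete variables.

    One commonly cited unnormalized form: sum over partition cells of
    n_j**2 * (ybar_j - ybar)**2, where cells are defined by the joint
    levels of the variables in X. Illustrative only.
    """
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    cells = defaultdict(list)
    for row, yi in zip(map(tuple, X), y):
        cells[row].append(yi)
    return sum(len(v) ** 2 * (np.mean(v) - ybar) ** 2 for v in cells.values())

# Toy usage: two binary covariates, binary response.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [1, 1], [0, 0]]
y = [0, 0, 1, 1, 1, 0]
print(i_score(X, y))
```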
19

Deriving optimal composite scores relating observational/longitudinal data with a primary endpoint

Ellis, Rhonda Denise, January 1900
Thesis (Ph.D.)--Virginia Commonwealth University, 2009. / Prepared for: Dept. of Biostatistics. Title from title-page of electronic thesis. Bibliography: leaves 96-99.
20

Analysis of repeated measurements in agricultural experiments

Mahdi, Ali A. J. January 1989
No description available.
