31

The use of digital computers for statistical analysis in textiles

Sarvate, Sharad Ramchandra 05 1900 (has links)
No description available.
32

A criterion for selecting the probability density function of best fit for hydrologic data

Donthamsetti, Veerabhadra Rao 05 1900 (has links)
No description available.
33

Digital computers and geodetic computation : solution of normal equations and error analysis of geodetic networks

Ashkenazi, V. January 1965 (has links)
No description available.
34

Variable selection in high dimensional semi-varying coefficient models

Chen, Chi 06 September 2013 (has links)
With the development of computing and sampling technologies, high dimensionality has become an important characteristic of commonly used scientific data, such as data from bioinformatics, information engineering, and the social sciences. The varying coefficient model is a flexible and powerful statistical model for exploring dynamic patterns in many scientific areas. It is a natural extension of classical parametric models with good interpretability, and is becoming increasingly popular in data analysis. The main objective of this thesis is to apply the varying coefficient model to analyze high dimensional data, and to investigate the properties of regularization methods for high-dimensional varying coefficient models. We first discuss how to apply local polynomial smoothing and the smoothly clipped absolute deviation (SCAD) penalized methods to estimate varying coefficient models when the dimension of the model diverges with the sample size. Based on the nonconcave penalized method and local polynomial smoothing, we propose a regularization method to select significant variables from the model and estimate the corresponding coefficient functions simultaneously. Importantly, our proposed method can also identify constant coefficients at the same time. We investigate the asymptotic properties of our proposed method and show that it has the so-called “oracle property.” We apply the nonparametric independence screening (NIS) method to varying coefficient models with ultra-high-dimensional data. Based on the marginal varying coefficient model estimation, we establish the sure independent screening property under some regularity conditions for our proposed sure screening method. Combined with our proposed regularization method, we can systematically deal with high-dimensional or ultra-high-dimensional data using varying coefficient models. The nonconcave penalized method is a very effective variable selection method.
However, maximizing such a penalized likelihood function is computationally challenging, because the objective function is nondifferentiable and nonconcave. The local linear approximation (LLA) and local quadratic approximation (LQA) are two popular algorithms for dealing with such optimization problems. In this thesis, we revisit these two algorithms. We investigate the convergence rate of LLA and show that the rate is linear. We also study the statistical properties of the one-step estimator based on LLA under a generalized statistical model with a diverging number of dimensions. We propose a modified version of LQA to overcome its drawback under high-dimensional models. Our proposed method avoids calculating the inverse of the Hessian matrix in the modified Newton-Raphson algorithm based on LQA. Our proposed methods are investigated through numerical studies and a real case study in Chapter 5.
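As a concrete illustration of the machinery in this abstract, the SCAD penalty's derivative and the weighted-L1 surrogate that one LLA step produces can be sketched in a few lines of Python. This is an illustrative sketch of the standard SCAD/LLA construction, not code from the thesis; the function names and the conventional default a = 3.7 are my choices.

```python
def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty at t >= 0: equals lam near the origin
    (lasso-like shrinkage), decays linearly on (lam, a*lam], and is zero
    beyond a*lam, so large coefficients are not shrunk."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)

def lla_weights(beta, lam, a=3.7):
    """One local linear approximation (LLA) step: the nonconcave SCAD
    problem is replaced by a weighted L1 problem whose per-coefficient
    weights are the SCAD derivative at the current estimate."""
    return [scad_deriv(abs(b), lam, a) for b in beta]
```

Coefficients already larger than a*lam in absolute value receive weight zero and are left unpenalized, which is the mechanism behind the oracle behaviour the abstract refers to.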
35

Probabilistic modelling of genomic trajectories

Campbell, Kieran January 2017 (has links)
The recent advancement of whole-transcriptome gene expression quantification technology, particularly at the single-cell level, has created a wealth of biological data. An increasingly popular unsupervised analysis is to find one-dimensional manifolds, or trajectories, through such data that track the development of some biological process. Such methods may be necessary due to the lack of explicit time-series measurements or due to the asynchronicity of the biological process at a given time. This thesis aims to recast trajectory inference from high-dimensional "omics" data as a statistical latent variable problem. We begin by examining sources of uncertainty in current approaches and examine the consequences of propagating such uncertainty to downstream analyses. We also introduce a model of switch-like differentiation along trajectories. Next, we consider inferring such trajectories through parametric nonlinear factor analysis models and demonstrate that incorporating information about gene behaviour as informative Bayesian priors improves inference. We then consider the case of bifurcations in data and demonstrate the extent to which they may be modelled using a hierarchical mixture of factor analysers. Finally, we propose a novel type of latent variable model that performs inference of such trajectories in the presence of heterogeneous genetic and environmental backgrounds. We apply this to both single-cell and population-level cancer datasets and propose a nonparametric extension similar to Gaussian Process Latent Variable Models.
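The switch-like differentiation mentioned in this abstract is commonly caricatured as a sigmoidal mean function of pseudotime. A minimal Python sketch under that assumption, with illustrative parameter names (mu0 for the half-peak expression, k for activation strength, t0 for the switch time) rather than the thesis's actual parameterization:

```python
import math

def switch_mean(t, mu0, k, t0):
    """Sigmoidal mean expression along pseudotime t: rises (k > 0) or
    falls (k < 0) around the switch time t0, equals mu0 exactly at t0,
    and approaches 2*mu0 (or 0) asymptotically."""
    return 2.0 * mu0 / (1.0 + math.exp(-k * (t - t0)))
```

A gene observed at sampled pseudotimes then has a likelihood built around this mean, so t0 and k can be inferred per gene alongside the latent pseudotimes.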
36

A statistical continuum approach for mass transport in fractured media

Robertson, Mark Donald January 1990 (has links)
The stochastic-continuum model developed by Schwartz and Smith [1988] is a new approach to the traditional continuum methods for solute transport in fractured media. Instead of trying to determine dispersion coefficients and an effective porosity for the hydraulic system, statistics on particle motion (direction, velocity, and fracture length) collected from a discretely modeled sub-domain network are used to recreate particle motion in a full-domain continuum model. The discrete sub-domain must be large enough that representative statistics can be collected, yet small enough to be modeled with available resources. Statistics are collected in the discrete sub-domain model as the solute, represented by discrete particles, is moved through the network of fractures. The domain of interest, which is typically too large to be modeled discretely, is represented by a continuum distribution of the hydraulic head. A particle tracking method is used to move the solute through the continuum model, sampling from the distributions for direction, velocity, and fracture length. This thesis documents extensions and further testing of the two-dimensional stochastic-continuum model and initial work on a three-dimensional stochastic-continuum model. Testing was done by comparing the mass distribution from the stochastic-continuum model to the mass distribution from the same domain modeled discretely. Analysis of the velocity statistics collected in the two-dimensional model suggested changing the form of the fitted velocity distribution from a Gaussian distribution to a gamma distribution, and adding a velocity correlation function. These changes improved the match of the spatial mass distribution moments between the stochastic-continuum and discrete models. This extended two-dimensional model is then tested under a wide range of network conditions.
The differences in the first spatial moments of the discrete and stochastic-continuum models were less than 10%, while the differences in the second spatial moments ranged from 6% to 30%. Initial results from the three-dimensional stochastic-continuum model showed that similar statistics to those used in the two-dimensional stochastic-continuum model can be used to recreate the nature of three-dimensional discrete particle motion. / Science, Faculty of / Earth, Ocean and Atmospheric Sciences, Department of / Graduate
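The particle-tracking step described above (sample a direction, a velocity, and a fracture length, then advance each particle) can be sketched in miniature. Below is a toy one-dimensional Python version with entirely illustrative distributions and parameters; the thesis's fitted gamma velocity model includes a correlation function that is omitted here.

```python
import random

def track_particles(n_particles=2000, n_steps=50, shape=2.0, scale=0.5,
                    p_forward=0.8, mean_length=1.0, seed=0):
    """Toy 1-D particle tracker: each step samples a direction, a
    gamma-distributed velocity, and an exponential fracture length, then
    advances the particle. Returns the first spatial moment (mean
    position) and second central moment (variance) of the final plume."""
    rng = random.Random(seed)
    positions = []
    for _ in range(n_particles):
        x, t = 0.0, 0.0
        for _ in range(n_steps):
            direction = 1.0 if rng.random() < p_forward else -1.0
            length = rng.expovariate(1.0 / mean_length)
            velocity = rng.gammavariate(shape, scale)  # gamma, not Gaussian
            x += direction * length   # displacement along the fracture
            t += length / velocity    # travel time (feeds breakthrough curves)
        positions.append(x)
    mean = sum(positions) / n_particles
    var = sum((p - mean) ** 2 for p in positions) / n_particles
    return mean, var
```

Comparing these moments against a discretely modeled network is exactly the kind of test the thesis reports, where first moments agreed within 10% and second moments within 6% to 30%.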
37

Information and distance measures with application to feature evaluation and to heuristic sequential classification

Vilmansen, Toomas Rein January 1974 (has links)
Two different aspects of the problem of selecting measurements for statistical pattern recognition are investigated. First, the evaluation of features for multiclass recognition problems by using measures of probabilistic dependence is examined. Second, the problem of evaluation and selection of features for a general tree-type classifier is investigated. Measures of probabilistic dependence are derived from pairwise distance measures such as the Bhattacharyya distance, divergence, Matusita's distance, and discrimination information. The properties of the dependence measures are developed in the context of feature-class dependency, and inequalities relating the measures are derived. Upper and lower bounds on error probability are also derived for the different measures, and the bounds are compared. Feature ordering experiments are performed to compare the measures to error probability and to each other. A fairly general tree-type sequential classifier is examined. An algorithm which uses distance measures for clustering probability distributions, and dependence and distance measures for ordering features, is derived for constructing the decision tree. The concept of confidence in a decision, in conjunction with backtracking, is introduced in order to make decisions at any node of the tree tentative and reversible. The idea of re-introducing classes at any stage is also discussed. Experiments are performed to determine the storage and processing requirements of the classifier, the effects of various parameters on performance, and the usefulness of the procedures for backtracking and re-introducing classes. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate
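For discrete distributions, the pairwise measures named in this abstract reduce to short formulas. A Python sketch of the Bhattacharyya coefficient, the corresponding distance, and the classical upper bound on two-class Bayes error that motivates using such measures for feature evaluation (illustrative code, not from the thesis):

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya coefficient BC = sum_i sqrt(p_i * q_i) and the
    Bhattacharyya distance -ln(BC) between two discrete distributions."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return bc, -math.log(bc)

def bayes_error_bound(p, q, prior1=0.5, prior2=0.5):
    """Classical upper bound on the two-class Bayes error probability:
    P_e <= sqrt(prior1 * prior2) * BC."""
    bc, _ = bhattacharyya(p, q)
    return math.sqrt(prior1 * prior2) * bc
```

Identical class-conditional distributions give BC = 1 and the vacuous equal-prior bound of 0.5, while well-separated distributions drive BC, and hence the bound, toward zero, which is why features can be ranked by these measures as a cheap proxy for error probability.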
38

Comparative fish population studies

Ni, I-hsun January 1978 (has links)
This project was designed to study the patterns of variability in fish populations. My hypothesis is that specific population patterns should be related to evolutionary concepts (phylogenetic patterns), zoogeographic considerations (faunal patterns), and vertical distributions. These patterns should be detected by comparing certain population parameters [the growth parameters (K, LINF), the natural mortality coefficient (M), size at first maturity (LM), age at first maturity (TM), size at age 1 (L1), the weight-length exponential coefficient (b), and life span (T95)], which are intrinsic biological features of a population. Comparative methods were used to analyze data from published fish population studies by comparing population parameters individually, in pairs (ratio or linear regression), or grouped together (discriminant analysis or Cooley and Lohnes' classification method), in order to find the similarities or differences among different categories, and then to group these into patterns. Published data provided 682 parameter records from 43 families (171 species) of fishes. Since more satisfactory results would be obtained from a greater volume of data, all the analyses were based mainly on 15 families with large sample sizes (Bothidae, Clupeidae, Cyprinidae, Engraulidae, Gadidae, Hiodontidae, Osmeridae, Percidae, Pleuronectidae, Salmonidae, Sciaenidae, Scombridae, Scorpaenidae, Sparidae, and Squalidae). Sample sizes, mean values, standard errors, and coefficients of variation for the population parameters and related characters of the 15 families are listed in the summary table. These data enable results based on many areas to be extrapolated to the management of other fish stocks where data are lacking. In the majority of families, significant linear regression relationships were found between 1/K and LINF, between LM and LINF, and between M and K.
This means that fish with a greater asymptotic length (LINF) also have a larger size at first maturity (LM), a lower natural mortality coefficient (M), and a lower rate (K) at which the asymptotic length is reached. Using the F-test and the appropriate t-test to compare variances and means of individual parameters, it is evident that in most cases there are significant differences between families. This confirms one of my hypotheses, namely that differences between families, as shown by population parameters, arise from phylogenetic considerations. By comparing four characters (K, LINF, LM, and LM/LINF), the fish families can be divided into the following groups: A) Shoaling pelagic fishes (Engraulidae, Clupeidae, and Osmeridae): these families have the highest K values (1.6 for Engraulidae, over 0.4 for the others), the smallest LINF and LM, and a very high LM/LINF ratio (over 0.7). B) Large pelagic fishes (Scombridae): this family has a moderately high K value (around 0.35) and the largest LINF. C) Demersal fishes (Gadidae, Pleuronectidae, Scorpaenidae, Sparidae, etc.): these families have low K values (less than 0.25), intermediate LINF, and lower LM/LINF ratios (less than 0.6). D) Freshwater fishes (Cyprinidae): this family has K and LINF values similar to those of the demersal fishes, but a smaller LM and, in particular, the lowest LM/LINF (0.4) and TM/T95 (0.2) ratios. Stepwise discriminant analysis based on 7 variables in the 15 families showed that over 90% of the 620 cases, considered independently, could be correctly classified into the right families. Cooley and Lohnes' classification method was also applied to species within 5 major families (Clupeidae, Cyprinidae, Gadidae, Pleuronectidae, and Scombridae); correct classification ranged from 58.6% (Pleuronectidae) to 87.6% (Cyprinidae). These results further confirmed the existence of population patterns detectable from population parameters.
Cluster analysis based on 7 population parameters displayed the degree of closeness among the 15 families. The dendrograph relationships brought out the ecological, rather than the systematic, affinities between families. / Science, Faculty of / Zoology, Department of / Graduate
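The parameters compared throughout this study come from the von Bertalanffy growth model, and the regression relationships (1/K vs. LINF, LM vs. LINF, M vs. K) are relationships among its parameters. A minimal Python sketch of the growth curve and of T95 as a life-span proxy, using the standard parameterization (the thesis's exact conventions may differ):

```python
import math

def von_bertalanffy(t, linf, k, t0=0.0):
    """von Bertalanffy growth curve: length at age t approaches the
    asymptotic length LINF at rate K."""
    return linf * (1.0 - math.exp(-k * (t - t0)))

def t95(k, t0=0.0):
    """Age at which length reaches 95% of LINF, a common life-span proxy."""
    return t0 - math.log(0.05) / k
```

Under this parameterization, a fast-growing clupeoid with K near 1.6 reaches 95% of LINF in under two years, while a demersal fish with K below 0.25 takes over a decade: the trade-off behind the family groupings described above.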
39

Statistical methods for integrative analysis of genomic data

Ming, Jingsi 24 August 2018 (has links)
Thousands of risk variants underlying complex phenotypes (quantitative traits and diseases) have been identified in genome-wide association studies (GWAS). However, several challenges remain in deepening our understanding of the genetic architectures of complex phenotypes. First, the majority of GWAS hits lie in non-coding regions and their biological interpretation is still unclear. Second, most complex traits are suggested to be highly polygenic, i.e., they are affected by a vast number of risk variants with individually small or moderate effects, and a large proportion of risk variants with small effects remain unknown. Third, accumulating evidence from GWAS suggests the pervasiveness of pleiotropy, a phenomenon in which some genetic variants are associated with multiple traits, but there is a lack of a unified framework that scales to reveal relationships among a large number of traits while simultaneously prioritizing genetic variants with functional annotations integrated. In this thesis, we propose two statistical methods to address these challenges through integrative analysis of summary statistics from GWASs and functional annotations. In the first part, we propose a latent sparse mixed model (LSMM) to integrate functional annotations with GWAS data. Not only does it increase the statistical power of identifying risk variants, but it also offers more biological insight by detecting relevant functional annotations. To make LSMM scalable to millions of variants and hundreds of functional annotations, we developed an efficient variational expectation-maximization (EM) algorithm for model parameter estimation and statistical inference. We first conducted comprehensive simulation studies to evaluate the performance of LSMM. Then we applied it to analyze 30 GWASs of complex phenotypes integrated with nine genic category annotations and 127 cell-type-specific functional annotations from the Roadmap project.
The results demonstrate that our method possesses more statistical power than conventional methods and can help researchers achieve a deeper understanding of the genetic architecture of these complex phenotypes. In the second part, we propose a latent probit model (LPM) which combines summary statistics from multiple GWASs and functional annotations to characterize relationships among traits and increase the statistical power to identify risk variants. LPM can also perform hypothesis testing for pleiotropy and annotation enrichment. To keep LPM scalable as the number of GWASs increases, we developed an efficient parameter-expanded EM (PX-EM) algorithm which can execute in parallel. We first validated the performance of LPM through comprehensive simulations, then applied it to analyze 44 GWASs with nine genic category annotations. The results demonstrate the benefits of LPM and offer new insights into disease etiology.
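The variational EM machinery of LSMM and LPM is beyond a short sketch, but the underlying idea (treat each variant's risk status as a latent variable, then iterate expectation and maximization steps) can be illustrated with a drastically simplified two-groups model of z-scores: null N(0, 1) versus non-null N(0, 1 + sigma2). Everything here is an illustrative caricature, not the thesis's model or code:

```python
import math

def two_groups_em(z, n_iter=200):
    """EM for a toy two-groups model of association z-scores: null N(0,1)
    with weight 1 - pi1, non-null N(0, 1 + sigma2) with weight pi1.
    Returns the estimated mixing weight and non-null variance inflation."""
    pi1, sigma2 = 0.1, 4.0  # crude starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each variant is non-null
        v1 = 1.0 + sigma2
        gam = []
        for zi in z:
            f0 = math.exp(-0.5 * zi * zi) / math.sqrt(2 * math.pi)
            f1 = math.exp(-0.5 * zi * zi / v1) / math.sqrt(2 * math.pi * v1)
            gam.append(pi1 * f1 / (pi1 * f1 + (1 - pi1) * f0))
        # M-step: update the mixing weight and the non-null variance
        pi1 = sum(gam) / len(gam)
        sigma2 = max(sum(g * zi * zi for g, zi in zip(gam, z)) / sum(gam) - 1.0,
                     1e-6)
    return pi1, sigma2
```

LSMM and LPM enrich this skeleton with functional annotations that shift each variant's prior probability of being non-null, which is what makes the E-step annotation-aware.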
40

Space-time clustering : finding the distribution of a correlation-type statistic.

Siemiatycki, Jack January 1971 (has links)
No description available.
