21

Instability in Settling Fibres : A Numerical Study

Zhang, Feng January 2014 (has links)
Sedimenting suspensions exist in a variety of natural phenomena and industrial applications. Experiments have already shown that dilute fibre suspensions experience a concentration instability under gravity at low Reynolds numbers. Initially well-mixed suspensions become inhomogeneous and anisotropic due to this instability. This project is focused on the development and validation of numerical models to understand the instability in a dilute fibre suspension by means of the mixture model and the point-particle model. For periodic boundary conditions, we use a linear stability analysis to show that inertia and hydrodynamic translational diffusion damp perturbations at long and short wavelengths, respectively, leading to a wavenumber selection. However, numerical simulations indicate a weak wavenumber selection even at zero Reynolds number. Numerical simulations also show that the induced flow may either die out or saturate at a finite amplitude. The character of this long-time behaviour is dictated by the wavenumber and by the presence or absence of translational diffusivity, rotational diffusivity, and fluid inertia in the particle motion. Moreover, the most unstable wavenumber decreases with time while the maximum amplitude increases; the smallest wavenumber attains the largest amplitude at steady state. For a vessel bounded by sidewalls, the near-wall convection is an upward back flow at the very beginning, due to the combined effects of the steric-depleted layer and a hydrodynamically depleted region near the wall. However, the evolution of the near-wall convection at later times depends on the aspect ratio of the fibres, the translational diffusivity, and the initial perturbations. The steric-depleted layer in the mixture model can be neglected for large vessel widths. Multiple streamers are obtained because of the sidewalls, implying that the sidewalls can generate a wavelength that is smaller than the channel width. The suspension ends up with a single streamer on one side of the container, consistent with the results for periodic boundary conditions but different from the experimental results; this might be due to the absence of the bottom wall in the mixture model. Moreover, the evolution of the global structure of a suspension depends on the width of the vessel and the amplitude of the initial perturbations.
22

GMMEDA : A demonstration of probabilistic modeling in continuous metaheuristic optimization using mixture models

Naveen Kumar Unknown Date (has links)
Optimization problems are common throughout science, engineering and commerce. The desire to continually improve solutions and resolve larger, more complex problems has given prominence to this field of research for several decades and has led to the development of a range of optimization algorithms for different classes of problems. The Estimation of Distribution Algorithms (EDAs) are a relatively recent class of metaheuristic optimization algorithms that use probabilistic modeling techniques to control the search process. Within the general EDA framework, a number of different probabilistic models have been proposed for both discrete and continuous optimization problems. This thesis focuses on GMMEDAs: continuous EDAs based on Gaussian Mixture Models (GMM) with parameter estimation performed using the Expectation Maximization (EM) algorithm. To date, this type of model has received only limited attention in the literature; there are few previous experimental studies of the algorithms, and a number of implementation details of the Continuous Iterated Density Estimation Algorithm based on Gaussian Mixture Models have not been previously documented. This thesis provides a clear description of the GMMEDAs, discusses the implementation decisions and details, and presents an experimental study to evaluate the performance of the algorithms. The effectiveness of the GMMEDAs with varying model complexity (structure of covariance matrices and number of components) was tested against five benchmark functions (Sphere, Rastrigin, Griewank, Ackley and Rosenbrock) with varying dimensionality (2-, 10- and 30-D). The effect of the selection pressure parameters is also studied in this experiment. The results of the 2-D experiments show that a variant of the GMMEDA with moderate complexity (Diagonal GMMEDA) was able to optimize both unimodal and multimodal functions. Further, experimental analysis of the results for the 10- and 30-D functions indicates that the simpler variant of the GMMEDA (Spherical GMMEDA) was the most effective of the three variants of the algorithm. However, greater consistency in the results for these functions is achieved when the most complex variant of the algorithm (Full GMMEDA) is used. The comparison of the results for four artificial test functions - Sphere, Griewank, Ackley and Rosenbrock - showed that the GMMEDA variants optimized most of the complex functions better than existing continuous EDAs. This was achieved because of the ability of the GMM components to model the functions effectively. The analysis of the results produced by the GMMEDA variants showed that the number of components and the selection pressure do affect the optimum value reached on the artificial test functions. The convergence of the GMMEDA variants to each function's best local optimum is driven more by the complexity of the GMM components. The model complexity due to the number of components increases as the complexity owing to the structure of the covariance matrices increases. However, when optimizing complex functions, the increased complexity due to the covariance structure overrides the complexity due to the increase in the number of components. Additionally, for most functions the effect of the number of components on convergence decreases when the selection pressure is increased. These effects are visible in the results as greater stability of the solutions for the functions.
Other factors that affect the convergence of the model to local optima are the initialization of the GMM parameters, the number of EM iterations, and the reset condition. Although the effect of GMM initialization is not visible graphically in the 10-D optimization, different initializations of the GMM parameters in 2-D were shown to affect the optimum value of the functions. The initialization of the population in Evolutionary Algorithms is known to affect the convergence of the algorithm to a function's global optimum. The observation of similar effects of the GMM parameter initialization on the optimization of the 2-D functions indicates that the convergence of the GMM in 10-D could be affected, which, in turn, could affect the optimum value of the respective functions. The estimated covariance and mean values over the EM iterations in 2-D indicated that some functions needed a greater number of EM iterations to find their optimum value. This indicates that too few EM iterations could affect the fitting of the components to the selected population in 10-D, and the fitting can in turn affect the effective modeling of functions with varying complexity. Finally, the reset condition was observed to reset the covariance and the best fitness value of the individuals in each generation in 2-D; this condition is certain to affect the convergence of the GMMEDA variants to each function's best local optimum. The rate at which the reset condition was invoked could have caused the covariance values of the GMM components to reset to their initial values, and thus the fitting of the model to the selected fraction of the population could have been affected. Considering all of these effects, the results indicate that a smaller number of components and a smaller selected fraction of the population, together with the simpler S-GMMEDA, modeled most functions of varying complexity.
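As a rough illustration of the GMMEDA loop described in this abstract (truncation selection, EM fitting of a Gaussian Mixture Model to the selected individuals, and sampling of the next population from the fitted model), a minimal Python sketch follows. It is not the thesis implementation: the use of scikit-learn's GaussianMixture, the Sphere objective, and all parameter values (population size, selection size, component count, covariance type) are illustrative assumptions.

```python
# Minimal sketch of a GMM-based Estimation of Distribution Algorithm (GMMEDA).
# Assumptions: scikit-learn's GaussianMixture stands in for the thesis's EM fit;
# population size, selection pressure, and component count are arbitrary choices.
import numpy as np
from sklearn.mixture import GaussianMixture

def sphere(x):                      # benchmark objective (minimization)
    return np.sum(x**2, axis=1)

def gmmeda(objective, dim=10, pop_size=200, n_select=60,
           n_components=3, covariance_type="diag", generations=100, seed=0):
    rng = np.random.default_rng(seed)
    population = rng.uniform(-5.0, 5.0, size=(pop_size, dim))
    best_x, best_f = None, np.inf
    for _ in range(generations):
        fitness = objective(population)
        order = np.argsort(fitness)                 # truncation selection
        if fitness[order[0]] < best_f:
            best_f, best_x = fitness[order[0]], population[order[0]].copy()
        selected = population[order[:n_select]]
        model = GaussianMixture(n_components=n_components,
                                covariance_type=covariance_type,
                                reg_covar=1e-6).fit(selected)   # EM fit
        population = model.sample(pop_size)[0]      # sample next generation
    return best_x, best_f

if __name__ == "__main__":
    x, f = gmmeda(sphere)
    print("best fitness:", f)
```

The `covariance_type` argument ("spherical", "diag", "full") loosely mirrors the Spherical/Diagonal/Full GMMEDA variants discussed in the abstract.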
23

Some non-standard statistical dependence problems

Bere, Alphonce January 2016 (has links)
Philosophiae Doctor - PhD / The major result of this thesis is the development of a framework for the application of pair-mixtures of copulas to model asymmetric dependencies in bivariate data. The main motivation is the inadequacy of mixtures of bivariate Gaussian models which are commonly fitted to data. Mixtures of rotated single parameter Archimedean and Gaussian copulas are fitted to real data sets. The method of maximum likelihood is used for parameter estimation. Goodness-of-fit tests performed on the models giving the highest log-likelihood values show that the models fit the data well. We use mixtures of univariate Gaussian models and mixtures of regression models to investigate the existence of bimodality in the distribution of the widths of autocorrelation functions in a sample of 119 gamma-ray bursts. Contrary to previous findings, our results do not reveal any evidence of bimodality. We extend a study by Genest et al. (2012) of the power and significance levels of tests of copula symmetry, to two copula models which have not been considered previously. Our results confirm that for small sample sizes, these tests fail to maintain their 5% significance level and that the Cramer-von Mises-type statistics are the most powerful.
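To illustrate the kind of pair-mixture of copulas described above, the following sketch fits a two-component mixture of a Clayton copula and its 180-degree rotated (survival) version by maximum likelihood. The choice of the Clayton family, the pseudo-observation construction, the optimizer settings, and the placeholder data are assumptions for illustration; the thesis considers several rotated single-parameter Archimedean and Gaussian copulas and performs goodness-of-fit testing, which is omitted here.

```python
# Sketch: maximum-likelihood fit of a two-component mixture of a Clayton copula
# and its 180-degree rotated (survival) version, to illustrate the pair-mixture
# idea. The Clayton family, optimizer settings, and simulated placeholder data
# are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

def clayton_density(u, v, theta):
    # Clayton copula density for theta > 0
    s = u**(-theta) + v**(-theta) - 1.0
    return (1.0 + theta) * (u * v)**(-(theta + 1.0)) * s**(-(2.0 * theta + 1.0) / theta)

def mixture_nll(params, u, v):
    theta1, theta2, w = params
    dens = (w * clayton_density(u, v, theta1)
            + (1.0 - w) * clayton_density(1.0 - u, 1.0 - v, theta2))  # rotated component
    return -np.sum(np.log(dens + 1e-300))

# pseudo-observations from ranks of raw bivariate data (x, y are placeholders here)
rng = np.random.default_rng(1)
x, y = rng.normal(size=500), rng.normal(size=500)
u = rankdata(x) / (len(x) + 1.0)
v = rankdata(y) / (len(y) + 1.0)

fit = minimize(mixture_nll, x0=[1.0, 1.0, 0.5], args=(u, v),
               bounds=[(0.05, 20.0), (0.05, 20.0), (0.01, 0.99)],
               method="L-BFGS-B")
print(fit.x)   # theta1, theta2, mixture weight
```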
24

Embedded Feature Selection for Model-based Clustering

January 2020 (has links)
abstract: Model-based clustering is a sub-field of statistical modeling and machine learning. Mixture models use probabilities to describe the degree to which a data point belongs to each cluster, and these probabilities are updated iteratively during clustering. While mixture models have demonstrated superior performance in handling noisy data in many fields, challenges remain for high-dimensional datasets. Among a large number of features, some may not contribute to delineating the cluster profiles. Including these “noisy” features hinders the model from identifying the real structure of the clusters and increases computational cost. Recognizing this issue, in this dissertation I first propose a new feature selection algorithm for continuous datasets and then extend it to mixed datatypes. Finally, I conduct uncertainty quantification for the feature selection results as the third topic. The first topic is an embedded feature selection algorithm, termed the Expectation-Selection-Maximization (ESM) model, that can automatically select features while optimizing the parameters of a Gaussian Mixture Model. I introduce a relevancy index (RI) revealing the contribution of each feature to the clustering process to assist feature selection. I demonstrate the efficacy of the ESM by studying two synthetic datasets, four benchmark datasets, and an Alzheimer’s Disease dataset. The second topic focuses on extending the ESM algorithm to handle mixed datatypes. The Gaussian mixture model is generalized to the Generalized Model of Mixture (GMoM), which can handle not only continuous features but also binary and nominal features. The last topic concerns Uncertainty Quantification (UQ) of the feature selection. A new algorithm termed ESOM is proposed, which takes variance information into consideration while conducting feature selection. Also, a set of outliers is generated in the feature selection process to infer the uncertainty in the input data. Finally, the selected features and detected outlier instances are evaluated by visualization comparison. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2020
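The abstract does not give the definition of the relevancy index (RI), so the sketch below uses a stand-in heuristic: after fitting a Gaussian Mixture Model, each feature is scored by the spread of the component means relative to the within-component variance, so that "noisy" features that do not separate the clusters receive low scores. The scoring rule, function names, and example data are illustrative assumptions, not the ESM algorithm itself.

```python
# Sketch: a heuristic per-feature relevancy score for GMM-based clustering.
# This is NOT the dissertation's relevancy index (its definition is not given in
# the abstract); it is an illustrative stand-in: features whose component means
# are well separated relative to the within-component spread score higher.
import numpy as np
from sklearn.mixture import GaussianMixture

def feature_relevancy(X, n_components=3, seed=0):
    gm = GaussianMixture(n_components=n_components, covariance_type="diag",
                         random_state=seed).fit(X)
    means = gm.means_                          # (K, d) component means per feature
    covs = gm.covariances_                     # (K, d) diagonal variances
    weights = gm.weights_                      # (K,) mixing proportions
    overall = weights @ means                  # weighted overall mean per feature
    between = weights @ (means - overall)**2   # between-component variance
    within = weights @ covs                    # within-component variance
    return between / (within + 1e-12)          # larger => feature separates clusters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    informative = np.vstack([rng.normal(0, 1, (100, 2)),
                             rng.normal(4, 1, (100, 2))])
    noise = rng.normal(0, 1, (200, 3))         # features carrying no cluster signal
    X = np.hstack([informative, noise])
    print(feature_relevancy(X, n_components=2))
```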
25

Informative censoring with an imprecise anchor event: estimation of change over time and implications for longitudinal data analysis

Collins, Jamie Elizabeth 22 January 2016 (has links)
A number of methods have been developed to analyze longitudinal data with dropout. However, there is no uniformly accepted approach. Model performance, in terms of the bias and accuracy of the estimator, depends on the underlying missing data mechanism, and it is unclear how existing methods will perform when little is known about that mechanism. Here we evaluate methods for estimating change over time in longitudinal studies with informative dropout in three settings: using a linear mixed effects (LME) estimator in the presence of multiple types of dropout; proposing an update to the pattern mixture modeling (PMM) approach in the presence of imprecision in identifying informative dropouts; and utilizing this new approach in the presence of a prognostic factor by dropout interaction. We demonstrate that the amount of dropout, the proportion of dropout that is informative, and the variability in the outcome all affect the performance of an LME estimator in data with a mixture of informative and non-informative dropout. When the amount of dropout is moderate to large (>20% overall) the potential for relative bias greater than 10% increases, especially with large variability in the outcome measure, even under scenarios where only a portion of the dropouts are informative. Under conditions where LME models do not perform well, it is necessary to take the missing data mechanism into account. We develop a method that extends the PMM approach to account for uncertainty in identifying informative dropouts. In scenarios with this uncertainty, the proposed method outperformed the traditional method in terms of bias and coverage. In the presence of an interaction between dropout and a prognostic factor, the LME model performed poorly, in terms of bias and coverage, in estimating prognostic factor-specific slopes and the interaction between the prognostic factor and time. The update to the PMM approach proposed here outperformed both the LME and the traditional PMM. Our work suggests that investigators must be cautious with any analysis of data with informative dropout. We found that particular attention must be paid to the model assumptions when the missing data mechanism is not well understood.
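The following sketch illustrates the basic pattern-mixture idea referred to above: slopes are estimated separately within each dropout pattern and then combined, weighted by the observed pattern proportions. The data layout, the OLS slope estimator, and the simulated informative-dropout example are assumptions for illustration and do not reproduce the extended PMM estimator developed in the dissertation.

```python
# Sketch: the core pattern-mixture idea -- estimate a slope within each dropout
# pattern, then average the pattern-specific slopes weighted by pattern frequency.
# The data layout, pattern definition, and OLS slope estimator are illustrative.
import numpy as np

def pattern_mixture_slope(times, outcomes, patterns):
    """times, outcomes: lists of 1-D arrays (one per subject);
    patterns: dropout-pattern label for each subject."""
    patterns = np.asarray(patterns)
    labels, counts = np.unique(patterns, return_counts=True)
    weights = counts / counts.sum()
    slopes = []
    for lab in labels:
        t = np.concatenate([times[i] for i in range(len(times)) if patterns[i] == lab])
        y = np.concatenate([outcomes[i] for i in range(len(times)) if patterns[i] == lab])
        slopes.append(np.polyfit(t, y, 1)[0])      # pattern-specific OLS slope
    return float(np.dot(weights, slopes))          # marginal slope across patterns

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    times, outcomes, patterns = [], [], []
    for i in range(200):
        drop = rng.choice([3, 5])                  # early vs. late dropout
        t = np.arange(drop, dtype=float)
        true_slope = -1.5 if drop == 3 else -0.5   # informative dropout: worse decline drops early
        y = 10 + true_slope * t + rng.normal(0, 1, drop)
        times.append(t); outcomes.append(y); patterns.append(drop)
    print(pattern_mixture_slope(times, outcomes, patterns))
```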
26

Species Identification and Strain Attribution with Unassembled Sequencing Data

Francis, Owen Eric 18 April 2012 (has links) (PDF)
Emerging sequencing approaches have revolutionized the way we can collect DNA sequence data for applications in bioforensics and biosurveillance. In this research, we present an approach to construct a database of known biological agents and use this database to develop a statistical framework to analyze raw reads from next-generation sequence data for species identification and strain attribution. Our method capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality, and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species are present in the sample or that the target strain is not contained within the reference database at all. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for genome assembly - a time-consuming and labor-intensive step. We demonstrate our approach using genomic data from a variety of known bacterial agents of bioterrorism and agents impacting human health.
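A bare-bones sketch of the Bayesian species-identification step is given below: given per-read likelihoods of matching each candidate genome, the posterior probability of each source is computed by combining the read likelihoods with a prior, including an "unknown source" category for strains absent from the reference database. The likelihood values, the flat prior, and the read-independence assumption are illustrative simplifications of the framework described above, not the thesis model.

```python
# Sketch: posterior probabilities over candidate reference genomes given
# per-read match likelihoods. Likelihood values, the "unknown source" category,
# and the uniform prior are illustrative assumptions.
import numpy as np

def genome_posterior(read_likelihoods, prior=None):
    """read_likelihoods: (n_reads, n_genomes) array, entry = P(read | genome).
    Returns the posterior over genomes assuming independent reads."""
    log_lik = np.sum(np.log(read_likelihoods + 1e-300), axis=0)   # sum over reads
    if prior is None:
        prior = np.full(read_likelihoods.shape[1], 1.0 / read_likelihoods.shape[1])
    log_post = log_lik + np.log(prior)
    log_post -= log_post.max()                    # stabilize before normalizing
    post = np.exp(log_post)
    return post / post.sum()

if __name__ == "__main__":
    # 3 candidates: two closely related strains plus an "unknown source" catch-all
    # with a flat, low likelihood for every read.
    rng = np.random.default_rng(0)
    n_reads = 50
    lik = np.column_stack([
        rng.uniform(0.8, 1.0, n_reads),   # true strain: reads map well
        rng.uniform(0.5, 0.9, n_reads),   # near neighbour: reads map less well
        np.full(n_reads, 0.25),           # unknown-source category
    ])
    print(genome_posterior(lik))
```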
27

Hot Spot Identification and Analysis Methodology

Farnsworth, Jacob S. 20 November 2013 (has links) (PDF)
The Utah Department of Transportation (UDOT) Traffic and Safety Division continues to advance the safety of roadway sections throughout the state. To aid UDOT in meeting this goal, the Department of Civil and Environmental Engineering at Brigham Young University (BYU) has worked with the Statistics Department to develop safety analysis tools. The most recent of these tools is a hierarchical Bayesian Poisson Mixture Model (PMM) of traffic crashes and safety on UDOT roadways statewide, together with the integration of the model results into a Geographic Information System (GIS) framework. This research focuses on enhancing the framework for highway safety mitigation in Utah, with its six primary steps: 1) network screening, 2) diagnosis, 3) countermeasure selection, 4) economic appraisal, 5) project prioritization, and 6) effectiveness evaluation. The framework was enhanced by developing a methodology for accomplishing the steps of network screening, diagnosis, and countermeasure selection. This methodology is titled "Hot Spot Identification and Analysis." The hot spot identification and analysis methodology consists of the following seven steps: 1) identify problematic segments with safety concern, 2) identify problem spots within the segments, 3) micro analysis of problematic segments and spots, 4) defining the segment, 5) defining the problem, 6) evaluation of possible countermeasures, and 7) selection and recommendation of feasible countermeasures. The methodology helps identify hot spots with safety concerns so that they can be analyzed and countermeasures identified to mitigate the safety issues. Examples of how the methodology functions are given, drawn from Utah's state roadway network.
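As a simplified stand-in for the network-screening step, the sketch below ranks roadway segments by their posterior expected crash rate under a conjugate Poisson-Gamma model. The full methodology uses a hierarchical Bayesian Poisson Mixture Model integrated with GIS; the prior values, the exposure measure, and the example data here are assumptions chosen only to illustrate how segments with safety concerns might be flagged.

```python
# Sketch: ranking roadway segments by posterior expected crash rate using a
# simple Poisson-Gamma model -- a simplified stand-in for the hierarchical
# Bayesian Poisson Mixture Model referenced above. Prior values and data are
# illustrative assumptions.
import numpy as np

def posterior_crash_rate(crashes, exposure, prior_shape=1.0, prior_rate=0.5):
    """Gamma(prior_shape, prior_rate) prior on the crash rate per unit exposure;
    Poisson likelihood. Posterior mean = (a + y) / (b + e)."""
    crashes = np.asarray(crashes, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    return (prior_shape + crashes) / (prior_rate + exposure)

if __name__ == "__main__":
    crashes = np.array([12, 3, 40, 7, 0])          # observed crashes per segment
    vmt = np.array([5.0, 1.0, 10.0, 8.0, 0.5])     # exposure (e.g. million VMT)
    rates = posterior_crash_rate(crashes, vmt)
    ranking = np.argsort(rates)[::-1]               # screen: highest posterior rate first
    for idx in ranking:
        print(f"segment {idx}: posterior rate {rates[idx]:.2f}")
```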
28

A Data-Driven Algorithm for Parameter Estimation in the Parametric Survival Mixture Model

Zhang, Jin 12 1900 (has links)
We propose a data-driven estimation algorithm for the survival mixture model. The objective of this study is to provide an alternative fitting procedure to the conventional EM algorithm. The EM algorithm is the classical maximum-likelihood fitting procedure for the parametric mixture model. If the initial values for the EM algorithm are not properly chosen, the maximizers might be local or divergent. Traditionally, initial values are given manually according to experience or by a grid-point search. This is a heavy burden for high-dimensional data sets. Also, specifying the ranges of parameters for a grid-point search is difficult. To avoid specifying initial values, we employ random partitions. The fit is then improved according to the model specification. This process is repeated a large number of times, so it is computationally intensive. The large number of repetitions makes the solution more likely to be the global maximizer, and the procedure is driven purely by the data. We conduct a simulation study for three cases - two-component Log-Normal, two-component Weibull, and a two-component Log-Normal and Weibull mixture - in order to illustrate the effectiveness of the proposed algorithm. Finally, we apply our algorithm to data from a breast cancer study that follow a cure model. The program is written in R. It calls existing R functions, so it is flexible to use in regression situations where a model formula must be specified. / Thesis / Master of Science (MSc)
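The thesis implementation is in R and handles censored survival data; the Python sketch below only illustrates the core "random partition plus repeated restarts" idea on an uncensored two-component Log-Normal mixture, keeping the fit with the highest log-likelihood. All numerical settings are assumptions.

```python
# Sketch: random-partition initialization with many restarts for a two-component
# Log-Normal mixture fitted by EM. Censoring (central to the survival setting)
# is ignored here; this only illustrates "random partition + repeat, keep best".
import numpy as np
from scipy.stats import norm

def em_lognormal_mixture(t, resp_init, n_iter=200):
    x = np.log(t)                                   # lognormal -> normal on log scale
    resp = resp_init.astype(float)
    for _ in range(n_iter):
        # M-step: weighted mixing proportions, means, and standard deviations
        w = resp.mean(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
        sd = np.sqrt((resp * (x[:, None] - mu)**2).sum(axis=0) / resp.sum(axis=0)) + 1e-8
        # E-step: responsibilities
        dens = w * norm.pdf(x[:, None], mu, sd)
        resp = dens / dens.sum(axis=1, keepdims=True)
    loglik = np.sum(np.log(dens.sum(axis=1)))
    return loglik, (w, mu, sd)

def random_partition_fit(t, n_restarts=50, seed=0):
    rng = np.random.default_rng(seed)
    best = (-np.inf, None)
    for _ in range(n_restarts):                     # data-driven restarts
        labels = rng.integers(0, 2, size=len(t))    # random partition of the data
        resp0 = np.column_stack([labels == 0, labels == 1])
        fit = em_lognormal_mixture(t, resp0)
        if fit[0] > best[0]:
            best = fit
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.concatenate([rng.lognormal(0.0, 0.3, 300), rng.lognormal(1.5, 0.4, 200)])
    loglik, (w, mu, sd) = random_partition_fit(t)
    print(loglik, w, mu, sd)
```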
29

The Cauchy-Net Mixture Model for Clustering with Anomalous Data

Slifko, Matthew D. 11 September 2019 (has links)
We live in the data explosion era. The unprecedented amount of data offers a potential wealth of knowledge but also brings about concerns regarding ethical collection and usage. Mistakes stemming from anomalous data have the potential for severe, real-world consequences, such as when building prediction models for housing prices. To combat anomalies, we develop the Cauchy-Net Mixture Model (CNMM). The CNMM is a flexible Bayesian nonparametric tool that employs a mixture between a Dirichlet Process Mixture Model (DPMM) and a Cauchy distributed component, which we call the Cauchy-Net (CN). Each portion of the model offers benefits, as the DPMM eliminates the limitation of requiring a fixed number of components and the CN captures observations that do not belong to the well-defined components by leveraging its heavy tails. Through isolating the anomalous observations in a single component, we simultaneously identify the observations in the net as warranting further inspection and prevent them from interfering with the formation of the remaining components. The result is a framework that allows for simultaneously clustering observations and making predictions in the face of anomalous data. We demonstrate the usefulness of the CNMM in a variety of experimental situations and apply the model to predicting housing prices in Fairfax County, Virginia. / Doctor of Philosophy
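The sketch below illustrates how a heavy-tailed "net" component absorbs anomalous observations: given Gaussian cluster parameters and one Cauchy component, each point's posterior probability of belonging to the net is computed, and points dominated by the Cauchy tails are flagged. It uses a fixed number of one-dimensional Gaussian clusters instead of the CNMM's Dirichlet Process mixture, and all parameter values are assumptions for illustration only.

```python
# Sketch: how a heavy-tailed "net" component can absorb anomalous observations
# in a mixture. A fixed number of Gaussian clusters plus one Cauchy component
# stands in for the CNMM's DPMM + Cauchy-Net; all parameters are assumptions.
import numpy as np
from scipy.stats import norm, cauchy

def net_membership(x, means, sds, cluster_weights, net_weight, net_loc, net_scale):
    """Posterior probability that each 1-D observation belongs to the Cauchy net."""
    x = np.asarray(x, dtype=float)
    cluster_dens = np.sum(cluster_weights * norm.pdf(x[:, None], means, sds), axis=1)
    net_dens = net_weight * cauchy.pdf(x, net_loc, net_scale)
    return net_dens / (net_dens + cluster_dens)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0, 1, 100), rng.normal(6, 1, 100), [25.0, -30.0]])
    p_net = net_membership(data,
                           means=np.array([0.0, 6.0]), sds=np.array([1.0, 1.0]),
                           cluster_weights=np.array([0.475, 0.475]),
                           net_weight=0.05, net_loc=3.0, net_scale=10.0)
    print("flagged as anomalous:", data[p_net > 0.5])
```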
30

BER Modeling for Interference Canceling Adaptive NLMS Equalizer

Roy, Tamoghna 13 January 2015 (has links)
Adaptive LMS equalizers are widely used in digital communication systems because of their simplicity of implementation. Conventional adaptive filtering theory suggests that the upper bound on the performance of such an equalizer is determined by the performance of a Wiener filter of the same structure. However, in the presence of a narrowband interferer the performance of the LMS equalizer is better than that of its Wiener counterpart. This phenomenon, termed a non-Wiener effect, has been observed before and substantial work has been done to explain the underlying reasons. In this work, we focus on the Bit Error Rate (BER) performance of LMS equalizers. First, a model, the Gaussian Mixture (GM) model, is presented to estimate the BER performance of a Wiener filter operating in an environment dominated by a narrowband interferer. Simulation results show that the model predicts BER accurately for a wide range of SNR, ISR, and equalizer lengths. Next, a similar model, termed the Gaussian Mixture using Steady State Weights (GMSSW) model, is proposed to model the BER behavior of the adaptive NLMS equalizer. Simulation results show unsatisfactory performance of this model. A detailed discussion is presented that points out the limitations of the GMSSW model, thereby providing some insight into the non-Wiener behavior of (N)LMS equalizers. An improved model, the Gaussian with Mean Square Error (GMSE) model, is then proposed. Simulation results show that the GMSE model is able to capture the non-Wiener characteristics of the NLMS equalizer when the normalized step size is between 0 and 0.4. A brief discussion is provided on why the model is inaccurate for larger step sizes. / Master of Science
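For context on the equalizer being modeled, the sketch below implements a normalized LMS (NLMS) linear equalizer for BPSK over a simple ISI channel and estimates its BER empirically. The channel taps, step size, equalizer length, and training arrangement are illustrative assumptions, and the narrowband interferer central to the thesis is omitted for brevity.

```python
# Sketch: a normalized LMS (NLMS) linear equalizer with empirical BER estimation
# for BPSK over an assumed ISI channel. All settings are illustrative; the
# narrowband interferer studied in the thesis is not included.
import numpy as np

def nlms_equalizer_ber(n_symbols=50_000, n_taps=11, mu=0.2, snr_db=15, seed=0):
    rng = np.random.default_rng(seed)
    symbols = rng.choice([-1.0, 1.0], size=n_symbols)       # BPSK symbols
    channel = np.array([1.0, 0.4, 0.2])                     # assumed ISI channel
    received = np.convolve(symbols, channel)[:n_symbols]
    noise_std = 10 ** (-snr_db / 20.0)
    received += rng.normal(0.0, noise_std, n_symbols)

    w = np.zeros(n_taps)
    delay = n_taps // 2
    errors = 0
    for n in range(n_taps, n_symbols):
        x = received[n - n_taps:n][::-1]                     # regressor (most recent first)
        y = w @ x
        d = symbols[n - delay]                               # delayed training symbol
        e = d - y
        w += mu * e * x / (x @ x + 1e-8)                     # NLMS weight update
        errors += (np.sign(y) != d)
    return errors / (n_symbols - n_taps)

if __name__ == "__main__":
    print("empirical BER:", nlms_equalizer_ber())
```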
