91

INFORMATIONAL INDEX AND ITS APPLICATIONS IN HIGH DIMENSIONAL DATA

Yuan, Qingcong 01 January 2017 (has links)
We introduce a new class of measures for testing independence between two random vectors, based on the expected difference between conditional and marginal characteristic functions. By choosing a particular weight function in the class, we propose a new index for measuring independence and study its properties. Two empirical versions are developed; their properties, asymptotics, connections with existing measures, and applications are discussed. Implementation and Monte Carlo results are also presented. We propose a two-stage sufficient variable selection method based on the new index to deal with large-p, small-n data. The method does not require model specification and is especially suited to categorical responses. Our approach consistently improves on typical screening approaches that use only marginal relations. Numerical studies are provided to demonstrate the advantages of the method. We introduce a novel approach to sufficient dimension reduction problems using the new measure. The proposed method requires very mild conditions on the predictors, estimates the central subspace effectively, and is especially useful when the response is categorical. It retains the model-free advantage without estimating a link function. Under regularity conditions, root-n consistency and asymptotic normality are established. Simulation results show the proposed method to be highly competitive and robust compared to existing dimension reduction methods.
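For intuition only, the sketch below shows one way an empirical index of this general type could be computed for a categorical response: conditional and marginal empirical characteristic functions are compared at randomly drawn frequencies. The function name, the Gaussian frequency weight, and the Monte Carlo approximation are illustrative assumptions, not the dissertation's exact index or weight function.

```python
import numpy as np

def cf_dependence_index(X, y, n_freq=200, seed=0):
    """Illustrative Monte Carlo estimate of a dependence index between a
    continuous predictor matrix X (n x p) and a categorical response y,
    based on squared differences between conditional and marginal empirical
    characteristic functions (Gaussian weight over frequencies).
    A hypothetical sketch, not the dissertation's index."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    T = rng.standard_normal((n_freq, X.shape[1]))      # frequencies ~ N(0, I)
    ecf = lambda Z: np.exp(1j * Z @ T.T).mean(axis=0)  # empirical char. function
    phi_marginal = ecf(X)
    index = 0.0
    for k in np.unique(y):
        mask = (y == k)
        phi_cond = ecf(X[mask])
        index += mask.mean() * np.mean(np.abs(phi_cond - phi_marginal) ** 2)
    return index
```

Under independence the conditional and marginal characteristic functions coincide, so larger values of such an index indicate stronger dependence.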
92

EFFICIENT NUMERICAL METHODS FOR KINETIC EQUATIONS WITH HIGH DIMENSIONS AND UNCERTAINTIES

Yubo Wang (11792576) 19 December 2021 (has links)
In this thesis, we focus on two challenges arising in kinetic equations: high dimensions and uncertainties. To reduce the dimensions, we propose efficient methods for the linear and full Boltzmann equations based on dynamic low-rank frameworks. For the linear Boltzmann equation, we propose a method based on a macro-micro decomposition of the equation; the low-rank approximation is used only for the micro part of the solution. The time and spatial discretizations are chosen so that the overall scheme is second-order accurate (in both the fully kinetic and the limit regime) and asymptotic-preserving (AP). That is, in the diffusive regime, the scheme becomes a macroscopic solver for the limiting diffusion equation that automatically captures the low-rank structure of the solution. Moreover, the method can be implemented in a fully explicit way and is thus significantly more efficient than the previous state of the art. We demonstrate the accuracy and efficiency of the proposed low-rank method with a number of four-dimensional (two dimensions in physical space and two in velocity space) simulations. We further study the adaptivity of low-rank methods for the full Boltzmann equation and propose a highly efficient adaptive low-rank method for computing steady-state solutions. The main novelties of this approach are as follows. On one hand, to the best of our knowledge, the dynamic low-rank integrator has not previously been applied to the full Boltzmann equation. The full collision operator is local in the spatial variable, while the convection part is local in the velocity variable; this separated nature is well suited to low-rank methods. Compared with full-grid methods (finite difference, finite volume, etc.), the dynamic low-rank method avoids full computation of the collision operator at every spatial grid point or element. As a result, it achieves much better efficiency, especially for some low-rank flows (e.g., a normal shock wave). On the other hand, our adaptive low-rank method uses a novel dynamic thresholding strategy to adaptively control the computational rank, achieving better efficiency especially for steady-state solutions. We demonstrate the accuracy and efficiency of the proposed adaptive low-rank method with a number of 1D/2D Maxwell molecule benchmark tests. For kinetic equations with uncertainties, we focus on non-intrusive sampling methods that inherit good properties (AP, positivity preservation) from existing deterministic solvers. We propose a control variate multilevel Monte Carlo method for the kinetic BGK model of the Boltzmann equation subject to random inputs. The method combines a multilevel Monte Carlo technique with the computation of optimal control variate multipliers derived from local or global variance minimization problems. Consistency and convergence analysis for the method, equipped with a second-order positivity-preserving and asymptotic-preserving scheme in space and time, is also performed. Various numerical examples confirm that the optimized multilevel Monte Carlo method outperforms the classical multilevel Monte Carlo method, especially for problems with discontinuities.
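As a rough illustration of the control-variate idea mentioned above, here is a simplified two-level sketch in which a variance-minimizing multiplier combines a few paired fine/coarse solver runs with many cheap coarse-only runs. The solver and sampler callables are hypothetical placeholders; this is not the thesis's multilevel scheme or its BGK solver.

```python
import numpy as np

def control_variate_estimate(fine_solver, coarse_solver, sample_input,
                             n_paired=50, n_coarse=2000, seed=0):
    """Simplified two-level control-variate estimator of E[fine_solver(Z)].
    `fine_solver`, `coarse_solver`, and `sample_input` are placeholder
    callables supplied by the user (assumptions for illustration)."""
    rng = np.random.default_rng(seed)
    # Paired runs on identical random inputs estimate the optimal multiplier.
    f = np.empty(n_paired)
    c = np.empty(n_paired)
    for i in range(n_paired):
        z = sample_input(rng)
        f[i], c[i] = fine_solver(z), coarse_solver(z)
    lam = np.cov(f, c)[0, 1] / np.var(c, ddof=1)  # variance-minimizing multiplier
    # Many extra coarse-only samples pin down E[coarse] cheaply.
    c_extra = np.array([coarse_solver(sample_input(rng)) for _ in range(n_coarse)])
    # Control-variate combination: E[f] ~= mean(f - lam*c) + lam * E[c].
    return np.mean(f - lam * c) + lam * np.mean(c_extra)
```

The multiplier is exactly the sample version of the variance-minimizing coefficient Cov(f, c)/Var(c); the multilevel method in the thesis applies this idea across a hierarchy of levels rather than just two.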
93

Comparing Building Energy Benchmarking Metrics using Dimension Reduction Techniques

Agale, Ketaki 21 October 2019 (has links)
No description available.
94

Learning Techniques For Information Retrieval And Mining In High-dimensional Databases

Cheng, Hao 01 January 2009 (has links)
The main focus of my research is to design effective learning techniques for information retrieval and mining in high-dimensional databases. There are two main aspects in retrieval and mining research: accuracy and efficiency. The accuracy problem is how to return results that better match the ground truth, and the efficiency problem is how to evaluate users' requests and execute learning algorithms as fast as possible. These problems are non-trivial because of the complexity of high-level semantic concepts, the heterogeneous nature of the feature space, the high dimensionality of data representations, and the size of the databases. My dissertation is dedicated to addressing these issues. Specifically, my work has five main contributions, as follows. The first contribution is a novel manifold learning algorithm, Local and Global Structures Preserving Projection (LGSPP), which defines salient low-dimensional representations for the high-dimensional data. A small number of projection directions are sought in order to properly preserve the local and global structures of the original data. Specifically, two groups of points are extracted for each individual point in the dataset: the first group contains the nearest neighbors of the point, and the second contains a few sampled points far away from it. These two point sets respectively characterize the local and global structures with regard to the data point. The objective of the embedding is to minimize the distances of the points in each local neighborhood and also to disperse the points far away from their respective remote points in the original space. In this way, the relationships between the data in the original space are well preserved with little distortion. The second contribution is a new constrained clustering algorithm. Conventionally, clustering is an unsupervised learning problem that systematically partitions a dataset into a small set of clusters such that data in each cluster appear similar to each other compared with those in other clusters. In this proposal, partial human knowledge is exploited to find better clustering results. Two kinds of constraints are integrated into the clustering algorithm. One is the must-link constraint, indicating that the two points involved belong to the same cluster. The other is the cannot-link constraint, denoting that two points are not within the same cluster. Given the input constraints, data points are arranged into small groups and a graph is constructed to preserve the semantic relations between these groups. The assignment procedure makes a best effort to assign each group to a feasible cluster without violating the constraints. The theoretical analysis reveals that the probability of data points being assigned to the true clusters is much higher under the new proposal than under conventional methods. In general, the new scheme can produce clusters that better match the ground truth and respect the semantic relations between points inferred from the constraints. The third contribution is a unified framework for partition-based dimension reduction techniques, which allows efficient similarity retrieval in the high-dimensional data space. Recent similarity search techniques, such as Piecewise Aggregate Approximation (PAA), Segmented Means (SMEAN) and Mean-Standard deviation (MS), prove to be very effective in reducing data dimensionality by partitioning dimensions into subsets and extracting aggregate values from each dimension subset.
These partition-based techniques have many advantages, including very efficient multi-phased pruning, while being simple to implement. They are, however, not adaptive to the different characteristics of data in diverse applications. In this study, a unified framework for these partition-based techniques is proposed and the issue of dimension partitioning is examined within this framework. An investigation of the relationship between query selectivity and dimension partition schemes reveals indicators that can predict the performance of a partitioning setting. Accordingly, a greedy algorithm is designed to effectively determine a good partitioning of data dimensions so that the performance of the reduction technique is robust across different datasets. The fourth contribution is an effective similarity search technique for databases of point sets. In the conventional model, an object corresponds to a single vector. In the proposed study, an object is represented by a set of points. In general, this new representation can be used in many real-world applications and carries much more local information, but the retrieval and learning problems become very challenging. The Hausdorff distance is the common distance function for measuring the similarity between two point sets; however, this metric is sensitive to outliers in the data. To address this issue, a novel similarity function is defined to better capture the proximity of two objects, in which a one-to-one mapping is established between the vectors of the two objects. The optimal mapping minimizes the sum of distances between paired points. The overall distance of the optimal matching is robust and yields high retrieval accuracy. The computation of the new distance function is formulated as the classical assignment problem. Lower-bounding techniques and an early-stop mechanism are also proposed to significantly accelerate the expensive similarity search process. The classification problem over point-set data is called Multiple Instance Learning (MIL) in the machine learning community, in which a vector is an instance and an object is a bag of instances. The fifth contribution is to convert the MIL problem into standard supervised learning in the conventional vector space. Specifically, feature vectors of bags are grouped into clusters. Each object is then denoted as a bag of cluster labels, and common patterns of each category are discovered, each of which is further reconstructed into a bag of features. Accordingly, a bag is effectively mapped into a feature space defined by the distances from this bag to all the derived patterns. Standard supervised learning algorithms can then be applied to classify objects into pre-defined categories. The results demonstrate that the proposal has better classification accuracy than other state-of-the-art techniques. In the future, I will continue to explore large-scale data analysis algorithms, applications, and system development. In particular, I am interested in applications that analyze the massive volume of online data.
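The optimal one-to-one mapping described above can be posed as a classical assignment problem. A minimal sketch (assuming equal-size point sets and Euclidean costs) follows; the dissertation's lower-bounding and early-stop accelerations are not reproduced here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def matching_distance(A, B):
    """One-to-one matching distance between two point sets A and B
    (assumed here to have equal size), solved as an assignment problem."""
    cost = cdist(A, B)                        # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one mapping
    return cost[rows, cols].sum()

# Example: two small 2-D point sets.
A = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
B = np.array([[0.1, 0.1], [2.1, 0.0], [1.0, 0.9]])
print(matching_distance(A, B))
```

Because every point must be matched exactly once, a single outlier affects only its own pair rather than dominating the whole distance, which is the robustness advantage over the Hausdorff distance noted above.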
95

Behaviour recognition and monitoring of the elderly using wearable wireless sensors. Dynamic behaviour modelling and nonlinear classification methods and implementation.

Winkley, Jonathan James January 2013 (has links)
In partnership with iMonSys - an emerging company in the passive care field - a new system, 'Verity', is being developed to fulfil the role of a passive behaviour monitoring and alert detection device, providing an unobtrusive level of care and assessing an individual's changing behaviour and health status whilst still allowing for the independence of its elderly user. In this research, a Hidden Markov Model incorporating Fuzzy Logic-based sensor fusion is created for the behaviour detection within Verity, with a method of Fuzzy-Rule induction designed for the system's adaptation to a user during operation. A dimension reduction and classification scheme utilising Curvilinear Distance Analysis is further developed to deal with the recognition task presented by increasingly nonlinear and high-dimensional sensor readings, and anomaly detection methods situated within the Hidden Markov Model provide possible solutions to the identification of health concerns arising from independent living. Real-time implementation is proposed through the development of an Instance Based Learning approach in combination with a Bloom Filter, speeding up the classification operation and reducing the storage requirements for the considerable amount of observation data obtained during operation. Finally, evaluation of all algorithms is completed using a simulation of the Verity system with which the behaviour monitoring task is to be achieved.
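To make the Bloom filter component concrete, here is a minimal sketch: a fixed bit array probed by several hash functions answers "probably seen before / definitely new", so duplicate observations need not be stored. The class and the quantized-observation usage are illustrative assumptions, not the Verity implementation.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a fixed bit array probed by k hash functions.
    Membership queries can return false positives but never false negatives."""
    def __init__(self, n_bits=8192, n_hashes=4):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Hypothetical use: skip storing a quantized sensor observation that the
# instance-based learner has (probably) already seen.
seen = BloomFilter()
observation = (3, 1, 4, 1)          # placeholder quantized feature tuple
if observation not in seen:
    seen.add(observation)           # keep it as a new training instance
```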
96

Projection separability: A new approach to evaluate embedding algorithms in the geometrical space

Acevedo Toledo, Aldo Marcelino 06 February 2024 (has links)
Evaluating separability is fundamental to pattern recognition. A plethora of embedding methods, such as dimension reduction and network embedding algorithms, have been developed to reveal the emergence of geometrical patterns in a low-dimensional space, where high-dimensional sample and node similarities are approximated by geometrical distances. However, statistical measures to evaluate the separability attained by the embedded representations are missing. Traditional cluster validity indices (CVIs) might be applied in this context, but they present multiple limitations because they are not specifically tailored for evaluating the separability of embedded results. This work introduces a new rationale called projection separability (PS), which provides a methodology expressly designed to assess the separability of data samples in a reduced (i.e., low-dimensional) geometrical space. In a first case study, using this rationale, a new class of indices named projection separability indices (PSIs) is implemented based on four statistical measures: Mann-Whitney U-test p-value, Area Under the ROC-Curve, Area Under the Precision-Recall Curve, and Matthews Correlation Coefficient. The PSIs are compared to six representative cluster validity indices and one geometrical separability index using seven nonlinear datasets and six different dimension reduction algorithms. In a second case study, the PS rationale is extended to define and measure the geometric separability (linear and nonlinear) of mesoscale patterns in complex data visualization by solving the traveling salesman problem, offering experimental evidence on the evaluation of community separability of network embedding results using eight real network datasets and three network embedding algorithms. The results of both studies provide evidence that the implemented statistical-based measures designed on the basis of the PS rationale are more accurate than the other indices and can be adopted not only for evaluating and comparing the separability of embedded results in the low-dimensional space but also for fine-tuning embedding algorithms’ hyperparameters. Besides these advantages, the PS rationale can be used to design new statistical-based separability measures other than the ones presented in this work, providing the community with a novel and flexible framework for assessing separability.
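As a toy illustration of scoring separability in an embedded space, the sketch below projects two labelled groups onto the line joining their centroids and scores the resulting one-dimensional values with the Area Under the ROC Curve, one of the four statistical measures named above. The projection step and the function name are assumptions made for illustration; the actual PSIs and their multi-group handling are defined in the thesis.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def two_group_projection_auc(embedding, labels, group_a, group_b):
    """Hypothetical two-group separability score for a low-dimensional embedding:
    project samples onto the line joining the two group centroids and compute
    the AUC of the 1-D projections."""
    A = embedding[labels == group_a]
    B = embedding[labels == group_b]
    direction = B.mean(axis=0) - A.mean(axis=0)
    direction /= np.linalg.norm(direction)        # unit vector between centroids
    scores = np.concatenate([A @ direction, B @ direction])
    truth = np.concatenate([np.zeros(len(A)), np.ones(len(B))])
    return roc_auc_score(truth, scores)
```

A value near 1 indicates that the two groups are well separated along the centroid direction, while a value near 0.5 indicates heavy overlap in the embedding.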
97

DIMENSION REDUCTION, OPERATOR LEARNING AND UNCERTAINTY QUANTIFICATION FOR PROBLEMS OF DIFFERENTIAL EQUATIONS

Shiqi Zhang (12872678) 26 July 2022 (has links)
In this work, we focus on dimension reduction, operator learning, and uncertainty quantification for problems involving differential equations. The supervised machine learning methods introduced here belong to a newly booming field compared with traditional numerical methods. The main building blocks of this work are Gaussian processes and neural networks. The first part focuses on supervised dimension reduction problems. A new framework based on rotated multi-fidelity Gaussian process regression is introduced. It can effectively solve high-dimensional problems when the data are insufficient for traditional methods, and an accurate surrogate Gaussian process model of the original problem can be formulated. The second part is a physics-assisted Gaussian process framework with active learning for forward and inverse problems of partial differential equations (PDEs). In this part, a Gaussian process regression model is combined with given physical information to find solutions or discover unknown coefficients of given PDEs. Three different models are introduced, and their performance is compared and discussed. Lastly, we propose an attention-based MultiAuto-DeepONet for operator learning in stochastic problems. The target of this work is to solve operator learning problems related to time-dependent stochastic differential equations (SDEs). The work builds on MultiAuto-DeepONet, and attention mechanisms are applied to improve model performance on specific types of problems. Three different types of attention mechanism are presented and compared. Numerical experiments are provided to illustrate the effectiveness of the proposed models.
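As a hedged sketch of the surrogate-modelling ingredient, the snippet below fits a Gaussian process on rotated, dimension-reduced inputs. The rotation directions are assumed to be given here; in the thesis they arise from the rotated multi-fidelity construction itself, which is not reproduced.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def reduced_gp_surrogate(X, y, directions):
    """Hypothetical sketch: project high-dimensional inputs X (n x p) onto a few
    given directions (p x d) and fit a Gaussian process surrogate on the
    reduced coordinates. `directions` is an assumption, supplied by the user."""
    Z = X @ directions                                   # (n, d) reduced inputs
    kernel = ConstantKernel() * RBF(length_scale=np.ones(Z.shape[1]))
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(Z, y)
    # Return a predictor giving posterior mean and standard deviation.
    return lambda X_new: gp.predict(X_new @ directions, return_std=True)
```

Working in the reduced coordinates keeps the number of kernel length-scales small, which is what makes a Gaussian process surrogate practical when data are scarce.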
98

Model-Free Variable Selection For Two Groups of Variables

Alothman, Ahmad January 2018 (has links)
In this dissertation we introduce two variable selection procedures for multivariate responses. Our procedures are based on sufficient dimension reduction concepts and are model-free. In the first procedure we consider the dual marginal coordinate hypotheses, where the distinction between predictor and response is not important. Motivated by canonical correlation analysis (CCA), we propose a CCA-based test for the dual marginal coordinate hypotheses and devise a joint backward selection algorithm for dual model-free variable selection. The second procedure is based on ordinary least squares (OLS). We derive and study the asymptotic properties of the OLS-based test under a normality assumption on the predictors as well as an asymmetry assumption. When these assumptions are violated, the asymptotic test with elliptical trimming and clustering remains valid with desirable numerical performance. A backward selection algorithm for the predictors is also provided for the OLS-based test. The performance of the proposed tests and variable selection procedures is evaluated through synthetic examples and a real data analysis. / Statistics
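To illustrate what a backward selection loop of this general shape might look like, here is a toy sketch that uses the leading canonical correlation as a stand-in relevance score. The score, threshold, and function names are assumptions for illustration; this is not the dissertation's CCA-based test or its OLS-based counterpart.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def leading_canonical_corr(X, Y):
    """Leading canonical correlation between predictor block X and
    multivariate response block Y (both 2-D arrays)."""
    U, V = CCA(n_components=1).fit(X, Y).transform(X, Y)
    return abs(np.corrcoef(U[:, 0], V[:, 0])[0, 1])

def backward_select(X, Y, tol=0.05):
    """Toy backward elimination: repeatedly drop the predictor whose removal
    costs the least canonical correlation, while that loss stays below tol."""
    active = list(range(X.shape[1]))
    base = leading_canonical_corr(X[:, active], Y)
    while len(active) > 1:
        losses = [base - leading_canonical_corr(X[:, [j for j in active if j != k]], Y)
                  for k in active]
        i_min = int(np.argmin(losses))
        if losses[i_min] > tol:          # every remaining predictor matters
            break
        active.pop(i_min)                # drop the least useful predictor
        base = leading_canonical_corr(X[:, active], Y)
    return active
```

The dissertation replaces the ad hoc threshold with formal hypothesis tests (CCA-based or OLS-based) evaluated at each elimination step.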
99

Error Analysis for Geometric Finite Element Discretizations of a Cosserat Rod Optimization Problem

Bauer, Robert 08 April 2024 (has links)
In summary, this thesis develops an a priori theory for geometric finite element discretizations of a Cosserat rod model derived from incompatible elasticity. The theory is supported by corresponding numerical experiments that validate the convergence behavior of the proposed method. The main result describes the qualitative behavior of intrinsic H1-errors and L2-errors in terms of the mesh diameter 0 < h ≪ 1 of the approximation scheme. Geometric finite element functions u_h, with the subclasses of geodesic finite elements and projection-based finite elements, are used as conforming, path-independent, and objective discretizations of Cosserat rod configurations. Existence, regularity, variational bounds, and vector field transport estimates for the Cosserat rod model are derived to obtain an intrinsic a priori theory. The second part of the thesis concerns the derivation of the Cosserat rod model from 3D elasticity with prestress, together with numerical experiments for microheterogeneous prestressed materials.
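For orientation only, the standard Euclidean a priori estimates for degree-p conforming finite elements and a sufficiently smooth solution take the form below; the intrinsic statements and convergence orders actually proved for the geometric (manifold-valued) discretization are those given in the thesis and are not reproduced here.

```latex
% Reference Euclidean form only (not the thesis's intrinsic result):
\| u - u_h \|_{H^1} \le C \, h^{p} \, \| u \|_{H^{p+1}}, \qquad
\| u - u_h \|_{L^2} \le C \, h^{p+1} \, \| u \|_{H^{p+1}},
\qquad 0 < h \ll 1 .
```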
100

COPS: Cluster optimized proximity scaling

Rusch, Thomas, Mair, Patrick, Hornik, Kurt January 2015 (has links) (PDF)
Proximity scaling (i.e., multidimensional scaling and related methods) is a versatile statistical method whose general idea is to reduce the multivariate complexity in a data set by employing suitable proximities between the data points and finding low-dimensional configurations where the fitted distances optimally approximate these proximities. The ultimate goal, however, is often not only to find the optimal configuration but to infer statements about the similarity of objects in the high-dimensional space based on the similarity in the configuration. Since these two goals are somewhat at odds, it can happen that the resulting optimal configuration makes inferring similarities rather difficult. In that case the solution lacks "clusteredness" in the configuration (which we call "c-clusteredness"). We present a version of proximity scaling, coined cluster optimized proximity scaling (COPS), which solves the conundrum by introducing a more clustered appearance into the configuration while adhering to the general idea of multidimensional scaling. In COPS, an arbitrary MDS loss function is parametrized by monotonic transformations and combined with an index that quantifies the c-clusteredness of the solution. This index, the OPTICS cordillera, has intuitively appealing properties with respect to measuring c-clusteredness. This combination of MDS loss and index is called "cluster optimized loss" (coploss) and is minimized to push any configuration towards a more clustered appearance. The effect of the method is illustrated with various examples: assessing similarities of countries based on the history of banking crises in the last 200 years, scaling Californian counties with respect to the projected effects of climate change and their social vulnerability, and preprocessing a data set of handwritten digits for subsequent classification by nonlinear dimension reduction. (authors' abstract) / Series: Discussion Paper Series / Center for Empirical Research Methods
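Schematically, with notation assumed for illustration rather than taken from the paper, the combined objective can be written as an MDS stress under a parametrized monotonic transformation minus a weighted c-clusteredness term:

```latex
% Illustrative form only: sigma_theta is an MDS stress under a monotonic
% transformation with parameter theta, OC is the OPTICS cordillera measuring
% c-clusteredness, and v >= 0 trades goodness of fit against clusteredness.
\mathrm{coploss}_{v}(X;\theta) \;=\; \sigma_{\theta}(X) \;-\; v \cdot \mathrm{OC}(X),
\qquad
X^{\ast} \;=\; \operatorname*{arg\,min}_{X,\,\theta}\ \mathrm{coploss}_{v}(X;\theta).
```

Setting v = 0 recovers an ordinary (transformed) MDS fit, while larger v pushes the configuration towards a more clustered appearance.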
