111

Overview of Redundancy Analysis and Partial Least Squares and Their Extension to the Frequency Domain

Liu, Jinyi Jr 30 April 2011 (has links)
Applied statisticians are often faced with high dimensional data sets when attempting to describe the variability of a single set of variables, or to predict the variation of one set of variables from another. In this study, two data reduction methods are described: Redundancy Analysis and Partial Least Squares. A hybrid approach developed by Bougeard et al. (2007), called Continuum Redundancy-Partial Least Squares, is also described. All three methods are extended to the frequency domain in order to allow the lower dimensional subspace used to describe the variability to change with frequency. To illustrate and compare the three methods and their frequency dependent generalizations, an idealized coupled atmosphere-ocean model is introduced in state space form. This model provides explicit expressions for the covariance and cross-spectral matrices required by the various methods, which allows the strengths and weaknesses of the methods to be identified.
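As background for the first method, classical (time-domain) Redundancy Analysis can be sketched as a reduced-rank regression of the responses on the predictors followed by a PCA of the fitted values. The sketch below is a minimal illustration of that classical formulation, not the frequency-domain extension developed in the thesis; the function names and synthetic data are assumptions.

```python
# Minimal time-domain redundancy analysis via reduced-rank regression,
# assuming centred data matrices X (predictors) and Y (responses).
import numpy as np

def redundancy_analysis(X, Y, n_axes=2):
    """Return the first `n_axes` redundancy variates of Y explained by X."""
    # Least-squares fit of Y on X (lstsq handles rank deficiency).
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ B                            # part of Y predictable from X
    # PCA of the fitted values: leading axes maximise explained variance.
    U, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    scores = U[:, :n_axes] * s[:n_axes]      # site scores
    loadings = Vt[:n_axes].T                 # response-variable loadings
    return scores, loadings

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)); X -= X.mean(axis=0)
Y = X @ rng.standard_normal((5, 3)) + 0.1 * rng.standard_normal((100, 3))
Y -= Y.mean(axis=0)
scores, loadings = redundancy_analysis(X, Y)
```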
112

Integration of computational methods and visual analytics for large-scale high-dimensional data

Choo, Jae gul 20 September 2013 (has links)
With the increasing amount of collected data, large-scale high-dimensional data analysis is becoming essential in many areas. Such data can be analyzed either with fully computational methods or by leveraging human capabilities via interactive visualization, but each approach has its drawbacks. A fully computational method can deal with large amounts of data, yet it lacks the deep understanding of the data that is critical to the analysis. Interactive visualization gives the user deeper insight into the data, but it suffers when large amounts of data need to be analyzed. Even with an apparent need for these two approaches to be integrated, little progress has been made. To tackle this problem, computational methods have to be re-designed both theoretically and algorithmically, and the visual analytics system has to expose these computational methods to users so that they can choose the proper algorithms and settings. To achieve an appropriate integration between computational methods and visual analytics, the thesis focuses on essential computational methods for visualization, such as dimension reduction and clustering, and it presents fundamental developments of computational methods as well as visual analytics systems involving the newly developed methods. The contributions of the thesis include (1) a two-stage dimension reduction framework that better handles the significant information loss in visualizing high-dimensional data, (2) efficient parametric updating of computational methods for fast and smooth user interactions, and (3) an iteration-wise integration framework for computational methods in real-time visual analytics. The latter parts of the thesis focus on the development of visual analytics systems involving the presented computational methods: (1) Testbed, an interactive visual testbed system for various dimension reduction and clustering methods; (2) iVisClassifier, an interactive visual classification system using supervised dimension reduction; and (3) VisIRR, an interactive visual information retrieval and recommender system for large-scale document data.
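To illustrate the general two-stage idea, a common pattern is a fast linear reduction to an intermediate dimension followed by a nonlinear 2-D embedding. The sketch below assumes scikit-learn and uses PCA plus t-SNE as illustrative stand-ins; it is not the thesis's specific framework.

```python
# Two-stage dimension reduction for visualization: a cheap linear stage
# limits the work the expensive nonlinear stage has to do.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 500))   # stand-in for high-dimensional data

# Stage 1: linear reduction to a moderate intermediate dimension.
X_mid = PCA(n_components=50).fit_transform(X)

# Stage 2: nonlinear embedding of the intermediate data into 2-D.
X_2d = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(X_mid)
print(X_2d.shape)   # (1000, 2)
```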
113

Spectral and Homogenization Problems

Goncalves-Ferreira, Rita Alexandria 01 July 2011 (has links)
In this dissertation we address two types of homogenization problems. The first is a spectral problem in the realm of lower dimensional theories, whose physical motivation is the study of wave propagation in a domain of very small thickness into which a very thin net of heterogeneities is introduced. Precisely, we consider an elliptic operator with $\varepsilon$-periodic coefficients and the corresponding Dirichlet spectral problem in a three-dimensional bounded domain of small thickness $\delta$. We study the asymptotic behavior of the spectrum as $\varepsilon$ and $\delta$ tend to zero. This asymptotic behavior depends crucially on whether $\varepsilon$ and $\delta$ are of the same order ($\delta \approx \varepsilon$), or $\varepsilon$ is of order smaller than that of $\delta$ ($\delta = \varepsilon^{\tau}$, $\tau < 1$), or $\varepsilon$ is of order greater than that of $\delta$ ($\delta = \varepsilon^{\tau}$, $\tau > 1$). We consider all three cases. The second problem concerns multiscale homogenization problems with linear growth, aimed at the identification of effective energies for composite materials in the presence of fracture or cracks. Precisely, we characterize $(n+1)$-scale limit pairs $(u, U)$ of sequences $\{(u_{\varepsilon}\mathcal{L}^{N}\lfloor\Omega,\, Du_{\varepsilon}\lfloor\Omega)\}_{\varepsilon>0} \subset \mathcal{M}(\Omega;\mathbb{R}^{d}) \times \mathcal{M}(\Omega;\mathbb{R}^{d \times N})$ whenever $\{u_{\varepsilon}\}_{\varepsilon>0}$ is a bounded sequence in $BV(\Omega;\mathbb{R}^{d})$. Using this characterization, we study the asymptotic behavior of periodically oscillating functionals with linear growth, defined in the space $BV$ of functions of bounded variation and described by $n \in \mathbb{N}$ microscales.
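For orientation, a generic form of such a Dirichlet spectral problem (a standard formulation assumed here for illustration, not quoted from the dissertation) reads:

```latex
% Generic Dirichlet spectral problem with \varepsilon-periodic coefficients
% in a thin domain \Omega_\delta; a standard formulation, not the
% dissertation's exact statement.
\begin{equation*}
  -\operatorname{div}\!\Big( A\!\big(\tfrac{x}{\varepsilon}\big)\,\nabla u_{\varepsilon} \Big)
    = \lambda_{\varepsilon,\delta}\, u_{\varepsilon} \quad \text{in } \Omega_{\delta},
  \qquad
  u_{\varepsilon} = 0 \quad \text{on } \partial\Omega_{\delta},
\end{equation*}
```

where $A(\cdot)$ is periodic, bounded and uniformly elliptic, and the asymptotics of the eigenvalues $\lambda_{\varepsilon,\delta}$ as $\varepsilon, \delta \to 0$ depend on the relative order of $\varepsilon$ and $\delta$, as described above.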
114

COPS: Cluster optimized proximity scaling

Rusch, Thomas, Mair, Patrick, Hornik, Kurt January 2015 (has links) (PDF)
Proximity scaling (i.e., multidimensional scaling and related methods) is a versatile statistical method whose general idea is to reduce the multivariate complexity in a data set by employing suitable proximities between the data points and finding low-dimensional configurations where the fitted distances optimally approximate these proximities. The ultimate goal, however, is often not only to find the optimal configuration but to infer statements about the similarity of objects in the high-dimensional space based on the similarity in the configuration. Since these two goals are somewhat at odds, it can happen that the resulting optimal configuration makes inferring similarities rather difficult. In that case the solution lacks "clusteredness" in the configuration (which we call "c-clusteredness"). We present a version of proximity scaling, coined cluster optimized proximity scaling (COPS), which resolves this conundrum by introducing a more clustered appearance into the configuration while adhering to the general idea of multidimensional scaling. In COPS, an arbitrary MDS loss function is parametrized by monotonic transformations and combined with an index that quantifies the c-clusteredness of the solution. This index, the OPTICS cordillera, has intuitively appealing properties with respect to measuring c-clusteredness. This combination of MDS loss and index is called "cluster optimized loss" (coploss) and is minimized to push any configuration towards a more clustered appearance. The effect of the method is illustrated with various examples: assessing similarities of countries based on the history of banking crises in the last 200 years; scaling Californian counties with respect to the projected effects of climate change and their social vulnerability; and preprocessing a data set of handwritten digits for subsequent classification by nonlinear dimension reduction. (authors' abstract) / Series: Discussion Paper Series / Center for Empirical Research Methods
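To make the coploss idea concrete, the toy sketch below combines a raw MDS stress term with a c-clusteredness term. The silhouette score is used here as an illustrative stand-in for the OPTICS cordillera, and the weight, optimizer, and data are assumptions rather than the authors' implementation.

```python
# Toy "cluster optimized loss": MDS stress minus a weighted clusteredness
# index, minimized over the configuration coordinates.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.optimize import minimize
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
D = pdist(rng.standard_normal((30, 10)))   # input proximities (distances)

def coploss(flat_conf, n=30, dim=2, v=0.5):
    conf = flat_conf.reshape(n, dim)
    stress = np.sum((pdist(conf) - D) ** 2)             # raw MDS stress
    labels = KMeans(n_clusters=3, n_init=5, random_state=0).fit_predict(conf)
    clusteredness = silhouette_score(conf, labels)      # stand-in index
    return stress - v * clusteredness                   # push toward clusters

x0 = rng.standard_normal(30 * 2)
res = minimize(coploss, x0, method="Nelder-Mead", options={"maxiter": 2000})
```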
115

維度縮減應用於蛋白質質譜儀資料 / Dimension Reduction on Protein Mass Spectrometry Data

黃靜文, Huang, Ching-Wen Unknown Date (has links)
In this thesis we study a serum protein data set for prostate cancer, acquired by the Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS) technique; the aim is to decide from the protein intensities whether a subject has cancer. Subjects fall into four classes: normal, benign tumor, early-stage cancer, and late-stage cancer. The data set comes in two versions: the raw data with about 48,000 interval variables, and a manually preprocessed version with 779 variables; each has around 650 observations. Because of the high dimensionality, the data are difficult to analyze and costly to compute with, so the goal of this study is to find effective dimension reduction methods that minimize the classification error rate. We first compare three classification methods: support vector machines, artificial neural networks, and classification and regression trees. The two stronger methods, support vector machines and artificial neural networks, are then applied to the dimension-reduced data. The dimension reduction methods considered are the discrete wavelet transform, principal component analysis, and principal component analysis networks; in our results the discrete wavelet transform and principal component analysis perform well, while principal component analysis networks are only adequate. Finally, we propose an overlap method that combines linear dimension reduction (principal component analysis) with nonlinear dimension reduction (principal component analysis networks) to further improve on the screening error rate of any single dimension reduction method. The improvement from the overlap method is significant on the preprocessed data but not on the raw data.
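A minimal sketch of one branch of such a pipeline, discrete wavelet features followed by an SVM, assuming PyWavelets and scikit-learn; the synthetic data and parameters are illustrative stand-ins, not the thesis's actual pipeline.

```python
# Wavelet-based dimension reduction of spectra, then SVM classification.
import numpy as np
import pywt
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 650, 4096                        # observations x spectrum bins
X = rng.standard_normal((n, p))         # stand-in for intensity spectra
y = rng.integers(0, 4, size=n)          # four classes (stand-in labels)

def dwt_features(spectrum, wavelet="db4", level=6):
    # Keep only the coarse approximation coefficients as low-dimensional
    # features; the detail coefficients are discarded.
    coeffs = pywt.wavedec(spectrum, wavelet, level=level)
    return coeffs[0]

X_red = np.array([dwt_features(row) for row in X])
print(X_red.shape)                       # roughly (650, p / 2**level)

clf = SVC(kernel="rbf", C=1.0)
print(cross_val_score(clf, X_red, y, cv=5).mean())
```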
116

Náhodné kótované množiny a redukce dimenze / Random marked sets and dimension reduction

Šedivý, Ondřej January 2014 (has links)
Random closed sets and random marked closed sets provide an important general concept for describing random objects in a topological space, particularly in Euclidean space. This thesis deals with two major tasks. The first is the dimension reduction problem, in which the dependence of a random closed set on underlying spatial variables is studied. Solving this problem makes it possible to find the most significant regressors or, possibly, to identify the redundant ones. This work provides both theoretical results, based on extending inverse regression techniques from classical to spatial statistics, and numerical justification of the methods via simulation studies. The second topic is the estimation of characteristics of random marked closed sets, primarily motivated by an application in microstructural research. Random marked closed sets provide a mathematical model for describing ultrafine-grained microstructures of metals, and methods for the statistical estimation of selected characteristics of these sets are developed in the thesis. Correct quantitative characterization of the microstructure of metals makes it possible to better understand their macroscopic properties.
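For background, classical sliced inverse regression (SIR), a standard inverse regression technique of the kind the thesis extends to the spatial setting, can be sketched as follows; the data and slice count are illustrative assumptions.

```python
# Classical sliced inverse regression: whiten X, slice the response,
# average the whitened predictors within slices, and eigen-decompose the
# between-slice covariance to recover dimension reduction directions.
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Whitening transform: cov(Xc @ L) = I when inv(cov) = L @ L.T.
    L = np.linalg.cholesky(np.linalg.inv(np.cov(Xc, rowvar=False)))
    Z = Xc @ L
    # Slice observations by sorted response and average Z within slices.
    slices = np.array_split(np.argsort(y), n_slices)
    means = np.array([Z[idx].mean(axis=0) for idx in slices])
    weights = np.array([len(idx) / n for idx in slices])
    # Between-slice covariance of the slice means.
    M = (means.T * weights) @ means
    vals, vecs = np.linalg.eigh(M)
    # Leading eigenvectors (mapped back to X coordinates) span the
    # effective dimension reduction space.
    dirs = L @ vecs[:, ::-1][:, :n_dirs]
    return dirs / np.linalg.norm(dirs, axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))
y = (X @ np.array([1.0, 2.0, 0, 0, 0, 0])) ** 2 + 0.1 * rng.standard_normal(500)
print(sir_directions(X, y).ravel())
```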
117

Matematické modelování tenkých filmů z martenzitických materiálů / Mathematical modelling of thin films of martensitic materials

Pathó, Gabriel January 2015 (has links)
The aim of the thesis is the mathematical and computer modelling of thin films of martensitic materials. We derive a thermodynamic thin-film model on the meso-scale that is capable of capturing the evolutionary process of the shape-memory effect through a two-step procedure: first, we apply dimension reduction techniques to a microscopic bulk model; then we enlarge the scale by neglecting microscopic interfacial effects. Computer modelling of thin films is carried out for the static case, which accounts for a modified Hadamard jump condition allowing for austenite-martensite interfaces that do not exist in the bulk. Further, we characterize $L^p$-Young measures generated by invertible matrices, possibly with positive determinant as well. The gradient case is covered for mappings whose gradients and inverse gradients belong to $L^\infty$; a non-trivial problem is the handling of boundary conditions on generating sequences, since standard cut-off methods are inapplicable due to the determinant constraint. Lastly, we present new results concerning the weak lower semicontinuity of integral functionals, possibly negative and non-coercive, along (asymptotically) $\mathcal{A}$-free sequences.
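For context, the classical Hadamard jump condition, of which the thesis uses a modified thin-film version, states that two deformation gradients can meet continuously across a planar interface only if they are rank-one connected:

```latex
% Classical Hadamard jump condition, stated here for background; the
% thesis works with a modified, thin-film version of this condition.
\begin{equation*}
  F_{+} - F_{-} = a \otimes n,
\end{equation*}
% i.e. a continuous, piecewise affine deformation with gradients F_+ and
% F_- on the two sides of an interface with normal n exists if and only
% if the two gradients differ by a rank-one matrix.
```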
118

Classification non supervisée et sélection de variables dans les modèles mixtes fonctionnels. Applications à la biologie moléculaire / Curve clustering and variable selection in mixed effects functional models. Applications to molecular biology

Giacofci, Joyce 22 October 2013 (has links)
More and more scientific studies yield large amounts of data consisting of sets of curves recorded on individuals. These data can be seen as an extension of longitudinal data in high dimension and are often modeled as functional data in a mixed-effects framework. In the first part we focus on the unsupervised clustering of such curves in the presence of inter-individual variability. To this end, we develop a new procedure based on a wavelet representation of the model, for both fixed and random effects. Our approach follows two steps: a dimension reduction step, based on wavelet thresholding techniques, is performed first; a clustering step is then applied to the selected coefficients. An EM algorithm is used for maximum likelihood estimation of the parameters. The properties of the overall procedure are validated by an extensive simulation study. We also illustrate our method on high-throughput molecular data (omics data) such as microarray CGH or mass spectrometry data. Our procedure is implemented in the R package "curvclust", available on the CRAN website. In the second part, we concentrate on estimation and dimension reduction issues in the mixed-effects functional framework, and we develop two distinct approaches. The first approach deals with parameter estimation in a nonparametric setting; we demonstrate that the functional fixed-effects estimator based on wavelet thresholding techniques achieves the expected rate of convergence toward the true function. The second approach is dedicated to the selection of both fixed and random effects: we propose a method based on a penalized likelihood criterion with SCAD penalties for the estimation and selection of both the fixed effects and the random-effects variances. In this variable selection context we prove that the penalized estimators enjoy the oracle property when the signal size diverges with the sample size. A simulation study is carried out to assess the behaviour of the two proposed approaches.
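A minimal sketch of the two-step procedure, wavelet thresholding followed by EM-based model clustering on the retained coefficients, assuming PyWavelets and scikit-learn; the synthetic curves and universal-threshold rule are illustrative, not the curvclust implementation.

```python
# Step 1: wavelet thresholding of each curve (dimension reduction).
# Step 2: Gaussian mixture clustering (fit by EM) in the wavelet domain.
import numpy as np
import pywt
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
# Two groups of noisy curves with different mean functions.
curves = np.vstack([
    np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((40, 256)),
    np.sin(4 * np.pi * t) + 0.3 * rng.standard_normal((40, 256)),
])

def threshold_coeffs(curve, wavelet="sym8", level=4):
    coeffs = pywt.wavedec(curve, wavelet, level=level)
    # Universal threshold; sigma estimated from the finest detail level.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(curve.size))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return np.concatenate(coeffs)

W = np.array([threshold_coeffs(c) for c in curves])
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(W)
print(labels)
```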
119

Modelling and analysis of wireless MAC protocols with applications to vehicular networks

Jafarian, Javad January 2014 (has links)
Wireless networks have become so popular that most devices will soon rely on them, but increasingly widespread wireless communication creates new challenges in wireless channel access. Multi-channel CSMA protocols have been designed to enhance the throughput of next-generation wireless networks compared to single-channel protocols, but their performance analysis still needs careful consideration. In this thesis, a set of techniques is proposed to model and analyse CSMA protocols in terms of channel sensing and channel access. In that respect, the performance of un-slotted multi-channel CSMA protocols is studied in the presence of hidden terminals. In the modelling phase, important parameters such as shadowing and path-loss impairments are taken into account. Because spectrum sensing is of such importance in CSMA protocols, the Double-Threshold Energy Detector (DTED) is then thoroughly investigated, and an iterative algorithm is proposed to determine optimum values of the detection parameters in a sensing-throughput problem formulation. Vehicle-to-Roadside (V2R) communication, as a part of Intelligent Transportation Systems (ITS), over multi-channel wireless networks is also modelled and analysed in this thesis. By proposing a novel mathematical model, the level of connectivity that an arbitrary vehicle experiences during its packet transmission to a roadside unit (RSU) is investigated.
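To illustrate the double-threshold idea, the sketch below declares a channel idle when the measured energy falls below a lower threshold, busy above an upper threshold, and defers the decision in between (e.g. to cooperative sensing). The thresholds and signal model are assumptions, not the thesis's optimized values.

```python
# Double-threshold energy detector: three-way decision on channel state.
import numpy as np

rng = np.random.default_rng(0)
N = 128                                    # samples per sensing window
noise_var = 1.0
lam_lo, lam_hi = 0.9 * N, 1.3 * N          # lower/upper energy thresholds

def dted_decision(x):
    energy = np.sum(np.abs(x) ** 2) / noise_var
    if energy < lam_lo:
        return "idle"         # confident H0: channel free
    if energy > lam_hi:
        return "busy"         # confident H1: primary user present
    return "uncertain"        # ambiguous region between the thresholds

# Noise-only window vs. signal-plus-noise window.
print(dted_decision(rng.normal(0, 1, N)))
print(dted_decision(rng.normal(0, 1, N) + 0.8 * np.sin(0.3 * np.arange(N))))
```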
120

Text mining Twitter social media for Covid-19 : Comparing latent semantic analysis and latent Dirichlet allocation

Sheikha, Hassan January 2020 (has links)
In this thesis, Twitter social media data are mined for information about the Covid-19 outbreak during March 2020, starting on the 3rd and ending on the 31st. 100,000 tweets were collected from Harvard's open-source data and recreated using Hydrate. These data are analyzed further with different natural language processing (NLP) methods, such as term frequency-inverse document frequency (TF-IDF) weighting, lemmatization, tokenization, Latent Semantic Analysis (LSA), and Latent Dirichlet Allocation (LDA). The results of the LSA and LDA algorithms are dimension-reduced data that are then clustered with the HDBSCAN and K-Means clustering algorithms for later comparison. Different methodologies are used to determine the optimal parameters for the algorithms. All of this is done in the Python programming language, as there are libraries supporting this research, the most important being scikit-learn. The frequent words of each cluster are then displayed and compared with factual data about the outbreak to discover whether there are any correlations. The factual data are collected by the World Health Organization (WHO) and visualized in graphs at ourworldindata.org. Correlations with the results are also sought in news articles to find significant moments and to see whether they affected the top words in the clustered data. The news articles with good timelines used for correlating incidents are those of NBC News and the New York Times. The results show no direct correlations with the data reported by WHO; however, looking at the timelines reported by news sources, some correlation can be seen with the clustered data. The combination of LDA and HDBSCAN yielded the most desirable results in comparison with the other combinations of dimension reduction and clustering, largely because GridSearchCV was used on LDA to determine the ideal parameters for the LDA models on each dataset, and because of how well HDBSCAN clusters its data in comparison to K-Means.
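A condensed sketch of such a pipeline, assuming scikit-learn and the hdbscan package, with a tiny stand-in corpus; the parameter values are illustrative.

```python
# TF-IDF -> LSA (truncated SVD) and counts -> LDA, then K-Means and
# HDBSCAN clustering of the reduced representations.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.model_selection import GridSearchCV
from sklearn.cluster import KMeans
import hdbscan

docs = ["lockdown extended in europe", "who reports new cases",
        "vaccine research accelerates", "cases rise in new york",
        "who warns of pandemic spread", "schools closed amid lockdown"]

# LSA branch: TF-IDF weights, then truncated SVD.
X_tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
X_lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X_tfidf)

# LDA branch: raw counts; GridSearchCV picks the topic count by held-out
# log-likelihood (LDA's default score).
X_counts = CountVectorizer(stop_words="english").fit_transform(docs)
search = GridSearchCV(LatentDirichletAllocation(random_state=0),
                      {"n_components": [2, 3]}, cv=2)
X_lda = search.fit(X_counts).best_estimator_.transform(X_counts)

# Cluster both reduced representations for comparison.
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_lsa))
print(hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(X_lda))
```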
