111

Overview of Redundancy Analysis and Partial Least Squares and Their Extension to the Frequency Domain

Liu, Jinyi Jr 30 April 2011 (has links)
Applied statisticians are often faced with high dimensional data sets when attempting to describe the variability of a single set of variables, or to predict the variation of one set of variables from another. In this study, two data reduction methods are described: Redundancy Analysis and Partial Least Squares. A hybrid approach developed by Bougeard et al. (2007), called Continuum Redundancy-Partial Least Squares, is also described. All three methods are extended to the frequency domain in order to allow the lower dimensional subspace used to describe the variability to change with frequency. To illustrate and compare the three methods and their frequency dependent generalizations, an idealized coupled atmosphere-ocean model is introduced in state space form. This model provides explicit expressions for the covariance and cross-spectral matrices required by the various methods, which allows the strengths and weaknesses of the methods to be identified.
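As background for the first method, classical (time-domain) Redundancy Analysis can be sketched as a reduced-rank regression of the responses on the predictors followed by a PCA of the fitted values. The sketch below is a minimal illustration of that classical formulation, not the frequency-domain extension developed in the thesis; the function names and synthetic data are assumptions.

```python
# Minimal time-domain redundancy analysis via reduced-rank regression,
# assuming centred data matrices X (predictors) and Y (responses).
import numpy as np

def redundancy_analysis(X, Y, n_axes=2):
    """Return the first `n_axes` redundancy variates of Y explained by X."""
    # Least-squares fit of Y on X (lstsq handles rank deficiency).
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ B                            # part of Y predictable from X
    # PCA of the fitted values: leading axes maximise explained variance.
    U, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    scores = U[:, :n_axes] * s[:n_axes]      # site scores
    loadings = Vt[:n_axes].T                 # response-variable loadings
    return scores, loadings

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)); X -= X.mean(axis=0)
Y = X @ rng.standard_normal((5, 3)) + 0.1 * rng.standard_normal((100, 3))
Y -= Y.mean(axis=0)
scores, loadings = redundancy_analysis(X, Y)
```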
112

Integration of computational methods and visual analytics for large-scale high-dimensional data

Choo, Jae gul 20 September 2013 (has links)
With the increasing amount of collected data, large-scale high-dimensional data analysis is becoming essential in many areas. Such data can be analyzed either with fully computational methods or by leveraging human capabilities via interactive visualization, but each approach has its drawbacks. A fully computational method can deal with large amounts of data, yet it lacks the deep understanding of the data that is critical to the analysis. Interactive visualization gives the user deeper insight into the data, but it suffers when large amounts of data need to be analyzed. Even with an apparent need for these two approaches to be integrated, little progress has been made. To tackle this problem, computational methods have to be re-designed both theoretically and algorithmically, and the visual analytics system has to expose these computational methods to users so that they can choose the proper algorithms and settings. To achieve an appropriate integration between computational methods and visual analytics, the thesis focuses on essential computational methods for visualization, such as dimension reduction and clustering, and it presents fundamental developments of computational methods as well as visual analytics systems involving the newly developed methods. The contributions of the thesis include (1) a two-stage dimension reduction framework that better handles the significant information loss in visualizing high-dimensional data, (2) efficient parametric updating of computational methods for fast and smooth user interactions, and (3) an iteration-wise integration framework for computational methods in real-time visual analytics. The latter parts of the thesis focus on the development of visual analytics systems involving the presented computational methods: (1) Testbed, an interactive visual testbed system for various dimension reduction and clustering methods; (2) iVisClassifier, an interactive visual classification system using supervised dimension reduction; and (3) VisIRR, an interactive visual information retrieval and recommender system for large-scale document data.
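To illustrate the general two-stage idea, a common pattern is a fast linear reduction to an intermediate dimension followed by a nonlinear 2-D embedding. The sketch below assumes scikit-learn and uses PCA plus t-SNE as illustrative stand-ins; it is not the thesis's specific framework.

```python
# Two-stage dimension reduction for visualization: a cheap linear stage
# limits the work the expensive nonlinear stage has to do.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 500))   # stand-in for high-dimensional data

# Stage 1: linear reduction to a moderate intermediate dimension.
X_mid = PCA(n_components=50).fit_transform(X)

# Stage 2: nonlinear embedding of the intermediate data into 2-D.
X_2d = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(X_mid)
print(X_2d.shape)   # (1000, 2)
```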
113

Spectral and Homogenization Problems

Goncalves-Ferreira, Rita Alexandria 01 July 2011 (has links)
In this dissertation we address two types of homogenization problems. The first is a spectral problem in the realm of lower dimensional theories, whose physical motivation is the study of wave propagation in a domain of very small thickness into which a very thin net of heterogeneities is introduced. Precisely, we consider an elliptic operator with $\varepsilon$-periodic coefficients and the corresponding Dirichlet spectral problem in a three-dimensional bounded domain of small thickness $\delta$. We study the asymptotic behavior of the spectrum as $\varepsilon$ and $\delta$ tend to zero. This asymptotic behavior depends crucially on whether $\varepsilon$ and $\delta$ are of the same order ($\delta \approx \varepsilon$), or $\varepsilon$ is of order smaller than that of $\delta$ ($\delta = \varepsilon^{\tau}$, $\tau < 1$), or $\varepsilon$ is of order greater than that of $\delta$ ($\delta = \varepsilon^{\tau}$, $\tau > 1$). We consider all three cases. The second problem concerns multiscale homogenization problems with linear growth, aimed at the identification of effective energies for composite materials in the presence of fracture or cracks. Precisely, we characterize $(n+1)$-scale limit pairs $(u, U)$ of sequences $\{(u_{\varepsilon}\mathcal{L}^{N}\lfloor\Omega,\, Du_{\varepsilon}\lfloor\Omega)\}_{\varepsilon>0} \subset \mathcal{M}(\Omega;\mathbb{R}^{d}) \times \mathcal{M}(\Omega;\mathbb{R}^{d \times N})$ whenever $\{u_{\varepsilon}\}_{\varepsilon>0}$ is a bounded sequence in $BV(\Omega;\mathbb{R}^{d})$. Using this characterization, we study the asymptotic behavior of periodically oscillating functionals with linear growth, defined in the space $BV$ of functions of bounded variation and described by $n \in \mathbb{N}$ microscales.
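For orientation, a generic form of such a Dirichlet spectral problem (a standard formulation assumed here for illustration, not quoted from the dissertation) reads:

```latex
% Generic Dirichlet spectral problem with \varepsilon-periodic coefficients
% in a thin domain \Omega_\delta; a standard formulation, not the
% dissertation's exact statement.
\begin{equation*}
  -\operatorname{div}\!\Big( A\!\big(\tfrac{x}{\varepsilon}\big)\,\nabla u_{\varepsilon} \Big)
    = \lambda_{\varepsilon,\delta}\, u_{\varepsilon} \quad \text{in } \Omega_{\delta},
  \qquad
  u_{\varepsilon} = 0 \quad \text{on } \partial\Omega_{\delta},
\end{equation*}
```

where $A(\cdot)$ is periodic, bounded and uniformly elliptic, and the asymptotics of the eigenvalues $\lambda_{\varepsilon,\delta}$ as $\varepsilon, \delta \to 0$ depend on the relative order of $\varepsilon$ and $\delta$, as described above.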
114

COPS: Cluster optimized proximity scaling

Rusch, Thomas, Mair, Patrick, Hornik, Kurt January 2015 (has links) (PDF)
Proximity scaling (i.e., multidimensional scaling and related methods) is a versatile statistical method whose general idea is to reduce the multivariate complexity in a data set by employing suitable proximities between the data points and finding low-dimensional configurations where the fitted distances optimally approximate these proximities. The ultimate goal, however, is often not only to find the optimal configuration but to infer statements about the similarity of objects in the high-dimensional space based on the similarity in the configuration. Since these two goals are somewhat at odds, it can happen that the resulting optimal configuration makes inferring similarities rather difficult. In that case the solution lacks "clusteredness" in the configuration (which we call "c-clusteredness"). We present a version of proximity scaling, coined cluster optimized proximity scaling (COPS), which resolves this conundrum by introducing a more clustered appearance into the configuration while adhering to the general idea of multidimensional scaling. In COPS, an arbitrary MDS loss function is parametrized by monotonic transformations and combined with an index that quantifies the c-clusteredness of the solution. This index, the OPTICS cordillera, has intuitively appealing properties with respect to measuring c-clusteredness. This combination of MDS loss and index is called "cluster optimized loss" (coploss) and is minimized to push any configuration towards a more clustered appearance. The effect of the method is illustrated with various examples: assessing similarities of countries based on the history of banking crises in the last 200 years; scaling Californian counties with respect to the projected effects of climate change and their social vulnerability; and preprocessing a data set of handwritten digits for subsequent classification by nonlinear dimension reduction. (authors' abstract) / Series: Discussion Paper Series / Center for Empirical Research Methods
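To make the coploss idea concrete, the toy sketch below combines a raw MDS stress term with a c-clusteredness term. The silhouette score is used here as an illustrative stand-in for the OPTICS cordillera, and the weight, optimizer, and data are assumptions rather than the authors' implementation.

```python
# Toy "cluster optimized loss": MDS stress minus a weighted clusteredness
# index, minimized over the configuration coordinates.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.optimize import minimize
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
D = pdist(rng.standard_normal((30, 10)))   # input proximities (distances)

def coploss(flat_conf, n=30, dim=2, v=0.5):
    conf = flat_conf.reshape(n, dim)
    stress = np.sum((pdist(conf) - D) ** 2)             # raw MDS stress
    labels = KMeans(n_clusters=3, n_init=5, random_state=0).fit_predict(conf)
    clusteredness = silhouette_score(conf, labels)      # stand-in index
    return stress - v * clusteredness                   # push toward clusters

x0 = rng.standard_normal(30 * 2)
res = minimize(coploss, x0, method="Nelder-Mead", options={"maxiter": 2000})
```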
115

維度縮減應用於蛋白質質譜儀資料 / Dimension Reduction on Protein Mass Spectrometry Data

黃靜文, Huang, Ching-Wen Unknown Date (has links)
In this thesis we study a serum protein data set for prostate cancer, acquired by the Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS) technique; the aim is to decide from the protein intensities whether a subject has cancer. Subjects fall into four classes: normal, benign tumor, early-stage cancer, and late-stage cancer. The data set comes in two versions: the raw data with about 48,000 interval variables, and a manually preprocessed version with 779 variables; each has around 650 observations. Because of the high dimensionality, the data are difficult to analyze and costly to compute with, so the goal of this study is to find effective dimension reduction methods that minimize the classification error rate. We first compare three classification methods: support vector machines, artificial neural networks, and classification and regression trees. The two stronger methods, support vector machines and artificial neural networks, are then applied to the dimension-reduced data. The dimension reduction methods considered are the discrete wavelet transform, principal component analysis, and principal component analysis networks; in our results the discrete wavelet transform and principal component analysis perform well, while principal component analysis networks are only adequate. Finally, we propose an overlap method that combines linear dimension reduction (principal component analysis) with nonlinear dimension reduction (principal component analysis networks) to further improve on the screening error rate of any single dimension reduction method. The improvement from the overlap method is significant on the preprocessed data but not on the raw data.
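A minimal sketch of one branch of such a pipeline, discrete wavelet features followed by an SVM, assuming PyWavelets and scikit-learn; the synthetic data and parameters are illustrative stand-ins, not the thesis's actual pipeline.

```python
# Wavelet-based dimension reduction of spectra, then SVM classification.
import numpy as np
import pywt
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 650, 4096                        # observations x spectrum bins
X = rng.standard_normal((n, p))         # stand-in for intensity spectra
y = rng.integers(0, 4, size=n)          # four classes (stand-in labels)

def dwt_features(spectrum, wavelet="db4", level=6):
    # Keep only the coarse approximation coefficients as low-dimensional
    # features; the detail coefficients are discarded.
    coeffs = pywt.wavedec(spectrum, wavelet, level=level)
    return coeffs[0]

X_red = np.array([dwt_features(row) for row in X])
print(X_red.shape)                       # roughly (650, p / 2**level)

clf = SVC(kernel="rbf", C=1.0)
print(cross_val_score(clf, X_red, y, cv=5).mean())
```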
116

Náhodné kótované množiny a redukce dimenze / Random marked sets and dimension reduction

Šedivý, Ondřej January 2014 (has links)
Random closed sets and random marked closed sets provide an important general concept for describing random objects in a topological space, particularly in Euclidean space. This thesis deals with two major tasks. The first is the dimension reduction problem, in which the dependence of a random closed set on underlying spatial variables is studied. Solving this problem makes it possible to find the most significant regressors or, possibly, to identify the redundant ones. This work provides both theoretical results, based on extending inverse regression techniques from classical to spatial statistics, and numerical justification of the methods via simulation studies. The second topic is the estimation of characteristics of random marked closed sets, primarily motivated by an application in microstructural research. Random marked closed sets provide a mathematical model for describing ultrafine-grained microstructures of metals, and methods for the statistical estimation of selected characteristics of these sets are developed in the thesis. Correct quantitative characterization of the microstructure of metals makes it possible to better understand their macroscopic properties.
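For background, classical sliced inverse regression (SIR), a standard inverse regression technique of the kind the thesis extends to the spatial setting, can be sketched as follows; the data and slice count are illustrative assumptions.

```python
# Classical sliced inverse regression: whiten X, slice the response,
# average the whitened predictors within slices, and eigen-decompose the
# between-slice covariance to recover dimension reduction directions.
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Whitening transform: cov(Xc @ L) = I when inv(cov) = L @ L.T.
    L = np.linalg.cholesky(np.linalg.inv(np.cov(Xc, rowvar=False)))
    Z = Xc @ L
    # Slice observations by sorted response and average Z within slices.
    slices = np.array_split(np.argsort(y), n_slices)
    means = np.array([Z[idx].mean(axis=0) for idx in slices])
    weights = np.array([len(idx) / n for idx in slices])
    # Between-slice covariance of the slice means.
    M = (means.T * weights) @ means
    vals, vecs = np.linalg.eigh(M)
    # Leading eigenvectors (mapped back to X coordinates) span the
    # effective dimension reduction space.
    dirs = L @ vecs[:, ::-1][:, :n_dirs]
    return dirs / np.linalg.norm(dirs, axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))
y = (X @ np.array([1.0, 2.0, 0, 0, 0, 0])) ** 2 + 0.1 * rng.standard_normal(500)
print(sir_directions(X, y).ravel())
```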
117

Matematické modelování tenkých filmů z martenzitických materiálů / Mathematical modelling of thin films of martensitic materials

Pathó, Gabriel January 2015 (has links)
The aim of the thesis is the mathematical and computer modelling of thin films of martensitic materials. We derive a thermodynamic thin-film model on the meso-scale that is capable of capturing the evolutionary process of the shape-memory effect through a two-step procedure: first, we apply dimension reduction techniques to a microscopic bulk model; then we enlarge the scale by neglecting microscopic interfacial effects. Computer modelling of thin films is carried out for the static case, which accounts for a modified Hadamard jump condition allowing for austenite-martensite interfaces that do not exist in the bulk. Further, we characterize $L^p$-Young measures generated by invertible matrices, possibly with positive determinant as well. The gradient case is covered for mappings whose gradients and inverse gradients belong to $L^\infty$; a non-trivial problem is the handling of boundary conditions on generating sequences, since standard cut-off methods are inapplicable due to the determinant constraint. Lastly, we present new results concerning the weak lower semicontinuity of integral functionals, possibly negative and non-coercive, along (asymptotically) $\mathcal{A}$-free sequences.
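For context, the classical Hadamard jump condition, of which the thesis uses a modified thin-film version, states that two deformation gradients can meet continuously across a planar interface only if they are rank-one connected:

```latex
% Classical Hadamard jump condition, stated here for background; the
% thesis works with a modified, thin-film version of this condition.
\begin{equation*}
  F_{+} - F_{-} = a \otimes n,
\end{equation*}
% i.e. a continuous, piecewise affine deformation with gradients F_+ and
% F_- on the two sides of an interface with normal n exists if and only
% if the two gradients differ by a rank-one matrix.
```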
118

Classification non supervisée et sélection de variables dans les modèles mixtes fonctionnels. Applications à la biologie moléculaire / Curve clustering and variable selection in mixed effects functional models. Applications to molecular biology

Giacofci, Joyce 22 October 2013 (has links)
More and more scientific studies yield large amounts of data consisting of sets of curves recorded on individuals. These data can be seen as an extension of longitudinal data in high dimension and are often modeled as functional data in a mixed-effects framework. In the first part we focus on the unsupervised clustering of such curves in the presence of inter-individual variability. To this end, we develop a new procedure based on a wavelet representation of the model, for both fixed and random effects. Our approach follows two steps: a dimension reduction step, based on wavelet thresholding techniques, is performed first; a clustering step is then applied to the selected coefficients. An EM algorithm is used for maximum likelihood estimation of the parameters. The properties of the overall procedure are validated by an extensive simulation study. We also illustrate our method on high-throughput molecular data (omics data) such as microarray CGH or mass spectrometry data. Our procedure is implemented in the R package "curvclust", available on the CRAN website. In the second part, we concentrate on estimation and dimension reduction issues in the mixed-effects functional framework, and we develop two distinct approaches. The first approach deals with parameter estimation in a nonparametric setting; we demonstrate that the functional fixed-effects estimator based on wavelet thresholding techniques achieves the expected rate of convergence toward the true function. The second approach is dedicated to the selection of both fixed and random effects: we propose a method based on a penalized likelihood criterion with SCAD penalties for the estimation and selection of both the fixed effects and the random-effects variances. In this variable selection context we prove that the penalized estimators enjoy the oracle property when the signal size diverges with the sample size. A simulation study is carried out to assess the behaviour of the two proposed approaches.
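A minimal sketch of the two-step procedure, wavelet thresholding followed by EM-based model clustering on the retained coefficients, assuming PyWavelets and scikit-learn; the synthetic curves and universal-threshold rule are illustrative, not the curvclust implementation.

```python
# Step 1: wavelet thresholding of each curve (dimension reduction).
# Step 2: Gaussian mixture clustering (fit by EM) in the wavelet domain.
import numpy as np
import pywt
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
# Two groups of noisy curves with different mean functions.
curves = np.vstack([
    np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((40, 256)),
    np.sin(4 * np.pi * t) + 0.3 * rng.standard_normal((40, 256)),
])

def threshold_coeffs(curve, wavelet="sym8", level=4):
    coeffs = pywt.wavedec(curve, wavelet, level=level)
    # Universal threshold; sigma estimated from the finest detail level.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(curve.size))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return np.concatenate(coeffs)

W = np.array([threshold_coeffs(c) for c in curves])
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(W)
print(labels)
```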
119

Modelling and analysis of wireless MAC protocols with applications to vehicular networks

Jafarian, Javad January 2014 (has links)
Wireless networks have become so popular that most devices will soon rely on them, but increasingly widespread wireless communication creates new challenges in wireless channel access. Multi-channel CSMA protocols have been designed to enhance the throughput of next-generation wireless networks compared to single-channel protocols, but their performance analysis still needs careful consideration. In this thesis, a set of techniques is proposed to model and analyse CSMA protocols in terms of channel sensing and channel access. In that respect, the performance of un-slotted multi-channel CSMA protocols is studied in the presence of hidden terminals. In the modelling phase, important parameters such as shadowing and path-loss impairments are taken into account. Because spectrum sensing is of such importance in CSMA protocols, the Double-Threshold Energy Detector (DTED) is then thoroughly investigated, and an iterative algorithm is proposed to determine optimum values of the detection parameters in a sensing-throughput problem formulation. Vehicle-to-Roadside (V2R) communication, as a part of Intelligent Transportation Systems (ITS), over multi-channel wireless networks is also modelled and analysed in this thesis. By proposing a novel mathematical model, the level of connectivity that an arbitrary vehicle experiences during its packet transmission to a roadside unit (RSU) is investigated.
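To illustrate the double-threshold idea, the sketch below declares a channel idle when the measured energy falls below a lower threshold, busy above an upper threshold, and defers the decision in between (e.g. to cooperative sensing). The thresholds and signal model are assumptions, not the thesis's optimized values.

```python
# Double-threshold energy detector: three-way decision on channel state.
import numpy as np

rng = np.random.default_rng(0)
N = 128                                    # samples per sensing window
noise_var = 1.0
lam_lo, lam_hi = 0.9 * N, 1.3 * N          # lower/upper energy thresholds

def dted_decision(x):
    energy = np.sum(np.abs(x) ** 2) / noise_var
    if energy < lam_lo:
        return "idle"         # confident H0: channel free
    if energy > lam_hi:
        return "busy"         # confident H1: primary user present
    return "uncertain"        # ambiguous region between the thresholds

# Noise-only window vs. signal-plus-noise window.
print(dted_decision(rng.normal(0, 1, N)))
print(dted_decision(rng.normal(0, 1, N) + 0.8 * np.sin(0.3 * np.arange(N))))
```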
120

Text mining Twitter social media for Covid-19 : Comparing latent semantic analysis and latent Dirichlet allocation

Sheikha, Hassan January 2020 (has links)
In this thesis, Twitter social media data are mined for information about the Covid-19 outbreak during March 2020, starting on the 3rd and ending on the 31st. 100,000 tweets were collected from Harvard's open-source data and recreated using Hydrate. These data are analyzed further with different natural language processing (NLP) methods, such as term frequency-inverse document frequency (TF-IDF) weighting, lemmatization, tokenization, Latent Semantic Analysis (LSA), and Latent Dirichlet Allocation (LDA). The results of the LSA and LDA algorithms are dimension-reduced data that are then clustered with the HDBSCAN and K-Means clustering algorithms for later comparison. Different methodologies are used to determine the optimal parameters for the algorithms. All of this is done in the Python programming language, as there are libraries supporting this research, the most important being scikit-learn. The frequent words of each cluster are then displayed and compared with factual data about the outbreak to discover whether there are any correlations. The factual data are collected by the World Health Organization (WHO) and visualized in graphs at ourworldindata.org. Correlations with the results are also sought in news articles to find significant moments and to see whether they affected the top words in the clustered data. The news articles with good timelines used for correlating incidents are those of NBC News and the New York Times. The results show no direct correlations with the data reported by WHO; however, looking at the timelines reported by news sources, some correlation can be seen with the clustered data. The combination of LDA and HDBSCAN yielded the most desirable results in comparison with the other combinations of dimension reduction and clustering, largely because GridSearchCV was used on LDA to determine the ideal parameters for the LDA models on each dataset, and because of how well HDBSCAN clusters its data in comparison to K-Means.
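A condensed sketch of such a pipeline, assuming scikit-learn and the hdbscan package, with a tiny stand-in corpus; the parameter values are illustrative.

```python
# TF-IDF -> LSA (truncated SVD) and counts -> LDA, then K-Means and
# HDBSCAN clustering of the reduced representations.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.model_selection import GridSearchCV
from sklearn.cluster import KMeans
import hdbscan

docs = ["lockdown extended in europe", "who reports new cases",
        "vaccine research accelerates", "cases rise in new york",
        "who warns of pandemic spread", "schools closed amid lockdown"]

# LSA branch: TF-IDF weights, then truncated SVD.
X_tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
X_lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X_tfidf)

# LDA branch: raw counts; GridSearchCV picks the topic count by held-out
# log-likelihood (LDA's default score).
X_counts = CountVectorizer(stop_words="english").fit_transform(docs)
search = GridSearchCV(LatentDirichletAllocation(random_state=0),
                      {"n_components": [2, 3]}, cv=2)
X_lda = search.fit(X_counts).best_estimator_.transform(X_counts)

# Cluster both reduced representations for comparison.
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_lsa))
print(hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(X_lda))
```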
