11

Correlated data in multivariate analysis

Oliveira, Irene January 2003
After presenting Principal Component Analysis (PCA) and its relationship with time series data sets, we describe most of the existing techniques in this field. Various techniques, e.g. Singular Spectrum Analysis (SSA), Hilbert EOF (HEOF), Extended EOF or Multichannel Singular Spectrum Analysis (MSSA), and Principal Oscillation Pattern (POP) Analysis, can be used for such data. The way each method uses the data matrix, or the covariance or correlation matrix, is what distinguishes it from the others. SSA may be considered as a PCA performed on lagged versions of a single time series, through which we may decompose the original series into its main components. Following SSA we have its multivariate version, MSSA, where the initial data matrix is augmented to include lagged versions of each variable (time series), so that past (or future) behaviour can be used to reanalyse the relationships between variables. In POP Analysis a linear system involving the vector field is analysed, x_{t+1} = A x_t + n_t, in order to "know" the state at time t+1 given the information at time t. The matrix A is estimated using not only the covariance matrix but also the matrix of covariances between the system at the current time and at lag 1. In Hilbert EOF we try to extract some (future) information from the internal correlation of each variable by using the Hilbert transform of each series in an augmented complex matrix, with the data themselves in the real part and the Hilbert-transformed series in the imaginary part: X_t + i X_t^H. In addition to these ideas from the statistics and other literature, we develop a new methodology as a modification of HEOF and POP Analysis, namely Hilbert Oscillation Pattern (HOP) Analysis, and the related idea of Hilbert Canonical Correlation Analysis (HCCA), based on the system x_t^H = A x_t + n_t. Theory and assumptions are presented, and HOP results are related to those extracted from a Canonical Correlation Analysis between the time series data matrix and its Hilbert transform. Examples are given to show the differences and similarities between the results of the HCCA technique and those from PCA, MSSA, HEOF and POPs. We also present PCA for time series as observations, where a technique of linear algebra (PCA) becomes a problem of functional analysis, leading to Functional PCA (FPCA). We further adapt PCA to account for symmetry, discussing the theoretical and practical behaviour of PCA applied to the even part (EPCA) and odd part (OPCA) of the data, and its application to functional data. Comparisons are made between PCA and this modification for the reconstruction of data sets in which considerations of symmetry are especially relevant.
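To make the POP estimation step concrete, here is a minimal sketch (not taken from the thesis; names and shapes are illustrative) of fitting the linear system x_{t+1} = A x_t + n_t by least squares, using the lag-0 covariance and the lag-1 cross-covariance exactly as described above:

    import numpy as np

    def fit_pop(X):
        """Least-squares estimate of A in x_{t+1} = A x_t + n_t,
        for X of shape (T, p) whose rows are the states x_t."""
        Xc = X - X.mean(axis=0)            # centre each variable
        X0, X1 = Xc[:-1], Xc[1:]           # states at times t and t+1
        C0 = X0.T @ X0 / (len(X0) - 1)     # lag-0 covariance
        C1 = X1.T @ X0 / (len(X0) - 1)     # lag-1 cross-covariance
        return C1 @ np.linalg.inv(C0)

    # The POPs themselves are the (generally complex) eigenvectors of A:
    # eigvals, pops = np.linalg.eig(fit_pop(X))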
12

Multivariate multiscale complexity analysis

Ahmed, Mosabber Uddin January 2012
Established dynamical complexity analysis measures operate at a single scale and thus fail to quantify the inherent long-range correlations in real-world data, a key feature of complex systems. They are also designed for scalar time series, whereas multivariate observations are common in modern real-world scenarios, and their simultaneous analysis is a prerequisite for understanding the underlying signal-generating model. To that end, this thesis first introduces a notion of multivariate sample entropy, extending current univariate complexity analysis to the multivariate case. The proposed multivariate multiscale entropy (MMSE) algorithm is shown to be capable of addressing the dynamical complexity of such data directly in the domain where they reside, and at multiple temporal scales, thus making full use of all the available information, both within and across the multiple data channels. Next, the intrinsic multivariate scales of the input data are generated adaptively via the multivariate empirical mode decomposition (MEMD) algorithm. This allows both for generating comparable scales from multiple data channels and for temporal scales of the same length as the input signal, thus removing the critical limitation on input data length in current complexity analysis methods. The resulting MEMD-enhanced MMSE method is also shown to be suitable for non-stationary multivariate data analysis, owing to the data-driven nature of the MEMD algorithm; non-stationarity is the biggest obstacle to meaningful complexity analysis. This thesis thus presents a substantial step forward in this area, introducing robust and physically meaningful complexity estimates of real-world systems, which are typically multivariate, finite in duration, and noisy and heterogeneous in nature. It also allows us to gain a better understanding of the complexity of the underlying multivariate model, with more degrees of freedom and rigour in the analysis. Simulations on both synthetic and real-world multivariate data sets support the analysis.
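As a rough illustration of the multiscale step (a sketch following the standard coarse-graining convention, not the thesis's own code), each channel is averaged over non-overlapping windows before a multivariate sample entropy is computed at every scale:

    import numpy as np

    def coarse_grain(X, scale):
        """Coarse-grain a multivariate series X of shape (T, p) by averaging
        consecutive non-overlapping windows of length `scale`."""
        T = (X.shape[0] // scale) * scale             # drop the ragged tail
        return X[:T].reshape(-1, scale, X.shape[1]).mean(axis=1)

    # MMSE idea: entropy of the coarse-grained series at each scale, where
    # multivariate_sample_entropy stands in (hypothetically) for the measure
    # introduced in the thesis:
    # curve = [multivariate_sample_entropy(coarse_grain(X, s))
    #          for s in range(1, 21)]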
13

Projection based models for high dimensional data

McWilliams, Brian Victor Parulian January 2011
In recent years, many machine learning applications have arisen which deal with the problem of finding patterns in high-dimensional data. Principal component analysis (PCA) has become ubiquitous in this setting. PCA performs dimensionality reduction by estimating latent factors which minimise the reconstruction error between the original data and its low-dimensional projection. We initially consider a situation where influential observations exist within the dataset which have a large, adverse effect on the estimated PCA model. We propose a measure of "predictive influence" to detect these points based on the contribution of each point to the leave-one-out reconstruction error of the model, using an analytic PRedicted REsidual Sum of Squares (PRESS) statistic. We then develop a robust alternative to PCA which minimises the predictive reconstruction error in the presence of influential observations and outliers. In some applications there may be unobserved clusters in the data, for which fitting PCA models to subsets of the data would provide a better fit. This is known as the subspace clustering problem. We develop a novel algorithm for subspace clustering which iteratively fits PCA models to subsets of the data and assigns observations to clusters based on their predictive influence on the reconstruction error. We study the convergence of the algorithm and compare its performance to a number of subspace clustering methods on simulated data and in real applications from computer vision, involving clustering object trajectories in video sequences and images of faces. We extend our predictive clustering framework to a setting where two high-dimensional views of the data have been obtained. Often, only clustering or only predictive modelling is performed between the views. Instead, we aim to recover clusters which are maximally predictive between the views. In this setting, two-block partial least squares (TB-PLS) is a useful model. TB-PLS performs dimensionality reduction in both views by estimating latent factors that are highly predictive. We fit TB-PLS models to subsets of the data and assign points to clusters based on their predictive influence under each model, evaluated using a PRESS statistic. We compare our method to state-of-the-art algorithms in real applications in webpage and document clustering, and find that our approach to predictive clustering yields superior results. Finally, we propose a method for dynamically tracking multivariate data streams based on PLS. Our method learns a linear regression function from multivariate input and output streaming data in an incremental fashion, while also performing dimensionality reduction and variable selection. Moreover, the recursive regression model is able to adapt to sudden changes in the data-generating mechanism and also identifies the number of latent factors. We apply our method to the enhanced index tracking problem in computational finance.
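The thesis derives an analytic PRESS; as a naive illustration of the quantity involved (a direct leave-one-out computation, not the analytic shortcut), the predictive reconstruction error of a rank-k PCA model can be evaluated as follows:

    import numpy as np

    def pca_press(X, k):
        """Naive leave-one-out PRESS for a rank-k PCA model of X (n x p):
        refit PCA without each observation in turn and measure how well the
        held-out point is reconstructed by the resulting subspace."""
        n = X.shape[0]
        press = 0.0
        for i in range(n):
            Xi = np.delete(X, i, axis=0)
            mu = Xi.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xi - mu, full_matrices=False)
            V = Vt[:k].T                               # rank-k loadings
            r = (X[i] - mu) - V @ (V.T @ (X[i] - mu))  # held-out residual
            press += r @ r
        return press / n

    # Observations contributing most to this error are flagged as influential.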
14

A non-parametric procedure to estimate a linear discriminant function with an application to credit scoring

Voorduin, Raquel January 2004
The present work studies the application of two-group discriminant analysis in the field of credit scoring. The view given here provides a completely different approach from the way this problem is usually addressed. Credit scoring is widely used among financial institutions and is performed in a number of ways, depending on a wide range of factors, including the available information, supporting databases and computing resources. Since each financial institution has its own methods of measuring risk, there are at least as many ways of evaluating an applicant for credit on a particular product as there are credit grantors. However, certain standard procedures exist for different products. For example, in the credit card business, when databases containing applicant information are available, credit score cards are usually constructed. These score cards help to qualify the applicant and to decide whether he or she represents a high risk for the institution or, on the contrary, a good investment. Score cards are generally used in conjunction with other criteria, such as the institution's own policies. They are typically built with parametric, regression-based procedures, which require assuming an underlying model generating the data. Moreover, score cards are usually built considering only the probability that a particular applicant will not default. In this thesis, the objective is to present a method of calculating a risk score that does not depend on the actual process generating the data and that takes into account the costs and profits related to accepting a particular applicant. The ultimate objective of the financial institution should be to maximise profit, and this view is a fundamental part of the procedure presented here.
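As a toy illustration of a profit-oriented acceptance rule (the figures and the rule are purely illustrative, not the thesis's procedure), an applicant is worth accepting when the expected profit of doing so is positive:

    def expected_profit(p_default, profit_if_good, loss_if_default):
        """Expected profit from accepting an applicant with an estimated
        probability of default p_default."""
        return (1 - p_default) * profit_if_good - p_default * loss_if_default

    # Accept when the expectation is positive, e.g.
    # expected_profit(0.08, profit_if_good=120.0, loss_if_default=900.0)
    # returns 38.4, so this applicant would be accepted.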
15

Some aspects of longitudinal data analysis

Ricci, Peter J. (Peter Joseph) January 1994
Bibliography: leaves 173-188. / vii, 188 leaves : ill. ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / Thesis (Ph.D.)--University of Adelaide, Dept. of Statistics, 1994
16

Fitting distances and dimension reduction methods with applications

Alawieh, Hiba 13 March 2017
In various studies the number of variables can take high values, which makes their analysis and visualisation quite difficult. However, several statistical methods have been developed to reduce the complexity of such data, allowing a better comprehension of the knowledge they contain. In this thesis, our aim is to propose two new methods of multivariate data analysis: "Multidimensional Fitting" and "Projection under pairwise distance control". The first method is a derivative of multidimensional scaling (MDS) whose application requires the availability of two matrices describing the same population: a coordinate matrix and a distance matrix. The objective is to modify the coordinate matrix so that the distances calculated on the modified matrix are as close as possible to the distances observed in the distance matrix. We develop two extensions of this method: the first penalises the modification vectors of the coordinates, and the second takes into account the random effects that may occur during the modification. The second method is a new dimension reduction technique based on the non-linear projection of points into a reduced space, taking into account the projection quality of each projected point considered individually in the reduced space. The projection of the points is carried out by introducing additional variables, called "radii", which indicate to what extent the projection of each point is accurate.
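A minimal sketch of the fitting idea (illustrative only; the thesis's penalised and random-effects extensions are not shown): move the coordinates so that their pairwise distances match the observed distance matrix, by minimising a least-squares criterion:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.spatial.distance import pdist

    def multidimensional_fit(X0, D):
        """Modify coordinates X0 (n x d) so that their pairwise distances
        approach the observed distance matrix D (n x n)."""
        n, d = X0.shape
        target = D[np.triu_indices(n, k=1)]        # condensed distances

        def stress(x):
            diff = pdist(x.reshape(n, d)) - target
            return np.sum(diff ** 2)

        res = minimize(stress, X0.ravel(), method="L-BFGS-B")
        return res.x.reshape(n, d)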
17

The multiple linear regression model: statistical inference and estimation

Κιουφεντζή, Όλγα 27 August 2008
This thesis concerns linear regression analysis and a further examination of the assumptions of the linear regression model, accompanied by examples from economic theory. It also includes an application minimising the electricity production cost for a number of companies.
18

Multivariate statistical analysis

Καλκούνου, Δήμητρα 02 April 2014
In recent decades, mainly owing to the advent of computers, the character of statistical science has changed and its applications have spread to many fields, since processing large volumes of data has become quick and easy. These new conditions for statistical analysis have led statisticians to develop many theoretical methods. A large branch of these methods is Multivariate Statistical Analysis, an exceptionally interesting direction within statistical science. The final product of a study emerges from a set of measurements made over a set of experiments; with the usual statistical analyses, however, each variable is examined one at a time rather than through the combined action of all of them simultaneously, so we must turn to multivariate analysis. Multivariate analysis deals with statistical methods for collecting, describing and analysing data consisting of measurements on many variables over a number of individuals or, more generally, experimental units. In this thesis we review the relevant techniques and results in order to develop methods for data analysis. My goal is to present a complete statistical analysis of the data based on simultaneous confidence statements. One of the central messages of multivariate analysis is that the p variables must be analysed jointly; we should therefore use multivariate hypothesis tests, which examine the whole vector of each observation rather than individual variables. In addition, we present the method of Multivariate Analysis of Variance (MANOVA), which generalises univariate analysis of variance to the case where more than one variable is examined. It is thus a method for testing whether the means of two or more groups differ and, generalising to settings with several factors, whether those factors affect the mean (now speaking of a vector of means). Many results that hold in the univariate case carry over analogously to the multivariate case, for example the decomposition of the total variance into between-group and within-group variance.
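As a brief sketch of the decomposition mentioned above (illustrative; the group arrays are hypothetical), the between-group and within-group scatter matrices can be computed directly, and a MANOVA statistic such as Wilks' lambda follows from them:

    import numpy as np

    def manova_scatter(groups):
        """Between-group (B) and within-group (W) scatter matrices for a
        list of arrays, each of shape (n_g, p), one per group."""
        grand_mean = np.vstack(groups).mean(axis=0)
        p = grand_mean.size
        B = np.zeros((p, p))
        W = np.zeros((p, p))
        for G in groups:
            m = G.mean(axis=0)
            d = (m - grand_mean)[:, None]
            B += len(G) * (d @ d.T)          # between-group scatter
            W += (G - m).T @ (G - m)         # within-group scatter
        return B, W

    # Wilks' lambda = |W| / |B + W|; small values suggest the group
    # mean vectors differ:
    # B, W = manova_scatter([X1, X2, X3])
    # wilks = np.linalg.det(W) / np.linalg.det(B + W)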
19

Non-linear regression

Τόλιας, Γεώργιος 28 August 2008
A study of non-linear regression models (logistic, exponential, Poisson, generalised linear models) with regard to confidence intervals, hypothesis testing and goodness of fit.
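As a small sketch of fitting one of the models listed (simulated data; the statsmodels library is assumed available), a Poisson generalised linear model with confidence intervals and a goodness-of-fit measure:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 2, size=200)
    y = rng.poisson(np.exp(0.5 + 0.8 * x))   # counts with a log-linear mean

    X = sm.add_constant(x)                    # intercept + predictor
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

    print(fit.params)                         # coefficient estimates
    print(fit.conf_int())                     # 95% confidence intervals
    print(fit.deviance)                       # goodness-of-fit measure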
20

Factor analysis and principal component analysis

Παπαγεωργίου, Ανδρέας 09 March 2011
Principal component analysis is a dimension-reduction technique, used when we have highly correlated variables. It reduces the number of original variables to a smaller number of principal components that capture the greatest possible share of the sample variance. It is a procedure suited to large samples. Factor analysis is a technique for reducing the variables in a sample which identifies a number of latent structures and creates a new set of variables, called common factors, that explain the sample. It assumes a structure of unobservable variables that cannot be measured directly, and estimates the factors that influence and are reflected in the original variables, allowing the researcher to describe, and even to identify, the factors that represent the sample. It also includes specific factors (specific errors) that account for the unreliability of the measurements.
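A minimal sketch of the principal-component computation described above (illustrative; it works from the correlation matrix because the variables are assumed highly correlated and possibly on different scales):

    import numpy as np

    def principal_components(X, k):
        """Scores on the first k principal components of X (n x p) and the
        share of total variance each component explains."""
        Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise
        R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
        eigvals, eigvecs = np.linalg.eigh(R)
        order = np.argsort(eigvals)[::-1]                  # largest first
        loadings = eigvecs[:, order[:k]]
        explained = eigvals[order[:k]] / eigvals.sum()
        return Z @ loadings, explained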
