1 |
Aspects of Metric Spaces in Computation
Skala, Matthew Adam, January 2008
Metric spaces, which generalise the properties of commonly-encountered physical and abstract spaces into a mathematical framework, frequently occur in computer science applications. Three major kinds of questions about metric spaces are considered here: the intrinsic dimensionality of a distribution, the maximum number of distance permutations, and the difficulty of reverse similarity search. Intrinsic dimensionality measures the tendency for points to be equidistant, which is diagnostic of high-dimensional spaces. Distance permutations describe the order in which a set of fixed sites appears while moving away from a chosen point; the number of distinct permutations determines the amount of storage space required by some kinds of indexing data structure. Reverse similarity search problems are constraint satisfaction problems derived from distance-based index structures. Their difficulty reveals details of the structure of the space. Theoretical and experimental results are given for these three questions in a wide range of metric spaces, with commentary on the consequences for computer science applications and additional related results where appropriate.
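The two quantities described above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names are hypothetical, the Euclidean metric is only one possible choice, and the statistic rho = mu^2 / (2 sigma^2) is assumed here as a common definition of intrinsic dimensionality, since the abstract does not state a formula.

```python
import math
from itertools import combinations
from statistics import mean, pvariance

def distance_permutation(point, sites, dist=math.dist):
    """Indices of `sites` ordered from nearest to farthest from `point`.

    Ties are broken by site index so the result is deterministic.
    """
    return tuple(sorted(range(len(sites)),
                        key=lambda i: (dist(point, sites[i]), i)))

def count_distinct_permutations(points, sites, dist=math.dist):
    """Number of different distance permutations seen among `points`."""
    return len({distance_permutation(p, sites, dist) for p in points})

def intrinsic_dimensionality(points, dist=math.dist):
    """Assumed statistic rho = mu^2 / (2 * sigma^2) over pairwise distances.

    mu and sigma^2 are the mean and variance of the distances between
    all pairs of points; nearly equidistant points give a large rho.
    """
    d = [dist(p, q) for p, q in combinations(points, 2)]
    return mean(d) ** 2 / (2 * pvariance(d))

# Two fixed sites on a line: points left of the midpoint see site 0 first.
sites = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (2.0, 0.0), (9.0, 0.0)]
n_perms = count_distinct_permutations(points, sites)
```

An index that stores one permutation per database point needs enough bits to distinguish every permutation that can actually occur, which is why the count of distinct permutations bounds the storage cost.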
|
3 |
Daugdarų dimensijos atpažinimo daugiamačiuose duomenyse metodai / Methods for recognition the intrinsic dimensionality of manifolds in the multidimensional data
Makovskaja, Katažina, 27 June 2011
The objective of this master's thesis is to explore methods for estimating the intrinsic dimensionality of manifolds in multidimensional data. Three local estimators were examined: the correlation dimension estimator, the nearest neighbor dimension estimator, and the maximum likelihood estimator. Experiments were run on data of various intrinsic dimensionalities. Distances between neighboring points were calculated in two ways: Euclidean and geodesic.
The investigation led to the following conclusions:
• The maximum likelihood estimator came closest to the true intrinsic dimensionality for artificial data, and also for real data when distances between neighbors were geodesic.
• The correlation dimension estimator gave good results for artificial and real data when distances between neighbors were geodesic. With Euclidean distances, its estimates were very poor.
• The correlation dimension estimator is hard to use in practice because it is difficult to choose suitable parameters (the radii).
• The nearest neighbor estimator works well only for real data with geodesic distances between neighbors. In all other cases examined it performs poorly.
• The nearest neighbor estimator was the least accurate of the three methods.
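As an illustration of the maximum likelihood approach, the sketch below averages local estimates computed from k-nearest-neighbor distances. It assumes the widely used Levina-Bickel form of the estimator, which the abstract does not name, and it uses Euclidean distances only. The function name and the value of `k` are hypothetical choices.

```python
import math
import random

def mle_intrinsic_dimension(points, k=10):
    """Average the local maximum likelihood estimates over all points.

    For each point, T_1 <= ... <= T_k are the distances to its k nearest
    neighbors, and the local estimate is (k - 1) / sum_j log(T_k / T_j).
    Requires len(points) > k.
    """
    local = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        t = dists[:k]
        if t[0] == 0.0:
            continue  # skip duplicated points, whose log-ratio is undefined
        log_sum = sum(math.log(t[-1] / tj) for tj in t[:-1])
        if log_sum > 0.0:
            local.append((k - 1) / log_sum)
    return sum(local) / len(local)

# Synthetic data: a line segment and a planar patch embedded in 3-D space.
rng = random.Random(0)
line = [(u, 2 * u, -u) for u in (rng.random() for _ in range(300))]
plane = [(u, v, u + v)
         for u, v in ((rng.random(), rng.random()) for _ in range(300))]
d_line = mle_intrinsic_dimension(line)
d_plane = mle_intrinsic_dimension(plane)
```

With the `k - 1` normalisation the estimate tends to sit somewhat above the true dimension for small `k`, so the values for the line and the plane should land near, not exactly on, 1 and 2. Swapping `math.dist` for graph shortest-path distances would give the geodesic variant compared in the thesis.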
|
4 |
Information Retrieval Performance Enhancement Using The Average Standard Estimator And The Multi-criteria Decision Weighted Set
Ahram, Tareq, 01 January 2008
Information retrieval is much more challenging than retrieval from a traditional small document collection. The main difference is the importance of correlations between related concepts in complex data structures. These structures have been studied by several information retrieval systems. This research began with a comprehensive review and comparison of several techniques for estimating matrix dimensionality, and of their effects on retrieval performance with singular value decomposition and latent semantic analysis. Two novel techniques are introduced to enhance intrinsic dimensionality estimation: the Multi-criteria Decision Weighted model, which estimates matrix intrinsic dimensionality for large document collections, and the Average Standard Estimator (ASE), which estimates data intrinsic dimensionality from the singular value decomposition (SVD). ASE estimates the level of significance of the singular values produced by the decomposition. It assumes that variables with deep relations are sufficiently correlated, and that only relationships with high singular values are significant and should be retained. Experimental results over all possible dimensions indicated that ASE improved matrix intrinsic dimensionality estimation by accounting for both the magnitude of decrease in the singular values and random noise distractors. Analysis based on selected performance measures indicates that for each document collection there is a region of lower dimensionalities associated with improved retrieval performance. However, the performance measures clearly disagreed on which model performed best.
The multi-weighted model and Analytic Hierarchy Process (AHP) analysis helped to rank the dimensionality estimation techniques and to satisfy overall model goals by balancing conflicting constraints against information retrieval priorities. ASE provided the best estimate of MEDLINE intrinsic dimensionality among all the dimensionality estimation techniques, and it improved precision and relative relevance by 10.2% and 7.4% respectively. AHP analysis indicates that ASE and the weighted model ranked best among the methods, with 30.3% and 20.3% in satisfying overall model goals for MEDLINE and 22.6% and 25.1% for CRANFIELD. The weighted model improved MEDLINE relative relevance by 4.4%, while the scree plot, the weighted model, and ASE estimated the intrinsic dimensionality of the CRANFIELD collection better than Kaiser-Guttman and percentage of variance. ASE also estimated CISI intrinsic dimensionality better than all other tested methods, since every method except ASE tends to underestimate the intrinsic dimensionality of the CISI document collection. ASE improved CISI average relative relevance and average search length by 28.4% and 22.0% respectively. This research provides evidence that a system using a weighted multi-criteria performance evaluation technique achieves better overall performance than a single-criterion ranking model. Thus, the weighted multi-criteria model with dimensionality reduction provides a more efficient implementation for information retrieval than a full-rank model.
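Two of the baseline rules named above, percentage of variance and Kaiser-Guttman, can be sketched from a list of singular values. This is an illustrative Python sketch under stated assumptions, not the thesis's implementation: the 90% threshold is arbitrary, the Kaiser-Guttman variant shown keeps components whose squared singular value exceeds the average, and ASE itself is not reproduced because the abstract does not define it.

```python
def percentage_of_variance(singular_values, threshold=0.9):
    """Smallest k whose leading squared singular values reach `threshold`
    of the total. Assumes the values are sorted in decreasing order."""
    energies = [s * s for s in singular_values]
    total = sum(energies)
    running = 0.0
    for k, energy in enumerate(energies, start=1):
        running += energy
        if running / total >= threshold:
            return k
    return len(energies)

def kaiser_guttman(singular_values):
    """Count components whose squared singular value exceeds the mean.

    Assumed variant: the classical rule keeps eigenvalues above 1 for a
    correlation matrix, which is the same as keeping those above average.
    """
    energies = [s * s for s in singular_values]
    average = sum(energies) / len(energies)
    return sum(1 for energy in energies if energy > average)

# Hypothetical singular values of a small term-document matrix.
spectrum = [10.0, 5.0, 2.0, 1.0, 0.5]
k_variance = percentage_of_variance(spectrum)
k_kaiser = kaiser_guttman(spectrum)
```

On this spectrum the two rules disagree, which mirrors the abstract's observation that different criteria point to different dimensionalities and motivates combining them in a weighted model.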
|
5 |
Prise en compte de l'environnement marin dans le processus de reconnaissance automatique de cibles sous-marines / Underwater environment characterization for automatic target recognition
Picard, Laurent, 18 May 2017
Over the last decades, advances in marine robotics have made it possible to carry out accurate seafloor surveys with autonomous underwater vehicles (AUVs). An AUV equipped with a sidescan sonar can scan a wide area quickly. Navies are naturally interested in such vehicles for underwater mine countermeasures (MCM), since they allow mine hunting missions to be performed rapidly and safely for human operators. Nevertheless, on-board intelligence intended to replace the human operator in sonar image analysis remains challenging. Automatic target recognition (ATR) processes have to cope with the variability of the seafloor: there is a strong relationship between the texture of the seafloor in sidescan sonar images and underwater target detection rates, and on strongly textured or cluttered seafloors ATR performance can degrade severely. Embedding environmental information in the ATR process therefore appears to be a credible way to achieve more effective automatic target recognition. This thesis addresses how to describe the marine environment and how to integrate that description into an ATR process. To this end, a new representation of sonar images based on the monogenic signal is proposed. It provides pixelwise energetic, geometric and structural information about local image texture, and its multi-scale nature accounts for the variability in size of underwater structures. The seafloor is then characterized by estimating the intrinsic dimensionality of the underwater structures, so as to describe sonar images in terms of homogeneity, anisotropy and complexity.
These three descriptors are directly linked to the difficulty of detecting underwater mines and enable an accurate classification of sonar images into homogeneous (benign), anisotropic (rippled) and complex areas. From our point of view, underwater mine hunting cannot be performed in the same way on these three seafloor types, because their particular natures and characteristics pose different challenges for the ATR process. To demonstrate this, we propose a first specific detection algorithm for anisotropic, sand-rippled areas. The algorithm takes into account an environmental description of the ripples, which allows it to outperform classic approaches on this type of seafloor.
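The three-way split into homogeneous, anisotropic and complex regions can be illustrated with a small Python sketch. It deliberately swaps in a structure tensor built from finite-difference gradients as a simpler stand-in for the monogenic signal used in the thesis. The function name and both thresholds are hypothetical.

```python
import math

def classify_patch(img, energy_eps=1e-6, coherence_thresh=0.5):
    """Label a grey-level patch (list of rows of floats) by the rank of
    its structure tensor: no dominant gradient (homogeneous), one
    dominant orientation (anisotropic) or several (complex)."""
    h, w = len(img), len(img[0])
    jxx = jxy = jyy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2  # central differences
            gy = (img[y + 1][x] - img[y - 1][x]) / 2
            jxx += gx * gx
            jxy += gx * gy
            jyy += gy * gy
    energy = (jxx + jyy) / ((h - 2) * (w - 2))
    if energy < energy_eps:
        return "homogeneous"
    # (l1 - l2) / (l1 + l2) for the eigenvalues l1 >= l2 of the tensor
    coherence = math.hypot(jxx - jyy, 2 * jxy) / (jxx + jyy)
    return "anisotropic" if coherence > coherence_thresh else "complex"

# Synthetic 16 x 16 patches standing in for three seafloor types.
size = 16
flat = [[1.0] * size for _ in range(size)]
ripples = [[math.sin(x) for x in range(size)] for _ in range(size)]
clutter = [[math.sin(x) + math.sin(y) for x in range(size)]
           for y in range(size)]
labels = [classify_patch(p) for p in (flat, ripples, clutter)]
```

The coherence measure is close to 1 when gradients share a single orientation, as on sand ripples, and close to 0 when they are spread over several orientations, which is the distinction the abstract draws between anisotropic and complex seafloors.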
|