151 |
Seleção supervisionada de características por ranking para processar consultas por similaridade em imagens médicas / Supervised feature selection by ranking to process similarity queries in medical images
Mamani, Gabriel Efrain Humpire, 05 December 2012 (has links)
Obtaining a succinct and representative description of medical images is a challenge long pursued by researchers in medical image processing to support Computer-Aided Diagnosis (CAD). CAD systems use feature extraction algorithms to represent images, so different extractors can be evaluated. However, medical images contain internal structures that are important for identifying tissues, organs, malformations, and diseases. It is usual for a large number of features to be extracted from the images, but what appears to be beneficial can actually impair the indexing and retrieval process, leading to problems such as the curse of dimensionality. It is therefore necessary to select the most relevant features to make the process more efficient and effective. This dissertation developed FSCoMS (Feature Selection based on Compactness Measure from Scatterplots), a supervised feature selection method that ranks features according to what is needed for the type of medical image under analysis, producing leaner and more efficient feature vectors for answering similarity queries. Additionally, the k-Gabor feature extractor was developed, which extracts features by gray level and highlights internal structures of medical images. Experiments were performed on four real-world medical image databases; k-Gabor stands out for its similarity retrieval performance, while FSCoMS reduces feature redundancy, yielding a feature vector smaller than those of conventional feature selection methods with even higher retrieval performance.
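The pipeline described above, rank features with a supervised score, keep the best-ranked ones, and answer similarity queries over the reduced vectors, can be sketched as follows. This is a minimal illustration only: mutual information stands in for the FSCoMS scatterplot compactness measure, and scikit-learn's NearestNeighbors stands in for the similarity-query index, so all names and parameters are assumptions rather than the thesis's actual method.

import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import NearestNeighbors

def rank_and_query(X, y, queries, top_k=32, n_results=5):
    """Rank features with a supervised score, keep the top_k,
    and answer similarity queries on the reduced vectors
    (a stand-in for a ranking-based selector such as FSCoMS)."""
    scores = mutual_info_classif(X, y)           # supervised relevance score per feature
    keep = np.argsort(scores)[::-1][:top_k]      # indices of the top-ranked features
    index = NearestNeighbors(n_neighbors=n_results, metric="euclidean")
    index.fit(X[:, keep])                        # similarity-query index on the reduced vectors
    dist, idx = index.kneighbors(queries[:, keep])
    return keep, dist, idx

# Usage with synthetic data standing in for extracted image features
X = np.random.rand(200, 128)                     # 200 images, 128 features each
y = np.random.randint(0, 4, size=200)            # 4 hypothetical tissue classes
keep, dist, idx = rank_and_query(X, y, X[:3])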
|
152 |
Approches complémentaires pour une classification efficace des textures / Complementary Approaches for Efficient Texture Classification
Nguyen, Vu Lam, 29 May 2018 (has links)
This thesis investigates complementary approaches for classifying texture images without any prior knowledge of the acquisition conditions. It begins by proposing a Local Binary Pattern (LBP) variant for efficient texture classification. In this method, a statistical approach to static texture representation is developed that incorporates complementary gray-level information of the image into LBP-based operators. We name this variant the Completed Local Entropy Binary Patterns (CLEBP). CLEBP captures the distribution of relationships between statistical measures of image data randomness, computed over all pixels within a local structure. Without any pre-learning step or additional parameters to be learned, CLEBP descriptors convey both global and local information about texture while remaining robust to external variations. Furthermore, we use biologically-inspired filtering (BF), which simulates the human retina, as a preprocessing step. We show that our approach and conventional LBP have complementary strengths, and that their combination outperforms either method alone. Experimental results on four large texture databases (Outex, KTH-TIPS-2b, CUReT, and UIUC) show that our approach outperforms contemporary methods.
We then introduce a feature-combination framework for texture classification. In this framework, we combine low-dimensional, rotation- and scale-invariant LBP features with the handcrafted scattering network (ScatNet). The experimental results show that the proposed approach extracts rich features at multiple orientations and scales. Textures are modeled by concatenating the histogram of LBP codes with the mean values of the ScatNet coefficients. We also apply biologically-inspired filtering (BF) as a preprocessing step to enhance the robustness of the LBP features. We demonstrate experimentally that the features extracted by this framework outperform their traditional counterparts on real-world databases containing many classes with significant imaging variations.
In addition, we propose a novel handcrafted network called the normalized convolution network. It is inspired by the ScatNet model with two important modifications: first, normalized convolution replaces standard convolution to extract richer texture features; second, instead of averaging the network coefficients, Fisher vector encoding is used as the aggregation method. Experiments show that the proposed network achieves competitive classification results on many difficult texture benchmarks.
Finally, throughout the thesis, we show experimentally that the proposed approaches achieve good classification results while requiring few computational resources.
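The LBP operator underlying these descriptors compares each pixel with its circular neighborhood and collects the comparison bits into a code whose histogram describes the texture. A minimal sketch of the plain 8-neighbour LBP histogram is given below; it does not include the entropy-based completion (CLEBP) or the biologically-inspired filtering proposed in the thesis.

import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour Local Binary Pattern histogram of a 2-D
    grayscale image (plain LBP, not the CLEBP variant)."""
    img = np.asarray(img, dtype=np.float64)
    center = img[1:-1, 1:-1]
    # offsets of the 8 neighbours, ordered clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy,
                        1 + dx:img.shape[1] - 1 + dx]
        codes |= ((neighbour >= center).astype(np.uint8) << bit)
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()                     # normalised 256-bin descriptor

# Usage on a random patch standing in for a texture image
descriptor = lbp_histogram(np.random.rand(64, 64))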
|
153 |
Effective Linear-Time Feature Selection
Pradhananga, Nripendra, January 2007 (has links)
The classification learning task requires selecting a subset of features to represent the patterns to be classified, because both the performance of the classifier and the cost of classification are sensitive to the choice of features used to construct it. Exhaustive search is impractical since it examines every possible combination of features. Heuristic and random searches run faster, but the problem persists for high-dimensional datasets. We investigate a heuristic, forward, wrapper-based approach, called Linear Sequential Selection, which limits the search space at each iteration of the feature selection process. We then introduce randomization into the search space, yielding an algorithm called Randomized Linear Sequential Selection. Our experiments demonstrate that both methods are faster, find smaller subsets, and can even increase classification accuracy. We also explore ensemble learning and propose two ensemble creation methods, Feature Selection Ensemble and Random Feature Ensemble, both of which apply a feature selection algorithm to create the individual classifiers of the ensemble. Our experiments show that both methods work well with high-dimensional data.
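A generic forward wrapper selection loop, which Linear Sequential Selection further restricts by limiting the candidates examined at each iteration, can be sketched as follows. The k-NN wrapper classifier, the cross-validation setup, and the random candidate sampling are illustrative assumptions, not the thesis's exact configuration.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_wrapper_selection(X, y, max_features=10, candidates_per_step=None, rng=None):
    """Greedy forward selection using a k-NN wrapper.
    candidates_per_step limits how many unused features are tried each
    iteration (randomly sampled), mimicking a restricted search space."""
    rng = rng or np.random.default_rng(0)
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    clf = KNeighborsClassifier(n_neighbors=3)
    while remaining and len(selected) < max_features:
        pool = remaining
        if candidates_per_step is not None and candidates_per_step < len(remaining):
            pool = list(rng.choice(remaining, size=candidates_per_step, replace=False))
        # score each candidate feature added to the current subset
        scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=3).mean() for f in pool}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:         # stop when no candidate improves accuracy
            break
        best_score = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_score

# Usage on synthetic data: 90 samples, 20 features, 3 classes
X = np.random.default_rng(1).normal(size=(90, 20))
y = np.repeat([0, 1, 2], 30)
subset, score = forward_wrapper_selection(X, y, max_features=5, candidates_per_step=8)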
|
154 |
Probabilistic Shape Parsing and Action Recognition Through Binary Spatio-Temporal Feature Description
Whiten, Christopher J., 09 April 2013 (has links)
In this thesis, contributions are presented in the areas of shape parsing for view-based object recognition and spatio-temporal feature description for action recognition. A probabilistic model for parsing shapes into several distinguishable parts for accurate shape recognition is presented. This approach is based on robust geometric features that permit high recognition accuracy.
As the second contribution of this thesis, a binary spatio-temporal feature descriptor is presented. Recent work shows that binary spatial feature descriptors increase the efficiency of object recognition while retaining performance comparable to state-of-the-art descriptors. An extension of these approaches to action recognition is presented, yielding large efficiency gains from the computational advantage of computing a bag-of-words representation with the Hamming distance. A scene's motion and appearance are encoded with a short binary string; exploiting the binary makeup of this descriptor greatly increases efficiency while retaining competitive recognition performance.
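The efficiency argument rests on the Hamming distance between packed binary descriptors reducing to an XOR followed by a bit count, which is much cheaper than a Euclidean distance over floating-point vectors. The sketch below assigns binary descriptors to their nearest bag-of-words codeword under this distance; descriptor length and vocabulary size are illustrative assumptions.

import numpy as np

def hamming_assign(descriptors, codebook):
    """Assign each packed binary descriptor (uint8 rows) to the nearest
    codeword under the Hamming distance, then return the bag-of-words histogram."""
    # XOR every descriptor with every codeword and count differing bits
    xor = descriptors[:, None, :] ^ codebook[None, :, :]          # (N, K, B) uint8
    dist = np.unpackbits(xor, axis=2).sum(axis=2)                 # Hamming distances (N, K)
    words = dist.argmin(axis=1)                                   # nearest codeword per descriptor
    return np.bincount(words, minlength=codebook.shape[0])

# Usage: 500 descriptors of 256 bits (32 bytes), vocabulary of 64 codewords
rng = np.random.default_rng(0)
descs = rng.integers(0, 256, size=(500, 32), dtype=np.uint8)
codebook = rng.integers(0, 256, size=(64, 32), dtype=np.uint8)
bow = hamming_assign(descs, codebook)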
|
155 |
Visual Stereo Odometry for Indoor Positioning
Johansson, Fredrik, January 2012 (has links)
In this master's thesis a visual odometry system is implemented and explained. Visual odometry is a technique that can be used on an autonomous vehicle to determine its current position, and it is preferably used indoors where GPS does not work. The only input to the system is the image stream from a stereo camera, and the output is the current location given as a relative position. In the C++ implementation, image features are detected and matched between the stereo images and the previous stereo pair, which yields 150-250 verified feature matches. The image coordinates are triangulated into a 3D point cloud. The distance between two subsequent point clouds is minimized with respect to rigid transformations, which gives the motion described by six parameters: three for the translation and three for the rotation. Noise in the image coordinates causes reconstruction errors that make the motion estimation very sensitive. The results from six experiments show that the weakness of the system is its ability to distinguish rotations from translations. However, if the system has additional knowledge of how it is moving, the minimization can be done with only three parameters and the system can estimate its position with less than 5 % error.
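The motion-estimation step, finding the rigid transformation that best aligns two corresponding 3D point clouds, has a closed-form least-squares solution via the singular value decomposition (the Kabsch method). The sketch below shows that step in isolation, assuming the point correspondences are already known; it is a generic formulation, not the thesis's C++ implementation.

import numpy as np

def rigid_align(P, Q):
    """Least-squares rigid transform (R, t) mapping point cloud P onto Q,
    where P and Q are (N, 3) arrays of corresponding 3D points."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)        # centroids
    H = (P - cp).T @ (Q - cq)                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t

# Usage: recover a known motion from noisy triangulated points
rng = np.random.default_rng(1)
P = rng.normal(size=(200, 3))
R_true, t_true = np.eye(3), np.array([0.1, 0.0, 0.5])
Q = P @ R_true.T + t_true + 0.01 * rng.normal(size=P.shape)   # noisy correspondences
R, t = rigid_align(P, Q)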
|
156 |
Feature Ranking for Text Classifiers
Makrehchi, Masoud, January 2007 (has links)
Feature selection based on feature ranking has received much
attention from researchers in the field of text classification. The
major reasons are the scalability, ease of use, and fast computation of ranking methods.
However, compared to the search-based feature selection methods such
as wrappers and filters, they suffer from poor performance. This is
linked to their major deficiencies, including: (i) feature ranking
is problem-dependent; (ii) they ignore term dependencies, including
redundancies and correlation; and (iii) they usually fail in
unbalanced data.
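Feature ranking for text amounts to scoring each term against the class labels and keeping the top of the list. The sketch below uses scikit-learn's chi-squared score over a bag-of-words matrix as one concrete ranking measure; it illustrates the general ranking-and-threshold scheme discussed here, not the DFLP meta-ranking measure proposed in the thesis, and the toy documents and labels are invented for illustration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["cheap pills online", "meeting agenda attached",
        "win money now", "project status report"]
labels = [1, 0, 1, 0]                       # 1 = spam, 0 = ham (toy data)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # sparse bag-of-words matrix

selector = SelectKBest(chi2, k=4)           # rank terms by chi^2 and keep the top 4
X_reduced = selector.fit_transform(X, labels)
ranked_terms = [vectorizer.get_feature_names_out()[i]
                for i in selector.scores_.argsort()[::-1]]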
While using feature ranking methods for dimensionality reduction, we
should be aware of these drawbacks, which arise from the function of
feature ranking methods. In this thesis, a set of solutions is
proposed to handle the drawbacks of feature ranking and boost their
performance. First, an evaluation framework called feature
meta-ranking is proposed to evaluate ranking measures. The framework
is based on a newly proposed Differential Filter Level Performance
(DFLP) measure. It was proved that, in ideal cases, the performance
of a text classifier is a monotonic, non-decreasing function of the
number of features. Then we theoretically and empirically validate
the effectiveness of DFLP as a meta-ranking measure to evaluate and
compare feature ranking methods. The meta-ranking framework is also
examined by a stopword extraction problem. We use the framework to
select an appropriate feature ranking measure for building
domain-specific stoplists. The proposed framework is evaluated by
SVM and Rocchio text classifiers on six benchmark data sets. The
meta-ranking method suggests that in searching for a proper feature
ranking measure, the backward feature ranking is as important as the
forward one.
Second, we show that the destructive effect of term redundancy gets
worse as we decrease the feature ranking threshold. It implies that
for aggressive feature selection, an effective redundancy reduction
should be performed as well as feature ranking. An algorithm based
on extracting term dependency links using an information theoretic
inclusion index is proposed to detect and handle term dependencies.
The dependency links are visualized by a tree structure called a
term dependency tree. By grouping the nodes of the tree into two
categories, including hub and link nodes, a heuristic algorithm is
proposed to handle the term dependencies by merging or removing the
link nodes. The proposed method of redundancy reduction is evaluated
by SVM and Rocchio classifiers for four benchmark data sets.
According to the results, redundancy reduction is more effective on
weak classifiers since they are more sensitive to term redundancies.
It also suggests that in those feature ranking methods which compact
the information in a small number of features, aggressive feature
selection is not recommended.
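The dependency-link extraction can be illustrated with a simple co-occurrence-based inclusion index over a binary document-term matrix. The sketch below defines the index as the conditional occurrence probability P(t_i | t_j) and flags high-scoring pairs as candidate dependency links; this definition and the threshold are assumptions, and the thesis's information-theoretic index and hub/link tree construction are not reproduced here.

import numpy as np

def inclusion_matrix(X, threshold=0.8):
    """Pairwise inclusion index between terms of a binary document-term
    matrix X (n_docs x n_terms): inc[i, j] = P(term_i present | term_j present).
    Pairs above the threshold are flagged as candidate dependency links."""
    X = (np.asarray(X) > 0).astype(float)
    co = X.T @ X                                   # co-occurrence counts n(t_i & t_j)
    n_j = np.diag(co)                              # term occurrence counts n(t_j)
    inc = co / np.maximum(n_j[None, :], 1.0)       # conditional occurrence probability
    np.fill_diagonal(inc, 0.0)
    links = np.argwhere(inc >= threshold)          # directed candidate links t_j -> t_i
    return inc, links

# Usage on a toy binary document-term matrix (5 docs, 4 terms)
X = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])
inc, links = inclusion_matrix(X, threshold=0.75)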
Finally, to deal with class imbalance at the feature level using ranking
methods, a local feature ranking scheme called reverse
discrimination approach is proposed. The proposed method is applied
to a highly unbalanced social network discovery problem. In this
case study, the problem of learning a social network is translated
into a text classification problem using newly proposed actor and
relationship modeling. Since social networks are usually sparse
structures, the corresponding text classifiers become highly
unbalanced. Experimental assessment of the reverse discrimination
approach validates the effectiveness of the local feature ranking
method to improve the classifier performance when dealing with
unbalanced data. The application itself suggests a new approach to
learn social structures from textual data.
|
157 |
Efficient Reasoning Techniques for Large Scale Feature Models
Mendonca, Marcilio, January 2009 (has links)
In Software Product Lines (SPLs), a feature model can be used to represent the
similarities and differences within a family of software systems. This allows
describing the systems derived from the product line as a unique combination of
the features in the model. What makes feature models particularly appealing is
the fact that the constraints in the model prevent incompatible features from
being part of the same product.
Despite the benefits of feature models, constructing and maintaining these models
can be a laborious task, especially in product lines with a large number of
features and constraints. As a result, the study of automated techniques to
reason on feature models has become an important research topic in the SPL
community in recent years. Two techniques, in particular, have significant
appeal for researchers: SAT solvers and Binary Decision Diagrams (BDDs). Each
technique has been applied successfully for over four decades now to tackle
many practical combinatorial problems in various domains. Currently, several
approaches have proposed the compilation of feature models to specific logic
representations to enable the use of SAT solvers and BDDs.
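The compilation mentioned here maps a feature model's hierarchy and cross-tree constraints to propositional logic, after which a SAT solver or a BDD can check consistency or enumerate valid products. The brute-force sketch below illustrates the idea on a tiny hypothetical model; real tools compile to CNF and call an off-the-shelf solver instead of enumerating assignments.

from itertools import product

# Tiny hypothetical feature model: root "car" with mandatory "engine",
# optional "gps", alternative engines "gas"/"electric", and "gps" excludes "gas".
features = ["car", "engine", "gas", "electric", "gps"]

def valid(cfg):
    """Propositional constraints derived from the feature model."""
    return (cfg["car"]                                           # root is always selected
            and cfg["engine"] == cfg["car"]                      # mandatory child
            and (cfg["gas"] ^ cfg["electric"]) == cfg["engine"]  # alternative (xor) group
            and (not cfg["gps"] or cfg["car"])                   # optional child implies parent
            and not (cfg["gps"] and cfg["gas"]))                 # cross-tree excludes constraint

# Enumerate all satisfying assignments (the set of valid products)
products_found = []
for values in product([False, True], repeat=len(features)):
    cfg = dict(zip(features, values))
    if valid(cfg):
        products_found.append(cfg)
print(len(products_found), "valid products")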
In this thesis, we argue that several critical issues related to the use of SAT
solvers and BDDs have been consistently neglected. For instance, satisfiability
is a well-known NP-complete problem which means that, in theory, a SAT solver
might be unable to check the satisfiability of a feature model in a feasible
amount of time. Similarly, it is widely known that the size of BDDs can become
intractable for large models. At the same time, we currently do not know
precisely whether these are real issues when feature models, especially large
ones, are compiled to SAT and BDD representations.
Therefore, in our research we provide a significant step forward in the
state of the art by examining in depth many relevant properties of the feature
modeling domain, the mechanics of SAT solvers and BDDs, and the sensitive
issues that arise when these techniques are applied in that domain. Specifically, we
provide more accurate explanations for the space and/or time (in)tractability of
these techniques in the feature modeling domain, and enhance the algorithmic
performance of these techniques for reasoning on feature models. The
contributions of our work include the proposal of novel heuristics to reduce the
size of BDDs compiled from feature models, several insights on the construction
of efficient domain-specific reasoning algorithms for feature models, and
empirical studies to evaluate the efficiency of SAT solvers in handling very
large feature models.
|
160 |
Efficient case-based reasoning through feature weighting, and its application in protein crystallography
Gopal, Kreshna, 02 June 2009 (has links)
Data preprocessing is critical for machine learning, data mining, and pattern
recognition. In particular, selecting relevant and non-redundant features in high-dimensional
data is important to efficiently construct models that accurately describe the
data. In this work, I present SLIDER, an algorithm that weights features to reflect
relevance in determining similarity between instances. Accurate weighting of features
improves the similarity measure, which is useful in learning algorithms like nearest
neighbor and case-based reasoning. SLIDER performs a greedy search for optimum
weights in an exponentially large space of weight vectors. Exhaustive search being
intractable, the algorithm reduces the search space by focusing on pivotal weights at
which representative instances are equidistant to truly similar and different instances in
Euclidean space. SLIDER then evaluates those weights heuristically, based on
effectiveness in properly ranking pre-determined matches of a set of cases, relative to
mismatches.
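The weighting idea can be made concrete with a weighted Euclidean distance and a check of how well a candidate weight vector ranks known matches ahead of mismatches. The sketch below is a generic illustration of that objective, with random search standing in for SLIDER's greedy search over pivotal weights; the data, the scoring function, and the search strategy are all illustrative assumptions.

import numpy as np

def weighted_dist(a, b, w):
    """Euclidean distance with per-feature weights w."""
    return np.sqrt(np.sum(w * (a - b) ** 2, axis=-1))

def mean_match_rank(w, queries, cases, match_ids):
    """For each query, rank all cases by weighted distance and return the
    mean rank of its known true match (lower is better)."""
    ranks = []
    for q, m in zip(queries, match_ids):
        d = weighted_dist(cases, q, w)
        ranks.append(np.argsort(d).tolist().index(m) + 1)
    return float(np.mean(ranks))

# Toy case library and queries; random search stands in for SLIDER's greedy search
rng = np.random.default_rng(0)
cases = rng.normal(size=(100, 12))                 # 100 cases, 12 numerical features
match_ids = rng.integers(0, 100, size=20)
queries = cases[match_ids] + 0.3 * rng.normal(size=(20, 12))   # noisy copies of their matches

best_w, best_rank = np.ones(12), np.inf
for _ in range(200):
    w = rng.random(12)                             # candidate weight vector
    r = mean_match_rank(w, queries, cases, match_ids)
    if r < best_rank:
        best_w, best_rank = w, r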
I analytically show that by choosing feature weights that minimize the mean rank of
matches relative to mismatches, the separation between the distributions of Euclidean
distances for matches and mismatches is increased. This leads to a better distance metric,
and consequently increases the probability of retrieving true matches from a database. I
also discuss how SLIDER is used to improve the efficiency and effectiveness of case
retrieval in a case-based reasoning system that automatically interprets electron density
maps to determine the three-dimensional structures of proteins. Electron density patterns
for regions in a protein are represented by numerical features, which are used in a distance metric to efficiently retrieve matching patterns by searching a large database.
These pre-selected cases are then evaluated by more expensive methods to identify truly
good matches – this strategy speeds up the retrieval of matching density regions, thereby
enabling fast and accurate protein model-building. This two-phase case retrieval
approach is potentially useful in many case-based reasoning systems, especially those
with computationally expensive case matching and large case libraries.
|