1

Feature Construction, Selection And Consolidation For Knowledge Discovery

Li, Jiexun January 2007 (has links)
With the rapid advance of information technologies, human beings increasingly rely on computers to accumulate, process, and make use of data. Knowledge discovery techniques have been proposed to automatically search large volumes of data for patterns. Knowledge discovery often requires a set of relevant features to represent the specific domain. My dissertation presents a framework of feature engineering for knowledge discovery, including feature construction, feature selection, and feature consolidation.

Five essays in my dissertation present novel approaches to construct, select, or consolidate features in various applications. Feature construction is used to derive new features when relevant features are unknown. Chapter 2 focuses on constructing informative features from a relational database: I introduce a probabilistic relational model-based approach to construct personal and social features for identity matching, and experiments on a criminal dataset showed that social features can improve matching performance. Chapter 3 focuses on identifying good features for knowledge discovery from text; four types of writeprint features are constructed and shown to be effective for authorship analysis of online messages. Feature selection aims to identify a subset of significant features from a high-dimensional feature space. Chapter 4 presents a framework of feature selection techniques, focusing on identifying marker genes for microarray-based cancer classification; our experiments on gene array datasets showed excellent performance for optimal search-based gene subset selection. Feature consolidation aims to integrate features from diverse data sources or in heterogeneous representations. Chapter 5 presents a Bayesian framework to integrate gene functional relations extracted from heterogeneous data sources such as gene expression profiles, biological literature, and genome sequences. Chapter 6 focuses on kernel-based methods to capture and consolidate information in heterogeneous data representations; I design and compare different kernels for relation extraction from biomedical literature, and experiments show good performance of tree kernels and composite kernels for biomedical relation extraction.

Together, these five essays compose a framework of feature engineering and present different techniques to construct, select, and consolidate relevant features. This feature engineering framework contributes to the domain of information systems by improving the effectiveness, efficiency, and interpretability of knowledge discovery.
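As a rough illustration of the feature selection step described in this abstract, the sketch below runs a univariate filter ahead of a linear classifier on synthetic "gene expression" data. It assumes scikit-learn and NumPy; the data, the ANOVA filter, and the SVM are stand-ins, not the dissertation's optimal search-based gene subset selection.

```python
# Illustrative sketch only: univariate gene selection + linear SVM on
# synthetic microarray-style data (not the dissertation's method).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))      # 100 samples, 2000 synthetic "genes"
y = rng.integers(0, 2, size=100)      # binary class labels
X[y == 1, :10] += 1.0                 # plant signal in 10 "marker genes"

# Keep the k genes with the highest ANOVA F-score, then classify.
model = make_pipeline(SelectKBest(f_classif, k=10), SVC(kernel="linear"))
scores = cross_val_score(model, X, y, cv=5)
print("mean CV accuracy:", scores.mean())
```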
2

Processing and representation in auditory cognition

Dyson, Benjamin J. January 2002 (has links)
No description available.
3

Object recognition from large libraries of line patterns

Huet, Benoit January 1999 (has links)
No description available.
4

Pattern recognition analysis of in vivo magnetic resonance spectra

Tate, Anne Rosemary January 1996 (has links)
No description available.
5

Colour content based image retrieval using spatial and perceptual cues

Seaborn, Matthew January 2001 (has links)
No description available.
6

Advanced process monitoring and control using principal and independent component analysis

Li, Rui Fa January 2003 (has links)
No description available.
7

Machine learning and the data mining of multilingual documents: application to the automatic classification of Arabic documents

Raheel, Saeed 22 October 2010 (has links)
Automatic document classification, an approach at the intersection of machine learning and text mining, has proven very effective for organizing text resources, particularly multilingual ones. Very little work addresses the automatic classification of documents written in Arabic script, despite the morphological richness of the language. This thesis therefore focuses on the automatic classification of Arabic documents. To overcome the difficulties inherent in the automatic processing of Arabic, we rely on a high-performance solution: a deep analysis performed by the morphological analyzer built on the computerized Arabic linguistic resource DIINAR.1.

The choice of attribute type is a key element of any effective automatic classification system and deserves great care, since it directly affects classifier accuracy. One contribution of this thesis is a comparative study of classifiers built on corpora generated from n-grams, stems, lemmas, and words. We found that classifiers based on stems performed better than those based on lemmas and words, and behaved more stably than those based on n-grams.

Most work on classifying Arabic-script documents relies on modern learning algorithms such as support vector machines, naive Bayes, and decision trees, which are known to be among the best-performing classifiers in the field. However, no existing work on the automatic classification of Arabic documents had used boosting. We therefore conducted a comparative study of boosted decision trees (C4.5 with AdaBoost.M1) against plain decision trees (C4.5), support vector machines (SMO), and multinomial naive Bayes (NBM). Boosting did improve C4.5, but the boosted trees could not surpass the accuracy of the SVM and NBM classifiers. We attribute this weakness not to boosting itself but to the nature of decision trees, which are highly sensitive to any change in their underlying data, and boosting repeatedly reweights and modifies that data.

An Arabic document may be written in one language or several, i.e. its content may mix words written in Arabic script with a minority of words written in Latin script. Existing work on the automatic classification of Arabic documents treats the subject from a monolingual point of view, exploiting only the Arabic-script text and eliminating everything written in other languages. A vital part of the information in the documents is thereby deliberately discarded, even though it could have informed the classifier's decision, since assigning a document to one category or another rests chiefly on its content; truncating the text in this way compromises the certainty of the final decision made by the prediction model. The main contribution of this thesis is therefore to address the automatic classification of Arabic documents from a multilingual perspective, preserving as many foreign terms as possible and eliminating only useless ones (e.g. stopwords).
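For readers who want to reproduce the flavor of the classifier comparison above, here is a minimal sketch, assuming scikit-learn, that pits boosted decision trees against a linear SVM and multinomial naive Bayes. The thesis compared C4.5/AdaBoost.M1, SMO, and NBM over Arabic corpora built from n-grams, stems, lemmas, and words; the English stand-in corpus, vectorizer, and estimators here are illustrative substitutes.

```python
# Illustrative comparison of boosted trees vs. linear SVM vs. multinomial NB
# on a stand-in English corpus (the thesis used Arabic corpora).
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

data = fetch_20newsgroups(subset="train",
                          categories=["sci.med", "sci.space"])
classifiers = {
    "boosted trees": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1)),
    "linear SVM": LinearSVC(),
    "multinomial NB": MultinomialNB(),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    acc = cross_val_score(pipe, data.data, data.target, cv=3).mean()
    print(f"{name}: {acc:.3f}")
```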
8

Multi-model image classification in heterogeneous databases

Kachouri, Rostom 29 June 2010 (has links)
Image recognition is widely studied by the scientific community. Research in this field addresses various applications of computer vision systems and the categorization of images from multiple sources. This thesis deals in particular with content-based image recognition systems over heterogeneous databases, whose images belong to different concepts and represent heterogeneous content. To ensure a reliable representation in this setting, a broad description is often required. However, the extracted features are not all necessarily suitable for discriminating the image classes of a given database, hence the need to select relevant features according to the content of each database. This work proposes an original adaptive selection method that retains only the features judged best suited to the content of the image database in use. Moreover, the selected features do not generally perform equally well, so a classification algorithm that adapts to the discriminative power of each selected feature is strongly recommended. In this context, the multiple kernel learning approach is studied and an improved kernel weighting method is presented. This approach turns out to be unable to describe the nonlinear relationships between the different kinds of description. We therefore propose a new hierarchical multi-model classification method that allows a more flexible combination of multiple features. Experimental results confirm the effectiveness and robustness of this new classification approach, and a comparison with a set of approaches from the recent literature shows that the proposed method is very competitive.
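The simplest form of the multiple kernel idea mentioned in this abstract is a fixed-weight sum of base kernels fed to a kernel classifier. The sketch below, assuming scikit-learn and synthetic descriptors, shows that mechanics only; the thesis's improved kernel weighting and hierarchical multi-model combination are not reproduced here.

```python
# Illustrative fixed-weight combination of two precomputed kernels;
# in multiple kernel learning the weight w would be learned, not hand-set.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 16))   # synthetic 16-dim image descriptors
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)

w = 0.6  # hand-chosen kernel weight (a stand-in for a learned weight)
K = w * rbf_kernel(X, gamma=0.1) + (1 - w) * linear_kernel(X)

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```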
9

Comparison of Salient Feature Descriptors

Farzaneh, Sara January 2008 (has links)
In robot navigation and image content search, reliable salient features are of pivotal importance, and salient features are increasingly used in biometric human recognition as well. Regardless of the application, image matching is one of the many problems in computer vision, including object recognition.

This report investigates some salient features for matching sub-images of different images. An underlying assumption is that sub-images, also called image objects, can be recognized by salient features that are recognizable independently.

Since image objects are images of 3D objects, the salient features in 2D images must be invariant to reasonably large changes in viewing direction and distance (scale). These changes are typically due to 3D rotations and translations of the 3D object with respect to the camera. Other changes that influence the matching of two 2D image objects are illumination changes and image acquisition noise.

This thesis discusses how to find salient features, compares them with respect to their matching performance, and explores how invariant they are to rotation and scaling.
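A common concrete instance of salient-feature matching is keypoint detection and descriptor matching with OpenCV. The sketch below uses ORB as a stand-in descriptor (the thesis's exact descriptor set may differ), and the image file names are placeholders.

```python
# Illustrative keypoint matching with ORB; file names are placeholders.
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
assert img1 is not None and img2 is not None, "replace placeholder paths"

orb = cv2.ORB_create()                 # binary descriptor, rotation tolerant
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance suits ORB's binary descriptors; crossCheck keeps
# only mutually best matches.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matches; best distance {matches[0].distance}")
```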
10

Cartoon Character Animation Using Human Facial Feature Transformation

Young, Chiao-Wen 25 July 2001 (has links)
NPR (non-photorealistic rendering) is a new and quickly developing research topic in image processing. The main purpose of NPR is to generate sketches or comics, something different from photographs, automatically by computer algorithms; examples of such applications include pen-and-ink trees and watercolor. There is, on the other hand, another technique called PR (photorealistic rendering), whose goal is to generate realistic objects by computer algorithms; the performance of a PR program depends on how realistic its generated objects are. NPR itself has two modes, one with a physical model and one without:

1. With a physical model: researchers write programs that simulate NPR using the properties of the physical model.
2. Without a physical model: researchers write programs that simulate NPR based on their own observation and deliberation.

Our research belongs to the second kind, NPR without a physical model. Through the efforts of artists, a common consensus about human facial proportions has gradually emerged, producing common standards. Our method proceeds in several steps. First, we read an input frontal photo of a human face and separate the main features of the face, including the horizontal and vertical extrema of the left and right eyebrows, left and right eyes, left and right ears, nose, and mouth, and quantify these facial features. Next, we construct a standard model based on the facial-proportion standards from art. Comparing the values obtained from the input photo with the values in the standard model yields a cartoon face model. Finally, we adjust and exaggerate features according to the scale relations between the features of the input photo and the standard model, and the distances among facial features; the key transformations in this step are enlarging, shrinking, closing, and separating. For varied parts such as the face outline and hair, we extract sample feature points and draw them with Bezier curves from numerical analysis, because cartoon lines, unlike a real human face, are sketched very smoothly and colors are generally uniform. We also provide several characters, a four-panel comic, and a cartoon animation; users can apply the program's output to these test images or image sequences and complete the comics or cartoons from their own frontal face photos. With this research, we hope to reach the goal of making anyone the main character of a comic or cartoon.
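Since the abstract draws face and hair contours with Bezier curves, here is a minimal sketch of evaluating a cubic Bezier curve from four control points, assuming NumPy; the control points are invented for illustration.

```python
# Illustrative cubic Bezier evaluation; control points are made up.
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=100):
    """Sample n points on the cubic Bezier defined by control points p0..p3."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Example: a gentle arc that could approximate a jawline segment.
pts = cubic_bezier(np.array([0.0, 0.0]), np.array([1.0, 2.0]),
                   np.array([3.0, 2.0]), np.array([4.0, 0.0]))
print(pts[:3])  # first few sampled points on the curve
```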
