Global ETD Search

111	Nearest neighbor classification using a density sensitive distance measurement [electronic resource] / Burkholder, Joshua Jeremy. January 2009 (has links) (PDF) Thesis (M.S. in Modeling, Virtual Environments, And Simulations (MOVES))--Naval Postgraduate School, September 2009. / Thesis Advisor(s): Squire, Kevin. "September 2009." Description based on title screen as viewed on November 03, 2009. Author(s) subject terms: Classification, Supervised Learning, k-Nearest Neighbor Classification, Euclidean Distance, Mahalanobis Distance, Density Sensitive Distance, Parzen Windows, Manifold Parzen Windows, Kernel Density Estimation Includes bibliographical references (p. 99-100). Also available in print. Classification. Distances.
112	Children's classificatory abilities and map use Geissman, Anne Elizabeth. January 1981 (has links) Thesis (M.S.)--University of Wisconsin--Madison, 1981. / Typescript. eContent provider-neutral record in process. Description based on print version record. Includes bibliographical references (leaves 116-123). Classification Cartography
113	Analyse critique des méthodes classiques et nouvelle approche par la programmation mathématique en classification automatique Gafner, Thierry. January 1900 (has links) Thesis (doctoral)--Université de Neuchâtel, 1991. / Includes bibliographical references (p. 112) and index. Automatic classification
114	Paper Categorization Using Naive Bayes Cui, Man 29 April 2013 (has links) A literature survey is time-consuming because researchers spend much of it searching for papers of interest. Search engines help find papers that contain a given set of keywords, but one still has to go through these papers to decide whether they are of interest. On the other hand, one can quickly decide which papers are of interest if each of them is labelled with a category. The process of labelling each paper with a category is termed paper categorization, an instance of a more general problem called text classification. In this thesis, we present a text classifier called Iris that makes use of the popular Naive Bayes algorithm. With Iris, we were able to (1) evaluate Naive Bayes using a number of popular datasets, (2) propose a GUI for assisting users with document categorization and searching, and (3) demonstrate how the GUI can be utilized for paper categorization and searching. / Graduate / 0984 Text classification
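As an illustration of the technique named in the abstract above, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the Iris implementation; the function names, the toy documents and the category labels are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect per-class document counts and word counts from tokenized docs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus: two categories, two papers each.
train_docs = [
    "naive bayes text classification".split(),
    "support vector machine text categorization".split(),
    "soil horizons chemical properties".split(),
    "soil classification podzolic horizons".split(),
]
train_labels = ["machine-learning", "machine-learning", "soil-science", "soil-science"]
model = train_naive_bayes(train_docs, train_labels)
```

Calling `predict(model, "podzolic soil".split())` on this toy model assigns the query to the soil category; a real system would need proper tokenization and far more training data.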
115	A revision of the genus Rafnia Thunb. (fam. Fabaceae : sub. fam. Papilionoideae) Richardson, Gaynor Rose-Marie January 1987 (has links) A taxonomic revision of Rafnia Thunb. (Fam. Fabaceae, Subfam. Papilionoideae) is presented in which 21 species are recognised. The relative value of the taxonomic characters is discussed. An electron microscopy study of the seed surface, pollen grains and several sexual characters has been undertaken. Two keys are included, one using vegetative and floral characters and the other using ultrastructure of the testa. Each species description is accompanied by illustrations and a distribution map. Historical and ecological notes on the genus are given. Thunbergia -- Classification
116	Numerical assessment of soil properties in relation to classification and genesis Sondheim, M. January 1982 (has links) Soil properties are examined from two perspectives: (1), in relation to classes and categories of classification systems, and (2), in terms of mathematically tractable, chemical and physical continuums. Through four independent studies, major limitations of each approach are defined and evaluated. The first study examines samples from six different types of horizons commonly found in podzolic soils. The results suggest that in a chemical context the horizons do not represent distinct entities; rather they appear to dominate overlapping regions along a multidimensional chemical spectrum. The second study analyzes the extent to which V.J. Krajina's phytosociological classification of biogeocoenoses explains the variability of a number of site properties. It is determined that many of the physiographic properties are significantly related to the association category of the system, but that many of the pedologic properties are not. The two studies lead to a dichotomy concerning classification and the statistical relationships both among soil properties and between soil properties and other elements of an ecosystem. Where sampling is restricted to comparatively limited ranges along environmental gradients, relationships may be so weak that a classification based on only a few properties or elements may not be that useful for associated properties and elements. On the other hand, because of the implied high degree of variability, attempts to develop a holistic, integrated classification are not likely to be highly successful either. In the third study chemical and physical changes across a prograded beach chronosequence are examined. It is found that soil development over both time and depth may be modelled by a non-linear regression equation. 
The last of the four studies concerns an evaluation of the extent to which the inherent variability of soil properties masks expected trends across a morainal chronosequence. For those properties most affected by vegetation succession, the same type of regression equation as used in the previous study was applied with excellent results. For the other, less dynamic properties, assumed trends were too obscure to model. The two studies suggest that, where soil properties are directly influenced by strong environmental gradients, ordination techniques may be quite illuminating. In less biologically stressful environments and in those which have reached steady state, both the predictive and explanatory capabilities of such techniques may be relatively low. These findings closely parallel those discussed earlier concerning classification. The thesis concludes that for many applications attempts to model and map the landscape as an integrated whole should be abandoned. Furthermore, instead of viewing the landscape from either a classification or ordination perspective, digital terrain models should be considered. Data for the models could be generated from regionalized, statistical, stochastic, and deterministic equations, calibrated with ground truth observations. Traditional polygon and contour maps can also be transformed into digital terrain models. Landscape interpretations could then be tied directly to measured and estimated data. This approach involves a minimum loss of information and is conceptually simple. / Land and Food Systems, Faculty of / Graduate Soils -- Classification
117	The place of cataloguing and classification in the curricula of South African universities Spruyt, M L January 1980 (has links) Bibliography: pages 361-372. / The aim of this study is to determine the place of cataloguing and classification in the library and information science curricula of South African universities today, and to determine whether, in compiling the syllabus comprising bibliographic description and subject analysis, new developments and changes are being taken into consideration. With this in mind, attention has been given to the following: (a) Developments in general have been reconstructed by means of a review of the history of cataloguing and classification, from ancient to present times; (b) a review of the comprehensive development of education for librarianship overseas and in South Africa; and (c) an investigation of the present position of bibliographic description and subject analysis in the curricula of library and information science of South African universities. Cataloging Classification
118	Spatial clustering of linkage disequilibrium blocks for genome-wide association studies / Classification spatiale du déséquilibre de liaison pour les études d'association pangénomique Dehman, Alia 09 December 2015 (has links) Avec le développement récent des technologies de génotypage à haut débit, l'utilisation des études d'association pangénomiques (GWAS) est devenue très répandue dans la recherche génétique. Au moyen de criblage de grandes parties du génome, ces études visent à caractériser les facteurs génétiques impliqués dans le développement de maladies génétiques complexes. Les GWAS sont également basées sur l'existence de dépendances statistiques, appelées déséquilibre de liaison (DL), habituellement observées entre des loci qui sont proches dans l'ADN. Le DL est défini comme l'association non aléatoire d'allèles à des loci différents sur le même chromosome ou sur des chromosomes différents dans une population. Cette caractéristique biologique est d'une importance fondamentale dans les études d'association car elle permet la localisation précise des mutations causales en utilisant les marqueurs génétiques adjacents. Néanmoins, la structure de blocs complexe induite par le DL ainsi que le grand volume de données génétiques constituent les principaux enjeux soulevés par les études GWAS. Les contributions présentées dans ce manuscrit comportent un double aspect, à la fois méthodologique et algorithmique. Sur le plan méthodologie, nous proposons une approche en trois étapes qui tire profit de la structure de groupes induite par le DL afin d'identifier des variants communs qui pourraient avoir été manquées par l'analyse simple marqueur. Dans une première étape, nous effectuons une classification hiérarchique des SNPs avec une contrainte d'adjacence et en utilisant le DL comme mesure de similarité. Dans une seconde étape, nous appliquons une approche de sélection de modèle à la hiérarchie obtenue afin de définir des blocs de DL. 
Enfin, nous appliquons le modèle de régression Group Lasso sur les blocs de DL inférés. L'efficacité de l'approche proposée est comparée à celle des approches de régression standards sur des données simulées, semi-simulées et réelles de GWAS. Sur le plan algorithmique, nous nous concentrons sur l'algorithme de classification hiérarchique avec contrainte spatiale dont la complexité quadratique en temps n'est pas adaptée à la grande dimension des données GWAS. Ainsi, nous présentons, dans ce manuscrit, une mise en œuvre efficace d'un tel algorithme dans le contexte général de n'importe quelle mesure de similarité. En introduisant un paramètre $h$ défini par l'utilisateur et en utilisant la structure de tas-min, nous obtenons une complexité sous-quadratique en temps de l'algorithme de classification hiérarchique avec contrainte d'adjacence, ainsi qu'une complexité linéaire en mémoire en le nombre d'éléments à classer. L'intérêt de ce nouvel algorithme est illustré dans des applications GWAS. / With the recent development of high-throughput genotyping technologies, the usage of Genome-Wide Association Studies (GWAS) has become widespread in genetic research. By screening large portions of the genome, these studies aim to characterize genetic factors involved in the development of complex genetic diseases. GWAS are also based on the existence of statistical dependencies, called Linkage Disequilibrium (LD), usually observed between nearby loci on DNA. LD is defined as the non-random association of alleles at different loci on the same chromosome or on different chromosomes in a population. This biological feature is of fundamental importance in association studies as it provides a fine location of unobserved causal mutations using adjacent genetic markers. Nevertheless, the complex block structure induced by LD as well as the large volume of genetic data are key issues that have arisen with GWA studies. 
The contributions presented in this manuscript are twofold, both methodological and algorithmic. On the methodological side, we propose a three-step approach that explicitly takes advantage of the grouping structure induced by LD in order to identify common variants which may have been missed by single marker analyses. In the first step, we perform a hierarchical clustering of SNPs with an adjacency constraint using LD as a similarity measure. In the second step, we apply a model selection approach to the obtained hierarchy in order to define LD blocks. Finally, we perform Group Lasso regression on the inferred LD blocks. The efficiency of the proposed approach is compared with that of state-of-the-art regression methods on simulated, semi-simulated and real GWAS data. On the algorithmic side, we focus on the spatially-constrained hierarchical clustering algorithm, whose quadratic time complexity is not suited to the high dimensionality of GWAS data. We then present, in this manuscript, an efficient implementation of such an algorithm in the general context of any similarity measure. By introducing a user-parameter $h$ and using the min-heap structure, we obtain a sub-quadratic time complexity of the adjacency-constrained hierarchical clustering algorithm, as well as a linear space complexity in the number of items to be clustered. The value of this novel algorithm is illustrated in GWAS applications. Classification hiérarchique
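The adjacency constraint described in the abstract above can be illustrated with a deliberately naive sketch in which only neighbouring clusters are allowed to merge. This is not the thesis's heap-based, sub-quadratic algorithm: the average-linkage merge rule, the function name, the stopping criterion and the toy similarity matrix are all assumptions made for this example.

```python
def adjacency_constrained_clustering(sim, n_clusters):
    """Greedy agglomeration in which only adjacent clusters may merge.

    sim: symmetric similarity matrix (list of lists), items kept in their
         original (e.g. genomic) order.
    Returns a list of (start, end) index ranges, with end exclusive.
    """
    blocks = [(i, i + 1) for i in range(len(sim))]
    while len(blocks) > n_clusters:
        # Average similarity between every pair of neighbouring blocks.
        scores = []
        for (a0, a1), (b0, b1) in zip(blocks[:-1], blocks[1:]):
            total = sum(sim[i][j] for i in range(a0, a1) for j in range(b0, b1))
            scores.append(total / ((a1 - a0) * (b1 - b0)))
        # Merge the most similar adjacent pair (first one on ties).
        k = max(range(len(scores)), key=scores.__getitem__)
        blocks[k:k + 2] = [(blocks[k][0], blocks[k + 1][1])]
    return blocks


# Hypothetical toy matrix: items 0-2 and items 3-5 form two tight groups.
p = 6
sim = [[0.1] * p for _ in range(p)]
for group in (range(0, 3), range(3, 6)):
    for i in group:
        for j in group:
            sim[i][j] = 0.9
for i in range(p):
    sim[i][i] = 1.0

blocks = adjacency_constrained_clustering(sim, n_clusters=2)
```

Because every merge rescans all neighbouring pairs, this version is far slower than the approach the abstract describes; it is only meant to show what the constraint does to the set of allowed merges.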
119	Statistical learning for omics association and interaction studies based on blockwise feature compression / Apprentissage statistique pour les études d'association et d'interactions entre données omiques fondée sur une approche de compression structurée Guinot, Florent 04 December 2018 (has links) Depuis la dernière décennie le développement rapide des technologies de génotypage a profondément modifié la façon dont les gènes impliqués dans les troubles mendéliens et les maladies complexes sont cartographiés, passant d'approches gènes candidats aux études d'associations pan-génomique, ou Genome-Wide Association Studies (GWASs). Ces études visent à identifier, au sein d'échantillons d'individus non apparentés, des marqueurs génétiques impliqués dans l'expression de maladies complexes. Ces études exploitent le fait qu'il est plus facile d'établir, à partir de la population générale, de grandes cohortes de personnes affectées par une maladie et partageant un facteur de risque génétique qu'au sein d'échantillons apparentés issus d'une même famille, comme c'est le cas dans les études familiales traditionnelles. D'un point de vue statistique, l'approche standard est basée sur le test d'hypothèse: dans un échantillon d'individus non apparentés, des individus malades sont testés contre des individus sains à un ou plusieurs marqueurs. Cependant, à cause de la grande dimension des données, ces procédures de tests classiques sont souvent sujettes à des faux positifs, à savoir des marqueurs faussement identifiés comme étant significatifs. 
Une solution consiste à appliquer une correction sur les p-valeurs obtenues afin de diminuer le seuil de significativité, augmentant en contrepartie le risque de manquer des associations n’ayant qu'un faible effet sur le phénotype. De plus, bien que cette approche ait réussi à identifier des marqueurs génétiques associés à des maladies multi-factorielles complexes (maladie de Crohn, diabète I et II, maladie coronarienne,…), seule une faible proportion des variations phénotypiques attendues des études familiales classiques a été expliquée. Cette héritabilité manquante peut avoir de multiples causes parmi les suivantes: fortes corrélations entre les variables génétiques, structure de la population, épistasie (interactions entre gènes), maladie associée aux variants rares,... Les principaux objectifs de cette thèse sont de développer de nouvelles méthodes statistiques pouvant répondre à certaines des limitations mentionnées ci-dessus. Plus précisément, nous avons développé deux nouvelles approches: la première exploite la structure de corrélation entre les marqueurs génétiques afin d'améliorer la puissance de détection dans le cadre des tests d'hypothèses tandis que la seconde est adaptée à la détection d'interactions statistiques entre groupes de marqueurs méta-génomiques et génétiques permettant une meilleure compréhension de la relation complexe entre environnement et génome sur l'expression d'un caractère. / Over the last decade, rapid advances in genotyping technologies have changed the way genes involved in mendelian disorders and complex diseases are mapped, moving from candidate-gene approaches to linkage disequilibrium mapping. In this context, Genome-Wide Association Studies (GWAS) aim at identifying genetic markers involved in the expression of complex diseases and occurring at different frequencies between unrelated samples of affected individuals and unaffected controls. 
These studies exploit the fact that it is easier to establish, from the general population, large cohorts of affected individuals sharing a genetic risk factor for a complex disease than within individual families, as is the case with traditional linkage analysis. From a statistical point of view, the standard approach in GWAS is based on hypothesis testing, with affected individuals being tested against healthy individuals at one or more markers. However, classical testing schemes are subject to false positives, that is, markers that are falsely identified as significant. One way around this problem is to apply a correction on the p-values obtained from the tests, increasing in return the risk of missing true associations that have only a small effect on the phenotype, which is usually the case in GWAS. Although GWAS have been successful in the identification of genetic variants associated with complex multifactorial diseases (Crohn's disease, diabetes I and II, coronary artery disease,…), only a small proportion of the phenotypic variations expected from classical family studies has been explained. This missing heritability may have multiple causes amongst the following: strong correlations between genetic variants, population structure, epistasis (gene by gene interactions), disease associated with rare variants,… The main objectives of this thesis are thus to develop new methodologies that can address some of the limitations mentioned above. More specifically, we developed two new approaches: the first one is a block-wise approach for GWAS analysis which leverages the correlation structure among the genomic variants to reduce the number of statistical hypotheses to be tested, while in the second we focus on the detection of interactions between groups of metagenomic and genetic markers to better understand the complex relationship between environment and genome in the expression of a given phenotype. Classification hiérarchique
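The trade-off the abstract above describes for p-value correction can be made concrete with a small sketch. The abstract does not name a specific correction; the Bonferroni rule is assumed here purely for illustration, and the function name and p-values are made up.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that stay significant after a Bonferroni correction.

    Each p-value is compared with alpha divided by the number of tests,
    which controls false positives at the cost of statistical power.
    """
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values for five markers.
p_values = [0.0001, 0.004, 0.03, 0.2, 0.6]

uncorrected = [p < 0.05 for p in p_values]
corrected = bonferroni_significant(p_values, alpha=0.05)
```

In this toy case the third marker (p = 0.03) passes the uncorrected 0.05 threshold but fails the corrected one. Testing fewer hypotheses, for instance one per block of correlated markers instead of one per marker, makes the corrected threshold less stringent.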
120	Understanding state-of-the-art material classification through deep visualization Donovan, Jordan 13 December 2019 (has links) Neural networks (NNs) excel at solving several complex, non-linear problems in the area of supervised learning. A prominent application of these networks is image classification. Numerous improvements over the last few decades have improved the capability of these image classifiers. However, neural networks are still a black box for solving image classification and other sophisticated tasks. A number of experiments have looked into exactly how neural networks solve these complex problems. This paper dismantles the neural network solution, incorporating convolution layers, of a specific material classifier. Several techniques are utilized to investigate the solution to this problem. These techniques look specifically at which pixels contribute to the decision made by the NN, as well as at each neuron’s contribution to the decision. The purpose of this investigation is to understand the decision-making process of the NN and to use this knowledge to suggest improvements to the material classification algorithm. Material Classification

Search results