Global ETD Search

541	Fuzzy kNNModel Applied to Predictive Toxicology Data Mining Guo, G., Neagu, Daniel January 2005 (has links) No / A robust method, fuzzy kNNModel, for toxicity prediction of chemical compounds is proposed. The method is based on a supervised clustering method, called kNNModel, but employs fuzzy partitioning instead of crisp partitioning to group clusters. The merits of fuzzy kNNModel are two-fold: (1) it overcomes the problem of choosing, for each data set, the parameter ε (the allowed error rate in a cluster) and the parameter N (the minimal number of instances covered by a cluster); (2) it better captures the characteristics of boundary data by assigning them different degrees of membership, between 0 and 1, to different clusters. The experimental results of fuzzy kNNModel on thirteen public data sets from the UCI machine learning repository and seven toxicity data sets from real-world applications are compared with the results of fuzzy c-means clustering, k-means clustering, kNN, fuzzy kNN, and kNNModel in terms of classification performance. This application shows that fuzzy kNNModel is a promising method for the toxicity prediction of chemical compounds. Fuzzy kNNModel Classification Predictive toxicology
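As a rough illustration of what "degrees of membership between 0 and 1" can look like for a boundary instance, the sketch below computes memberships with the standard fuzzy c-means formula. This is an assumption made for illustration only: the abstract does not state which membership function fuzzy kNNModel uses, and the function name, the distance inputs, and the fuzzifier m are hypothetical.

```python
def fuzzy_memberships(distances, m=2.0):
    """Fuzzy c-means style memberships from distances to each cluster centre.

    Illustrative only: the membership function of fuzzy kNNModel itself
    is not given in the abstract above.
    """
    # An instance lying exactly on one or more centres belongs fully to them.
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]
    # u_i = d_i^(-2/(m-1)) / sum_j d_j^(-2/(m-1)); memberships sum to 1.
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in distances]
    total = sum(inverse)
    return [value / total for value in inverse]
```

An instance equidistant from two clusters receives equal membership in both, while an instance three times closer to one cluster is weighted heavily toward it, which is how a fuzzy partition treats boundary data differently from a crisp one.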
542	Terrain analysis using data from proprioceptive sensors on mobile robots Larocque, Damien 17 July 2024 (has links) Terrain awareness is key for autonomous mobility. 
As autonomous vehicles are expected to be deployed in complex environments, such as boreal forests, terrain awareness should be studied for a variety of environments. This thesis tackles the problem of terrain analysis from proprioceptive sensor data through theoretical and empirical approaches. Proprioceptive sensors have the benefit of providing direct information about the physical characteristics of a surface, through their effect on the dynamics of an Uncrewed Ground Vehicle (UGV). In Chapter 3, we detail a method to characterize terrains by leveraging skid-steering mobile robot (SSMR) power models with data from a Warthog UGV. Chapter 4 presents the results of using Machine Learning (ML) classifiers to infer the terrain from Husky data. Chapter 5 introduces the BorealTC dataset and demonstrates the use of two Deep Learning (DL) classifiers, a Convolutional Neural Network (CNN) and Mamba, for proprioceptive-based terrain classification (TC). This chapter is adapted from a paper submitted to the 2024 IEEE International Conference on Intelligent Robots and Systems (IROS). Lastly, we discuss the results and limits of each approach and provide some leads to explore for autonomous online terrain characterization. Propriocepteurs. Robots autonomes. Terrains -- Classification.
543	The graminaceous rusts and smuts of Kansas Haard, Richard Thomas. January 1963 (has links) Call number: LD2668 .T4 1963 H32 / Master of Science Rust fungi--Classification. Smut diseases--Classification. Smut fungi--Classification. Masters theses
544	CLASSIFICATION AND DISTRIBUTION OF THE CENTRAL EASTERN PACIFIC ECHINODERMS. MALUF, LINDA YVONNE. January 1987 (has links) A total of 627 echinoderm species (12 crinoids, 185 asteroids, 185 ophiuroids, 95 echinoids and 150 holothuroids) are known from the shallow and deep waters between southern California and southern Peru, and an up-to-date classification scheme is given for them. Distribution tables provide detailed presence-absence data for latitudinal increments, geographic range endpoints, depth ranges, and substrate associations of each species. Annotated lists of all species include relevant synonyms and mistaken records as well as literature citations used for both lists and distribution tables. A species-level biogeographic analysis shows that echinoderm provinces conform to those generally observed for other marine taxa, including mollusks, crustaceans and fishes. Based on cluster analysis and more traditional approaches (using species richness, faunal turnover and faunal composition), overall faunal similarity of the shelf echinoderms is very high between 23°N and 4°S, in the tropical Panamic province. There is a northern warm-temperate fauna (California province) between Pt. Conception, California and Pt. Eugenio, Baja California that also extends into lower Baja and the Gulf of California. Warm-temperate elements in the subtropical Gulf of California distinguish it from the tropics, and it is recognized as a faunal province in spite of its low endemism. Echinoderm endemism is unusually high in the Galapagos province and is attributed to the wide habitat diversity and isolation of the archipelago. There is no evidence for a Mexican province, but there is evidence for a distinction between the tropics to the north and south of Costa Rica/Panama. Transition zones (especially in Panama and southern California) often have high species richness, increased habitat diversity, and a number of endemic species. 
The warm-water eastern Pacific genera are most closely related to those of the west Atlantic tropics, but very few species are shared between the regions. Trans-Pacific species in the CEP are widespread throughout the region. A confinement of Indo-Pacific species to offshore CEP islands is only seen at Clipperton Island, the lone coral atoll of the eastern Pacific.
545	Boundaries for use in wheat variety classification use in Australia Williams, Richard Malcolm January 2006 (has links) Suppliers of wheat must ensure that their products have the quality profile demanded by customers and consistently deliver that quality in order to be competitive. Australia’s wheat industry is highly exposed to such competitive threats because it relies heavily on exports. An integral component in maintaining Australia’s competitiveness has been its classification system. The first step involves the complex process of determining a genotypic quality profile of each variety, known as a variety classification. At harvest, subsequent steps are the use of a statutory declaration and testing of physical quality traits. Together these steps determine how deliveries of wheat are segregated. A single variety can have different classifications across the 7 classification regions of Australia. Most classification regions are divided along state borders, and these do not reflect potential environmental influences. / The manner in which Australian wheat breeding programs now tackle their task has changed since 1999. The commercially focused companies of the current era have national targets to remain viable, and are focused on costs. Other developments associated with the change are the introduction of different sources of parental material and a move to more economical composite quality testing regimes instead of the individual site-by-site testing used in the past. Together, these factors, particularly variety adaptability and stability of performance, have the capacity to increase variability. The likelihood of variation is further increased given that the current classification regions upon which classification decisions are made do not adequately reflect environmental effects on the expression of quality. 
To determine whether better divisions of the Australian wheat-belt could be identified for variety classification purposes, a substantial spatial and temporal database of historical quality results was assembled. The creation of this relational database was unique, because never before had expansive sets of independent, state-based quality sub-sets been joined together. However, the data were unbalanced and required alternative statistical tools to be analysed. The relational database was the platform from which three phases of research were conducted. / The first research phase investigated the extent of cross over, or re-ranking of results, statistically referred to as genotype x environment interaction. The approach was to assess balanced data sets, following the most common method identified in the literature. The results of those analyses showed that the size of the genotype x environment interaction was small compared with the main effects of genotype and environment. The second phase of research focused on identifying alternative boundaries for classification purposes. Test divisions were compared with the current set of 7 classification regions for their capacity to minimise environmental variance while maintaining differences between the zones of a set. Test divisions were based on fourteen published divisions of the Australian wheat-belt. Analyses were conducted using residual maximum likelihood because of the unbalanced structure of the data. Estimates of variance components, quality trait means and standard errors were calculated. Consideration of such estimates resulted in the identification of 4 different divisions of the wheat-belt that had low environmental variance levels for important quality traits such as maximum resistance, dough development time, and water absorption. 
/ In addition, these 4 divisions of the wheat-belt had fewer zones than the existing set of classification regions because they linked separate parts of the wheat-belt together. In order of decreasing merit, the 4 divisions of the wheat-belt represented average October maximum temperatures; agro-ecological zones reported by Williams et al. (2002); average annual rainfall; and Departments of Agriculture recommendation zones. A final phase of crosschecking was performed to assess the veracity of the 4 identified divisions. A cluster analysis supported the orientation of their boundaries, and it was also observed that the use of fixed boundaries for classification purposes would not be negatively affected by seasonal variation. The 4 divisions of the wheat-belt identified in this research support the use of environmentally focused classification boundaries. In addition to improving the capacity to segregate consistent quality, the linking of geographically separate production areas of the wheat-belt reduced the number of zones, and this offers process efficiencies.
546	Pendent Usnea (Lichens; Ascomycetes; Parmeliaceae) in Western Oregon : taxonomy; morphological characters; and geographical distribution Pittam, Sherry K. 14 March 1995 (has links) Pendent Usnea species were collected in western Oregon and examined. Character states, such as cortex-medulla-axis ratio; fibril length; papilla diameter; branching patterns; and presence or absence of fibrils, papillae, soredia, isidia; plus chemistry, were recorded and analyzed by inspection for differences. Historical names were researched in the literature. The species concepts used in these accounts were compared, and many conflicting concepts were encountered. Selected morphological characters were examined by scanning electron microscope or dissecting microscope, then described and illustrated. The characters reviewed included articulate fissures; isidia and soredia; cortex-medulla-axis ratio; papillae; and foveate pits. Species determinations were made for field collections. Names were found for all specimens inspected without introducing new names at this time. Eight pendent species were found in western Oregon; they are Usnea cavernosa, Usnea ceratina, Usnea filipendula, Usnea hesperina subsp. liturata, Usnea inflata, Usnea leucosticta, Usnea longissima, and Usnea merrillii. A practical key to taxa with descriptions is provided and geographic distributions are recorded in tables and maps. / Graduation date: 1995 Lichens -- Oregon -- Classification Ascomycetes -- Oregon -- Classification Parmeliaceae -- Oregon -- Classification Lichens -- Morphology Lichens -- Geographical distribution
547	Supervised Classification of Missense Mutations as Pathogenic or Tolerated using Ensemble Learning Methods Balasubramanyam, Rashmi January 2017 (has links) (PDF) Missense mutations account for more than 50% of the mutations known to be involved in human inherited diseases. Missense classification is a challenging task that involves sequencing of the genome, identifying the variations, and assessing their deleteriousness. This is a very laborious, time and cost intensive task to be carried out in the laboratory. Advancements in bioinformatics have led to several large-scale next-generation genome sequencing projects, and subsequently the identification of genome variations. Several studies have combined this data with information on established deleterious and neutral variants to develop machine learning based classifiers. There are significant issues with the missense classifiers due to which missense classification is still an open area of research. These issues can be classified under two broad categories: (a) Dataset overlap issue - where the performance estimates reported by the state-of-the-art classifiers are overly optimistic as they have often been evaluated on datasets that have significant overlaps with their training datasets. Also, there is no comparative analysis of these tools using a common benchmark dataset that contains no overlap with the training datasets, therefore making it impossible to identify the best classifier among them. Also, such a common benchmark dataset is not available. (b) Inadequate capture of vital biological information of the protein and mutations - such as conservation of long-range amino acid dependencies, changes in certain physico-chemical properties of the wild-type and mutant amino acids, due to the mutation. It is also not clear how to extract and use this information. Also, some classifiers use structural information that is not available for all proteins. 
In this study, we compiled a new dataset of 18,036 mutations in 5,642 proteins, with around 2-15% overlap with the popularly used training datasets. We reviewed and evaluated 15 state-of-the-art missense classifiers (SIFT, PANTHER, PROVEAN, PhD-SNP, Mutation Assessor, FATHMM, SNPs&GO, SNPs&GO3D, nsSNPAnalyzer, PolyPhen-2, SNAP, MutPred, PON-P2, CONDEL and MetaSNP) using six metrics: accuracy, sensitivity, specificity, precision, NPV and MCC. When evaluated on our dataset, the classifiers show large performance drops relative to what has been claimed. The average drops in performance for these 13 classifiers are around 15% in accuracy, 17% in sensitivity, 14% in specificity, 7% in NPV, 24% in precision and 30% in MCC. With this we show that the performance of these tools is not consistent across datasets, and thus not reliable for practical use in a clinical setting. As we observed that the performance of the existing classifiers is poor in general, we tried to develop a new classifier that is robust, performs consistently across datasets, and is better than the state-of-the-art classifiers. We developed a novel method of capturing long-range amino acid dependency conservation by boosting the conservation frequencies of substrings of amino acids of various lengths around the mutation position using the AdaBoost learning algorithm. This score alone performed equivalently to the sequence conservation based tools in classifying missense mutations. Popularly used sequence conservation properties were then combined with these boosted long-range dependency conservation scores using the AdaBoost algorithm. This reduced the class bias and improved the overall accuracy of the classifier. We trained a third classifier by incorporating the changes in 21 important physico-chemical properties caused by the mutation. In this case, we observed that the overall performance further improved and the class bias was further reduced. 
The performance of our final classifier is comparable with that of the state-of-the-art classifiers. We did not find any significant improvement, but the class-specific accuracies and precisions are marginally better, by around 1-2%, than those of the existing classifiers. In order to understand our classifier better, we dissected our benchmark dataset into (a) seen and unseen proteins, and (b) pure and mixed proteins, and analysed the performance in detail. Finally, we concluded that our classifier performs consistently across each of these categories of seen, unseen, pure and mixed proteins. Missense Classification Supervised Classification Ensemble Learning Methods Missense Mutation Classification Missense Classifiers Missense Mutations Computer Science
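For readers unfamiliar with the six metrics named in the abstract above, the following minimal sketch computes them from the four counts of a binary confusion matrix using their standard textbook definitions. It is not the thesis's evaluation code; the function name and the convention of returning 0.0 for an undefined ratio are assumptions made for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Six binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives, true negatives,
    false negatives (for example, positives = pathogenic mutations).
    """
    def ratio(numerator, denominator):
        # Assumed convention: report 0.0 when a ratio is undefined.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

MCC uses all four counts at once, so it stays informative when the two classes are unbalanced, which may explain why a drop in MCC can be larger than the drop in accuracy on the same data.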
548	Dynamic machine learning for supervised and unsupervised classification / Apprentissage automatique dynamique pour la classification supervisée et non supervisée Sîrbu, Adela-Maria 06 June 2016 (has links) The research direction we are focusing on in the thesis is applying dynamic machine learning models to solve supervised and unsupervised classification problems. 
We are living in a dynamic environment, where data is continuously changing and the need to obtain a fast and accurate solution to our problems has become a real necessity. The particular problems that we have decided to approach in the thesis are pedestrian recognition (a supervised classification problem) and clustering of gene expression data (an unsupervised classification problem). These problems are representative of the two main types of classification and are very challenging, with great importance in real life. The first research direction that we approach in the field of dynamic unsupervised classification is the problem of dynamic clustering of gene expression data. Gene expression represents the process by which the information from a gene is converted into functional gene products: proteins or RNA having different roles in the life of a cell. Modern microarray technology is nowadays used to experimentally detect the expression levels of thousands of genes, across different conditions and over time. Once the gene expression data has been gathered, the next step is to analyze it and extract useful biological information. One of the most popular techniques for analyzing gene expression data is clustering, which involves partitioning a data set into groups, where the components of each group are similar to each other. In the case of gene expression data sets, each gene is represented by its expression values (features), at distinct points in time, under the monitored conditions. The process of gene clustering is at the foundation of genomic studies that aim to analyze the functions of genes, because it is assumed that genes that are similar in their expression levels are also relatively similar in terms of biological function. The problem that we address within the dynamic unsupervised classification research direction is the dynamic clustering of gene expression data. 
In our case, the term dynamic indicates that the data set is not static, but is subject to change. Still, as opposed to the incremental approaches from the literature, where the data set is enriched with new genes (instances) during the clustering process, our approaches tackle the cases when new features (expression levels for new points in time) are added to the genes already existing in the data set. To the best of our knowledge, there are no approaches in the literature that deal with the problem of dynamic clustering of gene expression data, defined as above. In this context we introduced three dynamic clustering algorithms which are able to handle newly collected gene expression levels by starting from a previously obtained partition, without the need to re-run the algorithm from scratch. Experimental evaluation shows that our method is faster and more accurate than applying the clustering algorithm from scratch on the feature-extended data set... Classification supervisée Classification non supervisée Expression génétique Dynamic classification Pedestrian recognition Gene clustering Dynamic fusion
549	Rethinking Document Classification: A Pilot for the Application of Text Mining Techniques To Enhance Standardized Assessment Protocols for Critical Care Medical Team Transfer of Care Walker, Briana Shanise 09 June 2017 (has links) No description available. Bioinformatics Nursing critical care transfer of care text classification document classification supervised classification dialogue
550	Le système ESAR : un modèle de classement des jouets et du matériel de jeu à l'intention des éducateurs Garon, Denise 25 April 2018 (has links) This study, developed in light of current educational psychology, proposes an instrument for analyzing toys and play materials, intended for educators; in its application procedures, it also responds more specifically to the analysis and classification needs of toy and play-material lending services. This applied research first presents the toy library setting, then reviews the many classification models for games, toys, and play materials drawn from older and contemporary literature and groups them for critical analysis. The proposed model rests on a classification scheme borrowed from the language of psychology and articulated with the support of documentary and computerized techniques. It develops a set of general and specific categories arranged in a cumulative, hierarchical order. These categories take the form of a directory of descriptors, that is, specialized keywords serving as units of analysis consistent with the realities described. The ESAR classification model is made up of complementary facets, each representing an aspect of knowing how to play. The model's very name is taken from the first facet of the system. It corresponds to the broad categories of games inspired by the Piagetian approach and covers the whole development of play activity, from early childhood to adulthood: exercise games (E), symbolic games (S), assembly games (A), and games with simple rules and complex rules (R). 
The application procedures of the ESAR classification model provide for an analysis protocol governed by precise rules, a working guide, analysis forms, and a file of definitions for each of the 151 keywords that make up the thesaurus, so as to standardize the analysis process. This study, which combines in an unprecedented way a precise and coherent psychological language with documentary processing techniques, also draws on computerized language. The ESAR classification model was used at the indexing stage in setting up a computerized analysis database of toys and play materials, and was tested by the Centrale des bibliothèques of the ministère de l'Éducation du Québec. The methodology for applying this experiment is described in the last part of the study. The information on play and toys contained in this automated database is accessible for direct querying by users (daycare and kindergarten educators, remedial teachers, parents, toy librarians, etc.) or in the form of regular publications distributed by the Centrale des bibliothèques, which plans to enrich the database periodically and keep it up to date. This study thus gives several different groups of educators an instrument for analyzing play and its accessories, one that is both coherent in educational-psychology terms and at the cutting edge of documentary and automated processing techniques. / Québec Université Laval, Bibliothèque 2014 LB 5.5 UL 1982 G237 Classification -- Jouets Système ESAR Classification -- Jeux Classification à facettes.
