Global ETD Search

21	Data Science techniques for predicting plant genes involved in secondary metabolites production Muteba, Ben Ilunga January 2018 / Masters of Science / Plant genome analysis is currently experiencing a boost due to reduced costs associated with the development of next generation sequencing technologies. Knowledge of the genetic background can be applied to guide targeted plant selection and breeding, and to facilitate natural product discovery and biological engineering. In medicinal plants, secondary metabolites are of particular interest because they often represent the main active ingredients associated with health-promoting qualities. Plant polyphenols are a highly diverse family of aromatic secondary metabolites that act as antimicrobial agents, UV protectants, and insect or herbivore repellents. Most genome mining tools developed to understand genetic material have seldom addressed secondary metabolite genes and biosynthesis pathways. Little significant research has been conducted to study key enzyme factors that can predict a class of secondary metabolite genes from polyketide synthases. The objectives of this study were twofold. First, it aimed to identify the biological properties of secondary metabolite genes and to select a specific gene, naringenin-chalcone synthase, also known as chalcone synthase (CHS). The study hypothesized that data science approaches in mining biological data, particularly secondary metabolite genes, would enable the disclosure of some aspects of secondary metabolites (SM). Second, it aimed to propose a proof of concept for classifying or predicting plant genes involved in polyphenol biosynthesis using data science techniques, and to apply these techniques in computational analysis through machine learning algorithms and mathematical and statistical approaches.
Three specific challenges experienced while analysing secondary metabolite datasets were: 1) class imbalance, which refers to a lack of proportionality among protein sequence classes; 2) high dimensionality, which refers to the very large feature space that arises when analysing bioinformatics datasets; and 3) variable sequence length, which refers to the fact that protein sequences have different lengths. Given these inherent issues, developing precise classification and statistical models is challenging. Therefore, the prerequisite for effective SM plant gene mining is dedicated data science techniques that can collect, prepare and analyse SM genes. / Keywords: Medicinal plants; Polyphenols; Feature selection; Data visualisation; Feature engineering
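Two of the challenges listed in this abstract, variable sequence length and class imbalance, can be illustrated with a minimal Python sketch. It is not the thesis's method: amino acid composition as a fixed-length encoding and inverse-frequency class weights are common techniques swapped in here for illustration, and the function names `aa_composition` and `class_weights` and the labels are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order (assumption: one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Map a protein sequence of any length to a fixed-length vector of 20 fractions."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def class_weights(labels):
    """Inverse-frequency weights: rare classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


# Sequences of different lengths yield vectors of the same length.
short_vec = aa_composition("MVTV")
long_vec = aa_composition("MVTVEEVRKAQRAEGPATVLAIGTATP")

# Hypothetical labels: one CHS sequence among three others.
weights = class_weights(["CHS", "other", "other", "other"])
```

Both vectors have 20 entries regardless of sequence length, and the minority class receives the larger weight, which a classifier that accepts per-class weights could then use.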
22	A multilevel search algorithm for feature selection in biomedical data Oduntan, Idowu Olayinka 10 April 2006 / The automated analysis of patients’ biomedical data can be used to derive diagnostic and prognostic inferences about the observed patients. Many noninvasive techniques for acquiring biomedical samples generate data that are characterized by a large number of distinct attributes (i.e. features) and a small number of observed patients (i.e. samples). Deriving reliable inferences, such as classifying a given patient as either cancerous or non-cancerous, using these biomedical data requires that the ratio r of the number of samples to the number of features be within the range 5 < r < 10. To satisfy this requirement, the original set of features in the biomedical datasets can be reduced to an ‘optimal’ subset of features that most discriminates the observed patients. Feature selection techniques strategically seek the ‘optimal’ subset. In this thesis, I present a new feature selection technique: multilevel feature selection. The technique seeks the ‘optimal’ feature subset in biomedical datasets using a multilevel search algorithm. This algorithm combines a hierarchical search framework with a search method. The framework, which provides the capability to easily adapt the technique to different forms of biomedical datasets, consists of increasingly coarse forms of the original feature set that are strategically and progressively explored by the search method. Tabu search (a search metaheuristic) is the search method used in the multilevel feature selection technique. I evaluate the performance of the new technique, in terms of solution quality, using experiments that compare the classification inferences derived from the result of the technique with those derived from the results of other feature selection techniques, such as basic tabu-search-based feature selection, sequential forward selection, and random feature selection.
In the experiments, the same biomedical dataset is used and an equivalent amount of computational resources is allocated to each evaluated technique to provide a common basis for comparison. The empirical results show that the multilevel feature selection technique finds ‘optimal’ subsets that enable more accurate and stable classification than those selected using the other feature selection techniques. Also, a similar comparison of the new technique with a genetic algorithm feature selection technique that selects highly discriminatory regions of consecutive features shows that the multilevel technique finds subsets that enable more stable classification. / February 2006 / Keywords: multilevel search; multilevel feature selection; multilevel paradigm; feature selection problem
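The basic tabu-search-based feature selection that this abstract names as a baseline can be sketched roughly as follows. This is a minimal illustration, not the multilevel algorithm from the thesis: the bit-flip neighbourhood, the tenure value, the aspiration rule and the toy scoring function are all assumptions, and in practice `score` would be something like a cross-validated classification accuracy.

```python
import random


def tabu_search_select(n_features, score, n_iters=50, tabu_tenure=3, seed=0):
    """Basic tabu search over feature subsets using a single bit-flip neighbourhood."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    best, best_score = current[:], score(current)
    tabu_until = {}  # feature index -> last iteration at which flipping it is forbidden

    for it in range(n_iters):
        candidates = []
        for j in range(n_features):
            neighbour = current[:]
            neighbour[j] = not neighbour[j]
            s = score(neighbour)
            # Aspiration: a tabu move is still allowed if it beats the best so far.
            if tabu_until.get(j, -1) < it or s > best_score:
                candidates.append((s, j, neighbour))
        if not candidates:
            continue
        # Always move to the best admissible neighbour, even if it is worse.
        s, j, current = max(candidates, key=lambda c: (c[0], -c[1]))
        tabu_until[j] = it + tabu_tenure
        if s > best_score:
            best, best_score = current[:], s
    return best, best_score


def toy_score(mask):
    """Hypothetical objective: features 0 and 2 help, every other feature costs 0.5."""
    useful = {0, 2}
    hits = sum(1 for j, on in enumerate(mask) if on and j in useful)
    noise = sum(1 for j, on in enumerate(mask) if on and j not in useful)
    return hits - 0.5 * noise


best_mask, best_value = tabu_search_select(6, toy_score)
```

The tabu list is what distinguishes this from plain hill climbing: recently flipped features cannot be flipped back for a few iterations, so the search can leave a local optimum instead of cycling around it.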
23	Automatic Attribute Clustering and Feature Selection Based on Genetic Algorithms Wang, Po-Cheng 21 August 2009 / Feature selection is an important pre-processing step in mining and learning. A good set of features can not only improve the accuracy of classification but also reduce the time needed to derive rules. It is especially useful when the number of attributes in the given training data is very large. This thesis proposes three GA-based clustering methods for attribute clustering and feature selection. In the first method, each feasible clustering result is encoded into a chromosome of positive integers, with one gene per attribute. The value of a gene represents the cluster to which the attribute belongs. The fitness of each individual is evaluated using both the average accuracy of attribute substitutions in clusters and the cluster balance. The second method extends the first to improve time performance. A new fitness function based on both the accuracy and the attribute dependency is proposed; it can reduce the time spent scanning the database. The third approach uses another encoding method for representing chromosomes, which can achieve faster convergence and a better result than the second one. Finally, the experimental comparison with the k-means clustering approach and with all combinations of attributes shows that the proposed approach achieves a good trade-off between accuracy and time complexity. In addition, after feature selection, the rules derived from only the selected features may be hard to use if some values of the selected features cannot be obtained in the current environment. This problem is easily solved in the proposed approaches: attributes with missing values can be replaced by other attributes in the same clusters. The proposed approaches thus provide flexible alternatives for feature selection. / Keywords: k-means; reduct; genetic algorithms; feature clustering; feature selection
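The chromosome encoding described in this abstract (one gene per attribute, the gene value naming the attribute's cluster) can be sketched in Python as below. The abstract does not give the fitness formula, so the balance measure here (smallest cluster size divided by largest) is a stand-in assumption, and the attribute names and the `substitute` helper are hypothetical.

```python
from collections import Counter


def decode(chromosome, attributes):
    """Turn a chromosome (one cluster id per attribute) into {cluster id: [attributes]}."""
    clusters = {}
    for attr, gene in zip(attributes, chromosome):
        clusters.setdefault(gene, []).append(attr)
    return clusters


def cluster_balance(chromosome):
    """Assumed balance measure: 1.0 when all clusters have equal size, lower otherwise."""
    sizes = Counter(chromosome).values()
    return min(sizes) / max(sizes)


def substitute(attr, clusters, available):
    """Return an available attribute from the same cluster as `attr`, or None."""
    for members in clusters.values():
        if attr in members:
            for other in members:
                if other != attr and other in available:
                    return other
    return None


attributes = ["age", "weight", "height", "income", "bmi", "salary"]
chromosome = [1, 2, 2, 3, 2, 3]  # e.g. "weight", "height" and "bmi" share cluster 2
clusters = decode(chromosome, attributes)
replacement = substitute("bmi", clusters, available={"age", "height", "income"})
```

The `substitute` helper mirrors the point made at the end of the abstract: when a selected attribute's value is unavailable, another attribute from the same cluster can stand in for it.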
25	Article identification for inventory list in a warehouse environment Gao, Yang January 2014 / In this paper, an object recognition system has been developed that uses local image features. The system can recognize multiple classes of objects in an image. It is divided into two parts: object detection and object identification. Object detection is based on SIFT features, which are invariant to image illumination, scaling and rotation. SIFT features extracted from a test image are matched against a database of SIFT features from known object images. DBSCAN clustering is used for multiple object detection, and the RANSAC method is used to reduce the number of false detections. Object identification is based on the 'Bag-of-Words' (BoW) model, a method based on vector quantization of SIFT descriptors of image patches. In this model, K-means clustering and Support Vector Machine (SVM) classification are applied. / Keywords: Object recognition; SIFT feature; Feature matching; DBSCAN; RANSAC; Bag of Words
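The vector quantization step behind the Bag-of-Words model mentioned in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions: the codebook is taken as given (in a real pipeline it would come from K-means over many SIFT descriptors), toy 2-D vectors stand in for 128-dimensional SIFT descriptors, and the function names are hypothetical.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bow_histogram(descriptors, codebook):
    """Normalized histogram of visual-word counts for one image."""
    counts = [0] * len(codebook)
    for desc in descriptors:
        counts[nearest_codeword(desc, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy codebook with two visual words, and four descriptors from one image.
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 2.0), (9.0, 9.0), (11.0, 10.0)]
histogram = bow_histogram(descriptors, codebook)
```

Each image, whatever its number of descriptors, is reduced to one fixed-length histogram, which is the kind of input an SVM classifier expects.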
27	Localized Feature Selection for Classification Armanfard, Narges January 2017 / The main idea of this thesis is to present the novel concept of localized feature selection (LFS) for data classification and its application to coma outcome prediction. Typical feature selection methods choose an optimal global feature subset that is applied over all regions of the sample space. In contrast, in this study we propose a novel localized feature selection approach whereby each region of the sample space is associated with its own distinct optimized feature set, which may vary both in membership and size across the sample space. This allows the feature set to optimally adapt to local variations in the sample space. An associated localized classification method is also proposed. The proposed LFS method selects a feature subset such that, within a localized region, within-class and between-class distances are respectively minimized and maximized. We first determine the localized region using an iterative procedure based on the distances in the original feature space. This results in a linear programming optimization problem. Then, the second method is formulated as a non-linear joint convex/increasing quasi-convex optimization problem where a logistic function is applied to focus the optimization process on the localized region within the unknown co-ordinate system. This results in more accurate classification performance at the expense of some computational time. Experimental results on synthetic and real-world data sets demonstrate the effectiveness of the proposed localized approach. Using the LFS idea, we propose a practical machine learning approach for automatic and continuous assessment of event related potentials for detecting the presence of the mismatch negativity component, whose existence has a high correlation with coma awakening. This process enables us to determine the prognosis of a coma patient.
Experimental results on normal and comatose subjects demonstrate the effectiveness of the proposed method. / Dissertation / Doctor of Philosophy (PhD) / This study proposes a novel form of pattern classification method, which is formulated so that it is easily executable on a computer. Two different versions of the method are developed. These are the LFS (localized feature selection) and lLFS (logistic LFS) methods. Both versions are appropriate for analysis of data with complex distributions, such as datasets that occur in biological signal processing problems. We have shown that the performance of the proposed methods is significantly improved over that of previous methods, on the datasets that were considered in this thesis. The proposed method is applied to the specific problem of determining the prognosis of a coma patient. The viability of the formulation and the effectiveness of the proposed algorithm are demonstrated on several synthetic and real-world datasets, including comatose subjects. / Keywords: Local Feature Selection; Data Classification; Coma Outcome Prediction; Feature Selection
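The selection criterion stated in this abstract (small within-class distances and large between-class distances inside a local region) can be made concrete with a small Python sketch. It is only an illustration of that criterion, not the linear-programming or logistic formulation of the thesis: the `local_separation` score, the assumption that `samples` already holds the neighbours of the representative point, and the toy data are all hypothetical.

```python
import math


def masked_distance(x, y, mask):
    """Euclidean distance using only the features switched on in `mask`."""
    return math.sqrt(sum((a - b) ** 2 for a, b, m in zip(x, y, mask) if m))


def local_separation(center, center_label, samples, labels, mask):
    """Mean between-class distance minus mean within-class distance around `center`.

    `samples` is assumed to contain only points from the local region of `center`,
    with at least one point of the same class and one of a different class.
    """
    within = [masked_distance(center, s, mask)
              for s, l in zip(samples, labels) if l == center_label]
    between = [masked_distance(center, s, mask)
               for s, l in zip(samples, labels) if l != center_label]
    return sum(between) / len(between) - sum(within) / len(within)


center, center_label = (0.0, 0.0), "a"
samples = [(0.0, 1.0), (0.0, 5.0)]
labels = ["a", "b"]

score_feature_1 = local_separation(center, center_label, samples, labels, mask=(1, 0))
score_feature_2 = local_separation(center, center_label, samples, labels, mask=(0, 1))
```

Around this particular point the second feature separates the classes and the first does not, so a localized selector would prefer the second feature here even if a different feature were better elsewhere in the sample space.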
28	EVALUATION OF FLATNESS TOLERANCE AND DATUMS IN COMPUTATIONAL METROLOGY CHEPURI, SHAMBAIAH January 2000 / No description available. / Keywords: Engineering, Industrial; convex hull; flat datum feature; round datum feature
29	Implementation of a 3D Imaging Sensor Aided Inertial Measurement Unit Navigation System Venable, Donald T. 03 October 2008 / No description available. / Keywords: Electrical Engineering; Flash Lidar; histogramming; navigation; feature extraction; feature association
30	Feature Extraction and Feasibility Study on CT Image Guided Colonoscopy Shen, Yuan 14 May 2010 / Computed tomographic colonography (CTC), also called virtual colonoscopy, uses CT scanning and computer post-processing to create two-dimensional images and three-dimensional virtual views of the inside of the colon. Computer-aided polyp detection (CAPD) automatically detects colonic polyps and presents them to the user in either a first or second reader paradigm, with the goal of reducing examination time while increasing detection sensitivity. During colonoscopy, endoscopists use the colonoscope inside a patient's colon to target potential polyps and validate those found by CAPD. However, there is no direct link between the CT images and the real-time optical colonoscopy (OC) video provided during the operation, so endoscopists need to rely largely on their past experience to locate and remove polyps. The goal of this research project is to study the feasibility of developing an image guided colonoscopy (IGC) system that combines CTC images, real-time colonoscope position measurements, and the video stream to validate and guide the removal of polyps found by CAPD. Such a system would ease polyp-level validation of CTC and improve the accuracy and efficiency of guiding the endoscopist to the target polyps. In this research project, a centerline-based matching algorithm has been designed to estimate, in real time, the relative location of the colonoscope in the virtual colonoscopy environment. Furthermore, the feasibility of applying online simultaneous localization and mapping (SLAM) to CT image guided colonoscopy has been evaluated to further improve the performance of localizing and removing the pre-defined target polyps. A colon phantom is used to provide a testing setup to assess the performance of the proposed algorithms. / Master of Science / Keywords: Feature Extraction; Image Guided Colonoscopy; SLAM; Video Feature Tracking
