Global ETD Search

1	Relationship between classifier performance and distributional complexity for small samples Attoor, Sanju Nair 15 November 2004 Given a limited number of samples for classification, several issues arise with respect to the design, performance and analysis of classifiers. This is especially so in the case of microarray-based classification. In this paper, we use a complexity-measure-based mixture model to study classifier performance for small-sample problems. The motivation behind such a study is to determine the conditions under which a certain class of classifiers is suitable for classification, subject to the constraint of a limited number of samples being available. Classifier study in terms of the VC dimension of a learning machine is also discussed. classifier small sample distributional complexity VC dimension
3	Small sample feature selection Sima, Chao 17 September 2007 High-throughput technologies for rapid measurement of vast numbers of biological variables offer the potential for highly discriminatory diagnosis and prognosis; however, high dimensionality together with small samples creates the need for feature selection, while at the same time making feature-selection algorithms less reliable. Feature selection is required to avoid overfitting, and the combinatorial nature of the problem demands a suboptimal feature-selection algorithm. In this dissertation, we show via three different approaches that feature selection is problematic in small-sample settings. First, we examined the feature-ranking performance of several kinds of error estimators for different classification rules, by considering all feature subsets and using two measures of performance. The results show that their ranking is strongly affected by inaccurate error estimation. Secondly, since enumerating all feature subsets is computationally impossible in practice, a suboptimal feature-selection algorithm is often employed to find, from a large set of potential features, a small subset with which to classify the samples. If error estimation is required for a feature-selection algorithm, then the impact of error estimation can be greater than the choice of algorithm. Lastly, we took a regression approach by comparing the classification errors for the optimal feature sets and the errors for the feature sets found by feature-selection algorithms. Our study shows that it is unlikely that feature selection will yield a feature set whose error is close to that of the optimal feature set, and the inability to find a good feature set should not lead to the conclusion that good feature sets do not exist. feature selection classification microarray small sample
4	Comparison of Denominator Degrees of Freedom Approximations for Linear Mixed Models in Small-Sample Simulations January 2020 Whilst linear mixed models offer a flexible approach to handling data with multiple sources of random variability, the related hypothesis testing for the fixed effects often encounters obstacles when the sample size is small and the underlying distribution of the test statistic is unknown. Consequently, five methods of denominator degrees of freedom approximation (residual, containment, between-within, Satterthwaite, Kenward-Roger) have been developed to overcome this problem. This study aims to evaluate the performance of these five methods with a mixed model consisting of a random intercept and a random slope. Specifically, simulations are conducted to provide insights into the F-statistics, denominator degrees of freedom and p-values each method gives with respect to different settings of the sample structure, the fixed-effect slopes and the missing-data proportion. The simulation results show that the residual method performs the worst in terms of F-statistics and p-values. Also, the Satterthwaite and Kenward-Roger methods tend to be more sensitive to changes in the design. The Kenward-Roger method performs the best in terms of F-statistics when the null hypothesis is true. / Dissertation/Thesis / Masters Thesis Statistics 2020 Statistics Degrees of freedom Mixed model Small sample
5	Color Image Based Face Recognition Ganapathi, Tejaswini 24 February 2009 Traditional appearance-based face recognition (FR) systems use grayscale images; recently, however, attention has been drawn to the use of color images. Color inputs have a larger dimensionality, which increases the computational cost and makes the small sample size (SSS) problem in supervised FR systems more challenging. It is therefore important to determine the scenarios in which the use of color information helps the FR system. In this thesis, the inclusion of chromatic information in FR systems is shown to be particularly advantageous in poor illumination conditions. In supervised systems, a color input of optimal dimensionality would improve the FR performance under SSS conditions. A fusion of decisions from individual spectral planes also helps in the SSS scenario. Finally, chromatic information is integrated into a supervised ensemble learner to address pose and illumination variations. This framework significantly boosts FR performance under a range of learning scenarios. Color Face Recognition Biometrics Small Sample Size Problem 0544
6	A New Reclassification Method for Highly Uncertain Microarray Data in Allergy Gene Prediction Paul, Jasmin 11 April 2012 The analysis of microarray data is a challenging task because of the large dimensionality and small sample size involved. Although a few methods are available to address the problem of small sample size, they are not sufficiently successful in dealing with microarray data from extremely small (~<20) sample sizes. We propose a method that incorporates information from diverse sources into the analysis of microarray data so as to improve the predictability of significant genes. A transformed data set, including statistical parameters, literature mining and gene ontology data, is evaluated. We performed classification experiments to identify potential allergy-related genes. Feature selection is used to identify the effect of features on classifier behaviour. An exploratory and domain knowledge analysis was performed on noisy real-life allergy data, and a subset of genes was selected as the positive and negative classes. A new set of transformed variables, depending on the mean and standard deviation statistics of the data distribution and other data sources, was identified. Significant allergy- and immune-related genes from the microarray data were selected. Experiments showed that classification predictability of significant genes can be improved. Important features from the transformed variable set were also identified.
8	Robust Margin Based Classifiers For Small Sample Data January 2011 In many classification problems data samples cannot be collected easily, for example in drug trials, biological experiments and studies on cancer patients. In many situations the data set is small and contains many outliers. When classifying such data, for example cancer vs. normal patients, the consequences of misclassification are probably more important than for any other data type, because the data point could be a cancer patient, or the classification decision could help determine which gene might be overexpressed and perhaps a cause of cancer. These misclassifications are typically more frequent in the presence of outlier data points. The aim of this thesis is to develop a maximum margin classifier that addresses the lack of robustness of discriminant-based classifiers (like the Support Vector Machine (SVM)) to noise and outliers. The underlying notion is to adopt and develop a natural loss function that is more robust to outliers and more representative of the true loss function of the data. It is demonstrated experimentally that SVMs are indeed susceptible to outliers and that the new classifier developed here, coined Robust-SVM (RSVM), is superior to all studied classifiers on the synthetic datasets. It is superior to the SVM on both the synthetic data and the experimental data from biomedical studies, and is competitive with a classifier derived along similar lines when real-life data examples are considered. / Dissertation/Thesis / Source Code for RSVM(MATLAB) / Presentation on RSVM / M.S. Computer Science 2011 Computer Science Statistics Bioinformatics Classifier Overfitting RSVM Small Sample SVM
9	Inferring condition specific regulatory networks with small sample sizes : a case study in Bacillus subtilis and infection of Mus musculus by the parasite Toxoplasma gondii Pacini, Clare January 2017 Modelling interactions between genes and their regulators is fundamental to understanding, for example, how a disease progresses or the impact of inserting a synthetic circuit into a cell. We use an existing method to infer regulatory networks under multiple conditions: the Joint Graphical Lasso (JGL), a shrinkage-based Gaussian graphical model. We apply this method to two data sets: the first, a publicly available set of microarray experiments perturbing the gram-positive bacterium Bacillus subtilis under multiple experimental conditions; the second, a set of RNA-seq samples of mouse (Mus musculus) embryonic fibroblasts (MEFs) infected with different strains of the parasite Toxoplasma gondii. In both cases we infer a subset of the regulatory networks using relatively small sample sizes. For the Bacillus subtilis analysis we focused on the use of these regulatory networks in synthetic biology and found examples of transcriptional units active only under a subset of conditions; this information can be useful when designing circuits to have condition-dependent behaviour. We developed methods for large network decomposition that made use of the condition information and showed greater specificity in identifying single transcriptional units from the larger network using our method. By annotating these results with known information we were able to identify novel connections, and we found supporting evidence for a selection of these in publicly available experimental results. Biological data collection is typically expensive, and because of the relatively small sample sizes of our MEF data set we developed a novel empirical Bayes method for reducing the false discovery rate when estimating block diagonal covariance matrices. 
Using these methods we were able to infer regulatory networks for the host infected with either the ME49 or RH strain of the parasite. This enabled the identification of known and novel regulatory mechanisms. The Toxoplasma gondii parasite has been shown to subvert host function using mechanisms similar to those of cancers, and through our analysis we were able to identify genes, networks and ontologies associated with cancer, including connections that have not previously been associated with T. gondii infection. Finally, a Shiny application was developed as an online resource giving access to the inferred Bacillus subtilis networks, with interactive methods for exploring the networks, including expansion of sub-networks and large network decomposition.
10	Design, Fabrication, and Verification of a Miniature Load Frame Howard, Andrew Martin 05 May 2007 This thesis documents the tasks in support of the design and instrumentation of a miniature tensile load frame. small sample testing tensile testing load frame design
