61 |
Robust Margin Based Classifiers For Small Sample Data, January 2011 (has links)
abstract: In many classification problems, data samples cannot be collected easily, for example in drug trials, biological experiments, and studies of cancer patients. In many situations the data set size is small and there are many outliers. When classifying such data, for example cancer versus normal patients, the consequences of misclassification are arguably more serious than for other data types, because a misclassified data point could be a cancer patient, or the classification decision could help determine which gene might be over-expressed and perhaps a cause of cancer. These misclassifications are typically more frequent in the presence of outlier data points. The aim of this thesis is to develop a maximum margin classifier that addresses the lack of robustness to noise and outliers of discriminant-based classifiers such as the Support Vector Machine (SVM). The underlying notion is to adopt and develop a loss function that is more robust to outliers and more representative of the true loss function of the data. It is demonstrated experimentally that SVMs are indeed susceptible to outliers and that the new classifier developed here, coined the Robust-SVM (RSVM), is superior to all studied classifiers on the synthetic datasets. It is superior to the SVM on both the synthetic data and experimental data from biomedical studies, and is competitive with a classifier derived along similar lines on real-life data examples. / Dissertation/Thesis / Source Code for RSVM (MATLAB) / Presentation on RSVM / M.S. Computer Science 2011
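The abstract does not spell out the robust loss it develops; one standard way to bound the influence of outliers on a margin classifier is to truncate the hinge loss into a ramp loss. The sketch below illustrates that general idea only and is not the thesis's RSVM formulation:

```python
import numpy as np

def hinge_loss(margins):
    # Standard SVM hinge loss: grows without bound as the margin
    # becomes more negative, so badly misclassified outliers
    # dominate the training objective.
    return np.maximum(0.0, 1.0 - margins)

def ramp_loss(margins, s=-1.0):
    # Truncated ("ramp") hinge loss: capped at (1 - s), so a single
    # outlier can contribute at most a bounded penalty.
    return np.minimum(np.maximum(0.0, 1.0 - margins), 1.0 - s)

margins = np.array([2.0, 0.5, -5.0])   # y * f(x) for three points
print(hinge_loss(margins))             # outlier at -5 contributes 6.0
print(ramp_loss(margins))              # same outlier capped at 2.0
```

The cap is what makes the loss "more representative of the true loss": a point is at most one misclassification, no matter how far it sits from the boundary.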
|
62 |
Hyperspectral Imaging for Nondestructive Measurement of Food Quality, Nanyam, Yasasvy, 01 December 2010 (has links)
This thesis focuses on developing a nondestructive strategy for measuring the quality of food using hyperspectral imaging. The specific focus is to develop a classification methodology for detecting bruised and unbruised areas in hyperspectral images of fruits such as strawberries through the classification of pixels containing the edible portion of the fruit. A multiband segmentation algorithm is formulated to generate a mask for extracting the edible pixels from each band in a hypercube. A key feature of the segmentation algorithm is that it makes no prior assumptions in selecting the bands involved in the segmentation. Consequently, different bands may be selected for different hypercubes to accommodate intra-hypercube variations. Gaussian univariate classifiers are implemented to classify the bruised and unbruised pixels in each band, and it is shown that many band classifiers yield 100% classification accuracy. Furthermore, it is shown that the bands containing the most useful discriminatory information for classifying bruised and unbruised pixels can be identified from the classification results. The strategy developed in this study will facilitate the design of fruit sorting systems using NIR cameras with selected bands.
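The per-band decision can be sketched as a pair of class-conditional univariate Gaussians compared by log-likelihood. The reflectance values and class statistics below are synthetic stand-ins, not data from the thesis:

```python
import numpy as np

def fit_gaussian(x):
    # Class-conditional univariate Gaussian for one spectral band:
    # mean and variance of the class's pixel intensities.
    return x.mean(), x.var() + 1e-9   # small floor avoids zero variance

def log_likelihood(x, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def classify_band(pixels, params_bruised, params_unbruised):
    # Label each pixel by whichever class Gaussian assigns it the
    # higher likelihood in this band.
    ll_b = log_likelihood(pixels, *params_bruised)
    ll_u = log_likelihood(pixels, *params_unbruised)
    return np.where(ll_b > ll_u, "bruised", "unbruised")

rng = np.random.default_rng(0)
bruised = rng.normal(0.3, 0.05, 500)     # darker bruised tissue (synthetic)
unbruised = rng.normal(0.7, 0.05, 500)
pb, pu = fit_gaussian(bruised), fit_gaussian(unbruised)
print(classify_band(np.array([0.28, 0.72]), pb, pu))
```

Repeating this per band and ranking bands by their classification accuracy is one way the most discriminative wavelengths could be identified for a reduced-band NIR camera.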
|
63 |
Investigation of the effect of noise on the generalization of weightless neural networks in binary classification problems, Ferreira de Oliveira Neto, Rosalvo, 31 January 2008 (has links)
Previous issue date: 2008 / Neural networks with RAM-based (random access memory) neurons are characterized by being implementable in hardware and by being an attractive option for solving problems defined over binary input spaces. For problems defined over real-valued input spaces, however, it is difficult to find a suitable representation of those values without losing generalization power in pattern classification tasks.
This work investigates the use of additive Gaussian noise on the continuous input variables to increase the generalization power of the network. In this way a larger number of memory positions can be trained, forming a common neighbourhood region for similar patterns, known as a basin of attraction.
The influence of adding noise during training of the n-tuple classifier, a type of Boolean network, was analysed, and it was confirmed that training with noise increases the generalization power of the network. The performance of the investigated model was compared with results obtained by the Multi-Layer Perceptron (MLP) neural network.
Four public datasets were selected for the study: three from a well-known benchmark in the area and one from a recent international competition. Experimental results show that the investigated model achieves performance equivalent to that of the MLP neural network on the problems considered.
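A minimal sketch of the noise-based augmentation described above: each continuous training pattern is replicated with small Gaussian perturbations so that neighbouring memory positions of a RAM-based network also get trained, widening the basin of attraction around each pattern. The `sigma` and `copies` values are illustrative parameters, not those used in the dissertation:

```python
import numpy as np

def noisy_training_set(X, y, sigma=0.05, copies=5, seed=0):
    # Replicate each pattern `copies` times and add zero-mean
    # Gaussian noise to the continuous inputs; labels are copied
    # unchanged so every perturbed pattern trains toward the same
    # class, forming a common neighbourhood region.
    rng = np.random.default_rng(seed)
    X_rep = np.repeat(X, copies, axis=0)
    y_rep = np.repeat(y, copies)
    return X_rep + rng.normal(0.0, sigma, X_rep.shape), y_rep

X = np.array([[0.2, 0.8], [0.9, 0.1]])
y = np.array([0, 1])
Xn, yn = noisy_training_set(X, y)
print(Xn.shape, yn.shape)   # (10, 2) (10,)
```

The augmented set is then quantized and fed to the n-tuple classifier's address decoders in the usual way.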
|
64 |
Evaluation of classifier performance and the impact of learning algorithm parameters, Lavesson, Niklas, January 2003 (has links)
Much research has been done in the fields of classifier performance evaluation and optimization. This work summarizes that research and tries to answer the question of whether algorithm parameter tuning has more impact on performance than the choice of algorithm. An alternative way of evaluation, a measure function, is also demonstrated. This type of evaluation is compared with one of the most widely accepted methods, the cross-validation test. Experiments described in this work show that parameter tuning often has more impact on performance than the actual choice of algorithm, and that the measure function could be a complement or an alternative to standard cross-validation tests.
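Comparing parameter settings under cross-validation can be sketched as follows. The k-NN learner, the grid of `k` values, and the synthetic two-cluster data are illustrative stand-ins, not the algorithms or datasets evaluated in the thesis:

```python
import numpy as np

def cross_val_accuracy(train_and_predict, X, y, folds=5):
    # Plain k-fold cross-validation: hold out every `folds`-th
    # example in turn, train on the rest, average test accuracy.
    idx = np.arange(len(X))
    scores = []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        pred = train_and_predict(X[train], y[train], X[test])
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))

def knn_factory(k):
    # One "parameter setting" of a single algorithm family.
    def train_and_predict(Xtr, ytr, Xte):
        d = np.linalg.norm(Xte[:, None] - Xtr[None, :], axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        return (ytr[nearest].mean(axis=1) > 0.5).astype(int)
    return train_and_predict

rng = np.random.default_rng(1)
X = rng.normal(0, 1, (100, 2)) + np.repeat([[0, 0], [2, 2]], 50, axis=0)
y = np.repeat([0, 1], 50)
for k in (1, 5, 15):
    print(k, round(cross_val_accuracy(knn_factory(k), X, y), 3))
```

Sweeping the parameter while holding the algorithm fixed (and vice versa) is the kind of comparison the thesis's question calls for.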
|
65 |
k-Nearest Neighbour Classification of Datasets with a Family of Distances, Hatko, Stan, January 2015 (has links)
The k-nearest neighbour (k-NN) classifier is one of the oldest and most important supervised learning algorithms for classifying datasets. Traditionally the Euclidean norm is used as the distance for the k-NN classifier. In this thesis we investigate the use of alternative distances for the k-NN classifier.
We start by introducing some background notions in statistical machine learning. We define the k-NN classifier and discuss Stone's theorem and the proof that k-NN is universally consistent on the normed space R^d. We then prove that k-NN is universally consistent if we take a sequence of random norms (that are independent of the sample and the query) from a family of norms that satisfies a particular boundedness condition. We extend this result by replacing norms with distances based on uniformly locally Lipschitz functions that satisfy certain conditions. We discuss the limitations of Stone's lemma and Stone's theorem, particularly with respect to quasinorms and adaptively choosing a distance for k-NN based on the labelled sample. We show the universal consistency of a two stage k-NN type classifier where we select the distance adaptively based on a split labelled sample and the query. We conclude by giving some examples of improvements of the accuracy of classifying various datasets using the above techniques.
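A family of distances can be exposed to k-NN as a parameter. The sketch below varies the exponent of an l_p norm (p >= 1), a simple instance of the kind of alternative distances discussed above; the data points are hypothetical:

```python
import numpy as np

def knn_predict(Xtr, ytr, x, k=3, p=2.0):
    # k-NN under the l_p norm: p=2 recovers the usual Euclidean
    # classifier, other p >= 1 give alternative norms from a family.
    d = np.sum(np.abs(Xtr - x) ** p, axis=1) ** (1.0 / p)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(ytr[nearest], return_counts=True)
    return labels[np.argmax(counts)]   # majority vote among the k

Xtr = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.1, 2.9]])
ytr = np.array([0, 0, 1, 1])
print(knn_predict(Xtr, ytr, np.array([0.2, 0.1]), k=3, p=1.0))  # 0
```

The consistency results in the thesis concern what happens when the norm itself is drawn from such a family, possibly chosen adaptively from a split labelled sample rather than fixed in advance as here.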
|
66 |
A probabilistic perspective on ensemble diversity, Zanda, Manuela, January 2010 (has links)
We study diversity in classifier ensembles from a broader perspective than the 0/1 loss function, the main reason being that the bias-variance decomposition of the 0/1 loss function is not unique, and therefore the relationship between ensemble accuracy and diversity is still unclear. In the parallel field of regression ensembles, where the loss function of interest is the mean squared error, this decomposition not only exists, but it has been shown that diversity can be managed via the Negative Correlation (NC) framework. In the field of probabilistic modelling the expected value of the negative log-likelihood loss function is given by its conditional entropy; this result suggests that interaction information might provide some insight into the trade-off between accuracy and diversity. Our objective is to improve our understanding of classifier diversity by focusing on two different loss functions: the mean squared error and the negative log-likelihood. In a study of mean squared error functions, we reformulate the Tumer & Ghosh model for the classification error as a regression problem, and we show how the NC learning framework can be deployed to manage diversity in classification problems. In an empirical study of classifiers that minimise the negative log-likelihood loss function, we discuss model diversity as opposed to error diversity in ensembles of Naive Bayes classifiers. We observe that diversity in low-variance classifiers has to be structurally inferred. We apply interaction information to the problem of monitoring diversity in classifier ensembles. We present empirical evidence that interaction information can capture the trade-off between accuracy and diversity, and that diversity occurs at different levels of interactions between base classifiers. We use interaction information properties to build ensembles of structurally diverse averaged Augmented Naive Bayes classifiers.
Our empirical study shows that this novel ensemble approach is computationally more efficient than an accuracy based approach and at the same time it does not negatively affect the ensemble classification performance.
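Interaction information for three discrete variables can be computed directly from their joint distribution. The sketch below uses one common sign convention (McGill's, I(X;Y;Z) = I(X;Y) - I(X;Y|Z)); note that the opposite sign convention also appears in the literature. This illustrates the quantity itself, not the thesis's monitoring procedure:

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits of a probability vector.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def interaction_information(joint):
    # joint[x, y, z] is the joint pmf of two base-classifier outputs
    # X, Y and the label Z. Under this convention:
    #   I(X;Y;Z) = H(X)+H(Y)+H(Z) - H(XY)-H(XZ)-H(YZ) + H(XYZ)
    # Negative values signal synergy (the pair is informative about
    # Z only jointly); positive values signal redundancy.
    hx = entropy(joint.sum(axis=(1, 2)))
    hy = entropy(joint.sum(axis=(0, 2)))
    hz = entropy(joint.sum(axis=(0, 1)))
    hxy = entropy(joint.sum(axis=2).ravel())
    hxz = entropy(joint.sum(axis=1).ravel())
    hyz = entropy(joint.sum(axis=0).ravel())
    return hx + hy + hz - hxy - hxz - hyz + entropy(joint.ravel())

# XOR-like relation: neither output alone predicts the label,
# but together they determine it exactly.
joint = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        joint[x, y, x ^ y] = 0.25
print(interaction_information(joint))   # -1.0 (pure synergy)
```

Estimating the same quantity from held-out predictions of base classifiers is one way such a measure could be used to monitor ensemble diversity.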
|
67 |
Fall Detection Using Still Images in Hybrid Classifier, Kandavel, Srianuradha, January 2021 (has links)
No description available.
|
68 |
Integrated Assembly and Annotation of Fathead Minnow Genome Towards Prediction of Environmental Exposures, Martinson, John W., 16 June 2020 (has links)
No description available.
|
69 |
Using Freebase, An Automatically Generated Dictionary, And A Classifier To Identify A Person's Profession In Tweets, Hall, Abraham, 01 January 2013 (has links)
Algorithms for classifying pre-tagged person entities in tweets into one of eight profession categories are presented. A classifier using a semi-supervised learning algorithm that takes into consideration the local context surrounding the entity in the tweet, hashtag information, and topic signature scores is described. In addition to the classifier, this research investigates two dictionaries containing the professions of persons. These two dictionaries are used in their own classification algorithms, which are independent of the classifier. The method for creating the first dictionary dynamically from the web and the algorithm that accesses this dictionary to classify a person into one of the eight profession categories are explained next. The second dictionary is Freebase, an openly available online database that is maintained by its online community. The algorithm that uses Freebase for classifying a person into one of the eight professions is described. The results show that classifications made using the automatically constructed dictionary, Freebase, or the classifier are all moderately successful, and that classifications made with the automatically constructed person dictionary are slightly more accurate than those made using Freebase. Various hybrid methods combining the classifier and the two dictionaries are also explained; the results of those hybrid methods show significant improvement over any of the individual methods.
|
70 |
A Hybrid Cost Model for Evaluating Query Execution Plans, Wang, Ning, 22 January 2024 (has links)
Query optimization aims to select an execution plan among all candidate plans for a given query. The query optimizer of a traditional relational database management system (RDBMS) relies on a cost model to estimate the cost of the alternative plans in the plan search space. The classic cost model (CCM) may lead the optimizer to choose plans with poor execution times due to inaccurate cardinality estimates and simplifying assumptions. A learned cost model (LCM) based on machine learning does not rely on such estimates and instead learns costs from runtime observations. While learned cost models have been shown to improve average performance, they do not guarantee consistently optimal choices, and the plans generated using the LCM do not necessarily outperform those generated with the CCM. This thesis proposes a hybrid approach that strikes a balance between the LCM and the CCM: the hybrid model uses the LCM when it is expected to be reliable in selecting a good plan and falls back to the CCM otherwise. The evaluation results of the hybrid model demonstrate promising performance, indicating potential for successful use in future applications.
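The fallback logic described above can be sketched in a few lines. Every name below — the confidence score, the toy cost functions, the threshold — is a hypothetical stand-in, not the model proposed in the thesis:

```python
def choose_cost_model(plan, lcm, ccm, confidence, threshold=0.8):
    # Hybrid strategy: trust the learned cost model only when its
    # reliability estimate for this plan is high enough; otherwise
    # fall back to the classic analytic cost model.
    if confidence(plan) >= threshold:
        return lcm(plan)
    return ccm(plan)

def classic_cost(plan):
    # Toy analytic cost: cardinality estimate times per-row cost.
    return plan["est_rows"] * plan["cost_per_row"]

def learned_cost(plan):
    # Stand-in for a trained regressor's prediction.
    return plan["predicted_cost"]

def lcm_confidence(plan):
    # Stand-in for a reliability score, e.g. derived from the
    # prediction variance of an ensemble of learned models.
    return plan["model_confidence"]

plan = {"est_rows": 1000, "cost_per_row": 0.2,
        "predicted_cost": 150.0, "model_confidence": 0.9}
print(choose_cost_model(plan, learned_cost, classic_cost, lcm_confidence))  # 150.0
```

The interesting design question is how the confidence gate is built; anything from out-of-distribution detection on plan features to ensemble variance could play that role.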
|