61 |
Robust Margin Based Classifiers For Small Sample Data, January 2011 (has links)
abstract: In many classification problems, data samples cannot be collected easily, for example in drug trials, biological experiments, and studies of cancer patients. In many situations the data set size is small and there are many outliers. When classifying such data, for example cancer versus normal patients, the consequences of misclassification are arguably more serious than for other data types, because a misclassified data point could be a cancer patient, or the classification decision could help determine which gene might be over-expressed and perhaps a cause of cancer. These misclassifications are typically more frequent in the presence of outlier data points. The aim of this thesis is to develop a maximum margin classifier that addresses the lack of robustness to noise and outliers of discriminant-based classifiers such as the Support Vector Machine (SVM). The underlying notion is to adopt and develop a loss function that is more robust to outliers and more representative of the true loss function of the data. It is demonstrated experimentally that SVMs are indeed susceptible to outliers and that the new classifier developed here, coined the Robust-SVM (RSVM), is superior to all studied classifiers on the synthetic datasets. It is superior to the SVM on both the synthetic data and experimental data from biomedical studies, and is competitive with a classifier derived along similar lines on real-life data examples. / Dissertation/Thesis / Source Code for RSVM (MATLAB) / Presentation on RSVM / M.S. Computer Science 2011
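The abstract does not spell out the robust loss it develops; one standard way to bound the influence of outliers on a margin classifier is to truncate the hinge loss into a ramp loss. The sketch below illustrates that general idea only and is not the thesis's RSVM formulation:

```python
import numpy as np

def hinge_loss(margins):
    # Standard SVM hinge loss: grows without bound as the margin
    # becomes more negative, so badly misclassified outliers
    # dominate the training objective.
    return np.maximum(0.0, 1.0 - margins)

def ramp_loss(margins, s=-1.0):
    # Truncated ("ramp") hinge loss: capped at (1 - s), so a single
    # outlier can contribute at most a bounded penalty.
    return np.minimum(np.maximum(0.0, 1.0 - margins), 1.0 - s)

margins = np.array([2.0, 0.5, -5.0])   # y * f(x) for three points
print(hinge_loss(margins))             # outlier at -5 contributes 6.0
print(ramp_loss(margins))              # same outlier capped at 2.0
```

The cap is what makes the loss "more representative of the true loss": a point is at most one misclassification, no matter how far it sits from the boundary.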
|
62 |
Hyperspectral Imaging for Nondestructive Measurement of Food Quality, Nanyam, Yasasvy, 01 December 2010 (has links)
This thesis focuses on developing a nondestructive strategy for measuring the quality of food using hyperspectral imaging. The specific focus is to develop a classification methodology for detecting bruised and unbruised areas in hyperspectral images of fruits such as strawberries through the classification of pixels containing the edible portion of the fruit. A multiband segmentation algorithm is formulated to generate a mask for extracting the edible pixels from each band in a hypercube. A key feature of the segmentation algorithm is that it makes no prior assumptions in selecting the bands involved in the segmentation. Consequently, different bands may be selected for different hypercubes to accommodate intra-hypercube variations. Gaussian univariate classifiers are implemented to classify the bruised and unbruised pixels in each band, and it is shown that many band classifiers yield 100% classification accuracy. Furthermore, it is shown that the bands containing the most useful discriminatory information for classifying bruised and unbruised pixels can be identified from the classification results. The strategy developed in this study will facilitate the design of fruit sorting systems using NIR cameras with selected bands.
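The per-band decision can be sketched as a pair of class-conditional univariate Gaussians compared by log-likelihood. The reflectance values and class statistics below are synthetic stand-ins, not data from the thesis:

```python
import numpy as np

def fit_gaussian(x):
    # Class-conditional univariate Gaussian for one spectral band:
    # mean and variance of the class's pixel intensities.
    return x.mean(), x.var() + 1e-9   # small floor avoids zero variance

def log_likelihood(x, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def classify_band(pixels, params_bruised, params_unbruised):
    # Label each pixel by whichever class Gaussian assigns it the
    # higher likelihood in this band.
    ll_b = log_likelihood(pixels, *params_bruised)
    ll_u = log_likelihood(pixels, *params_unbruised)
    return np.where(ll_b > ll_u, "bruised", "unbruised")

rng = np.random.default_rng(0)
bruised = rng.normal(0.3, 0.05, 500)     # darker bruised tissue (synthetic)
unbruised = rng.normal(0.7, 0.05, 500)
pb, pu = fit_gaussian(bruised), fit_gaussian(unbruised)
print(classify_band(np.array([0.28, 0.72]), pb, pu))
```

Repeating this per band and ranking bands by their classification accuracy is one way the most discriminative wavelengths could be identified for a reduced-band NIR camera.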
|
63 |
Investigation of the effect of noise on the generalization of weightless neural networks in binary classification problems, Ferreira de Oliveira Neto, Rosalvo, 31 January 2008 (has links)
Previous issue date: 2008 / Neural networks with RAM-based (random access memory) neurons are characterized by being implementable in hardware and by being an attractive option for solving problems defined over binary input spaces. For problems defined over real-valued input spaces, however, it is difficult to find a suitable representation of those values without losing generalization power in pattern classification tasks.
This work investigates the use of additive Gaussian noise on the continuous input variables to increase the generalization power of the network. In this way a larger number of memory positions can be trained, forming a common neighbourhood region for similar patterns, known as a basin of attraction.
The influence of adding noise during training of the n-tuple classifier, a type of Boolean network, was analysed, and it was confirmed that training with noise increases the generalization power of the network. The performance of the investigated model was compared with results obtained by the Multi-Layer Perceptron (MLP) neural network.
Four public datasets were selected for the study: three from a well-known benchmark in the area and one from a recent international competition. Experimental results show that the investigated model achieves performance equivalent to that of the MLP neural network on the problems considered.
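A minimal sketch of the noise-based augmentation described above: each continuous training pattern is replicated with small Gaussian perturbations so that neighbouring memory positions of a RAM-based network also get trained, widening the basin of attraction around each pattern. The `sigma` and `copies` values are illustrative parameters, not those used in the dissertation:

```python
import numpy as np

def noisy_training_set(X, y, sigma=0.05, copies=5, seed=0):
    # Replicate each pattern `copies` times and add zero-mean
    # Gaussian noise to the continuous inputs; labels are copied
    # unchanged so every perturbed pattern trains toward the same
    # class, forming a common neighbourhood region.
    rng = np.random.default_rng(seed)
    X_rep = np.repeat(X, copies, axis=0)
    y_rep = np.repeat(y, copies)
    return X_rep + rng.normal(0.0, sigma, X_rep.shape), y_rep

X = np.array([[0.2, 0.8], [0.9, 0.1]])
y = np.array([0, 1])
Xn, yn = noisy_training_set(X, y)
print(Xn.shape, yn.shape)   # (10, 2) (10,)
```

The augmented set is then quantized and fed to the n-tuple classifier's address decoders in the usual way.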
|
64 |
Evaluation of classifier performance and the impact of learning algorithm parameters, Lavesson, Niklas, January 2003 (has links)
Much research has been done in the fields of classifier performance evaluation and optimization. This work summarizes that research and tries to answer the question of whether algorithm parameter tuning has more impact on performance than the choice of algorithm. An alternative way of evaluation, a measure function, is also demonstrated. This type of evaluation is compared with one of the most widely accepted methods, the cross-validation test. Experiments described in this work show that parameter tuning often has more impact on performance than the actual choice of algorithm, and that the measure function could be a complement or an alternative to standard cross-validation tests.
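Comparing parameter settings under cross-validation can be sketched as follows. The k-NN learner, the grid of `k` values, and the synthetic two-cluster data are illustrative stand-ins, not the algorithms or datasets evaluated in the thesis:

```python
import numpy as np

def cross_val_accuracy(train_and_predict, X, y, folds=5):
    # Plain k-fold cross-validation: hold out every `folds`-th
    # example in turn, train on the rest, average test accuracy.
    idx = np.arange(len(X))
    scores = []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        pred = train_and_predict(X[train], y[train], X[test])
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))

def knn_factory(k):
    # One "parameter setting" of a single algorithm family.
    def train_and_predict(Xtr, ytr, Xte):
        d = np.linalg.norm(Xte[:, None] - Xtr[None, :], axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        return (ytr[nearest].mean(axis=1) > 0.5).astype(int)
    return train_and_predict

rng = np.random.default_rng(1)
X = rng.normal(0, 1, (100, 2)) + np.repeat([[0, 0], [2, 2]], 50, axis=0)
y = np.repeat([0, 1], 50)
for k in (1, 5, 15):
    print(k, round(cross_val_accuracy(knn_factory(k), X, y), 3))
```

Sweeping the parameter while holding the algorithm fixed (and vice versa) is the kind of comparison the thesis's question calls for.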
|
65 |
k-Nearest Neighbour Classification of Datasets with a Family of Distances, Hatko, Stan, January 2015 (has links)
The k-nearest neighbour (k-NN) classifier is one of the oldest and most important supervised learning algorithms for classifying datasets. Traditionally the Euclidean norm is used as the distance for the k-NN classifier. In this thesis we investigate the use of alternative distances for the k-NN classifier.
We start by introducing some background notions in statistical machine learning. We define the k-NN classifier and discuss Stone's theorem and the proof that k-NN is universally consistent on the normed space R^d. We then prove that k-NN is universally consistent if we take a sequence of random norms (that are independent of the sample and the query) from a family of norms that satisfies a particular boundedness condition. We extend this result by replacing norms with distances based on uniformly locally Lipschitz functions that satisfy certain conditions. We discuss the limitations of Stone's lemma and Stone's theorem, particularly with respect to quasinorms and adaptively choosing a distance for k-NN based on the labelled sample. We show the universal consistency of a two stage k-NN type classifier where we select the distance adaptively based on a split labelled sample and the query. We conclude by giving some examples of improvements of the accuracy of classifying various datasets using the above techniques.
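A family of distances can be exposed to k-NN as a parameter. The sketch below varies the exponent of an l_p norm (p >= 1), a simple instance of the kind of alternative distances discussed above; the data points are hypothetical:

```python
import numpy as np

def knn_predict(Xtr, ytr, x, k=3, p=2.0):
    # k-NN under the l_p norm: p=2 recovers the usual Euclidean
    # classifier, other p >= 1 give alternative norms from a family.
    d = np.sum(np.abs(Xtr - x) ** p, axis=1) ** (1.0 / p)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(ytr[nearest], return_counts=True)
    return labels[np.argmax(counts)]   # majority vote among the k

Xtr = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.1, 2.9]])
ytr = np.array([0, 0, 1, 1])
print(knn_predict(Xtr, ytr, np.array([0.2, 0.1]), k=3, p=1.0))  # 0
```

The consistency results in the thesis concern what happens when the norm itself is drawn from such a family, possibly chosen adaptively from a split labelled sample rather than fixed in advance as here.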
|
66 |
A probabilistic perspective on ensemble diversity, Zanda, Manuela, January 2010 (has links)
We study diversity in classifier ensembles from a broader perspective than the 0/1 loss function, the main reason being that the bias-variance decomposition of the 0/1 loss function is not unique, and therefore the relationship between ensemble accuracy and diversity is still unclear. In the parallel field of regression ensembles, where the loss function of interest is the mean squared error, this decomposition not only exists, but it has been shown that diversity can be managed via the Negative Correlation (NC) framework. In the field of probabilistic modelling the expected value of the negative log-likelihood loss function is given by its conditional entropy; this result suggests that interaction information might provide some insight into the trade-off between accuracy and diversity. Our objective is to improve our understanding of classifier diversity by focusing on two different loss functions: the mean squared error and the negative log-likelihood. In a study of mean squared error functions, we reformulate the Tumer & Ghosh model for the classification error as a regression problem, and we show how the NC learning framework can be deployed to manage diversity in classification problems. In an empirical study of classifiers that minimise the negative log-likelihood loss function, we discuss model diversity as opposed to error diversity in ensembles of Naive Bayes classifiers. We observe that diversity in low-variance classifiers has to be structurally inferred. We apply interaction information to the problem of monitoring diversity in classifier ensembles. We present empirical evidence that interaction information can capture the trade-off between accuracy and diversity, and that diversity occurs at different levels of interactions between base classifiers. We use interaction information properties to build ensembles of structurally diverse averaged Augmented Naive Bayes classifiers.
Our empirical study shows that this novel ensemble approach is computationally more efficient than an accuracy based approach and at the same time it does not negatively affect the ensemble classification performance.
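Interaction information for three discrete variables can be computed directly from their joint distribution. The sketch below uses one common sign convention (McGill's, I(X;Y;Z) = I(X;Y) - I(X;Y|Z)); note that the opposite sign convention also appears in the literature. This illustrates the quantity itself, not the thesis's monitoring procedure:

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits of a probability vector.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def interaction_information(joint):
    # joint[x, y, z] is the joint pmf of two base-classifier outputs
    # X, Y and the label Z. Under this convention:
    #   I(X;Y;Z) = H(X)+H(Y)+H(Z) - H(XY)-H(XZ)-H(YZ) + H(XYZ)
    # Negative values signal synergy (the pair is informative about
    # Z only jointly); positive values signal redundancy.
    hx = entropy(joint.sum(axis=(1, 2)))
    hy = entropy(joint.sum(axis=(0, 2)))
    hz = entropy(joint.sum(axis=(0, 1)))
    hxy = entropy(joint.sum(axis=2).ravel())
    hxz = entropy(joint.sum(axis=1).ravel())
    hyz = entropy(joint.sum(axis=0).ravel())
    return hx + hy + hz - hxy - hxz - hyz + entropy(joint.ravel())

# XOR-like relation: neither output alone predicts the label,
# but together they determine it exactly.
joint = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        joint[x, y, x ^ y] = 0.25
print(interaction_information(joint))   # -1.0 (pure synergy)
```

Estimating the same quantity from held-out predictions of base classifiers is one way such a measure could be used to monitor ensemble diversity.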
|
67 |
Fall Detection Using Still Images in Hybrid Classifier, Kandavel, Srianuradha, January 2021 (has links)
No description available.
|
68 |
Integrated Assembly and Annotation of Fathead Minnow Genome Towards Prediction of Environmental Exposures, Martinson, John W., 16 June 2020 (has links)
No description available.
|
69 |
Using Freebase, An Automatically Generated Dictionary, And A Classifier To Identify A Person's Profession In Tweets, Hall, Abraham, 01 January 2013 (has links)
Algorithms for classifying pre-tagged person entities in tweets into one of eight profession categories are presented. A classifier using a semi-supervised learning algorithm that takes into consideration the local context surrounding the entity in the tweet, hashtag information, and topic signature scores is described. In addition to the classifier, this research investigates two dictionaries containing the professions of persons. These two dictionaries are used in their own classification algorithms, which are independent of the classifier. The method for creating the first dictionary dynamically from the web and the algorithm that accesses this dictionary to classify a person into one of the eight profession categories are explained next. The second dictionary is Freebase, an openly available online database that is maintained by its online community. The algorithm that uses Freebase for classifying a person into one of the eight professions is described. The results show that classifications made using the automatically constructed dictionary, Freebase, or the classifier are all moderately successful, and that classifications made with the automatically constructed person dictionary are slightly more accurate than those made using Freebase. Various hybrid methods combining the classifier and the two dictionaries are also explained; the results of those hybrid methods show significant improvement over any of the individual methods.
|
70 |
A Hybrid Cost Model for Evaluating Query Execution Plans, Wang, Ning, 22 January 2024 (has links)
Query optimization aims to select an execution plan among all candidate plans for a given query. The query optimizer of a traditional relational database management system (RDBMS) relies on a cost model to estimate the cost of the alternative plans in the plan search space. The classic cost model (CCM) may lead the optimizer to choose plans with poor execution times due to inaccurate cardinality estimates and simplifying assumptions. A learned cost model (LCM) based on machine learning does not rely on such estimates and instead learns costs from runtime observations. While learned cost models have been shown to improve average performance, they do not guarantee consistently optimal choices, and the plans generated using the LCM do not necessarily outperform those generated with the CCM. This thesis proposes a hybrid approach that strikes a balance between the LCM and the CCM: the hybrid model uses the LCM when it is expected to be reliable in selecting a good plan and falls back to the CCM otherwise. The evaluation results of the hybrid model demonstrate promising performance, indicating potential for successful use in future applications.
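The fallback logic described above can be sketched in a few lines. Every name below — the confidence score, the toy cost functions, the threshold — is a hypothetical stand-in, not the model proposed in the thesis:

```python
def choose_cost_model(plan, lcm, ccm, confidence, threshold=0.8):
    # Hybrid strategy: trust the learned cost model only when its
    # reliability estimate for this plan is high enough; otherwise
    # fall back to the classic analytic cost model.
    if confidence(plan) >= threshold:
        return lcm(plan)
    return ccm(plan)

def classic_cost(plan):
    # Toy analytic cost: cardinality estimate times per-row cost.
    return plan["est_rows"] * plan["cost_per_row"]

def learned_cost(plan):
    # Stand-in for a trained regressor's prediction.
    return plan["predicted_cost"]

def lcm_confidence(plan):
    # Stand-in for a reliability score, e.g. derived from the
    # prediction variance of an ensemble of learned models.
    return plan["model_confidence"]

plan = {"est_rows": 1000, "cost_per_row": 0.2,
        "predicted_cost": 150.0, "model_confidence": 0.9}
print(choose_cost_model(plan, learned_cost, classic_cost, lcm_confidence))  # 150.0
```

The interesting design question is how the confidence gate is built; anything from out-of-distribution detection on plan features to ensemble variance could play that role.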
|