About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want it to be added, details can be found on the NDLTD website.
1

A Common Misconception in Multi-Label Learning

Brodie, Michael Benjamin 01 November 2016
The majority of current multi-label classification research focuses on learning dependency structures among output labels. This paper provides a novel theoretical view on the purported assumption that effective multi-label classification models must exploit output dependencies. We submit that the flurry of recent dependency-exploiting, multi-label algorithms may stem from the deficiencies in existing datasets, rather than an inherent need to better model dependencies. We introduce a novel categorization of multi-label metrics, namely, evenly and unevenly weighted label metrics. We explore specific features that predispose datasets to improved classification by methods that model label dependence. Additionally, we provide an empirical analysis of 15 benchmark datasets, 1 real-life dataset, and a variety of synthetic datasets. We assert that binary relevance (BR) yields similar, if not better, results than dependency-exploiting models for metrics with evenly weighted label contributions. We qualify this claim with discussions on specific characteristics of datasets and models that render negligible the differences between BR and dependency-learning models.
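The binary relevance (BR) baseline discussed above can be sketched as follows: one binary classifier is trained per label, and no label ever sees another label's values. This is a minimal illustration, not the thesis's code; the nearest-centroid base learner, the toy data, and the two metric functions are our own assumptions. We use Hamming loss as an example of a metric in which every label contributes equally, and subset accuracy (exact match) as a contrasting metric that only credits an instance when all of its labels are right; whether these line up exactly with the thesis's evenly/unevenly weighted categories is an assumption.

```python
from typing import List, Sequence

Vector = List[float]


def _centroid(rows: Sequence[Vector]) -> Vector:
    # Component-wise mean of a non-empty list of feature vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class BinaryRelevance:
    """Binary relevance: one independent binary classifier per label.

    The base learner is a hypothetical nearest-centroid rule chosen for brevity;
    each label needs at least one positive and one negative training example.
    """

    def fit(self, X: List[Vector], Y: List[List[int]]) -> "BinaryRelevance":
        self.models = []
        for j in range(len(Y[0])):
            pos = [x for x, y in zip(X, Y) if y[j] == 1]
            neg = [x for x, y in zip(X, Y) if y[j] == 0]
            # Label j is fit in isolation: labels k != j are never consulted.
            self.models.append((_centroid(pos), _centroid(neg)))
        return self

    def predict(self, X: List[Vector]) -> List[List[int]]:
        return [
            [1 if _sq_dist(x, p) <= _sq_dist(x, n) else 0 for p, n in self.models]
            for x in X
        ]


def hamming_loss(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    # Fraction of wrong (instance, label) pairs; every label counts equally.
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp))
    return wrong / (len(Y_true) * len(Y_true[0]))


def subset_accuracy(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    # Fraction of instances whose whole label vector is predicted exactly.
    return sum(yt == yp for yt, yp in zip(Y_true, Y_pred)) / len(Y_true)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
    model = BinaryRelevance().fit(X, Y)
    print(model.predict(X))  # [[0, 0], [0, 1], [1, 0], [1, 1]]
    Y_pred = [[0, 0], [0, 1], [1, 1], [0, 1]]
    print(hamming_loss(Y, Y_pred), subset_accuracy(Y, Y_pred))  # 0.25 0.5
```

With two wrong labels spread over two instances, Hamming loss reports 0.25 while subset accuracy drops to 0.5, which shows why the choice of metric matters when comparing BR with dependency-exploiting models.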
2

Improved shrunken centroid method for better variable selection in cancer classification with high throughput molecular data

Xukun, Li January 1900
Master of Science / Department of Statistics / Haiyan Wang / Cancer type classification with high-throughput molecular data has received much attention, and many methods have been published in this area. One of them is PAM (the nearest shrunken centroid algorithm), which is simple and efficient and can give very good prediction accuracy. A problem with PAM is that it selects too many genes, some of which may have no influence on cancer type. One reason for this is that PAM assumes all genes have identical distributions and applies a common threshold parameter for gene selection. This may not hold in reality, since expression levels of different genes can have very different distributions owing to complicated biological processes. We propose a new method aimed at improving the ability of PAM to select informative genes. Keeping informative genes while reducing false-positive variables can lead to more accurate classification results and help pinpoint target genes for further study. To achieve this goal, we introduce a variable-specific test based on the Edgeworth expansion to select informative genes. We apply this test to each gene and retain genes based on its result, so that a large number of genes are excluded. Afterward, soft thresholding with cross-validation can be applied to choose a common threshold value. Simulations and a real-data application show that our method reduces irrelevant information and selects informative genes more precisely. The simulation results give further insight into where the proposed procedure can improve accuracy, especially when the data set is skewed or unbalanced. The method can be applied to a broad range of molecular data, including, for example, lipidomic data from mass spectrometry, copy number data from genomics, and eQTL analysis with GWAS data. We expect the proposed method to help life scientists accelerate discoveries with high-throughput data.
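The PAM soft-thresholding step that this method builds on can be sketched as follows. This is a minimal stdlib-only sketch assuming the standard nearest-shrunken-centroid formulation (pooled within-class standard deviation s_j, fudge factor s_0 set to the median of the s_j, and m_k = sqrt(1/n_k - 1/n)); it uses one fixed threshold `delta` instead of choosing it by cross-validation. The optional `keep` mask is a hypothetical placeholder for the outcome of a gene-specific prefilter; the thesis's Edgeworth-expansion test itself is not implemented here.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence, Set, Tuple


def shrunken_centroids(
    X: List[List[float]],
    y: List[int],
    delta: float,
    keep: Optional[Sequence[bool]] = None,
) -> Tuple[Dict[int, List[float]], List[float], Set[int]]:
    """Nearest-shrunken-centroid shrinkage (PAM-style sketch).

    X is n samples x p genes, y holds one class label per sample, delta is the
    common threshold. `keep` is a hypothetical per-gene mask standing in for a
    prefilter; genes with keep[j] == False are shrunk to the overall centroid.
    Returns (shrunken centroid per class, per-gene scale s_j + s_0, selected genes).
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    means = {k: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
             for k, rows in members.items()}

    # Pooled within-class standard deviation of each gene.
    s = [
        math.sqrt(sum((r[j] - means[k][j]) ** 2
                      for k, rows in members.items() for r in rows) / (n - len(classes)))
        for j in range(p)
    ]
    s0 = statistics.median(s)  # guards against genes with tiny variance

    shrunk: Dict[int, List[float]] = {}
    selected: Set[int] = set()
    for k in classes:
        m_k = math.sqrt(1 / len(members[k]) - 1 / n)
        centroid = []
        for j in range(p):
            se = m_k * (s[j] + s0)
            d = (means[k][j] - overall[j]) / se  # standardized class-vs-overall difference
            if keep is not None and not keep[j]:
                d_shrunk = 0.0
            else:
                d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)  # soft thresholding
            if d_shrunk != 0.0:
                selected.add(j)  # gene j still separates class k from the overall centroid
            centroid.append(overall[j] + se * d_shrunk)
        shrunk[k] = centroid
    return shrunk, [sj + s0 for sj in s], selected


def classify(x: List[float], shrunk: Dict[int, List[float]],
             scale: List[float], priors: Dict[int, float]) -> int:
    # Discriminant score: standardized squared distance minus 2 log prior.
    def score(k: int) -> float:
        dist = sum((xj - cj) ** 2 / sc ** 2 for xj, cj, sc in zip(x, shrunk[k], scale))
        return dist - 2 * math.log(priors[k])
    return min(shrunk, key=score)


if __name__ == "__main__":
    X = [[1.0, 0.0], [1.2, 1.0], [0.8, 0.5], [3.0, 0.0], [3.2, 1.0], [2.8, 0.5]]
    y = [0, 0, 0, 1, 1, 1]
    shrunk, scale, selected = shrunken_centroids(X, y, delta=1.0)
    print(selected)  # {0}
    print(classify([1.1, 0.7], shrunk, scale, {0: 0.5, 1: 0.5}))  # 0
```

In this toy data gene 1 has identical class means, so it is shrunk away and only gene 0 is selected. Raising `delta` removes more genes at once, which is exactly the single common threshold that the abstract argues is too blunt when genes follow different distributions.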
3

An Improved Classifier Chain Ensemble for Multi-Dimensional Classification with Conditional Dependence

Heydorn, Joseph Ethan 01 July 2015
We focus on multi-dimensional classification (MDC) problems with conditional dependence, which we call multiple output dependence (MOD) problems. MDC is the task of predicting a vector of categorical outputs for each input. Conditional dependence in MDC means that the choice for one output value affects the choice for others, so it is not desirable to predict outputs independently. We show that conditional dependence in MDC implies that a single input can map to multiple correct output vectors. This means it is desirable to find multiple correct output vectors per input. Current solutions for MOD problems are not sufficient because they predict only one of the correct output vectors per input, ignoring all others. We modify four existing MDC solutions, including chain classifiers, to predict multiple output vectors. We further create a novel ensemble technique named weighted output vector ensemble (WOVE) which combines these multiple predictions from multiple chain classifiers in a way that preserves the integrity of output vectors and thus preserves conditional dependence among outputs. We verify the effectiveness of WOVE by comparing it against 7 other solutions on a variety of data sets and find that it shows significant gains over existing methods.
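The difference between combining whole output vectors and voting on each output separately can be illustrated with a small sketch. Each ensemble member is assumed to return a list of (output vector, weight) candidates; the function names, the additive weighting, and the toy data are hypothetical, and this is not WOVE's actual weighting scheme, which the abstract does not specify.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

OutputVector = Tuple[str, ...]
Candidates = Sequence[Tuple[OutputVector, float]]  # one member's weighted output vectors


def combine_whole_vectors(members: Sequence[Candidates], top_n: int = 2) -> List[OutputVector]:
    """Vector-level weighted vote (hypothetical scheme).

    Weights are summed per *entire* output vector, so the ensemble can only
    return vectors that at least one member predicted; outputs are never mixed.
    """
    scores: Dict[OutputVector, float] = defaultdict(float)
    for candidates in members:
        for vector, weight in candidates:
            scores[vector] += weight
    ranked = sorted(scores, key=lambda v: (-scores[v], v))  # ties broken lexicographically
    return ranked[:top_n]


def per_output_majority(members: Sequence[Candidates]) -> OutputVector:
    """Naive baseline: majority vote on each output position separately,
    using only each member's highest-weighted vector."""
    tops = [max(candidates, key=lambda vw: vw[1])[0] for candidates in members]
    result = []
    for position in range(len(tops[0])):
        counts: Dict[str, int] = defaultdict(int)
        for vector in tops:
            counts[vector[position]] += 1
        result.append(max(sorted(counts), key=lambda value: counts[value]))
    return tuple(result)


if __name__ == "__main__":
    members = [
        [(("a", "x", "1"), 0.6), (("a", "y", "2"), 0.4)],
        [(("a", "y", "2"), 0.9), (("b", "y", "1"), 0.1)],
        [(("b", "y", "1"), 0.55), (("a", "x", "1"), 0.45)],
    ]
    print(combine_whole_vectors(members))  # [('a', 'y', '2'), ('a', 'x', '1')]
    print(per_output_majority(members))    # ('a', 'y', '1'), proposed by no member
```

In this toy case the per-output vote returns ("a", "y", "1"), a combination that no member predicted and that may break the dependence between outputs, whereas the vector-level vote only ranks vectors that were actually proposed and can return more than one of them.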
4

What Is Addictive Disorder? Toward a Hybrid Definition and a Dimensional Classification of Addiction

Frenette, Rachel 08 1900
The categorization of addiction in the DSM-V faces several theoretical problems. First, the category named “Substance-use disorders and addictive disorders” highlights the problem of exclusion: it offers no justification for including certain behavioral disorders while excluding others. Second, the category raises the problem of overlap, whereby categories expected to be distinct actually intersect. These problems of exclusion and overlap call into question the way the boundaries between categories are drawn and reveal their lack of conceptual validity. Moreover, since the category faces these theoretical problems, its usefulness in the treatment and care of patients is also doubtful. Therefore, to offer a psychiatric classification that is both valid and useful, addictive disorder must be redefined, which will allow it to be classified differently and more adequately. In this master's thesis, we argue that addiction refers not to a discrete entity but to a continuum on which two phenomena, which must nonetheless be kept separate, coexist: addictive motivation and addictive disorder. According to our proposed definition, a dimensional rather than categorical taxonomy better represents addictive disorder. This approach has the potential to offer better tools to clinicians and researchers in the treatment of people suffering from an addictive disorder.
