• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A Common Misconception in Multi-Label Learning

Brodie, Michael Benjamin 01 November 2016 (has links)
The majority of current multi-label classification research focuses on learning dependency structures among output labels. This paper provides a novel theoretical view on the purported assumption that effective multi-label classification models must exploit output dependencies. We submit that the flurry of recent dependency-exploiting, multi-label algorithms may stem from the deficiencies in existing datasets, rather than an inherent need to better model dependencies. We introduce a novel categorization of multi-label metrics, namely, evenly and unevenly weighted label metrics. We explore specific features that predispose datasets to improved classification by methods that model label dependence. Additionally, we provide an empirical analysis of 15 benchmark datasets, 1 real-life dataset, and a variety of synthetic datasets. We assert that binary relevance (BR) yields similar, if not better, results than dependency-exploiting models for metrics with evenly weighted label contributions. We qualify this claim with discussions on specific characteristics of datasets and models that render negligible the differences between BR and dependency-learning models.
2

Statistical Modeling of Dynamic Risk in Security Systems / Statistisk modellering av dynamisk risk i säkerhetssystem

Singh, Gurpreet January 2020 (has links)
Big data has been used regularly in finance and business to build forecasting models. It is, however, a relatively new concept in the security industry. This study predicts technology related alarm codes that will sound in the coming 7 days at location $L$ by observing the past 7 days. Logistic regression and neural networks are applied to solve this problem. Due to the problem being of a multi-labeled nature logistic regression is applied in combination with binary relevance and classifier chains. The models are trained on data that has been labeled with two separate methods, the first method labels the data by only observing location $L$. The second considers $L$ and $L$'s surroundings. As the problem is multi-labeled the labels are likely to be unbalanced, thus a resampling technique, SMOTE, and random over-sampling is applied to increase the frequency of the minority labels. Recall, precision, and F1-score are calculated to evaluate the models. The results show that the second labeling method performs better for all models and that the classifier chains and binary relevance model performed similarly. Resampling the data with the SMOTE technique increases the macro average F1-scores for the binary relevance and classifier chains models, however, the neural networks performance decreases. The SMOTE resampling technique also performs better than random over-sampling. The neural networks model outperforms the other two models on all methods and achieves the highest F1-score. / Big data har använts regelbundet inom ekonomi för att bygga prognosmodeller, det är dock ett relativt nytt koncept inom säkerhetsbranschen. Denna studie förutsäger vilka larmkoder som kommer att låta under de kommande 7 dagarna på plats $L$ genom att observera de senaste 7 dagarna. Logistisk regression och neurala nätverk används för att lösa detta problem. Eftersom att problemet är av en multi-label natur tillämpas logistisk regression i kombination med binary relevance och classifier chains. Modellerna tränas på data som har annoterats med två separata metoder. Den första metoden annoterar datan genom att endast observera plats $L$ och den andra metoden betraktar $L$ och $L$:s omgivning. Eftersom problemet är multi-labeled kommer annoteringen sannolikt att vara obalanserad och därför används resamplings metoden, SMOTE, och random over-sampling för att öka frekvensen av minority labels. Recall, precision och F1-score mättes för att utvärdera modellerna. Resultaten visar att den andra annoterings metoden presterade bättre för alla modeller och att classifier chains och binary relevance presterade likartat. Binary relevance och classifier chains modellerna som tränades på datan som använts sig av resamplings metoden SMOTE gav ett högre macro average F1-score, dock sjönk prestationen för neurala nätverk. Resamplings metoden SMOTE presterade även bättre än random over-sampling. Neurala nätverksmodellen överträffade de andra två modellerna på alla metoder och uppnådde högsta F1-score.

Page generated in 0.1093 seconds