71

Cost-Sensitive Boosting for Classification of Imbalanced Data

Sun, Yanmin 11 May 2007 (has links)
The classification of data with imbalanced class distributions poses a significant obstacle to the performance attainable by most well-developed classification systems, which assume relatively balanced class distributions. The problem is especially acute in application domains such as medical diagnosis, fraud detection, and network intrusion detection, all of great importance in machine learning and data mining. This thesis explores meta-techniques applicable to most classifier learning algorithms, with the aim of advancing the classification of imbalanced data. Boosting is a powerful meta-technique for learning an ensemble of weak models with the promise of improved classification accuracy, and AdaBoost is widely regarded as the most successful boosting algorithm. The thesis starts by applying AdaBoost to an associative classifier for both learning-time reduction and accuracy improvement. However, the promise of accuracy improvement means little in the context of the class imbalance problem, where accuracy itself is a poor measure. The insight gained from a comprehensive analysis of AdaBoost's boosting strategy leads to the investigation of cost-sensitive boosting algorithms, developed by introducing cost items into AdaBoost's learning framework. The cost items denote the uneven identification importance among classes, so the boosting strategy can deliberately bias learning towards the classes with higher identification importance and thereby improve identification performance on them. In a given application domain, cost values for the different types of samples are usually not available in advance. To set effective cost values, empirical methods are used for bi-class applications and a heuristic search based on the Genetic Algorithm is employed for multi-class applications. The thesis also covers the implementation of the proposed cost-sensitive boosting algorithms and ends with a discussion of experimental results on the classification of real-world imbalanced data. Compared with existing algorithms, the new algorithms presented here achieve better measures of the stated learning objectives.
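The abstract does not spell out how the cost items enter AdaBoost's updates. Purely as a hedged illustration of the general idea (not the thesis's algorithms), the sketch below folds per-sample costs into the weight update and into the round weight alpha, using decision stumps as weak learners; every name and formula here is an assumption chosen for demonstration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cost_sensitive_adaboost(X, y, costs, n_rounds=50):
    """Illustrative cost-sensitive AdaBoost for labels y in {-1, +1}.

    costs[i] encodes the identification importance of sample i (for example,
    a larger value for the rare class). This is a sketch, not the thesis's
    exact update rules.
    """
    n = len(y)
    D = np.full(n, 1.0 / n)                      # sample weight distribution
    learners, alphas = [], []
    eps = 1e-12
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=D)
        pred = stump.predict(X)
        # cost-weighted mass of correctly and wrongly classified samples
        correct = np.sum(costs * D * (pred == y))
        wrong = np.sum(costs * D * (pred != y))
        alpha = 0.5 * np.log((correct + eps) / (wrong + eps))
        if alpha <= 0:                           # weak learner no better than chance
            break
        # cost items bias the reweighting toward the important (costly) samples
        D = costs * D * np.exp(-alpha * y * pred)
        D /= D.sum()
        learners.append(stump)
        alphas.append(alpha)

    def predict(X_new):
        score = sum(a * h.predict(X_new) for a, h in zip(alphas, learners))
        return np.sign(score)

    return predict
```

For a bi-class problem, one might set `costs` to 1.0 for majority-class samples and to a larger value for minority-class samples, tuned empirically as the abstract suggests.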
73

Modèles d'ensembles pour l'apprentissage multi-tâche, avec des tâches hétérogènes et sans restrictions / Ensemble Models for Multi-Task Learning with Heterogeneous Tasks and without Restrictions

Faddoul, Jean Baptiste 18 June 2012 (has links) (PDF)
Learning several tasks simultaneously can improve predictive performance compared with learning those tasks independently. In this thesis, we consider multi-task learning when the number of tasks is large. In addition, we relax the restrictions that state-of-the-art methods impose on the tasks, namely: imposing the same label space on all tasks, requiring the same training examples across tasks, and/or assuming a global correlation between tasks. We propose new multi-task classifiers that relax these restrictions. Our classifiers are weak classifiers in the sense of PAC learning theory, so, in order to reach a low classification error rate, an ensemble of these weak classifiers must be learned. This framework is called ensemble learning, within which we propose a multi-task learning algorithm inspired by the single-task AdaBoost algorithm. Several variants are also proposed, in particular multi-task random forests, an ensemble learning method based on the statistical principle of Bootstrap sampling. Finally, we provide an experimental validation showing that the approach outperforms existing methods and can learn new task configurations that state-of-the-art methods cannot handle.
74

FPGA interconnection networks with capacitive boosting in strong and weak inversion

Eslami, Fatemeh 22 August 2012 (has links)
Designers of Field-Programmable Gate Arrays (FPGAs) are always striving to improve the speed of their designs. The propagation delay of FPGA interconnection networks is a major challenge and continues to grow with newer technologies. FPGA interconnection networks are implemented using NMOS pass-transistor-based multiplexers followed by buffers. The threshold-voltage drop across an NMOS device degrades the high logic value and results in unbalanced rising and falling edges, static power consumption due to crowbar currents, and reduced noise margins. In this work, circuit design techniques for constructing interconnection circuits with capacitive boosting are proposed. By using capacitive boosting in FPGA interconnection networks, signal transitions are accelerated and the crowbar currents of downstream buffers are reduced. In addition, buffers can be non-skewed or only slightly skewed, improving the noise immunity of the interconnection network. Results indicate that the presented circuit design technique reduces propagation delay by at least 10% versus prior art at the expense of a slight increase in silicon area. In addition, in a bid to reduce power consumption in reconfigurable arrays, operation in the weak inversion region has been suggested. Current programmable interconnections cannot be used directly in this region because of very poor propagation delay and sensitivity to Process-Voltage-Temperature (PVT) variations. This work therefore also designs a common structure for FPGA interconnection networks that can operate in both strong and weak inversion. We propose to use capacitive boosting together with a new circuit design technique, called Twins transmission gates, in implementing FPGA interconnect multiplexers, and to use capacitive boosting in designing buffers. This shifts the operating region of the interconnection circuitry away from weak inversion toward strong inversion, resulting in improved speed and enhanced tolerance to PVT variations. Simulation results indicate that capacitive boosting in the interconnection network has a significant influence on delay and tolerance to variations: the interconnection network with capacitive boosting is at least 34% faster than prior art in weak inversion.
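As generic background on the boosting mechanism itself (with generic symbols, not the specific circuits proposed in the thesis): capacitive boosting couples a boost capacitor onto a switching node so that a driving edge briefly pushes the node above the supply, recovering the NMOS threshold-voltage drop mentioned above. Under a simple charge-sharing assumption, an edge of height ΔV coupled through a boost capacitance C_b onto a node carrying parasitic capacitance C_p raises the node voltage by approximately

    ΔV_node ≈ ΔV · C_b / (C_b + C_p)

so the boost capacitor must be sized large enough relative to the node's parasitics for the technique to pay off.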
75

Predição genômica da resistência à ferrugem alaranjada em café arábica via algoritmos de aprendizagem de máquina / Genomic prediction of leaf rust resistance to arabica coffee using machine learning algorithms

Sousa, Ithalo Coelho de 26 February 2018 (has links)
Genomic selection (GS) has been proposed as a way to increase efficiency and accelerate genetic improvement. GS emphasizes the simultaneous prediction of the genetic effects of thousands of markers scattered throughout an organism's genome. Several statistical methodologies have been used in GS to predict genetic merit, such as Ridge Regression Best Linear Unbiased Prediction (RR-BLUP) and the Bayesian Lasso (BLASSO). However, these methodologies require assumptions about the data, such as normality of the distribution of phenotypic values. In addition, complicating factors such as epistasis and dominance hinder the use of these models, since such effects must be specified a priori by the researcher. To sidestep the non-normality of phenotypic values, the literature suggests Bayesian Generalized Linear Regression (BGLR). Another alternative is machine-learning models, represented by methodologies such as Artificial Neural Networks (ANN), Decision Trees (DT) and their refinements (Bagging, Random Forest and Boosting), which can incorporate epistasis and dominance and do not require assumptions about the model or the distribution of phenotypic values. The aim of this work was to use DT and its refinements Bagging, Random Forest and Boosting to predict resistance to coffee leaf rust (orange rust) in arabica coffee, and to use them to rank the importance of markers related to the trait of interest. The results were compared with those from GBLASSO (Generalized Bayesian Lasso) and ANN. Coffee leaf rust resistance data from 245 plants derived from a cross between Híbrido de Timor and Catuaí Amarelo, genotyped for 137 markers, were used. DT and its refinements obtained satisfactory results, presenting Apparent Error Rates equal to or lower than those obtained by GBLASSO and ANN. Moreover, the DT refinements proved able to identify markers important for the trait of interest: among the 10 most important markers ranked by each methodology, 3-4 were close to QTLs related to disease resistance reported in the literature. Finally, DT and its refinements outperformed GBLASSO and ANN in computational cost.
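As an illustration of the kind of comparison the abstract describes (a single decision tree versus its Bagging, Random Forest and Boosting refinements, plus a marker-importance ranking), the sketch below uses scikit-learn on hypothetical 0/1/2-coded marker data; the data shape, the specific estimators and the use of cross-validated error as a stand-in for the Apparent Error Rate are assumptions, not the thesis's pipeline.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in data: 245 plants x 137 markers coded 0/1/2,
# with a binary rust-resistance phenotype.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(245, 137))
y = rng.integers(0, 2, size=245)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Bagging": BaggingClassifier(n_estimators=200, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Boosting": GradientBoostingClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: cross-validated error rate ~ {1 - acc.mean():.3f}")

# Marker-importance ranking from the Random Forest refinement
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top_markers = np.argsort(rf.feature_importances_)[::-1][:10]
print("10 most important marker indices:", top_markers)
```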
76

A Boosted-Window Ensemble

Elahi, Haroon January 2014 (has links)
Context. The problem of obtaining predictions from stream data involves training on the labeled instances and suggesting class values for the unseen stream instances. The nature of data-stream environments makes this task complicated: the large number of instances, possible changes in the data distribution, the presence of noise and drifting concepts are just some of the factors that add complexity. Various supervised-learning algorithms have been designed by putting together efficient data-sampling, ensemble-learning, and incremental-learning methods, and an algorithm's performance depends on the chosen methods. This leaves an opportunity to design new supervised-learning algorithms from different combinations of constituent methods. Objectives. This thesis proposes a fast and accurate supervised-learning algorithm for making predictions on data streams. The algorithm, called Boosted-Window Ensemble (BWE), is built on the mixture-of-experts technique. BWE uses a sliding window, online boosting, and incremental learning for data sampling, ensemble learning, and maintaining a consistent state with the current stream data, respectively. In this regard, a new sliding-window method is introduced that uses partial updates to slide the window over the data stream, called the Partially-Updating Sliding Window (PUSW). An investigation compares two sliding-window variants and three different ensemble-learning methods in order to choose the superior ones. Methods. The thesis uses an experimentation approach to evaluate BWE. CPU-time and prediction accuracy are used as performance indicators, where CPU-time is the execution time in seconds. The benchmark algorithms are Accuracy-Updated Ensemble1 (AUE1), Accuracy-Updated Ensemble2 (AUE2), and Accuracy-Weighted Ensemble (AWE). The experiments use nine synthetic and five real-world datasets to generate performance estimates. The asymptotic Friedman test and the Wilcoxon Signed-Rank test are used for hypothesis testing, and the Wilcoxon-Nemenyi-McDonald-Thompson test for post-hoc analysis. Results. The hypothesis testing suggests that: 1) for both the synthetic and real-world datasets, BWE has significantly lower CPU-time values than two of the benchmark algorithms (AUE1 and AWE); 2) BWE achieves prediction accuracy similar to AUE1 and AWE on the synthetic datasets; 3) BWE achieves prediction accuracy similar to all three benchmark algorithms on the real-world datasets. Conclusions. The experimental results demonstrate that the proposed algorithm can be as accurate as the state-of-the-art benchmark algorithms while obtaining predictions from stream data. They further show that the Partially-Updating Sliding Window yields lower CPU-time for BWE than the chunk-based sliding-window method used in AUE1, AUE2, and AWE.
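The abstract does not give the mechanics of the Partially-Updating Sliding Window (PUSW), so the sketch below is only one plausible reading, assuming that a partial update means replacing a fraction of the window at a time instead of the whole chunk; the class name, parameters, and eviction policy are illustrative assumptions.

```python
from collections import deque

class PartiallyUpdatingSlidingWindow:
    """Plausible sketch of a partially-updating sliding window over a stream.

    Instead of rebuilding the whole window for every incoming chunk, only
    `update_fraction` of the window is replaced per slide, so most training
    instances persist between updates (the assumed meaning of "partial update").
    """

    def __init__(self, window_size=1000, update_fraction=0.25):
        self.window = deque(maxlen=window_size)           # oldest items fall out
        self.slide_size = max(1, int(window_size * update_fraction))
        self._pending = []

    def add(self, instance):
        """Buffer an incoming stream instance; slide once enough have arrived."""
        self._pending.append(instance)
        if len(self._pending) >= self.slide_size:
            self._slide()
            return True                                   # window was updated
        return False

    def _slide(self):
        for inst in self._pending:                        # deque evicts the oldest
            self.window.append(inst)
        self._pending.clear()

    def contents(self):
        return list(self.window)
```

In a BWE-like loop, each time `add` reports an update, an ensemble member could be trained or incrementally refreshed on `contents()` and its vote weighted by recent accuracy, in the spirit of the online-boosting step described above.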
77

Nouvelles approches thérapeutiques par potentialisation d’antituberculeux analogues du nicotinamide / New therapeutic approaches by a boosting strategy of antituberculosis nicotinamide analogues

Blondiaux, Nicolas 17 December 2012 (has links)
Antibiotics are currently the only effective means of controlling tuberculosis. Among them, ethionamide (ETH) is one of the most effective antituberculosis drugs, but it causes significant side effects that relegate it to second-line use. These drawbacks often lead to non-compliance with treatment, promoting the emergence of resistant strains and many cases of multidrug-resistant tuberculosis (MDR-TB). Like other antimycobacterial compounds, ETH is a prodrug that requires metabolic activation by an enzyme produced by the mycobacterium itself. It has been shown that this intrabacterial bioactivation is carried out by the monooxygenase EthA, whose production is repressed by the transcriptional regulator EthR. In previous work, EthR inhibitors were developed to stimulate the bioactivation of ETH by EthA; these synthetic compounds boosted the efficacy of ETH three-fold in an M. tuberculosis-infected mouse model. Although active in animals, this first series of compounds has pharmacokinetic and pharmacodynamic (PK/PD) properties insufficient for human clinical use. The first objective of this work was therefore to define a "minimum acceptable profile" required for initiating pre-clinical studies. Systematic evaluation of more than 500 compounds led to the identification of leads compatible with the defined profile. Our second objective was to evaluate the benefit of the ETH boosting strategy in the management of MDR-TB: in 80% of cases, the use of our EthR inhibitors significantly lowered the minimum inhibitory concentration of ETH. In parallel, taking advantage of the large number of compounds generated during the optimization program, we conducted a fundamental study of the interactions between the inhibitors and EthR. In this way, we identified a narrow region of EthR's ligand-binding pocket that interacts with all of its inhibitors/ligands and is necessary and sufficient for the structural reorganization that leads to the inactive form of the repressor. For the first time in this TetR family of repressors, we showed that modifying a single amino acid in this region of the protein causes the same allosteric changes as those induced by the binding of inhibitors/ligands. Unexpectedly, the optimization program also led to the identification of a new family of compounds able to boost ETH despite having lost the ability to interact with EthR. Transcriptomics and NMR experiments showed that these compounds inhibit an ETH bioactivation pathway that is independent of EthA. This novel pathway opens up remarkable opportunities for tuberculosis treatment, since these compounds significantly increase the effectiveness of the prodrug not only against clinical MDR-TB strains but also against clinical isolates resistant to ETH. The last objective was to transpose this boosting strategy to isoniazid (INH), the most widely used antituberculosis drug in the world. Like ETH, INH is a prodrug; its bioactivation depends on the catalase-peroxidase KatG, whose expression level is controlled by the transcriptional regulator FurA. Our objective was therefore to obtain specific FurA inhibitors. In the absence of a crystallographic structure of FurA, which precluded a target-based design approach, our strategy relied on high-throughput screening of large chemical libraries. The first hits and their partial optimization are discussed in this work.
78

Méthodes d’ensembles pour l’apprentissage multi-tâche avec des tâches hétérogènes et sans restrictions / Ensemble Methods to Learn Multiple Heterogenous Tasks without Restrictions

Faddoul, Jean-Baptiste 18 June 2012 (has links)
Learning multiple related tasks jointly by exploiting their underlying shared knowledge can improve predictive performance on every task compared with learning them individually. In this thesis, we address multi-task learning (MTL) when the tasks are heterogeneous: they do not share the same labels (and may have different numbers of labels) and they do not require shared examples. In addition, no prior assumption is made about the relatedness pattern between tasks. Our contribution to multi-task learning lies in the framework of ensemble learning, where the learned function is an ensemble of "weak" hypotheses aggregated by an ensemble-learning algorithm (Boosting, Bagging, etc.). We propose two approaches for coping with heterogeneous tasks without making prior assumptions about their relatedness patterns. For each approach, we devise novel multi-task weak hypotheses along with their learning algorithms, and then adapt a boosting algorithm to the multi-task setting. In the first approach, the weak classifiers are 2-level decision stumps over different tasks: a weak classifier assigns a class to each instance on two tasks and abstains on the other tasks, which allows dependencies between tasks to be handled on the instance space. We introduce several efficient weak learners, then adapt AdaBoost with abstaining weak classifiers to multi-task learning. In an empirical study, we compare the weak learners and study the influence of the number of boosting rounds. In the second approach, we develop a multi-task AdaBoost framework with Multi-Task Decision Trees as weak classifiers. We first adapt the well-known decision-tree learning algorithm to the multi-task setting, revising the information-gain rule to obtain a novel criterion for learning Multi-Task Decision Trees. The criterion guides tree construction by learning decision rules from the data of different tasks, representing different degrees of task relatedness. We then modify MT-Adaboost to combine Multi-Task Decision Trees as weak learners. We experimentally validate the advantage of our approaches, reporting results on several multi-task datasets, including the Enron email set and the Spam Filtering collection.
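As a hedged illustration of the boosting step for weak hypotheses that may abstain on some (instance, task) pairs, the sketch below shows one round of a Schapire-Singer-style weight update, which is the standard way abstention is handled in boosting; the function and its inputs are simplified assumptions, not the MT-Adaboost algorithm defined in the thesis.

```python
import numpy as np

def mt_adaboost_round(weights, margins):
    """One boosting round for weak hypotheses that may abstain.

    margins[i] = y_i * h(x_i, task_i) is +1 (correct), -1 (wrong) or 0 (abstain)
    for each (instance, task) pair; weights is the current distribution.
    """
    weights = np.asarray(weights, dtype=float)
    margins = np.asarray(margins)
    eps = 1e-12
    w_plus = weights[margins == 1].sum()      # mass the hypothesis gets right
    w_minus = weights[margins == -1].sum()    # mass it gets wrong
    alpha = 0.5 * np.log((w_plus + eps) / (w_minus + eps))
    # abstentions keep their weight (up to normalisation); mistakes are up-weighted
    new_weights = weights * np.exp(-alpha * margins)
    new_weights /= new_weights.sum()
    return alpha, new_weights
```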
79

Srovnání heuristických a konvenčních statistických metod v data miningu / Comparison of Heuristic and Conventional Statistical Methods in Data Mining

Bitara, Matúš January 2019 (has links)
The thesis deals with the comparison of conventional and heuristic data-mining methods for binary classification. In the theoretical part, four different models are described and their classification behaviour is demonstrated on simple examples. In the practical part, the models are compared on real data; this part also covers data cleaning, outlier removal, two different transformations, and dimension reduction. The last part describes the methods used to assess the quality of the models.
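The abstract does not name the four models, so purely as an illustration of comparing a conventional statistical method with a heuristic one on a binary classification task, the sketch below contrasts logistic regression with gradient boosting on synthetic data; the models, data, and metric are assumptions for demonstration, not the thesis's experimental setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic, mildly imbalanced binary data as a stand-in for the real dataset
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.7, 0.3], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

for name, model in [("logistic regression (conventional)", LogisticRegression(max_iter=1000)),
                    ("gradient boosting (heuristic)", GradientBoostingClassifier())]:
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```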
80

Detektion von Gesichtern in Bildern / Detection of Faces in Images

Schulz, Daniel 26 February 2007 (has links)
This Diplom thesis deals with the detection of faces in images. Starting from a survey of existing approaches, a promising method is selected, presented, and further developed based on new findings. Image data from the university archive are used as an example for the evaluation of the method.
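The abstract does not name the selected detection method. As an illustration only, the sketch below runs a classic boosted-cascade (Viola-Jones style) face detector using OpenCV's pretrained Haar cascade; the image path is a placeholder, and nothing here should be read as the method actually developed in the thesis.

```python
import cv2

# Load OpenCV's pretrained frontal-face Haar cascade (Viola-Jones style,
# AdaBoost-selected Haar features).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("archive_photo.jpg")          # hypothetical archive image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                       # draw a box around each detection
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", image)
```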
