481 |
Deep learning of representations and its application to computer vision. Goodfellow, Ian. 04 1900 (has links)
No description available.
|
482 |
Towards deep semi-supervised learning. Pezeshki, Mohammad. 05 1900 (has links)
No description available.
|
483 |
Feature extraction on faces : from landmark localization to depth estimation. Honari, Sina. 12 1900 (has links)
No description available.
|
484 |
[en] PREDICTING DRUG SENSITIVITY OF CANCER CELLS BASED ON GENOMIC DATA / [pt] PREVENDO A EFICÁCIA DE DROGAS A PARTIR DE CÉLULAS CANCEROSAS BASEADO EM DADOS GENÔMICOS. Sofia Pontes de Miranda. 22 April 2021 (has links)
Accurately predicting drug responses for a given sample based on molecular features may help to optimize drug-development pipelines and explain the mechanisms behind treatment responses. In this dissertation, two case studies were generated, each applying different genomic data to predict drug response. Case study 1 evaluated DNA methylation profile data as one type of molecular feature that is known to drive tumorigenesis and modulate treatment responses. Using genome-wide DNA methylation profiles from 987 cell lines in the Genomics of Drug Sensitivity in Cancer (GDSC) database, we used machine-learning algorithms to evaluate the potential to predict cytotoxic responses for eight anti-cancer drugs. We compared the performance of five classification algorithms and four regression algorithms representing diverse methodologies, including tree-, probability-, kernel-, ensemble- and distance-based approaches. By applying artificial subsampling to varying degrees, this research aimed to determine whether training on relatively extreme outcomes would yield improved performance.
When using classification or regression algorithms to predict discrete or continuous responses, respectively, we consistently observed excellent predictive performance when the training and test sets consisted of cell-line data. Classification algorithms performed best when we trained the models using cell lines with relatively extreme drug-response values, attaining area-under-the-receiver-operating-characteristic-curve (AUROC) values as high as 0.97. The regression algorithms performed best when we trained the models using the full range of drug-response values, although this depended on the performance metrics we used. Case study 2 evaluated RNA-seq data, one of the most popular molecular data types used to study drug efficacy. By applying a semi-supervised learning approach, this research aimed to assess the impact of combining labeled and unlabeled data on model prediction. Using genome-wide, labeled RNA-seq data from an average of 125 AML tumor samples in the Beat AML database (varying by drug type) and 151 unlabeled AML tumor samples in The Cancer Genome Atlas (TCGA) database, we used a semi-supervised model structure to predict cytotoxic responses for four anti-cancer drugs. Semi-supervised models were generated under several parameter combinations and compared against supervised classification algorithms.
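The extreme-response setup and the AUROC metric described above can be sketched in a few lines. This is an illustrative reconstruction under assumed inputs, not the dissertation's code: the response values, quartile cut-offs and scores are stand-ins, not GDSC data.

```python
# Illustrative sketch: binarize continuous drug-response values by keeping
# only the extremes (bottom/top quartile), then score a classifier with the
# area under the ROC curve via the rank-sum (Mann-Whitney) formulation.

def extreme_labels(responses, low_q=0.25, high_q=0.75):
    """Keep only samples with extreme responses; label low responders 1."""
    ordered = sorted(responses)
    lo = ordered[int(low_q * (len(ordered) - 1))]
    hi = ordered[int(high_q * (len(ordered) - 1))]
    return [(r, 1 if r <= lo else 0) for r in responses if r <= lo or r >= hi]

def auroc(scores, labels):
    """AUROC = probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfectly separating classifier yields an AUROC of 1.0; the 0.97 reported above is close to that ceiling.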
|
485 |
Enkele tegnieke vir die ontwikkeling en benutting van etiketteringhulpbronne vir hulpbronskaars tale (Some techniques for developing and exploiting tagging resources for resource-scarce languages) / A.C. Griebenow. Griebenow, Annick. January 2015 (has links)
Because the development of resources in any language is an expensive process, many languages, including the indigenous languages of South Africa, can be classified as resource scarce, i.e. lacking in tagging resources. This study investigates and applies techniques and methodologies for optimising the use of available resources and improving the accuracy of a tagger, using Afrikaans as a resource-scarce language, and aims to i) determine whether combination techniques can be effectively applied to improve the accuracy of a tagger for Afrikaans, and ii) determine whether structural semi-supervised learning can be effectively applied to improve the accuracy of a supervised-learning tagger for Afrikaans. In order to realise the first aim, existing methodologies for combining classification algorithms are investigated. Four taggers, trained using MBT, SVMlight, MXPOST and TnT respectively, are then combined into a combination tagger using weighted voting. Weights are calculated by means of total precision, tag precision, and a combination of precision and recall. Although the combination of taggers does not consistently reduce the error rate relative to the baseline, it achieves an error-rate reduction of up to 18.48% in some cases. In order to realise the second aim, existing semi-supervised learning algorithms, with a specific focus on structural semi-supervised learning, are investigated. Structural semi-supervised learning is implemented by means of the SVD-ASO algorithm, which attempts to extract the shared structure of untagged data using auxiliary problems before training a tagger. The use of untagged data during the training of a tagger leads to an error-rate reduction of 1.67% relative to the baseline. Even though the error-rate reduction does not prove to be statistically significant in all cases, the results show that it is possible to improve the accuracy in some cases.
/ MSc (Computer Science), North-West University, Potchefstroom Campus, 2015
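The weighted-voting combination described above can be sketched as follows. This is a hedged illustration, not the study's implementation: the tags and precision-derived weights are assumed stand-ins, not outputs of the actual MBT, SVMlight, MXPOST and TnT taggers.

```python
# Hedged sketch of weighted voting over several taggers' predictions for one
# token: each tagger's vote is scaled by an (assumed) precision-based weight,
# and the tag with the highest summed weight wins.
from collections import defaultdict

def weighted_vote(tags, weights):
    """tags: one predicted tag per tagger; weights: parallel list of floats."""
    scores = defaultdict(float)
    for tag, w in zip(tags, weights):
        scores[tag] += w
    return max(scores, key=scores.get)
```

With two taggers agreeing on a tag, their combined weight can outvote a single higher-precision dissenter, which is the intended effect of the combination.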
|
487 |
應用共變異矩陣描述子及半監督式學習於行人偵測 / Semi-supervised learning for pedestrian detection with covariance matrix feature. 黃靈威 (Huang, Ling Wei). Unknown Date (has links)
Pedestrian detection is an important yet challenging problem in object classification due to flexible body pose, loose clothing and ever-changing illumination. In this thesis, we employ the covariance feature and propose an on-line learning classifier that combines a naïve Bayes classifier with a cascade support vector machine (SVM) to improve the precision and recall of pedestrian detection in a still image.
Experimental results show that our on-line learning strategy can improve the precision and recall rate by about 14% in some difficult situations. Furthermore, even under the same initial training conditions, our method outperforms HOG + AdaBoost on the USC Pedestrian Detection Test Set, the INRIA Person dataset and the Penn-Fudan Database for Pedestrian Detection and Segmentation.
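The two-stage structure described above can be sketched abstractly. This is a hedged illustration of the cascade idea only: the stage functions and the rejection threshold are assumptions for the example, standing in for the thesis's naïve Bayes filter and cascade SVM.

```python
# Hedged sketch of a two-stage cascade: a cheap first stage (standing in for
# the naive Bayes filter) rejects easy negative windows, and only survivors
# reach the expensive second stage (standing in for the cascade SVM).

def cascade_classify(window, fast_score, slow_decide, reject_below=0.2):
    """Return 1 for pedestrian, 0 otherwise."""
    if fast_score(window) < reject_below:
        return 0  # cheap rejection: most windows never reach the SVM stage
    return slow_decide(window)
```

The payoff is speed: in a sliding-window detector, almost all windows are background, so rejecting them with the cheap stage saves most of the SVM evaluations.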
|
488 |
On Recurrent and Deep Neural Networks. Pascanu, Razvan. 05 1900 (has links)
Deep Learning is a quickly growing area of research in machine learning, providing impressive results on different tasks ranging from image classification to speech and language modelling. In particular, a subclass of deep models, recurrent neural networks, promises even more. Recurrent models can capture the temporal structure in the data. They can learn correlations between events that might be far apart in time and, potentially, store information for unbounded amounts of time in their innate memory. In this work we first focus on understanding why depth is useful. Similar to other published work, our results prove that deep models can be more efficient at expressing certain families of functions than shallow models.
Differently from other work, we carry out our theoretical analysis on deep feedforward networks with piecewise-linear activation functions, the kind of models that have obtained state-of-the-art results on different classification tasks. The second part of the thesis looks at the learning process. We analyse a few recently proposed optimization techniques, including Hessian-Free Optimization, natural gradient descent and Krylov Subspace Descent. We propose the framework of generalized trust-region methods and show that many of these recently proposed algorithms can be viewed from this perspective. We argue that certain members of this family of approaches might be better suited for non-convex optimization than others. The last part of the document focuses on recurrent neural networks. We start by looking at the concept of memory. The questions we attempt to answer are: can recurrent models exhibit unbounded memory, and can this behaviour be learnt? We show this to be true if hints are provided during learning. We then explore two specific difficulties of training recurrent models, namely the vanishing-gradient and exploding-gradient problems. Our analysis concludes with a heuristic solution for exploding gradients that involves clipping the norm of the gradients. We also propose a specific regularization term meant to address the vanishing-gradient problem. On a toy dataset, employing these mechanisms, we provide anecdotal evidence that the recurrent model may be able to learn, without hints, to exhibit some form of unbounded memory. Finally, we explore the concept of depth for recurrent neural networks. Compared to feedforward models, the meaning of depth can be ambiguous for recurrent models. We provide several ways in which a recurrent model can be made deep and empirically evaluate these proposals.
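The gradient-norm clipping heuristic mentioned above fits in a few lines. This sketch operates on plain-Python vectors for illustration; the gradient values and the threshold are assumptions, not the thesis's experimental settings.

```python
# Sketch of gradient-norm clipping for the exploding-gradients problem:
# if the L2 norm of the gradient exceeds a threshold, rescale the whole
# vector so its norm equals the threshold, preserving its direction.
import math

def clip_by_norm(grad, threshold):
    """Rescale the gradient so its L2 norm never exceeds the threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        return [g * threshold / norm for g in grad]
    return grad
```

Because only the magnitude is rescaled, the update direction is unchanged; the clip simply caps the step size taken when the gradient briefly explodes.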
|
489 |
Apprentissage supervisé de données déséquilibrées par forêt aléatoire / Supervised learning of imbalanced datasets using random forest. Thomas, Julien. 12 February 2009 (has links)
The problem of imbalanced datasets in supervised learning has emerged relatively recently, since data mining has become a widely used technology in industry.
Assisted medical diagnosis, fraud detection, and the detection of abnormal phenomena or specific elements in satellite imagery are examples of industrial applications based on supervised learning from imbalanced datasets. The goal of our work is to adapt various elements of the supervised learning process to this problem. We also try to address the specific performance requirements often associated with imbalanced datasets, such as a high recall rate for the minority class. This need is reflected in our main application, the development of software to help radiologists detect breast cancer. To this end, we propose new methods that modify three different stages of a learning process. First, at the sampling stage, we propose, in the case of bagging, to replace classic bootstrap sampling with guided sampling. Our techniques, FUNSS and LARSS, use neighbourhood properties to select objects. Second, for the representation space, our contribution is a feature-construction method adapted to imbalanced datasets. This method, the FuFeFa algorithm, is based on the discovery of predictive association rules. Finally, at the stage of aggregating the base classifiers of a bagging ensemble, we propose to optimize the majority vote by weighting it. For this, we have introduced a new quantitative measure of model assessment, PRAGMA, which takes into account user-specific needs regarding the recall and precision rates of each class.
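The weighted majority vote over a bagging ensemble described above can be sketched abstractly. This is a hedged illustration: the per-classifier weights are assumed inputs here, whereas in the thesis they would be tuned against the PRAGMA measure, which is not reproduced in this sketch.

```python
# Hedged sketch of a weighted majority vote over a bagging ensemble's base
# classifiers for binary classification: each classifier returns 0 or 1,
# and its (assumed) weight scales its say; predict 1 if the weighted vote
# reaches half of the total weight.

def weighted_majority(classifiers, weights, x):
    score = sum(w * clf(x) for clf, w in zip(classifiers, weights))
    return 1 if score >= 0.5 * sum(weights) else 0
```

Down-weighting unreliable base classifiers lets the ensemble favour, for example, recall on the minority class, which is the kind of trade-off the weighting is meant to control.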
|
490 |
Learning during search / Apprendre durant la recherche combinatoire. Arbelaez Rodriguez, Alejandro. 31 May 2011 (has links)
Autonomous Search is a new emerging area in Constraint Programming, motivated by the demonstrated importance of applying Machine Learning techniques to the Algorithm Selection Problem, with potential applications ranging from planning and configuring to scheduling. This area aims at developing automatic tools to improve the performance of search algorithms for solving combinatorial problems, e.g., selecting the best parameter settings for a constraint solver on a particular problem instance. In this thesis, we study three different points of view for automatically solving combinatorial problems; in particular, Constraint Satisfaction, Constraint Optimization, and SAT problems. First, we present domFD, a new variable-selection heuristic whose objective is to heuristically compute a simplified form of functional dependencies called weak dependencies. These weak dependencies are then used to guide the search at each decision point. Second, we study the Algorithm Selection Problem from two different angles. On the one hand, we review a traditional portfolio algorithm that learns a heuristics model offline for the Protein Structure Prediction Problem. On the other hand, we present the Continuous Search paradigm, whose objective is to allow any user to eventually get their constraint solver to achieve top performance on their problems. Continuous Search comes in two modes: the functioning mode solves the user's problem instances using the current heuristics model; the exploration mode reuses these instances to train and improve the heuristics model through Machine Learning during the computer's idle time.
Finally, the last part of the thesis considers the question of adding a knowledge-sharing layer to current portfolio-based parallel local-search solvers for SAT. We show that by sharing the best configuration of each algorithm in the parallel portfolio on a regular basis and aggregating this information in specific ways, the overall performance can be greatly improved.
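The per-instance selection at the heart of the portfolio idea above can be sketched abstractly. This is a hedged illustration only: the solver names, instance features and runtime predictors are assumptions, not models from the thesis.

```python
# Hedged sketch of per-instance algorithm selection: given a learned runtime
# predictor per solver (here, assumed toy callables on instance features),
# dispatch each instance to the solver predicted to be fastest on it.

def select_solver(features, runtime_models):
    """runtime_models: dict name -> callable(features) -> predicted runtime."""
    return min(runtime_models, key=lambda name: runtime_models[name](features))
```

With predictors of this shape, a small instance might go to one solver and a large one to another, which is exactly the win a portfolio seeks over any single fixed solver.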
|