Global ETD Search

81	A Methodology for the Development and Verification of Expressive Ontologies Katsumi, Megan 12 December 2011 (has links) This work focuses on the presentation of a methodology for the development and verification of expressive ontologies. Motivated by experiences with the development of first-order logic ontologies, we call attention to the inadequacies of existing development methodologies for expressive ontologies. We attempt to incorporate pragmatic considerations inspired by our experiences while maintaining the rigorous definition and verification of requirements necessary for the development of expressive ontologies. We leverage automated reasoning tools to enable semiautomatic verification of requirements, and to assist other aspects of development where possible. In addition, we discuss the related issue of ontology quality, and formulate a set of requirements for MACLEOD - a proposed development tool that would support our lifecycle. industrial engineering ontology development lifecycle verification first-order logic 0546 0800
82	Improving Credit Card Fraud Detection using a Meta-learning Strategy Pun, Joseph King-Fung 19 December 2011 (has links) One of the issues facing credit card fraud detection systems is that a significant percentage of transactions labeled as fraudulent are in fact legitimate. These “false alarms” delay the detection of fraudulent transactions. Analysis of 11 months of credit card transaction data from a major Canadian bank was conducted to determine savings improvements that can be achieved by identifying truly fraudulent transactions. A meta-classifier model was used in this research. This model consists of 3 base classifiers constructed using the k-nearest neighbour, decision tree, and naïve Bayesian algorithms. The naïve Bayesian algorithm was also used as the meta-level algorithm to combine the base classifier predictions to produce the final classifier. Results from this research show that when a meta-classifier was deployed in series with the Bank’s existing fraud detection algorithm a 24% to 34% performance improvement was achieved resulting in $1.8 to $2.6 million cost savings per year. data mining credit card meta-learning artificial intelligence algorithms fraud detection 0800
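The meta-level step described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the class name `NaiveBayesCombiner`, the 0/1 vote encoding, the Laplace smoothing, and the toy data are all assumptions. The three base classifiers (k-nearest neighbour, decision tree, naïve Bayes) are taken as given and represented only by their votes on held-out transactions.

```python
import math


class NaiveBayesCombiner:
    """Hypothetical meta-level naive Bayes over the 0/1 votes of base classifiers."""

    def fit(self, votes, labels):
        # votes: list of tuples, one 0/1 vote per base classifier
        # labels: true classes (0 = legitimate, 1 = fraudulent)
        self.classes = sorted(set(labels))
        n_base = len(votes[0])
        self.class_count = {c: labels.count(c) for c in self.classes}
        # vote_count[c][j][v]: number of class-c rows where classifier j voted v
        self.vote_count = {c: [[0, 0] for _ in range(n_base)] for c in self.classes}
        for row, y in zip(votes, labels):
            for j, v in enumerate(row):
                self.vote_count[y][j][v] += 1
        return self

    def predict(self, row):
        total = sum(self.class_count.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # log prior plus log likelihood of each vote (Laplace-smoothed)
            score = math.log(self.class_count[c] / total)
            for j, v in enumerate(row):
                score += math.log((self.vote_count[c][j][v] + 1) / (self.class_count[c] + 2))
            if score > best_score:
                best, best_score = c, score
        return best


# Toy held-out votes in the order (kNN, decision tree, naive Bayes); made-up data.
votes = [(1, 1, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0),
         (0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 0)]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
combiner = NaiveBayesCombiner().fit(votes, labels)
print(combiner.predict((1, 0, 0)))  # a lone alarm from the first classifier
print(combiner.predict((1, 1, 1)))  # all three classifiers raise an alarm
```

In this toy data the first classifier alone produces false alarms, so the combiner learns to discount its isolated vote; that is the sense in which a meta-level learner can filter false alarms.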
83	Tumor Gene Expression Purification Using Infinite Mixture Topic Models Deshwar, Amit Gulab 11 July 2013 (has links) There is significant interest in using gene expression measurements to aid in the personalization of medical treatment. The presence of significant normal tissue contamination in tumor samples makes it difficult to use tumor expression measurements to predict clinical variables and treatment response. I present a probabilistic method, TMMpure, to infer the expression profile of the cancerous tissue using a modified topic model that contains a hierarchical Dirichlet process prior on the cancer profiles. I demonstrate that TMMpure is able to infer the expression profile of cancerous tissue and improves the power of predictive models for clinical variables using expression profiles. Bayesian methods Gene expression purification Bayesian Nonparametric Topic models 0984 0800 0544
87	Attention, concentration, and distraction measure using EEG and eye tracking in virtual reality Zarour, Mahdi 12 1900 (has links) Attention is important in learning, attention-deficit/hyperactivity disorder, driving, and many other fields. Hence, intelligent tutoring systems, attention-deficit/hyperactivity disorder diagnosis systems, and driver distraction detection systems should be able to monitor the attention levels of individuals in real time in order to estimate their attentional state correctly. We study the feasibility of detecting distraction and concentration by monitoring participants' attention levels while they complete cognitive tasks, using electroencephalography (EEG) and eye tracking in a virtual reality environment. Furthermore, we investigate the possibility of improving participants' concentration using relaxation in virtual reality. We developed an indicator that estimates attention levels as a real value from EEG data. The participant-independent, EEG-based indicator we used to assess participants' concentration levels predicts the concentration state with an F1 score of 73%. Furthermore, the participant-independent distraction model based on eye-tracking data predicted the distraction state of participants with an F1 score of 89% in a participant-independent validation setting. eye tracking EEG virtual reality distraction concentration attention
88	Accounting for variance and hyperparameter optimization in machine learning benchmarks Bouthillier, Xavier 06 1900 (has links) The recent revolution in machine learning has relied heavily on standardized benchmarks. Providing clear target metrics and undeniable measures of improvement of learning algorithms, they are at the center of the scientific methodology in machine learning. They do not ensure the validity of results, however, so some scientific conclusions based on flawed methodology may prove to be wrong. In this thesis we address this question by first raising the issue (Chapter 5), then studying it to find solutions and recommendations (Chapter 6), and finally building a tool to help improve the methodology of researchers (Chapter 7). In the first article, Chapter 5, we demonstrate the issue of reproducibility in stable and consensual benchmarks, implying that these issues are also endemic to a large set of machine learning applications that are possibly less stable or less consensual. We raise awareness of the important impact of stochasticity even in stable image classification tasks and contend that solutions for reproducible benchmarks should account for this stochasticity. In the second article, Chapter 6, we study the different sources of variation that are typical in machine learning benchmarks, measure their effect on the comparison methods used to benchmark algorithms, and provide recommendations based on our results. One important contribution of this work is that we measure the reliability of a cheaper but biased estimator of the average performance of algorithms. As explained in the article, an ideal estimator involving multiple rounds of hyperparameter optimization is too computationally expensive. Most researchers must resort to the biased alternative, but it has been unknown until now how seriously this degrades the quality of estimation. Our investigation provides guidelines for benchmarks on practical budgets. First, as many sources of variation as possible should be randomized. Second, the partitioning of data into training, validation, and test sets should be randomized as well, since this is the most important source of variation. Finally, statistical tests, such as the version of the Mann-Whitney U test presented in our article, should be used instead of ad hoc comparisons of averages, so that the uncertainty of performance estimates is accounted for when comparing machine learning algorithms. In Chapter 7, we present a framework for hyperparameter optimization that has been developed with the main goal of encouraging best practices for hyperparameter optimization. The framework is designed to favor a simple and intuitive interface adapted to the workflow of machine learning researchers. It includes a new version control system for experiments to help researchers organize their rounds of experimentation and leverage prior results for more efficient hyperparameter optimization. Hyperparameter optimization plays an important role in benchmarking, with the effect of hyperparameters being a serious confounding factor. Providing an instrument for researchers to properly control this confounding factor is complementary to our guidelines in Chapter 6 for accounting for sources of variation. Our recommendations, together with our tool for hyperparameter optimization, provide a solid basis for a reliable methodology in machine learning benchmarks. Reproducibility Machine learning Hyperparameter optimization
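The recommendation to compare algorithms with a statistical test rather than a difference of averages can be illustrated with a small sketch. It computes the Mann-Whitney U statistic scaled to an estimate of P(A > B), plus a percentile bootstrap interval. The function names, the bootstrap, and the scores are assumptions for illustration; this is not necessarily the exact variant of the test presented in the thesis. Each score is assumed to come from one run with its own random seed and data split.

```python
import random


def prob_outperform(scores_a, scores_b):
    """Mann-Whitney U statistic for A divided by n*m: estimated P(A > B), ties count half."""
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


def bootstrap_interval(scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for P(A > B)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # resample each set of runs with replacement
        resample_a = [rng.choice(scores_a) for _ in scores_a]
        resample_b = [rng.choice(scores_b) for _ in scores_b]
        stats.append(prob_outperform(resample_a, resample_b))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up test accuracies from five randomized runs of two algorithms.
scores_a = [0.91, 0.93, 0.92, 0.94, 0.90]
scores_b = [0.89, 0.92, 0.90, 0.91, 0.88]
print(prob_outperform(scores_a, scores_b))
print(bootstrap_interval(scores_a, scores_b))
```

An interval that contains 0.5 suggests the observed difference could be explained by run-to-run variance alone, which a comparison of two averages would not reveal.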
89	FETA : fairness enforced verifying, training, and predicting algorithms for neural networks Mohammadi, Kiarash 06 1900 (has links) Algorithmic decision-making driven by neural networks has become very prominent in applications that directly affect people’s quality of life. This thesis focuses on the problem of ensuring individual fairness in neural network models during verification, training, and prediction. A popular approach for enforcing fairness is to translate a fairness notion into constraints over the parameters of the model. However, such a translation does not always guarantee fair predictions of the trained neural network model. To address this challenge, we develop a counterexample-guided post-processing technique to provably enforce fairness constraints at prediction time. Contrary to prior work that enforces fairness only on points around test or training data, we are able to enforce and guarantee fairness on all points in the domain. Additionally, we propose a counterexample-guided loss as an in-processing technique that uses fairness as an inductive bias by iteratively incorporating fairness counterexamples in the learning process. We have implemented these techniques in a tool called FETA. Empirical evaluation on real-world datasets indicates that FETA is not only able to guarantee fairness on the fly at prediction time but is also able to train accurate models exhibiting a much higher degree of individual fairness. Fairness Bias Mitigation Neural Networks Verification
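The idea of counterexample-guided post-processing can be sketched on a toy problem. Everything here is hypothetical: the scoring rule in `model`, the feature names, the exhaustive search over a tiny discrete domain, and the choice to resolve a counterexample by granting the more favorable outcome. FETA itself is described as working on neural networks with formal guarantees; this sketch only shows the shape of the check, under the assumption that individual fairness means the decision must not change when only the protected attribute changes.

```python
from itertools import product


def model(age_bucket, income_bucket, protected):
    """Hypothetical trained classifier: 1 = approve, 0 = deny."""
    score = 2 * income_bucket + age_bucket - 3 * protected
    return 1 if score >= 4 else 0


def find_counterexamples(predict, age_values, income_values):
    """List inputs whose prediction changes when only the protected bit flips."""
    found = []
    for age, income in product(age_values, income_values):
        if predict(age, income, 0) != predict(age, income, 1):
            found.append((age, income))
    return found


def fair_predict(predict, age, income, protected):
    """Post-process at prediction time: same decision for both protected values.

    The protected argument is kept for a drop-in signature but does not
    influence the result; the more favorable outcome is returned (an assumption).
    """
    return max(predict(age, income, 0), predict(age, income, 1))


ages, incomes = range(3), range(4)
print(find_counterexamples(model, ages, incomes))


def wrapped(age, income, protected):
    return fair_predict(model, age, income, protected)


print(find_counterexamples(wrapped, ages, incomes))
```

The wrapped predictor has no counterexamples anywhere in the enumerated domain, not only near training points, which mirrors the distinction the abstract draws with prior work.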
90	Improving predictive behavior under distributional shift Ahmed, Faruk 08 1900 (has links) The fundamental assumption guiding practice in machine learning has been that test-time data are independent and identically distributed samples from the training distribution. In practical use, training sets are often small enough to encourage reliance upon misleading biases. Additionally, when deployed in the real world, a model is likely to encounter novel or anomalous data. When this happens, we would like our models to communicate reduced predictive confidence. Such situations, arising as a result of different forms of distributional shift, comprise what are currently termed out-of-distribution (OOD) settings. In this thesis by articles, we discuss aspects of OOD performance with regard to semantic and non-semantic distributional shift; these correspond to instances of OOD detection and OOD generalization problems. In the first article, we critically appraise the problem of OOD detection, with regard to benchmarking and evaluation. Arguing that OOD detection is too broad to be meaningful, we suggest detecting semantic anomalies instead. We show that classifiers trained with auxiliary self-supervised objectives can improve semanticity in feature representations, as indicated by improved semantic anomaly detection as well as improved generalization. In the second article, we further develop our discussion of the twin goals of robustness to non-semantic distributional shift and sensitivity to semantic shift. Adopting a perspective of compositionality, we decompose non-semantic shift into systematic and non-systematic components, with in-distribution generalization and semantic anomaly detection forming the complementary tasks. We show by means of empirical evaluations on synthetic setups that it is possible to improve performance on all these aspects of robustness and uncertainty simultaneously. We also propose a simple method that improves upon existing approaches on our synthetic benchmarks. In the third and final article, we consider an online, black-box scenario in which both the distribution of input data conditioned on labels and the marginal distribution of labels change from training to testing. We show that under such practical constraints, simple online probabilistic estimates of label shift can nevertheless be a promising approach. We close with a brief discussion of possible avenues forward. Anomaly detection Distributional shift
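A simple online estimate of label shift for a black-box classifier can be sketched as below. This is an assumed illustration, not the method of the thesis: it re-weights the classifier's posterior by the ratio of an estimated test-time prior to the training prior (Bayes' rule under label shift) and updates the prior estimate with an exponential moving average, in the spirit of EM-style prior re-estimation. The class name, the step size, and the stream are hypothetical.

```python
def adjust_posterior(posterior, train_prior, test_prior):
    """Re-weight a black-box posterior p(y|x) for a different label prior."""
    weights = [p * q / r for p, q, r in zip(posterior, test_prior, train_prior)]
    total = sum(weights)
    return [w / total for w in weights]


class OnlineLabelShiftEstimator:
    """Running estimate of the test-time label prior from unlabeled predictions."""

    def __init__(self, train_prior, step=0.05):
        self.train_prior = list(train_prior)
        self.test_prior = list(train_prior)  # start from the training prior
        self.step = step

    def update(self, posterior):
        # correct the incoming posterior with the current prior estimate
        adjusted = adjust_posterior(posterior, self.train_prior, self.test_prior)
        # move the prior estimate a small step towards the corrected posterior
        self.test_prior = [(1 - self.step) * q + self.step * a
                           for q, a in zip(self.test_prior, adjusted)]
        return adjusted


# Made-up stream: the classifier keeps favoring class 0 at test time.
estimator = OnlineLabelShiftEstimator(train_prior=[0.5, 0.5])
for _ in range(200):
    estimator.update([0.9, 0.1])
print(estimator.test_prior)
```

The estimator needs no access to model internals or test labels, which is what makes this kind of update compatible with the online, black-box setting the abstract describes.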
