21

On sparse representations and new meta-learning paradigms for representation learning

Mehta, Nishant A. 27 August 2014 (has links)
Given the "right" representation, learning is easy. This thesis studies representation learning and meta-learning, with a special focus on sparse representations. Meta-learning is fundamental to machine learning: it amounts to learning to learn. The presentation unfolds in two parts. In the first part, we establish learning theoretic results for learning sparse representations. The second part introduces new multi-task and meta-learning paradigms for representation learning. On the sparse representations front, our main pursuits are generalization error bounds to support a supervised dictionary learning model for Lasso-style sparse coding. Such predictive sparse coding algorithms have been applied with much success in the literature; even more common have been applications of unsupervised sparse coding followed by supervised linear hypothesis learning. We present two generalization error bounds for predictive sparse coding, handling the overcomplete setting (more learned features than original dimensions) and the infinite-dimensional setting. Our analysis led to a fundamental stability result for the Lasso, showing that the solution vector is stable under perturbations of the design matrix. We also introduce and analyze new multi-task models for (unsupervised) sparse coding and predictive sparse coding, allowing for one dictionary per task but with sharing between the tasks' dictionaries. The second part introduces new meta-learning paradigms to realize unprecedented types of learning guarantees for meta-learning. Specifically sought are guarantees on a meta-learner's performance on new tasks encountered in an environment of tasks. Nearly all previous work produced bounds on the expected risk, whereas we produce tail bounds on the risk, thereby providing performance guarantees on the risk for a single new task drawn from the environment. The new paradigms include minimax multi-task learning (minimax MTL) and sample variance penalized meta-learning (SVP-ML). Regarding minimax MTL, we provide a high-probability learning guarantee on its performance on individual tasks encountered in the future, the first of its kind. We also present two continua of meta-learning formulations, each interpolating between classical multi-task learning and minimax multi-task learning. The idea of SVP-ML is to minimize the task average of the training tasks' empirical risks plus a penalty on their sample variance. Controlling this sample variance can potentially yield a faster rate of decrease for upper bounds on the expected risk of new tasks, while also yielding high-probability guarantees on the meta-learner's average performance over a draw of new test tasks. An algorithm is presented for SVP-ML with feature selection representations, as well as a quite natural convex relaxation of the SVP-ML objective.
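The SVP-ML idea described above can be written schematically as the objective below, where T is the number of training tasks, \hat{R}_t is the empirical risk of task t under the shared representation h, and \lambda \ge 0 is a hypothetical trade-off weight; this is a reconstruction from the verbal description, not a formula quoted from the thesis:

    \min_{h}\; \frac{1}{T}\sum_{t=1}^{T} \hat{R}_t(h)
    \;+\; \lambda\,\frac{1}{T-1}\sum_{t=1}^{T}\Bigl(\hat{R}_t(h) - \frac{1}{T}\sum_{s=1}^{T}\hat{R}_s(h)\Bigr)^{2}

The first term is the task average of the empirical risks; the second is their sample variance, whose control is what yields the tail-bound-style guarantees discussed above.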
22

Learning with Limited Supervision by Input and Output Coding

Zhang, Yi 01 May 2012 (has links)
In many real-world applications of supervised learning, only a limited number of labeled examples are available because the cost of obtaining high-quality examples is high. Even with a relatively large number of labeled examples, the learning problem may still suffer from limited supervision as the complexity of the prediction function increases. Therefore, learning with limited supervision presents a major challenge to machine learning. With the goal of supervision reduction, this thesis studies the representation, discovery and incorporation of extra input and output information in learning. Information about the input space can be encoded by regularization. We first design a semi-supervised learning method for text classification that encodes the correlation of words inferred from seemingly irrelevant unlabeled text. We then propose a multi-task learning framework with a matrix-normal penalty, which compactly encodes the covariance structure of the joint input space of multiple tasks. To capture structure information that is more general than covariance and correlation, we study a class of regularization penalties on model compressibility. Then we design the projection penalty, which encodes the structure information from a dimension reduction while controlling the risk of information loss. Information about the output space can be exploited by error-correcting output codes. Using the composite likelihood view, we propose an improved pairwise coding for multi-label classification, which encodes pairwise label density (as opposed to label comparisons) and decodes using variational methods. We then investigate problem-dependent codes, where the encoding is learned from data instead of being predefined. We first propose a multi-label output code using canonical correlation analysis, where predictability of the code is optimized. We then argue that both discriminability and predictability are critical for output coding, and propose a max-margin formulation that promotes both discriminative and predictable codes. We empirically study our methods in a wide spectrum of applications, including document categorization, landmine detection, face recognition, brain signal classification, handwritten digit recognition, house price forecasting, music emotion prediction, medical decision, email analysis, gene function classification, outdoor scene recognition, and so forth. In all these applications, our proposed methods for encoding input and output information lead to significantly improved prediction performance.
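As a point of reference for the output-coding idea above, the sketch below uses scikit-learn's error-correcting output codes with predefined random codewords. It illustrates only the baseline that the thesis's learned, problem-dependent codes improve on; the dataset and code_size are arbitrary choices for illustration.

    # Minimal sketch of error-correcting output codes with predefined random
    # codewords (the baseline that learned, problem-dependent codes build on).
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OutputCodeClassifier

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Each class is encoded as a binary codeword; one binary classifier per bit.
    # code_size is the ratio of code length to number of classes (1.5 -> 15 bits here).
    ecoc = OutputCodeClassifier(
        estimator=LogisticRegression(max_iter=2000),
        code_size=1.5,
        random_state=0,
    )
    ecoc.fit(X_tr, y_tr)

    # Decoding assigns each test point to the class whose codeword is closest
    # to the vector of per-bit predictions.
    print("accuracy:", ecoc.score(X_te, y_te))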
23

On The Effectiveness of Multi-Task Learning: An evaluation of Multi-Task Learning techniques in deep learning models

Tovedal, Sofiea January 2020 (has links)
Multi-Task Learning is today an interesting and promising field which many mention as a must for achieving the next level of advancement within machine learning. However, in reality, Multi-Task Learning is much more rarely used in real-world implementations than its more popular cousin Transfer Learning. The question is why that is, and whether Multi-Task Learning outperforms its Single-Task counterparts. In this thesis different Multi-Task Learning architectures were utilized in order to build a model that can handle labeling real technical issues within two categories. The model faces a challenging imbalanced data set with many labels to choose from and short texts to base its predictions on. Can task-sharing be the answer to these problems? This thesis investigated three Multi-Task Learning architectures and compared their performance to a Single-Task model. An authentic data set and two labeling tasks were used in training the models with the method of supervised learning. The four model architectures (Single-Task, Multi-Task, Cross-Stitched and Shared-Private) first went through a hyperparameter tuning process using one of the two layer options, LSTM or GRU. They were then boosted by auxiliary tasks and finally evaluated against each other.
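To make the task-sharing idea concrete, here is a minimal sketch of a hard-parameter-sharing model of the kind compared above: a shared embedding and GRU encoder feeding two task-specific classification heads. All sizes, the vocabulary, and the label counts are hypothetical placeholders, not the thesis's actual configuration.

    # Minimal sketch of a hard-parameter-sharing multi-task text classifier:
    # one shared GRU encoder, two task-specific heads (hypothetical sizes).
    import torch
    import torch.nn as nn

    class SharedGRUMultiTask(nn.Module):
        def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256,
                     n_labels_task1=14, n_labels_task2=6):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            # Task-specific heads; everything below them is shared ("hard" sharing).
            self.head_task1 = nn.Linear(hidden_dim, n_labels_task1)
            self.head_task2 = nn.Linear(hidden_dim, n_labels_task2)

        def forward(self, token_ids):
            _, h_last = self.encoder(self.embed(token_ids))   # h_last: (1, B, H)
            shared = h_last.squeeze(0)
            return self.head_task1(shared), self.head_task2(shared)

    model = SharedGRUMultiTask()
    tokens = torch.randint(1, 10_000, (8, 40))          # batch of 8 short texts
    logits1, logits2 = model(tokens)
    labels1 = torch.randint(0, 14, (8,))
    labels2 = torch.randint(0, 6, (8,))
    # Joint loss: sum of the two task losses (auxiliary tasks would add terms here).
    loss = nn.functional.cross_entropy(logits1, labels1) \
         + nn.functional.cross_entropy(logits2, labels2)
    loss.backward()

A cross-stitched or shared-private variant would replace the single shared encoder with per-task encoders plus learned mixing or a shared branch, but the two-head structure above is the common starting point.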
24

Assessment of lung damages from CT images using machine learning methods. / Bedömning av lungskador från CT-bilder med maskininlärningsmetoder.

Chometon, Quentin January 2018 (has links)
Lung cancer is the most commonly diagnosed cancer in the world, and it is often found incidentally. New technologies, and more specifically artificial intelligence, have lately attracted great interest in the medical field as they can automate tasks or bring new information to the medical staff. Much research has been done on the detection or classification of lung cancer. These works are done on local regions of interest, but only a few of them have looked at a full CT scan. The aim of this thesis was to assess lung damage from CT images using new machine learning methods. First, single predictors were learned by a 3D ResNet architecture: cancer, emphysema, and opacities. Emphysema was learned by the network, reaching an AUC of 0.79, whereas cancer and opacity predictions were not much better than chance (AUC = 0.61 and AUC = 0.61). Secondly, a multi-task network was used to predict the factors altogether. A training with no prior knowledge and a transfer learning approach using self-supervision were compared. The transfer learning approach showed similar results in the multi-task approach for emphysema, with an AUC of 0.78 vs 0.60 without pre-training, and for opacities, with an AUC of 0.61. Moreover, using the pre-training approach enabled the network to reach the same performance as each single-factor predictor but with only one multi-task network, which saves a lot of computational time. Finally, a risk score can be derived from the training to use this information in a clinical context.
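The per-factor AUC figures above correspond to a standard evaluation with one binary score per factor. A minimal sketch of that evaluation follows, using synthetic predictions and labels purely for illustration; the factor names are taken from the abstract, everything else is made up.

    # Sketch of per-factor AUC evaluation (synthetic data, purely illustrative;
    # not the thesis's actual predictions or labels).
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    factors = ["emphysema", "cancer", "opacities"]
    y_true = {f: rng.integers(0, 2, size=200) for f in factors}
    # A multi-task network would emit one probability per factor per scan.
    y_score = {f: rng.random(size=200) for f in factors}

    for f in factors:
        print(f, "AUC =", round(roc_auc_score(y_true[f], y_score[f]), 2))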
25

Life Long Learning In Sparse Learning Environments

Reeder, John 01 January 2013 (has links)
Life long learning is a machine learning technique that deals with learning sequential tasks over time. It seeks to transfer knowledge from previous learning tasks to new learning tasks in order to increase generalization performance and learning speed. Real-time learning environments in which many agents are participating may provide learning opportunities, but these opportunities are spread out in time and space, beyond the geographical scope of a single learning agent. This research seeks to provide an algorithm and framework for life long learning among a network of agents in a sparse real-time learning environment. This work will utilize the robust knowledge representation of neural networks, and make use of both functional and representational knowledge transfer to accomplish this task. A new generative life long learning algorithm utilizing cascade correlation and reverberating pseudo-rehearsal, and incorporating a method for merging divergent life long learning paths, will be implemented.
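Pseudo-rehearsal, mentioned above, can be sketched simply: the previously trained network labels random "pseudo-items", and these are mixed with the new task's data so that old knowledge is rehearsed while new knowledge is learned. The sketch below shows plain pseudo-rehearsal only (the reverberating variant additionally cycles outputs back through the network to generate pseudo-items), and it is not the thesis's cascade-correlation implementation; all shapes and counts are hypothetical.

    # Generic sketch of pseudo-rehearsal: the old network's own outputs on random
    # inputs serve as extra training targets while learning a new task.
    import torch
    import torch.nn as nn

    old_net = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 4))
    new_net = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 4))
    new_net.load_state_dict(old_net.state_dict())      # start from old knowledge

    # Pseudo-items: random inputs labeled by the old (frozen) network.
    pseudo_x = torch.rand(256, 16)
    with torch.no_grad():
        pseudo_y = old_net(pseudo_x)

    # New-task data (hypothetical).
    new_x, new_y = torch.rand(64, 16), torch.rand(64, 4)

    opt = torch.optim.Adam(new_net.parameters(), lr=1e-3)
    for _ in range(100):
        opt.zero_grad()
        loss_new = nn.functional.mse_loss(new_net(new_x), new_y)              # learn the new task
        loss_rehearse = nn.functional.mse_loss(new_net(pseudo_x), pseudo_y)   # retain the old mapping
        (loss_new + loss_rehearse).backward()
        opt.step()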
26

Multi-task learning for joint diagnosis of CNVs and psychiatric conditions from rs-fMRI

Harvey, Annabelle 04 1900 (has links)
L'imagerie par résonance magnétique fonctionnelle à l'état de repos (IRMf-R) s'est imposée comme une technologie diagnostique prometteuse. Toutefois, l'application dans la pratique clinique des biomarqueurs de l'IRMf-R visant à capturer les mécanismes biologiques sous-jacents aux troubles psychiatriques a été entravée par le manque de généralisation. Le diagnostic de ces troubles repose entièrement sur des évaluations comportementales et les taux élevés de comorbidités et de chevauchement génétique et symptomatique confirment l'existence de facteurs latents communs à toutes les pathologies. De grandes mutations génétiques rares, appelées variants du nombre de copies (CNV), ont été associées à une série de troubles psychiatriques et ont des effets beaucoup plus importants sur la structure et la fonction du cerveau, ce qui en fait une voie prometteuse pour démêler la génétique des catégories diagnostiques actuelles. L'apprentissage multitâche est une approche prometteuse pour extraire des représentations communes à des tâches connexes, qui permet de mieux utiliser les données en tirant parti des informations partagées et en améliorant la généralisabilité. Nous avons recueilli un ensemble de données sans précédent composé de 19 CNV et de troubles psychiatriques et nous avons cherché à évaluer systématiquement les avantages potentiels de l'apprentissage multitâche pour la précision de la prédiction, afin d'effectuer un diagnostic conjoint de ces conditions interdépendantes. Nous avons estimé les tailles d'effet pour chaque condition, comparé la précision du diagnostic en utilisant des méthodes courantes d'apprentissage automatique, puis en utilisant l'apprentissage multitâches. Nous avons tenté de contrôler les multiples facteurs confondants tout au long des analyses et discutons des différentes approches permettant de le faire dans le contexte de la modélisation prédictive. L'hypothèse selon laquelle les facteurs latents partagés entre les CNV et les troubles psychiatriques les rendraient suffisamment liés en tant que tâches de prédiction pour bénéficier d'un apprentissage conjoint n'a pas été confirmée. Cependant, nous avons également appliqué l'apprentissage multitâche entre les sites pour prédire une cible commune et nous avons montré que la prédiction peut être améliorée lorsque les tâches sont très étroitement liées. Nous avons mis en œuvre un modèle léger de partage des paramètres durs, mais nos résultats et la littérature montrent que ce cadre n'est pas bien adapté aux tâches hétérogènes ou, de manière contre-intuitive, aux échantillons de petite taille. Nous pensons qu'il est possible d'exploiter les similitudes entre les CNV et les troubles psychiatriques en utilisant des méthodes qui modélisent les relations entre les tâches, mais la petite taille des échantillons pour les CNV rares constitue une limitation majeure pour l'application de l'apprentissage multitâche. / Resting state functional magnetic resonance imaging (rs-fMRI) has emerged as a promising diagnostic technology; however, translation into clinical practice of rs-fMRI biomarkers that aim to capture the biological mechanisms underlying psychiatric disorders has been hindered by a lack of generalizability. The diagnosis of these disorders is based entirely on behavioural assessments, and high rates of comorbidities and genetic and symptom overlap support the existence of latent factors shared across conditions. 
Rare large genetic mutations, called copy number variants (CNVs), have been associated with a range of psychiatric conditions and have much larger effect sizes on brain structure and function, which makes them a promising avenue for untangling the genetics of the current diagnostic categories. Multi-task learning is a promising approach to extract common representations across related tasks that makes better use of data by leveraging shared information and improves generalizability. We collected an unprecedented dataset consisting of 19 CNVs and psychiatric disorders and aimed to systematically assess the potential benefits for prediction accuracy of using multi-task learning to perform joint diagnosis of these interlinked conditions. We estimated effect sizes for each condition, benchmarked diagnostic accuracy using common machine learning methods, and then using multi-task learning. We attempted to control for multiple confounding factors throughout the analyses, and discuss different approaches to do so in the predictive modelling context. The hypothesis that latent factors shared between CNVs and psychiatric conditions would make them sufficiently related as prediction tasks to benefit from being learned jointly was not supported. However, we also applied multi-task learning across sites to predict a common target and showed that prediction can be improved when tasks are very tightly related. We implemented a lightweight hard parameter sharing model, but evidence from our results and the literature shows this framework is not well suited to heterogeneous tasks or, counterintuitively, to small sample sizes. While we believe there is potential to exploit the similarities between CNVs and psychiatric conditions using methods that model relationships between tasks, small sample sizes for rare CNVs are a major limitation for the application of multi-task learning.
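The "common machine learning methods" benchmark described above typically amounts to fitting one binary classifier per condition on functional-connectivity features and reporting cross-validated AUC. The sketch below is a hypothetical version of such a per-condition baseline, with synthetic features standing in for rs-fMRI connectomes; none of the numbers reflect the thesis's data.

    # Sketch of a per-condition diagnostic baseline on connectivity features
    # (synthetic data; real inputs would be rs-fMRI connectomes per subject).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n_subjects, n_edges = 300, 2016           # e.g. a 64-region connectome: 64*63/2 edges
    X = rng.normal(size=(n_subjects, n_edges))
    y = rng.integers(0, 2, size=n_subjects)   # carrier vs non-carrier, or case vs control

    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000, C=1.0))
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print("mean AUC:", scores.mean().round(2))

Multi-task learning would replace the per-condition classifiers with a single model sharing parameters across conditions or sites, which is exactly the comparison the thesis carries out.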
27

MultiModal Neural Network for Healthcare Applications / Multimodal neural network för tillämpningar inom hälso- och sjukvård

Satayeva, Malika January 2023 (has links)
BACKGROUND. Multimodal Machine Learning is a powerful paradigm that capitalizes on the complementary predictive capabilities of different data modalities, such as text, images, and time series. This approach allows for an extremely diverse feature space, which proves useful for combining different real-world tasks into a single model. Current architectures in the field of multimodal learning often integrate feature representations in parallel, a practice that not only limits their interpretability but also creates a reliance on the availability of specific modalities. Interpretability and robustness to missing inputs are particularly important in clinical decision support systems. To address these issues, the iGH Research Group at EPFL proposed a modular sequential input fusion called Modular Decision Support Network (MoDN). MoDN was tested on unimodal tabular inputs for multitask outputs and was shown to be superior to its monolithic parallel counterparts, while handling any number and combination of inputs and providing continuous real-time predictive feedback. AIM. We aim to extend MoDN to MultiModN with multimodal inputs and compare the benefits and limitations of sequential fusion with a state-of-the-art parallel fusion (P-Fusion) baseline. METHODS & FINDINGS. We align our experimental setup with a previously published P-Fusion baseline, focusing on two binary diagnostic predictive tasks (presence of pleural effusion and edema) in a popular multimodal clinical benchmark dataset (MIMIC). We perform four experiments: 1) comparing MultiModN to P-Fusion, 2) extending the architecture to multiple tasks, 3) exploring MultiModN's inherent interpretability in several metrics, and 4) testing its ability to be resistant to biased missingness by simulating missing not at random (MNAR) data during training and flipping the bias at inference. We show that MultiModN's sequential architecture does not compromise performance compared with the P-Fusion baseline, despite the added advantages of being multitask, composable and inherently interpretable. The final experiment shows that MultiModN resists catastrophic failure from MNAR data, which is particularly prevalent in clinical settings. / Multimodal maskininlärning är ett kraftfullt paradigm som utnyttjar de kompletterande prediktiva egenskaperna hos olika datamodaliteter, såsom text, bild, tidsserier. Detta tillvägagångssätt möjliggör ett extremt varierat funktionsutrymme, vilket visar sig vara användbart för att kombinera olika verkliga uppgifter i en enda modell. Nuvarande arkitekturer för multimodal inlärning integrerar ofta funktionsrepresentationer parallellt, en praxis som inte bara begränsar deras tolkningsbarhet utan också skapar ett beroende av tillgängligheten av specifika modaliteter. Tolkningsbarhet och robusthet mot saknade indata är särskilt viktigt i kliniska beslutsstödsystem. För att lösa dessa problem har forskargruppen iGH vid EPFL föreslagit en modulär sekventiell fusion av indata som kallas Modular Decision Support Network (MoDN). MoDN testades på unimodala tabulära indata för multitask-utdata och visade sig vara överlägsen sina monolitiska parallella motsvarigheter, samtidigt som den hanterar alla antal och kombinationer av indata och ger kontinuerlig prediktiv feedback i realtid. Vårt mål är att utöka MoDN till MultiModN med multimodala indata och jämföra fördelarna och begränsningarna med sekventiell fusion med en toppmodern baslinje för parallell fusion (P-Fusion). 
Vi anpassar vår experimentuppsättning till en tidigare publicerad P-Fusion-baslinje, med fokus på två binära diagnostiska prediktiva uppgifter (närvaro av pleural effusion och ödem) i en populär multimodal klinisk benchmark datauppsättning (MIMIC), som omfattar bilder, text, tabelldata och tidsserier. Vi utför fyra experiment och visar att MultiModN:s sekventiella arkitektur inte försämrar prestandan jämfört med P-Fusions baslinje, trots de extra fördelarna med att vara multitasking, komponerbar och tolkningsbar i sin egen rätt. Det sista experimentet visar att MultiModN motstår katastrofala fel från MNAR-data, vilket är särskilt vanligt i kliniska miljöer.
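A minimal sketch of the sequential, modular fusion idea described above: a shared state vector is updated by one small encoder module per available modality, and task-specific decoders can read the state at any point, so missing modalities are simply skipped. Everything here (dimensions, modality names, task names) is a hypothetical illustration, not the MultiModN implementation.

    # Sketch of sequential modular fusion: per-modality encoders update a shared
    # state; task decoders read the state whenever a prediction is needed.
    import torch
    import torch.nn as nn

    STATE_DIM = 64

    class ModalityEncoder(nn.Module):
        def __init__(self, in_dim):
            super().__init__()
            self.update = nn.Linear(STATE_DIM + in_dim, STATE_DIM)
        def forward(self, state, x):
            return torch.tanh(self.update(torch.cat([state, x], dim=-1)))

    encoders = nn.ModuleDict({
        "labs": ModalityEncoder(20),       # tabular vitals / labs
        "note": ModalityEncoder(300),      # text embedding
        "image": ModalityEncoder(512),     # image embedding
    })
    decoders = nn.ModuleDict({
        "pleural_effusion": nn.Linear(STATE_DIM, 1),
        "edema": nn.Linear(STATE_DIM, 1),
    })

    # One patient with a missing image: just skip that encoder.
    inputs = {"labs": torch.rand(1, 20), "note": torch.rand(1, 300)}
    state = torch.zeros(1, STATE_DIM)
    for name, x in inputs.items():
        state = encoders[name](state, x)
        # Interpretable intermediate feedback: predictions after each modality.
        probs = {t: torch.sigmoid(d(state)).item() for t, d in decoders.items()}
        print(name, probs)

In contrast, a parallel (P-Fusion style) baseline would concatenate all modality representations at once, which is why it depends on every modality being present.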
28

Event Detection and Extraction from News Articles

Wang, Wei 21 February 2018 (has links)
Event extraction is a type of information extraction (IE) that works on extracting specific knowledge of certain incidents from texts. Nowadays the amount of available information (such as news, blogs, and social media) grows exponentially. Therefore, it becomes imperative to develop algorithms that automatically extract machine-readable information from large volumes of text data. In this dissertation, we focus on three problems in obtaining event-related information from news articles. (1) The first effort is to comprehensively analyze the performance and challenges in current large-scale event encoding systems. (2) The second problem involves event detection and critical information extraction from news articles. (3) Third, the efforts concentrate on event encoding, which aims to extract event extent and arguments from texts. We start by investigating the two large-scale event extraction systems (ICEWS and GDELT) in the political science domain. We design a set of experiments to evaluate the quality of the extracted events from the two target systems, in terms of reliability and correctness. The results show that there exist significant discrepancies between the outputs of the automated systems and the hand-coded system, and that the accuracy of both systems is far from satisfactory. These findings provide preliminary background and set the foundation for using advanced machine learning algorithms for event-related information extraction. Inspired by the successful application of deep learning in Natural Language Processing (NLP), we propose a Multi-Instance Convolutional Neural Network (MI-CNN) model for event detection and critical sentence extraction without sentence-level labels. To evaluate the model, we run a set of experiments on a real-world protest event dataset. The results show that our model outperforms the strong baseline models and extracts meaningful key sentences without domain knowledge or manually designed features. We also extend the MI-CNN model and propose an MIMTRNN model for event extraction with distant supervision, to overcome the problems of lacking fine-level labels and small training data. The proposed MIMTRNN model systematically integrates the RNN, Multi-Instance Learning, and Multi-Task Learning into a unified framework. The RNN module aims to encode into the representation of entity mentions the sequential information as well as the dependencies between event arguments, which are very useful in the event extraction task. The Multi-Instance Learning paradigm means the system does not require precise labels at the entity-mention level, which makes it well suited to work with distant supervision for event extraction. The Multi-Task Learning module in our approach is designed to alleviate the potential overfitting problem caused by the relatively small size of the training data. The results of the experiments on two real-world datasets (Cyber-Attack and Civil Unrest) show that our model benefits from the advantages of each component and outperforms the other baseline methods significantly. / Ph. D. / Nowadays the amount of available information (such as news, blogs, and social media) grows exponentially. The demand for making use of massive online information in the decision-making process is increasingly intense. Therefore, it is imperative to develop algorithms that automatically extract formatted information from large volumes of unstructured text data. 
In this dissertation, we focus on three problems in obtaining event-related information from news articles. (1) The first effort is to comprehensively analyze the performance and challenges in current large-scale event encoding systems. (2) The second problem involves detecting the event and extracting key information about the event in the article. (3) Third, the efforts concentrate on extracting the arguments of the event from the text. We found that there exist significant discrepancies between the outputs of automated systems and the hand-coded system, and that the accuracy of current event extraction systems is far from satisfactory. These findings provide preliminary background and set the foundation for using advanced machine learning algorithms for event-related information extraction. Our experiments on two real-world event extraction tasks (Cyber-Attack and Civil Unrest) show the effectiveness of our deep learning approaches for detecting and extracting event information from unstructured text data.
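The multi-instance idea used by the MI-CNN and MIMTRNN models above can be sketched in a few lines: the sentences of an article are instances in a bag, each instance gets a score, and the article-level prediction aggregates the instance scores (for example with a max), so only article-level labels are needed for training. The sketch below illustrates just that aggregation step with hypothetical sentence encodings; it is not the thesis's CNN or RNN encoder.

    # Sketch of the multi-instance aggregation step: instance (sentence) scores
    # are pooled into a bag (article) score, so only article-level labels are needed.
    import torch
    import torch.nn as nn

    sentence_scorer = nn.Linear(128, 1)           # stands in for the CNN/RNN encoder

    article = torch.rand(12, 128)                 # 12 sentences, 128-dim encodings
    sentence_logits = sentence_scorer(article).squeeze(-1)    # one score per sentence

    # Bag-level prediction: the article reports the event if its strongest
    # sentence does; training uses only the article label, no sentence labels.
    article_logit = sentence_logits.max()
    article_label = torch.tensor(1.0)
    loss = nn.functional.binary_cross_entropy_with_logits(article_logit, article_label)
    loss.backward()

    # Key-sentence extraction: the highest-scoring sentences act as explanations.
    top_sentences = sentence_logits.topk(k=3).indices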
29

Découverte et exploitation de la hiérarchie des tâches pour apprendre des séquences de politiques motrices par un robot stratégique et interactif / Discovering and exploiting the task hierarchy to learn sequences of motor policies for a strategic and interactive robot

Duminy, Nicolas 18 December 2018 (has links)
Il y a actuellement des efforts pour faire opérer des robots dans des environnements complexes, non bornés, évoluant en permanence, au milieu ou même en coopération avec des humains. Leurs tâches peuvent être de types variés, hiérarchiques, et peuvent subir des changements radicaux ou même être créées après le déploiement du robot. Ainsi, ces robots doivent être capable d'apprendre en continu de nouvelles compétences, dans un espace non-borné, stochastique et à haute dimensionnalité. Ce type d'environnement ne peut pas être exploré en totalité, le robot va devoir organiser son exploration et décider ce qui est le plus important à apprendre ainsi que la méthode d'apprentissage. Ceci devient encore plus difficile lorsque le robot est face à des tâches à complexités variables, demandant soit une action simple ou une séquence d'actions pour être réalisées. Nous avons développé une infrastructure algorithmique d'apprentissage stratégique intrinsèquement motivé, appelée Socially Guided Intrinsic Motivation for Sequences of Actions through Hierarchical Tasks (SGIM-SAHT), apprenant la relation entre ses actions et leurs conséquences sur l'environnement. Elle organise son apprentissage, en décidant activement sur quelle tâche se concentrer, et quelle stratégie employer entre autonomes et interactives. Afin d'apprendre des tâches hiérarchiques, une architecture algorithmique appelée procédures fut développée pour découvrir et exploiter la hiérarchie des tâches, afin de combiner des compétences en fonction des tâches. L'utilisation de séquences d'actions a permis à cette architecture d'apprentissage d'adapter la complexité de ses actions à celle de la tâche étudiée. / Efforts are made to make robots operate more and more in complex, unbounded, ever-changing environments, alongside or even in cooperation with humans. Their tasks can be of various kinds, can be hierarchically organized, and can also change dramatically or be created after the robot's deployment. Therefore, those robots must be able to continuously learn new skills in an unbounded, stochastic and high-dimensional space. Such an environment cannot be completely explored during the robot's lifetime; therefore the robot must organize its exploration and decide what is more important to learn and how to learn it, using metrics such as intrinsic motivation to guide it towards the most interesting tasks and strategies. This becomes an even bigger challenge when the robot is faced with tasks of various complexity, some requiring a simple action to be achieved, others needing a sequence of actions to be performed. We developed a strategic intrinsically motivated learning architecture, called Socially Guided Intrinsic Motivation for Sequences of Actions through Hierarchical Tasks (SGIM-SAHT), able to learn the mapping between its actions and their outcomes on the environment. This architecture is capable of organizing its learning process by deciding which outcome to focus on and which strategy to use among autonomous and interactive ones. For learning hierarchical sets of tasks, the architecture was provided with a framework, called the procedure framework, to discover and exploit the task hierarchy and combine skills together in a task-oriented way. The use of sequences of actions enabled such a learner to adapt the complexity of its actions to that of the task at hand.
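The strategic decision described above, choosing which task to focus on and whether to explore autonomously or ask for a demonstration, is typically driven by an intrinsic-motivation signal such as recent learning progress. The sketch below is a generic illustration of progress-based selection, not the SGIM-SAHT implementation; the task names, strategies, and progress estimates are hypothetical.

    # Generic sketch of intrinsically motivated strategy/task selection:
    # pick the (task, strategy) pair with the highest recent learning progress.
    import random

    tasks = ["reach", "push", "stack"]                  # hypothetical outcome spaces
    strategies = ["autonomous_exploration", "ask_demonstration"]

    # Competence history per (task, strategy); progress = recent improvement.
    history = {(t, s): [random.random() * 0.1] for t in tasks for s in strategies}

    def learning_progress(records, window=5):
        recent = records[-window:]
        # Untried or barely tried pairs get priority so everything is sampled.
        return recent[-1] - recent[0] if len(recent) > 1 else float("inf")

    for episode in range(20):
        task, strategy = max(history, key=lambda k: learning_progress(history[k]))
        # ... run the chosen strategy on the chosen task, measure new competence ...
        new_competence = min(1.0, history[(task, strategy)][-1] + random.random() * 0.05)
        history[(task, strategy)].append(new_competence)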
30

Learning under differing training and test distributions

Bickel, Steffen January 2008 (has links)
One of the main problems in machine learning is to train a predictive model from training data and to make predictions on test data. Most predictive models are constructed under the assumption that the training data is governed by the exact same distribution which the model will later be exposed to. In practice, control over the data collection process is often imperfect. A typical scenario is when labels are collected by questionnaires and one does not have access to the test population. For example, parts of the test population are underrepresented in the survey, out of reach, or do not return the questionnaire. In many applications training data from the test distribution are scarce because they are difficult to obtain or very expensive. Data from auxiliary sources drawn from similar distributions are often cheaply available. This thesis centers around learning under differing training and test distributions and covers several problem settings with different assumptions on the relationship between training and test distributions, including multi-task learning and learning under covariate shift and sample selection bias. Several new models are derived that directly characterize the divergence between training and test distributions, without the intermediate step of estimating training and test distributions separately. The integral part of these models is a set of rescaling weights that match the rescaled or resampled training distribution to the test distribution. Integrated models are studied where only one optimization problem needs to be solved for learning under differing distributions. With a two-step approximation to the integrated models, almost any supervised learning algorithm can be adapted to biased training data. In case studies on spam filtering, HIV therapy screening, targeted advertising, and other applications the performance of the new models is compared to state-of-the-art reference methods. / Eines der wichtigsten Probleme im Maschinellen Lernen ist das Trainieren von Vorhersagemodellen aus Trainingsdaten und das Ableiten von Vorhersagen für Testdaten. Vorhersagemodelle basieren üblicherweise auf der Annahme, dass Trainingsdaten aus der gleichen Verteilung gezogen werden wie Testdaten. In der Praxis ist diese Annahme oft nicht erfüllt, zum Beispiel, wenn Trainingsdaten durch Fragebögen gesammelt werden. Hier steht meist nur eine verzerrte Zielpopulation zur Verfügung, denn Teile der Population können unterrepräsentiert sein, nicht erreichbar sein, oder ignorieren die Aufforderung zum Ausfüllen des Fragebogens. In vielen Anwendungen stehen nur sehr wenige Trainingsdaten aus der Testverteilung zur Verfügung, weil solche Daten teuer oder aufwändig zu sammeln sind. Daten aus alternativen Quellen, die aus ähnlichen Verteilungen gezogen werden, sind oft viel einfacher und günstiger zu beschaffen. Die vorliegende Arbeit beschäftigt sich mit dem Lernen von Vorhersagemodellen aus Trainingsdaten, deren Verteilung sich von der Testverteilung unterscheidet. Es werden verschiedene Problemstellungen behandelt, die von unterschiedlichen Annahmen über die Beziehung zwischen Trainings- und Testverteilung ausgehen. Darunter fallen auch Multi-Task-Lernen und Lernen unter Covariate Shift und Sample Selection Bias. Es werden mehrere neue Modelle hergeleitet, die direkt den Unterschied zwischen Trainings- und Testverteilung charakterisieren, ohne dass eine einzelne Schätzung der Verteilungen nötig ist. 
Zentrale Bestandteile der Modelle sind Gewichtungsfaktoren, mit denen die Trainingsverteilung durch Umgewichtung auf die Testverteilung abgebildet wird. Es werden kombinierte Modelle zum Lernen mit verschiedenen Trainings- und Testverteilungen untersucht, für deren Schätzung nur ein einziges Optimierungsproblem gelöst werden muss. Die kombinierten Modelle können mit zwei Optimierungsschritten approximiert werden und dadurch kann fast jedes gängige Vorhersagemodell so erweitert werden, dass verzerrte Trainingsverteilungen korrigiert werden. In Fallstudien zu Email-Spam-Filterung, HIV-Therapieempfehlung, Zielgruppenmarketing und anderen Anwendungen werden die neuen Modelle mit Referenzmethoden verglichen.
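The rescaling weights mentioned above can be illustrated with a common discriminative shortcut: train a classifier to tell training examples from (unlabeled) test examples and use its probability ratio as an importance weight for each training example. The sketch below is a generic version of this idea under hypothetical synthetic data, not the thesis's integrated models.

    # Sketch of discriminative importance weighting for covariate shift: a domain
    # classifier separates train from test inputs, and its probability ratio
    # reweights the training examples before fitting the final model.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_train = rng.normal(loc=0.0, size=(500, 5))
    y_train = (X_train[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)
    X_test = rng.normal(loc=0.7, size=(300, 5))        # shifted test inputs, labels unknown

    # Domain classifier: 1 = test, 0 = train.
    X_dom = np.vstack([X_train, X_test])
    y_dom = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    dom = LogisticRegression(max_iter=1000).fit(X_dom, y_dom)

    p_test = dom.predict_proba(X_train)[:, 1]
    weights = (p_test / (1.0 - p_test)) * (len(X_train) / len(X_test))   # ~ p_test(x) / p_train(x)

    # Any weighted learner can now be adapted to the biased training sample.
    final = LogisticRegression(max_iter=1000).fit(X_train, y_train, sample_weight=weights)

The two-step shortcut shown here mirrors the thesis's point that, once the weights are available, almost any supervised learner can be adapted to biased training data; the thesis's integrated models instead estimate the weights and the predictor in a single optimization problem.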
