1 |
Towards Quality and General Knowledge Representation Learning
Tang, Zhenwei 03 1900
Knowledge representation learning (KRL) has been a long-standing and challenging topic in artificial intelligence. Recent years have witnessed rapidly growing research interest in KRL and a growing number of industrial applications. However, two important aspects of KRL remain unsatisfactory in both academia and industry, namely the quality and the generalization capabilities of the learned representations. This thesis presents a set of methods aimed at learning high-quality distributed knowledge representations and at further empowering the learned representations for more general reasoning tasks over knowledge bases. On the one hand, we identify the false negative issue and the data sparsity issue in the knowledge graph completion (KGC) task, both of which can limit the quality of the learned representations. Correspondingly, we design a ranking-based positive-unlabeled learning method along with an adversarial data augmentation strategy for KGC, and then unify them seamlessly to improve the quality of the learned representations. On the other hand, although recent works remarkably expand the supported neural reasoning tasks by answering multi-hop logical queries, the generalization capabilities are still limited to inductive reasoning tasks that can only provide entity-level answers. In fact, abductive reasoning, which provides concept-level answers to queries, is also in great demand among online users and across a wide range of downstream tasks. Therefore, we design a joint abductive and inductive knowledge representation learning and reasoning system by incorporating, representing, and operating on concepts. Extensive experimental results along with case studies demonstrate the effectiveness of our methods in improving the quality and generalization capabilities of the learned distributed knowledge representations.
|
2 |
Positive unlabeled learning applications in music and healthcare
Arjannikov, Tom 10 September 2021
The supervised and semi-supervised machine learning paradigms hinge on the idea that the training data is labeled. The label quality is often brought into question, and problems related to noisy, inaccurate, or missing labels are studied. One of these is an interesting and prevalent problem in the semi-supervised classification area where only some positive labels are known, while the remaining data, often the majority of what is available, is unlabeled, i.e., there are no negative examples. Known as Positive-Unlabeled (PU) learning, this problem has been identified with increasing frequency across many disciplines, including but not limited to health science, biology, bioinformatics, geoscience, physics, business, and politics. There are also several closely related machine learning problems, such as cost-sensitive learning and mixture proportion estimation.
This dissertation explores the PU learning problem from the perspective of density estimation and proposes a new modular method compatible with the relabeling framework that is common in the PU learning literature. This approach is compared with two existing algorithms throughout the manuscript, one from a seminal work by Elkan and Noto and a current state-of-the-art algorithm by Ivanov. Furthermore, this thesis identifies two machine learning application domains that can benefit from PU learning approaches but have not previously been treated as PU problems: predicting length of stay in hospitals and automatic music tagging. Experimental results with multiple synthetic and real-world datasets from different application domains validate the proposed approach.
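For orientation, the Elkan and Noto baseline mentioned above is commonly summarized as a score correction: a classifier is trained to separate labeled from unlabeled examples, the label frequency c = P(s=1 | y=1) is estimated as the mean score on held-out labeled positives, and scores are divided by c. The sketch below illustrates only that correction step, assuming the "selected completely at random" setting; the function names and the score values are hypothetical, and it is not the method proposed in the dissertation.

```python
from statistics import mean


def estimate_label_frequency(held_out_positive_scores):
    """Estimate c = P(s=1 | y=1) as the mean score g(x) on held-out labeled positives."""
    c = mean(held_out_positive_scores)
    if c <= 0:
        raise ValueError("estimated label frequency must be positive")
    return c


def adjust_scores(scores, c):
    """Turn estimates of P(s=1 | x) into estimates of P(y=1 | x) by dividing by c.

    Results are clipped to 1.0 because the ratio can exceed 1 in practice.
    """
    return [min(1.0, score / c) for score in scores]


# Hypothetical scores g(x) ~ P(s=1 | x) from any probabilistic classifier
# trained to separate labeled from unlabeled examples.
held_out_positive_scores = [0.5, 0.4, 0.6]
c = estimate_label_frequency(held_out_positive_scores)
posterior = adjust_scores([0.1, 0.25, 0.5, 0.9], c)
```

Any classifier that outputs calibrated probabilities could supply the scores; calibration matters here because the correction rescales the scores directly.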
Accurately predicting the in-hospital length of stay (LOS) at the time of admission can positively impact healthcare metrics, particularly in novel response scenarios such as the COVID-19 pandemic. During regular steady-state operation, traditional classification algorithms can be used for this purpose to inform planning and resource management. However, when there are sudden changes to the admission and patient statistics, such as during the onset of a pandemic, these approaches break down because reliable training data becomes available only gradually over time. This thesis demonstrates the effectiveness of PU learning approaches in such situations through experiments that simulate the positive-unlabeled scenario using two fully labeled, publicly available LOS datasets.
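A positive-unlabeled scenario can be simulated from fully labeled data by keeping the label on only a fraction of the positives and treating everything else as unlabeled. The sketch below shows one simple way to do this, assuming positives are selected for labeling uniformly at random; the function name, the `label_frequency` parameter, and the toy labels are hypothetical, and the abstract does not specify the protocol actually used in the thesis.

```python
import random


def make_pu_labels(y, label_frequency, seed=0):
    """Return PU indicators s: 1 = labeled positive, 0 = unlabeled.

    y holds the true binary labels (1 = positive, 0 = negative). A fraction
    `label_frequency` of the positives keeps its label; every other example,
    positive or negative, becomes unlabeled.
    """
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(y) if label == 1]
    n_labeled = round(label_frequency * len(positives))
    labeled = set(rng.sample(positives, n_labeled))
    return [1 if i in labeled else 0 for i in range(len(y))]


# Toy ground truth with six positives; half of them keep their label.
y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
s = make_pu_labels(y, label_frequency=0.5)
```

Because the true labels `y` remain available, a PU method trained on `s` can still be evaluated against the full ground truth.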
Music auto-tagging systems are typically trained using tag labels provided by human listeners. In many cases, this labeling is weak, which means that the provided tags are valid for the associated tracks, but there can be tracks for which a tag would be valid but not present. This situation is analogous to PU learning with the additional complication of being a multi-label scenario. Experimental results on publicly available music datasets with tags representing three different labeling paradigms demonstrate the effectiveness of PU learning techniques in recovering the missing labels and improving auto-tagger performance. / Graduate
|