Global ETD Search

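41	A Study on Multi-Granularity Representation Learning of Time Series Data / 時系列データのマルチグラニュラリティ表現学習に関する研究 Ye, Chengyang 23 January 2024 (has links) Kyoto University / Doctoral degree by coursework (new system) / Doctor of Informatics / Degree No. Kō 25022 / Jōhaku No. 854 / 新制||情||143 (Main Library) / Department of Social Informatics, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Takayuki Ito, Prof. Takayuki Kanda, Prof. Shinsuke Mori, Prof. Qiang Ma (Kyoto Institute of Technology) / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Representation learning Time Series Multi-Granularity Timestamp-Level Segment-Level Cross-Granularity 007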
42	Representation Learning Based Causal Inference in Observational Studies Lu, Danni 22 February 2021 (has links) This dissertation investigates novel statistical approaches for causal effect estimation in observational settings, where controlled experimentation is infeasible and confounding is the main hurdle in estimating causal effects. As such, deconfounding constitutes the main subject of this dissertation, that is, (i) to restore the covariate balance between treatment groups and (ii) to attenuate spurious correlations in training data to derive valid causal conclusions that generalize. By incorporating ideas from representation learning, adversarial matching, generative causal estimation, and invariant risk modeling, this dissertation establishes a causal framework that balances the covariate distribution in latent representation space to yield individualized estimations, and further contributes novel perspectives on causal effect estimation based on invariance principles. The dissertation begins with a systematic review and examination of classical propensity score based balancing schemes for population-level causal effect estimation, presented in Chapter 2. Three causal estimands that target different foci in the population are considered: average treatment effect on the whole population (ATE), average treatment effect on the treated population (ATT), and average treatment effect on the overlap population (ATO). The procedure is demonstrated in a naturalistic driving study (NDS) to evaluate the causal effect of cellphone distraction on crash risk. While highlighting the importance of adopting causal perspectives in analyzing risk factors, the chapter discusses the limitations in balance efficiency, robustness against high-dimensional data and complex interactions, and the need for individualization, which motivate subsequent developments. 
Chapter 3 presents a novel generative Bayesian causal estimation framework named Balancing Variational Neural Inference of Causal Effects (BV-NICE). By appealing to the Robinson factorization and a latent Bayesian model, a novel variational bound on the likelihood is derived, explicitly characterized by the causal effect and propensity score. Notably, by treating observed variables as noisy proxies of unmeasurable latent confounders, the variational posterior approximation is re-purposed as a stochastic feature encoder that fully acknowledges representation uncertainties. To resolve the imbalance in representations, BV-NICE enforces KL-regularization on the respective representation marginals using Fenchel mini-max learning, justified by a new generalization bound on the counterfactual prediction accuracy. The robustness and effectiveness of this framework are demonstrated through an extensive set of tests against competing solutions on semi-synthetic and real-world datasets. In recognition of the reliability issue when extending causal conclusions beyond training distributions, Chapter 4 argues that ascertaining causal stability is the key and introduces a novel procedure called Risk Invariant Causal Estimation (RICE). By carefully re-examining the relationship between statistical invariance and causality, RICE leverages the observed data disparities to enable the identification of stable causal effects. Concretely, the causal inference objective is reformulated under the framework of invariant risk modeling (IRM), where a population-optimality penalty is enforced to filter out un-generalizable effects across heterogeneous populations. Importantly, RICE makes counterfactual reasoning feasible in settings with unobserved confounding or biased sampling designs. The effectiveness of this new proposal is verified with respect to a variety of study designs on real and synthetic data. 
In summary, this dissertation presents a flexible causal inference framework that acknowledges the representation uncertainties and data heterogeneities. It enjoys three merits: improved balance to complex covariate interactions, enhanced robustness to unobservable latent confounders, and better generalizability to novel populations. / Doctor of Philosophy / Reasoning about cause and effect is an innate human ability. While the drive to understand cause and effect is instinctive, the rigorous reasoning process is usually trained through the observation of countless trials and failures. In this dissertation, we embark on a journey to explore various principles and novel statistical approaches for causal inference in observational studies. Throughout the dissertation, we focus on causal effect estimation, which answers questions like "what if" and "what could have happened". The causal effect of a treatment is measured by comparing the outcomes corresponding to different treatment levels of the same unit, e.g. "what if the unit is treated instead of not treated?". The challenge lies in the fact that i) a unit only receives one treatment at a time and therefore it is impossible to directly compare outcomes of different treatment levels; ii) comparing the outcomes across different units may involve bias due to confounding, as the treatment assignment potentially follows a systematic mechanism. Therefore, deconfounding constitutes the main hurdle in estimating causal effects. This dissertation presents two parallel principles of deconfounding: i) balancing, i.e., comparing differences under similar conditions; ii) contrasting, i.e., extracting invariance under heterogeneous conditions. 
Chapter 2 and Chapter 3 explore causal effects through balancing: the former systematically reviews a classical propensity score weighting approach in a conventional data setting, and the latter presents a novel generative Bayesian framework named Balancing Variational Neural Inference of Causal Effects (BV-NICE) for high-dimensional, complex, and noisy observational data. It incorporates the advanced deep learning techniques of representation learning, adversarial learning, and variational inference. The robustness and effectiveness of the proposed framework are demonstrated through an extensive set of experiments. Chapter 4 extracts causal effects through contrasting, emphasizing that ascertaining stability is the key to causality. A novel causal effect estimation procedure called Risk Invariant Causal Estimation (RICE) is proposed that leverages the observed data disparities to enable the identification of stable causal effects. The improved generalizability of RICE is demonstrated on synthetic data with different structures, in comparison with state-of-the-art models. In summary, this dissertation presents a flexible causal inference framework that acknowledges the data uncertainties and heterogeneities. By promoting two different aspects of causal principles and integrating advanced deep learning techniques, the proposed framework shows improved balance for complex covariate interactions, enhanced robustness for unobservable latent confounders, and better generalizability for novel populations. Causal Inference Representation Learning Naturalistic Driving Study Propensity Score Representation Balancing Invariant Risk Minimization
43	Domain adaptation in reinforcement learning via causal representation learning Côté-Turcotte, Léa 07 1900 (has links) Recent advancements in reinforcement learning have been substantial, but they often depend on access to the state. A state is a set of information that provides a concise and complete description of the environment, encompassing all relevant details necessary for the agent to make informed decisions. However, such detailed data is rarely available in real-world settings. Images present a more realistic and accessible data form, but their complexity introduces considerable challenges in developing robust and efficient policies. Representation learning methods have shown promise in enhancing the efficiency of policies based on pixel data. Nonetheless, policies continue to struggle to generalize to new domains, making the application of pixel-based reinforcement learning impractical for real-world scenarios. 
This highlights the urgent need to address domain adaptation in pixel-based reinforcement learning. This thesis investigates the potential of causal representation learning in improving domain adaptation in reinforcement learning. The underlying premise is that for reinforcement learning agents to adapt to new domains effectively, they must be able to extract high-level information from raw data and comprehend the causal dynamics that regulate the environment. We evaluate four distinct causal representation learning algorithms, each aimed at uncovering a more intricate level of structure within the latent space, to assess their impact on domain adaptation performance. This involves first learning a causal representation, followed by training the reinforcement learning agent on this representation. The domain adaptation performance of these agents is evaluated within two autonomous driving environments: CarRacing and CARLA. Our results support the claim that learning a latent representation enhances efficiency and robustness in pixel-based RL. Moreover, they indicate that learning a causal structure in the latent space leads to improved domain adaptation performance. However, the promise of advanced causal representation in augmenting domain adaptation is tempered by its substantial computational demands. Additionally, when observations from multiple domains are available, this approach does not exceed the effectiveness of simpler methods. We also found that agents trained on representations that retain all information tend to outperform others, suggesting that disentangled representations are preferable to invariant representations. 
Reinforcement learning Causality Domain adaptation Self-supervised learning Representation learning Invariant representation learning Disentangled representation learning Causal representation learning Model-free reinforcement learning Machine learning
44	Numerical Methods in Deep Learning and Computer Vision Song, Yue 23 April 2024 (has links) Numerical methods, the collective name for numerical analysis and optimization techniques, have been widely used in the field of computer vision and deep learning. In this thesis, we investigate the algorithms of some numerical methods and their relevant applications in deep learning. These studied numerical techniques mainly include differentiable matrix power functions, differentiable eigendecomposition (ED), feasible orthogonal matrix constraints in optimization and latent semantics discovery, and physics-informed techniques for solving partial differential equations in disentangled and equivariant representation learning. We first propose two numerical solvers for the faster computation of matrix square root and its inverse. The proposed algorithms are demonstrated to have considerable speedup in practical computer vision tasks. Then we turn to resolve the main issues when integrating differentiable ED into deep learning -- backpropagation instability, slow decomposition for batched matrices, and ill-conditioned input throughout the training. Some approximation techniques are first leveraged to closely approximate the backward gradients while avoiding gradient explosion, which resolves the issue of backpropagation instability. To improve the computational efficiency of ED, we propose an efficient ED solver dedicated to small and medium batched matrices that are frequently encountered as input in deep learning. Some orthogonality techniques are also proposed to improve input conditioning. All of these techniques combine to mitigate the difficulty of applying differentiable ED in deep learning. In the last part of the thesis, we rethink some key concepts in disentangled representation learning. 
We first investigate the relation between disentanglement and orthogonality -- different proposed orthogonality constraints are imposed on the generative models to show that the disentanglement performance is indeed improved. We also challenge the linear assumption of the latent traversal paths and propose to model the traversal process as dynamic spatiotemporal flows on the potential landscapes. Finally, we build probabilistic generative models of sequences that allow for novel understandings of equivariance and disentanglement. We expect that our investigation could pave the way for more in-depth and impactful research at the intersection of numerical methods and deep learning. Academic discipline INF/01 - Computer Science
45	Exploring adaptation of self-supervised representation learning to histopathology images for liver cancer detection Jonsson, Markus January 2024 (has links) This thesis explores adapting self-supervised representation learning to visual domains beyond natural scenes, focusing on medical imaging. The research addresses the central question: “How can self-supervised representation learning be specifically adapted for detecting liver cancer in histopathology images?” The study utilizes the PAIP 2019 dataset for liver cancer segmentation and employs a self-supervised approach based on the VICReg method. The evaluation results demonstrated that the ImageNet-pretrained model achieved superior performance on the test set, with a clipped Jaccard index of 0.7747 at a threshold of 0.65. The VICReg-pretrained model followed closely with a score of 0.7461, while the model initialized with random weights trailed behind at 0.5420. These findings indicate that while the ImageNet-pretrained model outperformed the VICReg-pretrained model, the latter still captured essential data characteristics, suggesting the potential of self-supervised learning in diverse visual domains. The research attempts to contribute to advancing self-supervised learning in non-natural scenes and provides insights into model pretraining strategies. Self-supervised learning Representation learning Computer vision Computer Sciences
46	Unsupervised Learning for Efficient Underwriting Dalla Torre, Elena January 2024 (has links) In the field of actuarial science, statistical methods have been extensively studied to estimate the risk of insurance. These methods are good at estimating the risk of typical insurance policies, as historical data is available. However, their performance can be poor on unique insurance policies, which require the manual assessment of an underwriter. A classification of insurance policies on a unique/typical scale would help insurance companies allocate manual resources more efficiently and validate the goodness of fit of the pricing models on unique objects. The aim of this thesis is to use outlier detection methods to identify unique non-life insurance policies. The many categorical nominal variables present in insurance policy data sets represent a challenge when applying outlier detection methods. Therefore, we also explore different ways to derive informative numerical representations of categorical nominal variables. First, as a baseline, we use the principal component analysis of mixed data to find a numerical representation of categorical nominal variables and the principal component analysis to identify unique insurances. Then, we see whether better performance can be achieved using autoencoders, which can capture complex non-linearities. In particular, we learn a numerical representation of categorical nominal variables using the encoder layer of an autoencoder, and we use a different autoencoder to identify unique insurances. Since we are in an unsupervised setting, the two methods are compared by performing a simulation study and using the NSL-KDD dataset. The analysis shows that autoencoders are better than principal component analysis at identifying unique objects. We conclude that the ability of autoencoders to model complex non-linearities between the variables allows this class of methods to achieve superior performance. 
Data-driven Underwriting Outlier Detection Autoencoders Principal Component Analysis Representation Learning Probability Theory and Statistics
47	Adversarial approaches to remote sensing image analysis Bejiga, Mesay Belete 17 April 2020 (has links) The recent advance in generative modeling, in particular the unsupervised learning of data distributions, is attributed to the invention of models with new learning algorithms. Among the methods proposed, generative adversarial networks (GANs) have been shown to be the most efficient approaches to estimating data distributions. The core idea of GANs is an adversarial training of two deep neural networks, called generator and discriminator, to learn an implicit approximation of the true data distribution. The distribution is approximated through the weights of the generator network, and interaction with the distribution is through the process of sampling. GANs have been found to be useful in applications such as image-to-image translation, in-painting, and text-to-image synthesis. In this thesis, we propose to capitalize on the power of GANs for different remote sensing problems. The first problem is a new research track for the remote sensing community that aims to generate remote sensing images from text descriptions. More specifically, we focus on exploiting ancient text descriptions of geographical areas, inherited from previous civilizations, and converting them into the equivalent remote sensing images. The proposed method is composed of a text encoder and an image synthesis module. The text encoder is tasked with converting a text description into a vector. To this end, we explore two encoding schemes: a multilabel encoder and a doc2vec encoder. The multilabel encoder takes into account the presence or absence of objects in the encoding process, whereas the doc2vec method encodes additional information available in the text. The encoded vectors are then used as conditional information for a GAN and guide the synthesis process. We collected satellite images and ancient text descriptions for training in order to evaluate the efficacy of the proposed method. 
The qualitative and quantitative results obtained suggest that the doc2vec encoder-based model yields better images in terms of the semantic agreement with the input description. In addition, we present open research areas that we believe are important to further advance this new research area. The second problem we want to address is the issue of semi-supervised domain adaptation. The goal of domain adaptation is to learn a generic classifier for multiple related problems, thereby reducing the cost of labeling. To that end, we propose two methods. The first method uses GANs in the context of image-to-image translation to adapt source domain images into target domain images and train a classifier using the adapted images. We evaluated the proposed method on two remote sensing datasets. Though we have not explored this avenue extensively due to computational challenges, the results obtained show that the proposed method is promising and worth exploring in the future. The second domain adaptation strategy borrows the adversarial property of GANs to learn a new representation space where the domain discrepancy is negligible and the new features are discriminative enough. The method is composed of a feature extractor, class predictor, and domain classifier blocks. Contrary to the traditional methods that perform representation and classifier learning in separate stages, this method combines both into a single stage, thereby learning a new representation of the input data that is domain invariant and discriminative. After training, the classifier is used to predict both source and target domain labels. We apply this method to large-scale land cover classification and cross-sensor hyperspectral classification problems. Experimental results obtained show that the proposed method provides a performance gain of up to 40%, indicating the efficacy of the method.
48	Towards Label Efficiency and Privacy Preservation in Video Understanding Dave, Ishan Rajendrakumar 01 January 2024 (has links) (PDF) Video understanding involves tasks like action recognition, video retrieval, and human pose propagation, which are essential for applications such as surveillance, surgical videos, sports analysis, and content recommendation. The progress in this domain has been largely driven by advancements in deep learning, facilitated by large-scale labeled datasets. However, video annotation presents significant challenges due to its time-consuming and expensive nature. This limitation underscores the importance of developing methods that can learn effectively from unlabeled or limited-labeled data, which makes self-supervised learning (SSL) and semi-supervised learning particularly relevant for video understanding. Another significant challenge in video understanding is privacy preservation, as methods often inadvertently leak private information, presenting a growing concern in the field. In this dissertation, we present methods to improve the label efficiency of deep video models by employing self-supervised and semi-supervised methods, and a self-supervised method designed to mitigate privacy leakage in action recognition tasks. Our first contribution is the Temporal Contrastive Learning framework for Video Representation (TCLR). Unlike prior contrastive self-supervised learning methods, which aim to learn temporal similarity between different clips of the same video, TCLR encourages learning differences rather than similarities in clips from the same video. TCLR consists of two novel losses to improve upon existing contrastive self-supervised video representations, contrasting temporal segments of the same video at two different temporal aggregation steps: clip level and temporal pooling level. 
Although TCLR offers an effective solution for video-level downstream tasks, it does not encourage framewise video representation for addressing low-level temporal correspondence-based downstream tasks. To promote a more effective framewise video representation, we first eliminate learning shortcuts present in existing temporal pretext tasks by introducing framewise spatial jittering and proposing more challenging frame-level temporal pretext tasks. Our approach, "No More Shortcuts" (NMS), results in state-of-the-art performance across a wide range of downstream tasks, encompassing both high-level semantic and low-level temporal correspondence tasks. While the VideoSSL approaches, TCLR and NMS, focus only on learning from unlabeled videos, in practice, some labeled data often exists. Our next focus is on semi-supervised action recognition, where we have a small set of labeled videos with a large pool of unlabeled videos. Using the observations from the self-supervised representations, we leverage the unlabeled videos using the complementary strengths of temporally-invariant and temporally-distinctive contrastive self-supervised video representations. Our proposed semi-supervised method, "TimeBalance", introduces a student-teacher framework that dynamically combines the knowledge of two self-supervised teachers based on the nature of the unlabeled video using the proposed reweighting strategy. Although TimeBalance performs well for coarse-grained actions, it struggles with fine-grained actions. To address this, we propose the "FinePseudo" framework, which leverages temporal alignability to learn phase-aware distances. It also introduces collaborative pseudo-labeling between video-level and alignability encoders, refining the pseudo-labeling process for fine-grained actions. Although the above-mentioned video representations are useful for various downstream applications, they often leak a considerable amount of private information present in the videos. 
To mitigate the privacy leaks in videos, we propose SPAct, a self-supervised framework that removes private information from input videos without requiring privacy labels. SPAct exhibits competitive performance compared to supervised methods and introduces new evaluation protocols to assess the generalization capability of the anonymization across novel action and privacy attributes. Overall, this dissertation contributes to the advancement of label-efficient and privacy-preserving video understanding by exploring novel self-supervised and semi-supervised learning approaches and their applications in privacy-preserving action recognition. Self-supervised learning Video Representation learning Privacy-Preservation Computer Vision Machine Learning
49	Learning and evaluating multimodal representations for digital domains Burns, Andrea 11 March 2025 (has links) 2024 / Digital domains such as mobile apps and webpages have become fundamental to everyday life. Humans perform many tasks on their phones and online, like reading recipes, booking calendar events, viewing images, and shopping for food or clothes. A prerequisite to building Artificially Intelligent models to aid in these tasks is the process of learning embeddings, i.e., representations, of mobile app and webpage data. In this thesis, we (1) Curate multimodal app and webpage datasets. Digital domains capture four modalities: image, text, structure, and action. We contribute the first multimodal app dataset with all app modalities and language annotations and the first multimodal webpage dataset to retain structure with all image and text content in a unified webpage sample. (2) Define new tasks to evaluate app and webpage understanding. Using our new app dataset, we define an instruction following benchmark that requires mapping a natural language high-level user goal to a sequence of low-level actions. We also define a novel feasibility classification task, in which we predict which user requests can be satisfied in the app environment. Using our new webpage dataset, we define three generation-style tasks: webpage description generation, section summarization, and contextual image captioning. This aims to evaluate webpage understanding at a global, regional, and local level, respectively. (3) Evaluate the importance of each data modality. With our new benchmarks, we determine the impact of each modality on downstream task performance. We find images to be useful for classifying whether a user command is actually satisfiable in an app environment and key to correcting over-reliance on text information. 
For our webpage benchmarks, contextual text and images aid all tasks, helping image captions retain knowledge-based detail and page descriptions or section summaries retain topical relevance or specificity. (4) Propose new methods for learning multimodal representations of digital domains. Utilizing all available modalities, we contribute a novel attention scheme to make use of webpage structure, separating the most salient content for each task. Results demonstrate that our multimodal encoder is more performant and more computationally efficient. For mobile app representations, we propose using text descriptions and action sequences to learn embeddings that can encode both global and local features while being significantly more data efficient. We outperform prior work on a suite of app understanding tasks while only utilizing publicly available data. Computer science Machine learning Mobile apps Representation learning User interfaces Vision and language Webpages
50	Novel document representations based on labels and sequential information Kim, Seungyeon 21 September 2015 (has links) A wide variety of text analysis applications are based on statistical machine learning techniques. The success of those applications is critically affected by how we represent a document. Learning an efficient document representation has two major challenges: sparsity and sequentiality. Sparsity often causes high estimation error, and the sequential nature of text, i.e., the interdependency between words, causes even more complication. This thesis presents novel document representations to overcome the two challenges. First, I employ label characteristics to estimate a compact document representation. Because label attributes implicitly describe the geometry of the dense subspace that has substantial impact, I can effectively resolve the sparsity issue while focusing only on the compact subspace. Second, by modeling a document as a joint or conditional distribution between words and their sequential information, I can efficiently reflect the sequential nature of text in my document representations. Lastly, the thesis concludes with a document representation that employs both labels and sequential information in a unified formulation. The following four criteria are utilized to evaluate the goodness of representations: how close a representation is to its original data, how strongly representations can be distinguished from one another, how easily a human can interpret a representation, and how much computational effort is needed for a representation. While pursuing those criteria, I was able to obtain document representations that are closer to the original data, stronger in discrimination, and easier to understand than traditional document representations. Efficient computation algorithms make the proposed approaches largely scalable. 
This thesis examines emotion prediction, temporal emotion analysis, modeling documents with edit histories, locally coherent topic modeling, and text categorization tasks for possible applications. Representation learning Topic modeling Supervised learning Sequential document modeling Sentiment analysis Mood analysis Matrix factorization Machine learning Artificial intelligence
