Global ETD Search

721	Morphing architectures for pose-based image generation of people in clothing / Morphing-arkitekturer för pose-baserad bildgeneration av människor i kläder Baldassarre, Federico January 2018 This project investigates the task of conditional image generation from misaligned sources, with an example application in the context of content creation for the fashion industry. The problem of spatial misalignment between images is identified, the related literature is discussed, and different approaches are introduced to address it. In particular, several non-linear differentiable morphing modules are designed and integrated in current architectures for image-to-image translation. The proposed method for conditional image generation is applied to a clothes swapping task, using a real-world dataset of fashion images provided by Zalando. In comparison to previous methods for clothes swapping and virtual try-on, the results achieved with our method are of high visual quality and precisely reconstruct the details of the garments. Deep learning image generation fashion Computer Sciences Datavetenskap (datalogi)
722	Bi-directional Sampling in Partial Fourier Reconstruction Ma, Zizhong 28 October 2022 No description available. Electrical Engineering
723	Learning generalizable and transferable representations across domains and modalities Kim, Donghyun 02 November 2022 While deep neural networks attain state-of-the-art performance for computer vision tasks with the help of massive supervised datasets, it is usually assumed that all train and test examples are drawn independently from the same distribution. However, in real-world applications, dataset bias and domain shift violate this assumption. Test data can come from different domains represented by different distributions, which can seriously affect model performance. Thus, learning generalizable and transferable representations is important to make a model robust to many different types of distributional shift. Domain transfer methods such as Domain Adaptation (DA) and Domain Generalization (DG) have been proposed to learn generalizable and transferable features across domains. Domain transfer consists of two steps: 1) pre-training, where a model is first pre-trained on an upstream task with a massive supervised dataset, e.g., ImageNet, and 2) transfer (adaptation), where the model is fine-tuned on downstream multi-domain data. In this thesis, we highlight the limitations of current domain transfer approaches and relax them to produce more practical and diverse domain transfer methods. To be specific, we study: 1) Cross-Domain Self-supervised Learning for Domain Adaptation. Prior DA methods use ImageNet pre-trained models as a weight initialization (i.e., pre-training stage). However, the downstream data can be very different from that of ImageNet. Previous domain adaptation approaches also assume that many labeled examples are available in the source domain. Some applications (e.g., medical imaging) may not have enough source labels. We explore the problem of few-shot domain adaptation, where we only have a few source labels. In addition, we propose cross-domain self-supervised pre-training, which uses only unlabeled multi-domain data.
We show that our method significantly boosts the performance of diverse domain transfer tasks. 2) Pre-training for Domain Adaptation. While many DA and DG methods have been proposed and studied extensively in prior work, little attention has been paid to pre-training for domain transfer. We provide comprehensive experiments and an in-depth analysis of pre-training in terms of network architectures, datasets, and loss functions. Finally, we observe significant improvements from the modern pre-training and propose to modernize the current evaluation protocols. 3) Multimodal Representation Learning for Domain Adaptation. We devise self-supervised formulations for multimodal domain adaptation where we promote better knowledge transfer by aligning multimodal features. We first explore a language-vision task where we align the features of multiple languages and images. Then, we explore video domain adaptation with RGB and Flow modalities and propose a joint contrastive regularization that interplays among cross-modal and cross-domain features. 4) Domain Adaptive Keypoint Detection. Lastly, we explore domain adaptive keypoint detection tasks (e.g., human and animal pose estimation) which are not well explored in prior work. We propose a unified framework for diverse keypoint detection scenarios, where we can encounter different types of domain shifts. To handle these domain shifts, we propose a multi-level feature alignment using the input-level and output-level cues and show that our method generalizes well to diverse domain adaptive keypoint detection tasks. Computer science Artificial intelligence Computer vision Deep learning Machine learning
724	Sequential Survival Analysis with Deep Learning Glazier, Seth William 01 July 2019 Survival Analysis is the collection of statistical techniques used to model the time of occurrence, i.e., survival time, of an event of interest such as death, marriage, the lifespan of a consumer product, or the onset of a disease. Traditional survival analysis methods rely on assumptions that make it difficult, if not impossible, to learn the complex non-linear relationships between the covariates and survival time that are inherent in many real-world applications. We first demonstrate that a recurrent neural network (RNN) is better suited to model problems with non-linear dependencies in synthetic time-dependent and non-time-dependent experiments. Survival Analysis Deep Learning Neural Networks Mathematics Physical Sciences and Mathematics
725	Duration of Anticoagulant Therapy for Unprovoked Venous Thromboembolism Khan, Faizan 17 October 2022 Venous thromboembolism (VTE) is a chronic illness that affects nearly 10 million people every year worldwide. Anticoagulant therapy with direct oral anticoagulants is the mainstay of treatment for patients with VTE, and should be continued for at least 3-6 months. Thereafter, a decision should be made to discontinue anticoagulation or continue it indefinitely. This decision is most challenging for patients with a first unprovoked VTE because of uncertainty in estimates for the long-term benefits (e.g., reduction in recurrent VTE) and harms (e.g., increase in major bleeding) of extended anticoagulation, and the trade-offs between them. The overarching aim of this doctoral thesis was to address these key evidence gaps that are pertinent to making decisions regarding the duration of anticoagulation for patients with a first unprovoked VTE. The first three studies of this thesis synthesized contemporary and reliable estimates for the long-term risks and consequences of recurrent VTE and major bleeding, with and without extended anticoagulation (parameters that can influence the clinical and cost-effectiveness of discontinuing versus continuing anticoagulation indefinitely). Broadly, these systematic reviews and meta-analyses found that: 1) the long-term risks and consequences of major bleeding during extended anticoagulation are considerable, particularly with vitamin K antagonists as well as in older patients, patients using antiplatelet therapy, and in patients with kidney disease, a history of bleeding, or anemia; and 2) the long-term risks of recurrent VTE during extended anticoagulation and major bleeding after discontinuing anticoagulation are reassuringly low but not negligible.
The fourth study incorporated the synthesized evidence to compare the lifetime clinical benefits, harms, and costs of discontinuing versus continuing anticoagulation indefinitely. This decision analytic modelling study showed that indefinite anticoagulation is unlikely to either result in a net clinical benefit or be cost-effective in all (i.e., unselected) patients with a first unprovoked VTE. Findings from this thesis can serve to impact clinical practice and health policy by informing patient prognosis to guide shared decision-making regarding the duration of treatment for unprovoked VTE, and informing future research to ultimately identify which patients should receive anticoagulation indefinitely in order to maximize health benefits for the available healthcare resources. Deep Vein Thrombosis Pulmonary Embolism Anticoagulant Therapy Venous Thromboembolism
726	Towards Scalable Deep 3D Perception and Generation Qian, Guocheng 11 October 2023 Scaling up 3D deep learning systems has emerged as a paramount issue, comprising two primary facets: (1) Model scalability: designing a 3D network that is scale-friendly, i.e., a model that achieves improving performance with increasing parameters and can run efficiently. Unlike 2D convolutional networks, 3D networks have to accommodate the irregularities of 3D data, such as respecting permutation invariance in point clouds. (2) Data scalability: high-quality 3D data is conspicuously scarce in the 3D field. 3D data acquisition and annotation are both complex and costly, hampering the development of scalable 3D deep learning. This dissertation delves into 3D deep learning, including both perception and generation, addressing the scalability challenges. To address model scalability in 3D perception, I introduce ASSANet, which outlines an approach for efficient 3D point cloud representation learning, allowing the model to scale up at a low computational cost while achieving substantial accuracy gains. I further introduce the PointNeXt framework, focusing on data augmentation and scalability of the architecture, which outperforms state-of-the-art 3D point cloud perception networks. To address data scalability, I present Pix4Point, which explores the utilization of abundant 2D images to enhance 3D understanding. For scalable 3D generation, I propose Magic123, which leverages a joint 2D and 3D diffusion prior for zero-shot image-to-3D content generation without the necessity of 3D supervision. These collective efforts provide pivotal solutions to model and data scalability in 3D deep learning. 3D Deep Learning 3D Understanding 3D Generation Point Cloud
727	New Approaches to Optical Music Recognition Alfaro-Contreras, María 22 September 2023 Optical Music Recognition (OMR) is a research field that studies how to computationally read the music notation present in documents and store it in a structured digital format. Traditional OMR approaches are usually structured around a multi-stage process: (i) image preprocessing, which addresses issues related to the scanning process and paper quality; (ii) symbol segmentation and classification, where the different elements of the image are detected and labeled; (iii) music notation reconstruction, a postprocessing phase of the recognition process; and (iv) result encoding, where the recognized elements are stored in a suitable symbolic format. These systems achieve competitive recognition rates at the cost of relying on specific heuristics, tailored to the cases for which they were designed. Consequently, scalability becomes a major limitation, since a new set of heuristics must be designed for each collection or notation type. Another drawback of these traditional approaches is the need for detailed labeling, often obtained manually. Since each symbol is recognized individually, the exact position of each symbol is required, along with its corresponding music label.
The incorporation of Deep Learning (DL) into OMR has produced a shift toward holistic or end-to-end systems based on neural networks for the symbol segmentation and classification stage, treating the recognition process as a single step rather than dividing it into separate subtasks. By learning the feature extraction and classification processes simultaneously, these solutions eliminate the need to design case-specific procedures: the features needed for classification are inferred directly from the data. To achieve this, only training pairs consisting of the input image and its corresponding transcription are needed.
In other words, this approach avoids the need to annotate the exact positions of the symbols, which further simplifies the transcription process. The end-to-end approach has recently been explored in the literature, but always under the assumption that a preprocessing step has already segmented the individual staves of a score. The goal is therefore to retrieve the series of music symbols that appear in an image of a staff. In this context, Convolutional Recurrent Neural Networks (CRNN) represent the state of the art: the convolutional block extracts relevant features from the input image, while the recurrent layers interpret these features in terms of sequences of music symbols. CRNNs are mainly trained using the Connectionist Temporal Classification (CTC) loss function, which allows training without requiring explicit information about the location of the symbols in the image. At inference, a greedy decoding policy is generally used, that is, the most probable sequence is retrieved. This thesis presents a series of contributions, organized into three distinct but interconnected groups, that advance the development of more robust and generalizable staff-level OMR systems. The first group of contributions focuses on reducing the human effort involved in using OMR systems. Transcription times with and without the help of an OMR system are compared, showing that its use speeds up the process, although it requires a sufficient amount of labeled data, which itself entails human effort.
Therefore, Self-Supervised Learning (SSL) techniques are proposed to pre-train a symbol classifier, achieving an accuracy above 80% when using only one example per class for training. This symbol classifier can speed up the data labeling process. The second group of contributions improves the performance of OMR systems in two ways. On the one hand, a music encoding is proposed that allows monophonic and homophonic music to be recognized. On the other hand, system performance is improved by exploiting the two-dimensional nature of the agnostic representation, introducing three changes to the standard approach: (i) a new architecture that includes specific branches to capture features related to the shape (event duration) or height (pitch) of the music symbols, (ii) the use of a split-sequence representation, which requires the model to predict the shape and height attributes sequentially, and (iii) a custom greedy decoding algorithm that guarantees that the predicted sequence complies with this representation. The third and final group of contributions explores the synergies between OMR and its audio counterpart, Automatic Music Transcription (AMT). These contributions confirm the existence of synergies between the two fields and evaluate different late-fusion approaches for multimodal transcription, which yields significant improvements in transcription accuracy. Finally, the thesis concludes by comparing early-fusion and late-fusion approaches, and states that late fusion offers more flexibility and better performance. / This thesis was funded by the Ministerio de Universidades through the grant program for the training of university teaching staff (Ref. FPU19/04957). Deep Learning Optical Music Recognition Automatic Music Transcription
728	FOCALSR: REVISITING IMAGE SUPER-RESOLUTION TRANSFORMERS WITH FFT-ENABLED CROSS ATTENTION LAYERS Botong Ou (17536914) 06 December 2023 Motion blur arises from camera instability or swift movement of subjects within a scene. The objective of image deblurring is to eliminate these blur effects, thereby enhancing the image's quality. This task holds significant relevance, particularly in the era of smartphones and portable cameras. Yet, it remains a challenging issue, notwithstanding extensive research undertaken over many years. The fundamental concept in deblurring an image involves restoring a blurred pixel back to its initial state. Deep learning (DL) algorithms, recognized for their capability to identify unique and significant features from datasets, have gained significant attention in the field of machine learning. These algorithms have been increasingly adopted in geoscience and remote sensing (RS) for analyzing large volumes of data. In these applications, low-level attributes like spectral and texture features form the foundational layer. The high-level feature representations derived from the upper layers of the network can be directly utilized in classifiers for pixel-based analysis. Thus, for enhancing the accuracy of classification using RS data, ensuring the clarity and quality of each collected data in the dataset is crucial for the effective construction of deep learning models. In this thesis, we present the FFT-Cross Attention Transformer, an innovative approach amalgamating channel-focused and window-centric self-attention within a state-of-the-art (SOTA) Vision Transformer model. Augmented with a Fast Fourier Convolution Layer, this approach extends the Transformer's capability to capture intricate details in low-resolution images.
Employing unified task pre-training during model development, we confirm the robustness of these enhancements through comprehensive testing, resulting in substantial performance gains. Notably, we achieve a remarkable 1 dB improvement in the PSNR metric for remote sensing imagery, underscoring the transformative potential of the FFT-Cross Attention Transformer in advancing image processing and domain-specific vision tasks. Computer vision Image processing Computer Vision Deep Learning Image Processing
729	Advanced Deep-Learning Methods For Automatic Change Detection and Classification of Multitemporal Remote-Sensing Images Bergamasco, Luca 09 June 2022 Deep-Learning (DL) methods have been widely used for Remote Sensing (RS) applications in the last few years, and they improve the analysis of the temporal information in bi-temporal and multi-temporal RS images. DL methods use RS data to classify geographical areas or find changes occurring over time. DL methods exploit multi-sensor or multi-temporal data to retrieve results more accurately than single-source or single-date processing. However, state-of-the-art DL methods exploit the heterogeneous information provided by these data by focusing the analysis either on the spatial information of multi-sensor multi-resolution images using multi-scale approaches or on the time component of the image time series. Most DL RS methods are supervised, so they require a large amount of labeled data, which is challenging to gather. Nowadays, we have access to large amounts of unlabeled RS data, so the creation of long image time series is feasible. However, supervised methods require labeled data that are expensive to gather over image time series. Hence, multi-temporal RS methods usually follow unsupervised approaches. In this thesis, we propose DL methodologies that handle these open issues. We propose unsupervised DL methods that exploit multi-resolution deep feature maps derived by a Convolutional Autoencoder (CAE). These DL models automatically learn spatial features from the input during the training phase without any labeled data. We then exploit the high temporal resolution of image time series with the high spatial information of Very-High-Resolution (VHR) images to perform a multi-temporal and multi-scale analysis of the scene. We merge the information provided by the geometrical details of VHR images with the temporal information of the image time series to improve the RS application tasks.
We tested the proposed methods on detecting changes in bi-temporal RS images acquired by various sensors, such as Landsat-5, Landsat-8, and Sentinel-2, representing burned and deforested areas, and on detecting kinds of pasture impurities using VHR orthophotos and Sentinel-2 image time series. The results proved the effectiveness of the proposed methods.
730	Numerical Modeling and Inverse Design of Complex Nanophotonic Systems Baxter, Joshua Stuart Johannes 10 January 2024 Nanophotonics is the study and technological application of the interaction of electromagnetic waves (light) and matter at the nanometer scale. The field's extensive research focuses on generating, detecting, and controlling light using nanoscale features such as nanoparticles, waveguides, resonators, nanoantennas, and more. Exploration in the field is highly dependent on computational methods, which simulate how light will interact with matter in specific situations. However, as nanophotonics advances, so must the computational techniques. In this thesis, I present my work in various numerical studies in nanophotonics, sorted into three categories: plasmonics, inverse design, and deep learning. In plasmonics, I have developed methods for solving advanced material models (including nonlinearities) for small metallic and epsilon-near-zero features and validated them with other theoretical and experimental results. For inverse design, I introduce new methods for designing optical pulse shapes and metalenses for focusing high-harmonic generation. Finally, I used deep learning to model plasmonic colour generation from structured metal surfaces and to predict plasmonic nanoparticle multipolar responses. Nanophotonics Plasmonics Photonics Optics FDTD Inverse design Deep Learning
