1 |
Unpaired Skeleton-to-Photo Translation for Sketch-to-Photo Synthesis. Gu, Yuanzhe, 28 October 2022.
Sketch-to-photo synthesis usually suffers from a lack of labeled data, so we propose several CycleGAN-based methods for training a sketch-to-photo translation model on unpaired data. Our main contribution is the proposed Sketch-to-Skeleton-to-Image (SSI) method, which skeletonizes the sketches to reduce the variance of the sketch data. We also evaluate different representations of the skeleton and different models for the task. Experimental results show that the quality of the generated images correlates negatively with the sparsity of the input data.
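A minimal sketch of the skeletonization step the SSI method relies on, assuming grayscale sketches with dark strokes on a light background; the file name and the choice of Otsu thresholding are illustrative assumptions, not taken from the thesis:

```python
import numpy as np
from skimage import io
from skimage.filters import threshold_otsu
from skimage.morphology import skeletonize

def sketch_to_skeleton(path):
    """Binarize a sketch and thin its strokes to one-pixel-wide skeletons."""
    gray = io.imread(path, as_gray=True)      # float image in [0, 1]
    binary = gray < threshold_otsu(gray)      # dark strokes become foreground
    skeleton = skeletonize(binary)            # reduce strokes to medial axes
    return (skeleton * 255).astype(np.uint8)  # back to an 8-bit image

# skeleton = sketch_to_skeleton("sketch_0001.png")  # input to the translation model
```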
2 |
Exploring Multi-Domain and Multi-Modal Representations for Unsupervised Image-to-Image Translation. Liu, Yahui, 20 May 2022.
Unsupervised image-to-image translation (UNIT) is a challenging task in the image manipulation field, where input images in one visual domain are mapped into another domain with desired visual patterns (also called styles). An ideal model in this field would map an input image to multiple target domains and generate diverse outputs in each target domain, a setting termed multi-domain and multi-modal unsupervised image-to-image translation (MMUIT). Recent studies have shown remarkable results in UNIT but suffer from four main limitations: (1) State-of-the-art UNIT methods are either built from several two-domain mappings that must be learned independently, or they generate low-diversity results, a phenomenon known as mode collapse. (2) Most manipulation relies on visual maps or discrete labels rather than natural language, which could be more scalable and flexible in practice. (3) In an MMUIT system, the style latent space is usually disentangled between every pair of image domains. While interpolations within a domain are smooth, interpolating between two randomly sampled style representations from different domains often yields unrealistic images with artifacts. Improving the smoothness of the style latent space enables gradual interpolation between any two style representations, even across domains. (4) Training MMUIT models from scratch at high resolution is expensive. Interpreting the latent space of pre-trained unconditional GANs can achieve strong image translations with high-quality synthesized images (e.g., 1024x1024 resolution), yet few works explore building an MMUIT system on such pre-trained GANs.
In this thesis, we address these issues and propose several techniques for building better MMUIT systems. First, we build on the content-style disentangled framework and propose fitting the style latent space with Gaussian Mixture Models (GMMs). This allows a single well-trained network with a shared, disentangled style latent space to model multi-domain translations; meanwhile, we can either sample style representations from a Gaussian component or use a reference image for style transfer. Second, we show how the GMM-modeled style latent space can be combined with a language model (e.g., a simple LSTM network) to manipulate multiple styles through textual commands. Third, we propose easy-to-use constraints that improve the smoothness of the style latent space in MMUIT models, together with a novel metric that quantitatively evaluates that smoothness. Finally, we build a new model that performs MMUIT tasks with pre-trained unconditional GANs.
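As a rough illustration of the GMM-modeled style space, the sketch below fits one Gaussian component per domain to already-extracted style codes and samples a new style from a chosen component. This is a simplified, post-hoc version (using scikit-learn) of what the thesis learns jointly with the translation network; all names and dimensions are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

n_domains, style_dim = 3, 64
style_codes = np.random.randn(3000, style_dim)  # stand-in for style-encoder outputs

# One Gaussian component per image domain in the shared style latent space.
gmm = GaussianMixture(n_components=n_domains, covariance_type="full")
gmm.fit(style_codes)

# Sample a fresh style representation from the target domain's component.
target_domain = 1
style = np.random.multivariate_normal(
    gmm.means_[target_domain], gmm.covariances_[target_domain]
)
```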
3 |
Multimodal Representation Learning for Visual Reasoning and Text-to-Image Translation. January 2018.
Multimodal representation learning is a multi-disciplinary research field that aims to integrate information from multiple communicative modalities in a meaningful manner to help solve a downstream task. These modalities can be visual, acoustic, linguistic, haptic, etc. What counts as a "meaningful integration of information from different modalities" remains modality- and task-dependent. The downstream task can range from understanding one modality in the presence of information from other modalities to translating input from one modality to another. This thesis investigates both settings: understanding one modality given corresponding information in others (image understanding for visual reasoning), and translating from one modality to another (text-to-image translation).
Visual reasoning has been an active area of research in computer vision. It encompasses advanced image processing and artificial intelligence techniques to locate, characterize, and recognize objects, regions, and their attributes in an image in order to comprehend the image itself. One way of building a visual reasoning system is to ask it questions about the image that require attribute identification, counting, comparison, multi-step attention, and reasoning. An intelligent system is thought to have a proper grasp of the image if it can answer such questions correctly and provide valid reasoning for its answers. This work investigates how such a system can be built by learning a multimodal representation between the image and the questions, and demonstrates how background knowledge, specifically scene-graph information, can be incorporated into existing image understanding models when it is available.
Multimodal learning provides an intuitive way of learning a joint representation between different modalities. Such a joint representation can be used to translate from one modality to the other. It also opens the way to learning a shared representation between these varied modalities and to specifying what this shared representation should capture. Using the surrogate task of text-to-image translation, this work investigates neural network architectures for learning a shared representation between the two modalities, and proposes that such a shared representation can capture parts of different modalities that are equivalent in some sense. Specifically, given an image and a semantic description of certain objects present in it, a shared representation between the text and image modalities is shown to capture the parts of the image mentioned in the text. This capability is demonstrated on a publicly available dataset.
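A minimal PyTorch sketch of the kind of shared text-image space described above: both modalities are projected into a common space and matched pairs are pulled together. The architecture, dimensions, and loss are illustrative assumptions, not the thesis's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSpace(nn.Module):
    def __init__(self, img_dim=2048, vocab=10000, emb=300, shared=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, shared)        # image -> shared space
        self.embed = nn.Embedding(vocab, emb)
        self.text_enc = nn.LSTM(emb, shared, batch_first=True)

    def forward(self, img_feats, token_ids):
        img_z = self.img_proj(img_feats)                  # (B, shared)
        _, (h, _) = self.text_enc(self.embed(token_ids))  # h: (1, B, shared)
        return img_z, h.squeeze(0)                        # text -> shared space

model = SharedSpace()
img_z, txt_z = model(torch.randn(8, 2048), torch.randint(0, 10000, (8, 12)))
loss = (1 - F.cosine_similarity(img_z, txt_z)).mean()     # align matched pairs
```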
4 |
BrandGAN: Unsupervised Structural Image Correction. El Katerji, Mostafa, 12 May 2021.
Recently, machine learning models such as Generative Adversarial Networks and autoencoders have received significant attention from the research community. Researchers have produced novel ways of using this technology for image manipulation: cross-domain image-to-image transformations, upsampling, style imprinting, human facial editing, and computed tomography correction. Previous work primarily focuses on transformations where the output inherits the same skeletal outline as the input image.
This work proposes a novel framework, called BrandGAN, that tackles image correction for hand-drawn images. A novelty of this problem is that the skeletal outline of the input image must be manipulated and adjusted to look more like a target reference while retaining key visual features that its creator included intentionally.
GANs, when trained on a dataset, can produce a large variety of novel images derived from combinations of visual features of the original dataset. StyleGAN iterated on the concept of GANs and produces high-fidelity images such as human faces and cars. StyleGAN includes a process called projection that finds an encoding of an input image capable of reproducing a visually similar image; projection demonstrated the model's ability to represent real images that were not part of its training dataset. StyleGAN encodings are vectors that represent the features of an image, and encodings can be combined to merge or manipulate the features of distinct images.
In BrandGAN, we tackle image correction by leveraging StyleGAN's projection and encoding-vector feature manipulation. We present a modified version of projection to find an encoding representation of hand-drawn images. We propose a novel GAN indexing technique, called GANdex, capable of finding encodings of novel images, derived from the original dataset, that share visual similarities with the input image. Finally, with vector feature manipulation, we combine the GANdex vector's features with the input image's projection to produce the final image-corrected output. Combining the vectors adjusts the input's imperfections to resemble the original dataset's structure while retaining novel features from the raw input image. Using objective and subjective measures, we evaluate seventy-five hand-drawn images collected through a study with fifteen participants. BrandGAN reduced the Fréchet Inception Distance from 193 to 161 and the Kernel Inception Distance from 0.048 to 0.026 when comparing the hand-drawn and BrandGAN output images to the reference design dataset. A blinded experiment showed that the average participant could identify 4.33 out of 5 images as their own when presented with a visually similar control image. A survey collected opinion scores ranging from one ("strongly disagree") to five ("strongly agree"): the average participant answered 4.32 for retention of detail, 4.25 for the output's professionalism, and 4.57 for preferring the BrandGAN output over their own drawing.
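A rough sketch of the final blending step, assuming a pretrained StyleGAN generator G whose extended latent (W+) space holds 18 layers of 512-dimensional codes; the mixing weight and variable names are illustrative, not BrandGAN's exact procedure.

```python
import torch

w_projected = torch.randn(1, 18, 512)  # projection of the hand-drawn input
w_gandex = torch.randn(1, 18, 512)     # GANdex: nearest in-distribution encoding

# Pull the drawing's structure toward the dataset while keeping its own features.
alpha = 0.6
w_corrected = alpha * w_gandex + (1 - alpha) * w_projected

# corrected_image = G.synthesis(w_corrected)  # decode with the pretrained generator
```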
5 |
GANtruth – a regularization method for unsupervised image-to-image translation. Bujwid, Sebastian, January 2018.
In this work, we propose a novel and effective method for constraining the output space of the ill-posed problem of unsupervised image-to-image translation. We assume that the environment of the source domain is known, and we propose to explicitly enforce preservation of the ground-truth labels on images translated from the source to the target domain. We run empirical experiments on preserving information such as semantic segmentation and disparity, and show evidence that our method improves on the baseline UNIT model for translating images from SYNTHIA to Cityscapes. The generated images are perceived as more realistic in human surveys and yield reduced errors when used as adapted images in a domain adaptation scenario. Moreover, the underlying ground-truth preservation assumption is complementary to alternative approaches, and combining it with the UNIT framework improves the results even further.
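A minimal PyTorch sketch of the ground-truth-preservation idea, assuming a generator G that translates source images to the target domain and a frozen segmentation network seg_net trained on the source domain. All names are assumptions, and in the thesis this term is combined with the full UNIT objective.

```python
import torch.nn.functional as F

def gt_preservation_loss(G, seg_net, src_imgs, src_labels):
    """Translated images must still produce the source ground-truth labels."""
    fake_tgt = G(src_imgs)      # source -> target translation
    logits = seg_net(fake_tgt)  # seg_net's weights are frozen, but gradients
                                # still flow back into G through this forward pass
    return F.cross_entropy(logits, src_labels)

# total_loss = adversarial_loss + lam * gt_preservation_loss(G, seg_net, x, y)
```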
6 |
Controllable Visual Synthesis. AlBahar, Badour A. Sh A., 8 June 2023.
Computer graphics has become an integral part of industries such as entertainment (films and content creation), fashion (virtual try-on), and video games. It has evolved tremendously over the past years, showing remarkable improvement in image generation: from low-quality, pixelated images with limited detail to highly realistic images with fine detail that can often be mistaken for real photographs. However, the traditional pipeline of rendering an image in computer graphics is complex and time-consuming: creating the geometry, materials, and textures requires not only time but also significant expertise. In this work, we aim to replace this complex traditional computer graphics pipeline with a simple machine learning model that can synthesize realistic images without requiring expertise or significant time and effort. Specifically, we address the problem of controllable image synthesis and propose several approaches that allow the user to synthesize realistic content and manipulate images to achieve their desired goals with ease and flexibility.
7 |
A Deep Learning Approach to Predict Full-Field Stress Distribution in Composite Materials. Sepasdar, Reza, 17 May 2021.
This thesis proposes a deep learning approach to predict stress at various stages of mechanical loading in 2-D representations of fiber-reinforced composites. More specifically, the full-field stress distribution in the elastic regime and at an early stage of damage initiation is predicted from the microstructural geometry. The data set required for training and validation is generated via high-fidelity simulations of several randomly generated microstructural representations with complex geometries. Two deep learning approaches are employed and their performances compared: a fully convolutional generator and Pix2Pix translation. Both approaches are shown to predict the stress distributions at the designated loading stages with high accuracy. / M.S. / Fiber-reinforced composites are materials with excellent mechanical performance. They are the major material in the construction of space shuttles, aircraft, high-end cars, etc.: structures designed to be lightweight and at the same time extremely stiff and strong. Because of this broad application, especially in sensitive industries, fiber-reinforced composites have always been a subject of meticulous research. Studies aiming to better understand the mechanical behavior of these composites must be conducted at the micro-scale. Since experimental studies at the micro-scale are expensive and extremely limited, numerical simulations are normally adopted. Numerical simulations, however, are complex, time-consuming, and highly computationally expensive even when run on powerful supercomputers. Hence, this research aims to leverage artificial intelligence to reduce the complexity and computational cost associated with existing high-fidelity simulation techniques. We propose a robust deep learning framework that can replace conventional numerical simulations for predicting important micro-scale mechanical attributes of fiber-reinforced composite materials. The proposed framework is shown to predict complex phenomena, including stress distributions at various stages of mechanical loading, with high accuracy.
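A minimal sketch of the image-to-image setup described above: a small fully convolutional network maps a microstructure image to a stress field. The depth, channel counts, and L1 loss are illustrative assumptions, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

generator = nn.Sequential(                       # microstructure -> stress map
    nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),
)

micro = torch.rand(8, 1, 128, 128)               # stand-in microstructure batch
stress_true = torch.rand(8, 1, 128, 128)         # stand-in simulated stress fields
loss = F.l1_loss(generator(micro), stress_true)  # pixel-wise regression loss
```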
8 |
Unsupervised Image-to-image translation: Taking inspiration from human perception. Sveding, Jens Jakob, January 2021.
Generative artificial intelligence is a field of artificial intelligence in which systems learn underlying patterns in previously seen content and generate new content. This thesis explores a generative technique for image-to-image translation called the Cycle-consistent Adversarial Network (CycleGAN), which can translate images from one domain into another. CycleGAN is a state-of-the-art technique for unsupervised image-to-image translation. It uses the concept of cycle-consistency to learn a mapping between image distributions, where the Mean Absolute Error function is used to compare images and thereby learn the underlying mapping between the two distributions. In this work, we propose the Structural Similarity Index Measure (SSIM) as an alternative to the Mean Absolute Error function. SSIM is a metric inspired by human perception that measures the difference between two images by comparing their contrast, luminance, and structure. We examine whether using SSIM as the cycle-consistency loss in CycleGAN improves the quality of generated images as measured by the Inception Score and the Fréchet Inception Distance, two metrics proposed for evaluating the quality of images generated by generative adversarial networks (GANs). We conduct a controlled experiment to collect these quantitative metrics. Our results suggest that using SSIM as the cycle-consistency loss in CycleGAN improves, in most cases, the quality of generated images as measured by the Inception Score and the Fréchet Inception Distance.
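A rough PyTorch sketch of the proposed swap: the Mean Absolute Error cycle-consistency term is replaced with an SSIM-based one. For brevity it uses a uniform local window instead of the Gaussian window of the original SSIM; the constants follow common defaults and inputs are assumed to lie in [0, 1].

```python
import torch
import torch.nn.functional as F

def ssim(x, y, win=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM with a uniform local window."""
    pad = win // 2
    mu_x, mu_y = F.avg_pool2d(x, win, 1, pad), F.avg_pool2d(y, win, 1, pad)
    var_x = F.avg_pool2d(x * x, win, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def cycle_loss(real, reconstructed):
    """SSIM-based cycle-consistency term, replacing the usual L1 (MAE) loss."""
    return 1.0 - ssim(real, reconstructed)
```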
9 |
Domain Adaptation for Multi-Contrast Image Segmentation in Cardiac Magnetic Resonance Imaging. Proudhon, Thomas, January 2023.
Accurate segmentation of the ventricles and myocardium on Cardiac Magnetic Resonance (CMR) images is crucial to assess the functioning of the heart or to diagnose patients suffering from myocardial infarction. However, the domain shift between the multiple sequences of CMR data prevents a deep learning model trained on one contrast from being used on a different sequence. Domain adaptation can address this issue by alleviating the domain shift between CMR contrasts, such as the Balanced Steady-State Free Precession (bSSFP) and Late Gadolinium Enhancement (LGE) sequences. This degree project applies domain adaptation to perform unsupervised segmentation of cardiac structures on LGE sequences. A style-transfer model based on generative adversarial networks is trained to achieve modality-to-modality translation between the LGE and bSSFP contrasts. A supervised segmentation model is then developed to segment the myocardium and the left and right ventricles on bSSFP data. Final segmentation is performed on synthetic bSSFP images obtained by translating LGE images. Our method shows a significant increase in Dice score compared to direct segmentation of LGE data. In conclusion, the results demonstrate that domain adaptation based on information from complementary CMR sequences is a successful approach to unsupervised segmentation of Late Gadolinium Enhancement images.
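A minimal sketch of the Dice score used in the evaluation above, computed per cardiac structure on label maps; the class indices are illustrative assumptions.

```python
import numpy as np

def dice_score(pred_mask, gt_mask, eps=1e-7):
    """Dice = 2 * |A ∩ B| / (|A| + |B|), computed on boolean masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    return (2.0 * inter + eps) / (pred_mask.sum() + gt_mask.sum() + eps)

# Assumed labels: 1 = myocardium, 2 = left ventricle, 3 = right ventricle.
# for cls in (1, 2, 3):
#     print(cls, dice_score(pred_seg == cls, gt_seg == cls))
```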
10 |
Using Generative Adversarial Networks for H&E-to-HER2 Stain Translation in Digital Pathology Images. Tirmén, William, January 2023.
In digital pathology, hematoxylin & eosin (H&E) is a routine stain performed on most clinical cases, and it often provides clinicians with sufficient information for diagnosis. However, when deciding how to guide breast cancer treatment, immunohistochemical staining for human epidermal growth factor receptor 2 (HER2 staining) is also needed: over-expression of the HER2 protein plays a significant role in the progression of breast cancer and is therefore important to consider during treatment planning. The downside of HER2 staining is that it is both time-consuming and rather expensive. This thesis explores the possibility of H&E-to-HER2 stain translation using generative adversarial networks (GANs). If effective, this has the potential to reduce the cost and time spent on tissue processing while still providing clinicians with the images necessary to make a complete diagnosis. To explore this, two supervised (Pix2Pix, PyramidPix2Pix) and one unsupervised (CycleGAN) GAN structures were implemented and trained on digital pathology images from the MIST dataset. Each model was trained twice, on 256x256 and 512x512 patches, to also see what effect patch size has on stain translation performance. In addition, a methodology for evaluating the quality of the generated HER2 patches was presented and applied. It consists of structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR) comparisons to the ground truth, and a HER2 status classification protocol. In the latter, a classification tool provided by Sectra was used to assign each patch a HER2 status of No tumor, 1+, 2+, or 3+, and the statuses of the generated patches were then compared to those of the ground truths. The results show that the supervised PyramidPix2Pix model trained on 512x512 patches performs best according to the SSIM and PSNR metrics. However, the unsupervised CycleGAN model shows more promising results in both visual assessment and the HER2 status classification protocol, especially when trained on 256x256 patches for 200 epochs, which gave an accuracy of 0.655, an F1-score of 0.674, and an MCC of 0.490. In conclusion, the HER2 status classification protocol is deemed a suitable way to evaluate H&E-to-HER2 stain translation, and by that measure the unsupervised method is considered better than the supervised ones. Moreover, a smaller patch size results in worse translation of cellular structure for the supervised methods. Further studies should focus on incorporating HER2 status classification into the CycleGAN loss function and on more extensive training runs to further improve the quality of H&E-to-HER2 stain translation.
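A minimal sketch of the SSIM and PSNR comparisons in the evaluation methodology above, assuming generated and ground-truth HER2 patches as 8-bit RGB arrays; scikit-image is an illustrative choice of library, not necessarily the thesis's.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_patch(generated, ground_truth):
    """Compare a generated HER2 patch to its registered ground-truth patch."""
    psnr = peak_signal_noise_ratio(ground_truth, generated, data_range=255)
    ssim = structural_similarity(ground_truth, generated,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```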
|