11

AI-assisted Image Manipulation with Eye Tracking / Bildbehandling med Eye Tracking och AI

Karlander, Rej, Wang, Julia January 2023 (has links)
Image editing tools can pose a challenge for motor-impaired individuals who wish to perform image manipulation. The process involves many steps and can be difficult without tactile input devices such as a mouse and keyboard. To increase the availability of image editing for motor-impaired individuals, the potential of new tools and modalities has to be explored. In this project, a prototype was developed that allows the user to edit images using eye tracking and deep learning models, specifically the DALL-E 2 model. The prototype was then tested on users who rated its functionality against a set of human-computer interaction principles. The quality of the results varied considerably depending on the user's eye movements and the provided prompts. The user testing found that an editing tool combining eye tracking and AI assistance has potential, but that it requires further iteration and takes time to learn. Most users enjoyed using the prototype and felt that continued experimentation would lead to improved results.
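The abstract does not describe the implementation, but the core interaction — turning a gaze fixation into a region for an inpainting model such as DALL-E 2 to redraw — can be sketched roughly as follows. This is a minimal illustration under our own assumptions; the function name and parameters are not from the thesis:

```python
import numpy as np

def gaze_to_mask(width, height, gaze_x, gaze_y, radius):
    """Build a boolean inpainting mask: True inside a circle around the
    user's gaze fixation point, False elsewhere. The True region is what
    an inpainting model (e.g. DALL-E 2) would be asked to redraw,
    guided by the user's text prompt."""
    ys, xs = np.mgrid[0:height, 0:width]
    return (xs - gaze_x) ** 2 + (ys - gaze_y) ** 2 <= radius ** 2

# A 100x100 image with a fixation at the centre and a 20-pixel radius:
mask = gaze_to_mask(100, 100, 50, 50, 20)
```

In a real pipeline the mask would be smoothed and converted to the image format the editing API expects, and the radius might be derived from fixation duration rather than fixed.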
12

Latent Space Manipulation of GANs for Seamless Image Compositing

Fruehstueck, Anna 04 1900 (has links)
Generative Adversarial Networks (GANs) are a very successful method for high-quality image synthesis and a powerful tool for generating realistic images by learning their visual properties from a dataset of exemplars. However, the controllability of the generator output still poses many challenges. We propose several methods for achieving larger outputs and/or higher visual quality by combining latent space manipulations with image compositing operations: (1) GANs are inherently suitable for small-scale texture synthesis because the generator can learn the image properties of a limited domain, such as a specific texture type at a desired level of detail, and a rich variety of suitable texture tiles can be synthesized from the trained generator. Due to the convolutional nature of GANs, we can achieve large-scale texture synthesis by tiling intermediate latent blocks, allowing the generation of (almost) arbitrarily large texture images that are seamlessly merged. (2) We observe that generators trained on heterogeneous data perform worse than specialized GANs, and we demonstrate that multiple independently trained generators can be optimized so that a specialized network fills in high-quality details for specific image regions, or insets, of a lower-quality canvas generator. Multiple generators can thus collaborate to improve the visual output quality, and through careful optimization, seamless transitions between different generators can be achieved. (3) GANs can also be used to semantically edit facial images and videos, with novel 3D GANs even allowing for camera changes, enabling unseen views of the target. However, the GAN output must be merged with the surrounding image or video in a spatially and temporally consistent way, which we demonstrate in our method.
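The latent-tiling idea in contribution (1) can be illustrated with a toy sketch. The key point is that once intermediate latent blocks are tiled, the remaining generator layers are convolutional and therefore translation-equivariant, so running them on the tiled latent yields a proportionally larger texture. The sketch below (our own illustration, not the author's code) only shows the tiling step:

```python
import numpy as np

def tile_latents(latent_block, grid_h, grid_w):
    """Tile one intermediate latent block of shape (C, H, W) into a
    (C, grid_h*H, grid_w*W) grid. Feeding the tiled block through the
    remaining convolutional generator layers produces a correspondingly
    larger texture image without visible seams."""
    return np.tile(latent_block, (1, grid_h, grid_w))

z = np.random.randn(512, 4, 4)   # one intermediate latent block
big = tile_latents(z, 3, 5)      # shape (512, 12, 20)
```

In practice the tiles would be distinct latents blended at their borders rather than exact copies, but the shape arithmetic is the same.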
13

Controllable 3D Effects Synthesis in Image Editing

Yichen Sheng (18184378) 15 April 2024 (has links)
3D effect synthesis is crucial in image editing to enhance realism or visual appeal. Unlike classical graphics rendering, which relies on complete 3D geometries, 3D effect synthesis in image editing operates solely with 2D images as inputs. This shift presents significant challenges, primarily addressed by data-driven methods that learn to synthesize 3D effects in an end-to-end manner. However, these methods face limitations in the diversity of 3D effects they can produce and lack user control. For instance, existing shadow generation networks are restricted to producing hard shadows without offering any user input for customization.

In this dissertation, we tackle the research question: how can we synthesize controllable and realistic 3D effects in image editing when only 2D information is available? Our investigation leads to four contributions. First, we introduce a neural network designed to create realistic soft shadows from an image cutout and a user-specified environmental light map. This approach is the first attempt at utilizing a neural network for realistic soft shadow rendering in real time. Second, we develop Pixel Height, a novel 2.5D representation tailored to the nuances of image editing. This representation not only forms the foundation of a new soft shadow rendering pipeline that provides intuitive user control, but also generalizes soft shadow receivers to general shadow receivers. Third, we present the mathematical relationship between the Pixel Height representation and 3D space. This connection facilitates the reconstruction of normals or depth from 2D scenes, broadening the scope for synthesizing comprehensive 3D lighting effects such as reflections and refractions. 3D-aware buffer channels are also proposed to improve the quality of the synthesized soft shadows. Lastly, we introduce Dr.Bokeh, a differentiable bokeh renderer that extends traditional bokeh effect algorithms with better occlusion modeling to correct flaws in existing methods. With its more precise lens modeling, we show that Dr.Bokeh not only achieves state-of-the-art bokeh rendering quality, but also pushes the boundary of the depth-from-defocus problem.

Our work in controllable 3D effect synthesis represents a pioneering effort in image editing, laying the groundwork for future lighting effect synthesis in various image editing applications. Moreover, the improvements to filtering-based bokeh rendering could significantly enhance commercial products, such as the portrait mode feature on smartphones.
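To make the Pixel Height idea concrete: if a foreground pixel sits h pixels above the ground plane, a directional light displaces its shadow along the image plane by a vector proportional to h. A minimal hard-shadow sketch under that geometric assumption (ours, not the dissertation's pipeline, which renders soft shadows) might look like this:

```python
import numpy as np

def cast_hard_shadow(pixel_height, light_dx, light_dy):
    """Project a hard shadow from a Pixel Height map.
    pixel_height[y, x] > 0 marks a foreground pixel h pixels above the
    ground plane; its shadow lands at (x + h*light_dx, y + h*light_dy),
    where (light_dx, light_dy) encodes the light's azimuth scaled by
    1/tan(elevation). Returns a boolean shadow mask over the image."""
    h_map = np.asarray(pixel_height)
    shadow = np.zeros(h_map.shape, dtype=bool)
    ys, xs = np.nonzero(h_map)
    for y, x in zip(ys, xs):
        h = h_map[y, x]
        sx = int(round(x + h * light_dx))
        sy = int(round(y + h * light_dy))
        if 0 <= sy < shadow.shape[0] and 0 <= sx < shadow.shape[1]:
            shadow[sy, sx] = True  # receiver pixel falls in shadow
    return shadow
```

For example, a single pixel of height 3 at (x=2, y=5) with light_dx=2, light_dy=0 casts its shadow at (x=8, y=5). A soft-shadow pipeline would integrate this projection over an area light instead of a single direction.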
14

Interaktivní segmentace popředí/pozadí na mobilním telefonu / Interactive Foreground/Background Segmentation on Mobile Phone

Studený, Petr January 2015 (has links)
This thesis deals with the problem of foreground extraction on mobile devices. The main goal of the project is to find or design segmentation methods for separating a user-selected object from an image (or video). The main requirements for these methods are image processing time and segmentation quality. Some existing solutions to this problem are discussed, along with their usability on mobile devices. A mobile application demonstrating the implemented real-time foreground extraction algorithm was created within the project.
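The abstract does not name the segmentation algorithm, but the interactive setting — the user taps a pixel of the object to extract — can be sketched with simple seeded region growing. This is a crude stand-in of our own; production systems on mobile typically refine such a seed with graph cuts or learned models:

```python
from collections import deque

def region_grow(image, seed, tol):
    """Grow a foreground mask from a user-selected seed pixel by
    flood-filling 4-connected neighbours whose intensity is within
    `tol` of the seed's intensity. `image` is a 2D list of ints."""
    h, w = len(image), len(image[0])
    sy, sx = seed
    ref = image[sy][sx]
    mask = [[False] * w for _ in range(h)]
    mask[sy][sx] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
                    and abs(image[ny][nx] - ref) <= tol):
                mask[ny][nx] = True
                queue.append((ny, nx))
    return mask
```

The tolerance trades segmentation quality against leakage into the background, mirroring the quality/speed trade-off the thesis evaluates.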
15

Text-Driven Fashion Image Manipulation with GANs : A case study in full-body human image manipulation in fashion / Textdriven manipulation av modebilder med GANs : En fallstudie om helkroppsbildsmanipulation av människor inom mode

Dadfar, Reza January 2023 (has links)
Language-based fashion image editing has promising applications in design, sustainability, and art. However, it is considered a challenging problem in computer vision and graphics: the diversity of human poses and the complexity of clothing shapes and textures make the editing problem difficult. Inspired by recent progress in editing face images through manipulating latent representations, such as StyleCLIP and HairCLIP, we apply those methods to editing images of full-body humans in fashion datasets and evaluate their effectiveness. First, we assess different methodologies for finding a latent representation of an image via Generative Adversarial Network (GAN) inversion; then, we apply three image manipulation schemes. A pre-trained e4e encoder is initially utilized for the inversion process, and the results are compared to a more accurate method, Pivotal Tuning Inversion (PTI). Next, we employ an optimization scheme that uses the Contrastive Language Image Pre-training (CLIP) model to guide the latent representation of an image in the direction of attributes described in the input text. We address the accuracy and speed of the process by incorporating a mapper network. Finally, we propose an optimized mapper called Text-Driven Garment Editing Mapper (TD-GEM) to achieve high-quality image editing in a disentangled way. Our empirical results show that the proposed method can edit fashion items to change color and sleeve length.
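The optimization scheme described above has a simple structure: starting from the inverted latent, descend on a text-similarity loss plus a locality term that keeps the edit close to the original. The toy below captures that structure with a linear stand-in "generator" and a Euclidean stand-in for the CLIP loss; it is our own illustration, not TD-GEM:

```python
import numpy as np

def edit_latent(A, w0, t, lam=0.1, lr=0.05, steps=200):
    """Toy CLIP-guided latent optimization. The 'generator' is a fixed
    linear map A, the 'text embedding' is t, and we minimise
        ||A w - t||^2 + lam * ||w - w0||^2
    by gradient descent. The second term keeps the edited latent near
    the inversion w0, mirroring the locality regulariser used in
    StyleCLIP-style latent optimization."""
    w = w0.copy()
    for _ in range(steps):
        grad = 2 * A.T @ (A @ w - t) + 2 * lam * (w - w0)
        w -= lr * grad
    return w
```

With A the identity, w0 = 0, and t = 1, the closed-form optimum is t / (1 + lam), and the loop converges to it; in the real method the gradient flows through a StyleGAN generator and a CLIP image encoder instead.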
16

An Adversarial Approach to Spliced Forgery Detection and Localization in Satellite Imagery

Emily R Bartusiak (6630773) 11 June 2019 (has links)
The widespread availability of image editing tools and improvements in image processing techniques make image manipulation feasible for the general population. Oftentimes, easy-to-use yet sophisticated image editing tools produce results that contain modifications imperceptible to the human observer. Distribution of forged images can have drastic ramifications, especially when coupled with the speed and vastness of the Internet. Therefore, verifying image integrity poses an immense and important challenge to the digital forensic community. Satellite images specifically can be modified in a number of ways, such as inserting objects into an image to hide existing scenes and structures. In this thesis, we describe the use of a Conditional Generative Adversarial Network (cGAN) to identify the presence of such spliced forgeries within satellite images, and additionally to identify their locations and shapes. Trained on pristine and falsified images, our method performs well on both the detection and localization objectives.
