1

AI-based image generation: The impact of fine-tuning on fake image detection

Hagström, Nick, Rydberg, Anders January 2024
Machine learning-based image generation models such as Stable Diffusion are now capable of generating synthetic images that are difficult to distinguish from real images, which gives rise to a number of legal and ethical concerns. As a potential measure of mitigation, it is possible to train neural networks to detect the digital artifacts present in the images synthesized by many generative models. However, as the artifacts in question are often rather model-specific, these so-called detectors usually suffer from poor performance when presented with images from models they have not been trained on. In this thesis we study DreamBooth and LoRA, two recently emerged fine-tuning methods, and their impact on the performance of fake image detectors. DreamBooth and LoRA can be used to fine-tune a Stable Diffusion foundation model, which has the effect of creating an altered version of the base model. The ease with which this can be done has led to a proliferation of community-generated synthetic images. However, the effect of model fine-tuning on the detectability of images has not yet been studied in a scientific context. We therefore formulate the following research question: Does fine-tuning a Stable Diffusion base model using DreamBooth or LoRA affect the performance metrics of detectors trained on only base model images? We employ an experimental approach, using the pretrained VGG16 architecture for binary classification as the detector. We train the detector on real images from the ImageNet dataset together with images synthesized by three different Stable Diffusion foundation models, resulting in three trained detectors. We then test their performance on images generated by fine-tuned versions of these models. We find that the accuracy of detectors when tested on images generated using fine-tuned models is lower than when tested on images generated by the base models on which they were trained. Within the former category, DreamBooth-generated images have a greater negative impact on detector accuracy than LoRA-generated images. Our study suggests a need to treat DreamBooth fine-tuned models in particular as distinct entities in the context of fake image detector training.
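As a rough illustration of the detector setup this abstract describes, the following PyTorch sketch fine-tunes an ImageNet-pretrained VGG16 for binary real-vs-fake classification. The folder layout, hyperparameters, and training schedule are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a VGG16-based real/fake image detector in PyTorch.
# Paths, hyperparameters, and the training schedule are illustrative
# assumptions, not the exact configuration used in the thesis.
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

# Start from ImageNet-pretrained VGG16 and replace the classifier head
# with a single logit for binary real-vs-synthetic classification.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 1)
model = model.to(device)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumes a folder layout like data/train/real/... and data/train/fake/...
train_set = datasets.ImageFolder("data/train", transform=preprocess)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(5):
    for images, labels in loader:
        images = images.to(device)
        labels = labels.float().unsqueeze(1).to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```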
2

Are AI-Photographers Ready for Hire? : Investigating the possibilities of AI generated images in journalism

Breuer, Andrea, Jonsson, Isac January 2023
In today’s information era, many news outlets are competing for attention. One way to cut through the noise is to use images. Obtaining images can be both time-consuming and expensive for smaller news agencies. In collaboration with the Swedish news agency Newsworthy, we investigate the possibilities of using AI-generated images in a journalistic context. Using images generated with the text-to-image generation model Stable Diffusion, we aim to answer the research question How do the parameters in Stable Diffusion affect the applicability of the generated images for journalistic purposes? A total of 511 images are generated with different Stable Diffusion parameter settings and rated on a scale of 1-5 by three journalists at Newsworthy. The data is analyzed using ordinal logistic regression. The results suggest that the optimal value for the Stable Diffusion parameter classifier-free guidance is around 10-12, the default 50 iterations are sufficient, and keywords do not significantly affect the image outcome. The parameter that has the single greatest effect on the outcome is the prompt. Thus, to generate photo-realistic images that can be used in a journalistic context, most thought and effort should be put towards formulating a suitable prompt.
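For orientation, this is roughly how such a parameter sweep could be scripted with the Hugging Face diffusers library; the model checkpoint, prompt, and guidance values below are illustrative assumptions rather than the study's exact setup.

```python
# Sketch of a Stable Diffusion parameter sweep with Hugging Face diffusers.
# The model ID, prompt, and parameter grid are illustrative assumptions,
# not the exact setup used in the thesis.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "press photo of a flooded street in a small town, overcast light"

# The study reported above found classifier-free guidance around 10-12
# and the default 50 denoising steps to work well.
for guidance in [7.5, 10.0, 12.0]:
    image = pipe(
        prompt,
        guidance_scale=guidance,
        num_inference_steps=50,
    ).images[0]
    image.save(f"out_cfg{guidance}.png")
```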
3

Quantitative and Qualitative Analysis of Text-to-Image models

Masrourisaadat, Nila 30 August 2023
The field of image synthesis has seen significant progress recently, including great strides with generative models like Generative Adversarial Networks (GANs), Diffusion Models, and Transformers. These models have shown they can create high-quality images from a variety of text prompts. However, a comprehensive analysis that examines both their performance and possible biases is often missing from existing research. In this thesis, I undertake a thorough examination of several leading text-to-image models, namely Stable Diffusion, DALL-E Mini, Lafite, and Ernie-ViLG. I assess their performance in generating accurate images of human faces, groups, and specified numbers of objects, using both Fréchet Inception Distance (FID) scores and R-precision as my evaluation metrics. Moreover, I uncover inherent gender or social biases these models may possess. My research reveals a noticeable bias in these models, which show a tendency towards generating images of white males, thus under-representing minorities in their output of human faces. This finding contributes to the broader dialogue on ethics in AI and sets the stage for further research aimed at developing more equitable AI systems. Furthermore, based on the metrics I used for evaluation, the Stable Diffusion model outperforms the others in generating images from text prompts. This information could be particularly useful for researchers and practitioners trying to choose the most effective model for their future projects. To facilitate further research in this field, I have made my findings, the related data, and the source code publicly available. / Master of Science / In my research, I explored how cutting-edge computer models, namely Stable Diffusion, DALL-E Mini, Lafite, and Ernie-ViLG, can create images from text descriptions, a process that holds exciting possibilities for the future. However, these technologies aren't without their challenges. An important finding from my study is that these models exhibit bias, e.g., they often generate images of white males more than they do of other races and genders. This suggests they're not representing our diverse society fairly. Among these models, Stable Diffusion outperforms the others at creating images from text prompts, which is valuable information for anyone choosing a model for their projects. To help others learn from my work and build upon it, I've made all my data, findings, and the code I used in this study publicly available. By sharing this work, I hope to contribute to improving this technology, making it even better and fairer for everyone in the future.
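As a small illustration of the FID metric used above, the following sketch computes an FID score with torchmetrics; the random tensors are stand-ins for batches of real and generated images.

```python
# Sketch of computing a Fréchet Inception Distance (FID) score with
# torchmetrics. The random uint8 tensors below stand in for batches of
# real and generated images.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# By default, FID expects uint8 images of shape (N, 3, H, W).
real_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(f"FID: {fid.compute().item():.2f}")  # lower means closer to real
```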
4

Generating Synthetic Training Data with Stable Diffusion

Rynell, Rasmus, Melin, Oscar January 2023
The usage of image classification in various industries has grown significantly in recent years. There are, however, challenges concerning the data used to train such models. In many cases the training data is difficult and expensive to obtain. Furthermore, dealing with image data may come with additional problems such as privacy concerns. In recent years, synthetic image generation models such as Stable Diffusion have seen significant improvement. Using solely a textual description, Stable Diffusion is able to generate a wide variety of photorealistic images. In addition to textual descriptions, conditioning models such as ControlNet have enabled the use of additional grounding information, such as Canny edge and segmentation images. This thesis investigates whether synthetic images generated by Stable Diffusion can be used effectively in training an image classifier. To find the most effective method for generating training data, multiple conditioning methods are investigated and evaluated. The results show that it is possible to generate high-quality training data using several conditioning techniques. The best-performing method was using Canny-edge-grounded images to augment already existing data. Extending two classes with additional synthetic data generated by the best-performing method achieved the highest average F1-score increase of 0.85 percentage points compared with a baseline trained solely on real images.
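The Canny-edge conditioning described above can be approximated with diffusers' ControlNet pipeline. The sketch below is a minimal assumed setup (model IDs, source image path, and prompt are placeholders), not the thesis's actual pipeline: the edge map fixes the layout of an existing training image while the prompt varies its appearance.

```python
# Sketch of Canny-edge-conditioned generation with diffusers' ControlNet
# pipeline. Model IDs, the source image path, and the prompt are
# illustrative placeholders.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Extract a Canny edge map from an existing training image.
source = cv2.imread("existing_sample.jpg")
gray = cv2.cvtColor(source, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)
edge_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The edge map grounds the layout while the prompt varies the appearance,
# yielding a synthetic variant of the original training image.
result = pipe(
    "a product photo of a ceramic mug on a wooden table",
    image=edge_image,
    num_inference_steps=50,
).images[0]
result.save("synthetic_sample.png")
```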
5

Application of Deep Learning techniques for the recognition of Web pages and facial emotions: A comparative and experimental study

Mejia-Escobar, Christian 07 March 2023
The progress of Artificial Intelligence (AI) has been remarkable in recent years. The impressive advances in machines imitating human capabilities are owed especially to the field of Deep Learning (DL). This paradigm avoids complex manual feature engineering; instead, the data is passed directly to an algorithm, which learns to extract and represent features hierarchically across multiple layers as it learns to solve a task. This has proven ideal for problems related to the visual world. A DL solution comprises data and a model. Most current research focuses on the models, in search of better algorithms. However, even when different architectures and configurations are tried, performance will hardly improve if the data is not of good quality. Studies that focus on improving the data are scarce, even though data is the main resource for machine learning. Collecting and labeling large image datasets consumes a great deal of time and effort and introduces errors. Misclassification, the presence of irrelevant images, class imbalance, and a lack of real-world representativeness are widely known problems that hurt model performance in practical scenarios. Our proposal addresses these problems through a data-centric approach: by engineering the original dataset with DL techniques, we make it better suited for training a model with improved performance and generalization in real-world scenarios. To test this hypothesis, we consider two practical cases that have become topics of growing research interest. On the one hand, the Internet is the world's communication platform and the Web is the main source of information for human activities. Web pages grow by the second and are increasingly sophisticated, and classification is the basic technique for organizing this complex and vast content. The visual appearance of a Web page can be an alternative to textual analysis of its code for distinguishing between categories. We address Web page recognition and classification by creating an appropriate dataset of screenshots from scratch. On the other hand, although AI's advances are significant on the cognitive side, the emotional side of people remains a challenge. Facial expression is the best evidence for manifesting and conveying our emotions. Although some facial image datasets exist for training DL models, the high performance obtained in controlled settings with in-the-lab datasets has not been matched. We address human emotion recognition and classification by combining several in-the-wild facial image datasets. These two problems pose different situations and require images with very different content, so we designed a dataset refinement method specific to each case study.
In the first case, we implemented a DL model to classify Web pages into certain categories using only screenshots, and the results revealed a very difficult multi-class problem. We tackled the same problem with a One vs. Rest strategy and improved the dataset through reclassification, detection of irrelevant images, balancing, and representativeness, in addition to using regularization techniques and a new prediction mechanism based on the binary classifiers. Operating separately, these classifiers improve performance: on average they increase validation accuracy by 26.29% and reduce overfitting by 42.30%, a substantial improvement over the multi-class classifier operating on all categories at once. Using the new model, we developed an online Web page classification system that can help designers, site owners, Webmasters, and users in general. In the second case, the strategy is to progressively refine the facial image dataset through several successive trainings of a convolutional network model. Each training round uses the facial images that the previous round predicted correctly, which lets the model capture more distinctive features of each emotion class. After the last round, the model automatically reclassifies the entire dataset. This process also allows us to detect irrelevant images, but our purpose is to improve the dataset without modifying, deleting, or augmenting images, unlike other similar works. Experimental results on three representative datasets demonstrated the effectiveness of the proposed method, improving validation accuracy by 20.45%, 14.47%, and 39.66% for FER2013, NHFI, and AffectNet, respectively. The recognition rates on the reclassified versions of these datasets are 86.71%, 70.44%, and 89.17%, reaching the state of the art. We combine these better-classified versions to increase the number of images and enrich the diversity of people, gestures, and attributes of resolution, color, background, lighting, and image format; the resulting dataset is used to train a more general model. Facing the need for more realistic metrics of model generalization, we created a combined, balanced, unbiased, and well-labeled evaluation dataset, organized into gender, age, and ethnicity categories. Using a predictor of these population-representative characteristics, we can select the same number of images per category, and with the successful Stable Diffusion model it is possible to generate the facial images needed to balance the categories built from those characteristics. Single-dataset and cross-dataset experiments indicate that the model trained on the combined dataset improves the generalization of the models trained individually on FER2013, NHFI, and AffectNet by 13.93%, 24.17%, and 7.45%, respectively. We developed an online emotion recognition system that leverages the more generic model obtained from the combined dataset. Finally, the good quality of the synthetic facial images and the time savings achieved with the generative method motivate us to create the first and largest artificial dataset of categorical emotions. This freely available product can complement real datasets, which are difficult to collect, label, and balance while controlling their characteristics and protecting people's identity.
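To make the iterative refinement strategy of the second case concrete, here is a schematic Python sketch; train_model and predict are hypothetical placeholders for the convolutional-network training and inference steps, not functions from the thesis.

```python
# Schematic sketch of the iterative dataset-refinement loop described
# above: retrain on the correctly predicted images from the previous
# round, then reclassify the full dataset at the end. The helpers
# train_model and predict are hypothetical placeholders.
def refine_dataset(images, labels, rounds=3):
    kept_images, kept_labels = images, labels
    model = None
    for _ in range(rounds):
        # Train a fresh CNN on the currently kept subset.
        model = train_model(kept_images, kept_labels)
        # Keep only the images the model classifies correctly, so the
        # next round sees the most distinctive examples of each class.
        predictions = predict(model, kept_images)
        correct = [i for i, p in enumerate(predictions)
                   if p == kept_labels[i]]
        kept_images = [kept_images[i] for i in correct]
        kept_labels = [kept_labels[i] for i in correct]
    # Final pass: the last model relabels the entire original dataset,
    # improving it without deleting or augmenting any image.
    return predict(model, images)
```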
6

AI learn, AI do : En konstvetenskaplig studie om AI-modellers materialbetingade förmågor, aktörskap och deltagande inom konstnärliga processer / AI learn, AI do : An art-historical study about the material-based abilities, agencies, and involvement in artistic processes of AI-models

Persson, Cornelius January 2023
This master’s thesis investigates generative AI art through the lens of actor-network theory. By focusing on the role of images in datasets as a material that affects both AI models and artworks, the decisively non-human agencies generative AI models can be said to possess, and the traces and associations that generative AI models imbue artworks with, this thesis aims to investigate art created with GAN models as well as contemporary text-to-image diffusion models on similar premises. Forgoing the common discussions and questions about the status of AI art as art that pervade much reasoning on this topic, the thesis instead approaches the use of generative AI to make images and art as a multifaceted practice that can be observed and experienced in a variety of ways. General topics, such as how images are used to train AI models, the blurry connections between training images and generated images, and how AI models can be used and interacted with through prompts as well as different kinds of interfaces and AI image generators, are investigated, followed by analyses of a number of artworks for which generative AI has been used. Throughout this study, generative AI art emerges as a novel and often contested art form defined by direct and indirect connections to other media, a varied understanding of what the artificial intelligence appears to do, and a use of the AI artwork as a means to comment on the medium's emerging characteristics.
