Global ETD Search

281	Compare Accuracy of Alternative Methods for Sound Classification on Environmental Sounds of Similar Characteristics Rudberg, Olov January 2022 (has links) Over the last decade, artificial neural networks have become a vital tool in image recognition, signal processing and speech recognition. Because these networks are highly flexible, they suit many different kinds of data. This flexibility is highly sought after within the field of environmental sound classification. This thesis investigates whether audio from three types of water usage can be distinguished and classified. The usage types investigated are handwashing, showering and WC-flushing. The data originally consisted of sound recordings in WAV format. The recordings were converted into spectrograms, which are visual representations of audio signals. Two neural networks are applied to this image classification problem, namely a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN). Further, the spectrograms are subjected both to image preprocessing, using a Sobel filter, a Canny edge detector and a Gabor filter, and to data augmentation by applying different brightness and zoom alterations. The results showed that the CNN outperformed the MLP. Neither the image preprocessing techniques, nor augmentation, nor a combination of the two improved model performance. An important finding was that shaping the convolutional and pooling filters of the CNN as rectangles, and alternating between horizontally and vertically oriented filters on the input spectrogram, gave superior results. This design seemed to capture more information from the spectrograms, since spectrograms mainly contain information in a horizontal or vertical direction. This model achieved 91.14% accuracy. The result stemming from this model architecture further contributes to the environmental sound classification community. 
/ Master's thesis approved 20th June 2022. Machine Learning Algorithms Neural Networks Computer Vision Image Recognition Environmental Sound Classification Data Augmentation Probability Theory and Statistics Sannolikhetsteori och statistik Mathematics Matematik Computer Sciences Datavetenskap (datalogi)
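The abstract of entry 281 reports that rectangular filters oriented along the horizontal and vertical axes of a spectrogram worked best. The sketch below is only an illustration of why orientation matters and is not the thesis's architecture: the helper `conv2d_valid`, the toy 4x4 "spectrogram" and the 1x3 and 3x1 kernel sizes are all assumptions chosen for readability.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation,
    as used in CNN layers) and return the resulting 2D list."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical toy spectrogram: rows are frequency bins, columns are time frames.
# Row 1 holds a sustained tone, which shows up as a horizontal line.
spectrogram = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

horizontal_kernel = [[1, 1, 1]]      # 1 x 3: spans three time frames
vertical_kernel = [[1], [1], [1]]    # 3 x 1: spans three frequency bins

horizontal_response = conv2d_valid(spectrogram, horizontal_kernel)
vertical_response = conv2d_valid(spectrogram, vertical_kernel)
```

In this toy case the horizontal kernel responds three times more strongly to the sustained tone than the vertical one, which is consistent with the abstract's remark that spectrogram information lies mainly along horizontal or vertical directions.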
282	Tools for fluid simulation control in computer graphics Schoentgen, Arnaud 09 1900 (has links) L’animation basée sur la physique peut générer des systèmes aux comportements complexes et réalistes. Malheureusement, contrôler de tels systèmes est une tâche ardue. Dans le cas de la simulation de fluide, le processus de contrôle est particulièrement complexe. Bien que de nombreuses méthodes et outils ont été mis au point pour simuler et faire le rendu de fluides, trop peu de méthodes offrent un contrôle efficace et intuitif sur une simulation de fluide. Étant donné que le coût associé au contrôle vient souvent s’additionner au coût de la simulation, appliquer un contrôle sur une simulation à plus haute résolution rallonge chaque itération du processus de création. Afin d’accélérer ce processus, l’édition peut se faire sur une simulation basse résolution moins coûteuse. Nous pouvons donc considérer que la création d’un fluide contrôlé peut se diviser en deux phases: une phase de contrôle durant laquelle un artiste modifie le comportement d’une simulation basse résolution, et une phase d’augmentation de détail durant laquelle une version haute résolution de cette simulation est générée. Cette thèse présente deux projets, chacun contribuant à l’état de l’art relié à chacune de ces deux phases. Dans un premier temps, on introduit un nouveau système de contrôle de liquide représenté par un modèle particulaire. À l’aide de ce système, un artiste peut sélectionner dans une base de données une parcelle de liquide animé précalculée. Cette parcelle peut ensuite être placée dans une simulation afin d’en modifier son comportement. À chaque pas de simulation, notre système utilise la liste de parcelles actives afin de reproduire localement la vision de l’artiste. Une interface graphique intuitive a été développée, inspirée par les logiciels de montage vidéo, et permettant à un utilisateur non expert de simplement éditer une simulation de liquide. 
Dans un second temps, une méthode d’augmentation de détail est décrite. Nous proposons d’ajouter une étape supplémentaire de suivi après l’étape de projection du champ de vitesse d’une simulation de fumée eulérienne classique. Durant cette étape, un champ de perturbations de vitesse non-divergent est calculé, résultant en une meilleure correspondance des densités à haute et à basse résolution. L’animation de fumée résultante reproduit fidèlement l’aspect grossier de la simulation d’entrée, tout en étant augmentée à l’aide de détails simulés. / Physics-based animation can generate dynamic systems of very complex and realistic behaviors. Unfortunately, controlling them is a daunting task. In particular, fluid simulation brings up particularly difficult problems to the control process. Although many methods and tools have been developed to convincingly simulate and render fluids, too few methods provide efficient and intuitive control over a simulation. Since control often comes with extra computations on top of the simulation cost, art-directing a high-resolution simulation leads to long iterations of the creative process. In order to shorten this process, editing could be performed on a faster, low-resolution model. Therefore, we can consider that the process of generating an art-directed fluid could be split into two stages: a control stage during which an artist modifies the behavior of a low-resolution simulation, and an upresolution stage during which a final high-resolution version of this simulation is driven. This thesis presents two projects, each one improving on the state of the art related to each of these two stages. First, we introduce a new particle-based liquid control system. Using this system, an artist selects patches of precomputed liquid animations from a database, and places them in a simulation to modify its behavior. 
At each simulation time step, our system uses these entities to control the simulation in order to reproduce the artist’s vision. An intuitive graphical user interface inspired by video editing tools has been developed, allowing a nontechnical user to simply edit a liquid animation. Second, a tracking solution for smoke upresolution is described. We propose to add an extra tracking step after the projection of a classical Eulerian smoke simulation. During this step, we solve for a divergence-free velocity perturbation field resulting in a better matching of the low-frequency density distribution between the low-resolution guide and the high-resolution simulation. The resulting smoke animation faithfully reproduces the coarse aspect of the low-resolution input, while being enhanced with simulated small-scale details. Animation basée sur la physique Simulation de fluide Contrôle de fluide Augmentation de détail Physics-based animation Fluid simulation Fluid control Upresolution
283	Effects of Transfer Learning on Data Augmentation with Generative Adversarial Networks / Effekten av transferlärande på datautökning med generativt adversarialt nätverk Berglöf, Olle, Jacobs, Adam January 2019 (has links) Data augmentation is a technique that acquires more training data by augmenting available samples, where the training data is used to fit model parameters. Data augmentation is utilized due to a shortage of training data in certain domains and to reduce overfitting. Augmenting a training dataset for image classification with a Generative Adversarial Network (GAN) has been shown to increase classification accuracy. This report investigates whether transfer learning within a GAN can further increase classification accuracy when utilizing the augmented training dataset. The method section describes a specific GAN architecture for the experiments that includes a label condition. When using transfer learning within the specific GAN architecture, a statistical analysis shows a statistically significant increase in classification accuracy for a classification problem with the EMNIST dataset, which consists of images of handwritten alphanumeric characters. In the discussion section, the authors analyze the results and motivate other use cases for the proposed GAN architecture. / Datautökning är en metod som skapar mer träningsdata genom att utöka befintlig träningsdata, där träningsdatan används för att anpassa modellers parametrar. Datautökning används på grund av en brist på träningsdata inom vissa områden samt för att minska overfitting. Att utöka ett träningsdataset för att genomföra bildklassificering med ett generativt adversarialt nätverk (GAN) har visats kunna öka precisionen av klassificering av bilder. Denna rapport undersöker om transferlärande inom en GAN kan vidare öka klassificeringsprecisionen när ett utökat träningsdataset används. Metoden beskriver en specifik GAN-arkitektur som innehåller ett etikettvillkor. 
När transferlärande används inom den utvalda GAN-arkitekturen visar en statistisk analys en statistiskt säkerställd ökning av klassificeringsprecisionen för ett klassificeringsproblem med EMNIST datasetet, som innehåller bilder på handskrivna bokstäver och siffror. I diskussionen diskuteras orsakerna bakom resultaten och fler användningsområden nämns. data augmentation generative adversarial networks GAN image classification transfer learning image generator generating training data machine learning Computer and Information Sciences Data- och informationsvetenskap
284	Effekten av textaugmenteringsstrategier på träffsäkerhet, F1-värde och viktat F1-värde / The effect of text data augmentation strategies on Accuracy, F1-score, and weighted F1-score Svedberg, Jonatan, Shmas, George January 2021 (has links) Att utveckla en sofistikerad chatbotlösning kräver stora mängder textdata för att kunna anpassa lösningen till en specifik domän. Att manuellt skapa en komplett uppsättning textdata, specialanpassat för den givna domänen och innehållandes ett stort antal varierande meningar som en människa kan tänkas yttra, är ett enormt tidskrävande arbete. För att kringgå detta tillämpas dataaugmentering för att generera mer data utifrån en mindre uppsättning redan existerande textdata. Softronic AB vill undersöka alternativa strategier för dataaugmentering med målet att eventuellt ersätta den nuvarande lösningen med en mer vetenskapligt underbyggd sådan. I detta examensarbete har prototypmodeller utvecklats för att jämföra och utvärdera effekten av olika textaugmenteringsstrategier. Resultatet av genomförda experiment med prototypmodellerna visar att augmentering genom synonymutbyten med en domänanpassad synonymordlista, presenterade märkbart förbättrade effekter på förmågan hos en NLU-modell att korrekt klassificera data, gentemot övriga utvärderade strategier. Vidare indikerar resultatet att ett samband föreligger mellan den strukturella variationsgraden av det augmenterade datat och de tillämpade språkparens semantiska likhetsgrad under tillbakaöversättningar. / Developing a sophisticated chatbot solution requires large amounts of text data to be able to adapt the solution to a specific domain. Manually creating a complete set of text data, specially adapted for the given domain, and containing a large number of varying sentences that a human could conceivably utter, is an exceptionally time-consuming task. To circumvent this, data augmentation is applied to generate more data based on a smaller set of already existing text data. 
Softronic AB wants to investigate alternative strategies for data augmentation with the aim of possibly replacing the current solution with a more scientifically substantiated one. In this thesis, prototype models have been developed to compare and evaluate the effect of different text augmentation strategies. The results of the experiments conducted with the prototype models show that augmentation through synonym swaps with a domain-adapted thesaurus noticeably improved the ability of an NLU model to correctly classify data, compared to the other evaluated strategies. Furthermore, the results indicate that there is a relationship between the degree of structural variation in the augmented data and the degree of semantic similarity between the language pairs used during back-translation. Text data augmentation noise injection synonym swap back translation RASA NLU F1-score Textdataaugmentering brusinjektion synonymutbyte tillbakaöversättning RASA NLU F1-värde
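The synonym-swap strategy described in entry 284 can be illustrated with a minimal sketch. It is not the thesis's implementation: the function `synonym_swap`, the `swap_prob` parameter and the tiny example thesaurus are hypothetical and only show the general idea of replacing words with domain-specific synonyms to generate new training sentences.

```python
import random


def synonym_swap(sentence, thesaurus, rng, swap_prob=0.5):
    """Return a copy of `sentence` in which each word that has an entry in
    `thesaurus` is replaced by a randomly chosen synonym with probability
    `swap_prob`. Words without an entry are kept unchanged."""
    augmented = []
    for word in sentence.split():
        options = thesaurus.get(word.lower())
        if options and rng.random() < swap_prob:
            augmented.append(rng.choice(options))
        else:
            augmented.append(word)
    return " ".join(augmented)


# Hypothetical domain-adapted thesaurus (illustrative entries only).
domain_thesaurus = {
    "invoice": ["bill"],
    "cancel": ["terminate"],
}

rng = random.Random(0)  # seeded so that runs are repeatable
always = synonym_swap("please cancel my invoice", domain_thesaurus, rng, swap_prob=1.0)
never = synonym_swap("please cancel my invoice", domain_thesaurus, rng, swap_prob=0.0)
```

With `swap_prob=1.0` every word found in the thesaurus is replaced, and with `swap_prob=0.0` the sentence is returned unchanged; intermediate values produce a mix of original and augmented phrasings.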
285	Flow-induced Responses of Normal, Bowed, and Augmented Synthetic Vocal Fold Models Murray, Preston Roylance 10 August 2011 (has links) (PDF) The voice is the primary mode of communication for humans. Because the voice is so important, voice disorders tend to severely diminish quality of life. A better understanding of the physics of voice production can help to improve treatment of voice disorders. For this thesis research a self-oscillating synthetic vocal fold model was developed, compared with previous synthetic vocal fold models, and used to explore the physical effects of augmentation injections on vibration dynamics. The research was conducted in two stages. First, four vocal fold models were evaluated by quantifying onset pressure, frequency, maximum glottal gap, flow rate, and medial surface motion. The newly developed model, differentiated from the other models by the inclusion of more layers, adjusted geometry, and an extremely soft superficial lamina propria layer, was included in this study. One of the models, created using MRI-derived geometry, had the most defined mucosal wave. The newly-developed model had the lowest onset pressure, flow rate, and smallest maximum glottal width, and the model motion compared very well with published excised human larynx data. Second, the new model was altered to simulate bowing by decreasing the volume of the body layer relative to that of a normal, unbowed model. Two models with varying degrees of bowing were created and tested while paired with normal models. Pre- and post-injection data (onset pressure, vibration frequency, glottal flow rate, open quotient, and high-speed image sequences) were recorded and compared. General pre- to post-injection trends included decreased onset pressure, glottal flow rate, and open quotient, and increased vibration frequency. Additionally, there was a decrease in mucosal wave velocity and an increase in phase angle. 
The thesis results are anticipated to aid in better understanding the physical effects of augmentation injections, with the ultimate goal of obtaining more consistent surgical outcomes, and also to contribute to the advancement of voice research through the development of the new synthetic model. vocal folds vocal fold modeling mucosal wave high-speed imaging injection laryngoplasty larynx medial surface augmentation Preston R. Murray Mechanical Engineering
286	[pt] AVALIAÇÃO DE AUMENTO DE DADOS VIA GERAÇÃO DE IMAGENS SINTÉTICAS PARA SEGMENTAÇÃO E DETECÇÃO DE PÓLIPOS EM IMAGENS DE COLONOSCOPIA UTILIZANDO APRENDIZADO DE MÁQUINA / [en] EVALUATION OF DATA AUGMENTATION THROUGH SYNTHETIC IMAGES GENERATION FOR SEGMENTATION AND DETECTION OF POLYPS IN COLONOSCOPY IMAGES USING MACHINE LEARNING VICTOR DE ALMEIDA THOMAZ 17 August 2020 (has links) [pt] O câncer de cólon é atualmente a segunda principal causa de morte por câncer no mundo. Nos últimos anos houve um aumento do interesse em pesquisas voltadas para o desenvolvimento de métodos automáticos para detecção de pólipos e os resultados mais relevantes foram alcançados por meio de técnicas de aprendizado profundo. No entanto, o desempenho destas abordagens está fortemente associado ao uso de grandes e variados conjuntos de dados. Amostras de imagens de colonoscopia estão disponíveis publicamente, porém a quantidade e a variação limitada podem ser insuficientes para um treinamento bem-sucedido. O trabalho de pesquisa desta tese propõe uma estratégia para aumentar a quantidade e variação de imagens de colonoscopia, melhorando os resultados de segmentação e detecção de pólipos. Diferentemente de outros trabalhos encontrados na literatura que fazem uso de abordagens tradicionais de aumento de dados (data augmentation) e da combinação de imagens de outras modalidades de exame, esta metodologia enfatiza a criação de novas amostras inserindo pólipos em imagens de colonoscopia publicamente disponíveis. A estratégia de inserção faz uso de pólipos gerados sinteticamente e também de pólipos reais, além de aplicar técnicas de processamento para preservar o aspecto realista das imagens, ao mesmo tempo em que cria automaticamente amostras mais diversas com seus rótulos apropriados para fins de treinamento. As redes neurais convolucionais treinadas com estes conjuntos de dados aprimorados apresentaram resultados promissores no contexto de segmentação e detecção. 
As melhorias obtidas indicam que a implementação de novos métodos para aprimoramento automático de amostras em conjuntos de imagens médicas tem potencial de afetar positivamente o treinamento de redes convolucionais. / [en] Nowadays colorectal cancer is the second-leading cause of cancer death worldwide. In recent years there has been an increase in interest in research aimed at the development of automatic methods for the detection of polyps and the most relevant results have been achieved through deep learning techniques. However, the performance of these approaches is strongly associated with the use of large and varied datasets. Samples of colonoscopy images are publicly available, but the amount and limited variation may be insufficient for successful training. Based on this observation, a new approach is described in this thesis with the objective of increasing the quantity and variation of colonoscopy images, improving the results of segmentation and detection of polyps. Unlike other works found in the literature that use traditional data augmentation approaches and the combination of images from other exam modalities, the proposed methodology emphasizes the creation of new samples by inserting polyps in publicly available colonoscopy images. The insertion strategy makes use of synthetically generated polyps as well as real polyps, in addition to applying processing techniques to preserve the realistic aspect of the images, while automatically creating more diverse samples with their appropriate labels for training purposes. Convolutional neural networks trained with these improved datasets have shown promising results in the context of segmentation and detection. The improvements obtained indicate that the implementation of new methods for the automatic improvement of samples in medical image datasets has the potential to positively affect the training of convolutional networks. 
[pt] REDES NEURAIS CONVOLUCIONAIS [pt] COLONOSCOPIA [pt] POLIPOS [pt] AUMENTO DE DADOS [pt] DADOS DE TREINAMENTO [en] CONVOLUTIONAL NEURAL NETWORKS [en] COLONOSCOPY [en] POLYP [en] DATA AUGMENTATION [en] TRAINING DATA
287	Knowledge Base Augmentation from Spreadsheet Data : Combining layout inference with multimodal candidate classification Heyder, Jakob Wendelin January 2020 (has links) Spreadsheets compose a valuable and notably large dataset of documents within many enterprise organizations and on the Web. Although spreadsheets are intuitive to use and equipped with powerful functionalities, extraction and transformation of the data remain a cumbersome and mostly manual task. The great flexibility they provide to the user results in data that is arbitrarily structured and hard to process for other applications. In this paper, we propose a novel architecture that combines supervised layout inference and multimodal candidate classification to allow knowledge base augmentation from arbitrary spreadsheets. In our design, we consider the need for repairing misclassifications and allow for verification and ranking of ambiguous candidates. We evaluate the performance of our system on two datasets, one with single-table spreadsheets, another with spreadsheets of arbitrary format. The evaluation result shows that the proposed system achieves similar performance on single-table spreadsheets compared to state-of-the-art rule-based solutions. Additionally, the flexibility of the system allows us to process arbitrary spreadsheet formats, including horizontally and vertically aligned tables, multiple worksheets, and contextualizing metadata. This was not possible with existing purely text-based or table-based solutions. The experiments demonstrate that it can achieve high effectiveness with an F1 score of 95.71 on arbitrary spreadsheets that require the interpretation of surrounding metadata. The precision of the system can be further increased by applying candidate schema-matching based on semantic similarity of column headers. / Kalkylblad består av ett värdefullt och särskilt stort datasätt av dokument inom många företagsorganisationer och på webben. 
Även om kalkylblad är intuitivt att använda och är utrustad med kraftfulla funktioner, utvinning och transformation av data är fortfarande en besvärlig och manuell uppgift. Den stora flexibiliteten som de ger användaren resulterar i data som är godtyckligt strukturerade och svåra att bearbeta för andra applikationer. I det här förslaget föreslår vi en ny arkitektur som kombinerar övervakad layoutinferens och multimodal kandidatklassificering för att tillåta kunskapsbasförstärkning från godtyckliga kalkylblad. I vår design överväger vi behovet av att reparera felklassificeringar och möjliggöra verifiering och rangordning av tvetydiga kandidater. Vi utvärderar systemets utförande på två datasätt, en med singeltabellkalkylblad, en annan med kalkylblad av godtyckligt format. Utvärderingsresultatet visar att det föreslagna systemet uppnår liknande prestanda på singel-tabellkalkylblad jämfört med state-of-the-art regelbaserade lösningar. Dessutom tillåter systemets flexibilitet oss att bearbeta godtyckliga kalkylark format, inklusive horisontella och vertikala inriktade tabeller, flera kalkylblad och sammanhangsförande metadata. Detta var inte möjligt med existerande rent textbaserade eller tabellbaserade lösningar. Experimenten visar att det kan uppnå hög effektivitet med en F1-poäng på 95.71 på godtyckliga kalkylblad som kräver tolkning av omgivande metadata. Systemets precision kan ökas ytterligare genom att applicera schema-matchning av kandidater baserat på semantisk likhet mellan kolumnrubriker. Data extraction Data transformation Knowledge base augmentation Machine learning Table understanding Spreadsheets Datainsamling Datatransformation Kunskapsbasförstärkning Maskininlärning Tabellförståelse Kalkylblad Computer and Information Sciences Data- och informationsvetenskap
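Entries 284 and 287 both report F1 scores. As a reminder of what the metric measures, the sketch below computes F1 as the harmonic mean of precision and recall from raw counts. The function name `f1_score` and the example counts are illustrative assumptions and are not taken from either thesis.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Compute F1 = 2 * P * R / (P + R), where P is precision and R is recall.
    Returns 0.0 when precision and recall are both undefined or zero."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts: 9 correct extractions, 1 spurious, 3 missed.
example_f1 = f1_score(9, 1, 3)
```

The function returns a value between 0 and 1; a figure such as the 95.71 quoted in entry 287 is the same quantity expressed on a 0 to 100 scale.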
288	Impact of data augmentations when training the Inception model for image classification Barai, Milad, Heikkinen, Anthony January 2017 (has links) Image classification is the process of identifying the class to which a previously unobserved object belongs. Classifying images is a commonly occurring task in companies. Currently many of these companies perform this classification manually. Automated classification, however, has a lower expected accuracy. This thesis examines how automated classification could be improved by adding augmented data to the learning process of the classifier. We conduct a quantitative empirical study on the effects of two image augmentations, random horizontal/vertical flips and random rotations (<180°). The data sets used come from an auction house search engine operating under the commercial name Barnebys. The data sets contain 700 000, 50 000 and 28 000 images, with each set containing 28 classes. In this bachelor’s thesis, we re-trained a convolutional neural network model called the Inception-v3 model with the two larger data sets. The remaining set is used to get more class-specific accuracies. In order to get a more accurate value of the effects we used a tenfold cross-validation method. The results of our quantitative study show that the Inception-v3 model can reach a baseline mean accuracy of 64.5% (700 000 data set) and a mean accuracy of 51.1% (50 000 data set). The overall accuracy decreased with augmentations on our data sets. However, our results display an increase in accuracy for some classes. The highest flat accuracy increase observed is in the class "Whine & Spirits" in the small data set, where it went from 42.3% to 72.7% correctly classified images of the specific class. / Bildklassificering är uppgiften att identifiera vilken klass ett tidigare osett objekt tillhör. Att klassificera bilder är en vanligt förekommande uppgift hos företag. 
För närvarande utför många av dessa företag klassificering manuellt. Automatiserade klassificerare har en lägre förväntad noggrannhet. I detta examensarbete studeradas hur en maskinklassificerar kan förbättras genom att lägga till ytterligare förändrad data i inlärningsprocessen av klassificeraren. Vi genomför en kvantitativ empirisk studie om effekterna av två bildförändringar, slumpmässiga horisontella/vertikala speglingar och slumpmässiga rotationer (<180°). Bilddatasetet som används är från ett auktionshus sökmotor under det kommersiella namnet Barnebys. De dataseten som används består av tre separata dataset, 700 000, 50 000 och 28 000 bilder. Var och en av dataseten innehåller 28 klasser vilka mappas till verksamheten. I det här examensarbetet har vi tränat Inception-v3-modellen med dataset av storlek 700 000 och 50 000. Vi utvärderade sedan noggrannhet av de tränade modellerna genom att klassificera 28 000-datasetet. För att få ett mer exakt värde av effekterna använde vi en tiofaldig korsvalideringsmetod. Resultatet av vår kvantitativa studie visar att Inception-v3-modellen kan nå en genomsnittlig noggrannhet på 64,5% (700 000 dataset) och en genomsnittlig noggrannhet på 51,1% (50 000 dataset). Den övergripande noggrannheten minskade med förändringar på vårat dataset. Dock visar våra resultat en ökad noggrannhet i vissa klasser. Den observerade högsta noggrannhetsökningen var i klassen "Whine & Spirits", där vi gick från 42,3 % korrekt klassificerade bilder till 72,7 % korrekt klassificerade bilder i det lilla datasetet med förändringar. Image Classification Image Recognition Inception Data Augmentation Convolutional Neural Network Machine Learning Bildklassificering Bildigenkänning Inception Data förändring Convolutional Neural Network Maskininlärning Computer and Information Sciences Data- och informationsvetenskap
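Entry 288 studies random horizontal/vertical flips as one of its two augmentations. The following is a minimal sketch of that idea on an image stored as a nested list of pixel values. The function names and the probability parameter `p` are assumptions made for illustration, and the random rotations mentioned in the abstract are deliberately left out because they require interpolation.

```python
import random


def horizontal_flip(image):
    """Mirror the image left to right (reverse every row)."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror the image top to bottom (reverse the row order)."""
    return [row[:] for row in image[::-1]]


def random_flip(image, rng, p=0.5):
    """Apply a horizontal flip and a vertical flip, each independently
    with probability `p`, and return a new image."""
    augmented = [row[:] for row in image]
    if rng.random() < p:
        augmented = horizontal_flip(augmented)
    if rng.random() < p:
        augmented = vertical_flip(augmented)
    return augmented


tiny_image = [[1, 2],
              [3, 4]]
both_flips = random_flip(tiny_image, random.Random(0), p=1.0)
```

Because the original image is copied before flipping, the training set can keep both the original and the augmented version.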
289	Character Recognition in Natural Images Utilising TensorFlow / Teckenigenkänning i naturliga bilder med TensorFlow Viklund, Alexander, Nimstad, Emma January 2017 (has links) Convolutional Neural Networks (CNNs) are commonly used for character recognition. They achieve the lowest error rates for popular datasets such as SVHN and MNIST. However, research on using CNNs to classify characters of the whole English alphabet in natural images is lacking. This thesis conducts an experiment where TensorFlow is used to construct a CNN that is trained and tested on the Chars74K dataset, with 15 images per class for training and 15 images per class for testing. This is done with the aim of achieving a higher accuracy than the non-CNN approach by de Campos et al. [1], which achieved 55.26%. The thesis explores data augmentation techniques for expanding the small training set and evaluates the result of applying rotation, stretching, translation and noise-adding. The result is that all of these methods apart from adding noise have a positive effect on the accuracy of the network. Furthermore, the experiment shows that with a three-layer convolutional neural network it is possible to create a character classifier that is as good as de Campos et al.'s. It is believed that even better results could be achieved if more experiments were conducted on the parameters of the network and the augmentation. / Det är vanligt att använda konvolutionära artificiella neuronnät (CNN) för bildigenkänning, då de ger de minsta felmarginalerna på kända datamängder som SVHN och MNIST. Dock saknas det forskning om användning av CNN för klassificering av bokstäver i naturliga bilder när det gäller hela det engelska alfabetet. Detta arbete beskriver ett experiment där TensorFlow används för att bygga ett CNN som tränas och testas med bilder från Chars74K. 15 bilder per klass används för träning och 15 per klass för testning. 
Målet med detta är att uppnå högre noggrannhet än 55.26%, vilket är vad de campos et al. [1] uppnådde med en metod utan artificiella neuronnät. I rapporten utforskas olika tekniker för att artificiellt utvidga den lilla datamängden, och resultatet av att applicera rotation, utdragning, translation och bruspåslag utvärderas. Resultatet av det är att alla dessa metoder utom bruspåslag ger en positiv effekt på nätverkets noggrannhet. Vidare visar experimentet att med ett CNN med tre lager går det att skapa en bokstavsklassificerare som är lika bra som de Campos et al.s klassificering. Om fler experiment skulle genomföras på nätverkets och utvidgningens parametrar är det troligt att ännu bättre resultat kan uppnås. character recognition natural images TensorFlow data augmentation neural networks Chars74K convolutional teckenigenkänning naturliga bilder TensorFlow dataaugmentering neurala nätverk neuronnät Chars74K Computer Sciences Datavetenskap (datalogi)
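Of the augmentations evaluated in entry 289, translation is the simplest to show in a few lines. The sketch below shifts an image by whole pixels and fills the uncovered area with a constant. The function `translate`, its `fill` parameter and the nested-list image format are assumptions for illustration, not the thesis's TensorFlow code.

```python
def translate(image, dx, dy, fill=0):
    """Shift `image` by `dx` pixels to the right and `dy` pixels down.
    Pixels shifted outside the frame are dropped and the uncovered
    area is filled with `fill`. Negative values shift left or up."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            new_row, new_col = row + dy, col + dx
            if 0 <= new_row < height and 0 <= new_col < width:
                shifted[new_row][new_col] = image[row][col]
    return shifted


glyph = [[1, 2],
         [3, 4]]
shifted_right = translate(glyph, dx=1, dy=0)
shifted_down = translate(glyph, dx=0, dy=1)
```

Applying several small shifts to each of the 15 training images per class is one way such a small training set could be expanded while keeping the labels unchanged.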
290	Data Augmentations for Improving Vision-Based Damage Detection : in Land Transport Infrastructure / Dataökningar för att förbättra bildbaserade sprickdetektering : i landtransportinfrastruktur Siripatthiti, Punnawat January 2023 (has links) Cracks are a common form of distress or damage in road pavements and railway sleepers. They pose significant challenges to structural integrity, safety, and longevity. Over the years, researchers have developed various data-driven technologies for image-based crack detection in road and sleeper applications, and image-based crack detection has become a promising field. Many researchers use ensemble learning to win the Road Damage Detection Challenge. The challenge provides a street view dataset from several countries, captured from different perspectives. The 2020 version of the dataset contains images from Japan, India, and the Czech Republic, so the dataset has an inherent domain shift problem. Current solutions use ensemble learning to deal with this problem. Those solutions require much computational power, which makes them hard to adapt to real-time applications. To mitigate the problem, the thesis experiments with various data augmentation techniques that could improve the base model performance. The main focuses are erasing a crack from an image using generative AI (Erase), implementing road segmentation using Panoptic Segmentation (RS), and injecting a perspective-aware synthetic crack (InjectPa) into the segmented road surface in the image. The results show that, compared to the base model, the Erase + RS techniques improve the model's F1 score when trained only on Japan in the dataset rather than when trained on three countries simultaneously. Moreover, the InjectPa technique does not improve the base model in either scenario. The experiment then moved to the SBB dataset, which contains close-up images of sleepers from cameras mounted in front of the diagnostic vehicle. 
This part of the study follows the same techniques but changes the segmentation model to the Segment Anything Model (SAM), because the previous segmentation model was trained on a street view dataset, which makes it unreliable on close-up images. The Erase + SAM techniques improve bbox/AP and validation loss. Nevertheless, they do not improve the F1 score significantly compared to the base model. This thesis also applies the explainable AI method D-RISE to determine which features most influence the model's decisions. D-RISE shows that the model trained with augmentation pays more attention to the pothole damage type for road pavements and the spalling defect type for sleepers than to other types. Finally, the thesis discusses the results and suggests a strategy for future study. / Sprickor, en typisk term som de flesta känner till, är en vanlig form av skador i vägbeläggningar och järnvägsslipers. Det innebär betydande utmaningar för strukturella integritet, säkerhet och livslängd. Under årens lopp har olika datadrivna tekniker utvecklats för bildbaserade sprickdetektering i vägbeläggningar och järnvägsslipers applikationer. Den bildbaserade sprickdetekteringen har blivit ett lovande område. Många forskare använder ensembleinlärningsmodeller för att vinna den Road Damage Detection Challenge (Vägbeläggningar Detektering Utmaning). Utmaningen ger en Gatuvy dataset från flera länder från olika perspektiv. Versionen av datasetet är 2020 som innehåller bilder från Japan, Indien och Tjeckien. Därför ärver datasetet ett domänskiftproblem. Nuvarande lösningar använder ensembleinlärning för att hantera ett sådant problem. Dessa lösningar kräver mycket datorkraft och utmanar anpassningsförmågan i realtidsapplikationer. För att mildra problemet, denna avhandling prover många tekniker för dataökningar som kan förbättra basmodellens prestanda. 
Huvudfokusen är att radera en spricka från en bild via en generativ AI (Erase), implementera vägyta segmentering via den Panoptic Segmentation (RS), lägga en persective-aware syntetik spricka (InjectPa) till segmenterade vögytan in bilden. Resultaten visar att den Erase + RS ökningsteknikerna förbättrar modellens F1 score när den tränas på Japan i datasetet i stället för att tränas alla länder samtidigt. Dessutom förbättrar den InjectPa tekniken inte basmodellen på båda fallen. Därefter flyttades experimentet till SBB-datasetet som innehåller närbilder av järnvägsslipers från kameror monterades framför ett diagnosfordon. Denna section följer de samma teknikerna men ändra segmentering modellen till den Segment Anything Model (SAM) eftersom förra segmentering modellen tränades på en Gatuvy dataset vilket gör den sårbar för närbilder. Den Erase + SAM ökningsteknikerna visar förbättringar på bbox/AP och validering. Ändå förbättrade den inte F1 score avsevört jämfört med basmodellen. Denna avhandling tillämpar också Förklarbar AI-namnet D-RISE för att avgöra vilken funktion som mest påverkar modellbeslutet. D-RISE visar att modellen som har dataökning kan uppmärksamma skadetypen potthål för vägbeläggningar och defekttypen spjälkning för järnvägsslipers än andra typer. Slutligen diskuterar avhandlingen resultaten och föreslår en strategi för framtida arbetsinsatser. Computer Vision Data Augmentation Object Detection Crack Detection Road Damage Detection Sleeper Defect Detection datorseende dataökning objektdetektering sprickdetektering vägbeläggning järnvägsslipers Civil Engineering Samhällsbyggnadsteknik
