51

Advanced Data Augmentation : With Generative Adversarial Networks and Computer-Aided Design

Thaung, Ludwig January 2020 (has links)
CNN-based (Convolutional Neural Network) visual object detectors often reach human-level accuracy but need to be trained on large amounts of manually annotated data. Collecting and annotating this data is frequently time-consuming and expensive. Using generative models to augment the data can help minimize the amount of data required and increase detection performance. Many state-of-the-art generative models are Generative Adversarial Networks (GANs). This thesis investigates if and how image data can be used to generate new data with GANs for training a YOLO-based (You Only Look Once) object detector, and how CAD (Computer-Aided Design) models can aid in this process. In the experiments, different GAN models are trained and evaluated by visual inspection or with the Fréchet Inception Distance (FID) metric. The data, provided by Ericsson Research, consists of images of antenna and baseband equipment along with annotations and segmentations. Ericsson Research also supplied the YOLO detector, which is used unmodified. Finally, the YOLO detector is trained on data generated by the chosen model and evaluated by Average Precision (AP). The results show that the generative models designed in this work can produce RGB images of high quality. However, the quality degrades when binary segmentation masks must be generated as well. The experiments with CAD input data did not result in images usable for training the detector. The GAN designed in this work can successfully replace objects in images with the style of other objects. Training the YOLO detector with GAN-modified data leads to the same detection performance as training with real data. The results also show that the shapes and backgrounds of the antennas contributed more to detection performance than their style and colour.
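The FID metric mentioned above compares the statistics of Inception-network activations over real and generated images. A minimal sketch of the underlying Fréchet distance between two Gaussians, assuming the feature means and covariances have already been extracted (this is an illustration, not the thesis code):

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """Frechet distance between two Gaussians, the core of the FID metric.

    In the full metric, mu/cov are the mean and covariance of Inception
    pool activations over real and generated image sets; here they are
    plain arrays. Tr(sqrtm(C1 @ C2)) is computed from the eigenvalues of
    C1 @ C2, which are real and non-negative for valid covariances.
    """
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)
```

Identical distributions score 0; larger values indicate that the generated images diverge from the real ones.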
52

Uncertainty Estimation in Volumetric Image Segmentation

Park, Donggyun January 2023 (has links)
The performance of deep neural networks and the estimation of their robustness have developed rapidly. In contrast, despite the broad use of deep convolutional neural networks (CNNs) [1] for medical image segmentation, far less research has been conducted on their uncertainty estimation. Deep learning tools do not by nature capture model uncertainty, so the output of deep neural networks needs to be analysed critically with quantitative measurements, especially for applications in the medical domain. In this work, epistemic uncertainty, one of the two main types of uncertainty (epistemic and aleatoric), is analysed and measured for volumetric medical image segmentation tasks at pixel level and structure level. The deep neural network employed as a baseline is the 3D U-Net architecture [2], which shares its essential structural concept with the U-Net architecture [3]. Various techniques are applied to quantify the uncertainty and obtain statistically meaningful results, including test-time data augmentation and deep ensembles. The distribution of the pixel-wise predictions is estimated by Monte Carlo simulation, and the entropy is computed to quantify and visualize how uncertain (or certain) the prediction of each pixel is. Given the long network training times in volumetric image segmentation, training an ensemble of networks is extremely time-consuming, so the focus is on data augmentation and test-time dropout. The desired outcome is to reduce the computational cost of measuring the uncertainty of the model predictions while maintaining the same level of estimation performance, and to increase the reliability of the uncertainty estimation map compared to conventional methods.
The proposed techniques are evaluated on a publicly available volumetric image dataset, Combined Healthy Abdominal Organ Segmentation (CHAOS, a set of 3D in-vivo images) from Grand Challenge (https://chaos.grand-challenge.org/). Experiments with the liver segmentation task in 3D Computed Tomography (CT) show the relationship between the prediction accuracy and the uncertainty map obtained by the proposed techniques.
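The pixel-wise entropy map described above can be sketched directly from Monte Carlo predictions (test-time augmentation or dropout passes); this is a generic illustration of the technique, not the thesis implementation:

```python
import numpy as np

def pixelwise_entropy(mc_probs, eps=1e-12):
    """Entropy of the mean softmax prediction over Monte Carlo samples.

    mc_probs: array of shape (T, C, H, W) -- T stochastic forward passes,
    C classes. Returns an (H, W) uncertainty map; 0 where the model is
    certain, log(C) where it is maximally uncertain.
    """
    mean_p = mc_probs.mean(axis=0)                        # (C, H, W)
    return -(mean_p * np.log(mean_p + eps)).sum(axis=0)   # (H, W)
```

The resulting map can be overlaid on the segmentation to visualize where the predictions are least reliable.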
53

Compare Accuracy of Alternative Methods for Sound Classification on Environmental Sounds of Similar Characteristics

Rudberg, Olov January 2022 (has links)
Artificial neural networks have in the last decade been a vital tool in image recognition, signal processing and speech recognition. Because these networks are highly flexible, they suit a vast range of different data. This flexibility is much sought after in the field of environmental sound classification. This thesis investigates whether audio from three types of water usage can be distinguished and classified. The usage types investigated are handwashing, showering and WC-flushing. The data originally consisted of sound recordings in WAV format. The recordings were converted into spectrograms, which are visual representations of audio signals. Two neural networks are applied to this image classification problem: a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN). The spectrograms are further subjected to image preprocessing with a Sobel filter, a Canny edge detector and a Gabor filter, as well as to data augmentation through brightness and zoom alterations. The results showed that the CNN was superior to the MLP. Neither the image preprocessing techniques, the augmentation, nor a combination of them improved model performance. An important finding was that making the convolutional and pooling filters of the CNN rectangular, and alternating horizontal and vertical filters on the input spectrogram, gave superior results. This seemed to capture more of the information in the spectrograms, since spectrograms mainly contain information in the horizontal or vertical direction. This model achieved 91.14% accuracy. The resulting model architecture further contributes to the environmental sound classification community. / Master's thesis approved 20 June 2022.
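The WAV-to-spectrogram conversion described above can be sketched as a short-time Fourier transform; the frame length and hop size here are illustrative assumptions, not the thesis settings:

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram of a 1-D audio signal.

    Slices the signal into overlapping Hann-windowed frames, takes the
    FFT of each, and returns an image of shape (frame_len // 2 + 1,
    n_frames) -- frequency on the vertical axis, time on the horizontal.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)).T  # (bins, frames)
    return np.log1p(spec)                          # compress dynamic range
```

The resulting 2-D array is what the MLP and CNN consume as an image.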
54

Effekten av textaugmenteringsstrategier på träffsäkerhet, F1-värde och viktat F1-värde / The effect of text data augmentation strategies on Accuracy, F1-score, and weighted F1-score

Svedberg, Jonatan, Shmas, George January 2021 (has links)
Developing a sophisticated chatbot solution requires large amounts of text data to adapt the solution to a specific domain. Manually creating a complete set of text data, specially adapted to the given domain and containing a large number of varied sentences that a human might conceivably express, is an exceptionally time-consuming task. To circumvent this, data augmentation is applied to generate more data from a smaller set of existing text data. Softronic AB wants to investigate alternative strategies for data augmentation with the aim of possibly replacing the current solution with a more scientifically substantiated one.
In this thesis, prototype models have been developed to compare and evaluate the effect of different text augmentation strategies. The results of experiments with the prototype models show that augmentation through synonym swaps with a domain-adapted thesaurus noticeably improved the ability of an NLU model to classify data correctly, compared to the other evaluated strategies. Furthermore, the results indicate a relationship between the structural degree of variation of the augmented data and the semantic similarity of the language pairs applied during back-translation.
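The winning strategy, synonym swaps against a domain-adapted thesaurus, can be sketched as follows. The thesaurus entries are invented for illustration; the actual domain wordlist is not public:

```python
import random

# Toy domain thesaurus -- illustrative entries only, not the thesis wordlist.
THESAURUS = {
    "invoice": ["bill", "receipt"],
    "cancel": ["terminate", "revoke"],
}

def synonym_augment(sentence, n_swaps=1, rng=None):
    """Return a variant sentence with up to n_swaps words replaced by a
    synonym from the thesaurus (a common text-augmentation baseline)."""
    rng = rng or random.Random(0)
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in THESAURUS]
    for i in rng.sample(candidates, min(n_swaps, len(candidates))):
        words[i] = rng.choice(THESAURUS[words[i].lower()])
    return " ".join(words)
```

Each training utterance yields several such variants, enlarging the NLU training set without manual authoring.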
55

[pt] AVALIAÇÃO DE AUMENTO DE DADOS VIA GERAÇÃO DE IMAGENS SINTÉTICAS PARA SEGMENTAÇÃO E DETECÇÃO DE PÓLIPOS EM IMAGENS DE COLONOSCOPIA UTILIZANDO APRENDIZADO DE MÁQUINA / [en] EVALUATION OF DATA AUGMENTATION THROUGH SYNTHETIC IMAGES GENERATION FOR SEGMENTATION AND DETECTION OF POLYPS IN COLONOSCOPY IMAGES USING MACHINE LEARNING

VICTOR DE ALMEIDA THOMAZ 17 August 2020 (has links)
[en] Nowadays, colorectal cancer is the second-leading cause of cancer death worldwide.
In recent years there has been increasing interest in research on automatic methods for polyp detection, and the most relevant results have been achieved through deep learning techniques. However, the performance of these approaches is strongly tied to the use of large and varied datasets. Colonoscopy image samples are publicly available, but their quantity and limited variation may be insufficient for successful training. Based on this observation, this thesis describes a new approach for increasing the quantity and variation of colonoscopy images, thereby improving polyp segmentation and detection results. Unlike other works in the literature that use traditional data augmentation approaches or combine images from other exam modalities, the proposed methodology emphasizes creating new samples by inserting polyps into publicly available colonoscopy images. The insertion strategy uses both synthetically generated and real polyps, and applies processing techniques that preserve the realistic appearance of the images while automatically creating more diverse samples with appropriate labels for training. Convolutional neural networks trained with these augmented datasets showed promising results for segmentation and detection. The improvements obtained indicate that new methods for automatically enriching medical image datasets have the potential to positively affect the training of convolutional networks.
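The core of the insertion strategy, compositing a polyp patch into a host frame while producing the matching training label, can be sketched with simple alpha blending. This is a simplified stand-in for the thesis pipeline, which also applies realism-preserving processing:

```python
import numpy as np

def insert_patch(image, patch, mask, top, left):
    """Blend a (real or synthetic) polyp patch into a colonoscopy frame.

    image: (H, W, 3) float array; patch: (h, w, 3); mask: (h, w) in [0, 1]
    (a soft alpha keeps the border realistic). Returns the composited
    image plus the binary label mask for training.
    """
    out = image.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]
    alpha = mask[..., None]
    out[top:top + h, left:left + w] = alpha * patch + (1 - alpha) * region
    label = np.zeros(image.shape[:2])
    label[top:top + h, left:left + w] = mask > 0.5
    return out, label
```

Varying the patch, position, and blending automatically yields many labelled samples from one host image.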
56

Impact of data augmentations when training the Inception model for image classification

Barai, Milad, Heikkinen, Anthony January 2017 (has links)
Image classification is the process of identifying which class a previously unobserved object belongs to. Classifying images is a common task in companies, and currently many of them perform this classification manually. Automated classification, however, has a lower expected accuracy. This thesis examines how automated classification could be improved by adding augmented data to the learning process of the classifier. We conduct a quantitative empirical study of the effects of two image augmentations: random horizontal/vertical flips and random rotations (&lt;180◦). The data comes from an auction house search engine under the commercial name Barnebys. The datasets contain 700 000, 50 000 and 28 000 images, each with 28 classes. In this bachelor's thesis, we re-trained a convolutional neural network model, Inception-v3, on the two larger datasets; the remaining set is used to obtain class-specific accuracies. To get a more accurate estimate of the effects we used tenfold cross-validation. The results of our quantitative study show that the Inception-v3 model reaches a baseline mean accuracy of 64.5% (700 000 dataset) and a mean accuracy of 51.1% (50 000 dataset). Overall accuracy decreased with augmentations on our datasets. However, our results show an accuracy increase for some classes. The largest observed increase is for the class "Wine &amp; Spirits" in the small dataset, which went from 42.3% to 72.7% correctly classified images.
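The two augmentations studied, random flips and random rotations below 180°, can be sketched in plain numpy; the nearest-neighbour rotation here is an illustrative stand-in for whatever image library the study used:

```python
import numpy as np

def random_flip_rotate(image, rng):
    """Random horizontal/vertical flip plus a random rotation (<180 deg).

    Rotation is done by nearest-neighbour inverse mapping around the
    image centre, so every output pixel is sampled from the input.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :]   # vertical flip
    angle = np.deg2rad(rng.uniform(-180.0, 180.0))
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    # inverse rotation: where does each output pixel come from?
    sy = cy + (ys - cy) * np.cos(angle) - (xs - cx) * np.sin(angle)
    sx = cx + (ys - cy) * np.sin(angle) + (xs - cx) * np.cos(angle)
    sy = np.clip(np.round(sy).astype(int), 0, h - 1)
    sx = np.clip(np.round(sx).astype(int), 0, w - 1)
    return image[sy, sx]
```

Applying this per epoch effectively multiplies the training set without storing extra images.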
57

Character Recognition in Natural Images Utilising TensorFlow / Teckenigenkänning i naturliga bilder med TensorFlow

Viklund, Alexander, Nimstad, Emma January 2017 (has links)
Convolutional Neural Networks (CNNs) are commonly used for character recognition and achieve the lowest error rates on popular datasets such as SVHN and MNIST. However, research on CNN-based character classification in natural images covering the whole English alphabet is lacking. This thesis conducts an experiment in which TensorFlow is used to construct a CNN that is trained and tested on the Chars74K dataset, with 15 images per class for training and 15 per class for testing. The aim is to achieve a higher accuracy than the non-CNN approach of de Campos et al. [1], which reached 55.26%. The thesis explores data augmentation techniques for expanding the small training set and evaluates the results of applying rotation, stretching, translation and noise-adding. All of these methods apart from adding noise had a positive effect on the accuracy of the network. Furthermore, the experiment shows that a three-layer convolutional neural network can produce a character classifier as good as that of de Campos et al. Even better results could likely be achieved with further experiments on the parameters of the network and the augmentation.
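Two of the augmentations evaluated above, translation (helpful) and noise-adding (harmful), are simple enough to sketch in numpy; the exact shift ranges and noise levels used in the study are not given, so the parameters here are placeholders:

```python
import numpy as np

def translate(image, dy, dx, fill=0):
    """Shift an image by (dy, dx) pixels, padding the exposed edge
    with `fill` -- one of the augmentations that improved accuracy."""
    out = np.full_like(image, fill)
    h, w = image.shape[:2]
    ys, yd = ((slice(dy, h), slice(0, h - dy)) if dy >= 0
              else (slice(0, h + dy), slice(-dy, h)))
    xs, xd = ((slice(dx, w), slice(0, w - dx)) if dx >= 0
              else (slice(0, w + dx), slice(-dx, w)))
    out[ys, xs] = image[yd, xd]
    return out

def add_noise(image, sigma, rng):
    """Additive Gaussian noise -- the one augmentation that hurt accuracy."""
    return image + rng.normal(0.0, sigma, image.shape)
```

With only 15 training images per class, such cheap transforms are the main way to enlarge the training distribution.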
58

Data Augmentations for Improving Vision-Based Damage Detection : in Land Transport Infrastructure / Dataökningar för att förbättra bildbaserade sprickdetektering : i landtransportinfrastruktur

Siripatthiti, Punnawat January 2023 (has links)
Cracking is a common form of distress or damage in road pavements and railway sleepers, and it poses significant challenges to their structural integrity, safety and longevity. Over the years, researchers have developed various data-driven technologies for image-based crack detection in road and sleeper applications, and image-based crack detection has become a promising field. Many researchers use ensemble learning, as in the winning entries of the Road Damage Detection Challenge. The challenge provides a street-view dataset from several countries, captured from different perspectives. The 2020 version of the dataset contains images from Japan, India and the Czech Republic, so the dataset inherits a domain-shift problem. Current solutions use ensemble learning to deal with this problem, but such solutions require considerable computational power and limit adaptability in real-time applications. To mitigate the problem, this thesis experiments with various data augmentation techniques that could improve the performance of a base model. The main focuses are erasing a crack from an image using generative AI (Erase), implementing road segmentation using Panoptic Segmentation (RS), and injecting a perspective-aware synthetic crack (InjectPa) into the segmented road surface. The results show that, compared to the base model, the Erase + RS techniques improve the model's F1 score when trained only on the Japan subset, but not when trained on all three countries simultaneously. The InjectPa technique does not improve the base model in either scenario. The experiments then moved to the SBB dataset, which contains close-up images of sleepers from cameras mounted at the front of a diagnostic vehicle. This part follows the same techniques but replaces the segmentation model with the Segment Anything Model (SAM), because the previous segmentation model was trained on a street-view dataset and is therefore unreliable on close-up images.
The Erase + SAM techniques show improvements in bbox/AP and validation loss, but do not significantly improve the F1 score compared to the base model. This thesis also applies the explainable-AI method D-RISE to determine which features most influence the model's decisions. D-RISE shows that the augmented model pays more attention to the damage type pothole for road pavements and the defect type spalling for sleepers than to other types. Finally, the thesis discusses the results and suggests a strategy for future study.
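The detection metrics quoted above (F1 and box AP) rest on matching predicted boxes to ground truth by intersection-over-union; a minimal sketch of those two building blocks, not tied to any particular detector:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    A prediction typically counts as a true positive when IoU >= 0.5."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def f1_score(tp, fp, fn):
    """F1 from true/false positives and false negatives -- the headline
    metric used when comparing the augmentation strategies above."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```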
59

Natural Language Processing for Improving Search Query Results : Applied on The Swedish Armed Force's Profession Guide

Harju Schnee, Andreas January 2023 (has links)
Text has historically been the way of preserving and acquiring knowledge, and text data is an ever-growing part of the digital footprint, together with the need to query this data for information. Seeking information is a constant, ongoing process and a crucial part of many systems around us, and the ability to perform fast and effective searches is a must when dealing with vast amounts of data. This thesis implements an information retrieval system based on the Swedish Armed Forces' profession guide, with the aim of retrieving relevant professions based on user-defined queries of varying size. A number of Natural Language Processing techniques are investigated and implemented. To transform the gathered profession descriptions, a document embedding model, doc2vec, was implemented, producing document vectors that are compared to find similarities between documents. The final system was evaluated by domain experts, represented by active military personnel who quantified the relevancy of the retrieved professions into a measurable performance. The system retrieved relevant information for 46.6% of the long and 56.6% of the short text inputs, resulting in a much more general and capable system than the search function available in the profession guide today.
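The retrieval step, ranking profession descriptions by vector similarity to a query, can be sketched as follows. The thesis used doc2vec embeddings; bag-of-words vectors stand in here so the sketch stays dependency-free, and the example descriptions are invented:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    num = sum(a[w] * b[w] for w in a if w in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def rank_professions(query, descriptions):
    """Rank profession descriptions by similarity to a free-text query,
    most similar first."""
    q = Counter(query.lower().split())
    scored = [(name, cosine(q, Counter(text.lower().split())))
              for name, text in descriptions.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Swapping the Counter vectors for learned doc2vec embeddings gives the document-vector comparison described above.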
60

MULTI-SPECTRAL FUSION FOR SEMANTIC SEGMENTATION NETWORKS

Justin Cody Edwards (14700769) 31 May 2023 (has links)
<p>Semantic segmentation is a machine learning task seeing increased utilization in multiple fields, from medical imagery to land demarcation and autonomous vehicles. Semantic segmentation performs pixel-wise classification of images, creating a new, segmented representation of the input that is useful for detecting various terrain and objects within an image. Recently, convolutional neural networks have been heavily utilized in networks tackling the semantic segmentation task, particularly in the field of autonomous driving systems.</p> <p>The requirements of automated driver assistance systems (ADAS) drive semantic segmentation models targeted for ADAS deployment to be lightweight while maintaining accuracy. A commonly used method to increase accuracy in the autonomous vehicle field is to fuse multiple sensory modalities. This research focuses on fusing long wave infrared (LWIR) imagery with visual spectrum imagery to fill the inherent performance gaps of visual imagery alone. This brings a host of benefits, such as increased performance in varied lighting and adverse environmental conditions. This fusion technique is an effective method of increasing the accuracy of a semantic segmentation model. A lightweight architecture is key for successful deployment on ADAS, as these systems often have resource constraints and need to operate in real time. Multi-Spectral Fusion Network (MFNet) [1] satisfies these requirements by leveraging a sensory fusion approach, and as such was selected as the baseline architecture for this research.</p> <p>Many improvements were made upon the baseline architecture by leveraging a variety of techniques.
Such improvements include the proposal of a novel loss function, categorical cross-entropy dice loss; the introduction of squeeze-and-excitation (SE) blocks; the addition of pyramid pooling; a new fusion technique; and drop-input data augmentation. These improvements culminated in the Fast Thermal Fusion Network (FTFNet). Further improvements were made by introducing depthwise separable convolutional layers, leading to lightweight FTFNet variants, FTFNet Lite 1 & 2.</p>
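A combined cross-entropy + dice loss of the kind named above can be sketched in numpy. The exact FTFNet formulation and weighting are not given in the abstract, so the `alpha` blend and shapes here are hypothetical:

```python
import numpy as np

def ce_dice_loss(pred, target, eps=1e-7, alpha=0.5):
    """Hypothetical combined cross-entropy + dice loss for segmentation.

    pred: (C, H, W) softmax probabilities; target: (C, H, W) one-hot
    labels. Cross-entropy drives per-pixel accuracy while the dice term
    rewards region overlap per class.
    """
    ce = -(target * np.log(pred + eps)).sum(axis=0).mean()
    inter = (pred * target).sum(axis=(1, 2))
    dice = ((2 * inter + eps)
            / (pred.sum(axis=(1, 2)) + target.sum(axis=(1, 2)) + eps))
    return alpha * ce + (1 - alpha) * (1 - dice.mean())
```

A perfect prediction drives both terms to zero; the dice term keeps small foreground classes from being ignored by the pixel-averaged cross-entropy.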
