241

Instance Segmentation on depth images using Swin Transformer for improved accuracy on indoor images / Instans-segmentering på bilder med djupinformation för förbättrad prestanda på inomhusbilder

Hagberg, Alfred, Musse, Mustaf Abdullahi January 2022 (has links)
The Simultaneous Localisation And Mapping (SLAM) problem is an open, fundamental problem in autonomous mobile robotics. One of the most recently researched techniques used to enhance SLAM methods is instance segmentation. In this thesis, we implement an instance segmentation system using a Swin Transformer combined with two state-of-the-art instance segmentation methods, Cascade Mask R-CNN and Mask R-CNN. Instance segmentation is a technique that simultaneously solves the problems of object detection and semantic segmentation. We show that depth information enhances the average precision (AP) by approximately 7%. We also show that the Swin Transformer backbone works well with depth images. Our results further show that Cascade Mask R-CNN outperforms Mask R-CNN. However, the results should be interpreted with caution due to the small size of the NYU-Depth V2 dataset. Most instance segmentation research uses the COCO dataset, which has roughly a hundred times more images than NYU-Depth V2 but lacks depth information.
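A minimal sketch of the depth-as-extra-channel idea described in this abstract, assuming the timm and torch packages are available; the model name, channel handling, and absence of a detection head are illustrative and do not reproduce the authors' actual Cascade Mask R-CNN setup:

```python
import torch
import timm

# Swin backbone with a 4-channel (RGB-D) input; timm rebuilds the patch
# embedding for the requested number of channels.
backbone = timm.create_model(
    "swin_tiny_patch4_window7_224",
    pretrained=False,
    in_chans=4,
    num_classes=0,
)

rgb = torch.rand(1, 3, 224, 224)    # normalised colour image
depth = torch.rand(1, 1, 224, 224)  # normalised depth map
rgbd = torch.cat([rgb, depth], dim=1)

# Feature map that a Mask R-CNN / Cascade Mask R-CNN head could consume.
features = backbone.forward_features(rgbd)
print(features.shape)
```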
242

[en] ENABLING AUTONOMOUS DATA ANNOTATION: A HUMAN-IN-THE-LOOP REINFORCEMENT LEARNING APPROACH / [pt] HABILITANDO ANOTAÇÕES DE DADOS AUTÔNOMOS: UMA ABORDAGEM DE APRENDIZADO POR REFORÇO COM HUMANO NO LOOP

LEONARDO CARDIA DA CRUZ 10 November 2022 (has links)
[en] Deep learning techniques have shown significant contributions in various fields, including image analysis. The vast majority of work in computer vision focuses on proposing and applying new machine learning models and algorithms. For supervised learning tasks, the performance of these techniques depends on a large amount of training data and labeled data. However, labeling is an expensive and time-consuming process. A recent area of exploration is the reduction of effort in data preparation, leaving the data free of inconsistencies and noise so that current models can achieve greater performance. This new field of study is called Data-Centric AI. We present a new approach based on Deep Reinforcement Learning (DRL) focused on preparing a dataset for object detection problems, where bounding-box annotations are produced autonomously and economically. Our approach consists of a methodology for training a virtual agent to label the data automatically, with a human acting as the agent's teacher. We implemented the Deep Q-Network algorithm to create the virtual agent and developed an advising approach to facilitate communication between the human teacher and the virtual agent student. To complete our implementation, we used active learning to select cases where the agent has greater uncertainty, requiring human intervention in the annotation process during training. Our approach was evaluated and compared with other reinforcement learning and human-computer interaction methods on several datasets in which the virtual agent had to create new annotations in the form of bounding boxes. The results show that our methodology has a positive impact on obtaining new annotations from a dataset with scarce labels, surpassing existing methods. In this way, we contribute to the field of Data-Centric AI by developing a teaching methodology that creates an autonomous, human-advised approach for producing inexpensive annotations from scarce ones.
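A minimal sketch of the active-learning step mentioned above: routing images to the human teacher when the annotating agent is least certain. The uncertainty measure (margin between the two largest Q-values) and the Q-value shapes are assumptions; the thesis's exact criterion and DQN architecture are not reproduced here.

```python
import numpy as np

def select_for_human(q_values: np.ndarray, budget: int) -> np.ndarray:
    """q_values: (num_samples, num_actions) Q-value estimates from the agent.
    Returns indices of the `budget` samples with the smallest top-1/top-2 margin."""
    sorted_q = np.sort(q_values, axis=1)
    margin = sorted_q[:, -1] - sorted_q[:, -2]   # small margin -> uncertain agent
    return np.argsort(margin)[:budget]

rng = np.random.default_rng(0)
q = rng.normal(size=(100, 5))          # hypothetical Q-values for 100 images, 5 actions
ask_human = select_for_human(q, budget=10)
print(ask_human)                       # these images get human bounding-box annotations this round
```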
243

AI-based Quality Inspection for Short-Series Production : Using synthetic dataset to perform instance segmentation for quality inspection / AI-baserad kvalitetsinspektion för kortserieproduktion : Användning av syntetiska dataset för att utföra instanssegmentering för kvalitetsinspektion

Russom, Simon Tsehaie January 2022 (has links)
Quality inspection is an essential part of almost any industrial production line. However, designing customized solutions for defect detection for every product can be costly for the production line. This is especially the case for short-series production, where the production time is limited. That is because collecting and manually annotating the training data takes time. Therefore, a possible method for defect detection using only synthetic training data, focused on geometrical defects, is proposed in this thesis work. The method is partially inspired by previous related work. The proposed method makes use of an instance segmentation model and a pose estimator. However, this thesis work focuses on the instance segmentation part while using a pre-trained pose estimator for demonstration purposes. The synthetic data was automatically generated using different data augmentation techniques from a 3D model of a given object. Moreover, Mask R-CNN was primarily used as the instance segmentation model and was compared with a rival model, HTC. The trials show promising results in developing a trainable general-purpose defect detection pipeline using only synthetic data.
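A minimal sketch of annotation-free synthetic data generation in the spirit of this abstract: a rendered object silhouette (stubbed here as a binary mask) is randomly scaled and placed, and the instance mask and bounding box are derived automatically. The actual pipeline renders from a 3D model and composites onto real backgrounds; everything below is a simplified stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def synth_sample(obj_mask, canvas=256):
    """Paste a randomly scaled and placed object silhouette on an empty canvas,
    returning the instance mask and its bounding box (no manual labelling)."""
    h, w = obj_mask.shape
    scale = rng.uniform(0.5, 1.5)
    nh, nw = int(h * scale), int(w * scale)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)   # nearest-neighbour resize
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    scaled = obj_mask[ys][:, xs]
    y0 = rng.integers(0, canvas - nh)
    x0 = rng.integers(0, canvas - nw)
    full = np.zeros((canvas, canvas), dtype=np.uint8)
    full[y0:y0 + nh, x0:x0 + nw] = scaled
    on_y, on_x = np.nonzero(full)
    bbox = [int(on_x.min()), int(on_y.min()), int(on_x.max()), int(on_y.max())]
    return full, bbox

obj = np.zeros((64, 64), dtype=np.uint8)
obj[16:48, 16:48] = 1                      # stand-in for a rendered object silhouette
mask, bbox = synth_sample(obj)
print(mask.sum(), bbox)
```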
244

Real-time hand segmentation using deep learning / Hand-segmentering i realtid som använder djupinlärning

Favia, Federico January 2021 (has links)
Hand segmentation is a fundamental part of many computer vision systems aimed at gesture recognition or hand tracking. In particular, augmented reality solutions need a very accurate gesture analysis system in order to satisfy end consumers in an appropriate manner; the hand segmentation step is therefore critical. Segmentation is a well-known problem in image processing: the process of dividing a digital image into multiple regions of pixels with similar qualities. Classifying which pixels belong to the hand and which belong to the background needs to be performed in real time and with reasonable computational complexity. While in the past mainly lightweight probabilistic and machine learning approaches were used, this work investigates the challenges of real-time hand segmentation achieved through several deep learning techniques. Is it possible to improve current state-of-the-art segmentation systems for smartphone applications? Several models are tested and compared based on accuracy and processing speed. A transfer-learning-like approach guides this work, since many architectures were built only for generic semantic segmentation or for particular applications such as autonomous driving. Great effort is spent on organizing a solid and generalized dataset of hands, exploiting existing datasets and data collected by ManoMotion AB. Since the primary aim was a highly accurate hand segmentation, the RefineNet architecture is ultimately selected, and both quantitative and qualitative evaluations are performed, considering its advantages and analysing the problems related to computational time, which could be improved in the future.
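A minimal sketch of the kind of accuracy-versus-speed evaluation discussed above: intersection-over-union for a binary hand mask and a rough frames-per-second measurement. The segmenter is a placeholder; RefineNet itself is not included.

```python
import time
import numpy as np

def binary_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def fps(segment_fn, frames, warmup: int = 3) -> float:
    """Rough throughput measurement over a list of frames."""
    for f in frames[:warmup]:
        segment_fn(f)
    start = time.perf_counter()
    for f in frames:
        segment_fn(f)
    return len(frames) / (time.perf_counter() - start)

def dummy_segmenter(img):
    """Stand-in for a trained network: threshold mean intensity per pixel."""
    return img.mean(axis=-1) > 0.5

frames = [np.random.rand(480, 640, 3) for _ in range(20)]
print("IoU:", binary_iou(dummy_segmenter(frames[0]), np.ones((480, 640), bool)))
print("FPS:", round(fps(dummy_segmenter, frames), 1))
```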
245

[en] GENERALIZATION OF THE DEEP LEARNING MODEL FOR NATURAL GAS INDICATION IN 2D SEISMIC IMAGE BASED ON THE TRAINING DATASET AND THE OPERATIONAL HYPER PARAMETERS RECOMMENDATION / [pt] GENERALIZAÇÃO DO MODELO DE APRENDIZADO PROFUNDO PARA INDICAÇÃO DE GÁS NATURAL EM DADOS SÍSMICOS 2D COM BASE NO CONJUNTO DE DADOS DE TREINAMENTO E RECOMENDAÇÃO DE HIPERPARÂMETROS OPERACIONAIS

LUIS FERNANDO MARIN SEPULVEDA 21 March 2024 (has links)
[en] Interpreting seismic images is an essential task in diverse fields of geosciences, and it is a widely used method in hydrocarbon exploration. However, its interpretation requires a significant investment of resources, and obtaining a satisfactory result is not always possible. The literature shows an increasing number of Deep Learning (DL) methods to detect horizons, faults, and potential hydrocarbon reservoirs; nevertheless, models for detecting gas reservoirs present generalization difficulties, i.e., performance is compromised when they are used on seismic images from new exploration campaigns. This problem is especially true for 2D land surveys, where the acquisition process varies and the images are very noisy. This work presents three methods to improve the generalization performance of DL models for natural gas indication in 2D seismic images, drawing on approaches from Machine Learning (ML) and DL. The research focuses on data analysis to recognize patterns within the seismic images, enabling the selection of training sets for the gas inference model based on patterns in the target images. This approach allows better generalization performance without altering the architecture of the gas inference DL model or transforming the original seismic traces. The experiments were carried out using the database of different exploitation fields located in the Parnaíba basin, in northeastern Brazil. The results show an increase of up to 39 percent in the correct indication of natural gas according to the recall metric. This improvement varies in each field and depends on the proposed method used and on the existence of representative patterns within the training set of seismic images. These results translate into an improvement in the generalization performance of the DL gas inference model of up to 21 percent according to the F1 score and up to 15 percent according to the IoU metric. They demonstrate that it is possible to find patterns within the seismic images using an unsupervised approach and that these patterns can be used to recommend the DL training set according to the pattern in the target seismic image; furthermore, they demonstrate that the training set directly affects the generalization performance of the DL model for seismic images.
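A minimal sketch of the unsupervised selection idea described above: cluster simple descriptors of seismic images and recommend, for a target image, training images from the same cluster. The descriptor and clustering settings are assumptions, not the thesis's actual features.

```python
import numpy as np
from sklearn.cluster import KMeans

def describe(img: np.ndarray) -> np.ndarray:
    # crude amplitude/texture descriptor: mean, std and amplitude histogram
    hist, _ = np.histogram(img, bins=16, range=(-1, 1), density=True)
    return np.concatenate([[img.mean(), img.std()], hist])

rng = np.random.default_rng(0)
train_imgs = [rng.normal(scale=s, size=(128, 128)).clip(-1, 1) for s in (0.1, 0.1, 0.4, 0.4)]
target_img = rng.normal(scale=0.4, size=(128, 128)).clip(-1, 1)

X = np.stack([describe(im) for im in train_imgs])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

target_cluster = km.predict(describe(target_img)[None])[0]
selected = [i for i, c in enumerate(km.labels_) if c == target_cluster]
print("recommended training images:", selected)
```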
246

Fighting Unstructured Data with Formatting Methods : Navigating Crisis Communication: The Role of CAP in Effective Information Dissemination / Bekämpar ostrukturerad data med formateringsmetoder : Att navigera i kriskommunikation: CAP:s roll i effektiv informationsspridning

Spridzans, Alfreds January 2024 (has links)
This study investigates the format of crisis communication by analysing a news archive dataset from Krisinformation.se, a Swedish website dedicated to sharing information about crises. The primary goal is to assess the dataset's structure and efficacy in meeting the Common Alerting Protocol (CAP) criteria, an internationally recognised format for emergency alerts. The study uses quantitative text analysis and data preprocessing tools such as Python and Power Query to identify inconsistencies in the present dataset format. These anomalies limit the dataset's usefulness for extensive research and effective crisis communication. To address these issues, the study constructs two new datasets with enhanced column structures that rectify the identified problems. These refined datasets aim to improve the clarity and accessibility of information regarding crisis events, providing valuable insights into the nature and frequency of these incidents. Additionally, the research offers practical recommendations for optimising the dataset format to better align with CAP standards, enhancing the overall effectiveness of crisis communication on the platform. The findings highlight the critical role of structured and standardised data formats in crisis communication, particularly in the context of increasing climate-related hazards and other emergencies. By improving the dataset format, the study contributes to more efficient data analysis and better preparedness for future crises. The insights gained from this research are intended to assist other analysts and researchers in conducting more robust studies, ultimately aiding in the development of more resilient and responsive crisis communication strategies.
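A minimal sketch of the kind of restructuring described above: splitting semi-structured news records into separate, CAP-aligned columns with pandas. The column names and input layout are hypothetical and do not reproduce the real Krisinformation.se schema.

```python
import pandas as pd

# Hypothetical raw archive rows: a timestamp plus a free-text field that
# bundles several CAP-relevant attributes together.
raw = pd.DataFrame({
    "Published": ["2023-07-01 10:32", "2023-07-02 15:05"],
    "Text": ["Wildfire warning; Severity: Severe; Area: Dalarna",
             "Flood alert; Severity: Moderate; Area: Gävleborg"],
})

# Pull the bundled attributes into their own CAP-like columns.
parts = raw["Text"].str.extract(
    r"^(?P<event>[^;]+); Severity: (?P<severity>[^;]+); Area: (?P<areaDesc>.+)$"
)
cap_like = pd.concat([raw[["Published"]].rename(columns={"Published": "sent"}), parts], axis=1)
cap_like["sent"] = pd.to_datetime(cap_like["sent"])
print(cap_like)
```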
247

[en] PROPOSALS FOR THE USE OF REANALYSIS BASES FOR WIND ENERGY MODELING IN BRAZIL / [pt] PROPOSTAS DO USO DE BASES DE REANÁLISE PARA MODELAGEM DE ENERGIA EÓLICA NO BRASIL

SAULO CUSTODIO DE AQUINO FERREIRA 13 August 2024 (has links)
[en] Brazil's energy landscape has historically relied heavily on renewable sources, notably hydropower, with wind energy emerging as a significant contributor in recent years. Understanding and harnessing the potential of wind energy necessitates robust modeling of its behavior. However, obtaining comprehensive wind speed and generation data, particularly in specific locations of interest, remains a challenge. In the absence of wind speed data, an alternative is to use data from reanalysis databases, which provide long histories of climatic and atmospheric variables for different parts of the world, free of charge. Therefore, the first contribution of this work focused on verifying the representativeness of the wind speed data made available by MERRA-2 in Brazilian territory. Following literature recommendations, interpolation, extrapolation, and bias correction techniques were used to better match the speeds provided by the reanalysis to those that occur at the height of wind farm turbine rotors. In a second contribution, MERRA-2 data was combined with power measured at Brazilian wind farms to model, in a stochastic and non-parametric way, the relationship between speed and power in wind turbines; for this purpose, clustering, density curve estimation, and simulation techniques were used. Finally, the research culminates in the development of an application within the Shiny environment, offering a user-friendly platform to access and apply the methodologies devised in the preceding analyses. By making these methodologies readily accessible, the application facilitates broader engagement and utilization within the research community and among industry practitioners alike.
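A minimal sketch of two steps mentioned above: power-law extrapolation of reanalysis wind speed to hub height and a simple mean/variance bias correction against observations. The exponent, heights, and data are illustrative; MERRA-2 grid handling and interpolation are not covered.

```python
import numpy as np

def to_hub_height(v_ref, z_ref=50.0, z_hub=100.0, alpha=1 / 7):
    """Power-law vertical extrapolation of wind speed from z_ref to z_hub."""
    return v_ref * (z_hub / z_ref) ** alpha

def bias_correct(v_model, v_obs_mean, v_obs_std):
    """Rescale modelled speeds to match the observed mean and spread (variance scaling)."""
    return (v_model - v_model.mean()) / v_model.std() * v_obs_std + v_obs_mean

rng = np.random.default_rng(1)
v50 = rng.weibull(2.0, size=1000) * 6.0        # synthetic reanalysis speeds at 50 m
v_hub = to_hub_height(v50)
v_corr = bias_correct(v_hub, v_obs_mean=7.5, v_obs_std=3.0)
print(f"hub-height mean: {v_hub.mean():.2f} m/s, bias-corrected mean: {v_corr.mean():.2f} m/s")
```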
248

Unraveling Complexity: Panoptic Segmentation in Cellular and Space Imagery

Emanuele Plebani (18403245) 03 June 2024 (has links)
Advancements in machine learning, especially deep learning, have facilitated the creation of models capable of performing tasks previously thought impossible. This progress has opened new possibilities across diverse fields such as medical imaging and remote sensing. However, the performance of these models relies heavily on the availability of extensive labeled datasets. Collecting large amounts of labeled data poses a significant financial burden, particularly in specialized fields like medical imaging and remote sensing, where annotation requires expert knowledge. To address this challenge, various methods have been developed to mitigate the necessity for labeled data or to leverage information contained in unlabeled data. These include self-supervised learning, few-shot learning, and semi-supervised learning. This dissertation centers on the application of semi-supervised learning to segmentation tasks.

We focus on panoptic segmentation, a task that combines semantic segmentation (assigning a class to each pixel) and instance segmentation (grouping pixels into different object instances). We choose two segmentation tasks in different domains: nerve segmentation in microscopic imaging and hyperspectral segmentation in satellite images from Mars. Our study reveals that, while direct application of methods developed for natural images may yield low performance, targeted modifications or the development of robust models can provide satisfactory results, thereby unlocking new applications like machine-assisted annotation of new data.

This dissertation begins with a challenging panoptic segmentation problem in microscopic imaging, systematically exploring model architectures to improve generalization. Subsequently, it investigates how semi-supervised learning may mitigate the need for annotated data. It then moves to hyperspectral imaging, introducing a Hierarchical Bayesian model (HBM) to robustly classify single pixels. Key contributions include developing a state-of-the-art U-Net model for nerve segmentation, improving the model's ability to segment different cellular structures, evaluating semi-supervised learning methods in the same setting, and proposing the HBM for hyperspectral segmentation. The dissertation also provides a dataset of labeled CRISM pixels and mineral detections, as well as a software toolbox implementing the full HBM pipeline, to facilitate the development of new models.
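A minimal sketch of the panoptic quality (PQ) metric that underlies evaluating panoptic segmentation as combined semantic and instance segmentation; segment matching is stubbed with precomputed IoUs, and this is not the dissertation's own evaluation code.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = SQ * RQ: matched prediction/ground-truth segments (IoU > 0.5)
    contribute their IoU; unmatched segments count as false positives/negatives."""
    tp = len(matched_ious)
    if tp + num_fp + num_fn == 0:
        return 1.0
    sq = sum(matched_ious) / tp if tp else 0.0            # segmentation quality
    rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn)          # recognition quality
    return sq * rq

print(panoptic_quality([0.9, 0.8, 0.75], num_fp=1, num_fn=2))
```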
249

Enhancing Fairness in Facial Recognition: Balancing Datasets and Leveraging AI-Generated Imagery for Bias Mitigation : A Study on Mitigating Ethnic and Gender Bias in Public Surveillance Systems

Abbas, Rashad, Tesfagiorgish, William Issac January 2024 (has links)
Facial recognition technology has become a ubiquitous tool in security and personal identification. However, the rise of this technology has been accompanied by concerns over inherent biases, particularly regarding ethnicity and gender. This thesis examines the extent of these biases by focusing on the influence of dataset imbalances in facial recognition algorithms. We employ a structured methodological approach that integrates AI-generated images to enhance dataset diversity, with the intent of balancing representation across ethnicities and genders. Using ResNet and VGG models, we conducted a series of controlled experiments comparing the performance impact of balanced versus imbalanced datasets. Our analysis includes the use of confusion matrices and accuracy, precision, recall and F1-score metrics to critically assess the models' performance. The results demonstrate how tailored augmentation of training datasets can mitigate bias, leading to more equitable outcomes in facial recognition technology. We present our findings with the aim of contributing to the ongoing dialogue regarding AI fairness and propose a framework for future research in the field.
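A minimal sketch of the per-group evaluation described above: computing accuracy, precision, recall, and F1 separately for each demographic group so that gaps between groups become visible. Labels and groups are synthetic placeholders, not the thesis's data.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

rng = np.random.default_rng(0)
groups = rng.choice(["group_a", "group_b"], size=200)
y_true = rng.integers(0, 2, size=200)
y_pred = np.where(rng.random(200) < 0.85, y_true, 1 - y_true)   # imperfect classifier

for g in np.unique(groups):
    m = groups == g
    p, r, f1, _ = precision_recall_fscore_support(
        y_true[m], y_pred[m], average="binary", zero_division=0
    )
    print(f"{g}: acc={accuracy_score(y_true[m], y_pred[m]):.2f} "
          f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```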
250

Instance Segmentation of Multiclass Litter and Imbalanced Dataset Handling : A Deep Learning Model Comparison / Instanssegmentering av kategoriserat skräp samt hantering av obalanserat dataset

Sievert, Rolf January 2021 (has links)
Instance segmentation has great potential for improving the current state of littering by autonomously detecting and segmenting different categories of litter. With this information, litter could, for example, be geotagged to aid litter pickers or to give precise locational information to unmanned vehicles for autonomous litter collection. Land-based litter instance segmentation is a relatively unexplored field, and this study aims to compare the instance segmentation models Mask R-CNN and DetectoRS on the multiclass litter dataset Trash Annotations in Context (TACO), evaluated with Common Objects in Context (COCO) precision and recall scores. TACO is an imbalanced dataset, and therefore imbalanced data handling is addressed, exercising a second-order-relation iterative stratified split and, additionally, oversampling when training Mask R-CNN. Mask R-CNN without oversampling resulted in a segmentation score of 0.127 mAP, and with oversampling 0.163 mAP. DetectoRS achieved 0.167 segmentation mAP and improves the segmentation mAP of small objects most noticeably, by a factor of at least 2, which is important within the litter domain since small objects such as cigarettes are overrepresented. In contrast, oversampling with Mask R-CNN does not seem to improve the general precision of small and medium objects, but only improves the detection of large objects. It is concluded that DetectoRS improves results compared to Mask R-CNN, as does oversampling. However, using a dataset that cannot have all-class representation in the train, validation, and test splits, together with an iterative stratification that does not guarantee all-class representation, makes it hard for future work to make exact comparisons to this study. Results are therefore approximate when considering all categories, since 12 categories are missing from the test set, 4 of which were impossible to split into train, validation, and test sets. Further image collection and annotation to mitigate the imbalance would most noticeably improve results, since results depend on class-averaged values. Oversampling with DetectoRS would also help improve results. There is also the option of combining the two datasets TACO and MJU-Waste to enable training of more categories.
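A minimal sketch of class-frequency-based oversampling in the spirit discussed above, using repeat factors so that images containing rare litter categories are sampled more often. The formula follows the common repeat-factor scheme and is not necessarily the thesis's exact method; the categories and threshold are illustrative.

```python
import math
from collections import Counter

def repeat_factors(image_categories, threshold=0.3):
    """image_categories: list of sets of category names present in each image.
    Returns one repeat factor per image; rare categories push the factor above 1."""
    n_images = len(image_categories)
    freq = Counter(c for cats in image_categories for c in cats)
    cat_rep = {c: max(1.0, math.sqrt(threshold / (n / n_images))) for c, n in freq.items()}
    # an image is repeated according to its rarest category
    return [max(cat_rep[c] for c in cats) if cats else 1.0 for cats in image_categories]

imgs = [{"cigarette"}, {"cigarette", "bottle"}, {"can"}, {"cigarette"}, {"bottle"}]
print(repeat_factors(imgs))   # the image with the rare "can" category gets a factor > 1
```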
