Spelling suggestions: "subject:"mass rmcnn"" "subject:"mass recnn""
1 |
Cascade Mask R-CNN and Keypoint Detection used in Floorplan ParsingEklund, Anton January 2020 (has links)
Parsing floorplans have been a problem in automatic document analysis for long and have up until recent years been approached with algorithmic methods. With the rise of convolutional neural networks (CNN), this problem too has seen an upswing in performance. In this thesis the task is to recover, as accurately as possible, spatial and geometric information from floorplans. This project builds around instance segmentation models like Cascade Mask R-CNN to extract the bulk of information from a floorplan image. To complement the segmentation, a new style of using keypoint-CNN is presented to find precise locations of corners. These are then combined in a post-processing step to give the resulting segmentation. The resulting segmentation scores exceed the current baseline of the CubiCasa5k floorplan dataset with a mean IoU of 72.7% compared to 57.5%. Further, the mean IoU for individual classes is also improved for almost every class. It is also shown that Cascade Mask R-CNN is better suited than Mask R-CNN for this task.
|
2 |
News article segmentation using multimodal input : Using Mask R-CNN and sentence transformers / Artikelsegmentering med multimodala artificiella neuronnätverk : Med hjälp av Mask R-CNN och sentence transformersHenning, Gustav January 2022 (has links)
In this century and the last, serious efforts have been made to digitize the content housed by libraries across the world. In order to open up these volumes to content-based information retrieval, independent elements such as headlines, body text, bylines, images and captions ideally need to be connected semantically as article-level units. To query on facets such as author, section, content type or other metadata, further processing of these documents is required. Even though humans have shown exceptional ability to segment different types of elements into related components, even in languages foreign to them, this task has proven difficult for computers. The challenge of semantic segmentation in newspapers lies in the diversity of the medium: Newspapers have vastly different layouts, covering diverse content, from news articles to ads to weather reports. State-of-the-art object detection and segmentation models have been trained to detect and segment real-world objects. It is not clear whether these architectures can perform equally well when applied to scanned images of printed text. In the domain of newspapers, in addition to the images themselves, we have access to textual information through Optical Character Recognition. The recent progress made in the field of instance segmentation of real-world objects using deep learning techniques begs the question: Can the same methodology be applied in the domain of newspaper articles? In this thesis we investigate one possible approach to encode the textual signal into the image in an attempt to improve performance. Based on newspapers from the National Library of Sweden, we investigate the predictive power of visual and textual features and their capacity to generalize across different typographic designs. Results show impressive mean Average Precision scores (>0:9) for test sets sampled from the same newspaper designs as the training data when using only the image modality. / I detta och det förra århundradet har kraftiga åtaganden gjorts för att digitalisera traditionellt medieinnehåll som tidigare endast tryckts i pappersformat. För att kunna stödja sökningar och fasetter i detta innehåll krävs bearbetning påsemantisk nivå, det vill säga att innehållet styckas upp påartikelnivå, istället för per sida. Trots att människor har lätt att dela upp innehåll påsemantisk nivå, även påett främmande språk, fortsätter arbetet för automatisering av denna uppgift. Utmaningen i att segmentera nyhetsartiklar återfinns i mångfalden av utseende och format. Innehållet är även detta mångfaldigt, där man återfinner allt ifrån faktamässiga artiklar, till debatter, listor av fakta och upplysningar, reklam och väder bland annat. Stora framsteg har gjorts inom djupinlärning just för objektdetektering och semantisk segmentering bara de senaste årtiondet. Frågan vi ställer oss är: Kan samma metodik appliceras inom domänen nyhetsartiklar? Dessa modeller är skapta för att klassificera världsliga ting. I denna domän har vi tillgång till texten och dess koordinater via en potentiellt bristfällig optisk teckenigenkänning. Vi undersöker ett sätt att utnyttja denna textinformation i ett försök att förbättra resultatet i denna specifika domän. Baserat pådata från Kungliga Biblioteket undersöker vi hur väl denna metod lämpar sig för uppstyckandet av innehåll i tidningar längsmed tidsperioder där designen förändrar sig markant. Resultaten visar att Mask R-CNN lämpar sig väl för användning inom domänen nyhetsartikelsegmentering, även utan texten som input till modellen.
|
3 |
Using Mask R-CNN for Instance Segmentation of Eyeglass Lenses / Användning av Mask R-CNN för instanssegmentering av glasögonlinserNorrman, Marcus, Shihab, Saad January 2021 (has links)
This thesis investigates the performance of Mask R-CNN when utilizing transfer learning on a small dataset. The aim was to instance segment eyeglass lenses as accurately as possible from self-portrait images. Five different models were trained, where the key difference was the types of eyeglasses the models were trained on. The eyeglasses were grouped into three types, fully rimmed, semi-rimless, and rimless glasses. 1550 images were used for training, validation, and testing. The model's performances were evaluated using TensorBoard training data and mean Intersection over Union scores (mIoU). No major differences in performance were found in four of the models, which grouped all three types of glasses into one class. Their mIoU scores range from 0.913 to 0.94 whereas the model with one class for each group of glasses, performed worse, with a mIoU of 0.85. The thesis revealed that one can achieve great instance segmentation results using a limited dataset when taking advantage of transfer learning. / Denna uppsats undersöker prestandan för Mask R-CNN vid användning av överföringsinlärning på en liten datamängd. Syftet med arbetet var att segmentera glasögonlinser så exakt som möjligt från självporträttbilder. Fem olika modeller tränades, där den viktigaste skillnaden var de typer av glasögon som modellerna tränades på. Glasögonen delades in i 3 typer, helbåge, halvbåge och båglösa. Totalt samlades 1550 träningsbilder in, dessa annoterades och användes för att träna modellerna. Modellens prestanda utvärderades med TensorBoard träningsdata samt genomsnittlig Intersection over Union (IoU). Inga större skillnader i prestanda hittades mellan modellerna som endast tränades på en klass av glasögon. Deras genomsnittliga IoU varierar mellan 0,913 och 0,94. Modellen där varje glasögonkategori representerades som en unik klass, presterade sämre med en genomsnittlig IoU på 0,85. Resultatet av uppsatsen påvisar att goda instanssegmenteringsresultat går att uppnå med hjälp av en begränsad datamängd om överföringsinlärning används.
|
4 |
Scene Recognition for Safety Analysis in Collaborative RoboticsWang, Shaolei January 2018 (has links)
In modern industrial environments, human-robot collaboration is a trend in automation to improve performance and productivity. Instead of isolating robot from human to guarantee safety, collaborative robotics allows human and robot working in the same area at the same time. New hazards and risks, such as the collision between robot and human, arise in this situation. Safety analysis is necessary to protect both human and robot when using a collaborative robot.To perform safety analysis, robots need to perceive the surrounding environment in realtime. This surrounding environment is perceived and stored in the form of scene graph, which is a direct graph with semantic representation of the environment, the relationship between the detected objects and properties of these objects. In order to generate the scene graph, a simulated warehouse is used: robots and humans work in a common area for transferring products between shelves and conveyor belts. Each robot generates its own scene graph from the attached camera sensor. In the graph, each detected object is represented by a node and edges are used to denote the relationship among the identified objects. The graph node includes values like velocity, bounding box sizes, orientation, distance and directions between the object and the robot.We generate scene graph in a simulated warehouse scenario with the frequency of 7 Hz and present a study of Mask R-CNN based on the qualitative comparison. Mask R-CNN is a method for object instance segmentation to get the properties of the objects. It uses ResNetFPN for feature extraction and adds a branch to Faster R-CNN for predicting segmentation mask for each object. And its results outperform almost all existing, single-model entries on instance segmentation and bounding-box object detection. With the help of this method, the boundaries of the detected object are extracted from the camera images. We initialize Mask R-CNN model using three different types of weights: COCO pre-trained weight, ImageNet pre-trained weight and random weight, and the results of these three different weights are compared w.r.t. precision and recall.Results showed that Mask R-CNN is also suitable for simulated environments and can meet requirements in both detection precision and speed. Moreover, the model trained used the COCO pre-trained weight outperformed the model with ImageNet and randomly assigned initial weights. The calculated Mean Average Precision (mAP) value for validation dataset reaches 0.949 with COCO pre-trained weights and execution speed of 11.35 fps. / I modern industriella miljöer, för att förbättra prestanda och produktivitet i automatisering är human-robot samarbete en trend. Istället för att isolera roboten från människan för att garantera säkerheten, möjliggör samarbets robotar att man och robot arbetar i samma område samtidigt. Nya risker, såsom kollisionen mellan robot och människa, uppstår i denna situation. Säkerhetsanalys är nödvändig för att skydda både människa och robot när man använder en samarbets robot.För att utföra säkerhetsanalys måste robotar uppfatta omgivningen i realtid. Denna omgivande miljö uppfattas och lagras i form av scen graf, som är ett direkt diagram med semantisk representation av miljön, samt förhållandet mellan de detekterade objekten och egenskaperna hos dessa objekt. För att skapa scen grafen används ett simulerat lager: robotar och människor arbetar i ett gemensamt område för överföring av produkter mellan hyllor och transportband. Varje robot genererar sin egen scen grafik från den medföljande kamerasensorn. I diagrammet presenteras varje detekterat objekt av en nod och kanterna används för att beteckna förhållandet mellan de identifierade objekten. Diagram noden innehåller värden som hastighet, gränsvärde, orientering, avstånd och riktningar mellan objektet och roboten.Vi genererar scen graf i ett simulerat lager scenario med frekvensen 7 Hz och presenterar en studie av Mask R-CNN baserat på den kvalitativa jämförelsen. Mask R-CNN är ett sätt att segmentera objekt exempel för att få objektens egenskaper. Det använder ResNetFPN för funktion extraktion och lägger till en gren till Snabbare R-CNN för att förutsäga segmenterings mask för varje objekt. Och dess resultat överträffar nästan alla befintliga, enkel modell poster, till exempel segmentering och avgränsning av objektiv detektering. Med hjälp av denna metod extraheras kanterna för det detekterade objektet från kamerabilderna. Vi initierar Mask R-CNN-modellen med tre olika typer av vikter: COCO-utbildade vikter, ImageNet-tränade vikter och slumpmässiga vikter, och resultaten av dessa tre olika vikter jämförs med avseende på precision och återkallelse.Resultaten visade att Mask R-CNN också är lämplig för simulerade miljöer och kan uppfylla kraven i både detekterings precision och hastighet. Dessutom använde den utbildade modellen de COCO-tränade vikterna överträffat modellen med slumpmässigt tilldelade initial vikter. Det beräknade medelvärdet för precision (mAP) för validerings dataset når 0.949 med COCO-pre-utbildade vikter och körhastighet på 11.35 fps.
|
5 |
AI-based autonomous forest stand generationSaveh, Diana January 2021 (has links)
In recent years, the tech is moving towards a more automized and smarter software. To achieve smarter software the implementation of AI is a step towards that goal. The forest industry needs to become more automized and decrease the manual labor. Decreasing manual labor will both have a positive impact on both the cost and the environment. After doing a literature study the conclusion was to use Mask R-CNN to be able to make the AI learn about the pattern of the different stands. The different stands were extracted and masked for the Mask R-CNN. First there was a comparison between the usage of a computer versus Google Colab, and the results show that Google Colab did deliver the results a little faster than on the computer. Using a smaller area with fewer stands gave a better result and decreased the risk of the algorithm crashing. Using 42 areas with about 10 stands in each gave better results than using one big area with 3248 stands. Using 42 areas gave the result of an average IoU of 42%. Comparing this to 6 areas with about 10 stands each gave the result of 28% IoU. The result of increasing the data split to 70/30 did gave the best IoU with the value of 47%.
|
6 |
Automated Building Extraction from Aerial Imagery with Mask R-CNNZilong Yang (9750833) 14 December 2020 (has links)
<p>Buildings are one of the fundamental sources of geospatial information for urban planning, population estimation, and infrastructure management. Although building extraction research has gained considerable progress through neural network methods, the labeling of training data still requires manual operations which are time-consuming and labor-intensive. Aiming to improve this process, this thesis developed an automated building extraction method based on the boundary following technique and the Mask Regional Convolutional Neural Network (Mask R-CNN) model. First, assisted by known building footprints, a boundary following method was used to automatically best label the training image datasets. In the next step, the Mask R-CNN model was trained with the labeling results and then applied to building extraction. Experiments with datasets of urban areas of Bloomington and Indianapolis with 2016 high resolution aerial images verified the effectiveness of the proposed approach. With the help of existing building footprints, the automatic labeling process took only five seconds for a 500*500 pixel image without human interaction. A 0.951 intersection over union (IoU) between the labeled mask and the ground truth was achieved due to the high quality of the automatic labeling step. In the training process, the Resnet50 network and the feature pyramid network (FPN) were adopted for feature extraction. The region proposal network (RPN) then was trained end-to-end to create region proposals. The performance of the proposed approach was evaluated in terms of building detection and mask segmentation in the two datasets. The building detection results of 40 test tiles respectively in Bloomington and Indianapolis showed that the Mask R-CNN model achieved 0.951 and 0.968 F1-scores. In addition, 84.2% of the newly built buildings in the Indianapolis dataset were successfully detected. According to the segmentation results on these two datasets, the Mask R-CNN model achieved the mean pixel accuracy (MPA) of 92% and 88%, respectively for Bloomington and Indianapolis. It was found that the performance of the mask segmentation and contour extraction became less satisfactory as the building shapes and roofs became more complex. It is expected that the method developed in this thesis can be adapted for large-scale use under varying urban setups.</p>
|
7 |
AI-based Quality Inspection forShort-Series Production : Using synthetic dataset to perform instance segmentation forquality inspection / AI-baserad kvalitetsinspektion för kortserieproduktion : Användning av syntetiska dataset för att utföra instans segmentering förkvalitetsinspektionRussom, Simon Tsehaie January 2022 (has links)
Quality inspection is an essential part of almost any industrial production line. However, designing customized solutions for defect detection for every product can be costlyfor the production line. This is especially the case for short-series production, where theproduction time is limited. That is because collecting and manually annotating the training data takes time. Therefore, a possible method for defect detection using only synthetictraining data focused on geometrical defects is proposed in this thesis work. The methodis partially inspired by previous related work. The proposed method makes use of aninstance segmentation model and pose-estimator. However, this thesis work focuses onthe instance segmentation part while using a pre-trained pose-estimator for demonstrationpurposes. The synthetic data was automatically generated using different data augmentation techniques from a 3D model of a given object. Moreover, Mask R-CNN was primarilyused as the instance segmentation model and was compared with a rival model, HTC. Thetrials show promising results in developing a trainable general-purpose defect detectionpipeline using only synthetic data
|
8 |
Evaluate Machine Learning Model to Better Understand Cutting in WoodAnam, Md Tahseen January 2021 (has links)
Wood cutting properties for the chains of chainsaw is measured in the lab by analyzing the force, torque, consumed power and other aspects of the chain as it cuts through the wood log. One of the essential properties of the chains is the cutting efficiency which is the measured cutting surface per the power used for cutting per the time unit. These data are not available beforehand and therefore, cutting efficiency cannot be measured before performing the cut. Cutting efficiency is related to the relativehardness of the wood which means that it is affected by the existence of knots (hardstructure areas) and cracks (no material areas). The actual situation is that all the cuts with knots and cracks are eliminated and just the clean cuts are used, therefore estimating the relative wood hardness by identifying the knots and cracks beforehand can significantly help to automate the process of testing the chain properties, saving time and material and give a better understanding of cutting wood logs to improve chains quality.Many studies have been done to develop methods to analyze and measure different features of an end face. This thesis work is carried out to evaluate a machinelearning model to detect knots and cracks on end faces and to understand their impact on the average cutting efficiency. Mask R-CNN is widely used for instance segmentation and in this thesis work, Mask R-CNN is evaluated to detect and segment knots and cracks on an end face. Methods are also developed to estimatepith’s vertical position from the wood image and generate average cutting efficiency graph based on knot’s and crack’s percentage at each vertical position of wood image.
|
9 |
Segmentace obrazových dat využitím hlubokých neuronových sítí / Image data segmentation using deep neural networksHrdý, Martin January 2021 (has links)
The main aim of this master’s thesis is to get acquainted with the theory of the current segmentation methods, that use deep learning. Segmentation neural network that will be capable of segmenting individual instances of the objects will be proposed and created based on theoretical knowledge. The main focus of the segmentation neural network will be segmentation of electronic components from printed circuit boards.
|
10 |
Application of Deep-learning Method to Surface Anomaly Detection / Tillämpning av djupinlärningsmetoder för detektering av ytanomalierLe, Jiahui January 2021 (has links)
In traditional industrial manufacturing, due to the limitations of science and technology, manual inspection methods are still used to detect product surface defects. This method is slow and inefficient due to manual limitations and backward technology. The aim of this thesis is to research whether it is possible to automate this using modern computer hardware and image classification of defects using different deep learning methods. The report concludes, based on results from controlled experiments, that it is possible to achieve a dice coefficient of more than 81%.
|
Page generated in 0.0496 seconds