41 |
Performance enhancement of wide-range perception issues for autonomous vehicles. Sharma, Suvash. 13 May 2022.
Due to the mission-critical nature of autonomous driving, the underlying scene-understanding algorithms must be developed with special care, above all with respect to accuracy and run-time. Accuracy is critical: if it is compromised, the environment is interpreted incorrectly, which may ultimately result in accidents. Run-time is equally important, since delayed understanding of the scene hampers the real-time response of the vehicle and can likewise lead to accidents. Both depend on several factors, such as the design and complexity of the algorithms, the nature of the objects or events encountered in the environment, and weather-induced effects.
In this work, several novel scene-understanding algorithms based on semantic segmentation are devised. First, a transfer-learning technique is proposed to transfer knowledge from a data-rich domain to a data-scarce off-road driving domain for semantic segmentation, so that the learned information is efficiently transferred from one domain to another while reducing run-time and increasing accuracy. Second, the performance of several segmentation algorithms is assessed under light-to-severe rain, and two methods for achieving robustness are proposed. Third, a new method for removing rain from the input images is proposed. Since autonomous vehicles are rich in sensors, each capable of representing different types of information, it is worthwhile to fuse the information from all available sensors. Fourth, a fusion mechanism with a novel algorithm that applies local and non-local attention in a cross-modal setting, using RGB camera images and lidar-based images for road detection via semantic segmentation, is implemented and validated for different driving scenarios. Fifth, a conceptually new representation of off-road driving trails, called Traversability, is introduced. To establish the correlation between a vehicle's capability and the difficulty of a driving trail, a new dataset called CaT (CAVS Traversability) is introduced. This dataset will be helpful for future research in several off-road driving applications, including military use and robotic navigation.
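The fourth contribution above describes cross-modal fusion of RGB and lidar features with local and non-local attention. As a rough illustration only, the sketch below shows one way such a fusion block could be wired in PyTorch; the layer sizes, gating scheme, and dot-product attention are assumptions for demonstration, not the architecture actually proposed in the thesis.

    # Illustrative sketch only: gate lidar features with local (convolutional)
    # attention, then mix them with RGB features through a simple non-local
    # (dot-product) attention step. Shapes and layer sizes are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossModalFusion(nn.Module):
        def __init__(self, channels=64):
            super().__init__()
            # Local attention: per-pixel gate computed from the concatenated modalities.
            self.local_gate = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
                nn.Sigmoid(),
            )
            # Projections for a lightweight non-local (dot-product) attention.
            self.query = nn.Conv2d(channels, channels // 2, kernel_size=1)
            self.key = nn.Conv2d(channels, channels // 2, kernel_size=1)
            self.value = nn.Conv2d(channels, channels, kernel_size=1)

        def forward(self, rgb_feat, lidar_feat):
            # Local attention: weight lidar features by agreement with RGB context.
            gate = self.local_gate(torch.cat([rgb_feat, lidar_feat], dim=1))
            fused = rgb_feat + gate * lidar_feat

            # Non-local attention: every RGB position attends to all lidar positions.
            b, c, h, w = fused.shape
            q = self.query(rgb_feat).flatten(2).transpose(1, 2)    # (B, HW, C/2)
            k = self.key(lidar_feat).flatten(2)                    # (B, C/2, HW)
            v = self.value(lidar_feat).flatten(2).transpose(1, 2)  # (B, HW, C)
            attn = F.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (B, HW, HW)
            context = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
            return fused + context

    # Usage with dummy feature maps from an RGB branch and a lidar-image branch.
    rgb = torch.randn(1, 64, 32, 64)
    lidar = torch.randn(1, 64, 32, 64)
    print(CrossModalFusion(64)(rgb, lidar).shape)  # torch.Size([1, 64, 32, 64])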
|
42 |
TwinLossGAN: Domain Adaptation Learning for Semantic Segmentation. Song, Yuehua. 19 August 2022.
Most semantic segmentation methods based on Convolutional Neural Networks (CNNs) rely on supervised pixel-level labelling. Because pixel-level labelling is time-consuming and laborious, synthetic images are generated by software with their label information already embedded in the data, so labelling can be done automatically. This advantage makes synthetic datasets widely used for training deep learning models intended for real-world cases. Still, compared to supervised learning on real-world labelled images, models trained on synthetic datasets achieve lower accuracy when applied to real-world data.
Researchers have therefore turned to Unsupervised Domain Adaptation (UDA), which transfers knowledge learned in one domain to another: a model can be trained on synthetic data and then apply what it has learned to real-world problems. UDA is an essential part of transfer learning. It aims to make the feature distributions of the two domains as close as possible, so that the knowledge and distribution learned in the source-domain feature space can be migrated to the target space and improve prediction accuracy in the target domain.
However, compared with traditional supervised learning models, the accuracy of UDA is still not high when a trained UDA model is used for scene segmentation of real images. The reason is that the domain gap between the source and target domains is too large: the image distribution information the model learns from the source domain cannot be applied to the target domain, which limits the development of UDA.
Therefore we propose a new UDA model called TwinLossGAN, which reduces the domain gap in two steps. The first step mixes images from the source and target domains so that the model learns the features of images from both domains well. Mixing is performed by selecting a synthetic image from the source domain and a real-world image from the target domain. The two selected images are fed to the segmenter separately to obtain semantic segmentation results, which are then passed to the mixing module. The mixing module uses the ClassMix method to copy and paste some segmented objects from one image into the other using the segmentation masks, generating inter-domain composite images and the corresponding pseudo-labels. In the second step, we modify a Generative Adversarial Network (GAN) to further reduce the gap between domains. The original GAN has two main parts: a generator and a discriminator. In our proposed TwinLossGAN, the generator performs semantic segmentation on the source-domain and target-domain images separately, and the segmentations are trained in parallel. The source-domain synthetic images are segmented and the loss is computed against the synthetic labels; at the same time, the generated inter-domain composite images are fed to the segmentation module, which compares its results with the pseudo-labels and computes a second loss. These twin losses are used as the generator loss throughout the GAN training iterations, while the discriminator examines whether the semantic segmentation results originate from the source or the target domain.
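As a rough illustration of the ClassMix-style mixing step described above, the sketch below copies the pixels of roughly half of the source-image classes into a target image and its pseudo-label. The array layout, class-selection rule, and toy data are assumptions for demonstration, not the exact TwinLossGAN implementation.

    # Minimal ClassMix-style mixing sketch: images are (H, W, 3) uint8 arrays,
    # labels are (H, W) integer maps, and the target label is a pseudo-label
    # taken from the segmenter's prediction.
    import numpy as np

    def classmix(src_img, src_label, tgt_img, tgt_pseudo, rng=None):
        rng = rng or np.random.default_rng()
        classes = np.unique(src_label)
        # Copy roughly half of the source classes into the target image.
        chosen = rng.choice(classes, size=max(1, len(classes) // 2), replace=False)
        mask = np.isin(src_label, chosen)      # (H, W) boolean paste mask

        mixed_img = tgt_img.copy()
        mixed_img[mask] = src_img[mask]        # paste source pixels
        mixed_label = tgt_pseudo.copy()
        mixed_label[mask] = src_label[mask]    # paste source labels -> composite pseudo-label
        return mixed_img, mixed_label

    # The "twin" losses would then be: cross-entropy of the source prediction
    # against the synthetic label, plus cross-entropy of the mixed-image
    # prediction against mixed_label, summed as the generator loss.
    src_img = np.zeros((4, 4, 3), dtype=np.uint8)
    src_label = np.array([[0] * 4, [1] * 4, [2] * 4, [3] * 4])
    tgt_img = np.full((4, 4, 3), 255, dtype=np.uint8)
    tgt_pseudo = np.full((4, 4), 9)
    img, lbl = classmix(src_img, src_label, tgt_img, tgt_pseudo)
    print(lbl)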
We used data from GTA5 and SYNTHIA as the source-domain data and images from CityScapes as the target-domain data. The accuracy achieved by the proposed TwinLossGAN was much higher than that of the baseline UDA models.
|
43 |
Locating power lines in satellite images with semantic segmentation. Lundman, Erik. January 2022.
The inspection of power lines is an important process for maintaining a stable electrical infrastructure. At the same time it is a very time-consuming task, considering there are 164 000 km of power lines in Sweden alone. A cheaper and more sustainable approach is automatic inspection with drones, but a successful drone inspection requires exact power line coordinates, which are not always available. To identify power lines in satellite images, a machine learning approach was implemented. In machine learning, semantic segmentation is the pixel-wise classification of an image: not only is the entire image labelled, but every pixel individually. This way, not only the existence of a power line is identified, but also its position inside the image. This thesis investigates whether semantic segmentation is an effective approach for locating power lines in satellite images, and what methods can be used on the segmented output to extract linestring coordinates representing the power line. Linear regression and a polygon centerline extraction method were applied to the segmented output to define a line representing the true location of the power line. The semantic segmentation model could find power lines where they were clearly visible, but struggled where they were not. Given good output from the segmentation model, the linear regression and polygon centerline extraction methods could successfully extract linestring coordinates representing the true location of the power line; in the best case around 67% of power lines were correctly identified. Even with good model output, however, complex shapes such as intersections may still yield poor results. Although the approach needs further work and cannot yet reliably identify all power lines, it shows that this could be a promising method for identifying power lines in satellite images.
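As an illustration of the linear-regression step mentioned above, the sketch below fits a straight line to the pixels a segmentation model marked as power line and returns its endpoints as a two-point linestring. It assumes the line is not near-vertical in image coordinates and omits the polygon centerline method; it is a sketch, not the thesis implementation.

    # Fit a straight line to predicted power-line pixels and return a linestring.
    import numpy as np

    def mask_to_linestring(mask):
        """mask: (H, W) boolean array of predicted power-line pixels."""
        rows, cols = np.nonzero(mask)
        if len(cols) < 2:
            return None  # not enough pixels to define a line
        # Least-squares fit row = a * col + b over the predicted pixels.
        a, b = np.polyfit(cols, rows, deg=1)
        c0, c1 = cols.min(), cols.max()
        # Linestring as ((x0, y0), (x1, y1)) in pixel coordinates.
        return ((int(c0), float(a * c0 + b)), (int(c1), float(a * c1 + b)))

    # Toy example: a diagonal "power line" in a 100x100 mask.
    mask = np.zeros((100, 100), dtype=bool)
    for c in range(10, 90):
        mask[c // 2 + 5, c] = True
    print(mask_to_linestring(mask))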
|
44 |
Adapting multiple datasets for better mammography tumor detection / Anpassa flera dataset för bättre mammografi-tumördetektion. Tao, Wang. January 2018.
In Sweden, women between the ages of 40 and 74 go through regular screening of their breasts every 18-24 months. The screening mainly involves obtaining a mammogram and having radiologists analyze it to detect any sign of breast cancer. However, reading a mammography image requires an experienced radiologist, and the shortage of radiologists reduces a hospital's operating efficiency. Moreover, mammography from different facilities increases the difficulty of diagnosis. Our work proposes a deep learning segmentation system that can adapt to mammography from various facilities and locate the position of the tumor. We train and test our method on two public mammography datasets and run several experiments to find the best parameter setting for our system. The test segmentation results suggest that our system could serve as an auxiliary tool for breast cancer diagnosis and improve diagnostic accuracy and efficiency.
|
45 |
Integration of Continual Learning and Semantic Segmentation in a vision system for mobile robotics. Echeverry Valencia, Cristian David. January 2023.
Over the last decade, the integration of robots into various applications has seen significant advancements fueled by Machine Learning (ML) algorithms, particularly in autonomous and independent operations. While robots have become increasingly proficient in various tasks, object instance recognition, a fundamental component of real-world robotic interaction, has witnessed remarkable improvements in accuracy and robustness. Nevertheless, most existing approaches rely heavily on prior information, limiting their adaptability in unfamiliar environments. To address this constraint, this thesis introduces the Segment and Learn Semantics (SaLS) framework, which combines video object segmentation with Continual Learning (CL) methods to enable semantic understanding in robotic applications. The research focuses on the potential application of SaLS in mobile robotics, with specific emphasis on the TORO robot developed at the Deutsches Zentrum für Luft- und Raumfahrt (DLR). The proposed method is evaluated on a diverse dataset comprising the various terrains and objects encountered by the TORO robot during its walking sessions. The results demonstrate the effectiveness of SaLS in classifying both known and previously unseen objects, achieving average accuracies of 78.86% and 70.78% in the CL experiments. When running the whole method on the image sequences collected with TORO, the accuracy scores were 75.54% and 84.75% for known and unknown objects, respectively. Notably, SaLS exhibited resilience against catastrophic forgetting, with only minor accuracy decreases observed in specific cases. Computational resource usage was also explored, indicating that the method is feasible for practical mobile robotic systems, with GPU memory usage being a potential limiting factor. In conclusion, the SaLS framework represents a significant step forward in enabling robots to autonomously understand and interact with their surroundings. This research contributes to the ongoing development of robotic systems that can operate effectively in unstructured environments, paving the way for more versatile and capable autonomous robots.
|
46 |
Multi-spectral Fusion for Semantic Segmentation Networks. Edwards, Justin. 05 1900.
Indiana University-Purdue University Indianapolis (IUPUI) / Semantic segmentation is a machine learning task that is seeing increased utilization in multiple fields, from medical imagery to land demarcation and autonomous vehicles. Semantic segmentation performs pixel-wise classification of images, creating a new, segmented representation of the input that can be useful for detecting various terrain and objects within an image. Recently, convolutional neural networks have been heavily utilized in networks tackling the semantic segmentation task, particularly in the field of autonomous driving systems.
The requirements of automated driver assistance systems (ADAS) drive semantic segmentation models targeted for deployment on ADAS to be lightweight while maintaining accuracy. A commonly used method to increase accuracy in the autonomous vehicle field is to fuse multiple sensory modalities. This research focuses on fusing long wave infrared (LWIR) imagery with visual spectrum imagery to fill in the inherent performance gaps of visual imagery alone. This comes with a host of benefits, such as increased performance in various lighting conditions and adverse environmental conditions. Fusion is an effective method of increasing the accuracy of a semantic segmentation model, while a lightweight architecture is key for successful deployment on ADAS, as these systems often have resource constraints and need to operate in real time. The Multi-Spectral Fusion Network (MFNet) [1] meets these requirements by leveraging a sensory fusion approach, and as such was selected as the baseline architecture for this research.
Many improvements were made upon the baseline architecture by leveraging a variety of techniques. These include the proposal of a novel loss function, categorical cross-entropy dice loss, the introduction of squeeze and excitation (SE) blocks, the addition of pyramid pooling, a new fusion technique, and drop input data augmentation. These improvements culminated in the creation of the Fast Thermal Fusion Network (FTFNet). Further improvements were made by introducing depthwise separable convolutional layers, leading to lightweight FTFNet variants, FTFNet Lite 1 & 2.
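As a hedged sketch of the combined categorical cross-entropy and Dice loss named above, the snippet below blends the two terms with an assumed equal weighting and smoothing constant; the exact formulation used in FTFNet is not given in the abstract.

    # Combined categorical cross-entropy + Dice loss sketch (weights assumed).
    import torch
    import torch.nn.functional as F

    def ce_dice_loss(logits, target, num_classes, ce_weight=0.5, smooth=1.0):
        """logits: (B, C, H, W) raw scores; target: (B, H, W) integer class labels."""
        ce = F.cross_entropy(logits, target)

        probs = torch.softmax(logits, dim=1)
        one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
        dims = (0, 2, 3)  # sum over batch and spatial dims, keep per-class terms
        intersection = (probs * one_hot).sum(dims)
        union = probs.sum(dims) + one_hot.sum(dims)
        dice = ((2 * intersection + smooth) / (union + smooth)).mean()

        return ce_weight * ce + (1 - ce_weight) * (1 - dice)

    # Usage with dummy predictions for a 3-class segmentation problem.
    logits = torch.randn(2, 3, 8, 8)
    target = torch.randint(0, 3, (2, 8, 8))
    print(ce_dice_loss(logits, target, num_classes=3))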
The FTFNet family was trained on the Multi-Spectral Road Scenarios (MSRS) and MIL-Coaxials visual/LWIR datasets. The proposed modifications led to an improvement over the baseline in mean intersection over union (mIoU) of 2.92% and 2.03% for FTFNet and FTFNet Lite 2 respectively when trained on the MSRS dataset. Additionally, when trained on the MIL-Coaxials dataset, the FTFNet family showed improvements in mIoU of 8.69%, 4.4%, and 5.0% for FTFNet, FTFNet Lite 1, and FTFNet Lite 2.
|
47 |
Sequential Semantic Segmentation of Streaming Scenes for Autonomous Driving. Cheng, Guo. 12 1900.
Indiana University-Purdue University Indianapolis (IUPUI) / In traffic scene perception for autonomous vehicles, driving videos are available from in-car sensors such as camera and LiDAR for road detection and collision avoidance. Several challenges exist in computer vision tasks for video processing, including object detection and tracking, semantic segmentation, etc. First, because consecutive video frames have large data redundancy, the traditional spatial-to-temporal approach inherently demands huge computational resources. Second, in many real-time scenarios, targets move continuously in the view as data are streamed in; to achieve prompt response with minimum latency, an online model that processes the streaming data in shift mode is necessary. Third, in addition to shape-based recognition in spatial space, motion detection also relies on the inherent temporal continuity in videos, while current works either lack long-term memory for reference or consume a huge amount of computation.

The purpose of this work is to achieve strongly temporally associated sensing results in real time with minimum memory, continually embedded into a pragmatic framework for speed and path planning. It takes a temporal-to-spatial approach to cope with fast-moving vehicles in autonomous navigation. It utilizes compact road profiles (RP) and motion profiles (MP) to identify path regions and dynamic objects, which drastically reduces video data to a lower dimension and increases the sensing rate. Specifically, we sample a one-pixel line at each video frame; the temporal congregation of lines from consecutive frames forms a road profile image, while the motion profile consists of the average lines obtained by sampling a belt of pixels at each frame. By applying the dense temporal resolution to compensate for the sparse spatial resolution, this method reduces 3D streaming data into a 2D image layout. Based on RP and MP under various weather conditions, three main tasks are conducted to contribute to the knowledge domain in perception and planning for autonomous driving.
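A minimal sketch of the road-profile and motion-profile construction described above is given below: one pixel row per frame is stacked over time for the road profile, and a belt of rows is averaged per frame for the motion profile. The row positions and frame sizes are assumptions for illustration, not those used in the thesis.

    # Collapse a video stream into 2D temporal-to-spatial profile images.
    import numpy as np

    def build_profiles(frames, rp_row, mp_rows):
        """frames: iterable of (H, W, 3) arrays; rp_row: row index to sample;
        mp_rows: (start, stop) row belt to average for the motion profile."""
        rp_lines, mp_lines = [], []
        for frame in frames:
            rp_lines.append(frame[rp_row])                         # one-pixel line
            mp_lines.append(frame[mp_rows[0]:mp_rows[1]].mean(0))  # belt average
        # Time runs along the vertical axis of the resulting profile images.
        return np.stack(rp_lines), np.stack(mp_lines)

    # Toy stream of 30 frames of size 240x320.
    frames = (np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8) for _ in range(30))
    road_profile, motion_profile = build_profiles(frames, rp_row=180, mp_rows=(170, 190))
    print(road_profile.shape, motion_profile.shape)  # (30, 320, 3) (30, 320, 3)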
The first application is semantic segmentation of temporal-to-spatial streaming scenes, including recognition of road and roadside, driving events, and objects that are static or in motion. Since the main vision sensing tasks for autonomous driving are identifying the road area to follow and locating traffic to avoid collision, this work tackles the problem by applying semantic segmentation to road and motion profiles. Though a one-pixel line may not contain sufficient spatial information about the road and objects, the consecutive collection of lines as a temporal-spatial image provides an intrinsic spatial layout because of the continuous observation and smooth vehicle motion. Moreover, by capturing the trajectory of pedestrians' moving legs in the motion profile, we can robustly distinguish pedestrians in motion against a smooth background. The experimental results on streaming data collected from various sensors, including camera and LiDAR, demonstrate that an effective recognition of the driving scene can be learned through semantic segmentation in the reduced temporal-to-spatial space.
The second contribution of this work is the adaptation of standard semantic segmentation to a sequential semantic segmentation network (SE3), which is implemented as a new benchmark for image and video segmentation. Most state-of-the-art methods pursue accuracy by designing complex structures at the expense of memory use, which makes trained models heavily dependent on GPUs and thus not applicable to real-time inference. Without accuracy loss, this work enables image segmentation at minimum memory. Specifically, instead of predicting image patches, SE3 generates output along with line scanning. By pinpointing the memory associated with the input line at each neural layer in the network, it preserves the same receptive field as the patch size but saves the computation in the overlapped regions during network shifting. In general, SE3 applies to most current backbone models in image segmentation, and it extends to video semantic segmentation by fusing temporal information without increasing computational complexity. Thus, it achieves 3D association over long range while operating under the computation of a 2D setting. This facilitates inference of semantic segmentation on lightweight devices.
The third application is speed and path planning based on the sensing results from naturalistic driving videos. To avoid collision at close range and navigate the vehicle at middle and far ranges, several RP/MPs are scanned continuously at different depths for vehicle path planning. The semantic segmentation of RP/MP is further extended to multiple depths for path and speed planning according to the sensed headway and lane position. We conduct experiments on profiles of different sensing depths and build a smooth planning framework according to them. We also build an initial dataset of road and motion profiles with semantic labels from long HD driving videos. The dataset is published as an additional contribution to future work in computer vision and autonomous driving.
|
48 |
Pixel-level video understanding with efficient deep models. Hu, Ping. 02 February 2024.
The ability to understand videos at the level of pixels plays a key role in a wide range of computer vision applications. For example, a robot or autonomous vehicle relies on classifying each pixel in the video stream into semantic categories to holistically understand the surrounding environment, and video editing software needs to exploit the spatiotemporal context of video pixels to generate various visual effects. Despite the great progress of Deep Learning (DL) techniques, applying DL-based vision models to process video pixels remains practically challenging, due to the high volume of video data and the compute-intensive design of DL approaches. In this thesis, we aim to design efficient and robust deep models for pixel-level video understanding of high-level semantics, mid-level grouping, and low-level interpolation.
Toward this goal, in Part I, we address the semantic analysis of video pixels with the task of Video Semantic Segmentation (VSS), which aims to assign pixel-level semantic labels to video frames. We introduce methods that utilize temporal redundancy and context to efficiently recognize video pixels without sacrificing performance. Extensive experiments on various datasets demonstrate our methods' effectiveness and efficiency on both common GPUs and edge devices. Then, in Part II, we show that pixel-level motion patterns help to differentiate video objects from their background. In particular, we propose a fast and efficient contour-based algorithm to group and separate motion patterns for video objects. Furthermore, we present learning-based models to solve the tracking of objects across frames. We show that by explicitly separating the object segmentation and object tracking problems, our framework achieves efficiency during both training and inference. Finally, in Part III, we study the temporal interpolation of pixels given their spatial-temporal context. We show that intermediate video frames can be inferred via interpolation in a very efficient way, by introducing the many-to-many splatting framework that can quickly warp and fuse pixels at any number of arbitrary intermediate time steps. We also propose a dynamic refinement mechanism to further improve the interpolation quality by reducing redundant computation. Evaluation on various types of datasets shows that our method can interpolate videos with state-of-the-art quality and efficiency.
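As a heavily simplified illustration of the splatting idea in Part III, the sketch below forward-warps one frame to an intermediate time step by moving each pixel a fraction t along its optical flow and averaging collisions. The actual many-to-many splatting framework warps pixels to many time steps and refines the fused result; the nearest-pixel accumulation here is an assumption made for brevity.

    # Forward-splat a frame to time t using flow scaled by t (simplified sketch).
    import numpy as np

    def splat_to_t(frame, flow, t):
        """frame: (H, W, 3); flow: (H, W, 2) forward flow frame0->frame1; 0 <= t <= 1."""
        h, w, _ = frame.shape
        out = np.zeros_like(frame, dtype=np.float64)
        weight = np.zeros((h, w), dtype=np.float64)
        ys, xs = np.mgrid[0:h, 0:w]
        # Move every source pixel a fraction t along its flow vector.
        xt = np.clip(np.round(xs + t * flow[..., 0]).astype(int), 0, w - 1)
        yt = np.clip(np.round(ys + t * flow[..., 1]).astype(int), 0, h - 1)
        # Accumulate colors and counts at the destination (average on collisions).
        np.add.at(out, (yt, xt), frame.astype(np.float64))
        np.add.at(weight, (yt, xt), 1.0)
        out[weight > 0] /= weight[weight > 0][:, None]
        return out

    frame = np.random.rand(64, 64, 3)
    flow = np.full((64, 64, 2), 4.0)      # uniform 4-pixel motion right/down
    mid = splat_to_t(frame, flow, t=0.5)  # roughly a 2-pixel shift
    print(mid.shape)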
To summarize, we discuss and propose efficient pipelines for pixel-level video understanding tasks across high-level semantics, mid-level grouping, and low-level interpolation. The proposed models can contribute to tackling a wide range of real-world video perception and understanding problems in future research.
|
49 |
Automatic Semantic Segmentation of Indoor Datasets. Rachakonda, Sai Swaroop. January 2024.
Background: In recent years, computer vision has undergone significant advancements, revolutionizing fields such as robotics, augmented reality, and autonomous systems. Key to this transformation is Simultaneous Localization and Mapping (SLAM), a fundamental technology that allows machines to navigate and interact intelligently with their surroundings. Challenges persist in harmonizing spatial and semantic understanding, as conventional methods often treat these tasks separately, limiting comprehensive evaluations with shared datasets. As applications continue to evolve, the demand for accurate and efficient image segmentation ground truth becomes paramount. Manual annotation, a traditional approach, proves to be both costly and resource-intensive, hindering the scalability of computer vision systems. This thesis addresses the urgent need for a cost-effective and scalable solution by focusing on the creation of accurate and efficient image segmentation ground truth, bridging the gap between spatial and semantic tasks. Objective: This thesis addresses the challenge of creating an efficient image segmentation ground truth to complement datasets with spatial ground truth. The primary objective is to reduce the time and effort taken for annotation of datasets. Method: Our methodology adopts a systematic approach to evaluate and combine existing annotation techniques, focusing on precise object detection and robust segmentation. By merging these approaches, we aim to enhance annotation accuracy while streamlining the annotation process. This approach is systematically applied and evaluated across multiple datasets, including the NYU V2 dataset (consisting of over 1449 images), ARID (a real-world sequential dataset), and Italian flats (a sequential dataset created in Blender). Results: The developed pipeline demonstrates promising outcomes, showcasing a substantial reduction in annotation time compared to manual annotation, thereby addressing the challenges posed by the cost and resource intensiveness of the traditional approach. We observe that although not initially optimized for SLAM datasets, the pipeline performs exceptionally well on both the ARID and Italian flats datasets, highlighting its adaptability to real-world scenarios. Conclusion: In conclusion, this research introduces an innovative annotation pipeline, offering a systematic and efficient approach to annotation. It tries to bridge the gap between spatial and semantic tasks, addressing the pressing need for comprehensive annotation tools in this domain.
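As a schematic sketch of the annotation-pipeline idea above (combining object detection with segmentation to produce ground truth automatically), the snippet below merges per-box masks into one semantic label map. The detector and segmenter callables are hypothetical placeholders; the abstract does not name the actual models the pipeline uses.

    # Merge detector boxes + per-box masks into a semantic ground-truth map.
    import numpy as np

    def auto_annotate(image, detector, segmenter, bg_class=0):
        """detector(image) -> list of (box, class_id, score);
        segmenter(image, box) -> (H, W) boolean mask for that box."""
        h, w = image.shape[:2]
        label_map = np.full((h, w), bg_class, dtype=np.int32)
        best_score = np.zeros((h, w), dtype=np.float32)
        for box, class_id, score in detector(image):
            mask = segmenter(image, box)
            # Where masks overlap, keep the higher-scoring detection.
            update = mask & (score > best_score)
            label_map[update] = class_id
            best_score[update] = score
        return label_map

    # Dummy detector/segmenter to show the control flow on a 10x10 image.
    dummy_detector = lambda img: [((2, 2, 7, 7), 3, 0.9)]
    dummy_segmenter = lambda img, box: np.pad(
        np.ones((box[3] - box[1], box[2] - box[0]), dtype=bool),
        ((box[1], img.shape[0] - box[3]), (box[0], img.shape[1] - box[2])))
    print(auto_annotate(np.zeros((10, 10, 3)), dummy_detector, dummy_segmenter))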
|
50 |
Semantic Segmentation of Building Materials in Real World Images Using 3D Information / Semantisk segmentering av byggnadsmaterial i verkliga världen med hjälp av 3D information. Rydgård, Jonas, Bejgrowicz, Marcus. January 2021.
The increasing popularity of drones has made it convenient to capture a large number of images of a property, which can then be used to build a 3D model. The condition of buildings can be analyzed to plan renovations. This creates an interest in automatically identifying building materials, a task well suited for machine learning. With access to drone imagery of buildings as well as depth maps and normal maps, we created a dataset for semantic segmentation. Two different convolutional neural networks were trained and evaluated to see how well they perform material segmentation: DeepLabv3+, which uses RGB data, was compared to Depth-Aware CNN, which uses RGB-D data. Our experiments showed that DeepLabv3+ achieved higher mean intersection over union. To investigate whether the information in the depth maps and normal maps could give a performance boost, we conducted experiments with an encoding we call HMN: horizontal disparity, magnitude of the normal parallel with the ground, and the normal in the direction of gravity. This three-channel encoding was used to jointly train two CNNs, one with RGB and one with HMN, and then sum their predictions. This led to improved results for both DeepLabv3+ and Depth-Aware CNN.
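A small sketch of the HMN encoding and late-fusion step described above is given below, under assumed axis conventions (gravity along the y-axis) and a simple min-max normalisation of disparity; it illustrates the idea rather than reproducing the thesis code.

    # Pack horizontal disparity and normal components into a 3-channel HMN image,
    # then fuse two networks' predictions by summing their logits.
    import numpy as np

    def hmn_encoding(disparity, normals, up=(0.0, 1.0, 0.0)):
        """disparity: (H, W); normals: (H, W, 3) unit normals; up: gravity direction."""
        up = np.asarray(up, dtype=np.float64)
        n_up = normals @ up                                  # component along gravity
        n_ground = np.linalg.norm(normals - n_up[..., None] * up, axis=-1)
        h = (disparity - disparity.min()) / (np.ptp(disparity) + 1e-8)  # scale to [0, 1]
        return np.stack([h, n_ground, np.abs(n_up)], axis=-1)

    def fused_prediction(rgb_logits, hmn_logits):
        # Late fusion as described above: sum per-class logits, then take the argmax.
        return np.argmax(rgb_logits + hmn_logits, axis=-1)

    disparity = np.random.rand(4, 4)
    normals = np.tile(np.array([0.0, 1.0, 0.0]), (4, 4, 1))
    print(hmn_encoding(disparity, normals).shape)            # (4, 4, 3)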
|