About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

The "What"-"Where" Network: A Tool for One-Shot Image Recognition and Localization

Hurlburt, Daniel 06 January 2021 (has links)
A common shortcoming of modern computer vision is the inability of most models to generalize to new classes, i.e., one- or few-shot image recognition. We propose a new problem formulation for this task and present a network architecture and training methodology to solve it. Further, we provide insights into how not just the data itself, but the way data is presented to the model, can have a significant impact on performance. Using these methods, we achieve high accuracy in few-shot image recognition tasks.
32

Scene Recognition and Collision Avoidance System for Robotic Combine Harvesters Based on Deep Learning / 深層学習に基づくロボットコンバインハーベスタのためのシーン認識および衝突回避システム

Li, Yang 23 September 2020 (has links)
Kyoto University / 0048 / New doctoral program / Doctor of Agricultural Science / Kō No. 22784 / Nōhaku No. 2427 / 新制||農||1081 (Main Library) / 学位論文||R2||N5304 (Faculty of Agriculture Library) / Division of Environmental Science and Technology, Graduate School of Agriculture, Kyoto University / (Chief examiner) Professor 飯田訓久, Professor 近藤直, Professor 中嶋洋 / Fulfills Article 4, Paragraph 1 of the Degree Regulations / Doctor of Agricultural Science / Kyoto University / DFAM
33

Locating power lines in satellite images with semantic segmentation

Lundman, Erik January 2022 (has links)
The inspection of power lines is an important process for maintaining a stable electrical infrastructure. At the same time, it is a very time-consuming task, considering there are 164,000 km of power lines in Sweden alone. A cheaper and more sustainable approach is automatic inspection with drones. But a successful drone inspection requires exact power line coordinates, which are not always available. To identify power lines in satellite images, a machine learning approach was implemented. In machine learning, semantic segmentation is the pixel-wise classification of an image: not only the entire image is labeled, but every pixel individually. This way, not only the existence of a power line is identified, but also its position inside the image. This thesis investigates whether semantic segmentation is an effective approach to locating power lines in satellite images, and which methods can be used on the segmented output data to extract linestring coordinates representing the power line. Linear regression and a polygon centerline extraction method were implemented on the segmented output data to define a line that represents the true location of the power line. The semantic segmentation model could find power lines where they were clearly visible, but struggled where they were not. Given good output data from the segmentation model, the linear regression and polygon centerline extraction methods could successfully extract linestring coordinates that represented the true location of the power line; in the best case, around 67% of power lines were correctly identified. Still, even with good output data from the model, complex shapes such as intersections may yield bad results. Even if the approach needs further work and cannot reliably identify all power lines in its current state, it has shown that this could be a promising method for identifying power lines in satellite images.
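To make the extraction step concrete, the following is a minimal sketch of the linear-regression variant, assuming a single segmented power-line component per mask; it is our own illustration, not the thesis code, and the function name is hypothetical.

```python
import numpy as np

def linestring_from_mask(mask):
    """Fit a least-squares line through the pixels of one segmented
    power-line component and return two endpoints as a linestring.
    Hypothetical illustration; assumes a roughly horizontal line
    (a near-vertical line would need x and y swapped).
    """
    ys, xs = np.nonzero(mask)                      # pixel coordinates of the blob
    slope, intercept = np.polyfit(xs, ys, deg=1)   # simple linear regression
    x0, x1 = xs.min(), xs.max()
    return [(float(x0), float(slope * x0 + intercept)),
            (float(x1), float(slope * x1 + intercept))]
```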
34

Adapting multiple datasets for better mammography tumor detection / Anpassa flera dataset för bättre mammografi-tumördetektion

Tao, Wang January 2018 (has links)
In Sweden, women between the ages of 40 and 74 go through regular screening of their breasts every 18-24 months. The screening mainly involves obtaining a mammogram and having radiologists analyze it to detect any sign of breast cancer. However, reading a mammography image requires an experienced radiologist, and the shortage of radiologists reduces hospitals' operating efficiency. Moreover, mammography from different facilities increases the difficulty of diagnosis. Our work proposes a deep learning segmentation system that can adapt to mammography from various facilities and locate the position of the tumor. We train and test our method on two public mammography datasets and run several experiments to find the best parameter settings for our system. The test segmentation results suggest that our system could serve as an auxiliary tool for breast cancer diagnosis and improve diagnostic accuracy and efficiency.
35

Integration of Continual Learning and Semantic Segmentation in a vision system for mobile robotics

Echeverry Valencia, Cristian David January 2023 (has links)
Over the last decade, the integration of robots into various applications has seen significant advancements fueled by Machine Learning (ML) algorithms, particularly in autonomous and independent operations. While robots have become increasingly proficient in various tasks, object instance recognition, a fundamental component of real-world robotic interactions, has witnessed remarkable improvements in accuracy and robustness. Nevertheless, most existing approaches rely heavily on prior information, limiting their adaptability in unfamiliar environments. To address this constraint, this thesis introduces the Segment and Learn Semantics (SaLS) framework, which combines video object segmentation with Continual Learning (CL) methods to enable semantic understanding in robotic applications. The research focuses on the potential application of SaLS in mobile robotics, with specific emphasis on the TORO robot developed at the Deutsches Zentrum für Luft- und Raumfahrt (DLR). The proposed method is evaluated on a diverse dataset comprising the various terrains and objects encountered by the TORO robot during its walking sessions. The results demonstrate the effectiveness of SaLS in classifying both known and previously unseen objects, achieving average accuracies of 78.86% and 70.78% in the CL experiments. When running the whole method on the image sequences collected with TORO, the accuracy scores were 75.54% and 84.75% for known and unknown objects, respectively. Notably, SaLS exhibited resilience against catastrophic forgetting, with only minor accuracy decreases observed in specific cases. Computational resource usage was also explored, indicating that the method is feasible for practical mobile robotic systems, with GPU memory usage being a potential limiting factor. In conclusion, the SaLS framework represents a significant step toward enabling robots to autonomously understand and interact with their surroundings. This research contributes to the ongoing development of robotic systems that can operate effectively in unstructured environments, paving the way for more versatile and capable autonomous robots.
36

Multi-spectral Fusion for Semantic Segmentation Networks

Edwards, Justin 05 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Semantic segmentation is a machine learning task seeing increased utilization in multiple fields, from medical imagery, to land demarcation, to autonomous vehicles. Semantic segmentation performs the pixel-wise classification of images, creating a new, segmented representation of the input that is useful for detecting various terrain and objects within an image. Recently, convolutional neural networks have been heavily utilized in networks tackling the semantic segmentation task, particularly in the field of autonomous driving systems. The requirements of automated driver assistance systems (ADAS) drive semantic segmentation models targeted for deployment on ADAS to be lightweight while maintaining accuracy. A commonly used method to increase accuracy in the autonomous vehicle field is to fuse multiple sensory modalities. This research focuses on fusing long wave infrared (LWIR) imagery with visual spectrum imagery to fill in the inherent performance gaps of visual imagery alone. This comes with a host of benefits, such as increased performance in various lighting conditions and adverse environmental conditions. Utilizing this fusion technique is an effective method of increasing the accuracy of a semantic segmentation model. A lightweight architecture is key for successful deployment on ADAS, as these systems often have resource constraints and need to operate in real time. The Multi-Spectral Fusion Network (MFNet) [1] meets these requirements by leveraging a sensory fusion approach, and as such was selected as the baseline architecture for this research. Many improvements were made upon the baseline architecture by leveraging a variety of techniques, including a novel loss function (categorical cross-entropy dice loss), squeeze and excitation (SE) blocks, pyramid pooling, a new fusion technique, and drop-input data augmentation. These improvements culminated in the Fast Thermal Fusion Network (FTFNet). Further improvements were made by introducing depthwise separable convolutional layers, leading to the lightweight FTFNet variants, FTFNet Lite 1 & 2. The FTFNet family was trained on the Multi-Spectral Road Scenarios (MSRS) and MIL-Coaxials visual/LWIR datasets. The proposed modifications led to improvements over the baseline in mean intersection over union (mIoU) of 2.92% and 2.03% for FTFNet and FTFNet Lite 2, respectively, when trained on the MSRS dataset. Additionally, when trained on the MIL-Coaxials dataset, the FTFNet family showed improvements in mIoU of 8.69%, 4.4%, and 5.0% for FTFNet, FTFNet Lite 1, and FTFNet Lite 2.
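The abstract names the combined loss but does not define it; below is a plausible minimal sketch of a categorical cross-entropy dice loss in PyTorch, where the weighting `alpha` and the smoothing constant are our assumptions rather than details from the thesis.

```python
import torch
import torch.nn.functional as F

def cce_dice_loss(logits, target, alpha=0.5, smooth=1.0):
    """Hypothetical combined loss: weighted sum of categorical
    cross-entropy and a soft Dice loss averaged over classes.
    logits: (N, C, H, W) raw scores; target: (N, H, W) class indices.
    """
    ce = F.cross_entropy(logits, target)

    probs = F.softmax(logits, dim=1)                                   # (N, C, H, W)
    one_hot = F.one_hot(target, probs.shape[1]).permute(0, 3, 1, 2).float()

    intersection = (probs * one_hot).sum(dim=(2, 3))
    cardinality = probs.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3))
    dice = (2.0 * intersection + smooth) / (cardinality + smooth)      # per-class Dice

    return alpha * ce + (1.0 - alpha) * (1.0 - dice.mean())
```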
37

Sequential Semantic Segmentation of Streaming Scenes for Autonomous Driving

Cheng, Guo 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / In traffic scene perception for autonomous vehicles, driving videos are available from in-car sensors such as camera and LiDAR for road detection and collision avoidance. There are existing challenges in computer vision tasks for video processing, including object detection and tracking, semantic segmentation, etc. First, because consecutive video frames have large data redundancy, the traditional spatial-to-temporal approach inherently demands huge computational resources. Second, in many real-time scenarios, targets move continuously in the view as data streams in; to achieve prompt response with minimum latency, an online model that processes the streaming data in shift-mode is necessary. Third, in addition to shape-based recognition in spatial space, motion detection also relies on the inherent temporal continuity in videos, while current works either lack long-term memory for reference or consume a huge amount of computation. The purpose of this work is to achieve strongly temporal-associated sensing results in real time with minimum memory, continually embedded into a pragmatic framework for speed and path planning. It takes a temporal-to-spatial approach to cope with fast-moving vehicles in autonomous navigation. It utilizes compact road profiles (RP) and motion profiles (MP) to identify path regions and dynamic objects, which drastically reduces video data to a lower dimension and increases the sensing rate. Specifically, we sample a one-pixel line at each video frame; the temporal congregation of lines from consecutive frames forms a road profile image, while a motion profile consists of the average lines obtained by sampling a one-pixel belt at each frame. By applying the dense temporal resolution to compensate for the sparse spatial resolution, this method reduces 3D streaming data into a 2D image layout. Based on RP and MP under various weather conditions, three main tasks are conducted to contribute to the knowledge domain of perception and planning for autonomous driving. The first application is semantic segmentation of temporal-to-spatial streaming scenes, including recognition of road and roadside, driving events, and objects static or in motion. Since the main vision sensing tasks for autonomous driving are identifying the road area to follow and locating traffic to avoid collision, this work tackles the problem with semantic segmentation of road and motion profiles. Though a one-pixel line may not contain sufficient spatial information about road and objects, the consecutive collection of lines as a temporal-spatial image provides an intrinsic spatial layout because of the continuous observation and smooth vehicle motion. Moreover, by capturing the trajectories of pedestrians' moving legs in the motion profile, we can robustly distinguish pedestrians in motion against a smooth background. Experimental results on streaming data collected from various sensors, including camera and LiDAR, demonstrate that, in the reduced temporal-to-spatial space, effective recognition of the driving scene can be learned through semantic segmentation. The second contribution of this work is that it adapts standard semantic segmentation into a sequential semantic segmentation network (SE3), which is implemented as a new benchmark for image and video segmentation.
Most state-of-the-art methods pursue accuracy by designing complex structures at the expense of memory use, which makes trained models heavily dependent on GPUs and thus not applicable to real-time inference. Without accuracy loss, this work enables image segmentation at minimum memory. Specifically, instead of predicting an image patch, SE3 generates output along with line scanning. By pinpointing the memory associated with the input line at each neural layer in the network, it preserves the same receptive field as the patch size but saves the computation in the overlapped regions during network shifting. SE3 applies to most current backbone models in image segmentation, and it extends inference to video semantic segmentation by fusing temporal information without increasing computational complexity. It thus achieves 3D association over long range under the computation of a 2D setting, which will facilitate inference of semantic segmentation on lightweight devices. The third application is speed and path planning based on the sensing results from naturalistic driving videos. To avoid collision at close range and navigate the vehicle at middle and far ranges, several RP/MPs are scanned continuously from different depths for vehicle path planning. The semantic segmentation of RP/MP is further extended to multiple depths for path and speed planning according to the sensed headway and lane position. We conduct experiments on profiles of different sensing depths and build a smooth planning framework based on them. We also build an initial dataset of road and motion profiles with semantic labels from long HD driving videos. The dataset is published as an additional contribution to future work in computer vision and autonomous driving.
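To illustrate the line-sampling idea, here is a minimal reconstruction (ours, not the author's code) of how a road-profile image can be accumulated by stacking one pixel row per frame; the row index stands in for one sampling depth.

```python
import numpy as np

def build_road_profile(frames, row):
    """Stack one pixel line per video frame so that time becomes one
    axis of a 2D road-profile image.
    frames: iterable of (H, W, 3) arrays; row: index of the scanned line.
    """
    lines = [frame[row] for frame in frames]   # one (W, 3) line per frame
    return np.stack(lines, axis=0)             # (T, W, 3) road profile
```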
38

Pixel-level video understanding with efficient deep models

Hu, Ping 02 February 2024 (has links)
The ability to understand videos at the level of pixels plays a key role in a wide range of computer vision applications. For example, a robot or autonomous vehicle relies on classifying each pixel in the video stream into semantic categories to holistically understand the surrounding environment, and video editing software needs to exploit the spatiotemporal context of video pixels to generate various visual effects. Despite the great progress of Deep Learning (DL) techniques, applying DL-based vision models to process video pixels remains practically challenging, due to the high volume of video data and the compute-intensive design of DL approaches. In this thesis, we aim to design efficient and robust deep models for pixel-level video understanding of high-level semantics, mid-level grouping, and low-level interpolation. Toward this goal, in Part I, we address the semantic analysis of video pixels with the task of Video Semantic Segmentation (VSS), which aims to assign pixel-level semantic labels to video frames. We introduce methods that utilize temporal redundancy and context to efficiently recognize video pixels without sacrificing performance. Extensive experiments on various datasets demonstrate our methods' effectiveness and efficiency on both common GPUs and edge devices. Then, in Part II, we show that pixel-level motion patterns help to differentiate video objects from their background. In particular, we propose a fast and efficient contour-based algorithm to group and separate motion patterns for video objects. Furthermore, we present learning-based models to solve the tracking of objects across frames. We show that by explicitly separating the object segmentation and object tracking problems, our framework achieves efficiency during both training and inference. Finally, in Part III, we study the temporal interpolation of pixels given their spatial-temporal context. We show that intermediate video frames can be inferred via interpolation in a very efficient way, by introducing the many-to-many splatting framework that can quickly warp and fuse pixels at any number of arbitrary intermediate time steps. We also propose a dynamic refinement mechanism to further improve the interpolation quality by reducing redundant computation. Evaluation on various types of datasets shows that our method can interpolate videos with state-of-the-art quality and efficiency. To summarize, we discuss and propose efficient pipelines for pixel-level video understanding tasks across high-level semantics, mid-level grouping, and low-level interpolation. The proposed models can contribute to tackling a wide range of real-world video perception and understanding problems in future research.
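As a rough sketch of the splatting idea behind the many-to-many framework (a toy nearest-pixel version under our own simplifying assumptions, not the thesis implementation), each pixel is pushed along its scaled flow vector and accumulated at the landing position:

```python
import numpy as np

def splat_to_time(frame, flow, t):
    """Toy forward splatting: warp `frame` to intermediate time `t`
    along optical flow `flow` ((H, W, 2), in pixels) by nearest-pixel
    accumulation. The real method fuses many splats with learned
    weights; unfilled pixels here simply stay zero.
    """
    h, w, _ = frame.shape
    out = np.zeros((h, w, 3), dtype=np.float64)
    hits = np.zeros((h, w, 1), dtype=np.float64)
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.clip(np.rint(xs + t * flow[..., 0]).astype(int), 0, w - 1)
    ty = np.clip(np.rint(ys + t * flow[..., 1]).astype(int), 0, h - 1)
    np.add.at(out, (ty, tx), frame.astype(np.float64))
    np.add.at(hits, (ty, tx), 1.0)
    return out / np.maximum(hits, 1.0)   # average where several pixels land
```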
39

Semantic Segmentation of Building Materials in Real World Images Using 3D Information / Semantisk segmentering av byggnadsmaterial i verkliga världen med hjälp av 3D information

Rydgård, Jonas, Bejgrowicz, Marcus January 2021 (has links)
The increasing popularity of drones has made it convenient to capture a large number of images of a property, which can then be used to build a 3D model. The condition of buildings can be analyzed to plan renovations. This creates an interest in automatically identifying building materials, a task well suited for machine learning. With access to drone imagery of buildings as well as depth maps and normal maps, we created a dataset for semantic segmentation. Two different convolutional neural networks were trained and evaluated to see how well they perform material segmentation: DeepLabv3+, which uses RGB data, was compared to Depth-Aware CNN, which uses RGB-D data. Our experiments showed that DeepLabv3+ achieved higher mean intersection over union. To investigate whether the information in the depth maps and normal maps could give a performance boost, we conducted experiments with an encoding we call HMN: horizontal disparity, magnitude of the normal parallel with the ground, and normal in the direction of gravity. This three-channel encoding was used to jointly train two CNNs, one with RGB and one with HMN, and then sum their predictions. This led to improved results for both DeepLabv3+ and Depth-Aware CNN.
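The prediction-summing step described above amounts to simple late fusion; a minimal sketch follows, where the model and function names are hypothetical and (N, C, H, W) logits are assumed:

```python
import torch

@torch.no_grad()
def fused_material_prediction(rgb_model, hmn_model, rgb, hmn):
    """Sum the per-pixel class logits of the RGB network and the
    HMN network, then take the argmax as the fused material label.
    """
    logits = rgb_model(rgb) + hmn_model(hmn)   # element-wise sum of class scores
    return logits.argmax(dim=1)                # (N, H, W) label map
```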
40

Object Detection in Paddy Field for Robotic Combine Harvester Based on Semantic Segmentation / セマンティックセグメンテーションに基づくロボットコンバインのための物体検出

Zhu, Jiajun 25 September 2023 (has links)
Kyoto University / New doctoral program / Doctor of Agricultural Science / Kō No. 24913 / Nōhaku No. 2576 / 新制||農||1103 (Main Library) / Division of Environmental Science and Technology, Graduate School of Agriculture, Kyoto University / (Chief examiner) Professor 飯田訓久, Professor 近藤直, Professor 野口良造 / Fulfills Article 4, Paragraph 1 of the Degree Regulations / Doctor of Agricultural Science / Kyoto University / DFAM
