Automatic Generation of Video Navigation from Google Street View Database with Object Detection, Image Inpainting and Stereoscopic Virtual Reality Display / 自動產生Google街景導覽影片並提供物件偵測、影像修補與3D虛擬實境顯示

Ph.D. / National Taiwan University of Science and Technology / Department of Information Management / 107 / In recent years, research in artificial intelligence and deep learning has flourished. At the same time, Google Street View imagery is widely used: people routinely look up street-level views of a destination before traveling there. However, little prior work automatically transforms Google Street View images into a navigation video with object detection and image inpainting, and little prior work combines such a generated video with an HTC Vive for stereoscopic 360-degree virtual reality (3DVR360) display.
This study combines two of the most active areas of computer science research: deep learning (artificial intelligence) and virtual reality. In total, three versions of the navigation-video generation system were developed. First, GSVPlayer-HH&I (Google Street View Player with HOG+Haar and Inpainting) mainly adopts CPU-based methods for object detection and image inpainting. Second, GSVPlayer-FRRCNN&I (Google Street View Player with Faster R-CNN and Inpainting), built on the foundation of GSVPlayer-HH&I, instead uses a GPU-based method (Faster R-CNN) for object detection. Third, GSVPlayer-3DVR360 (Google Street View Player with Stereoscopic Virtual Reality 360 Display) implements a pipeline of image processing, monocular depth-map estimation, depth-image-based rendering (DIBR), and 3DVR360 display. One notable result is that, despite this version's longer computation time, users remained satisfied with GSVPlayer-3DVR360.
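The DIBR step mentioned above can be sketched as follows. This is a minimal illustration only, not the dissertation's actual implementation: it assumes a simple pinhole model, and the `baseline` and `focal` values are hypothetical placeholders. The idea is that each pixel is shifted horizontally by half its disparity (which is inversely proportional to depth) to synthesize the left- and right-eye views of a stereo pair.

```python
import numpy as np

def dibr_stereo(image, depth, baseline=0.06, focal=500.0):
    """Synthesize left/right views from one image and its depth map
    by shifting each pixel horizontally by half its disparity.
    `baseline` (meters) and `focal` (pixels) are illustrative values."""
    h, w = depth.shape
    # Disparity in pixels: closer pixels (smaller depth) shift more.
    disparity = baseline * focal / np.maximum(depth, 1e-6)
    left = np.zeros_like(image)
    right = np.zeros_like(image)
    cols = np.arange(w)
    for y in range(h):
        shift = (disparity[y] / 2).astype(int)
        lx = np.clip(cols + shift, 0, w - 1)  # left-eye view: shift right
        rx = np.clip(cols - shift, 0, w - 1)  # right-eye view: shift left
        left[y, lx] = image[y]
        right[y, rx] = image[y]
    return left, right

# Toy example: a constant depth map shifts every pixel uniformly.
img = np.arange(12, dtype=float).reshape(3, 4)
dep = np.full((3, 4), 10.0)  # 10 m everywhere
L, R = dibr_stereo(img, dep)
```

A real DIBR renderer would additionally handle the disocclusion holes that this naive forward warp leaves behind (visible here as zero-filled pixels), which is one place where the inpainting stage of the pipeline becomes relevant.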
The dissertation presents quantitative and qualitative results and evaluations for each of the three versions, and explicitly discusses their limitations. In conclusion, the proposed system constitutes a complete, integrated framework.
Future work includes several directions worth exploring: using multiple computing servers, a new CNN for monocular depth estimation that exploits temporal sequences, synthesizing novel frames, adopting the YOLO object detection method, and performing object detection and image inpainting on high-resolution images.

Identifier: oai:union.ndltd.org:TW/107NTUS5396011
Date: January 2019
Creators: Yuan-Bang Cheng, 鄭元棓
Contributors: Chuan-Kai Yang, Teng-Wen Chang, 楊傳凱, 張登文
Source Sets: National Digital Library of Theses and Dissertations in Taiwan
Language: en_US
Detected Language: English
Type: thesis (學位論文)
Format: 167
