Lecture videos are a valuable resource for learning, and students commonly use online videos to explore new domains. However, many recorded lectures are posted on online platforms without any post-processing because of technology and resource limitations. In this thesis, we develop an intelligent system that uses modern deep learning techniques to automatically extract the essential content of a lecture video, namely the main instructor and the presentation screen, across several recording scenarios. The extracted content is then combined to re-render the video in a new layout with a smaller file size than the original, saving users both post-processing time and cost. The system adopts state-of-the-art object detection models, a screen-display correction algorithm, instructor tracking, and other deep learning techniques to detect both the main instructor and the screen without a heavy computational burden; a minimal illustrative sketch of this pipeline follows.
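The sketch below is only an illustration of the kind of pipeline described above, not the thesis's actual implementation. The function `detect_instructor_and_screen` is a hypothetical placeholder for the object detection models mentioned in the abstract; the sketch crops the detected instructor and screen regions from each frame and composes them into a fixed side-by-side layout written out as a new video.

```python
import cv2
import numpy as np

# Hypothetical detector: the thesis uses state-of-the-art object detection
# models; here we only assume a function that returns (x, y, w, h) boxes for
# the main instructor and the screen, or None when a region is not found.
def detect_instructor_and_screen(frame):
    raise NotImplementedError("plug in an object detection model here")

def reframe_lecture(src_path, dst_path, out_size=(1280, 720)):
    """Re-render a lecture video as a screen + instructor side-by-side layout."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(dst_path, fourcc, fps, out_size)

    half_w, h = out_size[0] // 2, out_size[1]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        instructor_box, screen_box = detect_instructor_and_screen(frame)
        canvas = np.zeros((h, out_size[0], 3), dtype=np.uint8)
        if screen_box is not None:
            x, y, w, bh = screen_box
            canvas[:, :half_w] = cv2.resize(frame[y:y + bh, x:x + w], (half_w, h))
        if instructor_box is not None:
            x, y, w, bh = instructor_box
            canvas[:, half_w:] = cv2.resize(frame[y:y + bh, x:x + w], (half_w, h))
        writer.write(canvas)

    cap.release()
    writer.release()
```

Because only the instructor and screen regions are kept at a fixed output resolution, the re-rendered video is typically much smaller than the original full-frame recording.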
There are four main contributions:
1. We built an intelligent video analysis and post-processing system to extract and reframe detected objects from lecture videos.
2. We proposed a post-processing algorithm to localize the frontal human torso position across a sequence of video frames.
3. We proposed a novel deep learning approach to distinguish the main instructor from other instructors or audience members in several complex situations.
4. We proposed an algorithm to extract the four edge points of a screen at the pixel level and correct the screen display in various scenarios (a perspective-correction sketch follows this list).
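For the fourth contribution, once the four screen edge points (i.e., its corners) have been extracted, the display can be rectified with a standard planar homography. The corner-extraction step itself is not reproduced here; the sketch below assumes the corners are already available in top-left, top-right, bottom-right, bottom-left order and uses OpenCV's perspective transform, which may differ from the thesis's exact correction algorithm.

```python
import cv2
import numpy as np

def rectify_screen(frame, corners, out_size=(1280, 720)):
    """Warp the screen quadrilateral defined by `corners` into a fronto-parallel view.

    corners: four (x, y) pixel coordinates of the screen, ordered
             top-left, top-right, bottom-right, bottom-left.
    """
    w, h = out_size
    src = np.asarray(corners, dtype=np.float32)
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]],
                   dtype=np.float32)
    # Homography mapping the detected screen quad onto an upright rectangle.
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame, H, (w, h))
```

In a layout-rendering pipeline such as the one sketched earlier, the rectified screen image could replace the raw screen crop before composing the output frame.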
Identifier | oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:masters_theses_2-2095
Date | 14 May 2021
Creators | Wang, Xi
Publisher | ScholarWorks@UMass Amherst
Source Sets | University of Massachusetts, Amherst
Detected Language | English
Type | text
Format | application/pdf
Source | Masters Theses