Master of Science
Electronic theses and dissertations (ETDs) are structured documents in which chapters are major components. No existing repository provides chapter boundary information alongside these documents. Exposing these boundaries can help make the documents more accessible.
This research explores the manipulation of ETDs marked up using LaTeX to generate chapter boundaries. We use this to create a data set of 1,459 ETDs and their chapter boundaries. Additionally, for the task of automatic segmentation of unseen documents, we prototype three deep learning models that are trained using this data set. We hope to encourage researchers to incorporate LaTeX manipulation techniques to create similar data sets.
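As a rough illustration of how chapter boundaries might be read from LaTeX markup, the sketch below scans a source string for `\chapter` commands and records the line on which each one starts. It is a minimal example under stated assumptions and not the pipeline used in the thesis: the function name `chapter_boundaries` is hypothetical, titles are assumed to contain no nested braces, and boundaries are reported as source line numbers rather than PDF pages.

```python
import re

# Matches \chapter{Title}, \chapter*{Title}, and \chapter[Short]{Title}.
# Assumption: chapter titles contain no nested braces.
CHAPTER_RE = re.compile(r"\\chapter\*?(?:\[[^\]]*\])?\{([^{}]*)\}")

# Matches a LaTeX comment: an unescaped percent sign to the end of the line.
COMMENT_RE = re.compile(r"(?<!\\)%.*")


def chapter_boundaries(latex_source):
    """Return a list of (line_number, title) pairs, one per chapter command.

    Line numbers are 1-indexed positions in the LaTeX source. This is a
    hypothetical helper for illustration only.
    """
    boundaries = []
    for line_number, line in enumerate(latex_source.splitlines(), start=1):
        # Ignore chapter commands that have been commented out.
        code = COMMENT_RE.sub("", line)
        for match in CHAPTER_RE.finditer(code):
            boundaries.append((line_number, match.group(1).strip()))
    return boundaries


if __name__ == "__main__":
    sample = (
        "\\chapter{Introduction}\n"
        "Some text.\n"
        "% \\chapter{Hidden}\n"
        "\\chapter*{Related Work}\n"
        "\\chapter[Short]{Methods}\n"
    )
    for line_number, title in chapter_boundaries(sample):
        print(line_number, title)
```

A real ETD would likely need more than this, for example following `\input` and `\include` files and mapping source positions to pages in the compiled PDF.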
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/113246 |
Date | 18 January 2023 |
Creators | Manzoor, Javaid Akbar |
Contributors | Computer Science and Applications, Fox, Edward A., Wu, Jian, Heath, Lenwood S. |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Thesis |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |