
Improving Automatic Transcription Using Natural Language Processing

Digital Democracy is a CalMatters and California Polytechnic State University initiative to promote transparency in state government by increasing access to the California legislature. While Digital Democracy is made up of many resources, one foundational step of the project is obtaining accurate, timely transcripts of California Senate and Assembly hearings. The information extracted from these transcripts provides crucial data for subsequent steps in the pipeline. In the context of Digital Democracy, upleveling is when humans verify, correct, and annotate the transcript results after the legislative hearings have been automatically transcribed. The upleveling process is done with the assistance of a software application called the Transcription Tool. The human upleveling process is the most costly and time-consuming step of the Digital Democracy pipeline. In this thesis, we hypothesize that we can make significant reductions to the time needed for upleveling by using Natural Language Processing (NLP) systems and techniques. The main contribution of this thesis is engineering a new automatic transcription pipeline. Specifically, this thesis integrates a new automatic speech recognition service, a new speaker diarization model, additional text post-processing changes, and a new process for speaker identification. To evaluate the system's improvements, we measure the accuracy and speed of the newly integrated features and record editor upleveling time both before and after the additions.
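The abstract does not say which accuracy metric the thesis uses for the speech recognition output. Word error rate (WER) is a common choice for comparing an automatic transcript against a human-corrected one, so the following is only a minimal illustrative sketch under that assumption, not the author's evaluation code. It computes WER as (substitutions + deletions + insertions) divided by the number of reference words, using a word-level edit distance; the function name `word_error_rate` is hypothetical, and the only normalization applied is lowercasing and whitespace splitting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N via a word-level Levenshtein distance.

    Hypothetical helper for illustration: normalization is limited to
    lowercasing and splitting on whitespace (punctuation is kept).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev is the previous row of the edit-distance table; row 0 is the
    # cost of turning an empty reference into each prefix of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "The committee will now come to order"
    machine = "the committee will come to order"
    # One deleted word out of seven reference words.
    print(word_error_rate(human, machine))
```

A lower WER on the same hearings would indicate fewer words for editors to fix during upleveling, although WER alone does not capture diarization or speaker-identification errors.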

Identifier: oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-4454
Date: 01 March 2024
Creators: Kiefer, Anna
Publisher: DigitalCommons@CalPoly
Source Sets: California Polytechnic State University
Detected Language: English
Type: text
Format: application/pdf
Source: Master's Theses
