
Strainer: State Transcript Rating for Informed News Entity Retrieval

Over the past two decades, public oversight of state and local governments has declined rapidly. From 2003 to 2014, the number of journalists assigned to cover state house proceedings fell by more than 30%. During the same period, non-profit projects such as Digital Democracy sought to collect and store legislative bill and hearing information on behalf of the public. More recently, AI4Reporters, an offshoot of Digital Democracy, has sought to actively summarize interesting legislative data.
This thesis presents STRAINER, a project parallel to AI4Reporters, as an active data retrieval and filtering system for surfacing newsworthy legislative data. Within STRAINER we define and implement a processing pipeline that collects information about legislative bill discussion events from a variety of sources and aggregates it into feature sets suitable for machine learning. Using two independent labeling techniques, we trained a variety of SVM and logistic regression models to predict the newsworthiness of bill discussions that took place in the California State Legislature during the 2017-2018 session. We found that our models correctly retrieved more than 80% of newsworthy discussions.
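The share of newsworthy discussions that a model retrieves is commonly measured as recall: true positives divided by all actual positives. The abstract does not name the metric, so reading the reported figure as recall is an assumption. The sketch below shows that calculation only; the `recall` helper and the ten labels are hypothetical illustrations, not code or data from the thesis.

```python
def recall(y_true, y_pred):
    """Fraction of truly newsworthy items (label 1) that were also predicted newsworthy."""
    # True positives: items labeled newsworthy that the model also flagged.
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # All items that are actually newsworthy.
    actual_pos = sum(1 for t in y_true if t == 1)
    if actual_pos == 0:
        raise ValueError("recall is undefined when there are no positive labels")
    return true_pos / actual_pos


# Hypothetical labels for ten bill discussions (1 = newsworthy); not thesis data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

print(recall(y_true, y_pred))  # 4 of the 5 newsworthy items are retrieved: 0.8
```

Recall ignores false positives, so a filter tuned for high recall may still surface discussions that are not newsworthy; precision would be needed to quantify that side.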

Identifier: oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-4127
Date: 01 June 2022
Creators: Gerrity, Thomas M
Publisher: DigitalCommons@CalPoly
Source Sets: California Polytechnic State University
Detected Language: English
Type: text
Format: application/pdf
Source: Master's Theses
