In the era of total digitization of documents, navigating vast and heterogeneous data landscapes presents significant challenges for effective information retrieval, both for humans and digital agents. Traditional methods of knowledge organization often struggle to keep pace with evolving user demands, resulting in suboptimal outcomes such as information overload and disorganized data. This thesis presents a case study on a pipeline that leverages principles from cognitive science, graph theory, and semantic computing to generate semantically organized knowledge graphs. By evaluating a combination of different models, methodologies, and algorithms, the pipeline aims to enhance the organization and retrieval of digital documents. The proposed approach focuses on representing documents as vector embeddings, clustering similar documents, and constructing a connected and scalable knowledge graph. This graph not only captures semantic relationships between documents but also ensures efficient traversal and exploration. The practical application of the system is demonstrated in the context of digital libraries and academic research, showcasing its potential to improve information management and discovery. The effectiveness of the pipeline is validated through extensive experiments using contemporary open-source tools.
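The three stages named in the abstract (embedding documents as vectors, clustering similar documents, and building a connected graph) can be illustrated with a minimal sketch. The abstract does not say which models or algorithms the thesis uses, so everything below is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `cluster` is a simple greedy threshold scheme, and `build_graph` links cluster members to a seed document and chains the seeds so the graph stays connected. All function names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.3):
    """Greedy clustering: a document joins the first cluster whose seed
    (its first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def build_graph(clusters):
    """Link every member to its cluster seed, then chain the seeds together
    so that the whole graph forms a single connected component."""
    edges = set()
    for members in clusters:
        seed = members[0]
        for member in members[1:]:
            edges.add((seed, member))
    seeds = [members[0] for members in clusters]
    for a, b in zip(seeds, seeds[1:]):
        edges.add((a, b))
    return edges


def is_connected(n, edges):
    """Check by depth-first search that all n nodes are reachable from node 0."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n


# Hypothetical sample documents, not taken from the thesis.
docs = [
    "graph theory and graph traversal",
    "graph traversal algorithms",
    "cognitive science of memory",
    "memory and cognitive models",
]
vectors = [embed(d) for d in docs]
clusters = cluster(vectors)
edges = build_graph(clusters)
```

In this sketch the seed chain is what guarantees connectivity, which is the property the abstract emphasizes for traversal. A real pipeline would presumably replace `embed` with a learned embedding model and `cluster` with a more robust algorithm.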
Identifier | oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-4495 |
Date | 01 June 2024 |
Creators | Luu, Erik E |
Publisher | DigitalCommons@CalPoly |
Source Sets | California Polytechnic State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Master's Theses |