Global ETD Search

Return to search

<b>PROBABILISTIC ENSEMBLE MACHINE LEARNING APPROACHES FOR UNSTRUCTURED TEXTUAL DATA CLASSIFICATION</b>

<p dir="ltr">The volume of big data has surged, notably in unstructured textual data, comprising emails, social media, and more. Currently, unstructured data represents over 80% of global data, the growth is propelled by digitalization. Unstructured text data analysis is crucial for various applications like social media sentiment analysis, customer feedback interpretation, and medical records classification. The complexity is due to the variability in language use, context sensitivity, and the nuanced meanings that are expressed in natural language. Traditional machine learning approaches, while effective in handling structured data, frequently fall short when applied to unstructured text data due to the complexities. Extracting value from this data requires advanced analytics and machine learning. Recognizing the challenges, we developed innovative ensemble approaches that combine the strengths of multiple conventional machine learning classifiers through a probabilistic approach. Response to the challenges , we developed two novel models: the Consensus-Based Integration Model (CBIM) and the Unified Predictive Averaging Model (UPAM).The CBIM and UPAM ensemble models were applied to Twitter (40,000 data samples) and the National Electronic Injury Surveillance System (NEISS) datasets (323,344 data samples) addressing various challenges in unstructured text analysis. The NEISS dataset achieved an unprecedented accuracy of 99.50%, demonstrating the effectiveness of ensemble models in extracting relevant features and making accurate predictions. The Twitter dataset, utilized for sentiment analysis, demonstrated a significant boost in accuracy over conventional approaches, achieving a maximum of 65.83%. The results highlighted the limitations of conventional machine learning approaches when dealing with complex, unstructured text data and the potential of ensemble models. The models exhibited high accuracy across various datasets and tasks, showcasing their versatility and effectiveness in obtaining valuable insights from unstructured text data. The results obtained extend the boundaries of text analysis and improve the field of natural language processing.</p>

10.25394/pgs.25669425.v1

probablistic approach

Machine Learning Ensemble Method

ensemble approaches

unstructured text data

Injury classification

sentiment analyis

Identifer	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/25669425
Date	26 April 2024
Creators	Srushti Sandeep Vichare (17277901)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Rights	CC BY 4.0
Relation	https://figshare.com/articles/thesis/_b_PROBABILISTIC_ENSEMBLE_MACHINE_LEARNING_APPROACHES_FOR_UNSTRUCTURED_TEXTUAL_DATA_CLASSIFICATION_b_/25669425

Page generated in 0.002 seconds

<b>PROBABILISTIC ENSEMBLE MACHINE LEARNING APPROACHES FOR UNSTRUCTURED TEXTUAL DATA CLASSIFICATION</b>

Description

Links & Downloads

Tags

Additional Fields