
Using statistical learning to predict survival of passengers on the RMS Titanic

Master of Science / Statistics / Christopher Vahl / Predictive analytics techniques have proven effective for exploring data. In this report, the efficiency of several predictive analytics methods is explored. At the time of this study, Kaggle.com, a data science competition website, hosted the predictive modeling competition "Titanic: Machine Learning from Disaster". This competition posed a classification problem: build a model to predict the survival of passengers on the RMS Titanic. The focus of our approach was on applying a traditional classification and regression tree algorithm. The algorithm is greedy and can overfit the training data, which can consequently yield suboptimal prediction accuracy. To correct these issues with the classification and regression tree algorithm, we implemented cost complexity pruning and ensemble methods such as bagging and random forests. However, no improvement was observed here, which may be an artifact of the Titanic data and may not be representative of those methods’ performance in general. The decision trees and prediction accuracy of each method are presented and compared. Results indicate that the predictors sex/title, fare price, age, and passenger class are the most important variables in predicting survival of the passengers.
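The greedy behavior mentioned in the abstract can be illustrated with a minimal sketch of a single tree-growing step: choosing the one binary split that most reduces Gini impurity, without looking ahead. This is not the report's code. The function names `gini` and `best_split` are hypothetical, Gini impurity is assumed as the splitting criterion, and the six rows are made-up toy values, not the Kaggle Titanic data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """One greedy step: return (feature_index, threshold, weighted_gini)
    for the split `row[feature_index] <= threshold` with the lowest
    weighted impurity, or None if no split separates the rows."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        # Try every observed value of feature j as a candidate threshold.
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue  # this threshold does not split the node
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Made-up toy rows (assumption, for illustration only):
# each row is (is_female, passenger_class); label 1 means "survived".
rows = [(1, 1), (1, 2), (1, 3), (0, 1), (0, 2), (0, 3)]
labels = [1, 1, 1, 0, 0, 0]

split = best_split(rows, labels)
print(split)
```

A full tree repeats this step on each resulting child node. Because every split is chosen only for its immediate impurity reduction, a deep tree can fit noise in the training data, which is the motivation for pruning and for ensembles such as bagging and random forests.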

Identifier oai:union.ndltd.org:KSU/oai:krex.k-state.edu:2097/20541
Date January 1900
Creators Whitley, Michael Aaron
Publisher Kansas State University
Source Sets K-State Research Exchange
Language en_US
Detected Language English
Type Report
