It has been 111 years since the horrific and now famous tragedy that shocked the world. The RMS Titanic sank in the early morning hours of 15 April 1912 in the North Atlantic Ocean, four days into her maiden voyage from Southampton, UK, to New York City, USA.
For this anniversary, we at The Stats People decided to take part in a Kaggle data analysis competition. We were given a random sample of people who were passengers or crew members on the Titanic. Knowing just a few details about each person, we analysed the sample and built a predictive model using regression methods to estimate each passenger’s chance of surviving the 1912 tragedy.
The important details:
An estimated 1,517 people were killed in the sinking of the Titanic (832 passengers and 685 crew members), out of approximately 2,226 people on board. Our initial sample contained 891 people, and the information on each of them allowed us to train and test the model.
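As a quick check on the figures above, the short Python sketch below adds up the two death counts and works out the overall death and survival rates on board. The variable names are ours, chosen for illustration; the numbers are the estimates quoted in the text.

```python
# Estimates quoted in the text above.
deaths_passengers = 832
deaths_crew = 685
on_board = 2226

# The two groups should add up to the quoted total of 1,517.
deaths_total = deaths_passengers + deaths_crew

# Share of everyone on board who died, and the complementary survival rate.
death_rate = deaths_total / on_board
survival_rate = 1 - death_rate

print(f"{deaths_total} deaths, death rate {death_rate:.1%}, "
      f"survival rate {survival_rate:.1%}")
# prints "1517 deaths, death rate 68.1%, survival rate 31.9%"
```

A survival rate of roughly one in three is a useful baseline: a model that simply predicted "did not survive" for everyone would already be right about two thirds of the time.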
Kaggle provided only a limited amount of information about the people in our sample. The attributes we were given were:
- Social Class
- Boarding location
- Number of siblings/spouses and parents/children travelling with a passenger
- Flag of survival
- Ticket number
For our analysis, we built two models: a simple model in which each predictor has a consistent effect, and a more complex model in which the effect of each predictor can vary with the values of the other predictors. The simple model summarises the average impact of each predictor, while the complex one significantly improves prediction. Before fitting these models, we had to impute missing values for some potential predictors (although a number of these were later found to be redundant). By using all the observed variables, we were able to predict these missing values accurately.
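To make the difference between the two kinds of model concrete, here is a minimal sketch in plain Python. It is not our actual analysis: the data are made up, the two binary predictors (`first_class` and `with_family`) are hypothetical stand-ins, and the model is an ordinary logistic regression fitted by simple gradient descent, which we assume here only to keep the example self-contained. The "simple" model uses the two predictors on their own; the "complex" model adds their product, so the effect of one predictor can depend on the other.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # Predicted survival probability; w[0] is the intercept.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=1.0, epochs=1000):
    # Logistic regression fitted by batch gradient descent on the mean log-loss.
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, yi in zip(X, y):
            err = predict(w, x) - yi
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def log_loss(w, X, y):
    # Mean negative log-likelihood; lower means a better fit.
    total = 0.0
    for x, yi in zip(X, y):
        p = min(max(predict(w, x), 1e-12), 1 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

# Made-up data (NOT the Kaggle sample):
# (first_class, with_family) -> (number of people, number of survivors)
cells = {(0, 0): (50, 10), (0, 1): (50, 15), (1, 0): (50, 20), (1, 1): (50, 45)}
rows, survived = [], []
for (fc, fam), (n_people, n_survived) in cells.items():
    for i in range(n_people):
        rows.append((fc, fam))
        survived.append(1 if i < n_survived else 0)

# Simple model: each predictor has one consistent effect.
X_simple = [[fc, fam] for fc, fam in rows]
# Complex model: the product term lets one effect vary with the other predictor.
X_complex = [[fc, fam, fc * fam] for fc, fam in rows]

w_simple = fit_logistic(X_simple, survived)
w_complex = fit_logistic(X_complex, survived)

loss_simple = log_loss(w_simple, X_simple, survived)
loss_complex = log_loss(w_complex, X_complex, survived)

# Predicted survival for first-class people travelling with family
# (45 of 50 survive in the made-up data).
p_simple = predict(w_simple, [1, 1])
p_complex = predict(w_complex, [1, 1, 1])

print(f"simple model:  log-loss {loss_simple:.3f}, P(survive | 1, 1) = {p_simple:.2f}")
print(f"complex model: log-loss {loss_complex:.3f}, P(survive | 1, 1) = {p_complex:.2f}")
```

In this toy data the complex model fits better because survival in the last group is much higher than the two separate effects would suggest. Note that the comparison here is made on the same data used for fitting, which always favours the larger model; a fair comparison would use held-out data.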
The following infographic reveals our findings. We hope you will find it as interesting as we did!
Reference: Facts about Titanic