Classification
Assignment
Predicting
the survival of a passenger on the Titanic
1. See
the attached dataset that contains 891 observations randomly selected from
among the roughly 2200 passengers on board. Roughly 32% were rescued.
2. Perform
a multiple regression with “Survived” as the dependent, and Age, Class
(socio-economic class, where 1=upper, 2=middle and 3=poor), Sex and Siblings as
the independent variables. This has already been done – you can compare your
output to this to confirm that you got the same results. If you do not have access
to a regression package, then use the output provided in the worksheet.
3. Copy
the predicted values from the residual output to a new worksheet, along with
the original values of Survived next to them.
4. Sort
both columns in descending order of the predicted values.
5. Create
a Vlookup table with bins (groups) for the predicted
values.
6. Use
the Vlookup function to assign a bin value to each
predicted value in the column next to it.
7. Create
a pivot table to see how the original y values are spread across the prediction
bins.
8. Sort
the predicted score groups in descending order.
9. Create
a cumulative percentage table like the one discussed in class.
Answer
the following based on the table created in part 9 above:
1. Interpret
any one number in each of the cumulative percent columns.
2. Interpret
any one number in each of the last two columns.
Overall, how good of a model would you say this is?
What cutoff score would you use to predict that a person survived?