Classification Assignment

Predicting the survival of a passenger on the Titanic

 

1.      The attached dataset contains 891 observations randomly selected from among the roughly 2200 passengers on board. Roughly 32% of those on board were rescued.

2.      Perform a multiple regression with “Survived” as the dependent variable, and Age, Class (socio-economic class, where 1=upper, 2=middle and 3=poor), Sex and Siblings as the independent variables. This has already been done, so you can compare your output with the output provided in the worksheet to confirm that you got the same results. If you do not have access to a regression package, use the provided output instead.

3.      Copy the predicted values from the residual output to a new worksheet, along with the original values of Survived next to them.

4.      Sort both columns in descending order of the predicted values.

5.      Create a VLOOKUP table with bins (groups) for the predicted values.

6.      Use the VLOOKUP function to assign a bin value to each predicted value in the column next to it.

7.      Create a pivot table to see how the original Survived values are spread across the prediction bins.

8.      Sort the predicted score groups in descending order.

9.      Create a cumulative percentage table like the one discussed in class.
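
For anyone who wants to check their spreadsheet work outside Excel, steps 4 through 9 can be sketched in plain Python. This is only an illustration under stated assumptions: the (predicted, Survived) pairs are made up and do not come from the attached dataset, the bin floors are an arbitrary choice, and the column names in the cumulative table are hypothetical, so they may not match the table discussed in class. The bin lookup mimics an approximate-match VLOOKUP by returning the largest bin floor that does not exceed the predicted value.

```python
from bisect import bisect_right
from collections import defaultdict

# Made-up (predicted value, actual Survived) pairs for illustration only;
# they are NOT taken from the assignment's dataset.
rows = [
    (0.91, 1), (0.85, 1), (0.78, 1), (0.66, 0), (0.62, 1), (0.48, 0),
    (0.41, 1), (0.33, 0), (0.27, 0), (0.15, 0), (0.08, 0), (-0.05, 0),
]

# Step 4: sort by predicted value, highest first.
rows.sort(key=lambda r: r[0], reverse=True)

# Step 5: lookup table of bin floors (an assumed choice), kept in
# ascending order as an approximate-match lookup requires.
BIN_FLOORS = [0.0, 0.2, 0.4, 0.6, 0.8]


def assign_bin(pred):
    """Step 6: return the largest bin floor <= pred.

    A linear regression can predict values below 0; those are placed in
    the lowest bin here (a spreadsheet lookup would return an error).
    """
    i = bisect_right(BIN_FLOORS, pred) - 1
    return BIN_FLOORS[max(i, 0)]


# Step 7: pivot-style counts, bin -> [count with Survived=0, count with Survived=1].
counts = defaultdict(lambda: [0, 0])
for pred, survived in rows:
    counts[assign_bin(pred)][survived] += 1


def cumulative_table(counts):
    """Steps 8-9: bins in descending order with cumulative percentages."""
    total_died = sum(c[0] for c in counts.values())
    total_surv = sum(c[1] for c in counts.values())
    table, cum_died, cum_surv = [], 0, 0
    for floor in sorted(counts, reverse=True):
        died, surv = counts[floor]
        cum_died += died
        cum_surv += surv
        table.append({
            "bin": floor,
            "survived": surv,
            "died": died,
            "cum_pct_survived": 100 * cum_surv / total_surv,
            "cum_pct_died": 100 * cum_died / total_died,
        })
    return table


table = cumulative_table(counts)
for row in table:
    print(row)
```

Each row of the resulting table answers the question "if every passenger scoring at or above this bin were predicted to survive, what share of actual survivors and of actual non-survivors would be captured?"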

 

Answer the following based on the table created in part 9 above:

Pick any score cutoff value (any row of the table):

1.      Interpret any one number in each of the cumulative percent columns.

2.      Interpret any one number in each of the last two columns.

Overall, how good a model would you say this is? What cutoff score would you use to predict that a person survived?
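
One possible way to compare candidate cutoffs is to count how many passengers would be classified correctly or incorrectly at each one. The sketch below is a minimal illustration in plain Python, again using made-up (predicted, Survived) pairs rather than the assignment's data. The function name `confusion_at_cutoff` and the rule "predicted value at or above the cutoff means survived" are assumptions for the example, not a prescribed method.

```python
# Made-up (predicted value, actual Survived) pairs for illustration only;
# they are NOT taken from the assignment's dataset.
rows = [
    (0.91, 1), (0.85, 1), (0.78, 1), (0.66, 0), (0.62, 1), (0.48, 0),
    (0.41, 1), (0.33, 0), (0.27, 0), (0.15, 0), (0.08, 0), (-0.05, 0),
]


def confusion_at_cutoff(rows, cutoff):
    """Count outcomes when 'predicted >= cutoff' is read as 'survived'."""
    tp = sum(1 for p, y in rows if p >= cutoff and y == 1)  # predicted survived, did survive
    fp = sum(1 for p, y in rows if p >= cutoff and y == 0)  # predicted survived, did not
    fn = sum(1 for p, y in rows if p < cutoff and y == 1)   # predicted not, but survived
    tn = sum(1 for p, y in rows if p < cutoff and y == 0)   # predicted not, did not survive
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(rows)}


# Compare a few candidate cutoffs side by side.
for cutoff in (0.4, 0.6, 0.8):
    print(cutoff, confusion_at_cutoff(rows, cutoff))
```

Accuracy is only one criterion. A lower cutoff captures more actual survivors at the cost of more false alarms, so the preferred cutoff depends on which kind of error matters more.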