PMBA 8040 Project Guidelines

  1. Think of something that you may want to predict or classify in your business (the business of one of the group members). If you can find data for that variable and potential predictor variables, that is ideal. If not, use a dataset available online. A couple of sources are:
    1. UCI Machine Learning Repository
    2. Kaggle

 

  1. Find a suitable dependent variable to predict from the dataset you pick. If you pick a categorical one, make sure you use only two categories, since we have not discussed how to build a model to predict a dependent variable with 3 or more categories. Remember that independent variables can be of the multi-category type: you simply make as many 0/1 columns as the number of categories minus 1.

 

  1. Build a prediction or classification model and interpret it. If you can also incorporate segmentation (with Cluster Analysis, or otherwise with a categorical variable), that is good, but not required.
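
To illustrate the 0/1 columns for a multi-category independent variable (number of categories minus 1), here is a minimal Python sketch. The function name `dummy_code`, the `is_` column prefix, and the region values are made up for this example; whatever tool you use may produce the same columns differently.

```python
def dummy_code(values, baseline=None):
    """Turn one multi-category column into (k - 1) columns of 0/1.

    The baseline category gets no column of its own: a row in the
    baseline category is the one where every 0/1 column equals 0.
    """
    categories = sorted(set(values))
    if baseline is None:
        baseline = categories[0]  # assumption: first category alphabetically
    kept = [c for c in categories if c != baseline]
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical independent variable with 3 categories -> 2 dummy columns.
region = ["East", "West", "South", "East", "West"]
dummies = dummy_code(region, baseline="East")

for name, column in dummies.items():
    print(name, column)
```

With three regions and "East" as the baseline, only two columns are created (`is_South` and `is_West`). Each coefficient on a dummy column is then read as the difference from the baseline category.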

 

Written Report Guidelines

 

  1. Introduction – what is the goal of the project?

 

  1. Data
    1. Source, variables (put data dictionary in appendix)
    2. Sample Size
    3. Dependent Variable, Outcome Period, Sample Time Frame. Note that the outcome period can be 0 if the prediction is for right now rather than the future. For example, predicting a home price based on area, number of bedrooms, etc., would have an outcome period of 0, since the predicted price is estimated for the current moment.
    4. Data Preparation, if any was needed (aggregation, variable creation, data cleaning)

 

  1. Methodology
    1. Preliminary Analysis – compute Means and Standard Deviations of each variable in the dataset so you get a sense of what the data look like.
    2. Show scatterplots of each independent variable against the dependent variable (if the dependent variable is categorical, use a pivot table instead to show how its categories are spread across the independent variable values).
    3. Run a regression to predict or classify.
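
The preliminary analysis steps above (means, standard deviations, and a pivot-style table for a categorical dependent variable) can be sketched in Python as follows. This is only an illustration under stated assumptions: the rows, the column names (`income`, `region`, `bought`), and the helper names `summarize` and `pivot_counts` are hypothetical and do not come from any real dataset.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical toy data: one row per customer (not from a real dataset).
rows = [
    {"income": 40, "region": "East", "bought": 0},
    {"income": 55, "region": "West", "bought": 0},
    {"income": 70, "region": "East", "bought": 1},
    {"income": 85, "region": "West", "bought": 1},
    {"income": 60, "region": "East", "bought": 1},
    {"income": 50, "region": "West", "bought": 0},
]


def summarize(rows, column):
    """Return (mean, sample standard deviation) of one numeric column."""
    values = [row[column] for row in rows]
    return mean(values), stdev(values)


def pivot_counts(rows, predictor, dependent):
    """Count rows for each (predictor value, dependent value) pair."""
    return Counter((row[predictor], row[dependent]) for row in rows)


avg, sd = summarize(rows, "income")
print(f"income: mean={avg:.1f}, sd={sd:.1f}")

# Pivot-style table: how the two 'bought' categories spread across regions.
table = pivot_counts(rows, "region", "bought")
for (region, bought), n in sorted(table.items()):
    print(f"region={region} bought={bought} count={n}")
```

The counts play the role of the pivot table: they show at a glance whether one category of the dependent variable is concentrated in particular values of an independent variable.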

 

  1. Results
    1. Show scorecard (results of final regression in plain English – write out the model and interpret it)
    2. Put the actual regression output in the appendix only.
    3. Evaluate the model using R-square and the standard error (SE) or, for a classification model, using the tabulation of classification results by score group.
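
For reference, here is a minimal sketch of how R-square and SE can be computed for a regression with a single predictor. It assumes ordinary least squares and takes SE to mean the standard error of the regression (residual standard error); the `area` and `price` numbers and the function names are hypothetical. Your regression tool reports the same quantities, so this is only meant to show what they measure.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_square_and_se(x, y, intercept, slope):
    """R-square and standard error of the regression for a fitted line."""
    n = len(x)
    mean_y = sum(y) / n
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1 - sse / sst
    se = sqrt(sse / (n - 2))  # n - 2: intercept and one slope were estimated
    return r_square, se


# Hypothetical data: home area and price in arbitrary units.
area = [1, 2, 3, 4, 5]
price = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(area, price)
r_square, se = r_square_and_se(area, price, intercept, slope)
print(f"price = {intercept:.2f} + {slope:.2f} * area")
print(f"R-square = {r_square:.3f}, SE = {se:.3f}")
```

R-square is the share of the variation in the dependent variable that the model explains, while SE is the typical size of a prediction error in the units of the dependent variable, which is often the easier number to explain to a client.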

 

  1. Implementation
    1. Discuss how the client should implement your model. For a classification model, state which cutoff scores you recommend and which strategies or decisions go with each cutoff.
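
One way to support a cutoff recommendation is to tabulate the classification results at a few candidate cutoffs and compare them. The sketch below assumes each record already has a model score where a higher score means the record is more likely to be in category 1; the scores, the actual outcomes, the candidate cutoffs, and the function name `classify_at_cutoff` are all hypothetical.

```python
def classify_at_cutoff(scores, actuals, cutoff):
    """Tabulate predicted vs. actual categories for one cutoff score."""
    table = {"true_pos": 0, "false_pos": 0, "false_neg": 0, "true_neg": 0}
    for score, actual in zip(scores, actuals):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and actual == 1:
            table["true_pos"] += 1
        elif predicted == 1 and actual == 0:
            table["false_pos"] += 1
        elif predicted == 0 and actual == 1:
            table["false_neg"] += 1
        else:
            table["true_neg"] += 1
    return table


# Hypothetical model scores and actual outcomes (1 = event happened).
scores = [0.1, 0.3, 0.45, 0.6, 0.8, 0.9]
actuals = [0, 0, 1, 0, 1, 1]

for cutoff in (0.3, 0.5, 0.7):
    result = classify_at_cutoff(scores, actuals, cutoff)
    correct = result["true_pos"] + result["true_neg"]
    print(cutoff, result, f"correct={correct}/{len(scores)}")
```

A lower cutoff catches more actual 1s at the price of more false alarms, and a higher cutoff does the reverse. Which trade-off is right depends on the cost of each kind of mistake for the client, and that is the discussion this section of the report should contain.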

 

 

Oral Presentation

 

Present the project as you might to a client. As with the written report, start with an introduction (even though the client already knows what their problem is, discuss it anyway), then cover the data, the results of the analysis, and your recommendation.