PMBA 8040 Project Guidelines
- Think of something that you may want to predict or classify in your
  business (the business of one of the group members). If you can find
  data for that variable and potential predictor variables, that is ideal.
  If not, use a dataset available online. A couple of sources are:
  - UCI Machine Learning Repository
  - Kaggle
- Find a suitable dependent variable to predict from the dataset you pick.
  If you pick a categorical one, make sure you only use two categories. We
  have not discussed how to build a model to predict anything with 3 or
  more categories in the dependent variable. Remember that you can have
  independent variables that are of the multi-category type: you simply
  have to make as many columns of 0/1 as the number of categories minus 1
  (for example, a variable with 4 categories needs 3 columns).
- Build a prediction or classification model and interpret it. If you can
  also incorporate segmentation (with Cluster Analysis or otherwise with a
  categorical variable), that is good, but not required.
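
As an illustration of the 0/1 coding described above, here is a minimal Python sketch. It is not part of the course materials: the function name `dummy_code`, the `baseline` parameter, and the `regions` data are all made up for the example. One category is treated as the baseline and gets no column, so k categories produce k - 1 columns.

```python
def dummy_code(values, baseline=None):
    """Return (column_names, rows) with k - 1 indicator columns for k categories.

    Hypothetical helper for illustration; the baseline category is the one
    that gets no column of its own (all of its indicator columns are 0).
    """
    categories = sorted(set(values))
    if baseline is None:
        baseline = categories[0]
    # Keep every category except the baseline: k - 1 columns in total.
    kept = [c for c in categories if c != baseline]
    # One row of 0/1 flags per observation.
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Made-up example data: a predictor with 3 categories.
regions = ["East", "West", "North", "East", "North"]
columns, rows = dummy_code(regions, baseline="East")

print(columns)
for region, row in zip(regions, rows):
    print(region, row)
```

An observation in the baseline category ("East" here) has 0 in every column, so each regression coefficient on a 0/1 column is read as the difference from the baseline category.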
Written Report Guidelines
- Introduction – what is the goal of the project?
- Data
  - Source, variables (put data dictionary in appendix)
  - Sample Size
  - Dependent Variable, Outcome period, Sample Time Frame (be aware that
    the outcome period can be 0 if the prediction is for right now rather
    than the future. For example, predicting a home price based on area,
    number of bedrooms, etc., would have an outcome period of 0 since the
    predicted price is estimated for the current moment).
  - Data Preparation, if any was needed (aggregation, variable creation,
    data cleaning)
- Methodology
  - Preliminary Analysis – compute Means and Standard Deviations of each
    variable in the dataset so you get a sense of what the data look like.
  - Show scatterplots of each independent variable against the dependent
    variable (if the dependent variable is categorical, then do a pivot
    table instead, to show how the categories are spread across the
    independent variable values).
  - Do a Regression to predict or classify
- Results
  - Show scorecard (results of final regression in plain English – write
    out the model and interpret it)
  - (put actual regression results only in appendix)
  - Evaluate using R-square and SE, or using the tabulation of
    classification results by score group.
- Implementation
  - Discuss how the client should implement your model. If it is a
    classification model, state what cutoff scores you recommend and what
    strategies/decisions go with the cutoffs.
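
The two evaluation routes named under Results can be sketched in Python as follows. This is only an illustration under stated assumptions: "SE" is taken to mean the standard error of the regression (the square root of the sum of squared errors divided by n minus the number of predictors minus 1), the score-group boundaries are arbitrary, the function names are hypothetical, and all numbers are made up.

```python
import math


def r_square_and_se(actual, predicted, n_predictors):
    """R-square and standard error of the regression (assumed meaning of SE)."""
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in actual)                # total
    r2 = 1 - sse / sst
    se = math.sqrt(sse / (n - n_predictors - 1))
    return r2, se


def tabulate_by_score_group(scores, outcomes, edges):
    """Tabulate 0/1 outcomes within score groups [edges[i], edges[i+1])."""
    table = []
    for low, high in zip(edges[:-1], edges[1:]):
        group = [o for s, o in zip(scores, outcomes) if low <= s < high]
        ones = sum(group)
        rate = ones / len(group) if group else None
        table.append((low, high, len(group), ones, rate))
    return table


# Made-up prediction example: one predictor, five observations.
actual = [10, 12, 15, 19, 24]
predicted = [9, 13, 16, 19, 23]
r2, se = r_square_and_se(actual, predicted, n_predictors=1)
print(f"R-square = {r2:.3f}, SE = {se:.3f}")

# Made-up classification example: model scores and actual 0/1 outcomes.
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
table = tabulate_by_score_group(scores, outcomes, edges=[0.0, 0.33, 0.67, 1.01])
for low, high, count, ones, rate in table:
    print(f"score {low:.2f} to {high:.2f}: n={count}, ones={ones}, rate={rate}")
```

A table like this is what supports a cutoff recommendation: the rate of 1s in each score group shows where the model separates the two categories well enough to attach a different decision to each group.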
Oral Presentation
Present the project as you might to a client. As with the written
report, start with an introduction (even though the client knows what their
problem is, you discuss it anyhow), then data, results of analysis, and
recommendation.