Dr. Satish Nargundkar
Part A: Regression Interpretation
1. A zoologist builds a regression model to predict the height (in feet) at which birds of a certain type build a nest on Oak trees in a forest, based on the type of Oak tree that the nest is on, and the amount of rainfall (in inches) in the month before nest building starts. There are three types of Oak trees the zoologist is interested in – Bur Oak, Pin Oak and White Oak. She creates dummies for Pin Oak and White Oak. Regression coefficients are shown below. Assume all are significant.
R-Square = 0.75
Intercept | 50.0
Rainfall (in) | -1.5
Pin Oak | -7.0
White Oak | 9.0
a. Interpret the R-Square value of 0.75 in the context of this problem.
b. What is the predicted height of the nest on a Pin Oak if there were 4 inches of rain in the month before nest building?
c. Interpret the coefficient -7.0 for Pin Oak.
d. According to the model, on what type of Oak tree would the nests be at the lowest height, assuming rainfall was held constant?
e. If Pin Oak was made the baseline and Bur Oak was a dummy variable, how would the intercept and coefficients change, all else being the same?
New Intercept = ______
Coefficient for Bur Oak = _______
Coefficient for White Oak = _______
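As a rough illustration of how a model with dummy variables turns inputs into a prediction, the sketch below plugs the coefficients from the table in question 1 into the regression equation. The helper name `predict_nest_height` and the example inputs (a White Oak after 2 inches of rain) are assumptions chosen for illustration, and are deliberately different from the case asked about in part b.

```python
# Coefficients from the table in question 1. Bur Oak is the baseline,
# so it has no dummy; its effect is absorbed into the intercept.
COEFFICIENTS = {
    "intercept": 50.0,
    "rainfall": -1.5,   # change in height (ft) per extra inch of rain
    "Pin Oak": -7.0,    # shift relative to Bur Oak
    "White Oak": 9.0,   # shift relative to Bur Oak
}

TREE_TYPES = ("Bur Oak", "Pin Oak", "White Oak")


def predict_nest_height(rainfall_in, tree_type):
    """Hypothetical helper: predicted nest height in feet."""
    if tree_type not in TREE_TYPES:
        raise ValueError(f"unknown tree type: {tree_type}")
    height = COEFFICIENTS["intercept"] + COEFFICIENTS["rainfall"] * rainfall_in
    # The baseline type contributes 0; a dummy-coded type adds its coefficient.
    if tree_type != "Bur Oak":
        height += COEFFICIENTS[tree_type]
    return height


# Illustrative input only: 50.0 - 1.5 * 2 + 9.0
print(predict_nest_height(2.0, "White Oak"))  # 56.0
```

Only one dummy can equal 1 for any given tree, which is why a single lookup is enough to apply the tree-type effect.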
2. Consider the partial data from a regression output below. The regression has 3 independent variables. Fill in the blanks (marked ____) in the tables.
R-squared | 0.75
Standard Error | 10
Observations | ____

 | df | SS | MS | F | Sig-F
Regression | ____ | ____ | ____ | ____ | 1.07E-07
Residual (Error) | ____ | ____ | ____ | |
Total | 28 | 10,000 | | |
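The entries of a regression ANOVA table are tied together by a few standard identities: total df is observations minus 1, regression df is the number of predictors, SS Regression is R-squared times SS Total, each MS is SS divided by df, and F is MS Regression divided by MS Residual. The sketch below encodes those identities. The function name `anova_table` and the input values are hypothetical and are not the values from the table in question 2.

```python
def anova_table(r_squared, ss_total, df_total, num_predictors):
    """Hypothetical helper: derive ANOVA entries from summary values."""
    df_regression = num_predictors
    df_residual = df_total - df_regression
    ss_regression = r_squared * ss_total
    ss_residual = ss_total - ss_regression
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "observations": df_total + 1,
        "df_regression": df_regression,
        "df_residual": df_residual,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "f": ms_regression / ms_residual,
        # The regression standard error is the square root of MS Residual.
        "standard_error": ms_residual ** 0.5,
    }


# Made-up inputs for illustration: R-squared 0.60, SS Total 500,
# total df 20, and 2 predictors.
for name, value in anova_table(0.60, 500.0, 20, 2).items():
    print(name, round(value, 3))
```

A useful consistency check is that the square root of MS Residual should match the reported Standard Error.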
3. What are the null and alternate hypotheses in a simple regression? Write in plain language and mathematically.
4. You have a categorical variable in your data about undergraduate students that identifies their enrollment status as Freshman, Sophomore, Junior, and Senior. Consider the values for the first 6 records in your sample shown below in the table. Fill in as many of the blank columns below as needed (and no more) to code the enrollment Status variable as dummy variables to use in regression. Label the columns and fill in the numeric values for each dummy variable.
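Obs | Status | ____ | ____ | ____ | ____
1 | Freshman | ____ | ____ | ____ | ____
2 | Senior | ____ | ____ | ____ | ____
3 | Senior | ____ | ____ | ____ | ____
4 | Sophomore | ____ | ____ | ____ | ____
5 | Junior | ____ | ____ | ____ | ____
6 | Sophomore | ____ | ____ | ____ | ____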
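The mechanics of dummy coding can be sketched in a few lines: pick one category as the baseline, then create one 0/1 column for each remaining category. The example below uses a made-up variable (shirt size) rather than the enrollment Status data, and the helper name `dummy_code` is hypothetical.

```python
def dummy_code(values, baseline):
    """Hypothetical helper: one 0/1 column per non-baseline category."""
    categories = sorted(set(values))
    if baseline not in categories:
        raise ValueError(f"baseline {baseline!r} not found in the data")
    dummy_levels = [c for c in categories if c != baseline]
    # A record gets 1 in the column matching its category, 0 elsewhere;
    # baseline records therefore have 0 in every column.
    return {
        level: [1 if v == level else 0 for v in values]
        for level in dummy_levels
    }


# Made-up data: three categories, with "Small" chosen as the baseline.
sizes = ["Small", "Large", "Medium", "Small"]
for name, column in dummy_code(sizes, baseline="Small").items():
    print(name, column)
# Large [0, 1, 0, 0]
# Medium [0, 0, 1, 0]
```

Which category serves as the baseline is a modeling choice; it changes how the coefficients are read, not the fit of the model.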
5. What are the assumptions of Linear Regression? For each one, write what one can do if the assumption is violated.
Part B: Working with the Data
Download the BDNF Dataset [You may work either in Excel or SPSS].
A researcher is trying to determine whether exercise affects different people differently. Specifically, when people exercise, there is an increase in a protein called BDNF (brain-derived neurotrophic factor). Is this Increase in BDNF different among males and females, and is it different among different ethnicities?
The dataset provided shows data on Increase in BDNF in the body due to Exercise, for people in the age range of 18-25, along with their Gender and Ethnicity.
1. Create a Histogram of the variable Exercise to look at its distribution. Compute the Mean and Standard Deviation of the variable as well.
2. Create a Pivot table in Excel to analyze the Increase in BDNF by Ethnicity and Gender simultaneously. The table should look roughly as follows, with the Average, Standard Deviation, and the Count values for Increase in BDNF in each of the cells:
Column Labels |
Row Labels | Female | Male | Grand Total
African | ____ | ____ | ____
Asian | ____ | ____ | ____
Caucasian | ____ | ____ | ____
Grand Total | ____ | ____ | ____
Interpret the table: what can you say about the impact of Gender and Ethnicity overall? Does there seem to be an interaction between Gender and Ethnicity? In other words, is the impact of Gender on Increase in BDNF different for different Ethnicities?
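For readers working outside Excel or SPSS, the same kind of two-way summary can be sketched in plain Python. The records below are made up (they are not the BDNF dataset), the group labels are placeholders, and the helper name `summarize_by_group` is hypothetical. `statistics.stdev` computes the sample standard deviation.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(records):
    """Hypothetical helper: average, std dev, and count per (row, column) cell."""
    groups = defaultdict(list)
    for row_label, column_label, value in records:
        groups[(row_label, column_label)].append(value)
    return {
        key: {
            "average": mean(vals),
            # A sample standard deviation needs at least two values.
            "std_dev": stdev(vals) if len(vals) > 1 else None,
            "count": len(vals),
        }
        for key, vals in groups.items()
    }


# Made-up records: (ethnicity placeholder, gender, increase in BDNF)
made_up_records = [
    ("Group A", "Female", 10.0), ("Group A", "Female", 14.0),
    ("Group A", "Male", 8.0), ("Group A", "Male", 12.0),
    ("Group B", "Female", 20.0), ("Group B", "Female", 22.0),
    ("Group B", "Male", 9.0), ("Group B", "Male", 11.0),
]

summary = summarize_by_group(made_up_records)

# If the Female-minus-Male gap differs a lot between rows, that suggests
# an interaction between the two factors.
gender_gap = {
    group: summary[(group, "Female")]["average"] - summary[(group, "Male")]["average"]
    for group in ("Group A", "Group B")
}
print(gender_gap)  # {'Group A': 2.0, 'Group B': 11.0}
```

In this made-up data the gap is much larger in one group than in the other, which is the pattern to look for when judging a possible interaction from cell averages.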
3. Create a scatterplot of Increase in BDNF and Exercise. Make sure Increase in BDNF is on the Y axis. Interpret.
4. Perform a multiple regression analysis using the dataset provided (use all the Xs available for the regression), to predict Increase in BDNF in the body. For Gender, code Male as 1 and Female as 0 (baseline). For Ethnicity, create two dummies, one for Asian and one for African (keep Caucasian as the baseline).
Note that in Excel, all the Xs that you will use must be in contiguous columns.
5. Is the regression significant overall? What does that mean?
6. Is each of the variables in the model significant at the 5% level? Interpret.
7. What is the R-squared value? What does it mean?
Now run the regression again after removing variables that are not significant at the 5% level. (Remember that you may have to move data in the worksheet so that the variables left in the regression are all in contiguous columns.)
8. What is the final equation to predict BDNF increase? What would your model predict as the increase in BDNF for a person who exercises 20 minutes, and is a Caucasian Female?
9. Interpret the coefficients for Asian and African.
10. If you want to improve the R-square value of the model, is increasing the sample size the best way? Why or why not?
Part C: Your group research project
As you complete this assignment, remember to think about how this can help you with your group research project idea that you need to present in April. You need a research idea, a brief literature review, a model you wish to test, and a methods section that says how you plan to collect the data and analyze it quantitatively. You can use the same idea that you use for the qualitative methods class if you wish. However, it must be reasonably modified in approach so you can discuss how the same question may be answered with quantitative analysis.