EMBA 8150 Regression Assignment

 

A researcher is trying to determine if exercise affects different people differently. Specifically, when people exercise, there is an increase in a protein called BDNF (brain-derived neurotrophic factor). Is this Increase in BDNF different for males and females, and does it differ across ethnicities?

 

The dataset provided records the Increase in BDNF in the body due to Exercise for people aged 18-25, along with their Gender and Ethnicity.

 

1.      Create a Histogram of the variable Exercise to look at its distribution. Compute the Mean and Standard Deviation of the variable as well.
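For readers who want to cross-check the Excel output, the same summary can be sketched in Python. This is only an illustration under stated assumptions: the `exercise` values are made up (they are not the assignment data), Exercise is assumed to be measured in minutes, and the bin width of 10 is an arbitrary choice.

```python
import statistics
from collections import Counter

# Hypothetical Exercise values in minutes (illustrative only, NOT the assignment data).
exercise = [10, 12, 15, 18, 20, 22, 25, 25, 30, 33]

mean = statistics.mean(exercise)
sd = statistics.stdev(exercise)  # sample SD (divides by n - 1), like Excel's STDEV.S

# Assumed bin width; map each value to the lower edge of its bin (18 -> 10, 25 -> 20).
bin_width = 10
counts = Counter((x // bin_width) * bin_width for x in exercise)

# Text histogram: one '#' per observation in each bin.
for lower in sorted(counts):
    print(f"{lower:>3}-{lower + bin_width - 1:<3} {'#' * counts[lower]}")
print(f"mean = {mean:.2f}, sd = {sd:.2f}")
```

The shape of the bars (symmetric, skewed, multi-peaked) is what the histogram question asks you to describe; the mean and standard deviation summarize center and spread.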

 

2.      Create a Pivot table in Excel to analyze the Increase in BDNF by Ethnicity and Gender simultaneously. The table should look as follows, with the Average, Standard Deviation, and the Count values for Increase in BDNF in each of the cells:

 

                 Column Labels
Row Labels       Female        Male          Grand Total
African
Asian
Caucasian
Grand Total

 

Interpret the table – what can you say about the impact of Gender and Ethnicity overall? Does there seem to be an interaction between Gender and Ethnicity? In other words, is the impact of Gender on Increase in BDNF different for different Ethnicities?
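One way to think about the interaction question is to compare the Male minus Female gap in average Increase in BDNF across ethnicities. The sketch below mimics the pivot table in plain Python. The rows are invented for illustration (they are not the assignment data), only two ethnicities are shown to keep it short, and `gender_gap` is a hypothetical helper name.

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (ethnicity, gender, increase in BDNF). Illustrative only.
rows = [
    ("African", "Female", 4.0), ("African", "Female", 6.0),
    ("African", "Male", 7.0), ("African", "Male", 9.0),
    ("Asian", "Female", 3.0), ("Asian", "Female", 5.0),
    ("Asian", "Male", 4.0), ("Asian", "Male", 6.0),
]

# Group the BDNF values by (ethnicity, gender), one list per pivot-table cell.
cells = defaultdict(list)
for ethnicity, gender, bdnf in rows:
    cells[(ethnicity, gender)].append(bdnf)

# Each cell holds (average, sample standard deviation, count).
summary = {
    key: (statistics.mean(values), statistics.stdev(values), len(values))
    for key, values in cells.items()
}

def gender_gap(ethnicity):
    """Male average minus Female average within one ethnicity."""
    return summary[(ethnicity, "Male")][0] - summary[(ethnicity, "Female")][0]

for ethnicity in ("African", "Asian"):
    print(ethnicity, gender_gap(ethnicity))
```

If the gaps are roughly equal across ethnicities, the table gives little sign of an interaction; clearly different gaps suggest one, subject to the cell counts and standard deviations.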

 

3.      Create a scatterplot of Increase in BDNF and Exercise. Make sure Increase in BDNF is on the Y axis. Interpret.

 

4.      Perform a multiple regression analysis using the dataset provided (use all the Xs available for the regression), to predict Increase in BDNF in the body. For Gender, code Male as 1 and Female as 0 (baseline). For Ethnicity, create two dummies, one for Asian and one for African (keep Caucasian as the baseline). 

 

Note that in Excel, all the Xs that you will use must be in contiguous columns.
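The dummy coding described above can be sketched as a small function. This is an illustration only: `encode` is a hypothetical helper name, and the category labels are assumed to be spelled exactly "Male", "Asian", and "African" in the dataset.

```python
def encode(gender, ethnicity):
    """Return (Male, Asian, African) dummies.

    Female and Caucasian are the baselines, so they are coded as all zeros.
    """
    male = 1 if gender == "Male" else 0
    asian = 1 if ethnicity == "Asian" else 0
    african = 1 if ethnicity == "African" else 0
    return male, asian, african

print(encode("Female", "Caucasian"))  # baseline person: every dummy is 0
print(encode("Male", "Asian"))
```

Only two dummies are needed for three ethnicities because the third category is identified by both dummies being 0; adding a Caucasian dummy as well would make the columns perfectly collinear with the intercept.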

 

5.      Is the regression significant overall? What does that mean?

 

6.      Is each of the variables in the model significant at the 5% level? Interpret.

 

7.      What is the R-squared value? What does it mean?
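As a reminder, the standard definition of R-squared for a least-squares regression is given below. The notation is introduced here and is not from the assignment: y_i is the observed Increase in BDNF for person i, \hat{y}_i is the model's prediction, and \bar{y} is the sample mean.

```latex
R^2 = 1 - \frac{SSE}{SST}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Here SSE is the unexplained (residual) variation and SST is the total variation in Increase in BDNF.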

 

Now run the regression again after removing the variables that are not significant at the 5% level. (Remember that you may have to move data in the worksheet so that the variables left in the regression are all in contiguous columns.)

 

8.      What is the final equation to predict BDNF increase? What would your model predict as the increase in BDNF for a Caucasian Female who exercises for 20 minutes?
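Once the final equation is written down, a prediction is just the intercept plus each coefficient times the corresponding X value. The sketch below shows the arithmetic with placeholder coefficients. These numbers are invented, not results from the assignment data, and the sketch assumes all four variables stay in the final model, which may not be true for yours.

```python
# Hypothetical coefficients (placeholders, NOT results from the assignment data).
coef = {"Intercept": 1.0, "Exercise": 0.5, "Male": 2.0, "Asian": -1.0, "African": 1.5}

def predict(exercise, male, asian, african):
    """Predicted Increase in BDNF from the fitted linear equation."""
    return (coef["Intercept"]
            + coef["Exercise"] * exercise
            + coef["Male"] * male
            + coef["Asian"] * asian
            + coef["African"] * african)

# A Caucasian Female is the baseline on both dummies, so Male = Asian = African = 0.
caucasian_female_20 = predict(exercise=20, male=0, asian=0, african=0)
print(caucasian_female_20)
```

Because the baseline person has all dummies equal to 0, only the intercept and the Exercise term contribute to this prediction.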

 

9.      Interpret the coefficients for Asian and African.

 

10.  If you want to improve the R-squared value of the model, is increasing the sample size the best way? Why or why not?