Question
A hospital administrator wished to study the relation between patient satisfaction (Y) and patient's age (X1, in years), severity of illness (X2, an index), and anxiety level (X3, an index). The administrator randomly selected 46 patients and collected the data presented below, where larger values of Y, X2, and X3 are, respectively, associated with more satisfaction, increased severity of illness, and more anxiety. (Exercise problem 6.15 in the textbook.)

Using the attached data (assignment2_1.Rdata), answer the questions below. You can load the .Rdata file using the 'load' function: load('assignment2_1.Rdata')

(a) Create a scatter plot matrix using the plot() command. Can you detect any linear relationship?
(b) Fit a multiple linear regression model using all the predictor variables (X1-X3). Create a summary of the regression fit using the summary() function and interpret the results. Which coefficients are statistically significant? Which are not? (Use α = 0.10.)
(c) Test the following null and alternative hypotheses: H0: β1 = β2 = β3 = 0 vs. H1: at least one coefficient is not 0. What is the implication of the hypothesis test result?
(d) Fit the regression model again with only the significant variables. State the fitted model. Are the coefficients all significant now? Can we say that the fitted model is useful for predicting patient satisfaction? Why?
(e) Find the confidence intervals for the coefficients in the model that you fitted in part (d).
(f) Based on the model fitted in part (d), find the prediction interval for a new observation with X1 = 50 and X3 = 2.6. Interpret the interval. (Define the 'new' dataset using the following command: xnew = data.frame(X1 = 50, X3 = 2.6))
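The workflow the question describes (scatter plot matrix, full fit, overall F-test, reduced fit, confidence intervals, prediction interval) can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data frame `dat` is filled with synthetic stand-in values so the sketch runs on its own, because the question does not say what object name the .Rdata file defines. The reduced model keeps X1 and X3 because the question's `xnew` uses those two predictors, and the 90% interval level is an assumption chosen to match α = 0.10. None of the numbers it prints are the textbook's results.

```r
# Synthetic stand-in data (NOT the textbook data), so the sketch runs by itself.
# With the real file, replace this block with load('assignment2_1.Rdata') and
# use whatever data frame that file defines (its name is not given in the question).
set.seed(1)
n <- 46
dat <- data.frame(
  X1 = runif(n, 20, 60),    # age in years (made-up range)
  X2 = runif(n, 40, 60),    # severity index (made-up range)
  X3 = runif(n, 1.5, 3.0)   # anxiety index (made-up range)
)
# Arbitrary coefficients, used only to generate a response for the demo.
dat$Y <- 150 - 1.0 * dat$X1 - 0.5 * dat$X2 - 10 * dat$X3 + rnorm(n, sd = 10)

# Scatter plot matrix: plot() on a data frame draws all pairwise scatter plots.
plot(dat[, c("Y", "X1", "X2", "X3")])

# Full model with all three predictors, plus the per-coefficient t-test p-values.
alpha <- 0.10
full <- lm(Y ~ X1 + X2 + X3, data = dat)
print(summary(full))
p_values <- coef(summary(full))[, "Pr(>|t|)"]
significant <- names(p_values)[p_values < alpha]

# Overall F-test of H0: beta1 = beta2 = beta3 = 0.
f <- summary(full)$fstatistic
f_p_value <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Reduced model (assumed here to keep X1 and X3, as the question's xnew implies).
reduced <- lm(Y ~ X1 + X3, data = dat)
print(summary(reduced))

# Confidence intervals for the reduced-model coefficients (level is an assumption).
ci <- confint(reduced, level = 1 - alpha)
print(ci)

# Prediction interval for a new patient with X1 = 50 and X3 = 2.6.
xnew <- data.frame(X1 = 50, X3 = 2.6)
pred <- predict(reduced, newdata = xnew, interval = "prediction", level = 1 - alpha)
print(pred)
```

A prediction interval is wider than a confidence interval for the mean response at the same point, because it also accounts for the variability of a single new observation around that mean.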


Answers
The data set HAPPINESS contains independently pooled cross sections for the even years from 1994 through 2006, obtained from the General Social Survey. The dependent variable for this problem is a measure of "happiness," vhappy, which is a binary variable equal to one if the person reports being "very happy" (as opposed to just "pretty happy" or "not too happy").
(i) Which year has the largest number of observations? Which has the smallest? What is the percentage of people in the sample reporting they are "very happy"?
(ii) Regress vhappy on all of the year dummies, leaving out y94 so that 1994 is the base year. Compute a heteroskedasticity-robust statistic of the null hypothesis that the proportion of very happy people has not changed over time. What is the $p$-value of the test?
(iii) To the regression in part (ii), add the dummy variables occattend and regattend. Interpret their coefficients. (Remember, the coefficients are interpreted relative to a base group.) How would you summarize the effects of church attendance on happiness?
(iv) Define a variable, say highinc, equal to one if family income is above $\$25,000$. (Unfortunately, the same threshold is used in each year, and so inflation is not accounted for. Also, $\$25,000$ is hardly what one would consider "high income.") Include highinc, unem10, educ, and teens in the regression in part (iii). Is the coefficient on regattend affected much? What about its statistical significance?
(v) Discuss the signs, magnitudes, and statistical significance of the four new variables in part (iv). Do the estimates make sense?
(vi) Controlling for the factors in part (iv), do there appear to be differences in happiness by gender or race? Justify your answer.
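As a rough illustration of parts (i) and (ii), the sketch below fits a linear probability model on year dummies and computes a heteroskedasticity-robust Wald statistic by hand in base R. Several things here are assumptions rather than anything stated in the problem: the data frame `dat` holds synthetic stand-in values (not the HAPPINESS data), the covariance estimator is the plain White (HC0) form, and the test is reported in its chi-square version. A textbook or package may apply a degrees-of-freedom adjustment or report an F version instead, which changes the numbers slightly.

```r
# Synthetic stand-in sample (NOT the HAPPINESS data): survey year and a 0/1 outcome.
set.seed(2)
years <- c(1994, 1996, 1998, 2000, 2002, 2004, 2006)
n <- 2000
dat <- data.frame(year = sample(years, n, replace = TRUE))
dat$vhappy <- rbinom(n, size = 1, prob = 0.3)  # no true time trend in this fake sample

# Part (i) style summaries: observations per year and share reporting "very happy".
obs_per_year <- table(dat$year)
share_vhappy <- mean(dat$vhappy)

# Part (ii): linear probability model with 1994 as the base year.
dat$year_f <- relevel(factor(dat$year), ref = "1994")
fit <- lm(vhappy ~ year_f, data = dat)

# White (HC0) covariance: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}.
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))
meat <- crossprod(X * e)   # row i of X is scaled by e_i, so this is X' diag(e^2) X
V_robust <- bread %*% meat %*% bread

# Robust Wald statistic for H0: all six year-dummy coefficients equal zero.
idx <- grep("^year_f", names(coef(fit)))
b <- coef(fit)[idx]
wald <- as.numeric(t(b) %*% solve(V_robust[idx, idx]) %*% b)
p_value <- pchisq(wald, df = length(idx), lower.tail = FALSE)

print(obs_per_year)
print(share_vhappy)
print(c(wald = wald, p_value = p_value))
```

Because the outcome is binary, the error variance in a linear probability model depends on the regressors, which is why the problem asks for a robust statistic rather than the usual F-test.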