So this is another problem on statistical inference. This time we have an interesting regression equation that has a bunch of the linear terms we're used to working with, but also a quadratic, or squared, term. You'll read about this in the problem, but just a heads-up: that's a slightly new element in this problem. Part one asks you to estimate the following regression model. As I typically do, I'll write it out in functional form. The dependent variable is the education of the individual in the sample, that is, the highest level of education completed by that person. Their education is a function of both of their parents' highest completed education, so their mother's education and their father's education, as well as their own ability. Ability is a measure that's hard to explain and not really directly in the data, but just think of it as innate ability. And here's the quadratic term I mentioned at the beginning: ability squared. Why have a quadratic term in here? Someone's ability might increase their eventual educational attainment over their life, but you could also imagine decreasing returns to the ability you have from birth. That is, at really high levels of ability, a bit more ability wouldn't raise your educational attainment as much as the same bit more would for someone starting from a lower level of ability. That decreasing-returns-to-ability concept is what this quadratic term can capture. That's just some background; it isn't needed for actually solving the problem. Part one asks you to report the results of this regression in the usual form. I didn't include the intercept in the equation, but I'll write it out first, right here.
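Written out, the model described above is the following. The names motheduc and fatheduc match the Stata command used in part two; abil is an assumed name for the ability measure. The second line is a short derivation added here: the effect of one more unit of ability depends on the current ability level, so a negative β4 would mean decreasing returns to ability and a positive β4 increasing returns.

```latex
\begin{aligned}
\text{educ} &= \beta_0 + \beta_1\,\text{motheduc} + \beta_2\,\text{fatheduc}
  + \beta_3\,\text{abil} + \beta_4\,\text{abil}^2 + u \\
\frac{\partial\,\text{educ}}{\partial\,\text{abil}} &= \beta_3 + 2\beta_4\,\text{abil}
\end{aligned}
```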
So here's the coefficient on the intercept, and then its standard error. Then I'll go along, starting with the mother's education variable: that's 0.19. It's positive, and a positive coefficient makes sense: if you have a mother with more education, you're likely to also have higher educational attainment yourself. How about the father's education? You should also get a positive coefficient here, which also makes sense. The standard errors on both of these coefficients tell us that both education variables are definitely statistically significant. How about ability? We'd hope this is positive, and it is: 0.401, again with a small standard error, so definitely statistically significant. Finally, the coefficient on ability squared is positive. This is actually interesting. I hypothesized a minute or two ago that there might be decreasing returns to ability, meaning that at really high levels of ability you aren't predicted to have that much higher educational attainment. It looks like I might have been a little wrong, though, because this is a positive coefficient and it is statistically significant, so it actually represents increasing returns to ability. Comparing individuals, at higher levels of ability each extra unit of ability is predicted to bring an even larger increase in educational attainment. That's a very interesting result. So that's the first piece of part one. The other thing they want you to do in part one is test the null hypothesis that education is linearly related to ability. You can think of that null hypothesis as saying that the coefficient on ability squared equals zero. So the null hypothesis is that education is just linearly related to ability.
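The judgement "small standard error, so definitely statistically significant" rests on the t statistic, the estimate divided by its standard error. A minimal Python sketch, assuming a large sample so that the standard normal is a good stand-in for the t distribution; the numbers in the usage lines are made up, not the problem's output.

```python
import math


def t_stat(coef: float, se: float) -> float:
    """t statistic for H0: coefficient = 0."""
    return coef / se


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value using the large-sample normal approximation.

    P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    # Made-up estimate and standard error, for illustration only.
    t = t_stat(0.40, 0.05)
    print(f"t = {t:.2f}, two-sided p = {two_sided_p_normal(t):.2g}")
```

With a few thousand observations the exact t-based p-value is practically identical; with a small sample, use the t distribution instead.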
That's another way of saying that the coefficient on the quadratic term is zero. The alternative is that the relationship is quadratic, so the alternative hypothesis is that the coefficient on the quadratic term does not equal zero. To test this hypothesis, we use one of our favorite tests, one we use a lot: the F test. You run your unrestricted regression, which includes the quadratic term (the squared ability term), and you also run the restricted regression without the quadratic term, and then plug those sums of squared residuals into the F statistic formula. I'll tell you what your statistic should be, and actually write out the numbers I got here, so we can look at the F statistic together. Your calculation should look something like this. In the numerator, this is the sum of squared residuals for the restricted regression, the one without the quadratic term, and we subtract from it the sum of squared residuals for the unrestricted regression, the one that includes the quadratic term. We only have one restriction here, so we divide by one. Then in the denominator, as we always do, we repeat the sum of squared residuals for the unrestricted regression and divide it by the sample size minus (the number of independent variables plus one). We have four independent variables, so that's n minus five. Once you do all this calculation, you get the F statistic. It does look like this will be a rather large statistic, and sure enough, it comes out to about 37, so I'll circle that. So the F statistic is very large.
That's obviously going to lead us to reject the null hypothesis, so we reject the null in favor of the alternative hypothesis, which is that the quadratic term belongs in the regression. Another way of saying that is that education is quadratically related to ability. We could have guessed that from the coefficient on ability squared being statistically significant, but this is a good check. All right, part two asks you to do a similar exercise, but this time we're testing whether β1 equals β2. That asks: does a year of mother's education have the same impact on an individual's education as a year of father's education? So we're looking at whether the coefficients on these two variables are equal. For part two, first run the regression from part one, the way it's laid out. Make sure you include the quadratic ability term, because in part one we rejected the null hypothesis that it doesn't belong. After that, I'm not sure what software package you use, but there should be some option for this kind of test. I use Stata, and the syntax to test whether two coefficients are equal is as follows. For our purposes it's test motheduc = fatheduc, because those are the variable names in the data set. You just type that in the command line in Stata. If you're in another software package, there should be another way to do this sort of test, so try to look that up if you don't know it. Then run that test. What should pop out of that test? This is what Stata gives out; hopefully other packages report something similar.
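For a single linear restriction such as β1 = β2, the F statistic that Stata's test command reports after regress is the square of the t statistic for the difference of the two estimates. That t statistic needs the estimated covariance between the two coefficient estimates, which Stata shows with estat vce. A minimal sketch, assuming you have read the two estimates, their standard errors, and that covariance off your output; the numbers in the tests are made up.

```python
import math


def equal_coef_test(b1, b2, se1, se2, cov12):
    """Wald test of H0: beta1 = beta2 (one linear restriction).

    Var(b1 - b2) = se1**2 + se2**2 - 2*cov12, so
    t = (b1 - b2) / sqrt(Var(b1 - b2)), and the F statistic is t**2.
    Returns (t, F).
    """
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov12
    t = (b1 - b2) / math.sqrt(var_diff)
    return t, t * t
```

Because F has one numerator degree of freedom here, its p-value is the same as the two-sided p-value of that t statistic.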
The test gives an F statistic of 3.75, which looks like it might be significant. If it's significant, that means we might reject the null hypothesis, that is, reject the idea that a year of mother's education has the same impact on an individual's education as a year of father's education. But they ask for the p-value, so I'll write that down to do what the problem asks: you should get a p-value of about 0.053. I'll circle that, because it's our answer for part two. That's not quite at the 5% significance level, so you could make a case either way, depending on what significance level you planned on using or think is appropriate. You could say that at the 10% level we reject the hypothesis that the coefficients on mother's education and father's education are the same. But you could also argue that at the 5% or 1% level we fail to reject the hypothesis that they are the same. So it depends on what you're comfortable with and think is appropriate; that's the conclusion there. Great, that's part two. Part three asks you to add the two college tuition variables to the regression from part one. I'll write out how this would go. You run the regression from part one, so regress education on mother's education, father's education, ability, and squared ability, plus the first tuition variable, college tuition at age 17, denoted tuit17, and the other tuition variable, which is almost the same but is tuition at age 18, tuit18, plus an error term. That's the regression you run for part three, and you have to determine whether these two new variables are jointly statistically significant. So we go back to our old friend, the F statistic, and do what we know how to do.
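The F statistics in parts one and three both come from F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)], and the p-value is the upper tail of the F(q, n - k - 1) distribution. Here is a standard-library-only sketch; the tail probability uses the regularized incomplete beta function, evaluated with the Numerical Recipes continued fraction (if SciPy is available, scipy.stats.f.sf does the same job). The residual degrees of freedom in the usage lines, 1,225 and 1,223, assume a sample of 1,230 observations, which this walkthrough does not state.

```python
import math


def f_stat_from_ssr(ssr_r, ssr_ur, q, n, k):
    """F statistic for q linear restrictions, from two sums of squared residuals.

    ssr_r: SSR of the restricted model; ssr_ur: SSR of the unrestricted model;
    n: sample size; k: number of slope regressors in the unrestricted model.
    """
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))


def _betacf(a, b, x, max_iter=1000, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # convergence checked after the odd step
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_pvalue(f, df1, df2):
    """Upper-tail probability P(F(df1, df2) > f)."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


if __name__ == "__main__":
    # Residual df below assume n = 1,230 (an assumption, not from the walkthrough).
    print("part one, F = 37:", f_pvalue(37.0, 1, 1225))
    print("part two, F = 3.75:", round(f_pvalue(3.75, 1, 1225), 3))
    print("part three, F = 0.83:", round(f_pvalue(0.83, 2, 1223), 3))
```

The same two functions cover both tests: only q (1 in part one, 2 in part three) and k (4, then 6) change.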
The first number here is the sum of squared residuals for the restricted regression, which doesn't include the tuition variables, and this number is the sum of squared residuals for the unrestricted regression, which does include the tuition variables. This time we have two restrictions, so there's a 2 there. Then finish out the statistic by filling in the denominator with our sample size, and now we have a total of six independent variables, so six plus one is seven: n minus seven. Once you run these numbers, the F statistic should be 0.83. That's a low F statistic, so we're going to say that the two tuition variables are not jointly significant. Let me write that in big capital letters: NOT jointly significant. That low F statistic value is all we need to tell us that. That's part three. Two more parts to this problem. Part four asks you to find the simple correlation between the two tuition variables we just added. Regardless of what program you use, it should have a way for you to find this correlation pretty easily; in Stata the syntax is corr tuit17 tuit18. First of all, what's our hypothesis for the correlation between tuition at age 17 and tuition at age 18? Mine would be that they're very highly, positively correlated, since tuition shouldn't waver too much from year to year. When I ran this, I got exactly that: 0.98, a very high positive correlation, almost one. So there's really not much difference between these two amounts over a year. That's a good check.
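The correlation in part four is the ordinary Pearson correlation. A standard-library sketch of that computation, plus the person-by-person average of the two tuition columns that part four goes on to ask about; the tuition figures in the usage lines are made up, not taken from the data set.

```python
import math


def pearson_corr(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_pair(x, y):
    """Element-wise average, e.g. (tuit17 + tuit18) / 2 for each person."""
    return [(a + b) / 2.0 for a, b in zip(x, y)]


if __name__ == "__main__":
    # Made-up tuition values, for illustration only.
    tuit17 = [4.0, 5.5, 6.1, 7.3]
    tuit18 = [4.2, 5.6, 6.4, 7.5]
    print(round(pearson_corr(tuit17, tuit18), 3))
    print(average_pair(tuit17, tuit18))
```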
Once we have that, the problem asks you to explain why, instead of using these two individual variables in the new regression, we might want to use an average of them. So why might we want to use an average of these two variables? The main reason I think of is that it cuts down on unnecessary variation. We're not really getting any more information from using the tuition-at-18 variable in addition to the tuition-at-17 variable when we're explaining the variation in education, because the two are almost perfectly correlated. That near-perfect correlation also makes it hard for OLS to separate their individual effects, which inflates their standard errors. We do want to make sure we include this tuition cost information, but an average cuts down this unnecessary variation. Then the problem asks what happens when you use the average. You have to go back and create a new variable. What I did was create an average tuition variable equal to the sum of the two existing tuition variables we were given in the data set, divided by two. However you do that in your statistical software, create that average variable and run a new regression. The regression in part four is the same as in part three, except that instead of using both tuition variables, we use the average. So you regress education on everything in part one, plus β5 times the newly created average tuition variable, plus the error term. That's our new regression, and that's what you want to run here.
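In symbols, with avgtuit as a hypothetical name for the averaged variable, the part-four regression is the following. Substituting the definition in the last line shows what the average does: it is the part-three model with the coefficients on tuit17 and tuit18 forced to be equal, each β5/2.

```latex
\begin{aligned}
\text{avgtuit} &= \tfrac{1}{2}\,(\text{tuit17} + \text{tuit18}) \\
\text{educ} &= \beta_0 + \beta_1\,\text{motheduc} + \beta_2\,\text{fatheduc}
  + \beta_3\,\text{abil} + \beta_4\,\text{abil}^2 + \beta_5\,\text{avgtuit} + u \\
  &= \beta_0 + \beta_1\,\text{motheduc} + \beta_2\,\text{fatheduc}
  + \beta_3\,\text{abil} + \beta_4\,\text{abil}^2
  + \tfrac{\beta_5}{2}\,\text{tuit17} + \tfrac{\beta_5}{2}\,\text{tuit18} + u
\end{aligned}
```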
What you should find after running this regression is the p-value for β5. Let me switch colors so it's a little clearer. For β̂5, the coefficient on the average tuition variable: first of all, it's positive, but the most interesting part is the p-value, which is about 0.2. If you look closely at the regression in part three (they didn't ask you to look at this, but it's interesting to check), the p-values for tuit17 and tuit18 were between about 0.8 and 0.9. So the average has brought the p-value down to about 0.2. That's still arguably not statistically significant, but compare it to when the tuition variables were used separately, with p-values of roughly 0.8 to 0.9. So a possible conclusion from this regression is that creating the average variable has cut down on unnecessary variation. It also suggests that tuition comes closer to having some effect on eventual educational attainment, which lines up with what we'd expect: the cost of college does have an impact on whether we decide to go to school for another year, or whether to defer, and things like that. That should be part four. Then we have part five, which asks whether the findings for the average tuition variable in part four make sense when interpreted causally. I'll write down what I got for the actual coefficient on average tuition: my estimated coefficient from part four was .016, and .012 is the standard error. So again, not quite statistically significant. But what's interesting here is the coefficient, and especially the fact that it's positive. So think about the interpretation.
This is saying that if average tuition goes up by, let's say, one unit (I'm not sure what the units are), then an individual's education is, on average, predicted to go up too. That's not what we expect. We expect that as the cost of going to school goes up, controlling for the other factors we're controlling for, education should go down, so this coefficient should be negative. So that brings up the question: what's going on here? What might be going on is something that OLS estimation like this is not very good at picking up. Think about people who complete more education. Say one person eventually gets more education, but also went to colleges with systematically higher tuition than another person over here, who ended up with less education and went to institutions with lower average tuition. There may be good reason to believe that people who complete more education systematically tend to attend institutions that charge higher average tuition. That would be one explanation, and maybe the most plausible one, for this positive coefficient on average tuition, which is not what we'd normally expect. OLS is not good at picking up this sort of interplay, this kind of self-selection. So that's how you can end the problem.
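The self-selection story can be made concrete with a small simulation. Everything in it is hypothetical: a made-up latent "taste for schooling" raises both the tuition a person faces and their education, while tuition's true causal effect on education is set to -0.5. Under these assumptions the simple OLS slope of education on tuition comes out positive (about 0.3 in the population), and adding the latent factor as a control recovers roughly -0.5. It illustrates omitted-variable bias under those assumptions; it is not a model of the actual data.

```python
import random


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def simple_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    cx, cy = _center(x), _center(y)
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)


def two_regressor_slopes(x1, x2, y):
    """OLS slopes on x1 and x2 (with an intercept), via the 2x2 normal equations."""
    c1, c2, cy = _center(x1), _center(x2), _center(y)
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n=5000, seed=0, true_effect=-0.5):
    """Hypothetical data: a latent 'taste' raises both tuition and education."""
    rng = random.Random(seed)
    taste, tuit, educ = [], [], []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        tu = 5.0 + 2.0 * t + rng.gauss(0.0, 1.0)
        ed = 13.0 + 2.0 * t + true_effect * tu + rng.gauss(0.0, 1.0)
        taste.append(t)
        tuit.append(tu)
        educ.append(ed)
    return taste, tuit, educ


if __name__ == "__main__":
    taste, tuit, educ = simulate()
    print("slope on tuition, no control:", round(simple_slope(tuit, educ), 3))
    print("slope on tuition, controlling for taste:",
          round(two_regressor_slopes(tuit, taste, educ)[0], 3))
```

In real data the "taste" variable is not observed, which is exactly why OLS cannot fix the problem by adding it as a control.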