Question
1. Suppose you have one continuous predictor $X$ and a binary categorical response $Y$ which can take values 1 or 2. Suppose you collected training data from the two classes and obtained class-specific sample means $\hat{\mu}_1$ and $\hat{\mu}_2$, along with the pooled variance estimate $\hat{\sigma}^2$ over the two classes. (40pt total, 5pt for each question)
(a) Assume equal class priors and derive the LDA classification rule for this problem. Sketch the estimated class-conditional densities and show your decision boundary on the plot. Make sure you label the axes and indicate the numerical value for the boundary; let's call it $c$.
(b) Suppose the estimates were in fact obtained from 100 training points, among which 40 were from class 1 and 60 were from class 2. Suppose now you estimate the class priors from the data, repeat all the calculations in part (a), and obtain a new boundary value; let's call it $\tilde{c}$. Without actually doing this, would you be able to tell whether $\tilde{c}$ will be the same as, less than, or greater than $c$, or is there no way to tell? Explain your answer without calculating $\tilde{c}$. Note: it is OK to recheck your answer once you have actually calculated $\tilde{c}$ in part (c), but your explanation must not involve the numerical value.
(c) Now calculate the new boundary value $\tilde{c}$ described in part (b).
(d) Suppose that in addition to the pooled variance $\hat{\sigma}^2$, I now tell you the individual class-specific variances were estimated as $\hat{\sigma}_1^2 = 0.25$ and $\hat{\sigma}_2^2 = 1.5$. Based on this new information, would you recommend using LDA or QDA, and why?
(e) Derive the QDA rule for part (d), assuming equal class priors.
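For reference, a brief sketch of the comparison that the derivation in part (a) reduces to, written in generic textbook notation under the stated shared-variance assumption (not necessarily the notation the course expects): with pooled variance $\hat{\sigma}^2$, assign $x$ to the class $k$ with the larger linear discriminant
$$\delta_k(x) = \frac{\hat{\mu}_k}{\hat{\sigma}^2}\,x - \frac{\hat{\mu}_k^2}{2\hat{\sigma}^2} + \log \hat{\pi}_k, \qquad k = 1, 2.$$
Under equal priors the $\log \hat{\pi}_k$ terms cancel, and setting $\delta_1(c) = \delta_2(c)$ puts the boundary at the midpoint $c = (\hat{\mu}_1 + \hat{\mu}_2)/2$; with estimated priors the boundary becomes $c = \frac{\hat{\mu}_1 + \hat{\mu}_2}{2} - \frac{\hat{\sigma}^2 \log(\hat{\pi}_1/\hat{\pi}_2)}{\hat{\mu}_1 - \hat{\mu}_2}$.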


Answers
To complete this exercise you need a software package that allows you to generate data from the uniform and normal distributions.
(i) Start by generating 500 observations on $x_i$, the explanatory variable, from the uniform distribution with range $[0,10]$. (Most statistical packages have a command for the Uniform$(0,1)$ distribution; just multiply those observations by 10.) What are the sample mean and sample standard deviation of the $x_i$?
(ii) Randomly generate 500 errors, $u_i$, from the Normal$(0,36)$ distribution. (If you generate Normal$(0,1)$ draws, as is commonly available, simply multiply the outcomes by six.) Is the sample average of the $u_i$ exactly zero? Why or why not? What is the sample standard deviation of the $u_i$?
(iii) Now generate the $y_i$ as
$$y_i = 1 + 2x_i + u_i \equiv \beta_0 + \beta_1 x_i + u_i;$$
that is, the population intercept is one and the population slope is two. Use the data to run the regression of $y_i$ on $x_i$. What are your estimates of the intercept and slope? Are they equal to the population values in the above equation? Explain.
(iv) Obtain the OLS residuals, $\hat{u}_i$, and verify that equation (2.60) holds (subject to rounding error).
(v) Compute the same quantities in equation (2.60) but use the errors $u_i$ in place of the residuals. Now what do you conclude?
(vi) Repeat parts (i), (ii), and (iii) with a new sample of data, starting with generating the $x_i$. Now what do you obtain for $\hat{\beta}_0$ and $\hat{\beta}_1$? Why are these different from what you obtained in part (iii)?
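Below is a minimal NumPy sketch of parts (i) through (v). It assumes that equation (2.60) refers to the OLS residual identities $\sum_i \hat{u}_i = 0$ and $\sum_i x_i \hat{u}_i = 0$; the seed, and therefore the printed numbers, are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is reproducible

# (i) 500 draws of x from Uniform[0, 10]
x = rng.uniform(0, 1, size=500) * 10
print(x.mean(), x.std(ddof=1))          # sample mean near 5, sample sd near 2.89

# (ii) 500 errors from Normal(0, 36): Normal(0, 1) draws scaled by 6
u = rng.normal(0, 1, size=500) * 6
print(u.mean(), u.std(ddof=1))          # mean close to, but not exactly, zero

# (iii) generate y and fit OLS with the textbook slope/intercept formulas
y = 1 + 2 * x + u
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)                           # close to, but not equal to, 1 and 2

# (iv)-(v) residual identities versus the same sums using the true errors
uhat = y - (b0 + b1 * x)
print(uhat.sum(), (x * uhat).sum())     # both zero up to rounding error
print(u.sum(), (x * u).sum())           # generally not zero
```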
Part 1. The pooled OLS estimate of $\beta_1$ is 0.360. If the change in concentration is 0.1, the change in log(fare) is $\hat{\beta}_1$ times the change in concentration, that is, 0.36 × 0.1 = 0.036, so airfare is estimated to be about 3.6% higher.
Part 2. The 95% confidence interval obtained using the usual OLS standard errors is (0.301, 0.419). If we use the fully robust standard errors we get (0.245, 0.475), which is wider than the interval above. The wider confidence interval is the appropriate one, as the neglected serial correlation introduces extra uncertainty into the parameter estimate.
Part 3. The quadratic in log(dist) has a U shape. The turning point is found by taking the partial derivative of log(fare) with respect to log(dist) and setting it equal to zero; this gives the value of log(dist) at which the slope becomes positive. The value of log(dist) at the turning point is 0.902/(2 × 0.103) ≈ 4.38. Converting back, the distance is exp(4.38) ≈ 80 miles, while the shortest distance in the sample is 95 miles. So the turning point is outside the range of the data, which is a good thing in this case: what is being captured is an increasing elasticity of fare with respect to distance as distance increases.
Part 4. The random effects estimate of $\beta_1$ is 0.209, which is a bit smaller than the pooled OLS estimate. It still implies a positive relationship between fare and concentration, and it is very significant, with a t statistic of 7.88.
Part 5. The fixed effects estimate of $\beta_1$ is 0.169, which is lower but not very different from the random effects estimate. This is because the parameter $\hat{\theta}$ in equation (14.11) is about 0.9, so the random effects and fixed effects estimates are fairly similar: random effects uses a quasi-demeaning that depends on this $\hat{\theta}$, as given in equation (14.11).
Part 6. The heterogeneity $a_i$ could capture two types of factors that might be correlated with the concentration variable. First, characteristics of the cities near the two airports, for example population, education level, and the types of employers; these factors affect the demand for air travel. Second, factors related to geographical features and infrastructure, such as highway quality and whether the city is located on a river. Some of these factors can change over time, but over a short period such as the span of this sample they are roughly time-constant, so they can be captured by $a_i$. There are many factors like these, and it is better if we can control for them.
Part 7. It is therefore more appropriate to use the fixed effects estimate.
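As a small check of the arithmetic in Parts 1 and 3, the sketch below recomputes the semi-elasticity and the turning point. The signs on the log(dist) terms (negative linear, positive square) are assumed from the U-shape described above, and the variable names are placeholders rather than the dataset's actual column names.

```python
import numpy as np

beta_concen = 0.360       # pooled OLS coefficient on concentration (Part 1)
beta_ldist = -0.902       # coefficient on log(dist); sign assumed from the U-shape
beta_ldist_sq = 0.103     # coefficient on log(dist) squared

# Part 1: effect of a 0.1 increase in concentration on log(fare),
# read as an approximate percentage change in fare.
print(100 * beta_concen * 0.1)            # about 3.6 percent higher fare

# Part 3: set d log(fare)/d log(dist) = beta_ldist + 2*beta_ldist_sq*log(dist) = 0
ldist_turn = -beta_ldist / (2 * beta_ldist_sq)
print(ldist_turn, np.exp(ldist_turn))     # about 4.38, i.e. about 80 miles
```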
We are given the data points $(x, y)$ listed at the top of the whiteboard and want to use them to answer the following six questions, (a) through (f).
(a) Produce a scatter plot of the data. This is done in the plot below the data, with the points marked by crosses.
(b) Compute the sums and the correlation coefficient $r$. The sums are obtained simply by following the formulas: $\sum x$ is the sum of all $x$ values, $\sum y$ the sum of all $y$ values, and so on. The correlation coefficient $r$ is given by the formula that takes as input the sample size $n$ and the sums just computed; plugging in these values gives $r = 0.998$.
(c) Find $\bar{x}$, $\bar{y}$, and the constants of the line of best fit. $\bar{x}$ and $\bar{y}$ follow directly from the sums. The slope $b$ is given by a formula that, much like the one for $r$, takes $n$ and the sums as input; plugging in gives $b = 4.509$. Plugging $\bar{y}$, $b$, and $\bar{x}$ into $a = \bar{y} - b\bar{x}$ gives $a = 33.696$, so the line of best fit is $\hat{y} = 33.696 + 4.509x$.
(d) Plot $\hat{y}$ onto the scatter plot, making sure the line passes through $\bar{x}$ and $\bar{y}$, as shown on the whiteboard.
(e) Calculate $R^2$ and interpret it. $R^2 = 0.9954$, which means approximately 99.54% of the variation in the data is explained by the least-squares line, and roughly half a percent remains unexplained.
(f) Project $y$ for $x = 12$ using the $\hat{y}$ equation: $\hat{y} = 33.696 + 4.509(12) = 87.804$.
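The underlying data points appear only on the whiteboard and are not reproduced here, so the following is just a minimal NumPy sketch of the sum-based formulas used above, written as a helper that works for any supplied arrays; the same helper covers the similar exercise at the end of this section. The last line re-evaluates the reported fitted line at $x = 12$.

```python
import numpy as np

def best_fit_line(x, y):
    """Return (r, slope b, intercept a) for the least-squares line y-hat = a + b*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sx, sy = x.sum(), y.sum()
    sxx, syy, sxy = (x * x).sum(), (y * y).sum(), (x * y).sum()
    r = (n * sxy - sx * sy) / np.sqrt((n * sxx - sx**2) * (n * syy - sy**2))
    b = (n * sxy - sx * sy) / (n * sxx - sx**2)
    a = y.mean() - b * x.mean()
    return r, b, a

# With the whiteboard data, the values reported above are r = 0.998, b = 4.509,
# a = 33.696, so the prediction at x = 12 is:
print(33.696 + 4.509 * 12)   # 87.804
```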
Part 1. We use the full sample, which has 177 observations. From this estimation we obtain the studentized residuals, which we call $str_i$. The number of studentized residuals above 1.96 in absolute value is nine.
Part 2. If the studentized residuals were independently drawn from a standard normal distribution, we would expect about 5% of the sample, that is 177 × 0.05 ≈ 8.85, so between eight and nine cases, to be that large. This is because in a standard normal distribution about 95% of observations lie within 1.96 standard deviations of zero (roughly 95.5% within two standard deviations), which means about 5% of observations are either above two or below minus two. You can check this: there are eight observations with studentized residuals above two in absolute value.
Part 3. The studentized residuals are used to detect outliers. We drop the outliers, defined as observations with studentized residuals above 1.96 in absolute value, that is, the nine cases identified above, and re-estimate the model from Part 1, now using 169 observations. Comparing with the regression in Part 1, we find that the main coefficients become more significant. To recap Part 1: log(sales) is significant at the 1% level, which I denote with three stars; log(mktval) is significant at the 5% level (two stars); ceoten is significant at the 1% level; and ceoten squared is significant at the 5% level. In Part 3, log(sales) is still significant at the 1% level; log(mktval), which was significant at the 5% level before, is now significant at the 1% level; and nothing changes for ceoten or ceoten squared, which is still significant at the 5% level. So the estimates on log(sales) and ceoten keep the same level of significance; their exact values change, but not by enough to alter the significance level. The estimate on log(mktval) increases in both magnitude and significance. You may also notice that the magnitudes of the estimates on log(sales) and ceoten decrease, but not by much, and the coefficient on ceoten squared barely changes in magnitude.
Part 4. Now we use least absolute deviations (LAD) to estimate the regression from Part 1 again, using all the data. The LAD method relies on a different estimation methodology, so it does not report an R-squared; it is estimated by maximum likelihood. To measure the fit of the model you would look at the log-likelihood value reported by the statistical software; I do not report it here because the fit of the model is not the concern in this problem, the estimates on the explanatory variables are. Comparing this regression with the previous OLS regressions, we see that $\hat{\beta}_1$, the coefficient on log(sales), is closer to that of the restricted sample, the regression in Part 3 where we dropped the outlier observations. We do not observe the same pattern for $\hat{\beta}_3$, the coefficient on ceoten, which is actually closer to the estimate from the full sample.
Part 5. Given these results, we can evaluate the statement that dropping outliers based on extreme values of the studentized residuals makes the resulting OLS estimates closer to the LAD estimates on the full sample. This statement is not always true; it does not hold for every estimate.
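A sketch of how Parts 1 through 4 could be reproduced with statsmodels is below. The file name and column names (a hypothetical ceosal2.csv with columns lsalary, lsales, lmktval, ceoten) are assumptions for illustration, not taken from the text, and the LAD fit is computed via median (quantile) regression, which is one standard way to obtain it.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Assumed data file and column names; df should hold the 177-firm sample.
df = pd.read_csv("ceosal2.csv")
df["ceotensq"] = df["ceoten"] ** 2
X = sm.add_constant(df[["lsales", "lmktval", "ceoten", "ceotensq"]])
y = df["lsalary"]

# Part 1: OLS on the full sample, plus externally studentized residuals.
ols_full = sm.OLS(y, X).fit()
str_i = ols_full.get_influence().resid_studentized_external
print((np.abs(str_i) > 1.96).sum())        # number of flagged observations

# Part 3: drop the flagged observations and re-estimate by OLS.
keep = np.abs(str_i) <= 1.96
ols_trim = sm.OLS(y[keep], X[keep]).fit()

# Part 4: least absolute deviations, i.e. median regression, on the full sample.
lad_full = sm.QuantReg(y, X).fit(q=0.5)
print(ols_full.params, ols_trim.params, lad_full.params, sep="\n")
```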
We want to use the sample of data points $(x, y)$ listed at the top of the whiteboard to answer the following questions, (a) through (f), going through them one by one.
(a) Produce a scatter plot of the data points. The scatter plot is included right below the data, with the points marked by black crosses.
(b) Compute the relevant sums and the correlation coefficient $r$. The values of the sums are determined simply by following the definitions: $\sum x$ is the sum of the individual $x$ values, $\sum x^2$ the sum of the squared $x$ values, and so on. $r$ is given by the formula that takes as input the sample size $n$ and the sums just computed; plugging in gives $r = 0.8351$.
(c) Find the equation of the line of best fit. To do so we need the means of the $x$ and $y$ values as well as the parameters of the line. $\bar{x} = \sum x / n = 16.65$, and similarly $\bar{y} = 80$. The slope $b$ is given by an equation very similar to the one for $r$, taking as input $n$ and most of the sums just computed; plugging in gives $b = 3.291$. Plugging $\bar{y}$, $b$, and $\bar{x}$ into $a = \bar{y} - b\bar{x}$ gives an intercept of 25.232, so the line of best fit is $\hat{y} = 25.232 + 3.291x$.
(d) Plot $\hat{y}$ on the scatter plot, making sure the line passes through $\bar{x}$ and $\bar{y}$, as done here.
(e) Calculate the coefficient of determination $R^2$, which is simply the square of the correlation coefficient, and interpret it: $R^2 = 0.6974$, meaning roughly 70% of the variation in the data is explained by the corresponding variation in $x$ through the least-squares line, while about 30% of the variation remains unexplained.
(f) Predict $y$ for $x = 19$. Plugging into the $\hat{y}$ equation gives $\hat{y} = 25.232 + 3.291(19) = 87.761$.