Question
Refer to the Baseball 2018 data, which report information on the 30 Major League Baseball teams for the 2018 season. Let the number of games won be the dependent variable and the following variables be the independent variables: team batting average (BA); team earned run average (ERA); number of home runs (HR); and whether the team plays in the American or the National League.
a-1. Develop a correlation matrix for Wins, ERA, BA, and HR. (Negative amounts should be indicated by … decimal places.)
a-2. Which independent variables have strong or weak correlations with the depend…


Answers
American League baseball games are played under the designated hitter rule, meaning that pitchers, often weak hitters, do not come to bat. Baseball owners believe that the designated hitter rule means more runs scored, which in turn means higher attendance. Is there evidence that more fans attend games if the teams score more runs? Data collected from American League games during the 2010 season indicate a correlation of 0.667 between runs scored and the number of people at the game. (mlb.mlb.com)
a) Does the scatterplot indicate that it's appropriate to calculate a correlation? Explain.
b) Describe the association between attendance and runs scored.
c) Does this association prove that the owners are right that more fans will come to games if the teams score more runs?
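So for part a we can say yes, because there's a medium linear relationship: it's not strong, it's not weak, but it's definitely linear. For part b, as runs scored increase, attendance increases. And for part c, no, I do not think this association proves the owners right; a correlation by itself does not show that scoring more runs causes higher attendance.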
Okay, so here we're given data about batting averages and number of wins for the baseball league, and we are asked to determine whether or not a strong linear relationship exists between the number of wins and the teams' batting averages. Using the sample of data right here, we can enter the X and Y variables through STAT, Edit on our graphing calculator. On my TI-84, once the data is entered into L1 and L2, I go into STAT, scroll over to CALC, and go down to number 4, LinReg(ax+b), being certain that stat diagnostics are turned on. That way we get both the coefficient of determination and the coefficient of correlation. Doing this, we end up with a linear function equal to 0.319x plus 2 33 0.918. However, we also have a coefficient of correlation, which is only equal to 0.537. Generally, for a strong linear relationship this should be greater than or equal to 0.7, which is not the case here. We can conclude that we do not have a strong linear relationship within this data.
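For readers without a TI-84, the same quantities can be sketched in Python. This is a minimal illustration, not the method used in the answer: the function names `pearson_r`, `least_squares_line`, and `is_strong` are made up for this sketch, the 0.7 cutoff is the rule of thumb quoted above, and the toy values are placeholders, since the actual wins and batting-average data are not reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def is_strong(r, threshold=0.7):
    """Rule of thumb from the answer: |r| >= 0.7 counts as strong."""
    return abs(r) >= threshold

# Placeholder values only; replace with the real (x, y) pairs.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.0, 4.1, 5.9, 8.2, 9.8]
print(pearson_r(toy_x, toy_y), least_squares_line(toy_x, toy_y))

print(is_strong(0.537))  # False: the r reported above is below the cutoff
```

Squaring `pearson_r` gives the coefficient of determination that the calculator reports alongside r.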
All right, I'm going to use Microsoft Excel for this question. So we've got National League and American League, and the batting-average ranges are different between the two. Let me put in the numbers for the National League: .242 to .246, .247 to .251, .252 to .256, .257 to .261, .262 to .266, and .267 to .271. I'm going to put NL above it. I'll do the same thing for the American League, AL: starting at .244, then up to .255, .256 to .261, .262 to .267, .268 to .273, and the last one starting at .274. Okay, so now I'm just going to take the average of the two ends of each bin, which gives me the midpoint, and I'll copy that and paste it down here for the AL. Now, there's a number of players next to each bin: 3, 6, 1, 11, 11, and 1 for the NL, and 3, 6, 2, 1, 3, and 0 for the AL. Next I take the first column times the second column, the midpoint times the number of players, and drag that the whole way down. I sum the counts, so there are 33 players in the NL and 15 players in the AL. Summing the products and dividing by 33 gives me the mean, so the NL mean is .257, which makes sense. I'll write "mean" underneath it, and do the same down here for the AL. Now the difference from the mean is the midpoint minus the mean, and I press F4 to keep that cell reference fixed. Oh, I must have clicked the word "mean"; I wanted it to be E7. No, E8. Okay, so that's the distance from the mean. I square that distance and drag it the whole way down, but not past the NL rows. Then I copy this and paste it down here for the AL, where it uses the AL mean. That value doesn't seem right at first, but that's because I squared it.
All right, so that's the square of the deviations. Now I take that and multiply it by how many players there are in each bin, then copy that and paste it down here. This will be fine, because the last bin ends up being zero anyway, since there are zero players in it. Now I sum this column to get the sum of the squared deviations, and do the same down here for the AL. Since this is a sample, I take the sum of the squared deviations and divide by n − 1, which is the count minus one. So not divided by the count, but divided by the count minus one. That's the variance, so I'll write "variance" here, and then the standard deviation is just the square root of the variance. So we see that in this sample there's a greater standard deviation in the American League, but that's probably because there were fewer players chosen. Comparing the results, which is what I just did: it's a little bit greater in the American League. Thank you for watching.
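The spreadsheet steps (bin midpoints, midpoint times count, squared deviations times count, divide by n − 1) can be sketched in Python as follows. The function name `grouped_sample_stats` is made up for this sketch, and the National League bins and counts are an assumption: they are read off the spoken walkthrough and should be checked against the original table.

```python
import math

def grouped_sample_stats(bins, counts):
    """Sample mean, variance, and standard deviation from binned data.

    Each bin is represented by its midpoint, as in the spreadsheet.
    """
    midpoints = [(low + high) / 2 for low, high in bins]
    n = sum(counts)
    mean = sum(m * f for m, f in zip(midpoints, counts)) / n
    # Sum of squared deviations, weighted by the count in each bin.
    ss = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, counts))
    variance = ss / (n - 1)  # sample variance: divide by n - 1, not n
    return n, mean, variance, math.sqrt(variance)

# Assumed National League bins and player counts (read from the transcript).
nl_bins = [(0.242, 0.246), (0.247, 0.251), (0.252, 0.256),
           (0.257, 0.261), (0.262, 0.266), (0.267, 0.271)]
nl_counts = [3, 6, 1, 11, 11, 1]

n, mean, variance, sd = grouped_sample_stats(nl_bins, nl_counts)
print(n, f"{mean:.4f}")  # 33 players, mean about 0.2576 (quoted as .257)
print(variance, sd)
```

A bin with a count of zero contributes nothing to either sum, which is why the empty American League bin can be left in the table.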
All right, so given our table, we can use the help of Maple. We put what we see here into Maple and get a and b. Therefore our final model is y = 692.8121333 times 1.09453365 to the x, where x represents the number of years since 1990. For the year 2020, x is 2020 − 1990, which is 30. Putting in 30 for x, we get that y(2020) is about 10,410.9136, so approximately 10,411. Again, this is in thousands of dollars, so about $10,411,000. Then for part b, we calculate the average salary in the year 2000 using our model. We put in 10 for x, and y(2000) is approximately 1710, again in thousands of dollars, so about $1,710,000. The actual average salary in the year 2000 is $1,984,000, so the actual average salary is greater than the calculated one. Then, to calculate the average salary in the year 2020 again, we put 30 into the model refit with the year-2000 data included, and we get about $10,721,000. So we see that the predicted average salary for 2020 rises after the inclusion of the data for the year 2000, which is $1,984,000.
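The two evaluations of the first model can be checked with a few lines of Python. This is only a sketch: the function name `predicted_salary_thousands` is made up here, and the base b is taken to be 1.09453365, an assumption based on the fact that this value reproduces both figures quoted in the answer (about 10,410.9 for 2020 and about 1710 for 2000). The refit model is not reproduced because its coefficients are not given.

```python
A = 692.8121333      # coefficient a from the fitted model (thousands of dollars)
B = 1.09453365       # base b, assumed as explained above
BASE_YEAR = 1990     # x counts years since 1990

def predicted_salary_thousands(year):
    """Average salary predicted by y = A * B**x, in thousands of dollars."""
    return A * B ** (year - BASE_YEAR)

y_2020 = predicted_salary_thousands(2020)   # x = 30
y_2000 = predicted_salary_thousands(2000)   # x = 10
print(y_2020, y_2000)

actual_2000 = 1984  # actual average salary in 2000, thousands of dollars
print(actual_2000 > y_2000)  # True: the model underestimates the year 2000
```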