
Benford's law states that the probability distribution of the first digits of many items (e.g. populations and expenses) is not uniform, but has the probabilities sh...

Question


Benford's law states that the probability distribution of the first digits of many items (e.g. populations and expenses) is not uniform, but has the probabilities shown in this table. Business expenses tend to follow Benford's Law, because there are generally more small expenses than large expenses. Perform a "Goodness of Fit" Chi-Squared hypothesis test (α = 0.05) to see if these values are consistent with Benford's Law. If they are not consistent, there might be embezzlement.

Complete this table. The sum of the observed frequencies is 138.

X    Benford's Law P(X)    Observed Frequency (Counts)    Expected Frequency (Counts)
1    0.301                 ____                           ____
2    0.176                 ____                           ____
3    0.125                 ____                           ____
4    0.097                 ____                           ____
5    0.079                 ____                           ____
6    0.067                 ____                           ____
7    0.058                 ____                           ____
8    0.051                 ____                           ____
9    0.046                 ____                           ____

Report all answers accurate to three decimal places.

What is the chi-square test statistic for this data? (Report answer accurate to three decimal places.) χ² = ____

What is the P-value for this sample? (Report answer accurate to decimal places.) P-value = ____

The P-value is:
- less than (or equal to) α
- greater than α

This P-value leads to a decision to:
- reject the null hypothesis
- fail to reject the null hypothesis

As such, the final conclusion is that:
- There is sufficient evidence to warrant rejection of the claim that these expenses are consistent with Benford's Law.
- There is not sufficient evidence to warrant rejection of the claim that these expenses are consistent with Benford's Law.



Answers

Benford’s law: Faked numbers in tax returns, invoices, or expense account claims often display patterns that aren’t present in legitimate records. Some patterns are obvious and easily avoided by a clever crook. Others are more subtle. It is a striking fact that the first digits of numbers in legitimate records often follow a model known as Benford’s law. Call the first digit of a randomly chosen record X for short. Benford’s law gives this probability model for X (note that a first digit can’t be 0):
(a) Are these data inconsistent with Benford’s law? Carry out an appropriate test at the α = 0.05 level to support your answer. If you find a significant result, perform a follow-up analysis.
(b) Describe a Type I error and a Type II error in this setting, and give a possible consequence of each. Which do you think is more serious?

Okay, so what do we have in this question? Benford's law states that the first nonzero digits of numbers drawn at random from a large, complex data file have a certain probability distribution, and it has been given to us. So let us draw the table; this is very important. The first column is the first nonzero digit, the second is the probability according to Benford's law, and the third is the sample frequency; let us call these the observed values. The nonzero digits range from 1 to 9, so this is going to be a long table.

First nonzero digit    Probability (Benford's law)    Observed
1                      0.301                          83
2                      0.176                          49
3                      0.125                          32
4                      0.097                          22
5                      0.079                          25
6                      0.067                          18
7                      0.058                          13
8                      0.051                          17
9                      0.046                          16

This is all the data that is given to us, and the total sample size is 275. Now, what is the question? It says: use a 1% level of significance to test the claim that the distribution of first nonzero digits in this accounting file follows Benford's law. So the first point is that our alpha is 0.01. What is our null hypothesis? Our null hypothesis is that the distribution of first nonzero digits in the accounting file follows Benford's law; that is, the two distributions are the same.
What is going to be the alternative hypothesis? The alternative hypothesis is that the distribution of first nonzero digits in the accounting file does not follow Benford's law. Now, in order to test this claim, we are going to perform a chi-square test, and the first step in performing a chi-square test is finding the expected values for all the categories. The formula for an expected value is the sample size multiplied by the probability, or proportion, of each category. Let me put this formula into action; it will give us the expected values. What is our sample size? It is 275. So the expected value for digit 1 is 275 multiplied by 0.301, which is 82.775. For the next one it is 275 multiplied by 0.176, which is 48.4. Then 275 multiplied by 0.125 is 34.375. Then 275 multiplied by 0.097 is 26.675. Then 275 multiplied by 0.079 is 21.725. Then 275 multiplied by 0.067 is 18.425. Then 275 multiplied by 0.058 is 15.95. Then 0.051 multiplied by 275 is 14.025. And then 0.046 multiplied by 275 is 12.65. These are the expected values. What is the next step? We are going to calculate the individual chi-square values for all of these categories. The formula for an individual chi-square value is: find the difference between the observed and the expected values, square it, and divide by the expected value.
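The expected-count step described above can be sketched in a few lines of Python. This is only an illustration: the names `BENFORD` and `expected_counts` are made up here, and the probabilities are the rounded values read out in the transcript.

```python
# Benford probabilities for first digits 1..9, as read out in the transcript
# (rounded to three decimals; the variable name is an illustrative choice).
BENFORD = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]


def expected_counts(n, probs=BENFORD):
    """Expected count per category: sample size times category probability."""
    return [n * p for p in probs]


expected = expected_counts(275)
for digit, e in enumerate(expected, start=1):
    print(f"digit {digit}: expected {e:.3f}")
```

Because the nine probabilities sum to 1, the expected counts sum back to the sample size, which is a quick sanity check on the table.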
Once you find all the individual values, you sum them up, and that gives you the overall chi-square statistic for the problem. Let us look at this formula in action. For the first category, the difference is between 83 and 82.775; we square this and divide by the expected value, 82.775. This is 0.0006, so let me just write it as zero. Then we have the difference between 49 and 48.4; we square this and divide by 48.4. This is 0.007. Then the difference between 34.375 and 32; we square this and divide by 34.375. This is 0.16. Then the difference between 26.675 and 22, which is 4.675; we square this and divide by 26.675, so this is almost 0.82. Then the difference between 25 and 21.725; we square this and divide by 21.725. This is 0.493. Then we have the difference between 18.425 and 18, which is 0.425; we square this and divide by 18.425, so this is 0.0098. Then the difference between 15.95 and 13, which is 2.95; we square this and divide by 15.95. This is 0.545. Then we have 17 minus 14.025; we square this and divide by the expected value, 14.025, which gives 0.63. Then the difference between 16 and 12.65; we square this and divide by 12.65, and we get 0.89. Now we add all of these up: 0 + 0.007 + 0.16 + 0.82 + 0.493 + 0.0098 + 0.545 + 0.63 + 0.89. The total comes out to 3.55, so my chi-square statistic for this entire question is 3.55. Now what else do I need in order to find the P-value? I need the degrees of freedom, which is given by the number of categories minus one. How many categories do I have? Nine. So this is nine minus one, or eight.
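As a cross-check on the hand arithmetic, here is a minimal sketch that computes each contribution without intermediate rounding. The function name `chi_square_gof` is hypothetical, and the data are the observed counts and probabilities quoted in the transcript. Carrying full precision gives a total of roughly 3.56, slightly above the 3.55 obtained from the rounded terms.

```python
# Data as quoted in the transcript; names are illustrative.
BENFORD = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
OBSERVED = [83, 49, 32, 22, 25, 18, 13, 17, 16]


def chi_square_gof(observed, probs):
    """Return (statistic, degrees of freedom, per-category contributions)."""
    n = sum(observed)
    # Each contribution is (observed - expected)^2 / expected, expected = n * p.
    contributions = [(o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs)]
    return sum(contributions), len(observed) - 1, contributions


stat, df, parts = chi_square_gof(OBSERVED, BENFORD)
for digit, c in enumerate(parts, start=1):
    print(f"digit {digit}: contribution {c:.4f}")
print(f"chi-square = {stat:.3f}, df = {df}")
```

Printing the per-category contributions makes it easy to see which digits drive the statistic; here digits 4 and 9 contribute the most.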
All right, so I have my chi-square statistic and I also have my degrees of freedom. Now I am going to use an online tool to get my P-value. My chi-square statistic is 3.55, my degrees of freedom is 8, and my level of significance, if I look at the question, is 1%, so 0.01. I have calculated, and I find that my P-value is 0.895. What was my alpha? My alpha was 0.01. So I can see that my P-value is much greater than my alpha; hence, I fail to reject my null hypothesis, H0. What does this mean? The alternative hypothesis was that the distribution does not follow Benford's law. So I do not have enough statistical evidence to suggest that the distribution of first nonzero digits in this accounting file does not follow Benford's law, and that is how we go about doing this question.
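The speaker takes the P-value from an online tool. As a rough stand-in, the sketch below uses the closed-form upper-tail probability of a chi-square distribution, which is valid only when the degrees of freedom are even (as they are here, with 8). The function name `chi2_sf_even_df` is an assumption made for this example; a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    Uses the identity exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which holds only for even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


alpha = 0.01
p_value = chi2_sf_even_df(3.55, 8)
print(f"p-value = {p_value:.3f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

With the transcript's statistic of 3.55 and 8 degrees of freedom, this agrees with the reported P-value of about 0.895.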

So this problem has several parts to it. The first thing I need to do is identify the significance level, and from the story problem it is 0.01. My null hypothesis is that p is indeed 0.301, and my alternate is that p is actually greater than 0.301. We're told we have a sample size of n = 228 and r = 92, and again the probability is 0.301, so that means q is 0.699. This will use a standard normal distribution. Just checking: is np greater than five? Yes, it is, because 228 times 0.301 is approximately 68.6. And when I test n times q, 228 times 0.699 gives me an approximate value of 159.4, and that is indeed greater than five. Now I need to find p-hat, and because we're using the standard normal, I'm looking for the z value. I'm not going to do this by hand; I'm going to use technology. I've already got it done here, but let me work through it with you: STAT, TESTS. This is a one-proportion z test, so number five. The probability p0 is 0.301. My x goes with the r value, which was 92, and my sample size is 228. We want to test greater than, so I click on Calculate, and there's the information I need. My p-hat is approximately 0.40, or 0.404 if we go to three places, and my z value is 3.37. If I'm drawing this on the curve, here's 3.37 and I'm shading to the right, and then here's my P-value. So for part (c), I need to check: is my P-value less than or equal to α, or greater than α? The E-4 means it's a very small number; that's a form of scientific notation. So zero, then the six will make the three round up, so 0.0004. Is that greater than α, or less than or equal to it? It's less than. So that means we reject the null hypothesis. And then what does that mean in the context of the problem?
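The calculator's one-proportion z test can be reproduced by hand. The sketch below is illustrative only: the function names are invented for this example, and the normal tail probability is computed with `math.erfc` from the standard library instead of a calculator menu.

```python
import math


def one_proportion_z(successes, n, p0):
    """Return (p_hat, z) for a one-proportion z test against p0."""
    p_hat = successes / n
    # The standard error uses the null value p0, not p_hat.
    se = math.sqrt(p0 * (1.0 - p0) / n)
    return p_hat, (p_hat - p0) / se


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


n, r, p0, alpha = 228, 92, 0.301, 0.01

# Large-sample check from the transcript: both np and nq should exceed 5.
print(f"np = {n * p0:.1f}, nq = {n * (1 - p0):.1f}")

p_hat, z = one_proportion_z(r, n, p0)
p_value = normal_upper_tail(z)  # right-tailed, since H1 is p > 0.301
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

The same function covers a left-tailed alternative by passing `-z` to `normal_upper_tail`, since the standard normal is symmetric.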
So at the 1% significance level, the sample data indicate that the proportion of numbers in the revenue file with a leading digit of one exceeds 0.301. So we reject the null. Okay, so let's interpret that. If p is in fact larger than 0.301, what does that tell you? Does it seem that there are too many numbers in the file with leading ones? The answer would be yes: it indicates there are too many numbers leading with one. So what does that mean? Could this indicate that the books have been cooked? Chances are you're not writing numbers that are too big, so it could be that there was an error somewhere. The IRS of course should investigate more, because it is very unusual. As it says in the book, it could be some kind of profit skimming: they take the extra and then write the lower figure in the books. But the bottom line is the FBI should investigate. It could be a mistake, or it could be that somebody is actually writing in lower numbers than what is actually true. And then finally, what does it mean to reject the null in this situation? It's really important to know that by rejecting the null, we haven't actually proved the null to be false. The data did lead us to reject it, which indicates that too many numbers start with one, so more investigation is needed.

So in this particular problem we're asked to do several things. First of all, we need to identify alpha: the significance level is 1%, or 0.01. We know that the null hypothesis is that p is 0.301, and the alternate hypothesis is that p is less than 0.301. Now, to figure out what kind of distribution it is: it's going to be standard normal. A quick check: is n times p greater than five? Indeed it is: 215 times 0.301 is about 64.7. And is n times q greater than five? If p is 0.301, then 1 − 0.301 gives us q, so q is 0.699. When I multiply those together, 215 times 0.699, I get approximately 150. So yes, that is indeed greater than five. Now I need to find the test statistic, which in this case will be p-hat, and they're asking us to find the z value. I'm going to do this all at once with my calculator. I've already got it worked out, but let me walk through it with you: STAT, TESTS. This is a one-proportion z test, so number five. The probability of success is 0.301. We're told that r is 46, so in the calculator that's the x, and the total, n, is 215. We're testing whether it's actually less than; cursor down to Calculate, and the information we need is all right there. So p-hat is 0.21, or 0.214 depending on rounding, and z is −2.78. In this particular situation I'm also given the P-value, which is this one. I know you've got all these p's: the little p for probability, p-hat, and your P-value, 0.0027. So that's part (c), where we're asked to find the P-value. If we're shading this, we draw our normal curve, here's −2.78, and I'm shading to the left. Now I need to check: is this P-value less than or equal to my significance level, or greater than it? I can see that my P-value is less than or equal to it.
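For readers without the calculator, the numbers above can be reproduced by hand. The following is a worked version of the standard one-proportion z statistic using the values quoted in the transcript; the symbols \(p_0\), \(q_0\), and \(\hat{p}\) are notation chosen for this sketch.

```latex
\[
\hat{p} = \frac{r}{n} = \frac{46}{215} \approx 0.214,
\qquad
z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}
  = \frac{0.214 - 0.301}{\sqrt{(0.301)(0.699)/215}}
  \approx \frac{-0.087}{0.0313}
  \approx -2.78
\]
\[
\text{P-value} = P(Z \le -2.78) \approx 0.0027 \le \alpha = 0.01
\]
```

Note that the standard error is built from the null value \(p_0 = 0.301\), not from \(\hat{p}\), which is why the result matches the calculator's output.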
So that means I need to reject the null. How do I write that out? I would say something like: at the 1% level of significance, the sample data indicate that the population proportion in the revenue file is less than 0.301. This next part is asking for your opinion, and if you're doing a multiple-choice question, the answer might be worded a little differently than what I have here. If p is in fact less than 0.301, then that indicates that there are not enough numbers that start with one. So what does that mean as a stockholder? As a stockholder, that could mean that the value of your stock is inflated. For the FBI, that might be a red flag to investigate, because according to Benford's law there should be a certain proportion of values that start with one. So for a stockholder it could mean that your stock is not worth as much as you think it is, and for the FBI it could be an indication that there's fraud. Finally, just because we reject the null hypothesis doesn't mean that we have proved anything about H0, which was the claim that the probability should be 0.301. All we did was take some sample data, and because the sample data led us to reject the null, there could be too few numbers with a leading digit of one. So you need to investigate more. It's not proof that the null is actually false, but it's an indication that more investigation needs to be done: maybe another sample, or maybe a larger sample.

All right, so the problem: we're pulling invoices, and we know the probability of getting an eight or nine as the first digit is 0.097. We're asked two questions. The first is how many invoices we would expect to pull before getting an eight or nine; in other words, what's the expected value for a geometric distribution? That is just one over p, or one over 0.097, which is approximately 10.3. Then we're told it took 40 invoices to get an eight or nine, and we want to know if that's unreasonable. So we're going to see what portion of the probability is given to getting really large draws like 40, 41, etcetera. In other words, we want the probability that X is greater than or equal to 40, which is one minus the probability that X is less than 40, which equals one minus (one minus (one minus p) to the x), because that is the CDF of a geometric distribution. What is x for X less than 40? Since we're dealing with integers, because you can't have half an invoice, it's 39. The one minus one goes away, so that equals (1 − 0.097) to the 39th power, and in my calculator that's approximately 0.0187, or about 1.87%. So very unlikely. That's a suspicious probability, so we would suspect fraud; the numbers are probably fraudulent. And we're done.
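The two geometric-distribution calculations above fit in a short sketch. The function names here are hypothetical, and X is taken to be the number of invoices pulled up to and including the first one that starts with 8 or 9, as in the transcript.

```python
def geometric_mean(p):
    """Expected number of trials up to and including the first success."""
    return 1.0 / p


def geometric_tail(k, p):
    """P(X >= k): the first k - 1 trials must all be failures."""
    return (1.0 - p) ** (k - 1)


p = 0.097  # P(first digit is 8 or 9) = 0.051 + 0.046 under Benford's law
mean = geometric_mean(p)
tail = geometric_tail(40, p)
print(f"expected invoices: {mean:.1f}")
print(f"P(X >= 40) = {tail:.4f}")
```

Writing the tail directly as (1 − p) to the power k − 1 skips the "one minus one minus" step in the transcript, but it is the same quantity.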

