Problem with RAND and RANDBETWEEN?

I

Iggles14

Version: 2008 Operating System: Mac OS X 10.6 (Snow Leopard) Processor: Intel I've got a simple spreadsheet set up to do a simulation of dice rolls. The formula I am using is this:

=RANDBETWEEN(1,6)

I have this formula repeated in four columns and 1000 rows for a total of 4000 trials. This should be more than enough for the central limit theorem to come into play. I would expect each number from 1 to 6 to occur with roughly equal frequency, say between 660-670 times.

In my last run, I had 647 occurrences of 2 and 696 of 6. Every time I have run the simulation, there has been a difference of at least 30 between the count for the most and least frequent roll, and the difference has been as high as 61.

Any ideas? Am I doing something wrong?
 
M

Mike Middleton

Iggles14 -
Any ideas? <

Search for "law of large numbers," not "central limit theorem."
Am I doing something wrong? <

Probably not.

But you've tried only 4,000 trials.

And you're already typically within 5% of the expected relative frequency.

- Mike
http://www.MikeMiddleton.com


Version: 2008 Operating System: Mac OS X 10.6 (Snow Leopard) Processor: Intel I've got a simple spreadsheet set up to do a simulation of dice rolls. The formula I am using is this:

=RANDBETWEEN(1,6)

I have this formula repeated in four columns and 1000 rows for a total of 4000 trials. This should be more than enough for the central limit theorem to come into play. I would expect each number from 1 to 6 to occur with roughly equal frequency, say between 660-670 times.

In my last run, I had 647 occurrences of 2 and 696 of 6. Every time I have run the simulation, there has been a difference of at least 30 between the count for the most and least frequent roll, and the difference has been as high as 61.

Any ideas? Am I doing something wrong?
 
I

Iggles14

Mike -- you're right about CLT vs. large numbers, of course. I was distracted because I am also trying to produce a distribution curve for when 2 dice are rolled. I would expect that with 3000 trials (6000 rolls) that the mean of that distribution would reliably be 7 (CLT). It is not. It is between 6.95 and 7.05 less than 50% of the time.

I just ran a series of trials for RANDBETWEEN (1,2), and even with 40,000 trials (a1:an1000), the error is almost always at least 0.1% (0.001) and is sometimes as high as 0.6%.

The general rule of thumb for a Monte Carlo trial is that 1000 trials should be plenty.
 
M

Mike Middleton

Iggles14 -

I suggest you learn about the "standard error of the mean," e.g., in
Wikipedia
http://en.wikipedia.org/wiki/Standard_error_(statistics)
or other sources, like a basic statistics book or web searches.

Although a Monte Carlo simulation of 1,000 trials may be satisfactory for
determining the general shape of a distribution, the appropriate number of
trials when estimating the mean depends on the confidence interval you wish
to construct. For example, to reduce the width of a specific confidence
interval in half, you need a sample size four times as large.

- Mike
http://www.MikeMiddleton.com




Mike -- you're right about CLT vs. large numbers, of course. I was
distracted because I am also trying to produce a distribution curve for when
2 dice are rolled. I would expect that with 3000 trials (6000 rolls) that
the mean of that distribution would reliably be 7 (CLT). It is not. It is
between 6.95 and 7.05 less than 50% of the time.

I just ran a series of trials for RANDBETWEEN (1,2), and even with 40,000
trials (a1:an1000), the error is almost always at least 0.1% (0.001) and is
sometimes as high as 0.6%.

The general rule of thumb for a Monte Carlo trial is that 1000 trials should
be plenty.
 
I

Iggles14

Mike and Bob -- I'm not dealing with samples or normal distributions. I'm trying to generate random numbers in a flat distribution.

I did some additional research, and it turns out that Excel generates &quot;pseudo-random&quot; numbers. Excel is not able to generate true random numbers that would duplicate coin flips or dice rolls. Nor is any computer. So it seems that my gut feel was correct.

I googled &quot;random number generator&quot; and found a couple of sites that use different algorithms to produce random number sets that are closer to true randomness.
 
M

Mike Middleton

Iggles14 -

This may be a matter of terminology, but I suggest that when you are using
RAND the way you described in your previous messages you are "sampling."

When you sample a total (sum of two dice) or a mean (based on two or more
random numbers) as you described, the central limit theorem applies and the
normal distribution becomes relevant.

There is variation from sample to sample, even with a sample size of 4,000
or 40,000, but that variation can be described using the standard error and
normal distribution.

As far as I know, the random number generator in Mac Excel 2008 is the same
one that was first implemented in Excel 2003, and it passes the standard
tests for randomness.

For more information, see the Microsoft Knowledge Base article 828795
"Description of the RAND function in Excel 2007 and in Excel 2003"

http://support.microsoft.com/kb/828795

- Mike
http://www.MikeMiddleton.com


Mike and Bob -- I'm not dealing with samples or normal distributions. I'm
trying to generate random numbers in a flat distribution.

I did some additional research, and it turns out that Excel generates
"pseudo-random" numbers. Excel is not able to generate true random numbers
that would duplicate coin flips or dice rolls. Nor is any computer. So it
seems that my gut feel was correct.

I googled "random number generator" and found a couple of sites that use
different algorithms to produce random number sets that are closer to true
randomness.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top