Advanced Probability.

H

HelpMePlease

Hello,
I have a set of data that is 14 cells. I want to find the probability
that the average of 5 randomly selected cells out of these 14 is over a
certain point. HOW?!
 
G

Gary''s Student

You would need to know something about the distribution of the values in the
14 cells. However, let's say you know nothing about the values:

You don't know how large or small they are.
You don't even know if there are repeated values in the 14 or not.

You can STILL get an estimate of the probability by performing a Monti Carlo
analysis.

The first step is to make data. Pick 5 at random, calculate the average and
store the result. Repeat this many, many times. Say we end up with data
from A1 thru A10000 (each cell containing an average)

Now knowing the MAX and MIN of the 10,000 sample we make bins for the data.

The third step is to add counts to the bins based upon the 10,000 values.
The bins represent a probability distribution.

The last step is to compare the reference value with the bin counts.
 
L

Luke

Gary''s Student said:
You would need to know something about the distribution of the values in the
14 cells. However, let's say you know nothing about the values:

You don't know how large or small they are.
You don't even know if there are repeated values in the 14 or not.

You can STILL get an estimate of the probability by performing a Monti Carlo
analysis.

The first step is to make data. Pick 5 at random, calculate the average and
store the result. Repeat this many, many times. Say we end up with data
from A1 thru A10000 (each cell containing an average)

Now knowing the MAX and MIN of the 10,000 sample we make bins for the data.

The third step is to add counts to the bins based upon the 10,000 values.
The bins represent a probability distribution.

The last step is to compare the reference value with the bin counts.

Just wondering if you could provide an example to the answer you provided
that would visually show an elaboration.

Not to hijack this post, but I made a post earlier "Subject:
Predict/Probability" that was, I think, similar but I got no response. I
provided a link to a sample xls file and I would love to finally see a good
visual make up.

Thank you now and well into the future,
Luke
 
G

Gary''s Student

Hi Luke:

Thanks for your interest. If you feel comfortable with VBA, then I will
demo a solution completely in VBA. All you would need to do is to paste the
macros into a fresh, new workbook. The macros would create the data on a
worksheet, run the Monti Carlo, prepare the binning and present the
probability distribution.

Check back tomorrow!!
 
M

MyVeryOwnSelf

I have a set of data that is 14 cells. I want to find the probability
Just wondering if you could provide an example to the answer you
provided that would visually show an elaboration.

I tried doing Monte Carlo this way.

Put the 14 original numbers in A1:N1.

Put the "certain point" in O1.

In A2 put
=RAND()
and extend to N2.

In A3 put
=--(RANK(A2,2:2)<6)
and extend to N3.

In P3 put
=--(SUMPRODUCT($A$1:$N$1,$A3:$N3)/5>$O$1)

In Q1 put
=AVERAGE(P:p)

Copy the pair of rows 2:3 and paste downward for thousands of rows.

The probability estimate is in Q1.
 
L

Luke

Very Nice, I will whatch for it.. I am just learning VBA, very light, but I
will be brave. I hope all this helps "HelpMePlease" as well.

Could you look at http://www.freefilehosting.net/download/3fg7a and see if
your VBA will apply to it? You can read about what I am looking for in my
post on 4/22/2008 Subject: "Predict/Probability"

Thank you
Luke
 
J

joeu2004

You can STILL get an estimate of the probability by performing
a Monti Carlo analysis.

That's Monte Carlo, with an "e", at least in English.
The first step is to make data.  Pick 5 at random, calculate the average
and store the result.  Repeat this many, many times.  Say we end up
with data from A1 thru A10000 (each cell containing an average)

If you are going to go to the trouble of generating that many 5-tuple
selections, you might as well generate all selections of 5 out of 14.
There are only 2002.
The third step is to add counts to the bins based upon the 10,000
values.  The bins represent a probability distribution.

Or simply sort the 2002 data (averages of 5)....
The last step is to compare the reference value with the bin counts.

....And count the number greater than the reference value.

(If you are going to store the 2002-data sorted table in a worksheet,
you could use LOOKUP.)

But I don't think that is a good general solution. It may or may not
be what the instructor is looking for, depending on what class this is
for.

I will post a separate posting with a more general statistical
approach, if someone else does not beat me to it. I don't have time
at the moment.

(I might even get around to posting the VB function to generate all
2002 combinations. It's quite straight-forward.)
 
J

joeu2004

PS....

You can STILL get an estimate of the probability by performing
a Monti Carlo analysis.
[....]
If you are going to go to the trouble of generating that many 5-tuple
selections, you might as well generate all selections of 5 out of 14.
There are only 2002.

Not to disparage "Gary's" approach or discourage him from posting the
VB Monte Carlo simulation. I, too, would be interested in seeing the
implementation. I struggle with VB because I use it so infrequently.
 
J

joeu2004

If you are going to go to the trouble of generating that many 5-tuple
selections, you might as well generate all selections of 5 out of 14.
There are only 2002.
[....]
Or simply sort the 2002 data (averages of 5)....
[....]
....And count the number greater than the reference value.

See the VB sub gendata below. It's a hack. Skilled VB programmers
might offer improvements.

Usage: create a new worksheet called "data". Then highlight the
range with the 14 numbers in the original worksheet; these must be
horizontal (i.e. part of a row). Then execute the macro. The
probability that the average of 5 random numbers out of the 14 is
greater than "x" is (replace "x" with a number):

=match(x, data!$a1:$a2002) / 2002

The macro is fast enough. But as it turns out, 90+% of the time is
spent in sorting the resulting array. In this case, it is faster to
use a function, even if we recreate the 2002 possible averages each
time. (Programming note: that could be avoided by storing into a
Public variable the first time.)

See the VB function genprob below. Usage: first argument is "x";
second argument is the 14-number range (again, horizontal).

In that case it would be useful to know the min and max 5-tuple
averages. See the VB function minmax below. Usage: first argument
is -1 for min, 1 for max; second argument is the 14-number range
(horizontal).

But the VB sub gendata is useful because it "returns" the entire array
of 2002 averages. That allows us to do some statistical analysis.
For example, use the Excel Histogram add-in (Data Analysis) and Chart
Wizard to graph the distribution of the averages. You will discover
that it is a normal distribution.

That is an important discovery. It demonstrates a principle of the
Central Limit Theorem that can be used to develop a more general
statistical solution. If this is for a class in statistics and
probability, I suspect that is what your instructor wants you to use
for the solution.

Obviously, more needs to be said. But I have run out of time. I'll
try to post back later. But this is best explained by the likes of
Jerry Lewis and Mike Middleton, if they are reading this thread.


-----

Sub gendata()
Dim pop As Variant
Dim x(1 To 2002) As Double
Dim i1 As Integer, i2 As Integer, i3 As Integer, i4 As Integer, i5 As
Integer
Dim i As Integer, j As Integer, k As Integer
Dim t As Double
Dim n As Long

If Selection.Count <> 14 Then Exit Sub
pop = Selection.Value
If UBound(pop, 1) <> 1 Then Exit Sub ' must be horizontal
n = 0
For i1 = 1 To 10
For i2 = i1 + 1 To 11
For i3 = i2 + 1 To 12
For i4 = i3 + 1 To 13
For i5 = i4 + 1 To 14
n = n + 1
x(n) = (pop(1, i1) + pop(1, i2) + pop(1, i3) + pop(1, i4) + pop(1,
i5)) / 5
Next i5: Next i4: Next i3: Next i2: Next i1

' sort in ascending order
For i = 1 To n - 1
k = i
For j = i + 1 To n
If x(j) < x(k) Then k = j
Next j
If k <> i Then
t = x(i): x(i) = x(k): x(k) = t
End If
Next i

Worksheets("Data").Range("A1:A2002").Value =
WorksheetFunction.Transpose(x)
End Sub


Function genprob(ref As Double, rng As Range) As Double
Dim pop As Variant
Dim x As Double
Dim i1 As Integer, i2 As Integer, i3 As Integer, i4 As Integer, i5 As
Integer
Dim cnt As Long, n As Long

If rng.Count <> 14 Then Exit Function
pop = rng.Value
If UBound(pop, 1) <> 1 Then Exit Function ' must be horizontal
n = 0: cnt = 0
For i1 = 1 To 10
For i2 = i1 + 1 To 11
For i3 = i2 + 1 To 12
For i4 = i3 + 1 To 13
For i5 = i4 + 1 To 14
n = n + 1
x = (pop(1, i1) + pop(1, i2) + pop(1, i3) + pop(1, i4) + pop(1,
i5)) / 5
If x <= ref Then cnt = cnt + 1
Next i5: Next i4: Next i3: Next i2: Next i1
genprob = cnt / n
End Function


Function minmax(mm As Integer, rng As Range) As Double
Dim pop As Variant
Dim x As Double
Dim i1 As Integer, i2 As Integer, i3 As Integer, i4 As Integer, i5 As
Integer

If rng.Count <> 14 Then Exit Function
pop = rng.Value
If UBound(pop, 1) <> 1 Then Exit Function ' must be horizontal
xmin = 1E+308
xmax = -1E+308
For i1 = 1 To 10
For i2 = i1 + 1 To 11
For i3 = i2 + 1 To 12
For i4 = i3 + 1 To 13
For i5 = i4 + 1 To 14
x = (pop(1, i1) + pop(1, i2) + pop(1, i3) + pop(1, i4) + pop(1,
i5)) / 5
If x < xmin Then xmin = x
If x > xmax Then xmax = x
Next i5: Next i4: Next i3: Next i2: Next i1
minmax = IIf(mm < 0, xmin, xmax)
End Function
 
J

joeu2004

Errata....

 The probability that the average of 5 random numbers out of the 14
is greater than "x" is (replace "x" with a number):
=match(x, data!$a1:$a2002) / 2002

Actually, that is the probability of "less than or equal to".
Correction:

=1 - match(x, data!$a1:$a2002) / 2002

Similarly, in the VB function genprob, change the comparison to:

  If x > ref Then cnt = cnt + 1

(Caveat emptor: if you are cut-and-pasting the VB code, be aware that
some lines might have been broken arbitrarily when I posted. The VBE
will flag this errors; the remedy is simply to go to the end of the
first line and press Space and Delete.)
 
G

Gary''s Student

Sub monti()

' gsnuxx

Set data = Range("A1:A14")
Set rnnd = Range("B1:B14")
Set sortarea = Union(data, rnnd)
Set sample = Range("A1:A5")

' put the data in A1 thru A14

Cells.Clear
data.Value = Application.Transpose(Array(1, 1, 2, 3, 4, 5, 6, 20, 21, 22,
23, 24, 25, 26))
rnnd.Formula = "=rand()"

' run the Monti Carlo

Application.ScreenUpdating = False
For i = 1 To 10000
sortarea.Sort Key1:=Range("B1"), header:=xlNo
Cells(i, "C").Value = Application.WorksheetFunction.Average(sample)
Next

' Put in the analysis equations

Range("F1").Formula = "=max(C:C)"
Range("F2").Formula = "=min(C:C)"
Range("D1").Formula = "=F2"
Range("D2").Formula = "=D1+(F$1-F$2)/10"
Range("D2").Copy Range("D3:D11")
Range("E2").Formula = "=COUNTIF(C:C,""<="" & D2)"
Range("E2").Copy Range("E3:E11")
Application.ScreenUpdating = True
End Sub


1. cols A & B are filled
2. cols A & B are "shuffled" 10000 times and the average of the A1 thru A5
is recorded
3. analysis equations are pasted in.

Here is what D2 thru E11 look like in a typical run:

2.2
4.38 93
6.56 307
8.74 1284
10.92 2801
13.1 4912
15.28 7228
17.46 8641
19.64 9701
21.82 9904
24 10000

Column D are the bin tops and column E are the counts. This says that there
is a 100% probability that a randomly chosen 5 sample will average at or
below 24

There is a 49.12% probability that the average will at or below 13.1, etc.

If you want a specific value not at a bin edge (say 12) then use:

=COUNTIF(C:C,"<=12")
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top