Need help calculating probabilities

K

Kiffar

My actual dataset is much more complicated, but hopefully thi
simplification will do the trick.

Let's say cells A1 and A2 are =randbetween(1,10), cell A3 is =5, cell
B1 and B2 are =randbetween(1,11), and cell B3 is =6. Cell C1 i
=A1+A2+A3, and cell C2 is =B1+B2+B3. C3 is =(A3-B3)/((A3+B3)/2), a typ
of percentage difference calculation. Every time I hit F9 the values i
column C will of course be a different number.

I would like cell D1 to tell me how often C3>.4, D2 to tell me how ofte
.4>C3>=.2, D3 to tell me how often .2>C3>-.2, D4 to tell me how ofte
-.2>=C3>-.4, and D5 to tell me how often C3<=-.4

There's a strong preference towards calculating the values of column
by only referencing column C, if at all possible
 
J

joeu2004

Kiffar said:
Let's say cells A1 and A2 are =randbetween(1,10),
cell A3 is =5, cells B1 and B2 are =randbetween(1,11),
and cell B3 is =6. Cell C1 is =A1+A2+A3, and cell C2
is =B1+B2+B3. C3 is =(A3-B3)/((A3+B3)/2), a type of
percentage difference calculation. Every time I hit F9
the values in column C will of course be a different
number.

I would like cell D1 to tell me how often C3>.4, D2 to
tell me how often 4>C3>=.2, D3 to tell me how often
.2>C3>-.2, D4 to tell me how often -.2>=C3>-.4, and D5
to tell me how often C3<=-.4

First, in your effort to simplify the situation, you have defined a
nonsensical problem.

Since A3 and B3 are constants, C3 is the constant expression (5-6)*2/(5+6)
= -2/11, which is about -0.18. Ergo, -0.2<C3<0.2 is always true.

I presume you intended to write that C3 is =(C1-C2)*2/(C1+C2). Note that I
made an algebraic simplification of (C1-C2)/((C1+C2)/2).

-----

To do __exactly__ as you describe, I would create a Worksheet_Calculate
event macro to update the counts in D1:D5.

Private Sub Worksheet_Calculate()
Application.EnableEvents = False
Select Case Range("c3")
Case Is >= 0.4
Range("d1") = Range("d1") + 1
Case Is >= 0.2
Range("d2") = Range("d2") + 1
Case Is > -0.2
Range("d3") = Range("d3") + 1
Case Is > -0.4
Range("d4") = Range("d4") + 1
Case Else
Range("d5") = Range("d5") + 1
End Select
Application.EnableEvents = True
End Sub

To create the event macro, right-click on the worksheet tab at the bottom,
click on View Code, then copy the text above and paste into the VBA editing
pane on the right. You can now close the VBA window.

Note: To save the macro with the workbook in Excel 2007 and later, be sure
to save as a macro-enabled file (xlsm).
 
J

joeu2004

PS.... I said:
To do __exactly__ as you describe, I would create a Worksheet_Calculate
event macro to update the counts in D1:D5.

Although that is what I would do based on your design and solution
limitations, I can offer one alternative that does not require a macro.

However, I do not recommend it.

This solution requires the use of one extra cell. I use F1, which is
initially empty.

Then enter the following formulas:

C3: =IF(F1="","",(C1-C2)*2/(C1+C2))
D1: =IF(C3="",0,D1+(C3>=0.4))
D2: =IF(C3="",0,D2+(0.2<=C3)*(C3<0.4))
D3: =IF(C3="",0,D3+(-0.2<C3)*(C3<0.2))
D4: =IF(C3="",0,D4+(-0.4<C3)*(C3<=-0.2))
D5: =IF(C3="",0,D5+(C3<=-0.4))

Set the Iterative calculation option with Max Iterations set to 1.

When you are ready to start your experiment, set F1 to 1. That is the first
recalculations. Press F9 successively for subsequent recalculations.

Clear F1 to start the counters at zero again.

To set the Iterative calculation option:

1. In Excel 2003, click on Tools, Options, Calculation.
2. In Excel 2007, click on the Office Button, Excel Options, Formulas.
3. In Excel 2010, click on File, Options, Formulas.

Caveat: I deprecate the use of the Iterative calculation option for two
reasons. First, it is unreliable in some situations, although it is
probably reliable in this limited usage. Second, it masks mistaken circular
references; that is, you will not get an error message.

But again, I would do this experiment very differently altogether, not
relying on either a Calculate event macro or the iterative calculation
option.
 
K

Kiffar

You were correct in ascertaining the nature of my typo in writing up th
original post, I have edited it accordingly.

'joeu2004[_2_ said:
;1607369']PS.... I said:


But again, I would do this experiment very differently altogether, not
relying on either a Calculate event macro or the iterative calculation
option.


Is this because it would be inaccurate, or because there's a mor
efficient method? I have no preference toward any type of method, a
long as it's accurate. I've never used macros or anything fancy lik
that before, though, so I might need handholding implementing suc
things. To give you an idea of my familiarity with Excel functions,
just learned how if statements work this week. Though I'd prefer t
limit the number of simulations, so that each time I change one of th
non-random variables, it doesn't take very long to update th
calculations, as my actual datasets have billions of possibilities. Th
results are still based off of C1 and C2 effectively, there's just muc
more that goes into calculating those two cells in the real sheet I hav
going
 
J

joeu2004

Kiffar said:
Is this because it would be inaccurate, or because there's
a more efficient method?

The latter.


Kiffar said:
I'd prefer to limit the number of simulations, so that each
time I change one of the non-random variables, it doesn't
take very long to update the calculations, as my actual
datasets have billions of possibilities.

The "best" implementation approach will depend on the exact nature of the
simulation.

I would encourage you to share all the details with us. You can upload an
example Excel file (devoid of any private data) that demonstrates the
problem to a file-sharing website.

Then post the "shared", "public" or "view-only" link (aka URL; http://...)
in a response here. The following is a list of some free file-sharing
websites; or use your own.

Box.Net: http://www.box.net/files
Windows Live Skydrive: http://skydrive.live.com
MediaFire: http://www.mediafire.com
FileFactory: http://www.filefactory.com
FileSavr: http://www.filesavr.com
RapidShare: http://www.rapidshare.com


Alternatively, send email to me at joeu2004 "at" hotmail.com, ideally with a
full description and an Excel file attachment.

(Some forums censor real email addresses.)


Kiffar said:
I've never used macros or anything fancy like that before,
though, so I might need handholding implementing such
things. To give you an idea of my familiarity with Excel
functions, I just learned how if statements work this week.

Generally, VBA (macros) is the best way to implement simulations, first
because the logic is sometimes difficult to write in Excel, and second
because we can avoid recalculations at unexpected times. The use of
RANDBETWEEN and RAND usually causes recalculations every time any cell in
any worksheet in the workbook is edited.

Be that as it may, here is a quick-and-dirty way to do the simulation that
you offered as an example. You will note that the cell usage is very
different from what you asked for. Is that really a deal-breaker?

In Sheet1:
A1: =RANDBETWEEN(1,10)
B1: =RANDBETWEEN(1,10)
C1: =RANDBETWEEN(1,11)
D1: =RANDBETWEEN(1,11)
E1: =A1+B1+5
F1: =C1+D1+6
G1: =(C1-C2)*2/(C1+C2)

Copy A1:G1 down through A10000:F10000. One way to do that easily:
1. Copy A1:G1.
2. Type A2:G10000 into the Name Box, then press Enter to select that range.
3. Paste by pressing ctrl+V.

Note that columns E, F and G correspond to your C1, C2 and C3. There are
just 10,000 of them -- 10,000 simulations.

For now, copy G1:G10000 and paste-special-value into H1:H10000. That
"freezes" the simulation. Note that G1:G10000 et al will continue to
change. We don't care.

I put the random data into a separate worksheet so that we can use a macro
later, if you wish, to "freeze" the simulation instead of using
copy-and-paste-special-value.

In Sheet2:
D1: =COUNTIF(Sheet1!H1:H10000,">=0.4")
D2: =COUNTIF(Sheet1!H1:H10000,">=0.2") - D1
D3: =COUNTIF(Sheet1!H1:H10000,">-0.2") - D1 - D2
D4: =COUNTIF(Sheet1!H1:H10000,">-0.4") - SUM(D1:D3)
D5: =COUNTIF(Sheet1!H1:H10000,"<=-0.4")
D7: =SUM(D1:D5) ' should be 10000, of course

For D2:D4, we could use COUNTIFS in Excel 2007 and later instead. For
example:
=COUNTIFS(Sheet1!H1:H10000,">=0.2",Sheet1!H1:H10000,"<0.4")

E1: =D1/$D$7
Copy E1 and paste into E2:E5. Format E1:E5 as Percentage. Those are your
probabilities.

With your example data, RANDBETWEEN(1,10) and RANDBETWEEN(1,11), the
probability distribution should be close to the following:

8.98% C3>=0.4
10.96% 0.2<=C3<0.4
38.69% -0.2<C3<0.2
18.19% -0.4<C3<=-0.2
23.18% C3<=-0.4

For your example, the expected probability distribution can be computed
using a macro. Let me know if you are interested.

Is that an approach you can work with?
 
K

Kiffar

It looks like that would work in concept. Here's the actual sheet I'
working on (In Excel 2007)
https://rapidshare.com/files/1408769342/SampleZ.xlsm

What I've been referring to as C1 and C2 are actually the pink and gree
numbers in columns K and W. Those numbers are simply autosums of K2
K26 and W2: W26, respectively. How each of those cells calculates it
value is not currently decided with certainty (and in fact, has change
since I've uploaded the sample and began writing this, and likely wil
again). What is known, which hopefully is sufficient for the discussio
at hand, is that all of K2: K26 and W2: W26 will be calculated the sam
way. Additionally, it is certain that the random number component fo
each row will always be either a number from 1-20 (columns I and Y), o
a manually entered value in the adjacent column (J and X). Are ther
any changes I should be aware of that if made would throw of th
probability calculations? One notable thing that could change is th
addition of more columns and rows, would macros written for the shee
automatically change the cells they refer to in order to accommodate th
same way cell functions do?

For simplification I had cut out some of the range classifications, th
full list is >=.4, >=.2, >=.1, >=.05, >-.05, >-.1, >-.2, >-.4, and =<-.
You can see in the middle where each classification will go, unde
chance of victory. The chance of defeat section will simply be a mirro
of the chance of victory section using linked cells. I'm leanin
towards deleting it though, if it matters. The decimals will b
displayed as percentages, I had intended to simply do that via the cel
format menu. 5% through negative 5% is a stalemate, 5 through 9.99~ i
marginal, etc.

Note that the percentage difference calculation in the middle is showin
the results for the current engagement is an =abs function to alway
turn the result into a positive number. Thus the probabilit
calculations should use ($J$27-$W$27)/(($J$27+$W$27)/2) rather tha
referring to M34.

I've switched off Excel's automatic formula recalculation for now, s
that it only recalculates random numbers when I press F9. However, i
seems this is a global setting that will apply to all sheets. It i
desirable to have the randomly generated numbers and probabilities no
change unless manually refreshed for this specific worksheet only, s
would it be possible to replace =randbetween() with a macro? Would i
be possible to make a refresh button on the sheet labeled "New Match"
 
K

Kiffar

'joeu2004[_2_ said:
;1607436']"Kiffar said:
It looks like that would work in concept.
Here's the actual sheet I'm working on (In Excel 2007):
https://rapidshare.com/files/1408769342/SampleZ.xlsm-

Sorry, but I cannot respond in a timely manner. Dealing with a family
tragedy.

Not a problem, this isn't something that needs to be done quickly. I'l
continue to post more recent versions of the sheet as I work on it
based on the feedback of the people that will be using it
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top