How to use covar in random number generation?

J

joeu2004

How can I use covariance in random number generation
of a normal distribution?

For example, I would use VBA to implement Box-Mueller
using mean and std dev. I have implemented B-M in C.
Can someone provide an algorithm (VBA, C or pseudocode)
that incorporates covariance between two and among
3 or more random variables, each normally distributed?

Is that called multivariate normal random number
generation?

FYI, This is for Monte Carlo simulation, if that matters.
And I would be content with (even prefer) an algorithm
that depends on a uniform RNG.

Also, can I accomplish this without using VBA?

For example, some people have suggested using
NORMDIST(RAND(),mean,sd) as an RNG with normal
distribution based on mean and std dev. I don't know
how good that is and how that compares quality-wise to
Box-Mueller, even assuming that RAND() is a good
uniform RNG. (I believe some people question that
assumption.)

Frankly, I don't know how good Box-Mueller is quality-wise
either, assuming a good uniform RNG to start with. I
believe some people quibble over B-M as well. But I
prefer to keep the discussion simple for now. I am
interested in fundamental algorithms, not quibbles about
perfection.
 
J

Jerry W. Lewis

I think you meant NORMINV(RAND()...) not NORMDIST(RAND()...)
The former is a mathematically exact way to generate normal random variates,
as is Box-Muller. Numerically, I would avoid the INV approach unless you
have Excel 2003, sincer earlier inverse functions are not equal to the task.

Both approaches can magnifiy inadequacies of the uniform random number
generator. See
http://groups.google.com/group/microsoft.public.excel.misc/msg/cfb3b6faa69eb726
for more comments in this vein.

As for a correlation structure, if x,y,z are independent normal random
variables, then x+z and y+z are correlated normal random variables. You just
need to generate a few extra variables to drive your correlation structure.

Jerry
 
M

Mike Middleton

(e-mail address removed) -

Browse to Google Groups and search for "excel correlated normal random"
(without the quotes), and you'll get plenty of answers to your questions.

- Mike
www.mikemiddleton.com
 
J

joeu2004

Jerry W. Lewis said:
I think you meant NORMINV(RAND()...) not NORMDIST(RAND()...)

Yes. And thanks for your response. I was hoping to catch
your attention :).

Also, immediately after I posted, I wondered if I am more
interested in correlation (e.g. RSQ) than covariance. The
non-academic material I am reading uses the terms
interchangably. I'm afraid I fell into the same trap. Since
they seem to be related, I hope it does not matter much;
just a detail, I hope.
As for a correlation structure, if x,y,z are independent
normal random variables, then x+z and y+z are correlated
normal random variables. You just need to generate
a few extra variables to drive your correlation structure.

I sense that this is very relevant and important to me. But
I confess that you lost me completely. (My fault, not yours.)
I have no idea what to do with the correlation "structure"
(matrix?) in order to build the RNG.

Can you be more specific?

Perhaps a numerical example would help. The following
is based on real data. I wish I had numbers that
demonstrate closer correlation and negative correlation.
But it is difficult to tweak the data to make that happen.
I hope you can imagine that is the case. (And I hope the
table is columnarized properly when I post this.)

avg sd
X 13.1% 18.9%
Y 11.0% 11.9%
Z 4.9% 2.1%

rsq:
X Y Z
X 1 0.7221 0.0048
Y 0.7221 1 0.0030
Z 0.0048 0.0030 1

covar:
X Y Z
X 0.0228 0.0182 -0.0003
Y 0.0182 0.0135 -0.0001
Z -0.0003 -0.0001 0.0004


Ignoring the correlation factor, an RNG might be simply
NORMINV(RAND(),avg,sd). Alternatively, the Box-Mueller
algorithm is (simplified for illustration only):

x1 = 2*rnd() - 1
x2 = (2*rnd() - 1) * sqrt(1 - x1^2)
w = x1^2 + x2^2
return x1*sqrt(2*ln(w)/w)

But if I simply do that for each of X, Y and Z and
ignore correlation, I worry that the results of a
simulation might not demonstrate the expected
"negative" correlation (I wish) between Y and Z and
the "close" correlation (I wish) between X and Y.

So how do I change (or replace) the simple RNG to
incorporate correlation? What would the Excel formula
be? Or what would the VBA algorithm look like --
perhaps just a tweak of the B-M algorithm?
 
J

joeu2004

Mike Middleton said:
Browse to Google Groups and search for "excel correlated
normal random" (without the quotes)

I am glad I got your attention, too :).

Thanks. That does indeed provide more useful hits than
the google search I had done earlier. It finds some useful
papers, which I have yet to read -- including one that I
had failed to bookmark some months ago and could not
find again :->). And I am still plowing through the results
to see if there are any other useful hits.

In the meantime, I am particularly interested in the method
you suggested in Jan05. It is simple enough even for me :).

Can you answer some questions?

To generate random values from two correlated normal
distributions X and Y, you suggested:

z1 = norminv(rand(), 0, 1)
z2 = norminv(rand(), 0, 1)
x = meanX + stdevX*z1
y = meanY + stdevY*(z1*r + z2*sqrt(1-r^2))

1. Did I translate your formulas correctly? I changed the
notation slightly.

2. Does this method have a common name (a la "Box-Mueller"
or "polar method")? It would be useful to know for my
own documentation as well as if/when I discuss it with
others.

3. I believe "rho" is sqrt(rsq(X,Y)) -- i.e. pearson(X,Y).
Right? You had noted that "rho" must be between -1
and +1.

4. I believe the above works well for two random variables.
What about for N random variables, N >= 3?

For example, for random variables X1, X2, X3 and X4,
would I compute simply:

z1 = norminv(rand(), 0, 1)
z2 = norminv(rand(), 0, 1)
z3 = norminv(rand(), 0, 1)
z4 = norminv(rand(), 0, 1)
x1 = meanX1 + stdevX1*z1
x2 = meanX2 + stdevX2*(z1*rX12 + z2*sqrt(1-rX12^2))
x3 = meanX3 + stdevX3*(z1*rX13 + z3*sqrt(1-rX13^2))
x4 = meanX4 + stdevX4*(z1*rX14 + z4*sqrt(1-rX14^2))

where, "rX12" is "rho" for X1 and X2; "rX13" is "rho" for
X1 and X3; etc.

The choice of the correlation with X1 for all cases seems
somewhat arbitrary. I wonder if there are more terms or
more complex terms in the multivariate case, or if the
algorithm should be embellished in some other way.

Thanks again for the pointer (google search).
 
J

Jerry W. Lewis

The usual representation of Box-Muller is
http://www.taygeta.com/random/gaussian.html

If X,Y, and Z are independent N(0,1), a*X+b*Y and c*Y+d*Z are distributed
N(0,a^2+b^2) and N(0,c^2+d^2) with covariance b*c. Just choose a,b,c, and d
that will give the desired covariance structure (which will be unchanged by
constants to give the desired mean).

More generally, if y is a column vector of n independent N(0,1) variables,
and M is a matrix of constants with n columns, then
MMULT(M,y)
is multivariate normal with mean vector 0 and variance-covariance matrix
MMULT(M,TRANSPOSE(M)). Again you can add in constants for the desired mean
without violence to the covariance structure.

Jerry
 
J

joeu2004

Errata ("oh, for the love of P...!") ....

I wrote in one response:
I wondered if I am more interested in correlation
(e.g. RSQ) than covariance

And in another response:
I believe "rho" is sqrt(rsq(X,Y)) -- i.e. pearson(X,Y).

I meant pearson() in both cases, which is neither rsq()
nor sqrt(rsq()). I guess it is called r"sq" for a reason ;-).
Klunk!

Lo and behold, pearson() does give me the close(er)
and negative correlations factors that I expected from
the descriptions of the original data.

pearson(X,Y) = 0.8497
pearson(X,Z) = -0.0695
pearson(Y,Z) = -0.0544
 
M

Mike Middleton

(e-mail address removed) -

(1) Looks OK for the two-variable situation.

(2) I don't think there is a special name for these expressions, which are
special cases of the general mulitvariate normal expressions. Box-Muller is
a method for obtaining a single normal random number from two uniform random
numbers.

(3) OK. Rho is the correlation coefficient (Excel's CORREL and PEARSON
worksheet functions).

(4) Your four-variable expressions are not correct. The general case needs
matrix notation involving all pairwise correlations. Here's some expressions
(can't remember the source) for the special case of three normal random
numbers:

If x1, x2, and x3 are independent random variates, and c12, c13, and c23 are
the desired correlations among them, then the correlated random variates v1,
v2, and v3, can be calculated most easily as follows:

v1 = x1
v2 = x1*c12 + x2*(1 - c12^2)^0.5
v3 = x1*c13 + x2*(c23 - c12*c13)/((1-c12^2)^0.5) +
x3*[1-([c23 - c12*c13]^2)/(1-c12^2)-c13^2]^0.5

- Mike
www.mikemiddleton.com
 
J

joeu2004

Mike Middleton said:
(4) Your four-variable expressions are not correct. The
general case needs matrix notation involving all pairwise
correlations.

As I expected. I just wanted confirmation because the OP
in the Jan05 tried to draw the same simplifying (and hopeful)
conclusion that I did, and no one ever corrected him.
If x1, x2, and x3 are independent random variates, and
c12, c13, and c23 are the desired correlations among
them, then the correlated random variates v1, v2, and
v3, can be calculated most easily as follows:
v1 = x1
v2 = x1*c12 + x2*(1 - c12^2)^0.5
v3 = x1*c13 + x2*(c23 - c12*c13)/((1-c12^2)^0.5) +
x3*[1-([c23 - c12*c13]^2)/(1-c12^2)-c13^2]^0.5

This is actually very helpful, if only for illustrative purposes.
Could you provide the equations for __4__ independent
random variates? It is not yet clear to me where this
progression is headed.
 
M

Mike Middleton

(e-mail address removed) -

You wrote: > Could you provide the equations for __4__ independent random
variates? <

No, I cannot. Perhaps you can find someone else to translate the
general-case matrix operations into "equations."

- Mike
www.mikemiddleton.com

Mike Middleton said:
(4) Your four-variable expressions are not correct. The
general case needs matrix notation involving all pairwise
correlations.

As I expected. I just wanted confirmation because the OP
in the Jan05 tried to draw the same simplifying (and hopeful)
conclusion that I did, and no one ever corrected him.
If x1, x2, and x3 are independent random variates, and
c12, c13, and c23 are the desired correlations among
them, then the correlated random variates v1, v2, and
v3, can be calculated most easily as follows:
v1 = x1
v2 = x1*c12 + x2*(1 - c12^2)^0.5
v3 = x1*c13 + x2*(c23 - c12*c13)/((1-c12^2)^0.5) +
x3*[1-([c23 - c12*c13]^2)/(1-c12^2)-c13^2]^0.5

This is actually very helpful, if only for illustrative purposes.
Could you provide the equations for __4__ independent
random variates? It is not yet clear to me where this
progression is headed.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top