how i calculate pearson correlation coefficient for excel graph

  • Thread starter university student that is lost
  • Start date
U

university student that is lost

how do I calculate the pearson correlation coefficient for on microsoft excel
for a grapg of information. If it is not possible is there another program
that can do this for me?
 
J

Jon Peltier

In a cell in the worksheet, enter this formula

=CORREL({y range},{x range})

where {x range} and {y range} are the ranges occupied by the X and Y values
in the sheet. You can select them by clicking and dragging while you are
entering the formula.

- Jon
-------
Jon Peltier, Microsoft Excel MVP
Tutorials and Custom Solutions
http://PeltierTech.com
_______


"university student that is lost" <university student that is
(e-mail address removed)> wrote in message
news:D[email protected]...
 
D

Del Cotter

In a cell in the worksheet, enter this formula

=CORREL({y range},{x range})

Shouldn't that be =PEARSON({y range},{x range})? I only have Excel 97,
but I'm surprised to see that PEARSON() and CORREL() do not have a
pointer to each other in the "See also" section of the help file.
 
J

Jon Peltier

A quick glimpse at Google led me to believe they were one and the same. Just
now Wikipedia tells me:

"The CORREL() function in many major spreadsheet packages, such as Microsoft
Excel and Gnumeric calculates Pearsons correlation coefficient."

Wolfram adds that:

"The correlation coefficient is also known as the product-moment coefficient
of correlation or Pearson's correlation."

I wouldn't count on the Excel help files to be much help, especially about
statistics.

- Jon
 
D

Del Cotter

You're right, a quick test of the two functions suggests they give the
same numeric result every time.
 
J

James Silverton

Hello, Jon!
You wrote on Sun, 17 Sep 2006 00:28:59 -0400:

JP> "The CORREL() function in many major spreadsheet packages,
JP> such as Microsoft Excel and Gnumeric calculates Pearsons
JP> correlation coefficient."

JP> Wolfram adds that:

JP> "The correlation coefficient is also known as the
JP> product-moment coefficient of correlation or Pearson's
JP> correlation."

JP> I wouldn't count on the Excel help files to be much help,
JP> especially about statistics.

It is interesting that the Help entries for PEARSON and CORREL
look confusingly different. Rather than analyse the equations, I
calculated PEARSON and CORREL for several sets of data with
identical numerical results :)

James Silverton
Potomac, Maryland

E-mail, with obvious alterations:
not.jim.silverton.at.comcast.not
 
J

Jerry W. Lewis

Correl is the better choice. In Excel 2003 and later Correl and Pearson are
identical. Prior to 2003, they are mathematically equivalent, but Peason is
numerically inferior.

Jerry
 
J

James Silverton

Jerry W. Lewis said:
Correl is the better choice. In Excel 2003 and later Correl
and Pearson are
identical. Prior to 2003, they are mathematically equivalent,
but Peason is
numerically inferior.

Without disputing your analysis, since all my tests seem to
produce identical results with Excel 2002, can you point me to
details of why PEARSON is inferior? Perhaps, you might let me
have some data that would produce different answers.
 
J

James Silverton

Hello, James!
You wrote on Sun, 17 Sep 2006 23:52:37 -0400:


JS> message
??>> Correl is the better choice. In Excel 2003 and later
??>> Correl and Pearson are identical. Prior to 2003, they are
??>> mathematically equivalent, but Peason is numerically
??>> inferior.
??>>
JS> Without disputing your analysis, since all my tests seem to
JS> produce identical results with Excel 2002, can you point me
JS> to details of why PEARSON is inferior? Perhaps, you might
JS> let me have some data that would produce different answers.

I'll just add that I checked my results again but I had to look
at the 14th or 15th decimal to get a numerical difference. I
suppose there must be data where the difference is more apparent
but I'd debate whether the differences of the coefficients have
any meaning for my data.


James Silverton
Potomac, Maryland

E-mail, with obvious alterations:
not.jim.silverton.at.comcast.not
 
J

Jerry W. Lewis

The numerical problems with the pre-2003 Pearson algorithm are the same as
with the pre-2003 StDev, Rsq, Slope, etc. They have been discussed in the
statistical literature for over 40 years and in these newsgroups for over 10.
You might find
http://groups.google.com/group/microsoft.public.excel.worksheet.functions/msg/d6a03470e7a1c650
to be useful.

You will find a number of threads where people report Rsq<0 or Abs(Rsq)>1.
In those instances Pearson will show much worse numerical problems than your
simple tests have shown.

For a relatively simple example, put =$C$1 in A1:B1 and =$C$1+1 in A2:B2.
=CORREL(A1:A2,B1:B2) and =PEARSON(A1:A2,B1:B2) should both be 1 for any
numeric value in C1. However if C1 contains 1E8 then PEARSON will give
#DIV/0, and if C1 contains 1E12 then PEARSON will give -1.

Jerry
 
J

James Silverton

Jerry W. Lewis said:
The numerical problems with the pre-2003 Pearson algorithm are
the same as
with the pre-2003 StDev, Rsq, Slope, etc. They have been
discussed in the
statistical literature for over 40 years and in these
newsgroups for over 10.
You might find
http://groups.google.com/group/microsoft.public.excel.worksheet.functions/msg/d6a03470e7a1c650
to be useful.

I did the numerical testing because, as I mentioned, the
information available in HELP was rather obtuse. Thank you for a
most enlightening reply!

I must admit that in the days when I relied on statistical
tests for hypothesis testing, I did not use functions from
general programs like Excel, nor indeed use Pearson's test.
These days, my data is such that trends are visible even from
graphical display. As will be obvious, I'm a user rather than a
theorist but it's always comforting to know the limitations of
calculation formulae:)

Have the statistical formulae in Excel 2003 on been rigorously
tested? As you remarked in the reference, it is amazing how long
Microsoft kept using unstable calculation techniques
 
J

Jerry W. Lewis

James Silverton said:
... Thank you for a
most enlightening reply!

You're welcome, glad it helped.
Have the statistical formulae in Excel 2003 on been rigorously
tested? As you remarked in the reference, it is amazing how long
Microsoft kept using unstable calculation techniques

Univariate statistics functions (StDev, StDevP, Var, VarP) worked fine out
of the box. Bivariate Statistics functions (Slope, Intercept, SteYX, etc)
could produce incorrect results with non-numeric or empty cells in the data
range; that was fixed in a March 2004 patch (as was a bug in the new algorith
for RAND).

When LINEST 2003 estimates a parameter to be exactly zero, the estimate
should be distrusted unless confirmed by alternate calculations; this is
fixed in 2007 beta.

Probability calculations (other than the much improved 2003 NORMDIST and
NORMINV) remain inadequate. ...INV functions do a better job of inverting
....DIST functions, but continuous distribution functions (...DIST) continue
to have limited accuracy and continue to become totally inaccurate in at
least one tail. Algorithms were changed for discrete distributions
functions, but they introduce new numerical problems that are still not fixed
in 2007 beta. The gold standard for probability calculations remains Ian
Smith's library
http://members.aol.com/iandjmsmith/examples.xls
which is actually better (both accuracy and working range) than any other
double precision implementation that I am aware of, (including dedicated
statistics packages and math libraries).

Jerry
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top