Excel's COMBIN and integers

J

Jerry W. Lewis

This is intended as an addendum to
http://groups.google.com/group/microsoft.public.excel/browse_thread/thread/36affa04b5577be3
which I cannot reply to directly because the MS community interface appears
to no longer support replying to the microsoft.public.excel group, my ISP no
longer supports NNTP newsgroups at all, and Google does not support posts
without displaying my real e-mail address.

An obvious calculation for Combin(n,r) is
EXP( GAMMALN(n+1)-GAMMALN(r+1)-GAMMALN(n-r+1) )

For large n, accuracy can be reduced due to cancellation problems. Ian
Smith discussed how to avoid these cancellation problems through a simple
auxilary function; unfortunately, AOL stopped hosting his web page.

In Excel, accuracy is also lost because the Excel implementation of GAMMALN
only gives about 10-figure accuracy, which is curious, because COMBIN's
results seem consistent with an underlying machine precision implementation
of GAMMALN.

Since Excel does not support the (mathematically and statistically useful)
analyitic continuation of COMBIN to non-integers, it is sloppy that they did
not round the result to an integer when that result is <= 2^53-1 =
9007199254740991; but it is an easy matter for the user to rectify this in
practice.

Jerry
 
D

David Biddulph

If your ISP doesn't support NNTP access, use the server msnews.microsoft.com
 
J

Joe User

Jerry W. Lewis said:
An obvious calculation for Combin(n,r) is
EXP( GAMMALN(n+1)-GAMMALN(r+1)-GAMMALN(n-r+1) )

"Obviously"! ;-)

Yeah, I had come across this formula myself. Well, the equivalent:
Gamma(n+1)/Gamma(r+1)/Gamma(n-r+1). (But Excel 2003 does not have a GAMMA
function per se, AFAIK.)

But I did not think that was the explanation since the Excel implementation
results in an error of about 1E-8, very much larger than the 2^-46 error
that COMBIN(9,3) produces, a one-bit error in the least-significant bit.

In fact, for COMBIN(n,k) with n=1,...,53 and k=1,...,n, 60% of the results
are integers, and the error is not more than the 3 least-significant bits,
and usually only in the least-significant bit [1]. (However, I did not
determine the accuracy of the integral results.)

I would be surprised to see that kind of accuracy from a Gamma
approximation, much less Exp and GammaLn approximations.

However, I admit that I am not familiar with implementations of these
approximations. And perhaps we can expect any approximation errors to
cancel out [2]. Moreover, Jerry says that GAMMALN "only gives about
10-figure accuracy". That would certainly contribute to, if not explain,
the large error in an Excel implemenation of the formula.

I would be interested in seeing the exact results for either the GammaLn or
Gamma formula from an independent math program. By "exact" results, I mean
either the 64-bit binary representation or some equivalent decimal
presentation of it. For example, Excel COMBIN(9,3) results in 84-2^-46, or
&h4054FFFF,FFFFFFFF, or 83.9999999999999,857891452847979962825775146484375.
The latter two are my own stylistic presentation.


-----
Endnotes

[1] For Excel COMBIN(n,k) with n=1,...,53 and k=1,...,n, 853 results are
integers, 333 are off in the least-significant bit, 222 are off in the 2
least-significant bits, and 23 are off in the 3 least-significant bits.
Note that "off" means off from the rounded integer. I did not determine the
accuracy of the integers.

[2] For example, if COMBIN(53,21) were evaluated effectively by Prod(k,
k=33,...,53)/Fact(32), an Excel implementation (i.e. using 64-bit
intermediate subproducts) results in exactly the correct integer, despite
the fact that Prod(k) results in an incorrect integer. Fact(32) does result
in exactly the correct integer.


----- original message -----
 
J

Joe User

David Biddulph said:
If your ISP doesn't support NNTP access, use the server
msnews.microsoft.com

If you mean using Outlook Express, Windows Mail or something similar to set
up an "account" that connects to the MSnews server, that __does__ use the
NNTP protocol.

But most people do not use the correct terminology when talking about
newsgroups (aka discussion groups). So it is unclear exactly what Jerry
means.

When I look back at Jerry's postings (e.g. Nov 2008), everything is
consistent with his posting through the MS Discussion Groups server, and it
is consistent with his latest posting [1]. But he might have been unaware
of that if he was using an interface provided by his ISP.

I wouldn't mind understanding this in more detail. What did Jerry do
exactly in the past? What did he do differently for his latest posting?

(We should probably take this discussion to a new thread or offline using
email.)


-----
Endnotes

[1] Relevant headers from Jerry's latest posting:

Message-ID: <[email protected]>
X-Newsreader: Microsoft CDO for Windows 2000
NNTP-Posting-Host: tk2msftibfm01.phx.gbl 10.40.244.149

Headers from Jerry's Nov 2008 posting (and many others):

Message-ID: <[email protected]>
X-Newsreader: Microsoft CDO for Windows 2000
NNTP-Posting-Host: tk2msftibfm01.phx.gbl 10.40.244.149

Compare with headers for my posting through the MSDG web interface:

Message-ID: <[email protected]>
X-Newsreader: Microsoft CDO for Windows 2000
NNTP-Posting-Host: tk2msftibfm01.phx.gbl 10.40.244.149

Constrast with headers from my posting through the MSnews server using OE:

X-Newsreader: Microsoft Outlook Express 6.00.2900.5512
Message-ID: <[email protected]>
NNTP-Posting-Host: c-24-6-189-80.hsd1.ca.comcast.net 24.6.189.80

The domain name (suffix) in the Message-ID header is usually indicative of
the server to which the message was posted: microsoft.com for the MSDG
server; phx.gbl for the MSnews server. However, the posting agent (local
program) can create its own Message-ID header.

However, also note the NNTP-Posting-Host header.


----- original message -----
 
J

Jerry W. Lewis

Joe User said:
I would be surprised to see that kind of accuracy from a Gamma
approximation, much less Exp and GammaLn approximations.

You don't get it from Excel's 10-figure approximation to GammaLn, but that
does not mean that the approach is bad.
I would be interested in seeing the exact results for either the GammaLn or
Gamma formula from an independent math program.

My statement that COMBIN's results seemed consistent with an underlying
machine precision implementation of GAMMALN was based on comparing results
with
exp(lgamma(n+1)-lgamma(r+1)-lgamma(n-r+1))
in R, which you can access from Excel via the RExcel COM server

Here are some specific resuls:

COMBIN(9,3) COMBIN(53,21)
COMBIN 8.3999999999999986E1 3.1798644182805512E14
R lgamma 8.3999999999999986E1 3.1798644182804919E14
R lchoose 8.3999999999999986E1 3.1798644182805375E14
DP 8.3999999999999986E1 3.1798644182805375E14
EP 8.4000000000000000E1 3.1798644182805500E14

DP and EP are hypothetical implementations operations performed in the order
of my original post with a lnGAMMA that is exact to the last bit in double
precision (8-byte real) and extended precision (10-byte real), but no extra
steps to minimize cancellation errors.

Jerry
 
J

Jerry W. Lewis

Recently, my posts have been through
http://www.microsoft.com/office/community/en-us/default.mspx?d=1
when an interesting thread showed up on a Google filter for key words that I
follow. That recent approach is becoming increasingly unsatisfactory,
because there seems to be a significant lag in Google indexing, plus finding
the threads that I want to respond to in the MS interface is becoming more
unreliable. In particular, I have seen no evidence that MS continues to
index the microsoft.public.excel group at all.

Prior to that I subscribed to newsgroups in Thunderbird through my ISP's
news server, but my ISP no longer supports that

Thanks for the reference to
msnews.microsoft.com
that does seem to work for MS news groups.

Jerry


Joe User said:
David Biddulph said:
If your ISP doesn't support NNTP access, use the server
msnews.microsoft.com

If you mean using Outlook Express, Windows Mail or something similar to set
up an "account" that connects to the MSnews server, that __does__ use the
NNTP protocol.

But most people do not use the correct terminology when talking about
newsgroups (aka discussion groups). So it is unclear exactly what Jerry
means.

When I look back at Jerry's postings (e.g. Nov 2008), everything is
consistent with his posting through the MS Discussion Groups server, and it
is consistent with his latest posting [1]. But he might have been unaware
of that if he was using an interface provided by his ISP.

I wouldn't mind understanding this in more detail. What did Jerry do
exactly in the past? What did he do differently for his latest posting?

(We should probably take this discussion to a new thread or offline using
email.)


-----
Endnotes

[1] Relevant headers from Jerry's latest posting:

Message-ID: <[email protected]>
X-Newsreader: Microsoft CDO for Windows 2000
NNTP-Posting-Host: tk2msftibfm01.phx.gbl 10.40.244.149

Headers from Jerry's Nov 2008 posting (and many others):

Message-ID: <[email protected]>
X-Newsreader: Microsoft CDO for Windows 2000
NNTP-Posting-Host: tk2msftibfm01.phx.gbl 10.40.244.149

Compare with headers for my posting through the MSDG web interface:

Message-ID: <[email protected]>
X-Newsreader: Microsoft CDO for Windows 2000
NNTP-Posting-Host: tk2msftibfm01.phx.gbl 10.40.244.149

Constrast with headers from my posting through the MSnews server using OE:

X-Newsreader: Microsoft Outlook Express 6.00.2900.5512
Message-ID: <[email protected]>
NNTP-Posting-Host: c-24-6-189-80.hsd1.ca.comcast.net 24.6.189.80

The domain name (suffix) in the Message-ID header is usually indicative of
the server to which the message was posted: microsoft.com for the MSDG
server; phx.gbl for the MSnews server. However, the posting agent (local
program) can create its own Message-ID header.

However, also note the NNTP-Posting-Host header.


----- original message -----

David Biddulph said:
If your ISP doesn't support NNTP access, use the server
msnews.microsoft.com

.
 
D

Dana DeLouis

You don't get it from Excel's 10-figure approximation to GammaLn, but that
does not mean that the approach is bad.


My statement that COMBIN's results seemed consistent with an underlying
machine precision implementation of GAMMALN was based on comparing results
with
exp(lgamma(n+1)-lgamma(r+1)-lgamma(n-r+1))
in R, which you can access from Excel via the RExcel COM server

Here are some specific resuls:

COMBIN(9,3) COMBIN(53,21)
COMBIN 8.3999999999999986E1 3.1798644182805512E14
R lgamma 8.3999999999999986E1 3.1798644182804919E14
R lchoose 8.3999999999999986E1 3.1798644182805375E14
DP 8.3999999999999986E1 3.1798644182805375E14
EP 8.4000000000000000E1 3.1798644182805500E14

DP and EP are hypothetical implementations operations performed in the order
of my original post with a lnGAMMA that is exact to the last bit in double
precision (8-byte real) and extended precision (10-byte real), but no extra
steps to minimize cancellation errors.

Jerry

Hi. If anyone is interested...
If one were to calculate Combin(100,36) via the basic equation, then the
numerator alone (ie 100!) has 158 digits.
Here's an alternative for integer inputs.

Sub Demo()
Debug.Print dCombin(53, 21)
Debug.Print dCombin(100, 36)
Debug.Print dCombin(120, 28)
End Sub

Returns:
317,986,441,828,055
1,977,204,582,144,932,989,443,770,175
1,763,957,085,749,372,402,201,417,160


Function dCombin(x, y)
Dim t As Variant
Dim J

x = x - y
t = CDec(x + 1)

For J = 2 To y
t = t * (x + J) / J
Next J

dCombin = FormatNumber(t, 0, , , vbTrue)
End Function

= = = = = = =
HTH :>)
Dana DeLouis
 
J

Joe User

Jerry W. Lewis said:
Thanks for the reference to msnews.microsoft.com
that does seem to work for MS news groups.

But note that the MSnews server deletes articles after 90 days, at least in
the m.s.excel* newsgroups. Fortunately, Google Groups archives articles
"forever" -- well, from May 1996 anyway.

Recently, my posts have been through
http://www.microsoft.com/office/community/en-us/default.mspx?d=1 [....]
I have seen no evidence that MS continues to
index the microsoft.public.excel group at all.

You are simply using the "wrong" MSDG web interface.

I use www.microsoft.com/communities/newsgroups/en-us. It does provide
access to m.s.excel. You use
www.microsoft.com/office/community/en-us/default.mspx. It does not provide
access to m.s.excel, as you note.

I did not realize there is another URL. There is some sense to it:
www.microsoft.com/communities provides access to more than just MS Office
"discussion groups".

Nonetheless, they both access the same "discussion group" archive, which I
call the MSDG server. The server has the internal name
tk2msftsbfm01.phx.gbl with at least two internal network addresses,
10.40.244.148 and 10.40.244.149. The two logical interfaces do not
distinguish the two web interfaces. Some of my postings go through the
*.148 interface; others go through the *.149 interface; all were entered
through the same web site, www.microsoft.com/communities/newsgroups/en-us.

The fact that the two different MS web interfaces provide different views of
the Office/Excel "discussion groups" is yet-another example that MS does not
have its act together, IMHO. So what else is new?

Anyway, if you want to access the Office/Excel "discussion groups" on the
MSDG server per se, use the www.microsoft.com/communities/newsgroups/en-us
interface. It provides the more complete list of the Office/Excel
"discussion groups". The content of the "discussion groups" common to both
web interfaces is the same.

Of course, the MSnews server is more direct and more reliable. But I
sometimes use the MSDG server to respond to other MSDG users because there
tends to be a 30-40-min delay for MSnews server postings to be pulled by the
MSDG server. On the other hand, MSDG server postings tend to be pushed to
the MSnews server in a timely manner.


----- original message -----

Jerry W. Lewis said:
Recently, my posts have been through
http://www.microsoft.com/office/community/en-us/default.mspx?d=1
when an interesting thread showed up on a Google filter for key words that
I
follow. That recent approach is becoming increasingly unsatisfactory,
because there seems to be a significant lag in Google indexing, plus
finding
the threads that I want to respond to in the MS interface is becoming more
unreliable. In particular, I have seen no evidence that MS continues to
index the microsoft.public.excel group at all.

Prior to that I subscribed to newsgroups in Thunderbird through my ISP's
news server, but my ISP no longer supports that

Thanks for the reference to
msnews.microsoft.com
that does seem to work for MS news groups.

Jerry


Joe User said:
David Biddulph said:
If your ISP doesn't support NNTP access, use the server
msnews.microsoft.com

If you mean using Outlook Express, Windows Mail or something similar to
set
up an "account" that connects to the MSnews server, that __does__ use the
NNTP protocol.

But most people do not use the correct terminology when talking about
newsgroups (aka discussion groups). So it is unclear exactly what Jerry
means.

When I look back at Jerry's postings (e.g. Nov 2008), everything is
consistent with his posting through the MS Discussion Groups server, and
it
is consistent with his latest posting [1]. But he might have been
unaware
of that if he was using an interface provided by his ISP.

I wouldn't mind understanding this in more detail. What did Jerry do
exactly in the past? What did he do differently for his latest posting?

(We should probably take this discussion to a new thread or offline using
email.)


-----
Endnotes

[1] Relevant headers from Jerry's latest posting:

Message-ID: <[email protected]>
X-Newsreader: Microsoft CDO for Windows 2000
NNTP-Posting-Host: tk2msftibfm01.phx.gbl 10.40.244.149

Headers from Jerry's Nov 2008 posting (and many others):

Message-ID: <[email protected]>
X-Newsreader: Microsoft CDO for Windows 2000
NNTP-Posting-Host: tk2msftibfm01.phx.gbl 10.40.244.149

Compare with headers for my posting through the MSDG web interface:

Message-ID: <[email protected]>
X-Newsreader: Microsoft CDO for Windows 2000
NNTP-Posting-Host: tk2msftibfm01.phx.gbl 10.40.244.149

Constrast with headers from my posting through the MSnews server using
OE:

X-Newsreader: Microsoft Outlook Express 6.00.2900.5512
Message-ID: <[email protected]>
NNTP-Posting-Host: c-24-6-189-80.hsd1.ca.comcast.net 24.6.189.80

The domain name (suffix) in the Message-ID header is usually indicative
of
the server to which the message was posted: microsoft.com for the MSDG
server; phx.gbl for the MSnews server. However, the posting agent (local
program) can create its own Message-ID header.

However, also note the NNTP-Posting-Host header.


----- original message -----

David Biddulph said:
If your ISP doesn't support NNTP access, use the server
msnews.microsoft.com
--
David Biddulph


This is intended as an addendum to
http://groups.google.com/group/microsoft.public.excel/browse_thread/thread/36affa04b5577be3
which I cannot reply to directly because the MS community interface
appears
to no longer support replying to the microsoft.public.excel group, my
ISP
no
longer supports NNTP newsgroups at all, and Google does not support
posts
without displaying my real e-mail address.
...

.
 
D

Dana DeLouis

Just to add:
If one were to calculate Combin(20,15), it is quicker for the vba code
to calculate the value via Combin(20,5). Hence this small adjustment...

Sub Demo()
Debug.Print dCombin(120, 92)
End Sub

Returns:
1,763,957,085,749,372,402,201,417,160


Function dCombin(x, y)
Dim t As Variant
Dim J, k

If x < 2 * y Then y = x - y
k = x - y
t = CDec(k + 1)

For J = 2 To y
t = t * (k + J) / J
Next J

dCombin = FormatNumber(t, 0, , , vbTrue)
End Function


<snip>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top