Getting around Dependency Tree Performance Problems

  • Thread starter AllSensibleNamesTaken
  • Start date
A

AllSensibleNamesTaken

Hello all,
I'm looking for tips on how to avoid the seemingly intermitent, yet cripling
performance bottleneck that occurs when calling User Defined Functions that
refer to large ranges containing formulas.

Let me describe a simple setup that triggers the performance problem I am
seeing.

PROBLEM SETUP
----------------
On a new blank spreadsheet, setup a column containing 14,000 static (ie. not
formulas) numbers. Let's call this the Static column.

Setup another 14,000 row column next to the Static column. Let's call this
the Formula column. Each cell in the Formula column should manipulate the
equivalent cell in the Static column using a formula (subtracting a number
from the number in the static column should suffice).

enter a call to the UDF "doit" on another cell, passing it the entire
Formula column as the input argument range. An implementation of doit is
found below

Public Function doit(theRange As Range) As Long
'Doing something with theRange seems to trigger the performance
'bottleneck
Dim dummy as Integer
If IsEmpty(theRange) Then dummy = 0

'Count number of times this function is called
static numberOfCalls as Long
numberOfCalls = numberOfCals+1

doit=numberOfCalls
End Function
--------------

PROBLEM OBSERVATIONS
--------------------------
1)
The performance hit is due to the doit function intermitently being called
14,000 times by excel (once for each cell in the input argument range)
This happens most often though not always when requesting a Full recalc with
dependency tree rebuild (Ctrl+Alt+Shift+F9) though it also happens, albeit
less frequently with all other calculate requests, namely:
Full Recalc (Ctrl+Alt+F9)
Recalc (F9)
Sheet Recalc (Shift+F9)

2)
Note that this DOES NOT seem to happen if instead of the Formula Column, we
pass the Static column to the doit function

3)
Note also that this does not seem to happen if the column size is small

4)
In a more complex spreadsheet with several "doit" calls referring to various
large formula ranges, this intermitent 14,000 times call repeat problem
occurs much more often as it seems to affect each doit cell at different
times. The more doIt cells there are the more chances for the problem to
occur. INterestingly though I have not yet seen an ocasion when it's affected
more than one doit cell at a time.


PROBLEM ANALYSIS
----------------------
I am experiencing this using Excel 2003 and my thoughts on what is happening
based on the observations above are the following:

1)
The fact that the call is repeated once for each cell in the passed range
(14,000 times) and that the problem occurs most often when requesting a Full
Recalc with dependency tree rebuild leads me to beleive that the problem is
caused by the dependency tree logic

2)
The fact that the problem only manifest itself when the passed range is made
up of formulas which themselves depend on other cells, also points to a
dependency tree problem as clearly this setup has a higher level of
dependencies than when the passed range is just the static data column

3)
The fact that the problem does not manifest itself for small ranges points
to a possible Excel 2003 dependency limit breach causing Excel to always do a
full recalc rather than store a large dependency tree. However, why is the
behaviour intermitent? And why is the limit breached for such a small
spreadsheet. (Read more on limits on these links)
http://blogs.msdn.com/excel/archive/2005/09/26/474258.aspx
http://www.decisionmodels.com/calcsecretsf.htm

5)
I am also surprised that the problem does not occur unless the UDF mentions
the passed range in the body of the code. ie the range being in the argment
list on its own does not cause the problem. It's only when you try t use the
argument in the body


WHAT NEXT?
--------------

I would be very greatful if anyone with experience in building large
spreadsheets or with experience of this particular problem could share how
they've been getting around this issue

Thanks so much
 
C

Charles Williams

There are several things here that can cause poor performance.

1. Bug in Excel VBA that causes the VBE title bar to be refreshed each time
a VBA UDF is executed, unless the calculation is initated from VBA: bypass
is to run in manual calc mode and trap all calculation key sequences (F9,
Shift/F9, Ctrl/alt/F9 etc) to use a VBA Application.Calculate,
Activesheet.calculate, application.calculateFull etc command.
This will dramatically speed up each execution of the UDF.

2. VBA UDFs are called by Excel even when their arguments are uncalculated.
This can cause the UDF to be repeatedly called as you have discovered.
Uncalculated cells appear to VBA as Empty cells so for single cells you can
use an IsEmpty() test. For a Range it is not very efficient to loop through
every cell testing for IsEmpty so its faster to have code like this

Public Function tester1(theRange As Range)
On Error GoTo Fail
If Application.WorksheetFunction.CountA(theRange) <> theRange.Rows.Count
Then Exit Function
lCount = lCount + 1
tester1 = lCount
Exit Function
Fail:
End Function

Alternatively you can change the formula to something like
=Counta(B1:B14000)*0+Tester1(b1:b14000)

3. This problem only occurs when the the calculation sequence has not been
optimised by previously being calculated. Thus for the second of two
successive calculations the UDF will usually only be called once.
Note modifying a formula flags it as uncalculated and puts it at in first
position in the calculation chain (LIFO).

Using these techniques will reduce the worst-case timing for the 14000 cell
range on my system from 65 seconds to 2 seconds. The second full calculation
will take milliseconds.

See http://www.DecisionModels.calcsecretsj.htm and for a discussion of UDFs
and http://www.decisionmodels.com/calcsecretsc.htm and associated pages for
a discussion of the calculation sequnce and dependency trees.


Charles
___________________________________
London Excel Users Conference April 1-2
The Excel Calculation Site
http://www.decisionmodels.com
 
C

Charles Williams

Actually I have just discovered that using theRange.Rows.Count seems to work
much faster:

Option Explicit
Dim lCount As Long
Public Function tester1(theRange As Range)
Dim n As Long
On Error GoTo Fail
n = theRange.Rows.Count

lCount = lCount + 1
tester1 = lCount
Exit Function
Fail:
End Function


Charles
___________________________________
London Excel Users Conference April 1-2
The Excel Calculation Site
http://www.decisionmodels.com
 
C

Charles Williams

Ooops ...
That does not work there is no proper dependency on the Range...

Please ignore.
Charles
___________________________________
London Excel Users Conference April 1-2
The Excel Calculation Site
http://www.decisionmodels.com
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top