32-bit DLL with 64-bit Excel 2010

Steve Flaum

We have an app which creates large Excel workbooks. For example, one
workbook has 1,000 worksheets. In other cases there are fewer worksheets but
the Excel file can be 80 MB or larger. Since this sometimes crashes Excel
2007 and 2003, I'm considering using 64-bit Excel 2010. Would 64-bit Excel
be more reliable with workbooks this size than 32-bit Excel? Is 32-bit Excel
2010 any more (or less) reliable with large workbooks than Excel 2007?

The problem with using 64-bit Excel is that the workbook uses a 32-bit
native code DLL. Specifically, the workbook executes a VBA macro with the
following statements:

Dim mCCalc As Object
Set mCCalc = CreateObject(strName, "")
mCCalc.Init Application

The 32-bit DLL executes many Excel methods, using the reference to Excel
passed in the 3rd line above.

The DLL is coded in VB 6.0, so we cannot compile it into a 64-bit DLL. We've
looked into porting it to VB 2010, but that would be an impractically-large
project because the DLL has about 25,000 lines of code, much of which would
require manual conversion.

I've read that a 32-bit ActiveX control cannot be used with 64-bit Excel
2010, but this isn't an ActiveX control. Is there a way to use it with
64-bit Excel? For example, could Tlbimp.exe create a wrapper which would
make the 32-bit DLL look like a 64-bit managed-code DLL? If so, could I call
the latter from VBA? What would this do to execution speed? (The current
design runs the DLL in-process with Excel, but can nevertheless run for a
couple of hours.)

Thanks.

Steve
 
Charles Williams

Hi Steve,

AFAIK there is no way to make your 32-bit VB6 DLL run with 64-bit
Excel.

The only easy solution for VB6 DLLs with 64-bit Excel 2010 is to
convert the code to VBA, which runs happily with 64-bit Excel 2010 (you
need to convert any Windows API calls). You may lose 10% or so in
execution speed if the DLL does extremely heavy calculations (and you
lose the security of a compiled DLL, of course).
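
For the API-call conversion, the usual pattern is conditional compilation on
VBA7 with PtrSafe added for Excel 2010. A minimal sketch, using GetTickCount
purely as an illustration:

#If VBA7 Then
    ' Excel 2010 VBA (32- and 64-bit) requires PtrSafe on Declare statements
    Private Declare PtrSafe Function GetTickCount Lib "kernel32" () As Long
#Else
    ' Original VB6 / pre-2010 VBA declaration
    Private Declare Function GetTickCount Lib "kernel32" () As Long
#End If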

If you profile your solution, is the time spent in:
- Excel calculating,
- transferring data between VB6 and Excel,
- manipulating the Excel object model,
- or calculations inside VB6?

I don't consider 80 MB workbooks to be pushing Excel's limits: the
memory capacity of Excel 2003 and 2007 will easily cope with 80 MB
workbooks, so I don't see any advantage to 64-bit as long as you don't
need more than 1 or 2 gigabytes.

From a reliability point of view Excel 2003 is usually better than
Excel 2007. Excel 2010 seems very promising for reliability, but
really it's too early to tell.

regards
Charles
 
Peter T

Sorry for the accidental post

I can't add to the reply given by Charles, but just this bit:

"The DLL is coded in VB 6.0, so we cannot compile it into a 64-bit DLL.
We've looked into porting it to VB 2010, but that would be an
impractically-large project because the DLL has about 25,000 lines of
code, much of which would require manual conversion."

Are you sure so much would need conversion? Apart from form code, which
would indeed need to be re-made, I imagine there'd be much less in the way
of conversion required from VB6 to VBA than the other way round.

Regards,
Peter T
 
Steve Flaum

Thanks very much Charles.

My app occasionally crashes Excel, usually (maybe only) when I'm copying a
range. This happens most with "large" (by my standards) workbooks. I suspect
that's because I manipulate the Excel object model more when I have more
data. This suspicion is supported by the fact that improvements to my
algorithm which reduce the amount of data manipulation also reduce the
number of crashes, even when the workbook size isn't reduced.

On the other hand, I've also overcome crashes by dividing a large workbook
into multiple small workbooks and paging the temporarily-unneeded ones to
disk. This implied that freeing Excel resources helped, although it might
just be that making this change "jiggled" the code in such a way that the
crash I was working on at the time went away. However, it made me think that
a version of Excel with more resources -- i.e. 64 bit Excel 2010 -- might
help.

We're seeing some demand for larger workbooks, but I don't expect to go over
a few hundred MB. Actually, with current reliability, we probably cannot get
that far, because there will be too many crashes if we process that much
data. This reliability issue is my priority, and the reason for my original
question.

I've crashed both Excel 2003 and Excel 2007, depending on the data. So far
no single set of data has crashed both of them. That is, if one of them
crashes I've always been able to work around the problem by switching to the
other. Another interesting observation is that Excel 2003 crashes have (so
far) always been reproducible while Excel 2007 crashes have (so far) always
been intermittent. That is, restarting my program "fixes" an Excel 2007
crash (at least temporarily) but never an Excel 2003 crash. Some years ago I
had one repeatable crash with Excel 2003 that, after months of working
together, Microsoft tech support and the developer they were working with
admitted was the result of a bug in a generic Office garbage collection
routine. A pointer in a linked list of cell formats was getting corrupted.
Unfortunately, they weren't prepared to fix this, although at that time
Excel 2003 was the latest version. Since that experience I haven't been as
enthusiastic about Excel 2003 reliability as many people are.

Unfortunately, since my program can run for an hour or more, restarting
isn't a good solution. Also, since we sell this program commercially, we
can't tell our customers to switch back and forth between Excel 2003 and
Excel 2007 to see which one works best for them.

I've done a lot of profiling. The results vary depending on the options
selected in my program, but typically the large items are:

1) Creating new worksheets and setting the column widths in these sheets.
One of the reasons for the time variation is that some customers use only a
dozen relatively large sheets (in which case, obviously, creating them
doesn't take much time) while others use up to 1,000 small sheets. In the
latter case, a half hour can be spent creating the sheets and setting column
widths. Nearly half of this time is spent setting column widths (including
hiding columns) in the newly-created sheets. The positive aspect of this is
that these operations have been dead reliable (so far).

2) Excel calculating. Incidentally, the worksheets I calculate are tiny --
typically a few hundred or maybe a few thousand cells -- but I calculate
these tiny sheets tens of thousands of times, perhaps even hundreds of
thousands of times in extreme cases. Larger sheets are created by copying
data from many tiny sheets without much further calculation.

3) Miscellaneous other manipulations to the Excel object model, mainly
copying ranges (again, this is where Excel crashes on me), transferring
values and formulae between Excel and VB, and changing row heights. One
thing that really surprised me is that my app often runs faster when I copy
a single cell at a time, rather than a larger range in a single operation.
Probably this is because I use application logic to copy only cells which
changed, but when I copy an entire range I cannot omit cells which I know
didn't change.

4) My VB code. I put this last because there's not much you can say about
it, but it takes maybe 30-50% of the time.

I've spent a lot of time optimizing my app, mainly improving the algorithms
to reduce the amount of work done by Excel. This time has, naturally,
focused on the slow operations. The result is that (with the exception of a
few items like creating new sheets and changing column widths) I've "knocked
off the peaks" -- i.e. speeded up the slow operations -- and now have an app
that spends its time distributed over a pretty wide range of areas.

Your comment that VBA would slow my app down by only ~10% is very
interesting, and surprising. I would have thought that the difference would
be far greater than that, because of the difference between compiled and
interpreted code. Are you sure about that? Do you know why the difference
isn't greater? If the reason is that some operations would run faster,
offsetting increased time taken by others, maybe I should move part of my
app to VBA.

Incidentally, my DLL does call the Windows API, as well as another process
of my own. Also, a VB 2008 app calls directly into the DLL (Excel passes a
reference to the DLL to the managed app), but that doesn't take a lot of
time so I could pass the calls from the managed code to the VBA via Excel.
My source code, including forms, is > 10 MB.

Anyway, based on your comments, it sounds like there would be little benefit
from using 64-bit Excel, so probably I should just continue to run as a
32-bit app under WOW64.

Thanks again.

Steve
 
Steve Flaum

Thanks, Pete.

I didn't mean that converting from VB 6 to VBA would be hard, but rather
that converting from VB 6 to VB 2010 would be hard.

Steve
 
Charles Williams

Which method are you using to copy ranges?
Have you noticed any difference in reliability using different
methods? (I am always suspicious of using the clipboard).

Most of the VB/VBA time is usually spent either in the VB runtime or
in the Excel object model calls. For those operations there is no time
difference, because it's the same code for both VBA and VB6.
If you are doing very heavy arithmetic calculations in VB, compiled VB6
may be faster, but that's probably the only case where you get a
performance improvement (unless you have thousands of UDFs and hit the
VBE refresh bug).

I believe Excel 2010 has performance improvements in setting column
widths.

Presumably you have looked at setting column widths and hiding columns
once, then copying the formatted sheet(s).
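
For instance, a minimal sketch of that template-clone approach (the sheet
name is hypothetical):

Sub AddSheetFromTemplate()
    ' Format a "Template" sheet once (column widths, hidden columns),
    ' then clone it rather than re-applying formats to every new sheet.
    Worksheets("Template").Copy After:=Worksheets(Worksheets.Count)
    ActiveSheet.Name = "Report" & Worksheets.Count
End Sub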
And presumably you have looked at using Range.calculate and/or
Sheet.Calculate to minimise the repetitive calc time.
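
A minimal sketch of scoping that repetitive calculation (sheet and range
names assumed for illustration):

Sub RecalcTinyModel()
    Dim ws As Worksheet
    Set ws = Worksheets("TinyModel")      ' hypothetical sheet name
    Application.Calculation = xlCalculationManual
    ws.Calculate                          ' recalc just this sheet...
    ws.Range("A1:D50").Calculate          ' ...or just one block of cells
    Application.Calculation = xlCalculationAutomatic
End Sub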

The last time I did any work on a highly looped calculation (10K
loops) we got the time down a lot by storing the results in arrays and
then writing them out in bulk at the end, and by removing all
unnecessary sheets and stuff during the loop.

regards
Charles
 
Steve Flaum

"Which method are you using to copy ranges?" RangeFrom.Copy RangeTo
I tried other methods without noticing any difference, but only with Excel
2003 and only when trying to overcome a specific bug, so I can't make a
general statement about whether there's a difference in reliability on
average.

One problem with all of the copy methods is that they seem to use the
clipboard implicitly, if I don't do it explicitly. That means my users
cannot effectively use their computer while my program is running, even
though I run at a lower-than-normal priority (switching temporarily to a
high priority while copying ranges, so that the user doesn't destroy the
data in the clipboard). This is not as critical as the reliability and speed
issues, but it would be nice if there were a way to correct it.

"Most of the VB/VBA time is usually spent either in the VB runtime or in the
Xl object model calls." Thanks for that info. I didn't know that,
particularly regarding the VB runtime, and it's very interesting. When I
switched from VB3 to VB4 (or whenever it was that compiled VB was
introduced) I got a big speed improvement, so I figured the same would be
true here, but that was a different program which didn't use Excel. Does it
also imply that I could improve speed by converting my DLL to VB 2010? I'd
have to run out of Excel's process, but maybe the runtime would be faster. I
don't have the time to do this now, but maybe someday.

"If you are doing very heavy arithmetic calculations in VB (or) you have
thousands of UDFs " I don't do either. It's mainly a bunch of logic, copying
data around, and calling the Excel object model, plus some database IO.

"I believe Excel 2010 has performance improvements in setting column widths"
Terrific. I'll have to try that.

"Presumably you have looked at setting column widths and hiding columns
once, then copying the formatted sheet(s)." Yes, I tried that, but it didn't
help. What sometimes does work is that my code can create a pool of empty
worksheets in the master workbook before the program is run. You can then
use the same master workbook each time the program is run, so the empty
workbooks are created once and used for years by a given user, rather than
being rebuilt each time my program is executed. (Different users have empty
worksheets with different column widths, but any given user never or rarely
changes his or her column widths.) This eliminates the problem when a medium
number of worksheets (e.g. 100) are used. However, when the collection of
empty workbooks is very large (e.g. 1,000), it seems to slow down other
parts of the program, offsetting the gain from eliminating the need to build
new ones. I haven't had time to look into the reason for this. Maybe it's
something I can solve it when I do look into it, but if Excel 2010 improves
this enough that would be a simpler solution.

"presumably you have looked at using Range.calculate and/or
Sheet.Calculate." Yes, I do these things, and they help a lot, especially
Sheet.Calculate. The timing data I mentioned is based on using them.

Incidentally, I also have the option of periodically executing
CalculateFullRebuild at user-specified intervals. My theory was that this
would reinitialize Excel's internal structures and therefore eliminate
crashes. Sometimes it does help. Other times it makes the problem worse. I
can often overcome crashes by playing around with the rebuild frequency or
turning this function off, but I haven't found a pattern to when it helps or
how to find the optimum frequency, other than trial and error with a given
user's data. (As with column widths, each user has patterns in the way he
uses the program.)

"...we got the time down a lot by storing the results in arrays and then
writing them out in bulk at the end..." Do you mean that you calculated the
results in Excel, copied these results to arrays, deleted the sheets in
which the calculations were made, recreated the sheets at the end, and then
copied the results back? If so, do you know why this helped? If the reason
was that it avoided unnecessary Excel calculations, I already accomplish
that by using Range.calculate and Sheet.Calculate, as you suggested. If the
benefit was that fewer Excel internal resources were used, that sounds like
something I should try. It might also improve reliability. In fact, the way
my program is structured would lend itself to this. One complication,
however, is that I need to copy not only values and formulae but also
formats; different rows are differently formatted. However, there's a
limited set of row formats, so I could record the formats for each row and
copy a range containing each row's formats.

I've thought of multi-threading my program so that several iterations of the
loop are done in parallel, thereby taking advantage of multiple cores. The
problem is that you can't have multiple instances of Excel running, although
I understand that I could do this by using multiple users, perhaps via
impersonation. However, that sounds tricky, plus I don't know how
communication between users would impact speed. Anyway, it's another thing I
haven't had time to try. Although Excel 2007 uses multiple threads,
benchmarks show that this doesn't help me. I suspect that it mainly benefits
large worksheets, rather than many small ones, each calculated in a
different iteration through the loop. Does any version of Excel 2010 let VB
6 create multiple instances and use them simultaneously?

Again, thanks very much.

Steve
 
Charles Williams

"Which method are you using to copy ranges?" RangeFrom.Copy RangeTo
I tried other methods without noticing any difference, but only with Excel
2003 and only when trying to overcome a specific bug, so I can't make a
general statement about whether there's a difference in reliability on
average.

Might be worth trying a copy via a Variant array instead of the
clipboard.
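
A minimal sketch of that Variant-array transfer (it moves values only, and
never touches the clipboard):

Sub CopyValuesViaArray(src As Range, dst As Range)
    Dim v As Variant
    v = src.Value2   ' read the whole block in one call
    ' write it back in one call; dst is the top-left cell of the target
    dst.Resize(src.Rows.Count, src.Columns.Count).Value2 = v
End Sub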
"Does it also imply that I could improve speed by converting my DLL to VB
2010?"

.NET is currently very slow with Excel unless you use one of the
packages that work through the XLL C API.
"...we got the time down a lot by storing the results in arrays and then
writing them out in bulk at the end..." Do you mean that you calculated the
results in Excel, copied these results to arrays, deleted the sheets in
which the calculations were made, recreated the sheets at the end, and then
copied the results back?
What we did was:
create minimum size workbook for calculation (all static stuff
removed, close full-sized workbook)
Change input data, Calculate, store results in array, Loop
After the loop reopen the full-sized workbook and write out all the
results from the array.
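
A minimal sketch of that loop in VBA, with the workbook path, sheet, and
ranges assumed purely for illustration:

Sub CalcLoopBuffered()
    Dim wbCalc As Workbook, ws As Worksheet
    Dim results() As Variant, i As Long
    Const NLOOPS As Long = 10000

    ReDim results(1 To NLOOPS)
    Set wbCalc = Workbooks.Open("C:\calc\TinyModel.xls")  ' hypothetical path
    Set ws = wbCalc.Worksheets(1)
    Application.Calculation = xlCalculationManual

    For i = 1 To NLOOPS
        ws.Range("B1").Value2 = i                  ' change the input
        ws.Calculate                               ' recalc the tiny sheet only
        results(i) = ws.Range("B2:D20").Value2     ' buffer the output block
    Next i

    wbCalc.Close SaveChanges:=False
    ' Reopen the full-sized workbook here and write results() out in bulk.
End Sub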
"I've thought of multi-threading my program so that several iterations of
the loop are done in parallel, thereby taking advantage of multiple cores."

RangeCalc and VBA are not multithreaded in 2007.
Maybe you could generate partitioned subset workbooks, open them in
multiple separate Excel instances, write the results to files, then at
the end read back all the files and recreate the master workbook with
the results from the files.
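
A second Excel instance can be started from VB6 or VBA with CreateObject; a
minimal sketch, with the partition path hypothetical:

Sub CalcPartitionInNewInstance()
    Dim xl As Object, wb As Object
    Set xl = CreateObject("Excel.Application")    ' a brand-new Excel process
    xl.Visible = False
    Set wb = xl.Workbooks.Open("C:\calc\Partition1.xls")  ' hypothetical path
    wb.Worksheets(1).Calculate
    wb.SaveAs "C:\calc\Partition1_results.xls"
    wb.Close SaveChanges:=False
    xl.Quit
    Set xl = Nothing
End Sub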
 
Steve Flaum

"Might be worth trying a copy via a Variant array instead of the clipboard."
But that would copy only values or formulae, right? I need to copy all
formats too, and I don't know a practical way to do that except to copy a
range.

"What we did was: create minimum size workbook for calculation (all static
stuff removed, close full-sized workbook), Change input data, Calculate,
store results in array, Loop. After the loop, reopen the full-sized workbook
and write out all the results from the array."
Why did you close & reopen the full-sized workbook? Since you calculated
only the small workbook, it would seem that leaving the big workbook in RAM
wouldn't matter. I do believe that it would help -- I've seen that closing
unused books can improve reliability, so it seems reasonable that it would
also improve speed -- but I don't understand why. When I saw the reliability
improvement, I thought I had been exhausting some internal Excel resource,
and that was the motivation for my original question about using 64-bit
Excel. If that's wrong, what's the benefit of closing workbooks which aren't
being used? I could defer setting the formats until calculation finishes if
there's a benefit to removing the large workbooks from RAM.

"RangeCalc and VBA are not multithreaded in 2007. Maybe you could generate
partitioned subset workbooks, open them in multiple separate Excel
instances, write the results to files, then at the end read back all the
files and recreate the master workbook with the results from the files."
But wouldn't the separate instances interfere with one another? I use
RangeCalc, which I could replace with Sheet.Calc. However, I'd still be left
with VBA. There aren't many lines of VBA code, but they're called a lot.
Could I overcome this via the XLL C API?

Also, at the moment I'm doing a lot of Range.Copy, which uses the system
clipboard, but I could copy values via your suggestion above. I could then
copy formats in a separate, single-threaded phase at the end. Unfortunately,
there doesn't seem to be a programmatic way to copy via a clipboard other
than the common system clipboard.

Are Range.Copy, Range.Calc, & VBA the only reasons multiple instances would
interfere with one another? Does Excel 2007 differ from Excel 2010 in this
regard?

Thanks again.
 
Charles Williams

"Might be worth trying a copy via a Variant array instead of the clipboard."
But that would copy only values or formulae, right? I need to copy all
formats too, and I don't know a practical way to do that except to copy a
range.
You could use PasteSpecial for the formats, or create some Styles and
apply them which would avoid the clipboard.
Handling formatting separately from values should also allow more
optimisation of the overall process.
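
A minimal sketch of the Styles route, with the style name and settings
hypothetical:

Sub ApplyRowStyle()
    ' Define a reusable named style once per workbook...
    On Error Resume Next                      ' ignore "already exists" on re-runs
    ActiveWorkbook.Styles.Add "DetailRow"
    On Error GoTo 0
    With ActiveWorkbook.Styles("DetailRow")
        .NumberFormat = "#,##0.00"
        .Font.Bold = False
        .Interior.Color = RGB(240, 240, 240)
    End With
    ' ...then apply it to whole rows with no clipboard involved.
    Worksheets("Report").Rows(5).Style = "DetailRow"
End Sub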
"What we did was: create minimum size workbook for calculation (all static
stuff removed, close full-sized workbook), Change input data, Calculate,
store results in array, Loop. After the loop, reopen the full-sized workbook
and write out all the results from the array."
Why did you close & reopen the full-sized workbook? Since you calculated
only the small workbook, it would seem that leaving the big workbook in RAM
wouldn't matter.
Excel calculates and handles many things at Excel session level rather
than workbook level, so there are overheads associated with having
unused stuff in RAM (not to mention Cache problems etc).
"RangeCalc and VBA are not multithreaded in 2007. Maybe you could generate
partitioned subset workbooks, open them in multiple separate Excel
instances, write the results to files, then at the end read back all the
files and recreate the master workbook with the results from the files."
But wouldn't the separate instances interfer with one another? I use
RangeCalc, which I could replace with Sheet.Calc. However, I'd still be left
with VBA. There aren't many lines of VBA code, but they're called a lot.
Could I overcome this via the XLL C API?
If you have 2 separate Excel sessions running they would each have
their own resource pools. Windows should allocate the requirements
across multiple CPUs etc without undue interference.

regards
Charles
 
Steve Flaum

Based on your comments, I just started to change my program to copy through
a VB array at the end of execution, rather than keeping all workbooks open
in Excel and continually using Range.Copy. However, I came up against a
snafu.

I copy formulae, not values, from the many small worksheets to the final
large workbooks. This copy may be to a different row. (The columns don't
change.) Therefore, I rely on Range.Copy to adjust references (i.e. cell
addresses). If I copy through an array, I must adjust row numbers by one of
the following methods:

1) Copy the last formulae from the array to the large workbooks first, then
insert rows to make space for the formulae which were computed earlier. The
insert would, of course, adjust row numbers. However, I'm concerned that
frequent row insertion might cause a problem with reliability or speed,
although I've never done this. Do you know if this is likely to be an issue?
If not, I'll test it, but that will take a good deal of programming, so I
thought I'd check first.

2) Alternatively, at the end of execution I could write the formulae from
the array back to the small worksheet and use Range.Copy to populate the
large workbooks while adjusting references. This wouldn't eliminate
Range.Copy, of course, but would let me populate one large workbook at a
time, minimizing the number of workbooks open in Excel.

3) Lastly, I could programmatically parse the formulae and adjust row
numbers myself, but this sounds like a lot of programming, and a lot of
debugging.

Do you have an opinion of which method is best? If none looks good to you,
do you have any other suggestions?

Thanks again.

Steve
 
Charles Williams

If all your formulae use relative row references for the rows that need
to adjust, it might work OK.
(Think about whether the R1C1 version of the formula needs to change.)
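
A minimal sketch of why that works: the R1C1 text of a relative formula is
position-independent, so it can be buffered and written to a different row
unchanged (the routine and its arguments are hypothetical):

Sub MoveFormulaeKeepRelative(src As Range, dst As Range)
    ' "=R[-1]C+RC[1]" reads the same wherever the cell sits, so no
    ' manual row adjustment is needed when writing to a new location.
    Dim f As Variant
    f = src.FormulaR1C1
    dst.Resize(src.Rows.Count, src.Columns.Count).FormulaR1C1 = f
End Sub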

Frequent row insertion is very slow, even in manual calc mode, because
Excel has to scan all the formulae for adjustments. The time to insert
a row is therefore a function of the number of formulae (probably in
all open workbooks).
 
