Opening/Closing Large Files


Nigel

I have a VBA process that reads through a single folder (on a high-performance
corporate network) of Excel workbooks: it sequentially opens each one, extracts
a small number of data values, then closes it without saving. Only one workbook
is open at any one time. There are 1350 workbooks, each about 7 MB in size.

When the process first starts, files are opened and closed reasonably
quickly, at about 12 per minute, but as the process progresses it gets slower
and slower and eventually appears to stop (no error messages).

It appears as if system resources are being consumed as each file is
processed and never released. Is this possible, or is something else
causing the problem? My code is as follows; the names of the files to
process are already stored on sheet shControl, column A.

Sub ProcessFiles()

    Dim dataWb As Workbook
    Dim xr As Long, nextData As Long

    Application.EnableEvents = False
    Application.ScreenUpdating = False

    ' start row for data
    nextData = 1
    ' process each workbook
    For xr = 1 To 1350

        ' open data workbook
        Set dataWb = Workbooks.Open("H:\myFiles\" & shControl.Cells(xr, 1))

        ' get data
        With dataWb.Sheets("Appraisal")
            shControl.Cells(nextData, 4) = .Cells(10, "I").Value
            shControl.Cells(nextData, 5) = .Cells(11, "I").Value
            shControl.Cells(nextData, 6) = .Cells(12, "I").Value
        End With

        ' advance o/p row
        nextData = nextData + 1

        ' close workbook
        dataWb.Close False

    Next xr

    Application.EnableEvents = True
    Application.ScreenUpdating = True

End Sub
 

dbKemp

"Nigel" wrote in message
-snip-

Look at John Walkenbach's site for an article on reading from closed
workbooks:
http://j-walk.com/ss/excel/tips/tip82.htm
Also: shControl.Cells(nextData, 4) should probably be
shControl.Cells(nextData, 4).Value.
And maybe you should set dataWb = Nothing on every loop.
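
For reference, the core of that tip is a small helper that reads a single
cell from a closed workbook through an old XLM-style external reference and
ExecuteExcel4Macro. The sketch below is illustrative rather than the exact
listing from the site; the helper name is made up here, and the folder, file
and sheet names simply mirror Nigel's post.

Function GetValueFromClosed(path, file, sheet, ref)
    ' read one cell from a closed workbook without opening it
    Dim arg As String
    If Right$(path, 1) <> "\" Then path = path & "\"
    If Dir(path & file) = "" Then
        GetValueFromClosed = "File Not Found"
        Exit Function
    End If
    ' build a reference like 'H:\myFiles\[Book1.xls]Appraisal'!R10C9
    arg = "'" & path & "[" & file & "]" & sheet & "'!" & _
          Range(ref).Range("A1").Address(True, True, xlR1C1)
    GetValueFromClosed = ExecuteExcel4Macro(arg)
End Function

In Nigel's loop this would replace the Workbooks.Open call, e.g.
shControl.Cells(nextData, 4) = GetValueFromClosed("H:\myFiles", shControl.Cells(xr, 1), "Appraisal", "I10")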
 

Dave Peterson

I don't have a good suggestion.

Maybe a bad one. Keep track of which file you're processing and, after a few
(before it slows down to a crawl), close (and save) your file, close Excel,
reopen Excel and your file, and do the next batch.

Maybe closing Excel wouldn't be necessary???

======
And I've never had to do this with 1350 workbooks, but since you're just
retrieving 3 values, maybe it would be quicker to build formulas and plop them
into the cells directly.

If it slows down after a bunch, then do it in groups and convert the formulas to
values for each bunch.

Just guesses, though. I'd test on a much smaller number of files, too.
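
To make the formula idea concrete, here is a rough, untested sketch that
reuses the path, sheet name and cells from Nigel's posted code (the batching
suggested above would just mean running the loop over a group of rows at a
time and converting that group to values):

Sub BuildLinkFormulas()
    Dim xr As Long
    Dim fName As String

    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    For xr = 1 To 1350
        fName = shControl.Cells(xr, 1).Value
        ' external references to the three cells in each closed workbook;
        ' Excel resolves simple cell links like these without opening the files
        shControl.Cells(xr, 4).Formula = _
            "='H:\myFiles\[" & fName & "]Appraisal'!$I$10"
        shControl.Cells(xr, 5).Formula = _
            "='H:\myFiles\[" & fName & "]Appraisal'!$I$11"
        shControl.Cells(xr, 6).Formula = _
            "='H:\myFiles\[" & fName & "]Appraisal'!$I$12"
    Next xr

    Application.Calculate
    ' convert the formulas to plain values so the external links can be discarded
    With shControl.Range("D1:F1350")
        .Value = .Value
    End With

    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub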
 

Nigel

Thanks for the tip; the get-data-from-a-closed-file approach using John
Walkenbach's code looks interesting. I will try that out.

--

Regards,
Nigel
(e-mail address removed)
 

Nigel

Hi Dave,
Well, it might come to that. The files are tracked, and I am considering
creating 10 sub-folders with fewer than 135 files each, as that number appears
to work OK. So to get the job done I will probably do that.

I do want to discover what is causing this breakdown, though, so I shall run
some more tests, as I am not sure why the system slows, or indeed whether it is
Excel or the operating system.

Cheers

--

Regards,
Nigel
(e-mail address removed)
 

Dave Peterson

I'd really consider building the formulas--I'm not sure if you read that
portion.
 

Jim Cone

Nigel,
Since I am not on a network, I hesitate to reply.
Your other responses cover most of the ideas I considered;
however, you might also try using an object reference instead of
the sheet code name "shControl".
Also, is there any kind of security protection on the network that
might shut down access after repeated access requests?
That might be something your IT people could have set up.
--
Jim Cone
San Francisco, USA
http://www.realezsites.com/bus/primitivesoftware
(Excel Add-ins / Excel Programming)



"Nigel"
wrote in message

-snip-
Regards,
Nigel
(e-mail address removed)
 

Nigel

Hi Jim

The object reference I will try.

I have just checked the PC resources and they do not appear to be accumulating:
as you might expect, the resources go down as each file is opened and return to
similar levels after the file is closed. I cannot see any leakage and I am
stumped.

Good point on the network front; I need to check with the IT guys, but this
slowdown is progressive: the more files get processed, the slower it gets, from
5 secs per file at the start, to 15 secs per file by 100 files, 30 secs per
file by 200, etc. I'm not sure the network would slow like this unless there is
some form of bandwidth throttling?
 

Nigel

Yes, I did read it and am going to give it a try, together with the
read-from-closed-file approach from John W's site.

Thanks

--

Regards,
Nigel
(e-mail address removed)
 

Charles Williams

I suspect it's the string pool ...
AFAIK, the way Excel works, it stores every unique string it finds in the
string pool, and then every time it encounters a string it looks it up in the
string pool to see if it already exists.
I think the string pool only ever gets larger, and slower, within an Excel
session.

If it is the root cause of the problem then the only solution is to close
Excel partway through and start from where you left off.

Charles
__________________________________________________
Outlines for my Sessions at the Australia Excel Users Group
http://www.decisionmodels.com/OZEUC.htm
 

Nigel

Good call. I read the MS article on the string pool, and one fix is to
create a large string pool by filling a temporary sheet with a large number
of strings. When the workbook is re-opened, this pre-built pool results in
Excel not adding any more strings, as the pool already exists. Sample code
below, which I am about to put to the test.

Public Sub GrowStringPool()
    Dim row As Long
    Dim col As Long
    Dim text As String
    text = "Here is some text:"
    ' fill 21 columns x 1000 rows = 21,000 cells, each with a distinct string
    For col = 1 To 21
        For row = 1 To 1000
            Cells(row, col).Value = text & Str((col - 1) * 1000 + row)
        Next row
    Next col
End Sub

--

Regards,
Nigel
(e-mail address removed)
 

Nigel

It worked!

I added a large string pool, re-opened the file, and ran my code. The total
load time for a few files trebled. Clearing the sheet with the strings on it
and re-running the code reduced the load time back. This appears to be
evidence that the string pool is an issue.

I am now testing with a high volume of files.

Thanks for the help

--

Regards,
Nigel
(e-mail address removed)
 

Charles Williams

I had not seen the MSKB article before, well found.

Let us know if the suggested bypass does indeed speed things up.

regards
Charles
__________________________________________________
Outlines for my Sessions at the Australia Excel Users Group
http://www.decisionmodels.com/OZEUC.htm
 

Dave Peterson

Just trying to summarize...

This KB article??

http://support.microsoft.com/kb/911580/en-us
Importing data may take longer than expected or Excel may appear to stop
responding (hang) when you import data into a workbook in Excel 2002 or
in Office Excel 2003

But I'm still kind of confused at the solution for Nigel.

(This is taken mostly from that KB article.)

1. Start Excel, and open the workbook where you want to import the data.
2-6. Add a temporary worksheet to the workbook and fill it with a bunch of
strings.
7. Save the workbook.
8. Close and reopen the workbook. Excel will evaluate that the workbook already
contains a large existing string pool. This will prevent the addition of new
string data into the existing string pool.
9. Delete the temporary worksheet that you added in step 2.

10. Nigel would start his routine that gets data from his 1350 other workbooks.

I wasn't sure if step 10 should come before step 9.
 

Charles Williams

Well, IF it works I imagine it would presumably be because

- Excel has an internal optimisation which says the string table has a
maximum size (fewer than 21,000 entries in these Excel versions; I wonder what
size it is in Excel 2007), and if you open a workbook containing a string table
of this size, Excel sets a session variable to say do not add any more strings to the
table.

- deleting the sheet that the string table was derived from empties the
string table but does not reset the Do_not_add switch

- because the string table is empty and the do_not_add switch is on, it is
very fast to search the string table and no time is used adding anything to
the string table (the time for both these operations would presumably be
large for a large string table).

So if this guess/explanation is correct, then yes, you should do step 9 before
step 10.

But I am not convinced that all this is going to work any faster!

BTW we could probably find out by experiment what the max size of the string
table is for Excel 2007, since it's a separate component inside the new file
format.

Charles
__________________________________________________
Outlines for my Sessions at the Australia Excel Users Group
http://www.decisionmodels.com/OZEUC.htm
 

Dave Peterson

Thanks for your thoughts, Charles.

My gut feeling is that it would still be quicker to build formulas to retrieve
just 3 values from each of the 1350 workbooks.

Do you have a recommendation for Nigel?
 

Jim Cone

Some more thoughts for what they are worth...
1. Those wouldn't all be xl 2003 workbooks that calculate when they open?
(If so, see the sketch after this list.)
2. Could you move the folder to your own drive, do your work, and move it back?
3. Dave's idea about using formulas seems more like the way to go.
You wouldn't have to open each workbook, and since the cell locations
are fixed the formulas couldn't be very complicated.
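
A minimal sketch for point 1, assuming recalculation on open is part of the
cost: switch calculation to manual before Nigel's loop, open each file
read-only without updating links, and restore calculation afterwards.

' before the loop
Application.Calculation = xlCalculationManual

' inside the loop, in place of the existing open and close
Set dataWb = Workbooks.Open(FileName:="H:\myFiles\" & shControl.Cells(xr, 1), _
                            UpdateLinks:=0, ReadOnly:=True)
' ... extract the three cells as before ...
dataWb.Close SaveChanges:=False

' after the loop
Application.Calculation = xlCalculationAutomatic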
--
Jim Cone
San Francisco, USA
http://www.realezsites.com/bus/primitivesoftware
(Excel Add-ins / Excel Programming)



"Nigel" wrote in message
-snip-
 

Charles Williams

Yes, I think you are right ... I would think that's the way to go in this
case (assuming that reading from closed workbooks does not build the string
table).

(but I still want to know if the magic string table trick works!!!)

My gut feeling is that it would still be quicker to build formulas to retrieve
just 3 values from each of the 1350 workbooks.


Charles
__________________________________________________
Outlines for my Sessions at the Australia Excel Users Group
http://www.decisionmodels.com/OZEUC.htm
 

Dave Peterson

I'm interested in Nigel's results, too.

I had to do something similar a longgggg time ago. But I didn't know the
worksheet names and had to retrieve more than 3 values. (and not close to 1000
files!)

It was much, much quicker to copy the files to my local C: drive and run it
there than to open across the network.
 

Nigel

Thanks, guys, for stimulating the debate. The string pool solution did not
speed things up; if anything it slowed them down: I loaded a large pool as per
the MS article and timed the open-extract-close steps, which increased; I then
removed the temporary sheet and the process went back to normal speed.
If the limit is 21,000 pool items, I guess I reach it at around 700 files, the
point at which my process is at a near standstill, so maybe I could save,
close and re-open Excel, and resume from there?

However I am working on a formula version as proposed and will be testing
that soon.

I will let you know how it goes. Thanks for all your advice.

--

Regards,
Nigel
(e-mail address removed)
 
