Need FASTEST way to get data from a large closed Excel File

Dave B

Hello,

Excel 2000

I have created a user report generator that gets data from a large (42,000
rows by 208 columns) closed Excel file.
I've optimized my code, and creating the report takes 3 seconds AFTER I get
the data. The problem is that getting the data takes 30 seconds. I want to get
the specific data and put it into an array in my code; then it is very fast
to manipulate.

My code opens the file, sorts it (primarily by Customer Code), and searches
through the column that contains Customer Codes. Once it finds the first
occurrence of the customer code, it grabs data from 32 columns (not all
together, but scattered) and puts it into an array, then loops until the last
occurrence of the customer code is found.

Opening the file takes 15 seconds and gathering the data takes another 15,
resulting in 30 seconds.

If it will help, we can create a file that captures which rows each
company's data is in, so we can get at the data more efficiently. I've tried
using ADO with limited success - it doesn't save time, and it doesn't collect
all the data in columns that have varying data types. I'd never used ADO
before, though, so there may be better ways to do it with ADO.

I've also tried creating links to the database file and putting the data
into an array, but again this doesn't save time.

Any suggestions will be appreciated. If you are really into it, I've pasted my
code below for your perusal and critical comments.

Thanx !!

dave


Below is the code now in use:

Option Base 1

Sub GetData()
Application.DisplayAlerts = False
Application.ScreenUpdating = False
On Error GoTo ErrorHandler

Dim i As Long
Dim starttime
Dim endtime
Dim countloop

ThisWorkbook.Activate

'Define which columns (in required order) to pull
'**If adding/deleting columns you MUST change NumberOfItems
'BEST TO ADD AT END - that way it won't affect other subs that
'look for a specific location in myArray
MyGet = Array(2, 3, 21, 128, 155, 149, 150, 151, 152, 64, 15, 11, 12, 14, _
              148, 13, 156, 1, 5, 6, 7, 8, 16, 19, 142, 166, 170, 169, _
              171, 172, 173, 174)

'IMPORTANT: Set the number of items the above array contains -
'used throughout this macro
NumberOfItems = 32

ReDim myArray(20000, NumberOfItems) As Variant

'Set i to 1 to begin the copy at the first row, which holds the column headings
i = 1

'Open database file MyfileName as selected in the opening macro
Workbooks.Open Filename:=MyfileName
Capturefilename = ActiveWorkbook.Name

'++++++++++++++++++++++++++++++++++++++++++++++++++++++
'VERY IMPORTANT: SORTS MUST BE IN THE CORRECT ORDER
'Logic is tied to company & book changes as well as others
'++++++++++++++++++++++++++++++++++++++++++++++++++++++

'First sort: FirmAcct, Order1, Tenor
Range("A2").Select
Selection.Sort Key1:=Range("S2"), Order1:=xlAscending, _
    Key2:=Range("EL2"), Order2:=xlAscending, _
    Key3:=Range("FM2"), Order3:=xlAscending, _
    Header:=xlGuess, OrderCustom:=1, MatchCase:=False, _
    Orientation:=xlTopToBottom

'Second sort: Company, Book
'(Company is sorted in descending order to get "AVM" - by far the
'largest - to the bottom, which makes the looping more efficient)
Range("A2").Select
Selection.Sort Key1:=Range("B2"), Order1:=xlDescending, _
    Key2:=Range("C2"), Order2:=xlAscending, _
    Header:=xlGuess, OrderCustom:=1, MatchCase:=False, _
    Orientation:=xlTopToBottom

'Put titles in the first row of the array
For a = 1 To NumberOfItems
    myArray(1, a) = Cells(1, MyGet(a))
Next a

'Start at cell B2
Cells(2, 2).Select
Set curcell = ActiveCell

'Cycle through the cells in column "B" until a blank cell is encountered:
'the first truly empty cell ("") stops the loop
Do While curcell.Value <> ""

    If curcell.Value = MyCompany Then

        Do While curcell.Value = MyCompany
            i = i + 1
            'Put the selected columns of the current row into the array
            For j = 1 To NumberOfItems
                myArray(i, j) = Cells(curcell.Row, MyGet(j))
            Next j

            'Move to the next cell down
            Set curcell = curcell.Offset(rowOffset:=1, columnOffset:=0)
        Loop

        'Once the company changes no need to continue - data is sorted by company
        Exit Do
    End If

    'Move to the next cell down
    Set curcell = curcell.Offset(rowOffset:=1, columnOffset:=0)
    curcell.Select

Loop

'Count how many rows went into the array, less the first row (titles)
TotalHits = i - 1

'Close database file
Windows(Capturefilename).Close

'Activate this workbook
ThisWorkbook.Activate

GoTo Skip

'Gives a friendly message instead of a VB error if the file doesn't exist,
'and stops macro execution
ErrorHandler:
MsgBox "Error occurred while trying to get database file" & Chr(13) & _
       "Ensure file " & MyfileName & " exists"
End

Skip:
If TotalHits < 1 Then
    Range("E2").Select
    MsgBox "Did not find any data for the selected company!" & Chr(13) & _
           " - Please ensure the company code is entered correctly - "
    End
End If

Application.ScreenUpdating = True
Application.DisplayAlerts = True

End Sub
 
Tom Ogilvy

If you are willing to break your data up into separate files, those
wouldn't take nearly as long to open. I might suggest putting the data in
.csv files and using low-level file I/O to open them and parse out the data.

http://support.microsoft.com/support/excel/content/fileio/fileio.asp
File Access with Visual Basic® for Applications
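
For instance, a bare-bones version of that approach (file name, layout and
column count are all made up - untested, so adjust to your data):

Sub ReadCsvIntoArray()
    'Sketch only: assumes a hypothetical C:\Data\CUST01.csv holding one
    'customer's rows, comma-delimited, 32 columns, no heading row
    Dim iFile As Integer
    Dim sLine As String
    Dim vFields As Variant
    Dim i As Long, j As Long
    Dim myArray(1 To 20000, 1 To 32) As Variant

    iFile = FreeFile
    Open "C:\Data\CUST01.csv" For Input As #iFile
    Do While Not EOF(iFile)
        Line Input #iFile, sLine
        i = i + 1
        vFields = Split(sLine, ",")    'Split returns a zero-based array
        For j = 1 To 32
            myArray(i, j) = vFields(j - 1)
        Next j
    Loop
    Close #iFile
End Sub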

or if you want to preprocess your data, you might want to look at this:

http://support.microsoft.com/default.aspx?scid=kb;en-us;165942&Product=vb6
HOWTO: Write Data to a File Using WriteFile API
 
Dave B

Thanks Tom.

I experimented with creating separate files per customer and it vastly
improves the speed - from 35 seconds down to 6 seconds or less. Since most
customers have fewer than 2,500 rows of data, this seems like the best
solution. It may not be an option, though, since I depend on other
developers to rewrite their code to create separate files instead of one
large file. There may also be other "reasons" to keep the "big" file -
other programs already link to it, etc.

My task is to create an Excel file that the users will use to get their
customer data. If it takes too long, the users don't like it. But for
what it has to do, I think it runs pretty well at this point. My boss wants
me to call Microsoft and pay for someone to suggest other, possibly faster,
ways to do this, or to verify that it is as fast as it can be given the
constraints.

I will look into the links you supplied for more ideas - thanks very much
for your help and advice.

dave
 
Tom Ogilvy

Maybe put the data into Access and use ADO - it might be faster. You can
transfer an ADO recordset to an array in one step, though I believe the
result will be transposed from what you would expect.
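
Something along these lines (path, table and field names are invented;
needs a reference to the Microsoft ActiveX Data Objects library):

Sub AdoToArray()
    Dim cn As ADODB.Connection
    Dim rs As ADODB.Recordset
    Dim vData As Variant

    Set cn = New ADODB.Connection
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=C:\Data\Customers.mdb"
    Set rs = cn.Execute("SELECT * FROM tblCustomers WHERE Company='ABC'")

    'One step to an array - but note it comes back transposed:
    'vData(field, record) rather than vData(record, field)
    vData = rs.GetRows

    rs.Close
    cn.Close
End Sub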


Regards,
Tom Ogilvy
 
Alex@JPCS

Dave,
I have followed your thread with Tom, and IMVHO, processing this in
less than a minute isn't bad at all.

You mentioned:
I've also tried creating links to the database file and putting the data
into an array, but again this doesn't save time.

Have you tried reading the data into a VBA array and doing the sorts and
manipulations on the VBA array instead of on the sheet? My guess is that 15
of your seconds come from the many individual read statements.
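
One statement can pull the whole block into a two-dimensional, 1-based
Variant array, replacing thousands of single-cell reads. For example
(workbook and range names invented; memory permitting, A1:GZ42001 covers
the 208 columns and 42,000 data rows plus headings):

    Dim vData As Variant
    'vData(row, column) after this single read
    vData = Workbooks("BigFile.xls").Worksheets(1).Range("A1:GZ42001").Value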

AlexJ
 
Dave B

Tom,

I would have created the report generator in Access but was directed NOT to
use Access. In fact, when I first started this project I loaded Access onto
my computer and was working away when the person heading the project came
by and almost threw up. They were adamant that I not introduce software
beyond what people are used to, so I must do it all in Excel. I can't even
use a report generator, although by now I've just about written one in VB to
do what I've been asked to do. I must say I've learned a lot, so it was a
good experience. I'm now mildly stressed out because I can't get it any
faster within the constraints I've been given. I guess they'll be satisfied
once I get a Microsoft expert to say it's as fast as it can be given the
current environment.

Thanks again for your input - when I first started this job a year ago you
pointed me towards Excel's "OnTime" method (I needed DDE links to update
while a macro was running). That made me a hero, because it paved the way
to convert over 100 Lotus files to Excel.

dave
 
Jake Marx

Hi Dave,

Although ADO is sometimes difficult to use in certain situations (as you
have found), it may end up being much faster than what you are using now.
If the data isn't too sensitive, feel free to email the workbook to me if
you'd like. I'll see what I can do with it using ADO.
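
In the meantime, here's the general shape of it (path and sheet name are
invented; needs a reference to the Microsoft ActiveX Data Objects library).
The IMEX=1 setting asks the Jet driver to treat mixed-type columns as text,
which may help with the varying-data-type problem you hit:

Sub AdoFromClosedWorkbook()
    Dim cn As ADODB.Connection
    Dim rs As ADODB.Recordset
    Dim vData As Variant

    Set cn = New ADODB.Connection
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=C:\Data\BigFile.xls;" & _
            "Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"""

    'Query the closed workbook directly; [Sheet1$] is the sheet name
    Set rs = cn.Execute("SELECT * FROM [Sheet1$] WHERE Company='ABC'")
    vData = rs.GetRows    'transposed: vData(field, record)

    rs.Close
    cn.Close
End Sub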

My email: jake <at> longhead [period] com

--
Regards,

Jake Marx
www.longhead.com

[please keep replies in the newsgroup - email address unmonitored]
 
Dave B

Alex,

I've tried so many things that I'm not sure what I've done anymore. I did
try putting the linked cells into an array and then clearing the linked
cells. It worked well but didn't cut down on time. Using arrays is the
biggest time-saver I've found, but you have to get the data into them first.
It was a neat process that I will use with other, less bulky files in the
future. Bill Manville on this newsgroup suggested it. (Thanks again, Bill!)

Thanks for the input.

dave
 
Tom Ogilvy

ADO should be able to read an Access .mdb file without Access being
installed on the machine. You can also link to it using a query table from
Excel, or from a PivotTable (neither requires code, but both can be coded
to change which records are returned). I haven't done much linking to
external data with these methods, so I can't say what the performance is,
but what I have done has been very fast.

Of course, working with the data outside Excel would require Access, but
that may not be a constraint.
 
jaf

Hi Dave,
I don't want to confuse the issue, but have you tried Data > Get External
Data > New Database Query, using your Excel workbook as the data source?
You can sort the data as it is being pulled from the closed workbook.
I just did a quick test and Excel pulled 600 rows and 12 columns of data in
under 0.2 seconds.

This uses MS Query to write a SQL query - similar to ADO, but all Excel.
Record a macro while running the query and Excel will write the code, or
save the query and edit it later.
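
The recorded code comes out roughly like this (path, range name and SQL are
invented - the source range usually needs a defined name in the workbook,
here "Database"):

Sub QueryClosedWorkbook()
    With ActiveSheet.QueryTables.Add(Connection:= _
            "ODBC;DSN=Excel Files;DBQ=C:\Data\BigFile.xls;", _
            Destination:=Range("A1"))
        .CommandText = "SELECT * FROM `Database` WHERE Company='ABC' " & _
                       "ORDER BY Company, Book"
        .Refresh BackgroundQuery:=False
    End With
End Sub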
 
Dave B

John,

I did work in that direction for a while. It took about 3 seconds off the
time, but I found that the header titles change daily, which blows up the
query. Several of the headings in the database file have the current date
and the end of last month as part of the title - not good database
technique, but it's what I have to work around.

Thanks for the input - all the techniques suggested in the newsgroup work
great with smaller files, but when the database file has 42,000 rows and
208 columns, it gums things up a bit.

dave
 
Dave B

Tom,

Good advice, but I can't win the argument to use Access. The database file
is Excel and I must use Excel to create the reports. Part of the reason is
valid: everyone here knows Excel, but few if any know Access. In the past
(before me) an Access guru designed many cool reports, but when he left no
one maintained the reports/links and they became unusable.

At least now I am more comfortable telling my boss that we can't make the
file run faster under the given constraints. I still haven't called
Microsoft on the subject, but I will in the next few days, after exhausting
you guys and other sources!

Thanks for all input.

dave
 
jaf

Hi Dave,
FYI: I don't work much with databases and queries, but 3 seconds is a
significant performance gain.
There are QueryDefs that can dynamically retrieve the field names each time
the query is run.



I'm not as good at coding as most people here, but I was looking at the code
in your original post. You said:
"My code opens the file, sorts it (primarily by Customer Code) searches
through a column that contains Customer Codes, once it finds the first
occurrence of the customer code, it grabs data from 32 columns (Not all
together but scattered) and puts it into an array, then loops until the last
occurrence of the customer code is found.

Opening the file takes 15 seconds and gathering the data takes 15 seconds
resulting in 30 seconds."


You do a lot of "Selects" and "Set curcell" to populate your array, which
burns unnecessary CPU cycles.
You don't show what you do with the array - I presume that's in another sub.

My point being: a worksheet is an array.
If you can populate the array with the Do loop, you should be able to
populate a new sheet/workbook instead of the array, eliminating the
middleman and 15 seconds or so.

(How is the starting row for MyCompany found?)

'Put titles in the first row of the new sheet
For a = 1 To NumberOfItems
    newsheet.Cells(1, a) = oldsheet.Cells(1, MyGet(a))
Next a

'Copy each matching row straight to the new sheet ("A2" = row 2)
For i = 1 To TotalRows    'TotalRows = however many rows MyCompany has
    For j = 1 To NumberOfItems
        newsheet.Cells(i + 1, j) = oldsheet.Cells(MyCompany.Row + i - 1, MyGet(j))
    Next j
Next i

Simplistic quasi-pseudocode, but it might look something like that.
 
Dave B

John,

Thanks for your suggestions. The sub I posted is the one that takes around 30
seconds. The rest take around 5 seconds and are quite involved, with
calculations and cell placements/formats for the final Excel report. To use
the rest of my code and place things properly on the report, I first need an
array in the designated order - I cannot paste directly to the worksheet
(which would still be at the 30-second mark anyway).

I've found that doing things within VB and using arrays vastly speeds
things up. I usually work with much smaller files in automated processes
where time (30 seconds to a minute) is not a concern. I'm not a trained
programmer, so I'm sure there are things I'm not aware of or could do
better.

I noticed that I used Selects, etc., and have since cleaned those up, but
with little or no performance gain. If I reduce the database file to, say,
2,500 rows, everything runs very fast. So I'm convinced that I cannot get
much better performance without breaking the database file down into
smaller units. That may not be possible, since the Excel database file is a
nightly dump of our mainframe data and I don't think "they" will want to
modify/add to that.

Thanks for your help.

dave
 
