Which is faster--Array or ADO

RLN60

RE: Access 2007

I have 70k rows in one table and about 50k rows in another table. They are
joined together via proper keys and are indexed. I need to load them into
either an Array or an ADO recordset, do a bunch of processing, then create an
Excel export, and I wanted to know which is faster:
1) Loading 70k rows from a table into an ADO recordset and working with it
in ADO
2) Loading 70k rows from a table into an Array and working with it in the
array.

I would like to know a bit about the theory behind why one would be faster
than another. Arrays are in RAM. Is the ADO recordset in RAM also?

Thanks.
 
Albert D. Kallal

In most cases, it's probably better to use your record set.

I mean, if you think about this process, you are ALWAYS loading something
into RAM, be it an array or a recordset.

So, one is not necessarily going to improve your processing speed just
because it is in RAM. In fact, ALL processing in the computer is done in RAM.


I mean:

#1 using an array:

loop
read one record from disk into array
modify this record
now write record back to disk
until all records done
write data results out to spreadsheet

#2 using a recordset
loop
read one record from disk into recordset
modify this record - write back to disk
until all records done
write data results out to spreadsheet

I mean, in both cases, the BULK of your time is going to be spent reading
from the hard disk, not on the processing part anyway.
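
To make that concrete, here is a minimal DAO sketch of the recordset route,
ending with a one-call Excel export. The table name (tblOrders) and field
name (Amount) are made up for illustration:

Sub ProcessAndExport()
    ' Loop over a recordset, adjust one field per record,
    ' then hand the whole result to Excel in a single call.
    Dim rs As DAO.Recordset
    Dim xl As Object

    Set rs = CurrentDb.OpenRecordset("SELECT * FROM tblOrders", dbOpenDynaset)

    Do Until rs.EOF
        rs.Edit
        rs!Amount = rs!Amount * 1.1   ' whatever per-record processing you need
        rs.Update
        rs.MoveNext
    Loop

    rs.Requery   ' back to the first record with fresh data

    Set xl = CreateObject("Excel.Application")
    xl.Workbooks.Add.Worksheets(1).Range("A1").CopyFromRecordset rs
    xl.Visible = True

    rs.Close
End Sub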

The other advantage of using a recordset tends to be that you can use SQL
aggregate functions and things like joins.

For example, you might have a customers table and a customer invoice table.
If you load all the customers into an array, and then load all the customer
invoices into another array, then you'll have to use a significant amount of
additional processing to join up the two arrays to total up each customer's
invoices. If you use a recordset, then you can use a SQL join to join up the
two tables and process the whole thing in one pass. In fact, you might even
have SQL generate the totals for you and not have to process this record by
record.

sql = join of two tables
loop
read one record (that is a join of two tables)
modify or process
until all records are done
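
A sketch of that idea in VBA, letting SQL do both the join and the totals in
one pass (the table and field names here - tblCustomers, tblInvoices,
CustomerID, CompanyName, Amount - are invented for the example):

Dim rs As DAO.Recordset
Dim strSql As String

' The join and the grouping both happen inside the query engine.
strSql = "SELECT c.CustomerID, c.CompanyName, Sum(i.Amount) AS TotalInvoiced " & _
         "FROM tblCustomers AS c INNER JOIN tblInvoices AS i " & _
         "ON c.CustomerID = i.CustomerID " & _
         "GROUP BY c.CustomerID, c.CompanyName"

Set rs = CurrentDb.OpenRecordset(strSql)
Do Until rs.EOF
    Debug.Print rs!CompanyName, rs!TotalInvoiced
    rs.MoveNext
Loop
rs.Close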

Furthermore, in many cases you can use SQL to do an update in one shot, and
this again tends to be faster than writing your own custom code to modify
the data one record at a time.
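
For example, a single UPDATE statement (again with made-up names) replaces
the whole read-modify-write loop:

' One statement, executed by the engine, instead of a VBA loop.
CurrentDb.Execute _
    "UPDATE tblOrders SET Amount = Amount * 1.1 " & _
    "WHERE OrderDate < #1/1/2009#", dbFailOnError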

At the end of the day, without knowing what kind of processing you are doing
here, it is hard to guess.

In most cases there is not going to be a lot of difference. To load data
into an array, you have to FIRST pull it into a recordset anyway!
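
DAO even has a method for exactly that step: GetRows copies an open recordset
into a 2-D Variant array in one call (tblOrders is a placeholder name):

Dim rs As DAO.Recordset
Dim varData As Variant

Set rs = CurrentDb.OpenRecordset("SELECT * FROM tblOrders")
If Not rs.EOF Then
    rs.MoveLast    ' force a full record count
    rs.MoveFirst
    varData = rs.GetRows(rs.RecordCount)   ' fields in dim 1, rows in dim 2
End If
rs.Close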

So, using SQL commands and recordsets tends to be far less work, and
requires far less code to accomplish the same goal. Furthermore, you have
this query engine that optimizes the task at hand (it can decide whether
it's better to read some data sequentially or to use an index to pull the
data out; it makes intelligent decisions about how it will grab the data).

In both cases you are going to read a record from disk, modify it in memory,
and save it back. So, pulling the data into an array (extra work and time
and code and processing to set up and build that array), and THEN modifying
the data, and THEN writing a bunch of code to write the array back out to
disk is not going to save any disk I/O here at all.

Of course there can be some advantages to an array, and that depends on how
the data is going to be manipulated. For the most part, if you think about
what's going on here, you have to read the data from disk into RAM in both
cases to be able to do anything to that data.

It also depends on "how" you plan to decide what data needs to be read into
the array for processing. If you don't need all the records, then it can often
be a waste of computer resources to read the whole table into that array,
especially if it turns out that you don't need to process all the records in
that array.
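
A WHERE clause keeps that waste out of the picture by letting the engine hand
back only the rows you actually need (names made up, as before):

' Filter at the source instead of loading the whole table.
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset( _
    "SELECT * FROM tblOrders WHERE OrderDate >= #1/1/2009#")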

So there's no simple answer, and there's no way of knowing without seeing
the kind of processing and the approach to the problem you are trying to solve
here. So a clear-cut answer based on the information you have given is not
really possible.

However the above does point out that simply choosing an array because it
uses RAM is not necessarily going to be faster, since in both cases you have
to read the information from the hard disk into RAM before you can actually
do anything with the data anyway.
 
vanderghast

Depends on your join and on your need.

Any solution based on 'data lives in memory' is hardly multi-user friendly:
if the data change, you need to resynch it, which is not necessarily trivial.

There are also two opposite 'ways' of doing it. The first one, which I will
call the DAO way just to give it a name, would be to create bookmarks for
each row in the result of the join, and here that can make up to 3.5G rows
if you use a Cartesian join (3.5G = 70K times 50K). Clearly, the whole
data (of the join) may not fit in committed memory, AND you may end up with
multiple duplicate disk IOs when you navigate over this monster, that is,
reading the same initial row of one of the tables multiple times from DISK
(which is slower than reading cached data in memory). The other way, let
us call it ADONet, again just to give it a name, would be to read each
table locally, so taking only 120K rows of committed memory (= 70K + 50K), and
'make' the join locally, such as by looping over each table now in memory
(explicit looping or using some other mechanism, call it LINQ or by another
name, it does not really matter for the illustration). That solution will be
much faster than the first, since any INITIAL row would be read ONLY
ONCE, but YOU may have to add additional functionality, all by yourself,
which may introduce additional, more or less complex, behaviors (lock,
update/delete/append/resynch, etc.).

In fact, your "array" solution is close to the one I called ADONet, but you
have to add the join mechanism, yourself, which can be easy, or very
difficult, depending on your EXACT scenario.
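
One way to 'make the join locally' in VBA is to index one table's rows in a
Scripting.Dictionary keyed on the join field, then walk the other table once.
A rough sketch, with invented table and field names:

' Local (client-side) join: each source row is read from disk only once.
Dim rsCust As DAO.Recordset, rsInv As DAO.Recordset
Dim dict As Object
Set dict = CreateObject("Scripting.Dictionary")

Set rsCust = CurrentDb.OpenRecordset("SELECT CustomerID, CompanyName FROM tblCustomers")
Do Until rsCust.EOF
    dict(rsCust!CustomerID.Value) = rsCust!CompanyName.Value
    rsCust.MoveNext
Loop
rsCust.Close

Set rsInv = CurrentDb.OpenRecordset("SELECT CustomerID, Amount FROM tblInvoices")
Do Until rsInv.EOF
    If dict.Exists(rsInv!CustomerID.Value) Then
        Debug.Print dict(rsInv!CustomerID.Value), rsInv!Amount
    End If
    rsInv.MoveNext
Loop
rsInv.Close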




Vanderghast, Access MVP
 
david

Depends what you mean by 'faster' :~)

It will be faster for you to use ADO. ADO has lots of features
to make data manipulation easier and more logical and less error
prone.

It may be faster for the computer if you do it in an Array. Arrays
don't have lots of features internally to keep up-to-date and
synchronised, so they run faster.

How much faster? Well, I've done both, and I can tell you
that for me, using a DAO recordset is about a day to 2 weeks
faster.

Since I can do the DAO stuff in a couple of hours, I would never
consider using an Array until I had already done the DAO solution,
built the framework and the tests, and demonstrated that [it is
already fast enough].

Ok, sometimes it wasn't fast enough, but that was rare, and
computers were slower then.

(david)
 
RLN60

Albert, thank you for your response. You are correct in that there is no
"cookie-cutter" approach to the solution; it depends on the initial
problem that is being solved. I use recordsets pretty much by default
because of the SQL join advantage. I just came across an app the other day
that had a lot of array processing in it, which got me to wondering, hence
my post.

For the most part, I tend to stick with ADO recordsets because of the SQL
join advantage as you mentioned. I think the only time I read stuff into an
array was when there were only a small handful of records from single tables
that required no joins to other tables.

I do appreciate you taking the time to explain in detail. Have a good
weekend.

Ron
 
RLN60

Vanderghast,

As you mentioned DAO here, my curiosity is getting the best of me:
since DAO is built into the Jet database engine, is it faster than ADO, since
it is 'internal'?

The internals of Access have intrigued me for a long time, and I want to
learn more about them, but I have not found good books on the subject of how
knowing about the internals can make for better, more efficient development.
 
Douglas J. Steele

Since DAO was developed specifically for Jet databases, yes, it's almost
always faster than ADO (which has to go through more levels of abstraction).
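
For comparison, the two roads to the same rows (a sketch; tblOrders is a
placeholder, and references to both the DAO and ADO libraries are assumed):

' DAO: talks to the Jet engine directly.
Dim rsD As DAO.Recordset
Set rsD = CurrentDb.OpenRecordset("SELECT * FROM tblOrders")

' ADO: goes through OLE DB on its way to Jet.
Dim rsA As ADODB.Recordset
Set rsA = New ADODB.Recordset
rsA.Open "SELECT * FROM tblOrders", CurrentProject.Connection, _
         adOpenForwardOnly, adLockReadOnly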
 
RLN60

Thanks for the clarification, Doug. I figured DAO was quicker, being part of
Jet, but it is good to have an MVP confirm that. :)
 
vanderghast

DAO is *not* internal. SQL *is* internal.

DAO and ADO are both external libraries which allow you to get data OUT of
the database, INTO a recordset object, for example. ADO, mainly ADONet,
allows you to minimize the data transfer over the network (from the database
to the PC consuming the data) by letting you make the join locally (on the
PC) rather than at the database side, and thus, if you consider the data
transfer over the network, ADONet can be an order of magnitude faster. But
at a price, if you need to resynch the data.
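
In classic ADO (the flavor available from VBA), the closest analogue to that
local-join approach is a client-side, disconnected recordset. A sketch,
reusing the placeholder tblOrders:

' Fetch once, disconnect, and work on the rows entirely in memory.
Dim cn As ADODB.Connection, rs As ADODB.Recordset
Set cn = CurrentProject.Connection
Set rs = New ADODB.Recordset

rs.CursorLocation = adUseClient
rs.Open "SELECT * FROM tblOrders", cn, adOpenStatic, adLockBatchOptimistic
Set rs.ActiveConnection = Nothing   ' disconnect; no further round trips

' ... process rs locally ...

Set rs.ActiveConnection = cn        ' reconnect to push changes back
rs.UpdateBatch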



Vanderghast, Access MVP
 
vanderghast

Well, note that Jet usually makes the join locally, at the consumer side,
but MS SQL Server queries (or pass-through queries in general) are executed
on the database server and the result is sent over the network.
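
A pass-through query from Access makes that explicit: the join runs on the
server and only the result travels back. A sketch, with a placeholder
connect string and invented names:

' Temporary pass-through QueryDef: Jet hands the SQL to the server untouched.
Dim qdf As DAO.QueryDef
Dim rs As DAO.Recordset

Set qdf = CurrentDb.CreateQueryDef("")
qdf.Connect = "ODBC;DRIVER=SQL Server;SERVER=MyServer;DATABASE=MyDb;Trusted_Connection=Yes;"
qdf.SQL = "SELECT c.CustomerID, SUM(i.Amount) AS Total " & _
          "FROM Customers c INNER JOIN Invoices i ON c.CustomerID = i.CustomerID " & _
          "GROUP BY c.CustomerID"
qdf.ReturnsRecords = True

Set rs = qdf.OpenRecordset()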


Vanderghast, Access MVP
 
