"Normalizing" imported data

pccdavef

Hello all -

I have a situation where I'm importing substantial amounts of data (30,000 to
120,000 rows at a pop) from three external sources through a text file or an
Excel spreadsheet into three different data tables. I've established lookup
tables for those fields that can reasonably be normalized across the data
tables.

The process I'm going through is:

1. Import the raw data into an "import" table that matches the structure of
the source data. Also included in the import table are columns for foreign
keys of 'normalizable' fields, which are set to 0 when the source data is
imported.

2. Append any new lookup data that may be present in the source file to the
lookup tables.

3. Run a series of update queries on the import table to fill the foreign
key fields in with the keys from the lookup tables (one is sketched below).
Depending on the source data file, there are between 3 and 7 of these update
queries.

4. Append new records into the data table using only the foreign key values
where applicable.

I'm discovering that the update queries in step 3 are taking a LONG time to
run (several minutes each), which is going to annoy my users to no end.

My questions are:
- are there other, better processes or data structures to use?
- is there a way of optimizing update queries?

I appreciate any help or suggestions -
Dave
 
Jeff Boyce

JOPO (just one person's opinion)...

Adding rows of data can slow down if your tables are indexed, since Access
needs to check existing rows to enforce constraints and maintain the indexes
as it loads.

On the other hand, removing the indexes could mean loading bogus data.

Would your users be willing to trade a longer load time for guarantees that
they have good data?
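If you do want to experiment with that trade-off, here is a minimal sketch
of dropping an index before the bulk load and rebuilding it afterwards (the
index, table, and field names are hypothetical, and as with anything here,
it's offered "as is"):

' Drop the index before the bulk append, then recreate it after.
Dim db As DAO.Database
Set db = CurrentDb
db.Execute "DROP INDEX idxLookupFK ON tblImport;", dbFailOnError
' ... run the bulk append here ...
db.Execute "CREATE INDEX idxLookupFK ON tblImport (LookupFK);", dbFailOnError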

--

Regards

Jeff Boyce
Microsoft Access MVP

Disclaimer: This author may have received products and services mentioned in
this post. Mention and/or description of a product or service herein does
not constitute endorsement thereof.

Any code or pseudocode included in this post is offered "as is", with no
guarantee as to suitability.

You can thank the FTC of the USA for making this disclaimer
possible/necessary.
 
Stefan Hoffmann

hi Dave,

3. Run a series of update queries on the import table to update the foreign
key fields with the keys of the lookup data. Depending on the source data
file, there are between 3 and 7 of these update queries.

I'm discovering that the update queries in step 3 are taking a LONG time to
run (several minutes each), which is going to annoy my users to no end.

My questions are:
- are there other, better processes or data structures to use?
- is there a way of optimizing update queries?

I have to assume that you haven't indexed these fields in the import
table? If so, you're running up to seven queries against the whole table.

I simply would not update my import table at all. Just insert the new
records directly:

INSERT INTO [dataTable]
(fieldlist)
SELECT fieldsFromImport, fieldsFromLookup
FROM [importTable]
INNER JOIN [lookupTable] ON <condition>

Use one INSERT with all the necessary JOINs. This should reduce the number
of full passes over the import table from up to seven down to one.
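For example, with two lookup tables joined in, it might look like this (all
table and field names are made up):

' One INSERT, all lookups joined at once. Jet wants parentheses
' around the first join when there is more than one.
Dim db As DAO.Database
Set db = CurrentDb
db.Execute _
    "INSERT INTO tblData (Amount, CustomerFK, RegionFK) " & _
    "SELECT i.Amount, c.ID, r.ID " & _
    "FROM (tblImport AS i " & _
    "INNER JOIN tblCustomer AS c ON i.CustomerName = c.CustomerName) " & _
    "INNER JOIN tblRegion AS r ON i.RegionName = r.RegionName;", _
    dbFailOnError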

mfG
--> stefan <--
 
pccdavef

Thanks Stefan -

I like that idea; I'll have to play with it some, but I think you're right -
it's much more efficient. Just a couple of hellacious queries.

Thanks!
Dave

 
rolaaus

I'm not sure how accurate this would be, or how applicable it would be to
your version of Access, but how about converting your query to VBA code?
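Something along these lines, maybe - a DAO sketch with made-up names, and I
haven't timed it against the set-based queries:

' Row-by-row version of the FK update. Walks the import table and
' looks up each key individually - easy to step through and debug,
' but likely slower than one set-based UPDATE.
Dim db As DAO.Database
Dim rs As DAO.Recordset
Set db = CurrentDb
Set rs = db.OpenRecordset("tblImport", dbOpenDynaset)
Do Until rs.EOF
    rs.Edit
    ' DLookup fetches the matching lookup key for this row.
    rs!LookupFK = Nz(DLookup("ID", "tblLookup", _
        "SourceText = '" & rs!SourceText & "'"), 0)
    rs.Update
    rs.MoveNext
Loop
rs.Close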
 
Jeff Boyce

In your experience, does your VBA code run faster (I'm assuming a
record-by-record iteration through a recordset) than the set-level
operations that Access queries are designed to provide?

(I'm always looking for counter-examples and/or confirmation...)

--

Regards

Jeff Boyce
Microsoft Access MVP

 
