500MB+ DB - Help!

TWilson

I have a test-data DB with data from 4 separate part testers. They all have
fields with the date, the time, which tester it is, and whether the part passed or
not. The main tester also runs ~100 other tests and outputs that data.
I therefore get 4 spreadsheets every day. Because of that, I went ahead and
combined all the data into one table so that when I analyzed it I could get, for
instance, the total number of failures for the day/week/month. Of course this
leaves 3/4 of the table blank. Doing this with one month is fine; the
database is only about 40 MB and it doesn't take too long to bring up various
graphs to look at the data. Now that I want to look at the data from the
past year, a pivot chart takes 30+ minutes for every step, meaning that getting
the chart I want would probably take over 2 hours! Even though there are
about 100 tests, I'm usually only graphing 1, yet it looks like my computer is
trying to squeeze the entire DB into memory.

At first I tried separating tables by month and doing union queries to
combine months when I wanted, but that was a little tedious. I've also
compacted/repaired.

Is there anything I can do to get things to go a little quicker?
 
TWilson

No, I made sure there is only one column for each type of data they share. I
use a macro in Excel that combines the data before I import it into Access. My
headings are basically: Date - Time - Tester - Pass/Fail - Tests (~100 of
which are only filled by the first tester).
 
Kevin3NF

OK... I'll ask the obvious question... have you compacted the DB?
Tools >> Database Utilities >> Compact Database
 
John Vinson

No, I made sure there is only one column for each type of data they share. I
use a macro in Excel that combines the data before I import it into Access. My
headings are basically: Date - Time - Tester - Pass/Fail - Tests (~100 of
which are only filled by the first tester).

Do you mean that you have ~104 fields in this table, one field for each
test!? If so, you're still emphatically NOT normalized.

John W. Vinson[MVP]
 
TWilson

Ahhhh, I see. Something else I need to learn to do. I'm sorry, I'm very new
to Access.

How do you guys recommend I go about splitting up the table to
normalize it?
 
John Vinson

Ahhhh, I see. Something else I need to learn to do. I'm sorry, I'm very new
to Access.

How do you guys recommend I go about splitting up the table to
normalize it?

Three tables.

Runs
RunID
RunTime <date and time in one field, don't use two>
Tester
<other info about the test as a whole>

Tests
TestID
TestDescription <what's now your fieldnames>

Results
RunID <link to Runs: who ran the test and when>
TestID <which test was run>
Result <the result of this test>

This gives you "tall-thin" tables; if some tests weren't run you'll
simply have no records in the table at all for that test.
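
If it helps to see that spelled out, here is a rough DDL sketch of the same
design (the field names and types are only placeholders; Access runs one DDL
statement per query, and its SQL view doesn't accept the -- comments, so strip
those before pasting):

CREATE TABLE Runs (
    RunID COUNTER CONSTRAINT PK_Runs PRIMARY KEY,
    RunTime DATETIME,            -- date and time in one field
    Tester TEXT(50)              -- which tester ran it
);

CREATE TABLE Tests (
    TestID LONG CONSTRAINT PK_Tests PRIMARY KEY,
    TestDescription TEXT(100)    -- what is now a fieldname in the wide table
);

CREATE TABLE Results (
    RunID LONG CONSTRAINT FK_Results_Runs REFERENCES Runs (RunID),
    TestID LONG CONSTRAINT FK_Results_Tests REFERENCES Tests (TestID),
    Result DOUBLE,               -- the value recorded for this test
    CONSTRAINT PK_Results PRIMARY KEY (RunID, TestID)
);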

You *MIGHT* want a pass/fail field in the Runs table, but strictly
speaking it's derived data - I don't know what constitutes a pass or a
failure, so I'm not sure how you'd derive it.

John W. Vinson[MVP]
 
TWilson

OK. A couple of questions.

About the pass/fail stuff, I neglected to mention a detail about that so
things looked simpler. The testers actually report a "status", with either a pass or
a failure code. The failure code depends upon which test the part
failed. I'll also add that once a part fails, no other tests are
run, and the failure code for that particular test is reported. I also have a
list that identifies each failure code. So would I be right to add another
table:

Status
Status ID (failure code)
Failure Name

then add another field to the "Runs" table for the Status ID?

From before, you have a general idea of what my first table looked like.
That is also pretty much the format in which the testers give me the data. How do I
easily put all this data into the respective tables?
 
John Vinson

About the pass/fail stuff, I neglected to mention a detail about that so
things looked simpler. The testers actually report a "status", with either a pass or
a failure code. The failure code depends upon which test the part
failed. I'll also add that once a part fails, no other tests are
run, and the failure code for that particular test is reported. I also have a
list that identifies each failure code. So would I be right to add another
table:

Status
Status ID (failure code)
Failure Name

then add another field to the "Runs" table for the Status ID?

You've got it!! Exactly.
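
For illustration, that addition might look something like this in DDL (the
names are just suggestions, and again the comments are for reading only):

CREATE TABLE Status (
    StatusID TEXT(10) CONSTRAINT PK_Status PRIMARY KEY,   -- 'Pass' or a failure code
    FailureName TEXT(100)
);

ALTER TABLE Runs ADD COLUMN StatusID TEXT(10);
ALTER TABLE Runs ADD CONSTRAINT FK_Runs_Status
    FOREIGN KEY (StatusID) REFERENCES Status (StatusID);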
From before, you have a general idea of what my first table looked like.
That is also pretty much the format in which the testers give me the data. How do I
easily put all this data into the respective tables?

A "Normalizing Union Query" can do this. If your incoming data
(spreadsheet?) has

Date - Time - Tester - Pass/Fail - Tests (~100 of
which are only filled by the first tester).

then you could create a Union Query:

SELECT [Date] + [Time] AS TestDateTime, Tester, Test1 FROM WideFlat
WHERE Test1 IS NOT NULL
UNION ALL
SELECT [Date] + [Time], Tester, Test2 FROM WideFlat
WHERE Test2 IS NOT NULL
UNION ALL
<etc. etc.>

This union query will "unravel" the spreadsheet into a tall-thin
table; you can then base one or more Append queries on the Union query
to populate your normalized tables.

John W. Vinson[MVP]
 
TWilson

OK, I'm a little lost. Very new to Access, remember.

I assume I'm supposed to import that wide spreadsheet and then run that query
on it? Well, that's what I did, but then under the "Test1" column, the name
of the first test is displayed and then all the data from the other tests goes
under it. How will it know where the different tests are in the column?

Then say I did get that working. Can you give me a little help, maybe an
example of what the append query would look like? How would I go about
assigning a key to the tables? I also assume all these "tables" will
actually be queries?

 
John Vinson

OK, I'm a little lost. Very new to Access, remember.

I assume I'm supposed to import that wide spreadsheet and then run that query
on it? Well, that's what I did, but then under the "Test1" column, the name
of the first test is displayed and then all the data from the other tests goes
under it. How will it know where the different tests are in the column?

Then say I did get that working. Can you give me a little help, maybe an
example of what the append query would look like? How would I go about
assigning a key to the tables? I also assume all these "tables" will
actually be queries?

OK, I guess I misinterpreted the data.

Please post the fieldnames of your (linked or imported) spreadsheet
table, perhaps with a row or two of sample data.

I do not understand your last question at all. I was not suggesting
multiple "tables".

John W. Vinson[MVP]
 
TWilson

Sorry, they were actually two completely separate questions. I do understand
that I'm going to start out with a single tall-thin table. I just also need help
after that. Let's take things a step at a time then.

The data from the 4 testers is actually given to me in 4 separate
spreadsheets. Since tester 1 is the only one that does all those separate
tests, it's OK to combine them into one table.

Here's an example with actual table field names and data. The 1st part
passed tester 1 but failed tester 2.

Date and Time - Fixture - Status Code - Part - Pallet ID - Test1 - Test2 ...etc
1/1/2006 8:00am - Fixture 1 - Pass - 1 - 1 - 10.5 - 0.05 ...etc
1/1/2006 8:05am - Fixture 2 - 1000 - 1 - 1 - (blank) - (blank)
1/1/2006 8:10am - Fixture 1 - Pass - 1 - 2 - 10.4 - 0.06 ...etc

Here's some field info relating to terms we were previously using, along with
some new ones:
There is actually a single Date and Time field - I was using Text to Columns to
split it up, so we can just use that one.
Fixture = Tester
Status Code = failure code
Part - We actually run about 3 different parts. They're all tested the same
way, though.
Pallet ID - The ID number of the pallet the part sits on as it goes down the
assembly line.
Test1 through ~Test100 - They all have their own separate names; we can leave it at
Test1, Test2, ...etc for now.

I really appreciate your help BTW :)
 
dbahooker

You shouldn't be using the MDB format for anything in the year 2006.

It is completely obsolete.

I recommend using SQL Server or MySQL.
 
John Vinson

Sorry, they were actually two completely separate questions. I do understand
that I'm going to start out with a single tall-thin table. I just also need help
after that. Let's take things a step at a time then.

The data from the 4 testers is actually given to me in 4 separate
spreadsheets. Since tester 1 is the only one that does all those separate
tests, it's OK to combine them into one table.

Here's an example with actual table field names and data. The 1st part
passed tester 1 but failed tester 2.

Date and Time - Fixture - Status Code - Part - Pallet ID - Test1 - Test2 ...etc
1/1/2006 8:00am - Fixture 1 - Pass - 1 - 1 - 10.5 - 0.05 ...etc
1/1/2006 8:05am - Fixture 2 - 1000 - 1 - 1 - (blank) - (blank)
1/1/2006 8:10am - Fixture 1 - Pass - 1 - 2 - 10.4 - 0.06 ...etc

Here's some field info relating to terms we were previously using, along with
some new ones:
There is actually a single Date and Time field - I was using Text to Columns to
split it up, so we can just use that one.
Fixture = Tester
Status Code = failure code
Part - We actually run about 3 different parts. They're all tested the same
way, though.
Pallet ID - The ID number of the pallet the part sits on as it goes down the
assembly line.
Test1 through ~Test100 - They all have their own separate names; we can leave it at
Test1, Test2, ...etc for now.

So you want to migrate the data from these spreadsheets into a
tall-thin table (that's what I'd recommend anyway!)?

What's the structure of the tall-thin table? What's its Primary Key?

Assuming that you have the normalized table set up as follows:

TestDateTime
Fixture
StatusCode
PartID
PalletID
TestCode
TestResult

then you could create a UNION query based on your spreadsheet:

SELECT [Date and Time], [Fixture], [Status Code], [Part ID], [Pallet ID],
    "Test1" AS TestCode, [Test1] AS TestResult
FROM [yourspreadsheet]
WHERE [Test1] IS NOT NULL
UNION ALL
SELECT [Date and Time], [Fixture], [Status Code], [Part ID], [Pallet ID],
    "Test2" AS TestCode, [Test2] AS TestResult
FROM [yourspreadsheet]
WHERE [Test2] IS NOT NULL
UNION ALL

<etc etc>

You will probably need two or more queries; 100 columns is a lot, and the
query will be liable to hit the dreaded "Query Too Complex" error.

You can then base Append queries on these UNION queries to populate
your tall thin table.
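
For instance, assuming the union query above is saved as qryUnravel and the
target table is named tblTestResults (both names are made up here), each append
query would be along these lines:

INSERT INTO tblTestResults
    ( TestDateTime, Fixture, StatusCode, PartID, PalletID, TestCode, TestResult )
SELECT [Date and Time], [Fixture], [Status Code], [Part ID], [Pallet ID],
    TestCode, TestResult
FROM qryUnravel;

The column names in the SELECT are whatever the first leg of your UNION query
produces, so adjust them to match.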

John W. Vinson[MVP]
 
aaron.kempf

Oh, so John Vinson finally admits that MDB queries randomly crap out?

That is 40% of the reason MDB is so bad: every once in a while things
crap out, not giving you any real debugging information, just
giving up and throwing a hissy fit.

I find it humorous that you still use MDB. It is for people who don't
have the capacity to learn SQL Server.

SQL Express / MSDE is JUST AS FREE as MDB.

MDB is stupid.
Spit on anyone that uses it for anything.

 
TWilson

The more I learn about this, the more I become concerned about different things.

Are you sure an append query is what I want to be using to combine the 3
separate queries I made (to avoid the "too complex" error)?

Will all of this update automatically, seeing as I will be importing new data
on a daily basis?
 
John Vinson

The more I learn about this, the more I become concerned about different things.

Are you sure an append query is what I want to be using to combine the 3
separate queries I made (to avoid the "too complex" error)?

I have no way of knowing WHAT YOU WANT. My telepathy is on the blink.

Three append queries will move the data from your spreadsheets into a
new table.
Will all of this update automatically, seeing as I will be importing new data
on a daily basis?

No, of course it will NOT do so. An append query runs, copies its
data, and then it's done.

If you want to store your data in three spreadsheets, and have any
updates to those spreadsheets instantly reflected in your Access
application, then an append query is *not* the appropriate tool.

John W. Vinson[MVP]
 
TWilson

I want to do what you recommended for me to do.

Just to be sure we're on the same page, here's what I have so far.

1. I get 4 separate Excel spreadsheets of data from the 4 testers.
2. I combine them into 1 Excel spreadsheet. (I'm now done with Excel.)
3. I import that Excel spreadsheet into Access into a flat, wide table with one
test per field.
4. I created 3 queries (to avoid the "query too complex" error) which
converted and separated the wide table into 3 tall-thin queries, putting all
the tests and test results into their own single fields. (Instead of ~107
fields, I'm down to 7.)

Now how do I combine the 3 queries into a single table?

Also, please keep in mind I will be importing new data to this DB daily. I
would also like everything to automatically go where it needs to after I
import the spreadsheet. When this is all set up I would like for it to be:
1. Import the Excel spreadsheet. 2. I can now analyze the data in line/bar graphs
(e.g. test data vs. some time period). Does this completely change the way I
have been setting this up?
 
John Vinson

I want to do what you recommended for me to do.

Just to be sure we're on the same page, here's what I have so far.

1. I get 4 separate Excel spreadsheets of data from the 4 testers.
2. I combine them into 1 Excel spreadsheet. (I'm now done with Excel.)

Any reason to do so, rather than import them individually? Do all the
spreadsheets have the same structure, or does each tester have a
different suite of tests?
3. I import that Excel spreadsheet into Access into a flat, wide table with one
test per field.

That could be done using import, or you could link to the spreadsheet
(or sheets). If you're worried about database bloat, the link would be
preferable, I'd expect.
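
Incidentally, whether or not you set up a formal link, Jet SQL can read a
worksheet directly as a table source; the path and sheet name below are made
up, so substitute your own:

SELECT *
FROM [Excel 8.0;HDR=YES;Database=C:\TestData\Tester1.xls].[Sheet1$];

Your normalizing UNION queries could use the same construction in place of
[yourspreadsheet].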
4. I created 3 queries (to avoid the "query too complex" error) which
converted and separated the wide table into 3 tall-thin queries, putting all
the tests and test results into their own single fields. (Instead of ~107
fields, I'm down to 7.)

Now how do I combine the 3 queries into a single table?

Create the new Table with appropriate fields, indexing, and datatypes,
and base an Append query on each of the three normalizing UNION
queries.
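
A sketch of that table in DDL, using the field names from earlier (the types,
and whether you want an AutoNumber key at all, are up to you; the append
queries themselves are the INSERT INTO ... SELECT statements sketched earlier):

CREATE TABLE tblTestResults (
    ResultID COUNTER CONSTRAINT PK_tblTestResults PRIMARY KEY,
    TestDateTime DATETIME,
    Fixture TEXT(50),
    StatusCode TEXT(10),
    PartID LONG,
    PalletID LONG,
    TestCode TEXT(50),
    TestResult DOUBLE
);

CREATE INDEX idxTestCode ON tblTestResults (TestCode);
CREATE INDEX idxTestDateTime ON tblTestResults (TestDateTime);

The two indexes are there because the graphs will filter on a particular test
and a date range.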
Also, please keep in mind I will be importing new data to this DB daily.

Will the spreadsheets contain only *new* data? or will they contain
data which is already in the normalized table?
I would also like everything to automatically go where it needs to after I
import the spreadsheet.

Then you'll need a Macro, or (much better) VBA code to run the various
imports and queries in the appropriate sequence.
When this is all set up I would like for it to be:
1. Import the Excel spreadsheet. 2. I can now analyze the data in line/bar graphs
(e.g. test data vs. some time period). Does this completely change the way I
have been setting this up?

Since I don't know how you've been setting it up, all I can say is
"probably so". You'd use a query selecting a particular test from the
tall-thin table, probably with a date range. This wouldn't be all that
different from selecting a fieldname from your wide-flat table, but
it's certainly not identical.
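
Something along these lines, where the test name, dates, and table name are
only placeholders:

SELECT TestDateTime, TestResult
FROM tblTestResults
WHERE TestCode = "Test1"
  AND TestDateTime BETWEEN #1/1/2006# AND #1/31/2006#
ORDER BY TestDateTime;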

John W. Vinson[MVP]
 
TWilson

Alright, so I have the DB set up the way you recommended. I also went ahead
and linked the spreadsheets when I import data.

I have this:

Runs
RunID
Date and Time
Fixture Code
Status Code
Part
Pallet

Tests
TestID
TestDescription

Status
Status Code
Status Description

Fixture
Fixture Code
Fixture Description

Results
RunID
TestID
Result

Everything was going well. It reduced the DB size by 25% and everything. I've
got another problem now, though. When I run various queries to graph the data,
it still takes forever to process. My graphs are usually line or bar graphs
where I'll have:
Date/Time on the x-axis
Result on the y-axis
Status Description, Test Description, Fixture Description, Part, and Pallet as
filters.

Here's the SQL for it:

SELECT Runs.[Date and Time], Fixture.Fixture, Status.Status, Runs.Part,
Runs.Pallet, Tests.Test, Results.Result
FROM Tests INNER JOIN (Status INNER JOIN ((Fixture INNER JOIN Runs ON
Fixture.[Fixture Code] = Runs.Fixture) INNER JOIN Results ON Runs.ID =
Results.ID) ON Status.[Status Code] = Runs.[Status Code]) ON Tests.TestID =
Results.TestID;

Am I not doing my query correctly or something? Right now I only have 1
month in the database (22 MB right now), yet Access completely depletes my
512 MB of memory and my page file goes upwards of 1.5 GB. What's going on?

Maybe if I had a better understanding of how Access works with data, I could
be proactive in coming up with my own solution. When running a query,
what does Access put in memory? Will attempting to split everything up
further into more tables help? Something like a table for every test? Is
there a basic rule for storing data so that Access can work as quickly and
efficiently as possible?

Thanks again for putting up with me. :)
 
