Table design for time periods - advice sought

N

Nigel

I'd appreciate any thoughts on optimum design for the following situation.

I need to store data relating to periods of activity of a counter device
(counts animals). The device may be active (data collecting) or not, and if
not there's a list of valid reasons why not (e.g. power failure). Futhermore
different data-sampling routines may be active, though only one at a time.
The user may also have validated certain periods of operation which may or
may not relate to the data collection periods and these periods must also be
stored.

Current thinking is to have a table [ tblPeriods] with fields:
Period_ID
StartTime
EndTime
PeriodType (what type of period e.g. DataCollection, DataValidation,
SystemDown)
pnt_SamplingProtocol (pointer to table holding data sampling details)

If the period is of type SystemDown, then additional information on the
reason for the downtime period is stored in another table tblDownTimeReason)
and linked via a one-many-many-one intermediate table.

I can see that, strictly speaking, the 'entities' stored in [tblPeriods] are
not all the same (some are downtime periods some are data-collection
periods) and, as periods of different type have different field
requirements, I imagine they should really be in separate tables. Doing this
would result in multiple tables (e.g. [tblDownTimePeriods],
[tblDataCollectionPeriods]...) with very similar structures (ie. all will
have PeriodID, StartTime, EndTime etc.).

Althought this might be more academically correct it will mean most querying
(e.g 'show all periods of any type between date_1 and date_2' ) will
probably involve several UNIONs to get a result which could be a problem.

I can see a variety of other ways of storing these data, but there seem to
be compromises or other issues with each option. I would appreciate ny
independent thoughts or suggestions, or pointers to useful guide to table/db
design for more complex situations.

Thanks
Nigel
 
C

Chaim

Nigel,

Are the period IDs unique? And can there be overlaps in the time periods (ID
1 between 12:00 and 12:10 and ID 2 between 12:05 and 12:10)?

If the answer to the first question is Yes and to the second No, then why
can you not simply GROUP BY a period type (Collection, Validation, etc) to
get the different sets of data? If when you write:
Althought this might be more academically correct it will mean most querying
(e.g 'show all periods of any type between date_1 and date_2' ) will
probably involve several UNIONs to get a result which could be a problem.

you are referring only to this tblPeriods table and not the detail tables
(which you say would differ from each other and therefore would make UNIONing
those impossible or improbable at best), you've already avoided the need for
the UNION.

Why not use an auxilliary table tblPeriodType (TypeID long, TypeName text)
such as (1, "Collection"). Then use the TypeID to capture the period type. No
UNIONs involved, can segregate (or not) by period type, can easily add
new/rename old types, and would further aid normalization by putting the
'pnt_SamplingProtocol' in the tblPeriodType table so this data wouldn't be
redundant in records. As you've laid it out here, everytime 'Protocol 1' is
used, the same 'pointer' (table name?) would appear in your master table.
Assuming all Protocol 1's would point to the same place, why make things
redundant?

Nigel said:
I'd appreciate any thoughts on optimum design for the following situation.

I need to store data relating to periods of activity of a counter device
(counts animals). The device may be active (data collecting) or not, and if
not there's a list of valid reasons why not (e.g. power failure). Futhermore
different data-sampling routines may be active, though only one at a time.
The user may also have validated certain periods of operation which may or
may not relate to the data collection periods and these periods must also be
stored.

Current thinking is to have a table [ tblPeriods] with fields:
Period_ID
StartTime
EndTime
PeriodType (what type of period e.g. DataCollection, DataValidation,
SystemDown)
pnt_SamplingProtocol (pointer to table holding data sampling details)

If the period is of type SystemDown, then additional information on the
reason for the downtime period is stored in another table tblDownTimeReason)
and linked via a one-many-many-one intermediate table.

I can see that, strictly speaking, the 'entities' stored in [tblPeriods] are
not all the same (some are downtime periods some are data-collection
periods) and, as periods of different type have different field
requirements, I imagine they should really be in separate tables. Doing this
would result in multiple tables (e.g. [tblDownTimePeriods],
[tblDataCollectionPeriods]...) with very similar structures (ie. all will
have PeriodID, StartTime, EndTime etc.).

Althought this might be more academically correct it will mean most querying
(e.g 'show all periods of any type between date_1 and date_2' ) will
probably involve several UNIONs to get a result which could be a problem.

I can see a variety of other ways of storing these data, but there seem to
be compromises or other issues with each option. I would appreciate ny
independent thoughts or suggestions, or pointers to useful guide to table/db
design for more complex situations.

Thanks
Nigel
 
J

John Nurick

Hi Nigel,

A few comments

1) "Data validation" seems to be something the user does rather than the
machine (you say data validation periods may not relate to data
collection periods). If so, the data validation records don't belong in
a table recording the activity (or otherwise) of the counter. Either use
a separate table to store information about when validation was done, or
else store "events" instead of periods: e.g. "Machine starts counting",
"User starts sampling", "Machine becomes inactive".

2) If you're storing contiguous periods, the EndTime field is probably
redundant.

3) You wrote
as periods of different type have different field
requirements, I imagine they should really be in separate tables.

Don't worry about having a few fields that are only required for some
records (e.g. pnt_SamplingProtocol only needed for sampling periods,
downtime fields only needed for down periods). It's not 100% normalised
but it's usually practical. Access minimises the space needed for empty
fields, and I wouldn't worry about it unless there are going to be
millions of records.

If things are more complicated than that, the technique is called
"subclassing". You have one table (let's call it tblPeriods) with
Period_ID and all the fields common to all the different kinds of
period. Then you have other tables, related 1:1 to tblPeriods on
Period_ID, each of which contains only the fields particular to one sort
of period.


I'd appreciate any thoughts on optimum design for the following situation.

I need to store data relating to periods of activity of a counter device
(counts animals). The device may be active (data collecting) or not, and if
not there's a list of valid reasons why not (e.g. power failure). Futhermore
different data-sampling routines may be active, though only one at a time.
The user may also have validated certain periods of operation which may or
may not relate to the data collection periods and these periods must also be
stored.

Current thinking is to have a table [ tblPeriods] with fields:
Period_ID
StartTime
EndTime
PeriodType (what type of period e.g. DataCollection, DataValidation,
SystemDown)
pnt_SamplingProtocol (pointer to table holding data sampling details)

If the period is of type SystemDown, then additional information on the
reason for the downtime period is stored in another table tblDownTimeReason)
and linked via a one-many-many-one intermediate table.

I can see that, strictly speaking, the 'entities' stored in [tblPeriods] are
not all the same (some are downtime periods some are data-collection
periods) and, as periods of different type have different field
requirements, I imagine they should really be in separate tables. Doing this
would result in multiple tables (e.g. [tblDownTimePeriods],
[tblDataCollectionPeriods]...) with very similar structures (ie. all will
have PeriodID, StartTime, EndTime etc.).

Althought this might be more academically correct it will mean most querying
(e.g 'show all periods of any type between date_1 and date_2' ) will
probably involve several UNIONs to get a result which could be a problem.

I can see a variety of other ways of storing these data, but there seem to
be compromises or other issues with each option. I would appreciate ny
independent thoughts or suggestions, or pointers to useful guide to table/db
design for more complex situations.

Thanks
Nigel
 
N

Nigel

John

1) Data validation is indeed something done by the user and on that basis
alone should be stored in a separate table from Counter events. I was trying
to make things easier for querying by the ultimate user by storing in a
single table where most fields are the same. Probably a bad idea.

2) Periods are not necessarily contiguous so I think I need EndTime (or
Duration). Periods of different type can overlap too.

3) Your response was the sort of pragmatic view I was hoping for! Trying to
do it 'properly' but also to make it understandable by the ultimate users
(not me).

I had thought about the sub-classing approach - though I used an
intermediate table (holding PeriodID and ProtocolID ) at first to give a
one-many-many-one relationship as client first indicated that a period might
have more than one protocol. This has now changed so only one protocol at
most. I'd stopped considering the one-to-one approach as read somewhere it's
not a particularly usual thing to do but maybe this is one of those
occasions.

Thanks for the ideas. Keen to improve table relationships / DB design
skills - any thoughts on good texts or other sources? Most user manuals
really don't cover in any depth.

Nigel


John Nurick said:
Hi Nigel,

A few comments

1) "Data validation" seems to be something the user does rather than the
machine (you say data validation periods may not relate to data
collection periods). If so, the data validation records don't belong in
a table recording the activity (or otherwise) of the counter. Either use
a separate table to store information about when validation was done, or
else store "events" instead of periods: e.g. "Machine starts counting",
"User starts sampling", "Machine becomes inactive".

2) If you're storing contiguous periods, the EndTime field is probably
redundant.

3) You wrote
as periods of different type have different field
requirements, I imagine they should really be in separate tables.

Don't worry about having a few fields that are only required for some
records (e.g. pnt_SamplingProtocol only needed for sampling periods,
downtime fields only needed for down periods). It's not 100% normalised
but it's usually practical. Access minimises the space needed for empty
fields, and I wouldn't worry about it unless there are going to be
millions of records.

If things are more complicated than that, the technique is called
"subclassing". You have one table (let's call it tblPeriods) with
Period_ID and all the fields common to all the different kinds of
period. Then you have other tables, related 1:1 to tblPeriods on
Period_ID, each of which contains only the fields particular to one sort
of period.


I'd appreciate any thoughts on optimum design for the following situation.

I need to store data relating to periods of activity of a counter device
(counts animals). The device may be active (data collecting) or not, and if
not there's a list of valid reasons why not (e.g. power failure). Futhermore
different data-sampling routines may be active, though only one at a time.
The user may also have validated certain periods of operation which may or
may not relate to the data collection periods and these periods must also be
stored.

Current thinking is to have a table [ tblPeriods] with fields:
Period_ID
StartTime
EndTime
PeriodType (what type of period e.g. DataCollection, DataValidation,
SystemDown)
pnt_SamplingProtocol (pointer to table holding data sampling details)

If the period is of type SystemDown, then additional information on the
reason for the downtime period is stored in another table tblDownTimeReason)
and linked via a one-many-many-one intermediate table.

I can see that, strictly speaking, the 'entities' stored in [tblPeriods] are
not all the same (some are downtime periods some are data-collection
periods) and, as periods of different type have different field
requirements, I imagine they should really be in separate tables. Doing this
would result in multiple tables (e.g. [tblDownTimePeriods],
[tblDataCollectionPeriods]...) with very similar structures (ie. all will
have PeriodID, StartTime, EndTime etc.).

Althought this might be more academically correct it will mean most querying
(e.g 'show all periods of any type between date_1 and date_2' ) will
probably involve several UNIONs to get a result which could be a problem.

I can see a variety of other ways of storing these data, but there seem to
be compromises or other issues with each option. I would appreciate ny
independent thoughts or suggestions, or pointers to useful guide to table/db
design for more complex situations.

Thanks
Nigel
 
N

Nigel

Chaim

Period IDs would indeed be unique if I held all periods (regardless of type)
in one table [tblPeriods], overlaps between periods are certainly possible.

In essence I was trying to decide between two broad approaches:
(i) all periods in one table [tblPeriods] which be simpler to query (no need
for UNIONs for simple queries at least) but where some fields relate only to
certain records (John's response below helps on this) and,
(ii) each period type has own table, but which would require a UNION query
for even basic analyses.

A typical query might be: "For a given time period (can't think of a better
term, "time window" perhaps), show the total amount of SystemDowntime" This
requires that any downtime periods in [tblPeriods] that straddle the
beginning or end of the required window are clipped to determine the actual
time in the window. The only way I've found of handling this (and it seems
to work fine) involves a stack of UNIONs in a query to retrieve all the
relevant bits of the periods in the time window.

Thanks for your suggestions.
Nigel

Chaim said:
Nigel,

Are the period IDs unique? And can there be overlaps in the time periods (ID
1 between 12:00 and 12:10 and ID 2 between 12:05 and 12:10)?

If the answer to the first question is Yes and to the second No, then why
can you not simply GROUP BY a period type (Collection, Validation, etc) to
get the different sets of data? If when you write:
Althought this might be more academically correct it will mean most querying
(e.g 'show all periods of any type between date_1 and date_2' ) will
probably involve several UNIONs to get a result which could be a
problem.

you are referring only to this tblPeriods table and not the detail tables
(which you say would differ from each other and therefore would make UNIONing
those impossible or improbable at best), you've already avoided the need for
the UNION.

Why not use an auxilliary table tblPeriodType (TypeID long, TypeName text)
such as (1, "Collection"). Then use the TypeID to capture the period type. No
UNIONs involved, can segregate (or not) by period type, can easily add
new/rename old types, and would further aid normalization by putting the
'pnt_SamplingProtocol' in the tblPeriodType table so this data wouldn't be
redundant in records. As you've laid it out here, everytime 'Protocol 1' is
used, the same 'pointer' (table name?) would appear in your master table.
Assuming all Protocol 1's would point to the same place, why make things
redundant?

Nigel said:
I'd appreciate any thoughts on optimum design for the following situation.

I need to store data relating to periods of activity of a counter device
(counts animals). The device may be active (data collecting) or not, and if
not there's a list of valid reasons why not (e.g. power failure). Futhermore
different data-sampling routines may be active, though only one at a time.
The user may also have validated certain periods of operation which may or
may not relate to the data collection periods and these periods must also be
stored.

Current thinking is to have a table [ tblPeriods] with fields:
Period_ID
StartTime
EndTime
PeriodType (what type of period e.g. DataCollection, DataValidation,
SystemDown)
pnt_SamplingProtocol (pointer to table holding data sampling details)

If the period is of type SystemDown, then additional information on the
reason for the downtime period is stored in another table tblDownTimeReason)
and linked via a one-many-many-one intermediate table.

I can see that, strictly speaking, the 'entities' stored in [tblPeriods] are
not all the same (some are downtime periods some are data-collection
periods) and, as periods of different type have different field
requirements, I imagine they should really be in separate tables. Doing this
would result in multiple tables (e.g. [tblDownTimePeriods],
[tblDataCollectionPeriods]...) with very similar structures (ie. all will
have PeriodID, StartTime, EndTime etc.).

Althought this might be more academically correct it will mean most querying
(e.g 'show all periods of any type between date_1 and date_2' ) will
probably involve several UNIONs to get a result which could be a problem.

I can see a variety of other ways of storing these data, but there seem to
be compromises or other issues with each option. I would appreciate ny
independent thoughts or suggestions, or pointers to useful guide to table/db
design for more complex situations.

Thanks
Nigel
 
J

John Nurick

Nigel,

Subclassing is one of the few occasions where 1:1 relationships make
sense (another is situations where you wnat to for security reasons to
deny some users access to some fields of what would normally be a single
record: put the confidential fields in another table related 1:1).

For a book, I'd recommend Rebecca Riordan's new "Designing Effective
Database Systems", published a few weeks ago.


John

1) Data validation is indeed something done by the user and on that basis
alone should be stored in a separate table from Counter events. I was trying
to make things easier for querying by the ultimate user by storing in a
single table where most fields are the same. Probably a bad idea.

2) Periods are not necessarily contiguous so I think I need EndTime (or
Duration). Periods of different type can overlap too.

3) Your response was the sort of pragmatic view I was hoping for! Trying to
do it 'properly' but also to make it understandable by the ultimate users
(not me).

I had thought about the sub-classing approach - though I used an
intermediate table (holding PeriodID and ProtocolID ) at first to give a
one-many-many-one relationship as client first indicated that a period might
have more than one protocol. This has now changed so only one protocol at
most. I'd stopped considering the one-to-one approach as read somewhere it's
not a particularly usual thing to do but maybe this is one of those
occasions.

Thanks for the ideas. Keen to improve table relationships / DB design
skills - any thoughts on good texts or other sources? Most user manuals
really don't cover in any depth.

Nigel


John Nurick said:
Hi Nigel,

A few comments

1) "Data validation" seems to be something the user does rather than the
machine (you say data validation periods may not relate to data
collection periods). If so, the data validation records don't belong in
a table recording the activity (or otherwise) of the counter. Either use
a separate table to store information about when validation was done, or
else store "events" instead of periods: e.g. "Machine starts counting",
"User starts sampling", "Machine becomes inactive".

2) If you're storing contiguous periods, the EndTime field is probably
redundant.

3) You wrote
as periods of different type have different field
requirements, I imagine they should really be in separate tables.

Don't worry about having a few fields that are only required for some
records (e.g. pnt_SamplingProtocol only needed for sampling periods,
downtime fields only needed for down periods). It's not 100% normalised
but it's usually practical. Access minimises the space needed for empty
fields, and I wouldn't worry about it unless there are going to be
millions of records.

If things are more complicated than that, the technique is called
"subclassing". You have one table (let's call it tblPeriods) with
Period_ID and all the fields common to all the different kinds of
period. Then you have other tables, related 1:1 to tblPeriods on
Period_ID, each of which contains only the fields particular to one sort
of period.


I'd appreciate any thoughts on optimum design for the following situation.

I need to store data relating to periods of activity of a counter device
(counts animals). The device may be active (data collecting) or not, and if
not there's a list of valid reasons why not (e.g. power failure). Futhermore
different data-sampling routines may be active, though only one at a time.
The user may also have validated certain periods of operation which may or
may not relate to the data collection periods and these periods must also be
stored.

Current thinking is to have a table [ tblPeriods] with fields:
Period_ID
StartTime
EndTime
PeriodType (what type of period e.g. DataCollection, DataValidation,
SystemDown)
pnt_SamplingProtocol (pointer to table holding data sampling details)

If the period is of type SystemDown, then additional information on the
reason for the downtime period is stored in another table tblDownTimeReason)
and linked via a one-many-many-one intermediate table.

I can see that, strictly speaking, the 'entities' stored in [tblPeriods] are
not all the same (some are downtime periods some are data-collection
periods) and, as periods of different type have different field
requirements, I imagine they should really be in separate tables. Doing this
would result in multiple tables (e.g. [tblDownTimePeriods],
[tblDataCollectionPeriods]...) with very similar structures (ie. all will
have PeriodID, StartTime, EndTime etc.).

Althought this might be more academically correct it will mean most querying
(e.g 'show all periods of any type between date_1 and date_2' ) will
probably involve several UNIONs to get a result which could be a problem.

I can see a variety of other ways of storing these data, but there seem to
be compromises or other issues with each option. I would appreciate ny
independent thoughts or suggestions, or pointers to useful guide to table/db
design for more complex situations.

Thanks
Nigel
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top