How do you mulitply in a field?

J

Jamie Collins

Regarding autonumber or other arbitrary PKs

Wait up. PRIMARY KEY designation is *always* arbitrary but I think you
meant 'artificial key' (or similar). Do you mean keys in the logical
model or in the physical implementation...?
understand
that I *never* argued in favor or creating a table for which the "only
unique constraint is defined on a sole autonumber column".

Do I not recall correctly a thread where you had an entity type
'trainer' (natural person) where there was no industry standard
identifier and the compound of the available attributes (company
identifier, person full name) did not yield uniqueness so you omitted
a key from your logical model and used an autonumber PRIMARY KEY in
your Access implementation? Apologies in advance if I've recalled
incorrectly (google groups archive search seem to be broken just now).

Also, note that I was addressing the group rather than you personally;
again, sorry if I was not clear.
If I have a listing of
vendors, what is the *unchanging* unique constraint?

In the logical model, a good key should be *stable*; immutability is
the ideal since but real life is not always ideal.

I too wish all your vendors were issued with an unchanging identifier.
In which sector do to you operate e.g. DUNS number? In mine,
incorporated bodies must be registered with a government agency which
issues a public and unchanging (for all practical purposes)
identifier. Although there are incentives for businesses to
incorporate, I still need to model unincorporated businesses;
fortunately, we have "passing off" laws which makes trading name (yes,
can change) plus trading period a reasonably stable key. I also need
to model businesses from all other jurisdictions for which an
enterprise key of our own design is used.

I've found trusted sources of potentially useful identifiers wary of
revealing details (I found the 'DX Exchange' people in the UK most
unhelpful).

Yes, there are no easy answers, autonumber included.

My main problem with autonumbers is that they do not exist in the
logical model, therefore they are not a logical key. This was the
problem I recalled from your 'trainers' scenario.
A record's
uniqueness is one thing. The means of identifying it in relationships is
another.

In the physical implementation I've no problem with you or anyone
using a so-called surrogate if you also have a natural key or
enterprise key which exists external to the database. I wouldn't
recommend doing it myself for a variety of reasons e.g. doing so makes
data harder to read.
maybe cascading updates of multi-field keys are not a problem. I would
rather avoid them

Ideally (that word again), I think the SQL DBMS should store the key
value only once and use an internal surrogate, in the way you do by
hand with autonumber, to point to it; this way an ON UPDATE CASCADE
action would physically change only one value. Such SQLs exist but
professionally I need use Microsoft's products, at least in the
immediate future; Microsoft doesn't have them so I go without.

So my ideal surrogate would be hidden and I'd still see the real key
values in the referencing tables but that's not the reality for me.
With your way (by hand with autonumber) you either have to work with
the meaningless values or use a JOIN (or three or thirteen) to see the
real values, so it's not for me thanks and good luck to you.
You will not change my mind on the subject, nor I yours.

One thing that tickles me about the 'autonumber PK' advocates it that
they most often use the incremental Long Integer flavour, presumably
because it makes the data easier to read! If I cannot dissuade you
from autonumber, I urge you to choose random: it will improve
concurrency (let's not start on the whole 'physical clustering on
disk' thing), you will be less inclined to expose the autonumber of
users and, having to type the values over and over, you may come to
appreciate a well designed key (fixed width, check digit, etc). It's
'cruel to be kind'; making you type replication IDs (GUID) would just
be cruel <g>.

Jamie.

--
 
B

BruceM

(ie a Make Table query
I don't get your 'Make Table query' example. IMO no application should
be creating permanent tables on the fly.

I think that one of the main uses of a make table query is to reassemble
data that were previously stored in a spreadsheet or some such. In that
case the data wrangling is indeed not part of the data storage model, but
rather a form of data entry.
 
B

BruceM

Hi again. You were sort of overlooked in all of that, weren't you?

Let's rethink this a bit. The purpose of the database is to store the fees.
The calculations can be done wherever they are needed. If you want to have
a printout of records within a period of time (a month, for instance) you
can add a calculated (unbound) text box to the report footer. For its
Control Source:
=Sum([SomeField]) where SomeField is a numeric field for which you wish to
calculate the total. If you want to limit the report to a block of time, in
the report's Record Source query you can add a criteria to the ProjectDate
field (or whatever you call it):
Between [Enter start date] And [Enter end date]
What version of Access are you using? You should be able to create a query,
save it with a name, and select that named query as the report's (or form's)
Record Source.
You can use a similar technique to what I described for a report to get a
sum in a form. In a report you can group the records (by month, for
instance) and calculate the sum for each month. You can do running sums in
reports, and you can use queries to calculate totals in various ways. What
exactly do you need to do? In describing this, describe the table structure
and relationships, as well as the real world situation behind the database.
 
J

Jamie Collins

Technically, normalisation applies only to the logical relation, not to a
physical table or any view of it.

We need a conceptual layer. The one I am employing is SQL (the
language)...
Views are not virtual tables

In SQL terms, that's a misstatement because a SQL VIEW is a virtual
table.
If you open a "table" to look at its data,
you're visualising that data through an additional layer called a view

Again, in SQL terms, that's a misstatement because a SQL VIEW is a
table (lowercase) but a SQL TABLE is not a SQL VIEW.

FWIW my source of reference is the SQL-92 specification.
As I would hope you know, any
visualisation of the data conained within a table, is a view

In your entire post, if you substituted your term 'view' for my term
'resultset' then we would be in broad agreement. As it is, you seem to
use 'view' to mean 'SQL VIEW' and 'resultset' interchangeably, which I
find more than a little confusing to be honest.

In application design I often write a vanilla SQL query and think to
myself, "I'm sure this would be useful in other situations" and would
therefore look to create a persisted object in my SQL DBMS. But how to
decide whether to expose it as a SQL VIEW or a SQL PROCEDURE? Often
the choice of PROCEDURE is a no-brainer (contains control of flow
procedural code, aggregated results make no sense without delimiting
parameter values, etc) but it only become a VIEW if I can achieve the
same degree of normalization as a comparable base table. I simply do
not apply normalization to a PROCEDURE's resultset.
Just as an aside; in a later post you state that we should always aim for
the highest normal form. That's not entirely accurate or desirable. I'm sure
that when you think about it, you'll agree. DKNF or (heaven forbit) Lossless
Joins are not for the feint of heart; nor are they anywhere near practical
in 99.99% of cases.

Yes said:
Even in data warehouses, we rarely go beyond 4NF.

I think you are mistaken. I was taught that 5NF is always achievable
and that data warehouses are often purposely denormalized. A single
column lookup table is in 5NF and that's a very common design in OLTP!

Jamie.

--
 
B

BruceM

Jamie Collins said:
Wait up. PRIMARY KEY designation is *always* arbitrary but I think you
meant 'artificial key' (or similar). Do you mean keys in the logical
model or in the physical implementation...?

Yes, I meant "artificial". You have completely lost me when you say the PK
designation is always arbitrary. Are you saying the designation is
arbitrary? I think I use the term PK where you use "unique constraint".
Perhaps there is a difference between the two, but in any case, when I refer
to a PK I mean the field or fields that are used for relationships. The
record's uniqueness is an atttibute of the record aside from the fact that
it has an autonumber or other artificial key. To put it another way, the
autonumber PK is a convenience, not the only thing that is unique about the
record.
Do I not recall correctly a thread where you had an entity type
'trainer' (natural person) where there was no industry standard
identifier and the compound of the available attributes (company
identifier, person full name) did not yield uniqueness so you omitted
a key from your logical model and used an autonumber PRIMARY KEY in
your Access implementation? Apologies in advance if I've recalled
incorrectly (google groups archive search seem to be broken just now).

No, you misunderstood somewhat, but you have quite a memory. There were
several issues there. There are enough natural fields to ensure uniqueness,
but that would mean about a six-field key. For most purposes the FirstName,
LastName, MI are enough, but do not provide a guarantee. Another issue was
that most training is done by employees (supervisors, etc.), but that some
training is done by outside people. If I use EmployeeID (or a multi-field
natural key) to identify the trainer, I would need to enter outside trainers
into the Employee table (not desirable) or into a Trainer table with similar
fields, but in that case I would need to come up with a single field PK that
would not conflict with the one in the Employee table, or else create
another multi-field PK. Or I could just store the name, with enough detail
that there is no ambiguity about the person's identify, and be done with it.
Since most outside trainers conduct a session or two only, storing the names
in their own table, or creating a record in the Employee table, seems to
make little sense.
Also, note that I was addressing the group rather than you personally;
again, sorry if I was not clear.


In the logical model, a good key should be *stable*; immutability is
the ideal since but real life is not always ideal.

I too wish all your vendors were issued with an unchanging identifier.
In which sector do to you operate e.g. DUNS number? In mine,
incorporated bodies must be registered with a government agency which
issues a public and unchanging (for all practical purposes)
identifier. Although there are incentives for businesses to
incorporate, I still need to model unincorporated businesses;
fortunately, we have "passing off" laws which makes trading name (yes,
can change) plus trading period a reasonably stable key. I also need
to model businesses from all other jurisdictions for which an
enterprise key of our own design is used.

Some vendors are incorporated bodies, and some are local machinists who
construct tooling. In any case, the typical situation is to enter the
proposed vendor into the database, then to obtain tax ID or other such
information. Needing a tax ID before the vendor can be entered into the
database is not a real-world option.

A "reasonably stable" key worries me. Again, remember that I am talking
about the field or combination of fields that are related to other tables.
I link to an employee table from several databases. If the employee's name
is part of the key, an employee whose name changes means that several
databases now need to be updated. For that matter, when they change the
employee ID number, as happened recently, it becomes rather awkward and
time-consuming.
I've found trusted sources of potentially useful identifiers wary of
revealing details (I found the 'DX Exchange' people in the UK most
unhelpful).

Yes, there are no easy answers, autonumber included.

My main problem with autonumbers is that they do not exist in the
logical model, therefore they are not a logical key. This was the
problem I recalled from your 'trainers' scenario.


In the physical implementation I've no problem with you or anyone
using a so-called surrogate if you also have a natural key or
enterprise key which exists external to the database. I wouldn't
recommend doing it myself for a variety of reasons e.g. doing so makes
data harder to read.


Ideally (that word again), I think the SQL DBMS should store the key
value only once and use an internal surrogate, in the way you do by
hand with autonumber, to point to it; this way an ON UPDATE CASCADE
action would physically change only one value. Such SQLs exist but
professionally I need use Microsoft's products, at least in the
immediate future; Microsoft doesn't have them so I go without.

So my ideal surrogate would be hidden and I'd still see the real key
values in the referencing tables but that's not the reality for me.
With your way (by hand with autonumber) you either have to work with
the meaningless values or use a JOIN (or three or thirteen) to see the
real values, so it's not for me thanks and good luck to you.

The surrogate key, when there is one, is hidden from view. I can see it if
I choose, but it is not exposed to the user. The user, however, will see
enough detail to know which Jamie Collins they are selecting when there are
two employees with that name. The user can identify the unique record when
needed without ever seeing the surrogate key.
One thing that tickles me about the 'autonumber PK' advocates it that
they most often use the incremental Long Integer flavour, presumably
because it makes the data easier to read! If I cannot dissuade you
from autonumber, I urge you to choose random: it will improve
concurrency (let's not start on the whole 'physical clustering on
disk' thing), you will be less inclined to expose the autonumber of
users and, having to type the values over and over, you may come to
appreciate a well designed key (fixed width, check digit, etc). It's
'cruel to be kind'; making you type replication IDs (GUID) would just
be cruel <g>.

I don't care what the autonumber PK looks like, or whether it is random or
sequential or whatever. The choice has nothing whatever to do with making
the data easier to read, because I do not read the autonmber field except
during the development stages. There is no temptation to expose it to the
users. I can imagine limited situations where it would be OK to do so. For
instance, a calls database may use a number as a reference for the call, in
the same way that electronic banking uses an apparently arbitrary number as
a transaction reference. But this is the infrequent exception. I don't
know about replication IDs, so if I am creating a mess for some future
situation about which I have no understanding now, so be it.
 
T

Tony Toews [MVP]

John W. Vinson said:
Ummm....

Access 2.0, back 15 years ago, let you base reports on queries. And I'm pretty
certain that 1.0 and 1.1 did also.

<picking nits> Access 2.0 was AFAIK dated 1994 thus it was 13 years
ago. </picking nits>

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
G

Graham R Seach

Jamie,

The original point you tried to make was that it is wrong to include
calculated fields in a persisted view (capitalisation nothwithstanding), due
to the rules of normalisation. I'm sorry but you are incorrect. It is OK to
include calculated fields in such views because they do not form part of the
logical model, which is where the rules of normalisation apply. We normalise
relations - not tables. Normalisation is not developed at physical level
(where tables and views exist). You have to remember that data modelling is
an implementation-independent exercise, and strictly speaking, you should
not enforce the constraints imposed by one level on the objects of another,
because those objects simply do not exist in the former.

In physical terms, views are not virtual tables, because they do not store
data and nor do they (necessarily) represent formal entities. One should not
assume the wrong interpretation of the word "virtual" in SQL-92. People
(including those who wrote the standard) use the word with abandon, but the
intention was merely to provide a word/phrase which aids in understanding a
concept - not to define a new class of object. That is one salient point!

<<...a SQL VIEW is a table (lowercase) but a SQL TABLE is not a SQL VIEW.>>
A SQL view is not a table; it is the mechanism by which a dataset can be
visualised, which may include data from one more more SQL tables. I agree, a
SQL table is not a SQL view; it is the physical implementation of a
relation, the contents of which cannot be viewed without a SQL view.

<<...but it only become a VIEW if I can achieve the same degree of
normalization as a comparable base table...>>
No, and that's where you misunderstand the concepts involved. Normalisation
is not applied to tables (in the physical implementation) - only to
relations (at logical phase, which is independent of physical
implementation). Relations are logical - tables are physical. Normalisation
is a logical activity - not a physical one. When I design a logical model, I
leave it to the DBA to create the physical model in a way that ensures my
logical design can be imlemented using the chosen DBMS technology. Strictly
speaking, I don't care what technology the DBA will use to implement the
database, and (s)he does not need to worry about normalisation, because by
the time (s)he gets the logical design, it's already done.

<<I simply do not apply normalization to a PROCEDURE's resultset.>>
Of couse not.

<<I'm not sure I even believe in 6NF>>
Me either <smirk>, although I've done quite a bit research with them.

<<A single column lookup table is in 5NF...>>
Yes, point taken.

Regards,
Graham R Seach
Microsoft Access MVP
Sydney, Australia
 
D

datadrenaline

Hey Jamie,

I beleive you mis-understood my post, in conjuction with me probably
not using clear laguage to explain ...

What I said ...
Normalization is the level of efficiency your data model stores data
that is INPUT. A View is an OUTPUT of the normalized data and has no
buisness being part of the data storage model.

Said much better by someone else ...
In any case, you have to make the distinction between (a) the storage
of
data, and (b) the use of that data. Pure storage requires
normalisation. Use
of the data requires whatever form is necessary to convert that data
into
information or knowlege.
.....

Also, the OUPUT does not guide my data model ...

See ya,
Brent
 
J

Jamie Collins

You have completely lost me when you say the PK
designation is always arbitrary. Are you saying the designation is
arbitrary?

Yes.

Consider my favourite example: a temporal database with a business
rule, "at no time can an employee have two salaries." The logical
model identifies a table EmployeeSalaryHistory comprising
employee_number, salary_amount, start_date and end_date; all columns
required; end_date = maximum date (#9999-12-31 23:59:59# in
implementation) is used to represents a row in the current state
current. The logical model candidate keys are:

(employee_number, start_date)
(employee_number, end_date)
(employee_number, start_date, end_date)

In implementation we can put a SQL UNIQUE constraint on all of the
above but I choose to omit the third one because it's already covered
by the first two; this is an example how something in the logical
model may not directly related to a single corresponding object in the
implementation.

There's something missing of course: the above candidate keys only
ensure each *period* is unique, whereas the business rule is "at no
time". To cut a long (and hopefully familiar) story short, we need a
sequenced key where overlapping periods are prevented (in
implementation a CHECK constraint may be used plus some other
embellishments).

It is this sequenced key that I would consider as being the logical
primary key (lowercase) of the table.

As described we have implemented a table with a primary key but no SQL
(the language) PRIMARY KEY (PK) designation. Just about everyone says
that every SQL table should have a PK. Take a look in the ANSI SQL
spec:

1) "none of the values in the specified column or columns be the null
value" (section 4.10).
2) PK is a unique constraint (section 4.10.2).
3) there can only be one PK per table (section 11.7).
4) For the references specification of a FOREIGN KEY: "If the
<referenced table and columns> does not specify a <reference column
list>, then the table descriptor of the referenced table shall include
a unique constraint that specifies PRIMARY KEY" (section 11.8) [i.e.
you don't specify the columns involved, those of the PK will be used.]

That's it as far as SQL (the language) is concerned. The unwritten
implication is that PRIMARY KEY will have an 'implementation specific'
meaning. For Access/Jet, they are:

1) Determines clustering (physical ordering) on disk on compact.
2) PK columns appear as bold text in the Relationships diagram.
3) If you don't specify a PK the Access interface will nag you,
"Although a primary key isn't required, it's highly recommended. A
table must have a primary key for you to define a relationship between
this table and other tables in the database [incorrect!]. Do you want
[Access] to create a primary key now?"

The choice is arbitrary. Whereas it is objectively demonstrable
whether a table is in, say, 3NF, you can't say look at a table and
say, "Hey, you've picked the wrong candidate for PK there," because
the choice of PK subjective, determined by personal preference,
prejudice and ignorance.

Back to the EmployeeSalaryHistory: we have two candidate keys defined
using UNIQUE. Do I really have to choose one to promote to PK. Just
about everyone says every table should have a PRIMAY KEY ...or do they
really mean primary key? FWIW David Portas SQL Server MVP seems to
have similar dilemmas:

Down with Primary Keys?
http://blogs.conchango.com/davidportas/archive/2006/09/14/Down-with-Primary-Keys_3F00_.aspx

I say that yes, every table should have a PRIMARY KEY because
otherwise at the very least you are missing out on the
implementation's meaning; worse, though, you would be at the mercy of
the implementation's designers' default behaviour e.g. in Access/Jet
another NOT NULL (?) UNIQUE will be used for physical ordering
(clustering) but this is undocumented and because I have no idea how
the other candidates are eliminated (first one defined
chronologically? one defined on first columns in left-to-right order?
one defined on first columns in chronologically created order? does
nullable columns affect the choice) I'll make an explicit choice based
on my own criteria.

I've previously mentioned a 'telephone directory' style table where
phone number is the real world unique key and clustering would favour
last name and because in Access/Jet there is no explicit way of
specifying clustering then PRIMARY KEY (last_name, phone_number) would
be best for these aims. Thus, in this scenario the PRIMARY KEY isn't
even a candidate key!
when I refer
to a PK I mean the field or fields that are used for relationships.

Do you not have tables for which more than one key are referenced by
other tables?

Jamie.

--
 
J

Jamie Collins

you have to make the distinction between (a) the storage
of
data, and (b) the use of that data. Pure storage requires
normalisation. Use
of the data requires whatever form is necessary to convert that data
into
information or knowlege.
....

Also, the OUPUT does not guide my data model ...

I think I see what you are getting at. A report is output, hence
doesn't need to be normalized. A stored procedure's resultset is
output, hence doesn't need to be normalized. However, a SQL VIEW is a
table and tables should be normalized.

Jamie.

--
 
J

Jamie Collins

A SQL view is not a table; it is the mechanism by which a dataset can be
visualised

I think you are confusing the concepts of 'SQL VIEW' and 'resultset'.
As I previously stated, if by 'view' you mean 'resultset' then we are
in broad agreement.

In SQL terms, however, a VIEW is indeed a table. The SQL-92 spec is
explicit about this
:

"4.9 Tables: A table is either a base table, a viewed table, or a
derived table... A viewed table is a named derived table defined by a
Normalisation is not applied to tables (in the physical implementation)

Yes, I'm being a little informal by comparing a table in the
implementation to the normal forms. So if you want to be strict,
there's no way of saying which normal form a single column 'all key'
SQL table is in because the term simply does not apply. However, I
think most people would choose an informal approach and say, "This SQL
table is in 5NF."

Jamie.

--
 
G

Graham R Seach

Hi Jamie,

....<<a VIEW is indeed a table. The SQL-92 spec is explicit about this...>>
(Sigh) Sect. 4 is a section describing the concepts involved. Although your
quote is verbatim, that section simply seeks to describe the concepts. It is
not defining classes of objects.

In any case, the original point you tried to make was that it is wrong to
include calculated fields in a persisted view (capitalisation
nothwithstanding), due to the rules of normalisation. I restate that it is
OK to
include calculated fields in such views because they do not form part of the
logical model, which is where the rules of normalisation apply. That's my
point.

Regards,
Graham R Seach
Microsoft Access MVP
Sydney, Australia


Jamie Collins said:
A SQL view is not a table; it is the mechanism by which a dataset can be
visualised

I think you are confusing the concepts of 'SQL VIEW' and 'resultset'.
As I previously stated, if by 'view' you mean 'resultset' then we are
in broad agreement.

In SQL terms, however, a VIEW is indeed a table. The SQL-92 spec is
explicit about this
:

"4.9 Tables: A table is either a base table, a viewed table, or a
derived table... A viewed table is a named derived table defined by a
Normalisation is not applied to tables (in the physical implementation)

Yes, I'm being a little informal by comparing a table in the
implementation to the normal forms. So if you want to be strict,
there's no way of saying which normal form a single column 'all key'
SQL table is in because the term simply does not apply. However, I
think most people would choose an informal approach and say, "This SQL
table is in 5NF."

Jamie.
 
B

BruceM

Jamie Collins said:
Yes.

Consider my favourite example: a temporal database with a business
rule, "at no time can an employee have two salaries." The logical
model identifies a table EmployeeSalaryHistory comprising
employee_number, salary_amount, start_date and end_date; all columns
required; end_date = maximum date (#9999-12-31 23:59:59# in
implementation) is used to represents a row in the current state
current. The logical model candidate keys are:

(employee_number, start_date)
(employee_number, end_date)
(employee_number, start_date, end_date)

Can end_date be part of a candiate key considering that for the current
salary level there is no end date, and the field would be null?
In implementation we can put a SQL UNIQUE constraint on all of the
above but I choose to omit the third one because it's already covered
by the first two; this is an example how something in the logical
model may not directly related to a single corresponding object in the
implementation.

There's something missing of course: the above candidate keys only
ensure each *period* is unique, whereas the business rule is "at no
time". To cut a long (and hopefully familiar) story short, we need a
sequenced key where overlapping periods are prevented (in
implementation a CHECK constraint may be used plus some other
embellishments).

It is this sequenced key that I would consider as being the logical
primary key (lowercase) of the table.

As described we have implemented a table with a primary key but no SQL
(the language) PRIMARY KEY (PK) designation. Just about everyone says
that every SQL table should have a PK. Take a look in the ANSI SQL
spec:

1) "none of the values in the specified column or columns be the null
value" (section 4.10).
2) PK is a unique constraint (section 4.10.2).
3) there can only be one PK per table (section 11.7).
4) For the references specification of a FOREIGN KEY: "If the
<referenced table and columns> does not specify a <reference column
list>, then the table descriptor of the referenced table shall include
a unique constraint that specifies PRIMARY KEY" (section 11.8) [i.e.
you don't specify the columns involved, those of the PK will be used.]

That's it as far as SQL (the language) is concerned. The unwritten
implication is that PRIMARY KEY will have an 'implementation specific'
meaning. For Access/Jet, they are:

1) Determines clustering (physical ordering) on disk on compact.
2) PK columns appear as bold text in the Relationships diagram.
3) If you don't specify a PK the Access interface will nag you,
"Although a primary key isn't required, it's highly recommended. A
table must have a primary key for you to define a relationship between
this table and other tables in the database [incorrect!]. Do you want
[Access] to create a primary key now?"

The choice is arbitrary. Whereas it is objectively demonstrable
whether a table is in, say, 3NF, you can't say look at a table and
say, "Hey, you've picked the wrong candidate for PK there," because
the choice of PK subjective, determined by personal preference,
prejudice and ignorance.

Back to the EmployeeSalaryHistory: we have two candidate keys defined
using UNIQUE. Do I really have to choose one to promote to PK. Just
about everyone says every table should have a PRIMAY KEY ...or do they
really mean primary key? FWIW David Portas SQL Server MVP seems to
have similar dilemmas:

Down with Primary Keys?
http://blogs.conchango.com/davidportas/archive/2006/09/14/Down-with-Primary-Keys_3F00_.aspx

I say that yes, every table should have a PRIMARY KEY because
otherwise at the very least you are missing out on the
implementation's meaning; worse, though, you would be at the mercy of
the implementation's designers' default behaviour e.g. in Access/Jet
another NOT NULL (?) UNIQUE will be used for physical ordering
(clustering) but this is undocumented and because I have no idea how
the other candidates are eliminated (first one defined
chronologically? one defined on first columns in left-to-right order?
one defined on first columns in chronologically created order? does
nullable columns affect the choice) I'll make an explicit choice based
on my own criteria.

I've previously mentioned a 'telephone directory' style table where
phone number is the real world unique key and clustering would favour
last name and because in Access/Jet there is no explicit way of
specifying clustering then PRIMARY KEY (last_name, phone_number) would
be best for these aims. Thus, in this scenario the PRIMARY KEY isn't
even a candidate key!

Just to be picky, a relative could end up with the phone number after moving
into the ancestral home, or something like that. I've seen it several
times, once with the son having the same name as the father (meaning that
the suffix "Jr" is needed to distinguish the two). I realize that you are
talking about a phone number *style* table and not an actual phone
directory, but the model raises some questions.
Do you not have tables for which more than one key are referenced by
other tables?

A few where more than one field is used as the PK (junction tables,
especially). The only way I know how to do that is by designating the
combination of fields as the PK.

As I have mentioned, we just had the employeeID number change. I have
linked to the Employee table in other databases. Now I have a "real world"
key with no validity. Employee 5487 is now 8279. Maybe I can update if
there is no 8279, but if there is already an 8279 I need to change that
person to 8597, assuming nobody is 8597. This could go on for a while, and
as I've said there are several databases involved. I would rather avoid
this problem, especially considering that it could happen again when
corporate decides that there should be a prefix to identify the facility, or
whatever. If you do not see a problem here, you may make a different
choice.

Again, please understand that I still require that the EmployeeID number be
unique. I just don't use it as the field for relationships.
 
J

Jamie Collins

...<<a VIEW is indeed a table. The SQL-92 spec is explicit about this...>>
(Sigh) Sect. 4 is a section describing the concepts involved. Although your
quote is verbatim, that section simply seeks to describe the concepts. It is
not defining classes of objects.

If section 4 is too 'conceptual' for you, try 11.19:

11.19 <view definition>

Function

Define a viewed table.

Format
<view definition> ::=
CREATE VIEW <table name> [ <left paren> <view column list>
<right paren> ]
AS <query expression>
[ WITH [ <levels clause> ] CHECK OPTION ]

Note the use of the word 'table'. This is what I mean when I use 'SQL
VIEW' (or just use uppercase 'VIEW' to indicate a SQL keyword) i.e.
the one that means a table in the SQL language.

Again, if you want to use 'view' informally to mean 'resultset' then
we are in broad agreeement.
In any case, the original point you tried to make was that it is wrong to
include calculated fields in a persisted view (capitalisation
nothwithstanding), due to the rules of normalisation.
I restate that it is OK to
include calculated fields in such views because they do not form part of the
logical model, which is where the rules of normalisation apply. That's my
point.

Do you not agree that one can *informally* compare a SQL table to the
normal forms? I certainly see in the groups people say things like,
"Your table is not fully normalized" where only the implemented table
has been posted, not the table from the logical model (I know it's the
implemented table because they frequently mention an autonumber PK
which cannot exist in the logical model!)

Jamie.

--
 
J

Jamie Collins

Can end_date be part of a candiate key considering that for the current
salary level there is no end date, and the field would be null?

Yes it can and good point too because that would be an example of a
logical candidate key that is ineligible for consideration for a SQL
PRIMARY KEY merely due to the fact that the person implementing the
logical model has chosen to use NULL to signify the current state (a
very good approach to modelling current periods, IMO).
Just to be picky, a relative could end up with the phone number after moving
into the ancestral home, or something like that.

Yes, so what? Telephone number alone is logical key. The use of

PRIMARY KEY (last_name, phone_number)

here is merely for clustering purposes, PRIMARY KEY (uppercase) not
actually being a primary key in this case.
A few where more than one field is used as the PK (junction tables,
especially). The only way I know how to do that is by designating the
combination of fields as the PK.

Consider a 'supeclass' table Automobiles with columns VIN (unique) and
vehicle_type ('SUV', 'Sedan', etc) with 'subclass' tables for Sedans
and SUVs respectively. You want to prevent rows where
vehicle_type='Sedan' from appearing in the SUVs table. With a
composite key (vehicle_type, VIN) used to reference the Automobiles
tables I can put a validation rule CHECK (vehicle_type = 'SUV') in the
subclassed table. How do you do this with a sole autonumber PK column?
Other tables may pay no regard to vehicle type, therefore may
reference Automobiles using just VIN. Automobiles would be an example
of a table referenced by two keys, (vehicle_type, VIN) and VIN
respectively.

Jamie.

--
 
B

BruceM

Jamie Collins said:
Yes it can and good point too because that would be an example of a
logical candidate key that is ineligible for consideration for a SQL
PRIMARY KEY merely due to the fact that the person implementing the
logical model has chosen to use NULL to signify the current state (a
very good approach to modelling current periods, IMO).

I think you are saying that the PK designation need not be used. In any
case, I still don't get how a combination of fields that includes values
subject to change (such as End_Date becoming not null) would be used in
relationships. I think I get what you are saying about candidate keys and a
number of other related things, but I still don't get how relationships are
modeled.
Yes, so what? Telephone number alone is logical key. The use of

PRIMARY KEY (last_name, phone_number)

here is merely for clustering purposes, PRIMARY KEY (uppercase) not
actually being a primary key in this case.

So PRIMARY KEY is not really a primary key, and the use of a field in a
composite PRIMARY KEY that is not really a primary key is about clustering
rather than uniqueness? Do you really wonder that I am having trouble
following your points?
Consider a 'supeclass' table Automobiles with columns VIN (unique) and
vehicle_type ('SUV', 'Sedan', etc) with 'subclass' tables for Sedans
and SUVs respectively. You want to prevent rows where
vehicle_type='Sedan' from appearing in the SUVs table. With a
composite key (vehicle_type, VIN) used to reference the Automobiles
tables I can put a validation rule CHECK (vehicle_type = 'SUV') in the
subclassed table. How do you do this with a sole autonumber PK column?

I would limit the user choice in the front end to vehicles of the
appropriate category. When "SUV" is selected, a combo box from which to
select the model would show only SUVs.
 
D

datAdrenaline

Well ... I had a HUGE disertation that I intended to send to the
group ... But I think I sent it directly to you Jamie! ... (shows my
lack of familiarity with GoogleGroups and news groups in general!)

Anyway the long an short of it was this:

A VIEW/view is a table by definition in SQL-92 ... however, a 'table'
by SQL-92 definition reads like this: "A table is a multiset of rows."
which seems to be the equivalent to what you have called a
resultset .... which, by the way is not a term in the spec I
referenced, nor is rowset or recordset.

table
A table is a multiset of rows...<clip>..A table is either a base
table, a viewed table, or a derived table.

base table
A base table is either a persistent base table, a global temporary
table, a created local temporary table, or a declared local temporary
table.
A persistent base table is a named table defined by a <table
definition> that does not specify TEMPORARY.

derived table
A derived table is a table derived directly or indirectly from one or
more other tables by the evaluation of a <query expression>.

viewed table
A viewed table is a named derived table defined by a <view
definition>.

Which leads to ....

<table definition> (11.3)
A persistent base table, a created local temporary table, or a global
temporary table.

<query expression> (7.1) {put together with some of my own words and
arrangement}
A <row value constructor> which will specify an ordered set of values
to be constructed into a row or partial row...

<view definition> (4.9/11.13) {summarized with my own words ...
hopefully we can agree}
A named <query expression>. Maybe a persistant <query expression> ...
Either way ... seems to be derived! (ie: an 'output')

So ... with these definitions a VIEW is a table, but a table is not a
base table, and thus, a VIEW is not required to meet the set
informalized rules for normalization implied by the statement:

"Are your tables normalized?"

Which should be properly stated (can there be proper imformal?) ...

"Are your base tables normalized?"

.... So, as I stated to Jamie in an individual reply, we have spent
all this time pegging the geek meter ... proving that we are truly not-
normal, by discussing something in depth to which we all seem to
broadly agree.

Regards,
Brent Spaulding
 
J

Jamie Collins

So ... with these definitions a VIEW is a table, but a table is not a
base table, and thus, a VIEW is not required to meet the set
informalized rules for normalization implied by the statement:

"Are your tables normalized?"

Which should be properly stated (can there be proper imformal?) ...

"Are your base tables normalized?"

... So, as I stated to Jamie in an individual reply, we have spent
all this time pegging the geek meter ... proving that we are truly not-
normal, by discussing something in depth to which we all seem to
broadly agree.

Agree? Not really, other than we all seemed to agree that a resultset
(my term) does not need to be normalized.

You seem to be saying that normalization can be applied to SQL objects
but only SQL base tables; I've agreed this can only be done
*informally*.

Graham R Seach seems to be saying that normalization cannot be applied
to SQL objects; while I've agreed this is the correct *formal*
approach and I've tried to clarify whether he would permit it
informally (as the person I was originally replying to had done) but
he has not replied.

I'm saying that if you choose to apply normalization to SQL base
tables, as many people do in the groups, then you should apply it
equally to SQL viewed tables (VIEWs); as a practical approach I've
suggested that if your VIEW would not 'pass muster' as a normalized
table then alter it to become the resultset of a stored procedure and
that will prevent it being used as a table (being queried as a table,
SQL updates as a table) -- and you seem to disagree with all this!

Prehaps it's time to "agree to disagree" :)

Jamie.

--
 
J

Jamie Collins

I think you are saying that the PK designation need not be used. In any
case, I still don't get how a combination of fields that includes values
subject to change (such as End_Date becoming not null) would be used in
relationships. I think I get what you are saying about candidate keys and a
number of other related things, but I still don't get how relationships are
modeled.

We have a minor language issue. You seem to use "relationship" to mean
"Access Relationship with RI enforced", whereas I would use "FOREIGN
KEY" or perhaps "referenced table" and "referencing tables" in
context. FWIW I model a relationship using a table which I call a
"relationship table" which IIRC you call a "junction table".

So, I'm assuming you are referring to a FOREIGN KEY (my term). You are
correct that the NULL value here is a complication and one worth
avoiding by design. In Access/Jet, NULL and non-NULL values in the
referencing table must match exactly in the referencing table by
treating the NULL value as if it were an actual value; while this is
not strictly the correct way to handle the NULL value, it has utility
because the non-matching non-NULL values will cause the FK to bite,
also altering the NULL value to a non-NULL value in the referencing
table will result in the new value being CASCADEd to the referencing
table (assuming the ON UPDATE CASCADE referential action has been
specified for the FK). Contrast this with SQL Server which will not
test any FK rows involving the NULL value (i.e. the non-NULL values
may not match but the still FK will not bite as it would in Access/
Jet) and changing from the NULL value to a non-NULL value does not
CASCADE; arguably more correct but has less utility. I'm finding it
hard to describe long hand so here's an example (sorry its ANSI-92
Query Mode and includes the dreaded DDL):

CREATE TABLE Test1
(
col1 INTEGER NOT NULL,
col2 INTEGER, UNIQUE (col1, col2)
)
;
CREATE TABLE Test2
(
col1 INTEGER NOT NULL,
col2 INTEGER,
FOREIGN KEY (col1, col2)
REFERENCES Test1 (col1, col2)
ON DELETE CASCADE
ON UPDATE CASCADE
)
;
INSERT INTO Test1 (col1, col2)
VALUES (1, NULL)
;
INSERT INTO Test2 (col1, col2)
VALUES (1, NULL)
;

Consider this query:

SELECT *
FROM Test1 AS T1
INNER JOIN Test2 AS T2
ON T1.col1 = T2.col1
AND T1.col2 = T2.col2;

returns no rows because T1.col2=T2.col2, NULL=NULL, causes the row to
be removed from the resultset; put crudely, you can't compare the NULL
value to anything, even the NULL value.

Now consider this:

DELETE FROM Test1;

causes the row in Test2 to be removed; Access/Jet successfully matched
NULL=NULL. Impressive or no? In SQL Server, the row would remain (not
orphaned because it was never considered to be referencing in the
first place).

That's one of the reasons why I used a NOT NULL end_date when I posted
the example i.e. to avoid this complexity ...but you did ask :)
Regardless, I would expect any designer to use (employee_number,
start_date) as the key for referencing tables.
So PRIMARY KEY is not really a primary key, and the use of a field in a
composite PRIMARY KEY that is not really a primary key is about clustering
rather than uniqueness? Do you really wonder that I am having trouble
following your points?

I've tried to show you tables with a PRIMARY KEY but no logical key,
tables where a PRIMARY KEY cannot prevent duplicate data, and tables
where PRIMARY KEY is used for purposes of physical indexing rather
than logical keys.

When you choose to use PRIMARY KEY, the choice of what you use it for
it arbitrary; feel free to omit a PRIMARY KEY but be ready for someone
to challenge you on it (e.g. document your reasons) because there is
this 'touch stone' in SQL that every table should have a primary key/
PRIMARY KEY -- I really don't know what variety they mean! Personally,
I think every table should have a both a primary key and a PRIMARY
KEY, basing their choice for the latter on good (documented) reasons.
For Access/Jet, clustering is IMO the best reason for using PRIMARY
KEY but I do know that many people value that bold text in the Access
'Relationships' diagram.

Final point on this subject: if someone is unaware of Access/Jet's
clustering behaviour, can they really make an informed decision about
PRIMARY KEY designation?
I would limit the user choice in the front end to vehicles of the
appropriate category. When "SUV" is selected, a combo box from which to
select the model would show only SUVs.

Do I need to tell you that your approach doesn't actually solve the
problem at hand? The aim is to prevent this (aircode):

INSERT INTO SUVs (VIN, etc)
SELECT VIN, etc)
FROM Vehicles
WHERE vehicle_type = 'Sedan';

and similar updates that are contrary to the business rules.

What's your position on PRIMARY KEY? If I showed you this table:

CREATE TABLE Employees
(
employee_number INTEGER
);

and pointed out that all nine front end programs known to use the
database have front end code to trap duplicates and NULLs, would you
think it was a good idea to omit a NOT NULL unique constraint from the
SQL table?

Jamie.

--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top