Autonumber Fields

B

BruceM

Thanks for taking the time to reply. In particular, I now understand that
making a field (or combination of fields) the PK creates what you have been
referring to as a clustered index.
I am going to combine things from several of your replies here. If I use
name and address as the PK, what becomes of relationships if the name and/or
address changes? I would want my PK to be something that will not change.
Also, having two people with the same name at the same address is not at all
unusual. For women in particular the designation Jr., III, etc. is rarely
used, so the names are literally indistinguishable without further
clarification. Birth date is one way of distinguishing the two, but it may
not be appropriate to gather such information. It may be that a comments
field or something of the sort is the only way of telling the two apart. A
mother and daughter under the same roof, whether at a family business or at
home, is far from an unlikely occurence, and one against which I must guard.
I use queries (SQL) for ordering. In a Contacts table or some such listing
of names I generally use name, for instance. I base forms and reports on
the query, not on the table. How does this enter into ordering of data as
you have described it?
 
B

BruceM

Thank you for the reply. I think the person (or people) who argue for a
"natural" PK have made the point that the PK does not need to be used in the
relationships. In other words, a unique field can be on the one side of a
one-to-many, and the PK can be a separate entity. The point of a PK, in
that way of thinking, is to guard against duplication (e.g. entering the
same person twice) in a way that would not be possible if an individual
field is set to Indexed (No Duplicates).
I would rather use a form's Before Update event for such checking. In
another posting I referred to the real-life situation of a mother and
daughter with the same name in a family-run business. Even if it is
uncommon, it is certainly at least possible. I would want the option of
allowing the apparent duplication in at least some cases. I would need to
reinvent the PK if I was using a multi-field PK for that purpose and
discovered an apparent duplication.
I am inclined to use autonumber or some other fixed value (e.g. Invoice
Number) as the PK. I use code to contrive an incrementing value in the case
of the aforementioned Invoice Number and other situations in which the
number is part of what the user sees. I don't see a need for two guaranteed
unique identifiers in one record. Even in the case of Employee ID numbers I
lean toward a meaningless number, because if the company changes hands they
may switch to a different numbering system for Employee IDs. However, I am
interested in hearing other views on the subject. In the course of the
dialogue I am learning some new things, even if I don't decide to change to
multi-field PKs as a matter of course.
 
R

Roger Carlson

Well, the battle between whether to use Natural Keys or Surrogate Keys has
waged for years and we will not resolve it here. But please do not tell me
and others what I do not understand just because our opinions differ.

A Primary Key has two properites: 1) it must be unique, and 2) it must not
be NULL. In any table there may be several fields (or combinations) of
fields which could possibly fulfill these conditions. These are called
Candidate Keys. The job of the database developer is to decide between
these CKs to see which is the absolute best. In the case of the Contacts
table (mentioned in a separate post); last_name, first_name and
postal_address could provide a key, but it is not a really good choice.
While this may provide uniqueness (maybe), there is a real possibility that
a person does not have a postal address (no part of a compound primary key
can be NULL). This is one of the difficulties of finding a good natural
keys to make up the primary key.

Now, there are actually two purposes for uniqueness: 1) Real-world
uniqueness to make sure an entity appears only once in a table, and 2)
Relational uniqueness for relating records between tables. As you noted,
autonumber fields are lousy in terms of real-world uniqueness. However,
they are perfect for creating uniqueness for relating records. In this
sense, the autonumber field is absolutely unique. It can appear ONLY once
and can never be repeated. Further, as a long integer, it is VERY efficient
for relating records compared to a compound primary key composed of text
fields.

So when I recommend autonumbers for primary keys, I also recommend creating
a separate UNIQUE INDEX that is composed of fields from a Natural Key. This
fulfils the function of real-world uniqueness and is more flexible, because
you CAN allow Nulls in a unique index if you want. In this way, both
functions of a Primary Key are fulfilled to their maximum efficiency.

You may not agree with this. I know many respected developers who don't.
Opinions differ, but this does not give you the right to disparage those who
disagree with you.

--
--Roger Carlson
Access Database Samples: www.rogersaccesslibrary.com
Want answers to your Access questions in your Email?
Free subscription:
http://peach.ease.lsoft.com/scripts/wa.exe?SUBED1=ACCESS-L
 
P

peregenem

BruceM said:
If I use
name and address as the PK, what becomes of relationships if the name and/or
address changes?

If your table only has three attributes (e.g. first_name, last_name,
postal_address) and a composite/concatenation does not provide a key
then you don't have a key. You must expand your data model e.g. collect
more information, create an artificial key and expose it (e.g. assign
it to your contact entities). Adding a 'notes' column into the mix is
probably not going to lead to an acceptable solution to the problem.

If you are going down the road of creating your own encoding system,
you should be able to do better than the autonumber algorithm (i.e.
MAX+1 or MAX+10000 if something could have gone 'wrong'), as you have
done with your

CHECK(EmployeeID LIKE '[1-9][0-9][0-9][0-9]')

constraint (it has 'internal' validation i.e. a value other than four
numeric characters is obviously incorrect).

ALTER TABLE Prospects
ADD CONSRAINT fk_prospects__contacts FOREIGN KEY
(contact_first_name, contact_last_name, contact_postal_address)
REFERENCES Contacts
(first_name, last_name, postal_address)
ON DELETE CASCADE
ON UPDATE CASCADE;

As I alluded to before, you may have problems with the postal address
which may not be captured in a suitable way to be used for in a key.
I would want my PK to be something that will not change.

Permanency is a boon but one doesn't always have that luxury.
 
B

BruceM

I think I understand that your PK and your field(s) on the one side of
one-to-many may not be the same. If so, and if the PK is not part of a
relationship, would the purpose of your PK be to guard against duplication?
If so, do you regard that as a more efficient use of recources than data
validation code in the form's Before Update event?
Also, if by "expose it" (in reference to an artificial key) you mean show it
to the user, why would that be necessary?

If I use
name and address as the PK, what becomes of relationships if the name
and/or
address changes?

If your table only has three attributes (e.g. first_name, last_name,
postal_address) and a composite/concatenation does not provide a key
then you don't have a key. You must expand your data model e.g. collect
more information, create an artificial key and expose it (e.g. assign
it to your contact entities). Adding a 'notes' column into the mix is
probably not going to lead to an acceptable solution to the problem.

If you are going down the road of creating your own encoding system,
you should be able to do better than the autonumber algorithm (i.e.
MAX+1 or MAX+10000 if something could have gone 'wrong'), as you have
done with your

CHECK(EmployeeID LIKE '[1-9][0-9][0-9][0-9]')

constraint (it has 'internal' validation i.e. a value other than four
numeric characters is obviously incorrect).

ALTER TABLE Prospects
ADD CONSRAINT fk_prospects__contacts FOREIGN KEY
(contact_first_name, contact_last_name, contact_postal_address)
REFERENCES Contacts
(first_name, last_name, postal_address)
ON DELETE CASCADE
ON UPDATE CASCADE;

As I alluded to before, you may have problems with the postal address
which may not be captured in a suitable way to be used for in a key.
I would want my PK to be something that will not change.

Permanency is a boon but one doesn't always have that luxury.
 
P

peregenem

Roger said:
the battle between whether to use Natural Keys or Surrogate Keys has
waged for years

Again, you've misunderstood. That's a different debate which I don't
want to do.
A Primary Key has two properites: 1) it must be unique,
and 2) it must not be NULL.

Again, you've misunderstood. I can add to your list of properties: 3)
It has special meaning in the SQL implementation. In the case of
Access/Jet, this special meaning is clustered index. Are you aware of
this special meaning? If you are, why do you think an incrementing
integer with no meaning in the data model makes for a good clustered
index?

Your two properties can be satisfied by constraining with NOT NULL
UNIQUE. There's only one way of creating a clustered index in
Access/Jet and that's using PRIMARY KEY.
As you noted,
autonumber fields are lousy in terms of real-world uniqueness.
However, they are perfect for creating uniqueness for relating records.

Perfect? As I said up thread, the only good things about autonumber is
that it is convenient (because it provided by the 'system') and it is
efficient (data type = INTEGER). Do you really think there is no better
key generation algorithm than autonumber's? If autonumbers were
perfect, we wouldn't see daily posts of the like, 'Why do I have gaps
in my ID numbers?' I can see how 'quick and dirty' may equate to 'good'
given the right circumstances but 'perfect' may be an overstatement.
when I recommend autonumbers for primary keys, I also recommend
creating a separate UNIQUE INDEX that is composed of fields from
a Natural Key. This fulfils the function of real-world uniqueness and
is more flexible, because you CAN allow Nulls in a unique index if
you want.

Why not use a UNIQUE on the autonumber column and use your clustered
index (PRIMARY KEY) for a more worthy candidate?
You may not agree with this. I know many respected developers who don't.
Opinions differ, but this does not give you the right to disparage those who
disagree with you.

I am saying I believe you have misunderstood what PRIMARY KEY means.
All you have said indicates you are unaware of the special clustered
index meaning that PRIMARY KEY has in Access/Jet. I am open minded to
there being another explanation: why do you think an autonumber makes a
good clustered index?

Apologies if you feel disparaged. That was not my intention. We all
make misstatements from time to time.
 
P

peregenem

BruceM said:
I think I understand that your PK and your field(s) on the one side of
one-to-many may not be the same. If so, and if the PK is not part of a
relationship, would the purpose of your PK be to guard against duplication?

No, it would be the 'field(s) on the one side of one-to-many' that
would guard against duplication. The purpose of the PK would be to
avoid a performance-degrading clustered index and, if possible, to
provide for a performance-enhancing clustered index.
If so, do you regard that as a more efficient use of recources than data
validation code in the form's Before Update event?

I do not rely on front end applications to enforce data integrity and
would strongly discourage such development. If I have a business rule
which Jet cannot enforce via constraints then I would recommend porting
to MSDE. For me, 'efficiency' doesn't come anything close to data
integrity in terms of importance.
Also, if by "expose it" (in reference to an artificial key) you mean show it
to the user, why would that be necessary?

You know Miss N E Person ID=1 of Main Street and Miss N E Person ID=2
of Main Street are different people, but when you speak to one of them
on the phone, how do *they* tell you which one they are?
 
R

Roger Carlson

Again, you've misunderstood. I can add to your list of properties: 3)
It has special meaning in the SQL implementation. In the case of
Access/Jet, this special meaning is clustered index. Are you aware of
this special meaning? If you are, why do you think an incrementing
integer with no meaning in the data model makes for a good clustered
index?

I don't. I don't recall talking about clustered indexes at all. I also
believe that a clustered index is a requirement for a primary key.
Certainly, EF Codd said nothing about it. This is an implementation issue,
not a design issue.
Perfect? As I said up thread, the only good things about autonumber is
that it is convenient (because it provided by the 'system') and it is
efficient (data type = INTEGER). Do you really think there is no better
key generation algorithm than autonumber's? If autonumbers were
perfect, we wouldn't see daily posts of the like, 'Why do I have gaps
in my ID numbers?' I can see how 'quick and dirty' may equate to 'good'
given the right circumstances but 'perfect' may be an overstatement.

Yes, perfect. (BTW, autonumbers are Long Integer, not Integer). In terms
of Relational uniqueness, why do you care if gaps develop? This is internal
to the database and the user should never see them. People who ask those
questions don't understand the purpose of an autonumber primary key. It's
got nothing to do with 'quick and dirty'. And in fact, convenience is only
a small part of what makes autonumber PKs perfect. As you will see below,
they also minimize redundant data in the database, make query design and
application design quicker and easier. And if you don't think that's
important, then you are not an application developer.
Why not use a UNIQUE on the autonumber column and use your clustered
index (PRIMARY KEY) for a more worthy candidate?

Because this is backwards. There is no reason to put a Unique constraint on
an autonumber. It is unique by definition. One major problem with compound
primary keys is that it increases the amount of Redundant data. Since
Normalization is the process of removing redundant data, by Codd's
definition, the smallest unique identifier is the best. An autonumber will
provide the smallest amount of redundancy.

In terms of efficiency, not only is the autonumber more efficient for
executing Joins, it is much easier to write queries with autonumber
primary/foreign keys than compound text fields. Consider the Contact table.
Suppose each Contact could have multiple phone numbers. In this case,
proper design dictiates you would need a separate table for the phone
numbers. The tables would look like this:

Contacts PhNumbers
========= ===========
LastName(cpk)-----<LastName(cpk fk)
FirstName(cpk)----<FirstName(cpk fk)
Address(cpk)------<Address(cpk fk)
(more fields) Phone (cpk)

Creating a query would look like this:
SELECT Contacts.*, PHNumbers.Phone
FROM Contacts INNER JOIN PHNumbers ON
(Contacts.Address = PHNumbers.Address) AND
(Contacts.FirstName = PHNumbers.FirstName) AND
(Contacts.LastName = PHNumbers.LastName);

Whereas with an autonumber primary key, you reduce the size of your database
by storing smaller primary/foreign key values, it is also easier to write
queries:

Contacts PhNumbers
========= ===========
ContactID(pk)--| PhoneID (pk)
LastName --<ContactID( fk)
FirstName Phone
Address
(more fields)

Creating a query would look like this:
SELECT Contacts.*, PHNumbers.Phone
FROM Contacts INNER JOIN PHNumbers ON
(Contacts.ContactID = PHNumbers.ContactID);

This would be even worse if Contacts participated in a Many-To-Many
relationship.
I am saying I believe you have misunderstood what PRIMARY KEY means.
All you have said indicates you are unaware of the special clustered
index meaning that PRIMARY KEY has in Access/Jet. I am open minded to
there being another explanation: why do you think an autonumber makes a
good clustered index?

I understand quite well what a Primary Key means. By database theory it has
nothing to do with custered indexes and I don't know why you insist upon it.
While a clustered index may be a good thing, it is by no means required by
database theory or for a good database design. Especially when it
complicates things like creating queries, forms, reports and the like.
Database design which complicates application design is not a good thing
either.
Apologies if you feel disparaged. That was not my intention. We all
make misstatements from time to time.

I have made no misstatements. Certainly I disagree with you, but that
doesn't by definition make me wrong.
 
B

BruceM

"I do not rely on front end applications to enforce data integrity and would
strongly discourage such development. If I have a business rule which Jet
cannot enforce via constraints then I would recommend porting to MSDE."

I really don't know what you mean by "porting to MSDE". I found out what
MSDE is, but for my purposes I will find a way to validate the data, whether
at the table level or in the front end. You can strongly discourage using
the front end for such purposes, but have not provided a reason why. If
there is a Spouse First Name field it may be required if the person is
married, but certainly not otherwise. My choice is the front end for such
validation rather than another piece of software. Before Update works for
my purposes. I will continue to use it. Data integrity is not compromised.
The database works smoothly and quickly. If it is "inefficient" it is so on
a level that is not important to me.

"Miss N E Person ID=1 of Main Street and Miss N E Person ID=2 of Main Street
are different people". Then I will figure out a way of telling them apart
that is useful to the person who needs to call or contact one or the other.
Knowing that they are different records in the database (because they have
different ID numbers) is not helpful in telling them apart.
 
R

Roger Carlson

You've misunderstood what PRIMARY KEY means. An unique integer which
has no meaning in respect fo the entities being modelled makes a lousy
PRIMARY KEY. Google for "clustered index" in the Access groups.

An autonumber is a convenient uniqueifier but unquieness for its own
sake make not be such a good thing.
 
J

John Vinson

For a Contacts table, last_name, first_name and postal_address makes a
fine natural key

Fred Brown, xxx D Street, Parma, Idaho
Fred Brown, xxx D Street, Parma, Idaho

Two friends of mine.


John W. Vinson[MVP]
 
R

Rick Brandt

BruceM said:
"I do not rely on front end applications to enforce data integrity
and would strongly discourage such development. If I have a business
rule which Jet cannot enforce via constraints then I would recommend
porting to MSDE."

I really don't know what you mean by "porting to MSDE". I found out
what MSDE is, but for my purposes I will find a way to validate the
data, whether at the table level or in the front end. You can
strongly discourage using the front end for such purposes, but have
not provided a reason why. [snip]

The basic argument is that if you enforce rules via your front end then the
rules are ONLY enforced in your front end. If someone else created another
front end linked to your tables then the rules would not be enforced. Or if
you yourself end up creating another front end or form to edit the same
tables you have to replicate your "code rules" over and over each time. If
the rules are enforced by the database engine then they are written once and
are automatically enforced in all situations.
 
B

BruceM

I see. Thanks for pointing it out. I am in a situation where I am pretty
much the only one working on database design, and my projects have all been
on a rather modest scale. I had not considered the problems of somebody
else making another front end linked to the existing tables, because it
isn't going to happen in the current scheme of things. I can see where it
would definitely be worth considering in other situations.
I am still not sure what I would do on a table level in a situation where a
field is required only if another field is filled in. I guess that's where
that MSDE business comes in (if one is to avoid front-end validation rules).

Rick Brandt said:
BruceM said:
"I do not rely on front end applications to enforce data integrity
and would strongly discourage such development. If I have a business
rule which Jet cannot enforce via constraints then I would recommend
porting to MSDE."

I really don't know what you mean by "porting to MSDE". I found out
what MSDE is, but for my purposes I will find a way to validate the
data, whether at the table level or in the front end. You can
strongly discourage using the front end for such purposes, but have
not provided a reason why. [snip]

The basic argument is that if you enforce rules via your front end then
the
rules are ONLY enforced in your front end. If someone else created
another
front end linked to your tables then the rules would not be enforced. Or
if
you yourself end up creating another front end or form to edit the same
tables you have to replicate your "code rules" over and over each time.
If
the rules are enforced by the database engine then they are written once
and
are automatically enforced in all situations.
 
A

Amy Blankenship

And what if you absolutely have to tell Miss NE personID = 1 from Miss NE
personID = 2, for exam results and such? For a contact database it may NOT
be important who it is, but for other purposes, like storing course
completion results, etc., it is VERY important!

-Amy
 
B

BruceM

Let's suppose I add an arbitrary number to each record. I still can't tell
them apart without some other sort of information. What that may be depends
on the circumstances. If I am storing course information I will probably
use StudentID. Since StudentID is what distinguishes them from each other,
I will just use that as the PK. If two people with the same name enter the
school the same year, are in the same graduating class, and have the same
middle initial, I will need to come up with something else to assure they
are not confused with each other in the records. I am not going to use a
multi-field PK. Names and addresses change. A name and address combination
is a poor choice for PK. Updating many records if somebody's address
changes makes no sense whatever. It destroys the whole point of using a
relational database.
 
P

peregenem

Roger said:
I don't recall talking about clustered indexes at all.

That's exactly where you are going wrong. For Access/Jet, you need to
start thinking in terms of PK = clustered index.
I also DO
NOT
believe that a clustered index is a requirement for a primary key.

In Access/Jet, you get a clustered index with every PK, whether you
like it or not. That's why you need to choose your PK carefully.
Certainly, EF Codd said nothing about it. This is an implementation issue,
not a design issue.

Your knowledge of database theory seems to be outdated. Some recycling:

Why are we only allowed one PRIMARY KEY per table? Allow me to quote my
old pal Joe Celko:

"In the first papers that Dr. Codd wrote, he talked about
candidate keys -- all the possible keys that exist in a table. Then
you
were to pick one of them to be called the PRIMARY KEY.

"Frankly, this was a hold-over from the days of sequential files --
hey,
Dr. Codd did not come up with the whole RDBMS model all at once. When
we did EDP (Electronic Data Processing) on magnetic tapes in the 1950's

and 1960's, the typical application merged tapes together, so both the
master tape and the transaction tapes had to be sorted on the same key
(account numbers, or whatever). You do not do random seeks on a
magnetic tape.

"Very quickly, The Good Doctor realized that a key is a key, and giving
a
special name to one of them changes nothing about its nature. Primary
keys were then dropped from database theory.

"However, System R and SQL software had been built on top of old file
systems and Dr. Codd's first papers. The PRIMARY KEY was implemented
using the existing keys and indexing methods in the old file systems.
And it has stayed there since.

"Some SQL systems assumed that the PRIMARY KEY would be the preferred
access path and optimized for it."

This last sentence is significant. The question now is: What special
meaning was given to PRIMARY KEY in the Jet implementation of SQL? To
cut a long story short, I'll give you the answer: clustered indexes
i.e. physical ordering on disk. You can only have one physical order
(think paper copy telephone directory: its physical order is fixed)
hence only one PRIMARY KEY. In Jet there is no other way of specifying
the physical ordering for than to use PRIMARY KEY.

If you are using PRIMARY KEY to merely mean NOT NULL UNIQUE then you
are under-utilizing it at best. If you designate a sole autonumber as
PRIMARY KEY you are certainly using it incorrectly because an
incrementing integer (worse, random GUID) makes for a lousy physical
order (think paper copy telephone directory ordered on telephone number
when your primary usage is to retrieve data by last name then first
name). Choosing a bad PRIMARY KEY can result in placing a performance
hit on your database.

If you decide (and I urge you to resist doing so) to use an autonumber
(ID) to force your rows to be unique for uniqueness' sake where you
have no natural key, then put it last in your PRIMARY KEY definition
and put the columns for your clustered index in appropriate order first
e.g.

PRIMARY KEY (last_name, first_name, ID)

This way, the physical order for the table will be rebuilt in last_name
then first_name order with ID merely to satisfy the uniqueness
requirement.

If you are using autonumber as an artificial/surrogate key on
performance grounds e.g. on the basis that a compound natural key is
less efficient for table joins etc, then NOT NULL UNIQUE is sufficient.
But to use an autonumber for efficientcy then take a performance hit by
making it PRIMARY KEY makes no sense!
 
R

Roger Carlson

The physical ordering of the records is NOT the most important database
design issue. I've used clustered indexes profitably in SQL Server and in
fact did not know you could create a clustered index in Access. (I never
claimed to know everything.) But a clustered index is not the
be-all-and-end-all of database design. I doubt very much if a custered
index on 3 text fields like LastName, FirstName, Address will be more
efficient than a non-clustered index on an autonumber field. But even if it
is, if the database is that big and performance is such an issue, you should
move up to SQL server anyway.

I have no more time to waste on arguing with you. If it pleases you to
believe you have won, then do so.

--
--Roger Carlson
Access Database Samples: www.rogersaccesslibrary.com
Want answers to your Access questions in your Email?
Free subscription:
http://peach.ease.lsoft.com/scripts/wa.exe?SUBED1=ACCESS-L
 
P

peregenem

BruceM said:
I really don't know what you mean by "porting to MSDE".

By MSDE I mean SQL Server, the free version. The support for
constraints is IMO slightly worse than Jet 4.0 but what MSDE does have
that Jet doesn't is triggers. If there isn't a suitable constraint,
then a trigger may be employed (a bit like your Before Update, I
assume, but at the database level).
You can strongly discourage using
the front end for such purposes, but have not provided a reason why.

Suggested reading:

Mop the Floor and Fix the Leak, Part 1, by Joe Celko
http://www.dbazine.com/ofinterest/oi-articles/celko25
 
A

Amy Blankenship

Yes, and StudentID is usually an AutoNumber in my applications, because the
way I know which student it is is usually an artificially created but
exposed Login ID that will be unique but makes a lousy Foriegn Key to other
tables. So I associate that with the actual primary key that's an
autonumber. I guess you could do it the other way, associating some
arbitrarily assigned Integer with the exposed LoginID and use the integer to
make your application development easier (as discussed earlier), but it
seems six of one and half dozen of the other...

-Amy
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top