Upsized primary key fields and "aaaaa"

J

John W. Vinson

For example: ISO country codes. Country Code determines Country Name.
Country Name determines Country Code.

Ummmm... not really.

USA = United States of America, Etats-Unis, Estados Unidos, ...

John W. Vinson [MVP]
 
D

David Portas

Ummmm... not really.

USA = United States of America, Etats-Unis, Estados Unidos, ...

John W. Vinson [MVP]

I stated the business rules and proposed a table design to implement
them. Do you think it helps to invent a new problem? Do you have an
example model to go with it?

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
 
S

Sylvain Lafontaine

« In a well designed system, there will only be one unique index on each
table :~). »

Well, if you don't take performance into account for your well designed
system, then that might be true but otherwise, what do you make about N:N
relationships or for external constraints (like not having to player in a
sporting ligue with exactly the same name) ?

The case for external constraints might be more or less easily solved by
having a verification step before making any insert or update but both the
performance and the overall reliability will go down.

In the case of any N:N relationships, it's easy to see the performance
penalty that many requests will suffer if you don't add a second index to
covert the other direction for accessing the two ID fields.
 
G

Guest

Yes, I have a country table like that, where the relationship is defined
by an external standards body. (That is, the relationship between the
2chr and 3chr ISO country codes: we use localised country names)

But in general, my type tables are user defined, and I need to be able
to handle the fact that countries come and go, change name, double
up, split, share names, use multiple names etc.

What do you do when a country changes name? Just show all the
case records using the new name? I may not be able to do that for
4-10years: all the old contracts need to roll out of the system. In the
mean time, all of the new contracts need to use the new country
name.

Then when all the old contracts are gone, they still exist in the history
tables. My reports will need to group by country code, but display
country name.

It's easy to start out with multiple unique columns, but after 10 years,
most of the uniqueness seems to leak out.

(david)
 
G

Guest

In a not-badly-designed real system, there will probably be only a
As I pointed out, if you don't like using "aaaaa", those tables can be
handled by explicitly selecting the key fields.

Regarding your sporting example, what are you going to do the first
time you have two players with identical names? Systems don't usually
get to have that many clients that name collision is a problem if they
are so unfriendly that they refuse a true client name. If a system does
have that problem, it would usually be discarded or fixed. I've never
seen it addressed by telling the client what his/her name will be on any
system large enough so that performance was an issue.

(david)
 
D

David Portas

Yes, I have a country table like that, where the relationship is defined
by an external standards body. (That is, the relationship between the
2chr and 3chr ISO country codes: we use localised country names)

But in general, my type tables are user defined, and I need to be able
to handle the fact that countries come and go, change name, double
up, split, share names, use multiple names etc.

What do you do when a country changes name? Just show all the
case records using the new name? I may not be able to do that for
4-10years: all the old contracts need to roll out of the system. In the
mean time, all of the new contracts need to use the new country
name.

Then when all the old contracts are gone, they still exist in the history
tables. My reports will need to group by country code, but display
country name.

It's easy to start out with multiple unique columns, but after 10 years,
most of the uniqueness seems to leak out.

(david)


So what model do you propose? You still haven't made that clear. For
the purposes of this discussion I'll agree with your stated
requirements. Obviously we'll need a "history-preserving" version of
the table and there are various temporal models commonly used for that
purpose. But the textbook solutions tend to have something in common:
composite keys and/or surrogate keys.

For example:

CREATE TABLE Countries
(CountryID INT NOT NULL PRIMARY KEY,
CountryCode CHAR(3) NOT NULL,
CountryName VARCHAR(50) NOT NULL,
StartDate DATETIME NOT NULL,
EndDate DATETIME NULL,
UNIQUE (CountryCode, StartDate));

CREATE TABLE tbl
(..., CountryID INT NOT NULL
REFERENCES Countries (CountryID));

Do you have a "better" alternative in mind that eliminates some keys?
In what way does it improve upon having multiple keys?

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
 
G

Guest

You have created a country table with two unique keys, so that
you can identify which "country" to assign to a "contract" based
on "start date". (This is a general example of a concept).

I don't need to a assign a country to a contract based on start
date: I already have my countries assigned to contracts. All I
need is a table to hold the country information, based on a unique
key. However, I do not have information about the actual name
change date. Storing extra information like that often imposes a
signifigant hidden penalty, as it does in this case, and should be
avoided.

There are situations where it makes sense to have duplicate
unique keys.

One such situation arises from "not fully relational" data storage
mechanisms, which impose restrictions or performance penalties
on updating across multiple tables.

Another such situation is the example we are dealing with here.
I want the 3 character country code for reporting to the tax office,
and the 2 character country code for dealing with the banks. It
is an interpretation issue equivalent to converting between a binary
and an ASCI representation of a number, but my client interface
does not have the capacity to convert between 2 character and
3 character ISO country codes, so I use the database to provide
this capacity.

Generally, you can use a database system to map between
various representations. Often, you do this because the multiple
representations are required (and guaranteed) by external
constraints.

However, I have only a few tables where either reason is true.
I shudder at the thought of maintaining a system with many m:n
relationship tables. If you've got such a system, you're a better
(genus) man than I am. And even if you start out by assuming that
there will always be a 1:1 mapping between people and Social
Security Numbers, after a while you find out that you have to
relax your constraints.

We know that most of the designs that put multiple unique keys
into a table, like unique keys on person name, SSN or country
name, are bad designs, which will have to be modified it the
system stays in use.

We also know that most m:n designs are an incredible amount
of work to use effectively, and that a design with multiple m:n
tables is probably a bad design, which will be more work than
it is worth.

Which leads back to where I started:
In a well designed system, there will only be one unique index
on each table :~).
In a not-badly-designed real system, there will probably be only
a few tables that have multiple unique indexes.

(david)
 
G

Guest

Perhaps your comments primarily relate to Access? I

5th Normal form would be bad design in Access, or
Access would be a bad design for 5th Normal form.
That is a specific limitation.


Also, I've been able to modify my database schemas as my
understanding of the real world has evolved. I know that many
dba's don't have that luxury, and I think it colors their thinking.

(david)
 
D

David Portas

You have created a country table with two unique keys, so that
you can identify which "country" to assign to a "contract" based
on "start date". (This is a general example of a concept).

I don't need to a assign a country to a contract based on start
date: I already have my countries assigned to contracts. All I
need is a table to hold the country information, based on a unique
key.

Very well, but you still haven't told us your view of what the key(s)
should be. You haven't proposed a design that models the given
business rules with only one key per table. You haven't stated the
advantage of such a model over some alternative that uses more than
one key to achieve the same thing. Am I to understand that you don't
ever use surrogate keys?
There are situations where it makes sense to have duplicate
unique keys.

One such situation arises from "not fully relational" data storage
mechanisms, which impose restrictions or performance penalties
on updating across multiple tables.

We are talking about relational keys and their implementation in SQL.
I'm not sure what you mean by "not fully relational" in this context.
We know that most of the designs that put multiple unique keys
into a table, like unique keys on person name, SSN or country
name, are bad designs, which will have to be modified it the
system stays in use.

We also know that most m:n designs are an incredible amount
of work to use effectively, and that a design with multiple m:n
tables is probably a bad design, which will be more work than
it is worth.

You can't justify sweeping statements of that kind with one or two
alleged "bad" examples. I work with, and have designed many systems
where a one-to-one mapping is not just a simple and effective solution
but also a business requirement - possibly a legal or regulatory
requirement in some cases. Financial accounting and other
transactional systems are a source of many examples.
Which leads back to where I started:

Non sequitur.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
 
D

David Portas

5th Normal form would be bad design in Access, or
Access would be a bad design for 5th Normal form.
That is a specific limitation.

I have a very bad impression and a low opinion of what passes for
"good design" in the Access world, but if what you say is true then
that's a pretty damning verdict on that product.

If you intended your original remarks to apply only to Access
databases then I don't have any comment to make about it. Sorry for
wasting your time.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
 
D

David Portas

Probably a misstatement, though. A single-column lookup table where
the sole column is the primary key would be in 5NF and that's good
design even in Access land ;-)

Jamie.

--

Absolutely. In general, ANY schema in at least 2NF could also be in
5NF. So the assertion about 5NF being a "specific limitation" is
almost inconceivable and makes about as much sense as the original
claim. But that's about as far as I want to go with the topic.
Anything that involves MDB tends to leave me feeling depressed and
angry :-<.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
 
J

James A. Fortune

David said:
Absolutely. In general, ANY schema in at least 2NF could also be in
5NF. So the assertion about 5NF being a "specific limitation" is
almost inconceivable and makes about as much sense as the original
claim. But that's about as far as I want to go with the topic.
Anything that involves MDB tends to leave me feeling depressed and
angry :-<.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--

On the flip side, there's a reason why you and Jamie can do seriously
complicated SQL. It's because you are forced to create complicated SQL
to deal with 5NF. For a random database designed in Access, I believe
5NF would place a burdensome level of SQL complexity on the database
designer and maintainer. We can all look down our noses at less
normalized databases but most Access database designers have other
things to do besides writing SQL. At one company I worked for they had
a separate position for SQL creation so I could normalize as far as I
wanted. Perhaps Access programmers will gradually be comfortable enough
with SQL to take the extra step in normalization and overcome the
sometimes severe time penalty involved with 5NF.

James A. Fortune
(e-mail address removed)
 
G

Guest

Very well, but you still haven't told us your view of what the key(s)
should be.

How can I suggest what keys(s) should be used? None
of the suggested examples are things which should have
unique constraints.

You claim to have many examples which should have
multiple unique keys. Are these examples secrets?
I'm not sure what you mean by "not fully relational" in this

Some data storage mechanisms impose a distinction between
a "view" and a "table", where a view may not be updateable,
or only some fields in a view may be updateable. This sometimes
forces you to do updates on "tables" that are different to your
"views". In doing so, you may choose to use a different index.
You can't justify sweeping statements of that kind with one o
or two alleged "bad" examples.

You haven't justified your sweeping statements with any
"good" examples. I invite you to do so.

In doing so, keep in mind that the example must be a "good
design", not an imposed constraint:

(david)
 
G

Guest

Absolutely. In general, ANY schema in at least 2NF could also be in
5NF. So the assertion about 5NF being a "specific limitation" is
almost inconceivable and makes about as much sense as the original

5NF refers specifically to situations where a circular relationship
is normalised into 3 (or more) tables.

Yes, it is also true that hierarchical relationships are in "5th normal
form", but that is the degenerate case: it's like saying that you know
part of your 7 times table because you know that 1x7=7.

5th normal form is relevant here because it is one of the reasons
for using multiple unique keys: D:A, A:B, B:C, C:A. There is a
m:n relationship between D and (B or C) through A, and you
might choose to use different indexes m:A, A:n.


(david)

PS: I think most people would appreciate that 5NF is different
from 2NF even if they didn't know the details. Maybe you should
be getting more sleep?
 
D

David Portas

5NF refers specifically to situations where a circular relationship
is normalised into 3 (or more) tables.

That's inaccurate and misleading. Normal forms are not defined by the
process of decomposition. 5NF is a property that a single relvar
either has or has not. I could agree with your point that the
difference between 2NF and 5NF is "obvious" but that only strengthens
my argument against your assertion that "5th Normal form would be a
bad design". As Jamie pointed out, if that were so then even the most
trivially simple designs must be "bad".

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
 
D

David Portas

How can I suggest what keys(s) should be used? None
of the suggested examples are things which should have
unique constraints.

You are saying that you wouldn't use any keys at all? Please can you
at least be clear about that. Do you normally recommend designs
without keys because if so I don't think there's any point in further
discussion. If I have misunderstood then please tell us what the keys
should be in the example you talked about.
You claim to have many examples which should have
multiple unique keys. Are these examples secrets?

I was hoping you might respond with a logical model for the example
already given. Some other "every day" examples from my own personal
experience: Surrogate keys used to support peer-to-peer replication
models (see SQL Server Books Online if you want the details). Hashing
in document management systems (required verification that each
document has a unique number and a unique hash). Financial systems
with double-entry ledgers. Distribution in a Nominal Ledger (one to
one mapping between a transaction source and a ledger entry). Foreign
exchange transactions (originating and counterparty BICs and
references). Spatial models that use regions with unique geometry. Any
number of EAI or BI source-to-target mapping patterns (key in the ODS
versus the key in the target/DW).

Also you might lookup some examples that use multiple keys to solve
certain less obvious problems. In particular: Halpin's Information
Modelling book has some examples; Date in his Writings series, and
especially his papers "On the Primacy of Primary Keys" and "Composite
Keys"; Celko, Tropashko and others have hierarchy models that result
in multiple keys; Kimball (the Data Warehouse series) requires
multiple keys for his dimensional model solutions.
Some data storage mechanisms impose a distinction between
a "view" and a "table", where a view may not be updateable,
or only some fields in a view may be updateable. This sometimes
forces you to do updates on "tables" that are different to your
"views". In doing so, you may choose to use a different index.

I guess you are referring to Codd's "view update rule". View update is
a highly complex topic and I don't think we can do it justice here.
However, your reference to views is actually one of the strongest
arguments in support of multiple keys. In a relational system, a view
by definition must have at least one key. It is easy to see that a
view could have more than one key implied by its definition (a "one-to-
one" join that includes keys from both sides of the join). If unique
indexes are your DBMS's method of implementing keys then why would you
want to leave out one of those indexes when both keys are
automatically implied and enforced by the design?

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
 
A

Aaron Kempf

DAO hasn't been included with Windows, Office or MDAC for a decade

SQL Server works like a charm; if your upsizer isn't working then bust out
query analyzer and do it by hand
 
N

newscorrespondent

Beauty is in the eye of the beholder. I can get excellent performance and
very good response to business change using normal forms.

You fail to mention what you have replaced them with and where we can find
the rational to abandon them and adopt what ever you are using.

Tom Groszko
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top