Normalizing????????

M

Mani

Hi can you help me completely normalize a databes. I can send you the files
and stuff.

I am getting really confused on how to normalize 4 address fields.
Should i put the into another table and does it match 1NF if it is the other
table but still 4 fields. How can i orginize it into 1 field???

I am really interested in emailing the file.

Thanks
 
J

John Vinson

Hi can you help me completely normalize a databes. I can send you the files
and stuff.

That would be OK if we had a consulting arrangement set up, for pay.
You might prefer to do it here for free.
I am getting really confused on how to normalize 4 address fields.

Almost certainly into one Addresses table with four records.
Should i put the into another table and does it match 1NF if it is the other
table but still 4 fields. How can i orginize it into 1 field???

Without knowing what "four fields" you have, or what you're proposing,
it's hard to say. An Address typically will CONSIST of multiple
fields: [Address1], [Address2], [City], [State], [Country],
[Postcode], [AddressType] for example.
I am really interested in emailing the file.

Are you really interested in paying for consulting services? If so
post a spam-trapped email address (don't post your real address,
conceal it such as myname <at> domain <dot> com to keep spammers from
harvesting it).

If not please post the fieldnames and examples of the data that they
contain, and explain what the distinction between these four addresses
might be.


John W. Vinson[MVP]
 
V

Vincent Johns

Usually, the purpose in normalizing a database is to get rid of any
fields whose values can be calculated. But "completely" normalizing it
may not be the best possible practice, as in some cases the time
required to calculate a value may be excessive. You probably will want
to go most or all of the way toward normalizing your database, but it
should be for the purpose of making the database easier to maintain, not
as a Holy Grail that you must pursue for its own sake. (Unless this is
a class project, in which case you will greatly benefit from doing it
for yourself, not from asking someone else to do it for you.)

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.

John said:
Hi can you help me completely normalize a databes. I can send you the files
and stuff.

That would be OK if we had a consulting arrangement set up, for pay.
You might prefer to do it here for free.
I am getting really confused on how to normalize 4 address fields.

Almost certainly into one Addresses table with four records.
Should i put the into another table and does it match 1NF if it is the other
table but still 4 fields. How can i orginize it into 1 field???

Without knowing what "four fields" you have, or what you're proposing,
it's hard to say. An Address typically will CONSIST of multiple
fields: [Address1], [Address2], [City], [State], [Country],
[Postcode], [AddressType] for example.
I am really interested in emailing the file.

Are you really interested in paying for consulting services? If so
post a spam-trapped email address (don't post your real address,
conceal it such as myname <at> domain <dot> com to keep spammers from
harvesting it).

If not please post the fieldnames and examples of the data that they
contain, and explain what the distinction between these four addresses
might be.

John W. Vinson[MVP]
 
K

Ken Sheridan

Normalization is about the elimination of redundancy. Loosely speaking this
means each 'fact' should be stated once only in the database; in the jargon
of the relational model this is more formally expressed in terms of
functional dependencies. Lets take a simple example and see if it helps
point you in the right direction. Take the Address:

42 Any Street
San Francisco
California
USA

We are told here that San Francisco is in California, which is in the USA.
If this was all in one Addresses table we could have different rows putting
San Franciso in Arizona and California in the UK (we do actually have several
Californias here in fact!). These would be 'update anomalies' arising form
the redundancy. To normalize this we 'decompose' the table into tables
Addresses, Cities, States and Countries. Addresses keeps the Address column
(42 Any Street) and has a City ID column as a foreign key; Cities has columns
CityID (the primary key) and City, and StateID; States has StateID (the
primary key) State and CountryID; and Countries has columns CountryID (the
primary key) and Country.

So by simply entering the CityID for San Francisco in the Addresses table
for every address in San Francisco we need no further columns in that table
to know its in California in the USA, and there can be no update anomalies as
we have eliminated the redundancy.

Even by decomposing the table in this way it is not completely normalized as
Any Street could be mistakenly given the CityID for New York, or even London
UK, though its in San Francisco. Here in the UK commercial addresses
databases maintained by bodies such as utilities companies usually would have
just the PostCode and HouseNumber in the Addresses table as the PostCode maps
to a very specific area (mine is just one side of my street). A PostCodes
table (commercially available here) would then reference the Street , City
etc. via the relationships between the tables. Most small scale hand-rolled
databases would not go as far as that, however, and would generally accept
the redundancy of having the street, post code and CityID in the addresses
table.
 
M

Mani

Example of Data:


MembID Title Firstname Surname Add1 Add2 Add3
Add4

1 Professor SUZANNE VANDERSTRAETEN 14 Abba St Dyce Aberdeen


HomeTel Deptno Dept Uni
D house

0141604389721 28 Accountancy Dublin, Trinity College 24



DeptAdd1 DeptAdd2 D-pc Work-Tel Work-Fax
Use-Add

Chapman Terrace Dublin D63 47B 0136991278474 0136991278480 Dept



e-mail T-Spec 1 T-Spec 2 T-Spec3 R-Spec1
RSpec2

[email protected] 3 15 0 3
15




That is one record from the first table. There are 2 tables and they have 4
common fields MembID
First name
Lastname
Uni

The second table has less records 100 records.

The first table has 200 records

Duplication summary


Duplicated choices duplicates
Title 6 200
lastname 3 7
Add1 58 146
Add2 20 41
Add3 2 4
Add4 49 71


this summary was just for the home section, the department section has not
been done yet



I have some specific questions as well

What is the best way of normalizing the table?

should i move the highly duplicated field TITLE into a table on its own?

Should i move the address fields in to seperate tables for each field?
What about the fields about specs?

The name field is all in capital letters, how can i make this more pleasing
to the eye by making it capital for the first letter and small letters for
the rest?

Note some of the names in the first name field are actual 2 names like JEAN
- PAUL and like JONATHAN MICHAEL?

How can i keep the format of small letters when editing the data? for phone
numbers, names and email address

Maybe you want to look at the database, let me know and we can arrange file
transfer.
 
K

Ken Sheridan

Let's start with a few basic principles. I said in my last post that
normalization is defined by functional dependencies. A column (I'll use
this more correct term rather than field from now on) is functionally
dependent on another column if the second column's value is determined by the
first. This means that for every time that a particular value of column 1
appears, which might be in the result set of a query, on a form etc., the
value of column 2 is the same. Non-key columns must be functionally
dependent on the key, e.g. in your example Title, FirstName, Surname Add1,
Add2 etc are all functionally dependent on the key MembID.

If we look at the other columns in this table then, DeptNo for example, is
also functionally dependent on MembID, assuming that each Member is in only
one Department. The Dept column, however, is functionally dependent on
DeptNo, and as this is functionally dependent on the key MembID it is said to
be Transitively Functionally Dependent. A transitive functional dependency
means that the table is not properly normalized, and there is the possibility
of update anomalies; it would be perfectly feasible for another row
incorrectly to have a different Department value for the same DeptNo.
Consequently the table should contain only the DeptNo column as a foreign
key, with all the other columns representing attributes of the Department
moved to another table; this is called decomposing the table.

Once you have moved the relevant columns to a Departments table that itself
will need decomposing, so this process should be undertaken on paper first
before actually splitting up the tables in the database. There will be many
Universities with Accountancy departments no doubt, so the Departments table
should include a Uni column as a foreign key (or more likely a numeric UniID
column) referencing the primary key of a Universities table which will in
turn have columns representing the attributes of the universities. Other
columns in the Departments table would represent attributes of the
departments, which as well as DeptNo (the key) and Department will include
address columns as university department in my experience can have far flung
locations rather than using a single address of the university per se.

I hope by now you can see how this process of decomposition to achieve
normalization works; its essentially one of applied common sense, and boils
down to identifying the non-transitive functional dependencies, i.e. what
columns are solely determined by the table's key. A useful way of
remembering the rules of normalization to Third Normal Form was once given by
John Vinson; 'the key, the whole key and nothing but the key, so help me
Codd'. Edward Codd was the de facto inventor of the database relational
model in his papers of around 1970 while at IBM. What this means is that
every non key column should be determined by the primary key alone and the
whole of the primary key where this comprises more than one column, so that
for any value of the primary key the values of the non-key columns will
always be known.

I think you might be getting confused as regards duplicated values.
Duplicated values in a column do not necessarily constitute redundancy.
Duplication would only imply redundancy where the values are not functionally
dependent on the key. So having duplicate values of DeptNo in the Members
table does not imply redundancy, it merely means that more than one member
comes from the same department. Duplication of values in the Uni column in
that table would, however, as that column is not solely determined by the key
MembNo, but also by the non-key column DeptNo, which is why all the
University data needs to be moved out to a different table and referenced by
a UniID foreign key in the Departments table.

Also don't assume that the duplication of a value means that value refers to
the same thing. The Accountancy department at Trinity College, Dublin is not
the same thing as the Accountancy department at University College, Dublin
(if it has one), so the Departments table would have more than one row with
'Accountancy' as the Department value, but with different DeptNo key values
and different UniID foreign key values. Whichever Accountancy department the
member was in would be represented by its DeptNo in the relevant row of the
Members table.

Now lets deal with your specific questions which are not covered by the
general principles outlined above:

You can do this with the StrConv function in an UPDATE query, e.g.

UPDATE MyTable
SET MyField = StrConv(MyField,3);

The value of 3 for the second argument is the value of the vbProperCase
constant, but you can't use constants in a query, only in VBA, so you must
use its literal numeric value here.

My view would be that hyphenated names should be in a single column, but
separate first and second names are better stored in separate columns in the
table. Even with names in one column the StrConv would work as it operates
on each name separately, but with some names you would have to do some manual
editing, e.g. Victoria de Los Angeles and names such as MacDonald or O'Reilly.

Firstly data should only be entered/edited via forms, not in raw datasheet
view. In a form you can use the KeyPress event procedure of a control to
convert the characters to lower case as typed in:

KeyAscii = Asc(LCase(Chr(KeyAscii)))

I'm afraid, like John, I could only look at your file on a commercial
consulting basis.
 
Top