Duplicate Records in a Query/Report

D

DaniY

I am a fairly new user to Access, and I've had no formal training on it, so
I'm learning as I go. I'll try to describe my problem in an
detailed-but-succinct manner, but bear with me:

I have a database that I created by importing Excel spreadsheets as linked
Access (2003) tables. The problem I'm having is that I have a query of three
linked tables that is showing duplicates of some of the records, which, of
course, then show up on my reports. Table one is a list of physician
numbers, names, specialties and markets along with production data. The
second is a list of physician contact information (number, name, address,
phone number). The third is a list of clinic managers. The report is
grouped by market, then by clinic address, then sorted by specialty. The
physicians who are showing up twice seem to be only those that have any
production data entered in Table 1. The first and second table have some of
the same information (physician number, name, specialty, and market).

I have already tried:
-Setting "Unique Values" to "Yes" in the Query
-Setting "Unique Records" to "Yes" in the Query
-Attempting to designate the physician number as the primary key in a linked
table
-Removing the duplicate name, specialty, and market from one of the tables.
I had to keep the physician number on both because this is how they are
linked in the Query

Ok, not so succinct, but any ideas? Please let me know if I wasn't specific
enough or didn't convey the correct information. Any suggestions you have to
remove these duplicates would be GREATLY appreciated. I might even bake you
cookies :)
 
G

Golfinray

There is an option under new query to build a Find Duplicates query. Once you
find your duplicates, you could use a delete query to delete them from your
tables, or you could (the hard way) just go in and manually delete each.
 
K

Ken Sheridan

I'm afraid my reply is going to be even less succinct than your post!

The problem probably stems from your first table as it sounds like this is a
bit of a dog's dinner. Lets look at mangers and clinics first, however.

A relational database models a part of the real world in terms of 'entity
types' and the 'relationship types' between them. Each entity type is
modelled by a table whose columns model the 'attributes' of the entity type.
So Physicians is an entity type with attributes Firstname, LastName etc. A
very important principle is that the attributes of an entity type must only
be stored in the table modelling that entity type. So you should not have
the physicians' names etc in the first table, only a PhysicianID column as a
foreign key referencing the PhysicianID column of the Physicians table.

Clinics is another attribute type with attributes, and therefore columns,
for the clinic's address etc. How you record the manager depends on a number
of factors. If managers are physicians, and each clinic has one manager,
then you simply need a foreign key column referencing the primary key of
Physicians.

If on the other hand a clinic can have several managers and a physician can
manage more than one clinic then the relationship type is many-to-many. To
model a many-to-many relationship type you'd need a third table,
ClinicManagers say, with foreign key columns ManagerID and ClinicID.

My guess would be that there is also a many-to-many relationship between
Clinics and Physicians in a more general clinical context rather than
managerial. So you'd have a table, ClinicPhysicians say, to model this, with
ClinicID and PhysicianID foreign key columns.

If the managers are not physicians then you can create a separate Managers
table and reference it in the same way as you would the Physicians table if
the managers were physicians. Or, if each clinic has one manager and each
manager is responsible for one clinic only you could simply include columns
for their names etc in the Clinics table.

A more complex scenario would be if managers might or might not be
physicians also. You would not want to repeat the names etc in both the
Physicians and Managers tables; that would be redundancy, which is 'a bad
thing'. The correct model would be to have a Personnel table and Physicians
and Managers tables related one-to-one to it. This model is in fact one of a
Type (Personnel) and two Sub-types (Physicians and Managers). The Personnel
table would contain columns common to all personnel (names, address etc), the
Physicians and Managers tables columns representing attributes specific to
each sub-type.

When it comes to your first table this does not at present seem to be
modelling any specific entity type, but is a mixture of different ones,
Specialities, Markets etc. So this needs breaking down into tables which
each represent one entity type. As the relationship between each of these
and physicians is, I'd imagine, many-to-many you'd need tables to model the
relationship types between them, e.g. a PhysicianSpecialities table with
columns PhysicianID and SpecialityID modelling the relationship type between
Physicians and Specialities. However a relationship type could possibly be a
3-way one (a ternary relationship type in the jargon) between Physicians,
Clinics and Production say (the production per physician per clinic) in which
case the table modelling the relationship type would have three foreign key
columns referencing the primary keys of each. Relationships might even be
4-way or more. On the basis of the information you've given its not possible
to be categorical about what the 'logical model' should be; it would need a
more detailed analysis of the underlying reality behind the database.

While your question related to a report, you'll have noted that I haven't
addressed that issue at all. This is deliberate; its essential that you get
the logical model right in the first place. I can't stress this too much.
Once you have a set of related tables which correctly represents the
underlying reality things like forms and reports fall naturally into place.

One thing I would suggest is that you draw out the model diagrammatically on
paper as an 'entity relationship diagram' with boxes representing each entity
type (table) and lines between them representing the relationships. You'll
find that, like the prospect of execution, this concentrates the mind
wonderfully.

You might think that getting the data imported from Excel into a set of
correctly 'normalized' tables is an unduly onerous task, but it can be
automated to a large degree by a series of 'append' and 'update' queries. I
have as it happens produced a little demo file which shows how his can be
done. I'm pasting all the text from the labels for each stage in the demo
below, but if you'd like a copy of the file mail me at:

kenwsheridan<at>yahoo<dot>co<dot>uk

Here's the bumph from the file:

"The following set of queries and explanatory notes are taken from a demo
file I produced some time ago to show how imported data can be decomposed
into normalized tables. The demo uses a simple international contacts lists
so there are countries, regions and cities involved in the addresses. Also
each contact can have more than one employer:

The query below inserts rows into the Countries table with unique values of
the Country column imported from Excel into the table MasterTable. This is
necessary before rows can be inserted into the Regions table as that table
references the Countries table in a many-to-one relationship.

INSERT INTO Countries (Country)
SELECT DISTINCT Country
FROM MasterTable;

Having inserted rows into the Countries table rows can now be inserted into
the Regions table with the query below. This joins the MasterTable to the
newly filled Countries table on the Country columns and inserts unique values
from the Region column of the MasterTable and the CountryID column of the
Countries table into the Regions table.

INSERT INTO Regions (Region, CountryID)
SELECT DISTINCT Region, CountryID
FROM MasterTable INNER JOIN Countries
ON MasterTable.Country=Countries.Country;

Having inserted rows into the Regions table rows can now be inserted into
the Cities table with the query below. This joins the MasterTable to the
newly filled Regions table on the Region columns. The Countries table is
joined to the MasterTable on the Country columns and to the Regions table on
the CountryID columns, thus taking account of any regions of the same name in
different countries. The query inserts unique values from the City column of
the MasterTable and the RegionID column of the Regions table into the Cities
table.

INSERT INTO Cities (City, RegionID)
SELECT DISTINCT MasterTable.City, Regions.RegionID
FROM Countries INNER JOIN (MasterTable INNER JOIN Regions
ON MasterTable.Region=Regions.Region)
ON (MasterTable.Country=Countries.Country)
AND (Countries.CountryID=Regions.CountryID);

The previous queries inserted rows into the Countries, Regions and Cities
tables. Following the insertion of data into the last of these, Cities, it is
now possible to insert rows into the Contacts table as this only needs to
reference the Cities table, the relevant Region and Country being referenced
via the relationships between these three tables. The query below does this
by joining the MasterTable to both the Cities table, on the City columns, and
to the Regions table, on the Region columns. The Cities table is also joined
to Regions on RegionID and the Countries table is joined to the MasterTable
on Country and the Regions table on Country ID. This is to take account of
the possibility of two cities having the same name, but being in different
regions, which themselves could theoretically have the same name but be in
different countries, so that the correct CityID value is inserted into
Contacts.

For simplicity it is assumed that contacts at the same address have unique
names. This might not always be the case, particularly with commercial
premises (the developer of this demo once worked with two Maggie Taylors in
the same building!). In such cases, however, there is likely to be some
distinguishing value such as Job Title or Department which could be used.

INSERT INTO Contacts ( FirstName, LastName, Address, CityID )
SELECT DISTINCT MasterTable.FirstName, MasterTable.LastName,
MasterTable.Address, Cities.CityID
FROM Countries INNER JOIN ((MasterTable INNER JOIN Cities ON
MasterTable.City = Cities.City)
INNER JOIN Regions ON (Regions.RegionID = Cities.RegionID)
AND (MasterTable.Region = Regions.Region))
ON (Countries.CountryID = Regions.CountryID)
AND (Countries.Country = MasterTable.Country);

The query below inserts rows into the Employers table with unique values of
the Employer column imported from Excel into the table MasterTable. This is
necessary before rows can be inserted into the ContactEmployers table as that
table references the Employers table.

INSERT INTO Employers (Employer)
SELECT DISTINCT Employer
FROM MasterTable;

Having inserted rows into the Contacts and Employers table it is now
possible to insert rows into the ContactEmployers table which models the
many-to-many relationship between Contacts and Employers. The query below
does this by joining the MasterTable to Contacts on the Address, LastName and
Firstname columns and to the Employers table on the Employer columns. The
ContactID values from Contacts and EmployerID values from Employers are
inserted into the two columns of ContactEmployers.

INSERT INTO ContactEmployers (ContactID, EmployerID)
SELECT Contacts.ContactID, Employers.EmployerID
FROM (Contacts INNER JOIN MasterTable
ON (Contacts.Address=MasterTable.Address)
AND (Contacts.LastName=MasterTable.LastName)
AND (Contacts.FirstName=MasterTable.FirstName))
INNER JOIN Employers ON MasterTable.Employer=Employers.Employer;

The above does assume that the imported data is completely consistent, which
might not be the case"

Ken Sheridan
Stafford, England
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top