Removing Duplicates

S

Steve

I have to import transactions from a Master Database. It is,
unfortunately, not very "Normal". The Field that can be used for the
unique identifier has multiple duplications across multiple rows. How
can I construct a query to pare this down to unique values?

Steve
reply here, not to email address
 
L

Larry Linson

I have to import transactions from
a Master Database. It is,
unfortunately, not very "Normal".
The Field that can be used for the
unique identifier has multiple
duplications across multiple rows.
How can I construct a query to pare
this down to unique values?

Do you mean that transactions are duplicated, or just that multiple
different transactions may share the "identifying field"? What you seem to
have described is that it clearly is NOT a "unique identifier".

If it is just a matter of returning an arbitrary one of several records with
the same value in the chosen identifier field, you can, in the Query
Builder, on the menu View | Totals, an in the "Total:" line, "Group" By the
identifying field and select "First" or "Last" for each of the other Fields.

But, it doesn't seem to me that an arbitrary selection is likely what you
really mean. Please clarify, and perhaps someone can offer a worthwhile
suggestion.

Larry Linson
Microsoft Access MVP
 
S

Steve

Larry,
Thanks for your reply. There were multiple records sharing the same
"PatientID" field with slightly different data in other fields to the
right. In other words, a very UN-normalized table: 10 records(say)
starting with the same "PatientID", "Last Name", "First Name", etc, and
each record differing in some "Transaction Number" field. I know, very
stupid, but I have to work with what they give me. This is a short term
project as I convert a medical database from the crap they have been
using into something more worthwhile.
I have seemed to work out some of my difficulties by setting up a query
with a "SELECT DISTINCT " as my first words in the SQL statement. This
pares down my list to what I need: a unique "PatientID" for each record.
I'm sorry if my earlier description didn't really give you a good idea
of my problem.

Steve
 
L

Larry Daugherty

Hi Steve,

I'm a different Larry; just so you don't blame Larry Linson if I give you
some bum
scoop :))

It looks like you've been given an Excel spreadsheet or some other
"flat-file" and are trying to create something relational in Access. Based
on those assumptions here are some suggestions ...

If the above assumptions are correct you might try importing the .XLS or
..CSV file using the wizard before trying the below ...

You're on the right path for the first part of the process. I'd cut your
DISTINCT ROW query down to just those fields that apply to this patient and
not to any Transaction. Turn your new query into a Make Table Query and
make tblPatient. I'm going to assume that PatientID is an administratively
assigned ID number and that it has meaning for humans. I'd run the make
table query to create tblPatient with fields PatientID, LastName,
FirstName, and any fields that are logically part of the patient's
information and have nothing to do with Transactions.

Run the query and look at your new table's data either by eyeball or with a
Duplicates query. If you have duplicates you'll have to determine if it's
because of data entry errors in the data given you or you may have included
fields that are part of the Transaction data. You'll have to figure that
one out and resolve the issues such that you have only one patient record
for each patient.

Once you have exactly one record for each patient you need to open
tblPatient in design mode and insert an Autonumber field as the first field
and name it PatID (since they've already used the best name for the primary
key for their PatientID). Click the key to make PatID the primary key.
When you save your new design Autonumber values will be generated in field
PatID.

tblPatient is now ready to serve as a parent table in a relationship.

If one of the fields they gave you was a Transaction Type you'll want to
create a many-to-many relationship between tblPatient and
tblTransactionType. In that case you'd run another DISTINCT make table
query on just the transaction type to create and populate your new table,
tblTransactionType. If they didn't give you a transaction type then you'll
just have a one-to-many relationship from tblPatient to
tblPatientTransaction.

In any case, I'd go with a form/subform design based on tables tblPatient
and tblPatientTransaction. Your clients should be much happier with the new
application you're building.

Any questions, post back.

HTH
 
Top