Delete multiple entries


grantman

I have an address list that has a ton of duplicates. How do I, with
essentially one command, get rid of all the duplicates?

Thanks

Grantman
 

Helpful Harry

grantman said:
I have an address list that has a ton of duplicates. How do I, with
essentially one command, get rid of all the duplicates?

You can't.

The best you can do is sort the list by whichever column(s) you're
using to determine "duplicates" and then manually double-check each one
and delete the entries that REALLY are duplicates.

Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)
 

Christopher MacLeod

No way that I know of with a single command, but you can simplify the chore
of identifying the duplicates.

If you can use two or three fields to judge whether or not entries are the
same, then do something like the following:

Sort by the fields -- e.g., last name, first name, address (columns A, B, &
C, let's say) so that duplicates will appear in contiguous rows.

In a new column, starting in the second row, enter the formula

=AND(A2=A1,B2=B1,C2=C1)

Copy/paste the formula down through your entire list.

If the entries in the three fields of a given row do not match the entries
above them, the formula column will read FALSE; if they are duplicates, the
column will read TRUE.

Now copy the entire column containing the formulas, then paste special
(values only) into the same location. This is an essential step.

Next, resort your address list by the column containing the TRUE or FALSE
entries.

All the TRUE entries will be clustered. You can delete all the rows labeled
TRUE. They are duplicates.

Finally, delete the column that now contains just FALSE entries.
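The sort-and-flag steps above can be sketched outside Excel as well. Here is a rough Python equivalent; the sample names and addresses are made up purely for illustration:

```python
def flag_duplicates(records):
    """Sort records, then drop each row that matches the row above it."""
    rows = sorted(records)  # sort by last name, first name, address
    flags = [False]  # the first row can never be a duplicate
    for prev, curr in zip(rows, rows[1:]):
        # True when all fields match the previous row,
        # mirroring =AND(A2=A1,B2=B1,C2=C1)
        flags.append(curr == prev)
    return [row for row, dup in zip(rows, flags) if not dup]

people = [
    ("Public", "John", "1 Main St"),
    ("Doe", "Jane", "2 Elm St"),
    ("Public", "John", "1 Main St"),  # exact duplicate of the first row
]
print(flag_duplicates(people))
```

As in the worksheet version, only rows that are identical in every compared field are treated as duplicates; near-matches survive.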
 

JE McGimpsey

grantman said:
I have an address list that has a ton of duplicates. How do I, with
essentially one command, get rid of all the duplicates?

*IF* by duplicates you mean that the duplicates can be identified by a
single field (i.e., you're not trying to differentiate "John Q. Public"
from "John Public" and "John Queue Public"), you can extract the unique
entries to another worksheet using Advanced Filter (Unique entries
only). See the technique at Debra Dalgleish's Contextures site:

http://www.contextures.com/xladvfilter01.html#ExtractWs
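For the single-field case, the effect of that Advanced Filter "unique records only" extraction can be sketched in Python; the field names and sample data here are hypothetical:

```python
def unique_by_field(records, key):
    """Keep the first record seen for each value of the key field."""
    seen = set()
    out = []
    for rec in records:
        k = rec[key]
        if k not in seen:  # first occurrence wins; later copies are dropped
            seen.add(k)
            out.append(rec)
    return out

addresses = [
    {"name": "John Q. Public", "email": "jqp@example.com"},
    {"name": "Jane Doe", "email": "jd@example.com"},
    {"name": "John Q. Public", "email": "jqp@example.com"},  # duplicate key
]
print(unique_by_field(addresses, "email"))
```

Note this only works when one field reliably identifies a record -- exactly the *IF* above.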
 

JE McGimpsey

Helpful Harry said:
You can't.

You're making a blanket statement based on a large assumption about the
OP's data and intent. You may in fact be right, but if duplicate records
can be identified by a single field, it's quite easy to get rid of
duplicates by performing an advanced filter, extracting unique records
only...

That requires standard forms for that field, at least, but that's hardly
rare.
Helpful Harry said:
The best you can do is sort the list by whichever column(s) you're
using to determine "duplicates" and then manually double-check each one
and delete the entries that REALLY are duplicates.

Again, that is a rather blanket statement based on an assumption that
may or may not be valid - the OP didn't give enough info to say.

And there are certainly techniques in addition to sorting and doing a
manual search that would make eliminating duplicates easier.
 

CyberTaz

Another tool that often goes unnoticed is the Data Form... Especially useful
if the issue is not as 'cut & dried' as the other techniques require. Still
time consuming, but particularly handy.

With your active cell in the Data Range, go to Data>Form, then click the
Criteria button. Type the criteria into the appropriate fields (note that
wildcards & comparison operators _can_ be used), then click Find Next. Use
the Delete button to delete the found record or click Find Next again to
skip over it. Of course this would have to be done for each set of criteria
& only deletes one record at a time, not the found set.
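The criteria matching the Form performs (with ? and * wildcards) behaves much like filename globbing, which Python's standard fnmatch module can illustrate; the names and pattern below are made up:

```python
import fnmatch

# Hypothetical last-name field values and a wildcard criterion
records = ["Smith", "Smyth", "Smithers", "Jones"]
pattern = "Sm?th*"  # ? matches exactly one character, * matches any run

matches = [r for r in records if fnmatch.fnmatch(r, pattern)]
print(matches)  # Smith, Smyth, Smithers all match; Jones does not
```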

Two *cautions* - First, the Form will display the next record in the list
(regardless of whether it meets the criteria) after deleting the last record
that _does_ match the criteria, so don't just keep clicking Delete... Check
each record. Secondly, deleting records using the Form _cannot_ be
undone, so be careful re point #1; otherwise the only way to recover a
deleted record is to close without saving, which will recover all deleted
records.

If you're looking for a Delete Duplicates button, however, there isn't one
in XL or most other 'responsible' software.

HTH |:>)
 

Helpful Harry

JE said:
You're making a blanket statement based on a large assumption about the
OP's data and intent. You may in fact be right, but if duplicate records
can be identified by a single field, it's quite easy to get rid of
duplicates by performing an advanced filter, extracting unique records
only...

That requires standard forms for that field, at least, but that's hardly
rare.


Again, that is a rather blanket statement based on an assumption that
may or may not be valid - the OP didn't give enough info to say.

And there are certainly techniques in addition to sorting and doing a
manual search that would make eliminating duplicates easier.

There may or may not be other techniques, but the problem is that many
so-called "duplicates" can only be assessed manually to make sure they
are in fact duplicates - this is especially true in an address list
where there can easily be two people with the same name, they may even
live at the same address (father and son, mother and daughter), but
aren't in fact true duplicates. Trying to automate such a process is
asking for wanted records to be incorrectly deleted.


Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)
 

JE McGimpsey

Helpful Harry said:
There may or may not be other techniques, but the problem is that many
so-called "duplicates" can only be assessed manually to make sure they
are in fact duplicates - this is especially true in an address list
where there can easily be two people with the same name, they may even
live at the same address (father and son, mother and daughter), but
aren't in fact true duplicates. Trying to automate such a process is
asking for wanted records to be incorrectly deleted.

Again, you're making assumptions and blanket statements that may not be
warranted, and your solution may not be any better than an automated one.

The OP's data may in fact be of the nature you cite, and it's worth
alerting the OP of the potential problem. But the data may *not* have
that problem, too, in which case there's no reason for the OP not to
avail himself of an automated solution.

But your example belies your argument - father/son or mother/daughter
with the same name and address are no more tractable using a manual
assessment than using an automatic technique. Unless the assessor has
personal knowledge of the individual or individuals, it's just a swag.
 

Helpful Harry

JE said:
Again, you're making assumptions and blanket statements that may not be
warranted, and your solution may not be any better than an automated one.

The OP's data may in fact be of the nature you cite, and it's worth
alerting the OP of the potential problem. But the data may *not* have
that problem, too, in which case there's no reason for the OP not to
avail himself of an automated solution.

But your example belies your argument - father/son or mother/daughter
with the same name and address are no more tractable using a manual
assessment than using an automatic technique. Unless the assessor has
personal knowledge of the individual or individuals, it's just a swag.

See, now you're assuming that there is no other data (e.g.
birthdate) that makes it obvious whether or not it is a duplicate ... in
fact, if there is no other data, you're best to leave the record there
in case it isn't a true duplicate, rather than deleting data that may
be important.

We always have to make some assumptions because nobody ever explains
100% of the details. It's better to err on the side of caution than to
simply say "here's the automated process", which they then use and
possibly screw up their hard work of entering the data. The other way
is to ask lots of questions until you finally do know 100% of the
details.

Whether you like my answer or not, it was an appropriate answer.
Obviously any one answer may not be the only answer, but that's what
the newsgroup is for - to get multiple replies from people all over the
planet, some of whom may have better or different ideas. That's what
makes the newsgroups a great source of information ... at least the few
newsgroups that aren't full of morons, spam and scams.



Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)
 

CyberTaz

Hi Harry -

I'm not taking sides as both you & JE make quite valid points. But just for
clarification, as far as the Advanced Filter goes:

1- If _any_ field of data varies (such as birth date) the record will be
considered Unique & copied to the specified location, even if all other
field content is identical, &

2- The word "extract" is misleading as the matching records are _copied_ to
the specified range and the original list is left completely intact.

Regards |:>)
 

JE McGimpsey

Helpful Harry said:
See, now you're assuming that there is no other data (e.g.
birthdate) that makes it obvious whether or not it is a duplicate ...

Given the context of the discussion, in which I was pointing out that
you were assuming information not given, I did assume that you were
being explicit in your further example.

However, your added info again undermines your argument. If in fact
there's data which differentiates between records, such as a birth date,
then there's no need to manually assess the records to avoid deleting
non-duplicates, since Advanced Filter will consider them unique.

Obviously, that doesn't guard against keeping records that aren't
duplicates but refer to the same address. But since the OP said there
*were* duplicates, those weren't really under discussion.
Helpful Harry said:
in fact, if there is no other data, you're best to leave the record
there in case it isn't a true duplicate, rather than deleting data
that may be important.

May be true, may not be. There's no way to determine that without a more
explicit problem statement. However, it's hard to see how the duplicate
data could be truly important if there's no distinction between the
duplicate records.
Helpful Harry said:
Whether you like my answer or not, it was an appropriate answer.
Obviously any one answer may not be the only answer,

What I initially objected to was your statement:

Helpful Harry said:
You can't.

which, as I read it, makes a blanket statement about the "only answer".
That, as you correctly point out, is not usually the case.

Your points about the difficulty of scrubbing address lists are well
taken. But to revert solely to manually checking large lists with known
duplicates is almost never cost-effective, especially in a commercial
environment.
 

Helpful Harry

JE said:
Given the context of the discussion, in which I was pointing out that
you were assuming information not given, I did assume that you were
being explicit in your further example.

However, your added info again undermines your argument. If in fact
there's data which differentiates between records, such as a birth date,
then there's no need to manually assess the records to avoid deleting
non-duplicates, since Advanced Filter will consider them unique.

Obviously, that doesn't guard against keeping records that aren't
duplicates but refer to the same address. But since the OP said there
*were* duplicates, those weren't really under discussion.


May be true, may not be. There's no way to determine that without a more
explicit problem statement. However, it's hard to see how the duplicate
data could be truly important if there's no distinction between the
duplicate records.


What I initially objected to was your statement:


which, as I read it, makes a blanket statement about the "only answer".
That, as you correctly point out, is not usually the case.

Your points about the difficulty of scrubbing address lists are well
taken. But to revert solely to manually checking large lists with known
duplicates is almost never cost-effective, especially in a commercial
environment.

Whatever you say - I give up. I haven't got the time to waste
continuing on with this silliness. :o\




Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)
 

Paul Berkowitz

Helpful Harry said:
Whatever you say - I give up. I haven't got the time to waste
continuing on with this silliness. :o\

It would have been better to say nothing. This is not helpful - JE was. Not
silliness at all.

--
Paul Berkowitz
MVP MacOffice
Entourage FAQ Page: <http://www.entourage.mvps.org/faq/index.html>
AppleScripts for Entourage: <http://macscripter.net/scriptbuilders/>

Please "Reply To Newsgroup" to reply to this message. Emails will be
ignored.

PLEASE always state which version of Microsoft Office you are using -
**2004**, X or 2001. It's often impossible to answer your questions
otherwise.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top