Cleaning Duplicate Records

S

silva

I'm trying to clean up a database with duplicates in it. As there are
thousands of records, it would be a horrendous task to sift through them all
to find the duplicates. Considering that among the duplicates, the street
address and/or the e-mail address should be the same (as the last name may be
missing, there could be fifty Jakes or Kristens or what-have-you), what would
be the best tactic to go about finding these duplicates so the associated
records could be redirected to the proper person's record? (I know how to
"adjust" the records, just need a better way to find the duplicates.)
 
A

Arvin Meyer [MVP]

I'd start with the Find Duplicates query wizard to get an idea of the
possibilities. Cleaning out duplicates relies mostly on being able to define
what makes a duplicate.
 
P

Paul Shapiro

If you have access to SQL Server Integration Services there is a fuzzy
matching tool that can do a pretty good job of this by ignoring missing data
and estimating the likelihood of a duplicate. By sorting on that probability
you can identify the most likely dupes. You can use an Access db as the SSIS
data source.

Matching street address usually doesn't work well because a missing comma or
St. vs. Street would cause two almost-identical fields to not match, as well
as the usual spelling/typing mistakes. The fuzzy matching would recognize
that they mostly match and assign an appropriate likelihood.

If you have thousands or records, and not 10's of thousands, you could start
by looking for duplicated email addresses. Then you could look for
duplicated street addresses, accepting that you'll only find exact
duplicates. What kind of data is missing last names? Once you process the
duplicated emails, you could try matching last name and first initial, to
find duplicates among the rows that do have names entered.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top