fuzzy matching help please in an Access 2003 query

V

VMT-London

I have two data files each containing phone numbers but in one file some of
the numbers have text such as "home" as well as the numbers. Also one file
has landline and mobile numbers separated by a /. I have set both of the
fields I want to match as text rather than number. I think there is a way
to match on part of a field [using "Like"] but I can't make it work using the
expression builder- can anyone help with this please?
 
J

John Spencer

Does one file (A) have strictly phone numbers?
AND the other file (B) have numbers that are just numbers, or have "home", or
have two (or more) numbers separated by a slash?

How about formatting? Do some numbers have spaces or dashes separating the
number into its components or are all phone numbers just the numeric characters?

I would probably look at stripping out all non-numeric characters and then
attempting to match on whether the B numbers were like the A numbers.

Also HOW MANY records are involved? The simplest matching would be something
like the following. You can only build this query in SQL design view.

Assumptions:
A.PhoneNumber field consists of ONLY the phone Number
B.PhoneNumber field has the phone number portion in the same format as the
A.Phone number field.

SELECT B.PhoneNumber, A.PhoneNumber
FROM B INNER JOIN A
ON B.PhoneNumber Like "*" & A.PhoneNumber & "*"

You can build a slower query by
== Add both tables to the query
== NO JOIN between the table
== Add B.PhoneNumber to the field list
== Set the criteria under B.PhoneNumber to
Like "*" & [A].[PhoneNumber] & "*"

You could expand the matching by adding or Criteria
== Add the A.PhoneNumber
== On the next criteria line down an OR Line set the criteria to
Like "*" & .[PhoneNumber] & "*"

If the assumptions are incorrect, post back. I have a VBA function that will
strip out all the non number characters from the phone numbers and increase
the likelihood of matching 123 224 1212 or 123-224-1212 in B to 1232241212 or
(123) 224.1212 in A. The function would turn all those values into 1232241212
and therefore the items would match. The problem is that this will be slow
for any significant amount of records.

John Spencer
Access MVP 2002-2005, 2007-2010
The Hilltop Institute
University of Maryland Baltimore County
 
M

Marshall Barton

VMT-London said:
I have two data files each containing phone numbers but in one file some of
the numbers have text such as "home" as well as the numbers. Also one file
has landline and mobile numbers separated by a /. I have set both of the
fields I want to match as text rather than number. I think there is a way
to match on part of a field [using "Like"] but I can't make it work using the
expression builder- can anyone help with this please?


Are the "files" linked as tables? If so you can use a query
Where condition such as:

[messy number field] Like "*" & [nice number field] & "*"
 
J

Jerry Whittle

Phone numbers should be stored as text, so that's a good thing.

Another problem that I see is some people format phone numbers differently.
(555) 555-5555 vs 555.555.5555 vs 555-555-5555.

Then there are phone number with or without an area code.

Let's not even think about phone numbers from other countries!

In one of the files, are the numbers very standardized so that all the phone
numbers look the same? If so there may be hope.

Actually before we go down that route, how many records are we talking about
and is this a one-time thing? If there aren't too many records, and the table
with the problems doesn't get updated from another system frequently, you
might just want to do it manually by adding new Home and Cell fields.

There's another problem if you are joining the two tables based on phone
numbers: More than one person could share a phone number - especially home
phones. That could cause duplicate records to show.

Back to the matther at hand. In the query below, tbl2 is the table with the
problem telephone numbers. tbl1 has standardized telephone number formatting
like (555) 555-5555.

I've extracted the middle 555 and end 5555 and used them in Like statements.
If the phone numbers in tbl1 are different, such as 555-555-5555, you will
need to adjust the Mid statements.

SELECT tbl1.*, tbl2.*
FROM tbl1, tbl2
WHERE tbl2.PhoneNumbers
Like "*" & Mid([tbl1]![PhoneNumbers],5,3) & "*"
AND tbl2.PhoneNumbers
Like "*" & Mid([tbl1]![PhoneNumbers],7,4) & "*" ;

With such a fuzzy match, there are plenty of places for bad data to creep
in. For example more than one person using the same home phone. Or something
unlikely such as two numbers the same except for the area code.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top