Comparing text in columns

G

GS

(e-mail address removed) presented the following explanation :
How can this be modified to deal with blank cells in the 'wordlist'
column?

Thanks!

Since it's not normal to have blanks in a 'list', why are blank cells
there?
 
B

bird_222

(e-mail address removed) presented the following explanation :






Since it's not normal to have blanks in a 'list', why are blank cells
there?

--
Garry

Free usenet access athttp://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc

It's not really a 'list' per se, it's just a column of text values and
there are a couple of blank spots in it.
 
B

bird_222

I don't know how to modify the formula to do that.  Perhaps some lurkercan figure it out, if it is possible.

I would either use a VBA User Defined Function, or, if that is not allowed by your company, I would ensure that there are no blank cells in WordList..  

The VBA UDF would be based on what I presented in my second response to you a few days ago.  If that is what you want, you can either modify it yourself or, if you will present a full list of specifications in a single message, (as opposed to one new specification per message, which is what you have been doing), I would be happy to make the appropriate modifications.

Not knowing what I was doing I made this modification based on
searching around the net. =IF(ISNA(MATCH(TRUE,ISNUMBER(SEARCH($C$2:$C
$26,A2)),0)),"",INDEX($E$2:$E$26,MATCH(TRUE,ISNUMBER(SEARCH($C$2:$C
$26,A2)-COUNTIF($C$2:$C$26,0))),0)

This just gives me the last text in the list instead of not counting
the blank cells. I may look into modifying that UDF posted earlier
however, I don't think it was written to return the text that was x
number of columns over from the match so I don't think I'll be
successful at modifying it. :)
 
R

Ron Rosenfeld

This just gives me the last text in the list instead of not counting
the blank cells. I may look into modifying that UDF posted earlier
however, I don't think it was written to return the text that was x
number of columns over from the match so I don't think I'll be
successful at modifying it. :)

If all you need is to return something that is a few columns over, you can use the Index(Match... method within the UDF, just as you would outside the UDF. You need to add a third argument for the MatchList, but that modification is easy:

=========================
Option Explicit
Function MatchWord(Phrase As String, WordList As Range, MatchList As Range) As String
Dim re As Object, mc As Object
Dim sPat As String
Dim c As Range
If WordList.Rows.Count > MatchList.Rows.Count Then
MsgBox ("WordList cannot be longer than Matchlist")
Exit Function
End If
Set re = CreateObject("vbscript.regexp")

sPat = "\b("
For Each c In WordList
If Len(c.Text) > 0 Then sPat = sPat & c.Text & "|"
Next c
sPat = Left(sPat, Len(sPat) - 1) & ")\b"

With re
.Global = True
.Pattern = sPat
.ignorecase = True
End With

If re.test(Phrase) Then
Set mc = re.Execute(Phrase)
With WorksheetFunction
MatchWord = .Index(MatchList, .Match(mc(0), WordList, 0))
End With
End If

End Function
===================================

And then your formula might be:

=MatchWord(A2,$B$1:$B$12,$D$1:$D$12)

Because of using the Index(Match... construct, WordList cannot be longer than MatchList, or else there will be an error, so we check for that and exit the function with a message if that situation should be present.
 
R

Ron Rosenfeld

If all you need is to return something that is a few columns over, you can use the Index(Match... method within the UDF, just as you would outside the UDF. You need to add a third argument for the MatchList, but that modification is easy:

=========================
Option Explicit
Function MatchWord(Phrase As String, WordList As Range, MatchList As Range) As String
Dim re As Object, mc As Object
Dim sPat As String
Dim c As Range
If WordList.Rows.Count > MatchList.Rows.Count Then
MsgBox ("WordList cannot be longer than Matchlist")
Exit Function
End If
Set re = CreateObject("vbscript.regexp")

sPat = "\b("
For Each c In WordList
If Len(c.Text) > 0 Then sPat = sPat & c.Text & "|"
Next c
sPat = Left(sPat, Len(sPat) - 1) & ")\b"

With re
.Global = True
.Pattern = sPat
.ignorecase = True
End With

If re.test(Phrase) Then
Set mc = re.Execute(Phrase)
With WorksheetFunction
MatchWord = .Index(MatchList, .Match(mc(0), WordList, 0))
End With
End If

End Function
===================================

And then your formula might be:

=MatchWord(A2,$B$1:$B$12,$D$1:$D$12)

Because of using the Index(Match... construct, WordList cannot be longer than MatchList, or else there will be an error, so we check for that and exit the function with a message if that situation should be present.

An alternative, if you didn't want to worry about MatchList vs WordList size, would be to use Match to find the location of the word in column B; and then use Offset to return the result a few columns over; but that method is not as flexible.
 
B

bird_222

An alternative, if you didn't want to worry about MatchList vs WordList size, would be to use Match to find the location of the word in column B; and then use Offset to return the result a few columns over; but that method is not as flexible.

Ok I am just now getting back to this and I am having a problem on
actual data.

This is the formula I am using. Column C is the 'WordList' Column E
is the 'MatchList' and cell A2 (actually any part of A2) is what I
want to find in column C.
=IF(ISNA(MATCH(TRUE,ISNUMBER(SEARCH($C$2:$C$6,A2)),0)),"",INDEX($E$2:$E
$6,MATCH(TRUE,ISNUMBER(SEARCH($C$2:$C$6,A2)),0)))

This is what is in 'A2' "700 W FOREST AVE" (no quotes). This is in
Column C "700 West Forest Suite 200". I would think that it would
match either on '700' or 'Forest', but it doesn't and just returns a
blank because it gives an 'NA#' error.

Please advise
 
R

Ron Rosenfeld

Ok I am just now getting back to this and I am having a problem on
actual data.

This is the formula I am using. Column C is the 'WordList' Column E
is the 'MatchList' and cell A2 (actually any part of A2) is what I
want to find in column C.
=IF(ISNA(MATCH(TRUE,ISNUMBER(SEARCH($C$2:$C$6,A2)),0)),"",INDEX($E$2:$E
$6,MATCH(TRUE,ISNUMBER(SEARCH($C$2:$C$6,A2)),0)))

This is what is in 'A2' "700 W FOREST AVE" (no quotes). This is in
Column C "700 West Forest Suite 200". I would think that it would
match either on '700' or 'Forest', but it doesn't and just returns a
blank because it gives an 'NA#' error.

Please advise

Most likely, you did not enter this as an array formula.

Your formula must be **array-entered**:

To **array-enter** a formula, after entering
the formula into the cell or formula bar, hold down
<ctrl><shift> while hitting <enter>. If you did this
correctly, Excel will place braces {...} around the formula.
 
B

bird_222

Most likely, you did not enter this as an array formula.

Your formula must be **array-entered**:

To **array-enter** a formula, after entering
the formula into the cell or formula bar, hold down
<ctrl><shift> while hitting <enter>.  If you did this
correctly, Excel will place braces {...} around the formula.

No it is entered as an array.
 
R

Ron Rosenfeld

No it is entered as an array.

Then there is something that I have misunderstood about your data. If you could post a sample of your workbook on one of the file sharing sites, and post a link, I can download it and look at it to see where the problem lies.
 
R

Ron Rosenfeld

Ok here is the link for the file.
http://uploading.com/files/5274ec51/Formula+test.xls/

See the 'Info' tab for what I am trying to accomplish. Thanks

I've had a chance to look at your spreadsheet.

Your data isn't matching what you think it is, therefore you are not going to return anything.

You want to use data in column I to match items in C2:C26 and return what is in the same row in Col E of the two spreadsheets.

But consider:

I161: 42 AUTUMNWOOD CV

You expect to see, as noted in N161: MCMP

But in C2:C26, there is no Autumnwood.

With regard to

700 W FOREST AVE

There is no way it can match with

700 West Forest Suite 200


One is not included in the other.

Also:

I99: 1700 WOODLAWN AVE
N99: MedSouth

But the only thing similar in C2:C26 is:

17X0 Woodlawn

Again, this will not match with your current rules.

I note that many of your desired MCMP matches seem to be based on Column F data and not Column I data. Obviously, none of them will work. That is also true for others of your desired matches: sometimes referring to COL I but other times to COL F

---------------------------------------------

The formula works for the rule that you stated which had to do with matching text in one column with the text in another column and returning information a few columns over. But your rule does not describe what you want to do.

You are going to need much more specific rules that what you have supplied in order to develop a matching system that will be non-ambiguous.

As of now, sometimes you want to match the DestinationAddress1; sometimes the ShipperAddress1, but how do you decide?

Also, it seems as if you want to use some kind of fuzzy matching, and that is beyond what I can do for you here. You are going to have to "clean up" the data, or come up with some very specific rules for matching fragments.
 
B

bird_222

I've had a chance to look at your spreadsheet.

Your data isn't matching what you think it is, therefore you are not going to return anything.

You want to use data in column I to match items in C2:C26 and return whatis in the same row in Col E of the two spreadsheets.

But consider:

I161:   42 AUTUMNWOOD CV

You expect to see, as noted in N161:    MCMP

But in C2:C26, there is no Autumnwood.

With regard to

700 W FOREST AVE

There is no way it can match with

700 West Forest Suite 200

One is not included in the other.

Also:

I99:    1700 WOODLAWN AVE
N99:    MedSouth

But the only thing similar in C2:C26 is:

17X0 Woodlawn

Again, this will not match with your current rules.

I note that many of your desired MCMP matches seem to be based on Column F data and not Column I data.  Obviously, none of them will work.  Thatis also true for others of your desired matches: sometimes referring to COL I but other times to COL F

---------------------------------------------

The formula works for the rule that you stated which had to do with matching text in one column with the text in another column and returning information a few columns over.  But your rule does not describe what you want to do.

You are going to need much more specific rules that what you have supplied in order to develop a matching system that will be non-ambiguous.

As of now, sometimes you want to match the DestinationAddress1; sometimesthe ShipperAddress1, but how do you decide?

Also, it seems as if you want to use some kind of fuzzy matching, and that is beyond what I can do for you here.  You are going to have to "clean up" the data, or come up with some very specific rules for matching fragments.

Thanks for looking at this. Yes, I will need to look at multiple
columns to match but I was figuring if I could get the basic 'match'
formula I could adapt it to handle the additional columns. Can you
explain for example why the the '700 W Forest Av' doesn't match? I.e.
why doesn't it match on the word 'Forest'?

Thanks!
 
R

Ron Rosenfeld

Thanks for looking at this. Yes, I will need to look at multiple
columns to match but I was figuring if I could get the basic 'match'
formula I could adapt it to handle the additional columns. Can you
explain for example why the the '700 W Forest Av' doesn't match? I.e.
why doesn't it match on the word 'Forest'?


The match and various comparison functions look for one string within another. They don't, for example, look to see that there are matches of the:
First Word: 700
First letter of second word: W
Third word: Forest
no match of the fourth word: AVE

and conclude, "Well, that's good enough; let's call it a match!"

It doesn't match because you have not stated any rules for matching fragments. So looking for "700 W FOREST AVE" within the string "700 West Forest Suite 200" will fail after the "W"

A human can tell that this is likely at the same street address, but there are certainly any number of towns where there might be, for example, a Forest Ave; Forest St.; Forest Blvd; etc.
 
B

bird_222

The match and various comparison functions look for one string within another.  They don't, for example, look to see that there are matches of the:
        First Word:                     700
        First letter of second word:    W
        Third word:                     Forest
        no match of the fourth word:    AVE

and conclude, "Well, that's good enough; let's call it a match!"

It doesn't match because you have not stated any rules for matching fragments.  So looking for "700 W FOREST AVE" within the string "700 West Forest Suite 200" will fail after the "W"

A human can tell that this is likely at the same street address, but there are certainly any number of towns where there might be, for example, a Forest Ave; Forest St.; Forest Blvd; etc.

So I guess there is no easy way of doing this then. Thanks for your
input.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top