Use 'like' or regex to match number-letter pattern substrings

K

ker_01

There has to be an easier way that the one I'm currently working on...

I'm writing a UDF to help a colleague parse some key data out of some
inconsistently formatted raw data strings. One seach will be for machine part
numbers in the raw data string. All part numbers are in the format
Alpha-Alpha-Number-Number-Number, as shown in the following string:
"I pulled the engine apart and found that part MX452 was ok, but MV119 was
worn and needed to be replaced"

I've made the UDF with an input string parameter that must only contain the
characters "#" and "A", for example, "##AAA" for the above part numbers.
After performing it's VBA magic, the UDF should then return "MX452, MV119"

The actual pattern may differ for different searches; for example, searches
might be for an employee ID (#AAA), a tool ID (#######), a test number
(AAA#), or other (sub)strings. There may be a few where there will be a
non-alphanumeric character in a specific position (###-#A#).

One option is Regex, which I've used for fixed search strings, but my brain
hurts thinking about how to code Regex to dynamically deal with a user-input
string.

The other option (and what I've started) is to look at every substring in
the source string, and then compare each substring to the pattern to see if
it matches. I don't think this is very efficient, and given that there will
be thousands of records, it will also probably be painfully slow.

Example, based on the above string: Comparison string is 5 characters, so
check each 5-character substring ("I pul", " pull", "pulle", "ulled", lled ",
etc...)

I've looked at the "like" statement, and that seems to be a good lead. I can
use the user-input string to create something like:
[!0-9][!0-9][!A-Z][!A-Z][!A-Z]

And maybe use that to match each substring ("I pul", " pull", "pulle",
"ulled", lled ", etc...) but that still seems inefficient. However, that
seems to be the only way to know where the match is (to actually return it),
and to see if there is more than one match in the raw data string.

Are there any easier approaches to finding (and pulling) every substring
that matches a user-designated pattern?

Thanks,
Keith
 
R

Ron Rosenfeld

There has to be an easier way that the one I'm currently working on...

I'm writing a UDF to help a colleague parse some key data out of some
inconsistently formatted raw data strings. One seach will be for machine part
numbers in the raw data string. All part numbers are in the format
Alpha-Alpha-Number-Number-Number, as shown in the following string:
"I pulled the engine apart and found that part MX452 was ok, but MV119 was
worn and needed to be replaced"

I've made the UDF with an input string parameter that must only contain the
characters "#" and "A", for example, "##AAA" for the above part numbers.
After performing it's VBA magic, the UDF should then return "MX452, MV119"

The actual pattern may differ for different searches; for example, searches
might be for an employee ID (#AAA), a tool ID (#######), a test number
(AAA#), or other (sub)strings. There may be a few where there will be a
non-alphanumeric character in a specific position (###-#A#).

One option is Regex, which I've used for fixed search strings, but my brain
hurts thinking about how to code Regex to dynamically deal with a user-input
string.

The other option (and what I've started) is to look at every substring in
the source string, and then compare each substring to the pattern to see if
it matches. I don't think this is very efficient, and given that there will
be thousands of records, it will also probably be painfully slow.

Example, based on the above string: Comparison string is 5 characters, so
check each 5-character substring ("I pul", " pull", "pulle", "ulled", lled ",
etc...)

I've looked at the "like" statement, and that seems to be a good lead. I can
use the user-input string to create something like:
[!0-9][!0-9][!A-Z][!A-Z][!A-Z]

And maybe use that to match each substring ("I pul", " pull", "pulle",
"ulled", lled ", etc...) but that still seems inefficient. However, that
seems to be the only way to know where the match is (to actually return it),
and to see if there is more than one match in the raw data string.

Are there any easier approaches to finding (and pulling) every substring
that matches a user-designated pattern?

Thanks,
Keith

Just use a regex to match your pattern. Use an argument in your UDF
to determine which regex to use.

For example:

=================================
Option Explicit
Function ExtrStr(s As String, ssType As Long) As Variant
Dim re As Object, mc As Object, m As Object
Dim sPat As String
Dim sTemp() As String
Dim i As Long

Set re = CreateObject("vbscript.regexp")
Select Case ssType
Case Is = 1
sPat = "\b[A-Z]{2}\d{3}\b"
Case Is = 2
sPat = "\b[A-Z]\d{3}]\bb"
Case Is = 3
sPat = "\b\d{3}-[A-Z]\d[A-Z]\b"
Case Else
ExtrStr = CVErr(xlErrNum)
Exit Function
End Select

re.Pattern = sPat
re.Global = True
re.ignorecase = True

If re.test(s) = False Then
ExtrStr = s
Exit Function
End If

Set mc = re.Execute(s)
ReDim sTemp(0 To mc.Count - 1)
For i = 0 To mc.Count - 1
sTemp(i) = mc(i)
Next i

ExtrStr = Join(sTemp, ", ")

End Function
==========================================
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top