Impossible with regular expressions?

Robert Crandal

Hi everyone. So I think I finally have a better understanding of
how to use regular expressions in VBA, but I have a more
difficult question.

I am working with strings that basically contain random sets
of character strings (or "tokens") that are separated by any
number of whitespace characters. Here are some example
strings that I might encounter:

* "The age of the dog is 100 years!!!"
* "aa bb cc dd ee 01 10 111 ooo"
* " Here is a string that contains a total of 13 tokens... got it?"

I am trying to develop a regex pattern string that will let me enumerate
or collect each of the tokens in the string. I would like to use the
Submatches() function to retrieve any "token" in the string by index.

So, using the third string above as an example, I would want
the Submatches() function to return the following:

.Submatches(0) = "Here"
.Submatches(1) = "is"
.Submatches(2) = "a"
.Submatches(3) = "string"
.Submatches(4) = "that"
etc... etc....
.Submatches(12) = "it?"

I hope my question makes sense. I just need help with the
pattern string. So far, the only thing I could think of is the
following:

"(\s+(\S+)\s*)+" ?????

Got any ideas?

Robert Crandal
 
Ron Rosenfeld

I don't understand why you want to use submatches.

The simplest approach is usually the most efficient, and I would think the simplest regex for collecting tokens separated by whitespace is "\S+". Your collection would then be the matches rather than the submatches.

e.g.:
===================
Option Explicit
Function foo(s As String) As Object
    Dim re As Object, mc As Object
    Const sPat As String = "\S+"    ' one token = a run of non-whitespace
    Set re = CreateObject("vbscript.regexp")
    re.Global = True                ' find every token, not just the first
    re.Pattern = sPat
    Set mc = re.Execute(s)          ' zero-based collection of matches
    Set foo = mc                    ' hand the collection back to the caller
End Function
===================
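As for why SubMatches is the wrong tool here: the number of submatches is fixed by the number of parenthesised groups in the pattern, not by how many times a group repeats. As far as I know, a repeated group in the VBScript engine keeps only the text from its last repetition. Here is a small sketch to check that yourself. ShowSubMatches and the test string are made up for illustration and are not part of the original question.

```vba
Option Explicit

' Hypothetical demo (not from the thread): run the pattern from the
' question and list what SubMatches actually holds.
Sub ShowSubMatches()
    Dim re As Object, mc As Object
    Dim i As Long
    Set re = CreateObject("vbscript.regexp")
    re.Global = False
    re.Pattern = "(\s+(\S+)\s*)+"
    Set mc = re.Execute(" aa bb cc")
    If mc.Count > 0 Then
        ' The pattern has two pairs of parentheses, so Count is 2
        ' no matter how many tokens the overall match spans.
        Debug.Print "SubMatches.Count ="; mc(0).SubMatches.Count
        For i = 0 To mc(0).SubMatches.Count - 1
            ' Brackets make any leading whitespace visible
            Debug.Print i, "[" & mc(0).SubMatches(i) & "]"
        Next i
    End If
End Sub
```

Note also that the pattern starts with \s+, so a token at the very start of a string with no leading whitespace would never be part of the match.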
 
Ron Rosenfeld

And to do something with it:

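===================
Option Explicit
Function foo(s As String)
    Dim re As Object, mc As Object
    Dim i As Long
    ' (\S+) grabs a token; the lookahead requires whitespace or end of
    ' string after it, which a greedy \S+ already guarantees
    Const sPat As String = "(\S+)(?=\s+|$)"
    Set re = CreateObject("vbscript.regexp")
    re.Global = True
    re.Pattern = sPat
    Set mc = re.Execute(s)
    ' Print each zero-based index and its token to the Immediate window
    For i = 0 To mc.Count - 1
        Debug.Print i, mc(i)
    Next i
End Function
========================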


Or you could do something like:

Function foo(s As String, Optional Index As Long = 0)
..
..
..
..
..
foo = mc(Index)
End Function
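Spelled out in full, the indexed version might look like the sketch below. The name GetToken and the choice to return an empty string for an out-of-range index are my own assumptions, not something the thread specifies.

```vba
Option Explicit

' Hypothetical helper (not from the thread): returns the token at a
' zero-based Index, or "" when Index is out of range.
Function GetToken(s As String, Optional Index As Long = 0) As String
    Dim re As Object, mc As Object
    Set re = CreateObject("vbscript.regexp")
    re.Global = True
    re.Pattern = "\S+"              ' one token = a run of non-whitespace
    Set mc = re.Execute(s)
    If Index >= 0 And Index < mc.Count Then
        GetToken = mc(Index).Value
    Else
        GetToken = ""               ' assumed behaviour for a bad index
    End If
End Function
```

With the third example string from the question, GetToken(" Here is a string that contains a total of 13 tokens... got it?", 12) should give "it?".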
 
Ron Rosenfeld

One other note: although I know you are trying to become proficient in regular expressions, this same task can be accomplished fairly simply without them, using built-in VBA and worksheet functions. For example:

==================
Function SplitString(s As String) As Variant
    ' Trim strips leading/trailing spaces and collapses runs of spaces;
    ' Split then breaks the result on single spaces
    SplitString = Split(WorksheetFunction.Trim(s))
End Function
======================

SplitString returns a zero-based array containing your tokens.
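One caveat: WorksheetFunction.Trim and the default Split delimiter deal only with the ordinary space character, so tabs or line breaks between tokens would not be treated as separators. If your strings can contain those, a possible variant is to collapse all whitespace first. This is only a sketch, and the name SplitOnWhitespace is made up for illustration.

```vba
Option Explicit

' Hypothetical variant (not from the thread): collapse every run of
' whitespace, including tabs and line breaks, before splitting.
Function SplitOnWhitespace(s As String) As Variant
    Dim re As Object
    Set re = CreateObject("vbscript.regexp")
    re.Global = True
    re.Pattern = "\s+"
    ' Replace each whitespace run with one space, strip the ends,
    ' then split on the single spaces that remain.
    SplitOnWhitespace = Split(Trim$(re.Replace(s, " ")))
End Function
```

You can walk the result with For i = LBound(v) To UBound(v), which also copes with an empty input because the loop body then never runs.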
 
