Another regular expression question

R

Robert Crandal

My input data strings have roughly the following format:

"Item1 scissors"
"Item2 two notebooks"
"item3"
"itm4 keyboards and scissors"
"item_5 glue,paper,scissors"

My strings begin with an "item number" string followed by a
description of the item(s). It is possible that an item description
could be missing, as seen in "item3" above. (Assume that the
"item number" token will always be present)

What is a good regular expression that I can use to save the
item number in a variable named "$ItemNum" and save the
description in a variable named "$Description"?? (If the
description string is missing I want the "$Description" variable
to be set to the empty string.

Thank you.
 
R

Ron Rosenfeld

My input data strings have roughly the following format:

"Item1 scissors"
"Item2 two notebooks"
"item3"
"itm4 keyboards and scissors"
"item_5 glue,paper,scissors"

My strings begin with an "item number" string followed by a
description of the item(s). It is possible that an item description
could be missing, as seen in "item3" above. (Assume that the
"item number" token will always be present)

What is a good regular expression that I can use to save the
item number in a variable named "$ItemNum" and save the
description in a variable named "$Description"?? (If the
description string is missing I want the "$Description" variable
to be set to the empty string.

Thank you.

Your question requires clarification for me to respond.

Since this is an Excel programming group, I would normally assume you are writing about vbscript (essentially the same as Javascript) flavor of regular expressions. Subgroup naming is not supported. But you could use subgroups to save the relevant portions of your string, and then assign them to variables within your VBA routine. However, VBA does not support names which begin with "$".

If you are in the wrong group, there are flavors of regex that do support variable naming but, at least in the .NET flavor, they do not begin with the "$" as you have shown above. For example, in the .NET flavor, you might have a named capturing group called ItemNum and would use it in a replace string as $[ItemNum].

Also, there is considerable variation in your "item number" tokens, and it is not clear if you want to capture only the number (which presumeably is an integer), or the entire word. Nor is it clear what the extent of variability in your item number tokens might be, or whether we could just identify it more simply as merely (^"\w+\d+)

Item
item
itm
item_
 
R

Robert Crandal

Hi Ron.... I only posted this question on the Excel VBA board
because I'm using regular expressions in Excel. I forgot that
I should NOT use "$" in my variable names. For now, assume that
my variable names will be "sItem" and "sDescription".

And yes, there is variation in my "item number", and I want to capture
the ENTIRE first token string, whether it be "Item4", "item_4", or even
"b6".
Our data format looks roughly like this:

"item number string" [option string description]

So, real examples are:

Item1 scissors
Item2 two notebooks
item3
i4 keyboards and scissors
item_5 glue,paper,scissors
b6 broken keyboard


Sorry about the confusion. I hope I made it more clear now. Please
let me know if you need further clarification. Thank you.

Robert
 
R

Ron Rosenfeld

Hi Ron.... I only posted this question on the Excel VBA board
because I'm using regular expressions in Excel. I forgot that
I should NOT use "$" in my variable names. For now, assume that
my variable names will be "sItem" and "sDescription".

And yes, there is variation in my "item number", and I want to capture
the ENTIRE first token string, whether it be "Item4", "item_4", or even
"b6".
Our data format looks roughly like this:

"item number string" [option string description]

So, real examples are:

Item1 scissors
Item2 two notebooks
item3
i4 keyboards and scissors
item_5 glue,paper,scissors
b6 broken keyboard


Sorry about the confusion. I hope I made it more clear now. Please
let me know if you need further clarification. Thank you.

Robert

Well, the only commonality I can see is the your
Item token is always the first word.

Then there are some itervening spaces, followed by the rest of the string which comprises the description.

That being the case, a regex such as:

^(\S+)(?:\s+(.+))?

should be all that you need.

Code might look like:

============================
Option Explicit
Public sItem As String
Public sDescription As String
Sub ItemDescr(s As String)
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
With re
.Pattern = "^(\S+)(?:\s+(.+))?"
.MultiLine = False
.Global = True
If .test(s) = True Then
Set mc = .Execute(s)
sItem = mc(0).submatches(0)
sDescription = mc(0).submatches(1)
End If
End With
End Sub
=======================

You could call it in a variety of ways. Here's one, if your strings are in column A:

================================
Sub test()
Dim c As Range
For Each c In Range("A1", Cells(Rows.Count, "A").End(xlUp))
ItemDescr (c)
Debug.Print sItem, sDescription
Next c
End Sub
==================================
 
R

Robert Crandal

Ron Rosenfeld said:
That being the case, a regex such as:

^(\S+)(?:\s+(.+))?

should be all that you need.

Hi Ron. Just out of curiousity, what is the meaning of the colon
in the above pattern string?
 
R

Ron Rosenfeld

Hi Ron. Just out of curiousity, what is the meaning of the colon
in the above pattern string?

You should have access to a comprehensive definition of the syntax for the relevant flavor if you are going to be using regular expressions.
Here is a link to the syntax used in the VBscript flavor: http://msdn.microsoft.com/en-us/library/1400241x(v=vs.84).aspx

From that link:

==========================
(?:pattern)

Matches pattern but does not save the match, that is, the match is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|).

industr(?:y|ies) is equivalent to industry|industries.
===========================

So using this makes the regex more efficient, as it does not have to store the part of the match that includes the spaces between the two elements that we want to match.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top