extract data from middle / right of string

E

eddyrapps

I have a column in a spreadsheet containing the following type of
string:

330189149967 Ideal - Series 2 (DVD) 3 12.99 GBP 38.97 GBP

I would like to extract the number after the (DVD) in this string.
However, the set of characters (DVD) does not always appear in the
string. The only thing that is always constant is the number of spaces
before the 3 on the right of the string. So I think the best way to
extract the number is count 4 spaces from the right and return the
number after the 4th space. This number could also be more than 1
character which might make things difficult.
Any ideas?

Thanks
R
 
R

Ron Rosenfeld

I have a column in a spreadsheet containing the following type of
string:

330189149967 Ideal - Series 2 (DVD) 3 12.99 GBP 38.97 GBP

I would like to extract the number after the (DVD) in this string.
However, the set of characters (DVD) does not always appear in the
string. The only thing that is always constant is the number of spaces
before the 3 on the right of the string. So I think the best way to
extract the number is count 4 spaces from the right and return the
number after the 4th space. This number could also be more than 1
character which might make things difficult.
Any ideas?

Thanks
R

=MID(TRIM(A1),FIND(CHAR(1),SUBSTITUTE(
TRIM(A1)," ",CHAR(1),6))+1,FIND(CHAR(1),
SUBSTITUTE(TRIM(A1)," ",CHAR(1),7))-FIND(
CHAR(1),SUBSTITUTE(TRIM(A1)," ",CHAR(1),6))-1)

will return the 5th word from the right.

If you would prefer a UDF, then:

==================================
Option Explicit
Function GetNum(str As String) As Double
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "\s(\S+)(\s+\S+){4}$"
Set mc = re.Execute(str)
GetNum = mc(0).submatches(0)
End Function
====================================

--ron
 
R

Rick Rothstein \(MVP - VB\)

If you would prefer a UDF, then:
==================================
Option Explicit
Function GetNum(str As String) As Double
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "\s(\S+)(\s+\S+){4}$"
Set mc = re.Execute(str)
GetNum = mc(0).submatches(0)
End Function
====================================

Why so "wordy"?

Function GetNum(str As String) As Double
GetNum = Split(str)(UBound(Split(str)) - 4)
End Function

Sorry... couldn't resist. <g>

Rick
 
R

Ron Rosenfeld

Why so "wordy"?

Function GetNum(str As String) As Double
GetNum = Split(str)(UBound(Split(str)) - 4)
End Function

Sorry... couldn't resist. <g>

Rick

Well, when you have a hammer <g>.
--ron
 
R

Ron Rosenfeld

Why so "wordy"?

Function GetNum(str As String) As Double
GetNum = Split(str)(UBound(Split(str)) - 4)
End Function

Sorry... couldn't resist. <g>

Rick

You do have to lengthen yours a bit in order to make it equivalent, though.

Function GetNum(str As String) As Double
GetNum = Split(Application.WorksheetFunction.Trim(str)) _
(UBound(Split(Application.WorksheetFunction.Trim(str))) - 4)
End Function


--ron
 
R

Rick Rothstein \(MVP - VB\)

If you would prefer a UDF, then:
You do have to lengthen yours a bit in order to make it equivalent,
though.

Function GetNum(str As String) As Double
GetNum = Split(Application.WorksheetFunction.Trim(str)) _
(UBound(Split(Application.WorksheetFunction.Trim(str))) - 4)
End Function

So I am guessing then that the regular expression you used automatically
removes all duplicated blank spaces... that is what \s+ is doing, correct?
My assumption, of course, was that the OP's text values would be as shown
(without multiple, leading or trailing spaces). However, if we assume those
are possibilities then, while I would lose the one-liner, I would probably
do the following instead of performing the call back to the spreadsheet's
functions...

Function GetNum(ByVal str As String) As Double
Do While InStr(str, " ")
str = Replace(str, " ", " ")
Loop
GetNum = Split(Trim$(str))(UBound(Split(Trim$(str))) - 4)
End Function

By the way, if we are going to assume spacing situations where multiple,
leading or trailing spaces are a possibility, then try your function on this
text value (without the quote marks)...

" 1 A - B 2 (C) 3 12.99 D 38.97
E "

Rick
 
R

Ron Rosenfeld

By the way, if we are going to assume spacing situations where multiple,
leading or trailing spaces are a possibility, then try your function on this
text value (without the quote marks)...

" 1 A - B 2 (C) 3 12.99 D 38.97
E "

Well, if you are including leading and trailing spaces (which I did not in my
original), it only requires a slight change in the regex to accomplish that:

Function GetNum(str As String)
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "\s(\S+)\s+(\S+\s*){4}$"
Set mc = re.Execute(str)
GetNum = mc(0).submatches(0)
End Function

And yes the "+" is a quantifier. It means match the previous expression
between 1 and an unlimited number of times.

So

\s matches a single whitespace character
\s+ matches at least one (with no upper limit)
and, from the regex above, \s* matches a whitespace character between zero and
an unlimited number of times.

The "$" is a token for the end of the string.
--ron
 
R

Rick Rothstein \(MVP - VB\)

Well, if you are including leading and trailing spaces (which I did not
in my original),

I figured if you could include them in the middle, then the outer areas were
it only requires a slight change in the regex to accomplish that:

I was pretty sure that would be the case.

And yes the "+" is a quantifier. It means match the previous expression
between 1 and an unlimited number of times.

So

\s matches a single whitespace character
\s+ matches at least one (with no upper limit)
and, from the regex above, \s* matches a whitespace character between zero
and
an unlimited number of times.

The "$" is a token for the end of the string.

Yeah, I remembered the $ sign token and was pretty sure of the + sign... you
know, your postings of these regex solutions are going to make me pull out
my old notes and one book I have on them in order to relearn what I once
knew about regular expressions... one day, that is (but maybe sooner than
later).<g>


Rick
 
H

Harlan Grove

Ron Rosenfeld said:
If you would prefer a UDF, then:

==================================
Option Explicit
Function GetNum(str As String) As Double
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "\s(\S+)(\s+\S+){4}$"
Set mc = re.Execute(str)
GetNum = mc(0).submatches(0)
End Function
====================================

If you're going to go the regex route, simplify.

re.Pattern = "\S+"
re.Global = True
Set mc = re.Execute(str)
GetNum = mc(mc.Count - 5)

But it's probably more efficient to use worksheet formulas only. The
formula you gave is specific to records with 11 space-separated
tokens. That is, your formula actually returns the 7th token from the
left of the string. Remove the '-' from the OP's sample record and see
what your formula returns.

Parsing from the right requires something like

=MID(TRIM(x),FIND(CHAR(y),SUBSTITUTE(TRIM(x)," ",CHAR(y),
LEN(TRIM(x))-LEN(SUBSTITUTE(x," ",""))-4))+1,
FIND(CHAR(y),SUBSTITUTE(TRIM(x)," ",CHAR(y),
LEN(TRIM(x))-LEN(SUBSTITUTE(x," ",""))-3))
-FIND(CHAR(y),SUBSTITUTE(TRIM(x)," ",CHAR(y),
LEN(TRIM(x))-LEN(SUBSTITUTE(x," ",""))-4)))

Not pretty. More compact (and possibly more efficient) to define a
name like seq referring to a formula like

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,255,1))

and use an array formula like

=MID(A1,LARGE((MID(TRIM(A1),seq,1)=" ")*seq,5)+1,
SUM(LARGE((MID(TRIM(A1),seq,1)=" ")*seq,5-{1,0})*{1,-1})-1)
 
R

Ron Rosenfeld

If you're going to go the regex route, simplify.

re.Pattern = "\S+"
re.Global = True
Set mc = re.Execute(str)
GetNum = mc(mc.Count - 5)

Nice. I like that. A much simpler way to think about the expression being
parsed.

With regard to the worksheet formula, you are absolutely correct about what I
posted being wrong, and ugly <g>. But I forgot to merge back into the version
I posted, the version that would compute the proper places for the
SUBstitutions when parsing from the right.

As you point out, it should have been:

=MID(TRIM(A1),FIND(CHAR(1),SUBSTITUTE(TRIM(
A1)," ",CHAR(1),LEN(TRIM(A1))-LEN(SUBSTITUTE(
TRIM(A1)," ",""))-4))+1,FIND(CHAR(1),SUBSTITUTE(
TRIM(A1)," ",CHAR(1),LEN(TRIM(A1))-LEN(SUBSTITUTE(
TRIM(A1)," ",""))-3))-FIND(CHAR(1),SUBSTITUTE(TRIM(
A1)," ",CHAR(1),LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1)," ",""))-4))-1)


Your array formula is much less ugly.

Thanks
--ron
 
R

Rick Rothstein \(MVP - VB\)

Parsing from the right requires something like
=MID(TRIM(x),FIND(CHAR(y),SUBSTITUTE(TRIM(x)," ",CHAR(y),
LEN(TRIM(x))-LEN(SUBSTITUTE(x," ",""))-4))+1,
FIND(CHAR(y),SUBSTITUTE(TRIM(x)," ",CHAR(y),
LEN(TRIM(x))-LEN(SUBSTITUTE(x," ",""))-3))
-FIND(CHAR(y),SUBSTITUTE(TRIM(x)," ",CHAR(y),
LEN(TRIM(x))-LEN(SUBSTITUTE(x," ",""))-4)))

A little shorter this way...

=LEFT(MID(TRIM(A1),FIND(CHAR(1),SUBSTITUTE(TRIM(A1),"
",CHAR(1),LEN(TRIM(A1))-LEN(SUBSTITUTE(A1," ",""))-4))+1,999),FIND("
",MID(TRIM(A1),FIND(CHAR(1),SUBSTITUTE(TRIM(A1),"
",CHAR(1),LEN(TRIM(A1))-LEN(SUBSTITUTE(A1," ",""))-4))+1,999))-1)

Rick
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top