Extract text plus "n" characters or date on partial match

F

FabZ

Hi everybody,

Here there are two string samples of my cells in columnA:
der. husky b/n f 4 aa. rmd 2161 huma - p.p. adopted 10.02.04
dobermann n/f m. steril+rmd24378 faruk - p.p. trasf. palombara 13/1/05

I need to extract:

rmd 2161
rmd24378

and I started from function extracteMailAddress() found on this
newsgroup

Sub extractTattoo()

sstr = Range("A4").Text

sstr = ActiveCell.Text

p = InStr(1, sstr, "rm") - 1
Do While char <> " " And p > 0
char = Mid(sstr, p, 1)
Debug.Print "'" & char & "'"
p = p - 1
Loop

'Get tattoo address
If p > 0 Then
p = p + 1
tattoo = Mid(sstr, p, 9)
ActiveCell.Offset(0, 9).Value = tattoo

Debug.Print tatuaggio
End If
End Sub

On some cell it works, but, really, I can't say this code works and
anyway I tried also to make it work on all the column range with no
results ...

Maybe starting from the same code I need to extract dates, first,
looking for partial text match(for ex. looking for "adopted" or "-
adop." or "ado." with "*ado*") and then fill one of two cells in
different columns on the same row, always, formatting date with "dd/mm/
yyyy".

I think I need help, I'm a newbie and these codes are just a little
bit hard for my actual knowledges...
Any help would be really appreciated!
Thanks
FabZ
 
R

Ron Rosenfeld

Hi everybody,

Here there are two string samples of my cells in columnA:


I need to extract:

rmd 2161
rmd24378

and I started from function extracteMailAddress() found on this
newsgroup

Sub extractTattoo()

sstr = Range("A4").Text

sstr = ActiveCell.Text

p = InStr(1, sstr, "rm") - 1
Do While char <> " " And p > 0
char = Mid(sstr, p, 1)
Debug.Print "'" & char & "'"
p = p - 1
Loop

'Get tattoo address
If p > 0 Then
p = p + 1
tattoo = Mid(sstr, p, 9)
ActiveCell.Offset(0, 9).Value = tattoo

Debug.Print tatuaggio
End If
End Sub

On some cell it works, but, really, I can't say this code works and
anyway I tried also to make it work on all the column range with no
results ...

Maybe starting from the same code I need to extract dates, first,
looking for partial text match(for ex. looking for "adopted" or "-
adop." or "ado." with "*ado*") and then fill one of two cells in
different columns on the same row, always, formatting date with "dd/mm/
yyyy".

I think I need help, I'm a newbie and these codes are just a little
bit hard for my actual knowledges...
Any help would be really appreciated!
Thanks
FabZ

I don't understand your references to dates when you write that you want to
extract the "rmd" strings.

To extract the rmd strings as you write above, you could use this "regular
expression" routine:

=======================================
Option Explicit

Sub ExtractTattoo()
Dim sStr
Dim i As Long
sStr = Array("der. husky b/n f 4 aa. rmd 2161 huma - p.p. adopted 10.02.04", _
"dobermann n/f m. steril+rmd24378 faruk - p.p. trasf. palombara
13/1/05")

Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "rmd\s?\d+"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
.Pattern = sPattern

For i = 0 To UBound(sStr)
If oRegExp.Test(sStr(i)) = True Then
Set colMatches = oRegExp.Execute(sStr(i))
Debug.Print i, colMatches(0)
End If
Next i
End With
End Sub
=====================================
0 rmd 2161
1 rmd24378
=====================================

If you want to extract the rmd strings and the dates, you could try this
similar routine, which assumes the dates are always at the end:

=======================================

Sub ExtractTattoo()
Dim sStr
Dim i As Long
sStr = Array("der. husky b/n f 4 aa. rmd 2161 huma - p.p. adopted 10.02.04", _
"dobermann n/f m. steril+rmd24378 faruk - p.p. trasf. palombara
13/1/05")

Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "(rmd\s?\d+)[\s\S]+(\s\S+$)"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
.Pattern = sPattern

For i = 0 To UBound(sStr)
If oRegExp.Test(sStr(i)) = True Then
Set colMatches = oRegExp.Execute(sStr(i))
Debug.Print i, colMatches(0).submatches(0),
colMatches(0).submatches(1)
End If
Next i
End With
End Sub
============================================
0 rmd 2161 10.02.04
1 rmd24378 13/1/05
=============================================

The "work" is done by the Pattern (sPattern).

In the first case

"rmd\s?\d+"

says look for a pattern starting with
"rmd" then
an optional <space> then
all of the following digits.

In the second

"(rmd\s?\d+)[\s\S]+(\s\S+$)"

The parentheses enclose "submatches", so the first submatch will be the same
"rmd" string as before.

The code then accepts all characters and newlines (spaces and non-spaces) until
it gets to the second set of parentheses which is looking for a substring that

starts with a <space>
is followed by consecutive <non-space>'s and then by
the End of the string.
--ron
 
R

Ron Rosenfeld

Hi everybody,

Here there are two string samples of my cells in columnA:


I need to extract:

rmd 2161
rmd24378

and I started from function extracteMailAddress() found on this
newsgroup

Sub extractTattoo()

sstr = Range("A4").Text

sstr = ActiveCell.Text

p = InStr(1, sstr, "rm") - 1
Do While char <> " " And p > 0
char = Mid(sstr, p, 1)
Debug.Print "'" & char & "'"
p = p - 1
Loop

'Get tattoo address
If p > 0 Then
p = p + 1
tattoo = Mid(sstr, p, 9)
ActiveCell.Offset(0, 9).Value = tattoo

Debug.Print tatuaggio
End If
End Sub

On some cell it works, but, really, I can't say this code works and
anyway I tried also to make it work on all the column range with no
results ...

Maybe starting from the same code I need to extract dates, first,
looking for partial text match(for ex. looking for "adopted" or "-
adop." or "ado." with "*ado*") and then fill one of two cells in
different columns on the same row, always, formatting date with "dd/mm/
yyyy".

I think I need help, I'm a newbie and these codes are just a little
bit hard for my actual knowledges...
Any help would be really appreciated!
Thanks
FabZ

I don't understand your references to dates when you write that you want to
extract the "rmd" strings.

To extract the rmd strings as you write above, you could use this "regular
expression" routine:

=======================================
Option Explicit

Sub ExtractTattoo()
Dim sStr
Dim i As Long
sStr = Array("der. husky b/n f 4 aa. rmd 2161 huma - p.p. adopted 10.02.04", _
"dobermann n/f m. steril+rmd24378 faruk - p.p. trasf. palombara
13/1/05")

Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "rmd\s?\d+"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
.Pattern = sPattern

For i = 0 To UBound(sStr)
If oRegExp.Test(sStr(i)) = True Then
Set colMatches = oRegExp.Execute(sStr(i))
Debug.Print i, colMatches(0)
End If
Next i
End With
End Sub
=====================================
0 rmd 2161
1 rmd24378
=====================================

If you want to extract the rmd strings and the dates, you could try this
similar routine, which assumes the dates are always at the end:

=======================================

Sub ExtractTattoo()
Dim sStr
Dim i As Long
sStr = Array("der. husky b/n f 4 aa. rmd 2161 huma - p.p. adopted 10.02.04", _
"dobermann n/f m. steril+rmd24378 faruk - p.p. trasf. palombara
13/1/05")

Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "(rmd\s?\d+)[\s\S]+(\s\S+$)"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
.Pattern = sPattern

For i = 0 To UBound(sStr)
If oRegExp.Test(sStr(i)) = True Then
Set colMatches = oRegExp.Execute(sStr(i))
Debug.Print i, colMatches(0).submatches(0),
colMatches(0).submatches(1)
End If
Next i
End With
End Sub
============================================
0 rmd 2161 10.02.04
1 rmd24378 13/1/05
=============================================

The "work" is done by the Pattern (sPattern).

In the first case

"rmd\s?\d+"

says look for a pattern starting with
"rmd" then
an optional <space> then
all of the following digits.

In the second

"(rmd\s?\d+)[\s\S]+(\s\S+$)"

The parentheses enclose "submatches", so the first submatch will be the same
"rmd" string as before.

The code then accepts all characters and newlines (spaces and non-spaces) until
it gets to the second set of parentheses which is looking for a substring that

starts with a <space>
is followed by consecutive <non-space>'s and then by
the End of the string.
--ron

Please note there are a few lines with unwanted word-wraps. The Const line
setting up the strings for the array to test; and the debug.print line.
--ron
 
M

merjet

Your msg wasn't all clear, but the following does what was clear.

Hth,
Merjet

Sub extractTattoo()
Dim iPos1 As Integer
Dim iPos2 As Integer
Dim sstr As String
Dim char As String
Dim iRow As Integer
Dim ws As Worksheet

Set ws = Sheets("Sheet1")
iRow = 4
Do Until ws.Range("A" & iRow) = ""
sstr = ws.Range("A" & iRow)
iPos1 = InStr(1, sstr, "rmd")
iPos2 = InStr(iPos1 + 3, sstr, " ")
If iPos2 = iPos1 + 3 Then iPos2 = InStr(iPos1 + 4, sstr, " ")
char = Mid(sstr, iPos1, iPos2 - iPos1)
ws.Range("J" & iRow) = char
iRow = iRow + 1
Loop
End Sub
 
F

FabZ

Thanks for your answers.
Well, my first post, effectively, was not so clear.

My situation is:
W2K+Excel2003
I have a worksheets with thousands of rows data, in the first column
there is a text like:
der. husky b/n f 4 aa. rmd 2161 huma - p.p. adopted 10.02.04
dobermann n/f m. steril+rmd24378 faruk - p.p. trasf. palombara 13/1/05

I need to extract (copy-paste) to a different column, the text it
matches the criteria:

-Start with RM
+ Have a third different character ("D" or "A" or "E" etc.)
+ Have a sequence of four or five numbers, sometimes preceded by a
<space>.

The results sound like "rmd 2161" or "rmd24378" or "rme45758".

Having a working formula for the above first case, I tought to re-use
it, adapted, for my second "challenge":
extract dates looking for their previous word always is present and
insert it in one of two different column:

if "adopted" or "ado" or "adopt." is found with "*ado*" then the
following date must to be copy-past to column K (Exits).

if "re-entered" or "re-ent" or "reenter." is found, the following date
must to be copy-past to column J (Entrances).

Anyway I found that sometimes there are two key words and two dates,
then I suppose I need a third "L" column, but, at current time, it's
important to extract only the last date.

Now I'm trying your formulas, It seems to me that the merjet one could
suit my needed but it returns

"Run-time error '5': Invalid Procedure Call or Argument"

at this point:

char = Mid(sstr, iPos1, iPos2 - iPos1)

and I found, looking for, in the newsgroups, it could be a problem of
mid with InStr.

Thank again for your help!

FabZ
 
R

Ron Rosenfeld

Thanks for your answers.
Well, my first post, effectively, was not so clear.

My situation is:
W2K+Excel2003
I have a worksheets with thousands of rows data, in the first column
there is a text like:


I need to extract (copy-paste) to a different column, the text it
matches the criteria:

-Start with RM
+ Have a third different character ("D" or "A" or "E" etc.)
+ Have a sequence of four or five numbers, sometimes preceded by a
<space>.

The results sound like "rmd 2161" or "rmd24378" or "rme45758".

Having a working formula for the above first case, I tought to re-use
it, adapted, for my second "challenge":
extract dates looking for their previous word always is present and
insert it in one of two different column:

if "adopted" or "ado" or "adopt." is found with "*ado*" then the
following date must to be copy-past to column K (Exits).

if "re-entered" or "re-ent" or "reenter." is found, the following date
must to be copy-past to column J (Entrances).

Anyway I found that sometimes there are two key words and two dates,
then I suppose I need a third "L" column, but, at current time, it's
important to extract only the last date.

Now I'm trying your formulas, It seems to me that the merjet one could
suit my needed but it returns

"Run-time error '5': Invalid Procedure Call or Argument"

at this point:

char = Mid(sstr, iPos1, iPos2 - iPos1)

and I found, looking for, in the newsgroups, it could be a problem of
mid with InStr.

Thank again for your help!

FabZ

Well, your explanation still does not cover everything, and may well be
incomplete. However, the SUB below will do what you describe on the data you
posted.

It still assumes that the date is at the very end of the string.

It looks for one of your listed variants of ado or reent to determine which
column to place the date.

It places the rm number in the column adjacent to the string.

If, as in your second example, there is NO adopted or reentered, it will not
extract a date.

---------------------------------------------------
Sub ExtractTattoo()
Dim i As Long
Dim c As Range

Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "(rm[a-z]\s?\d+).*(\s\S+$)"
Const sDateAdopt As String = "\b((adopted)|(ado)|(adopt.))"
Const sDateReent As String = "\b((re-entered)|(re-ent)|(reenter.))"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
For Each c In Selection
i = 0
.Pattern = sDateAdopt
If oRegExp.test(c.Text) = True Then i = 11 'column K
.Pattern = sDateReent
If oRegExp.test(c.Text) = True Then i = 10 'column J
.Pattern = sPattern
If oRegExp.test(c.Text) = True Then
Set colMatches = oRegExp.Execute(c.Text)
c.Offset(0, 1) = colMatches(0).submatches(0) 'rmd adjacent
If i <> 0 Then
Cells(c.Row, i).Value = colMatches(0).submatches(1)
End If
End If
Next c
End With
End Sub
==============================================


--ron
 
M

merjet

iPos1 = 0 produces that type of error. Excel is looking for string2 in
string1 and not finding it. Looking for "rmd" when string1 contains
"rme" instead is an example.

Merjet
 
R

Ron Rosenfeld

Thanks for your answers.
Well, my first post, effectively, was not so clear.

My situation is:
W2K+Excel2003
I have a worksheets with thousands of rows data, in the first column
there is a text like:


I need to extract (copy-paste) to a different column, the text it
matches the criteria:

-Start with RM
+ Have a third different character ("D" or "A" or "E" etc.)
+ Have a sequence of four or five numbers, sometimes preceded by a
<space>.

The results sound like "rmd 2161" or "rmd24378" or "rme45758".

Having a working formula for the above first case, I tought to re-use
it, adapted, for my second "challenge":
extract dates looking for their previous word always is present and
insert it in one of two different column:

if "adopted" or "ado" or "adopt." is found with "*ado*" then the
following date must to be copy-past to column K (Exits).

if "re-entered" or "re-ent" or "reenter." is found, the following date
must to be copy-past to column J (Entrances).

Anyway I found that sometimes there are two key words and two dates,
then I suppose I need a third "L" column, but, at current time, it's
important to extract only the last date.

Now I'm trying your formulas, It seems to me that the merjet one could
suit my needed but it returns

"Run-time error '5': Invalid Procedure Call or Argument"

at this point:

char = Mid(sstr, iPos1, iPos2 - iPos1)

and I found, looking for, in the newsgroups, it could be a problem of
mid with InStr.

Thank again for your help!

FabZ

Well, your explanation still does not cover everything, and may well be
incomplete. However, the SUB below will do what you describe on the data you
posted.

It still assumes that the date is at the very end of the string.

It looks for one of your listed variants of ado or reent to determine which
column to place the date.

It places the rm number in the column adjacent to the string.

If, as in your second example, there is NO adopted or reentered, it will not
extract a date.

---------------------------------------------------
Sub ExtractTattoo()
Dim i As Long
Dim c As Range

Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "(rm[a-z]\s?\d+).*(\s\S+$)"
Const sDateAdopt As String = "\b((adopted)|(ado)|(adopt.))"
Const sDateReent As String = "\b((re-entered)|(re-ent)|(reenter.))"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
For Each c In Selection
i = 0
.Pattern = sDateAdopt
If oRegExp.test(c.Text) = True Then i = 11 'column K
.Pattern = sDateReent
If oRegExp.test(c.Text) = True Then i = 10 'column J
.Pattern = sPattern
If oRegExp.test(c.Text) = True Then
Set colMatches = oRegExp.Execute(c.Text)
c.Offset(0, 1) = colMatches(0).submatches(0) 'rmd adjacent
If i <> 0 Then
Cells(c.Row, i).Value = colMatches(0).submatches(1)
End If
End If
Next c
End With
End Sub
==============================================


--ron

Corrections in sDateAdopt and sDateReent:

--------------------------------------------
Sub ExtractTattoo()
Dim i As Long
Dim c As Range

Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "(rm[a-z]\s?\d+).*(\s\S+$)"
Const sDateAdopt As String = "\s(adopted|ado|adopt\.)\s"
Const sDateReent As String = "\s(re-entered|re-ent|reenter\.)\s"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
For Each c In Selection
i = 0
.Pattern = sDateAdopt
If oRegExp.test(c.Text) = True Then i = 11 'column K
.Pattern = sDateReent
If oRegExp.test(c.Text) = True Then i = 10 'column J
.Pattern = sPattern
If oRegExp.test(c.Text) = True Then
Set colMatches = oRegExp.Execute(c.Text)
c.Offset(0, 1) = colMatches(0).submatches(0) 'rmd adjacent col
If i <> 0 Then
Cells(c.Row, i).Value = colMatches(0).submatches(1)
End If
End If
Next c
End With
End Sub

==================================================
--ron
 
F

FabZ

Ron Rosenfeld ha scritto:
Thanks for your answers.
Well, my first post, effectively, was not so clear.

My situation is:
W2K+Excel2003
I have a worksheets with thousands of rows data, in the first column
there is a text like:

der. husky b/n f 4 aa. rmd 2161 huma - p.p. adopted 10.02.04
dobermann n/f m. steril+rmd24378 faruk - p.p. trasf. palombara 13/1/05

I need to extract (copy-paste) to a different column, the text it
matches the criteria:

-Start with RM
+ Have a third different character ("D" or "A" or "E" etc.)
+ Have a sequence of four or five numbers, sometimes preceded by a
<space>.

The results sound like "rmd 2161" or "rmd24378" or "rme45758".

Having a working formula for the above first case, I tought to re-use
it, adapted, for my second "challenge":
extract dates looking for their previous word always is present and
insert it in one of two different column:

if "adopted" or "ado" or "adopt." is found with "*ado*" then the
following date must to be copy-past to column K (Exits).

if "re-entered" or "re-ent" or "reenter." is found, the following date
must to be copy-past to column J (Entrances).

Anyway I found that sometimes there are two key words and two dates,
then I suppose I need a third "L" column, but, at current time, it's
important to extract only the last date.

Now I'm trying your formulas, It seems to me that the merjet one could
suit my needed but it returns

"Run-time error '5': Invalid Procedure Call or Argument"

at this point:

char = Mid(sstr, iPos1, iPos2 - iPos1)

and I found, looking for, in the newsgroups, it could be a problem of
mid with InStr.

Thank again for your help!

FabZ

Well, your explanation still does not cover everything, and may well be
incomplete. However, the SUB below will do what you describe on the data you
posted.

It still assumes that the date is at the very end of the string.

It looks for one of your listed variants of ado or reent to determine which
column to place the date.

It places the rm number in the column adjacent to the string.

If, as in your second example, there is NO adopted or reentered, it will not
extract a date.

---------------------------------------------------
Sub ExtractTattoo()
Dim i As Long
Dim c As Range

Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "(rm[a-z]\s?\d+).*(\s\S+$)"
Const sDateAdopt As String = "\b((adopted)|(ado)|(adopt.))"
Const sDateReent As String = "\b((re-entered)|(re-ent)|(reenter.))"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
For Each c In Selection
i = 0
.Pattern = sDateAdopt
If oRegExp.test(c.Text) = True Then i = 11 'column K
.Pattern = sDateReent
If oRegExp.test(c.Text) = True Then i = 10 'column J
.Pattern = sPattern
If oRegExp.test(c.Text) = True Then
Set colMatches = oRegExp.Execute(c.Text)
c.Offset(0, 1) = colMatches(0).submatches(0) 'rmd adjacent
If i <> 0 Then
Cells(c.Row, i).Value = colMatches(0).submatches(1)
End If
End If
Next c
End With
End Sub
==============================================


--ron

Corrections in sDateAdopt and sDateReent:

--------------------------------------------
Sub ExtractTattoo()
Dim i As Long
Dim c As Range

Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "(rm[a-z]\s?\d+).*(\s\S+$)"
Const sDateAdopt As String = "\s(adopted|ado|adopt\.)\s"
Const sDateReent As String = "\s(re-entered|re-ent|reenter\.)\s"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
For Each c In Selection
i = 0
.Pattern = sDateAdopt
If oRegExp.test(c.Text) = True Then i = 11 'column K
.Pattern = sDateReent
If oRegExp.test(c.Text) = True Then i = 10 'column J
.Pattern = sPattern
If oRegExp.test(c.Text) = True Then
Set colMatches = oRegExp.Execute(c.Text)
c.Offset(0, 1) = colMatches(0).submatches(0) 'rmd adjacent col
If i <> 0 Then
Cells(c.Row, i).Value = colMatches(0).submatches(1)
End If
End If
Next c
End With
End Sub

==================================================
--ron

Ok, the code for "RM+ car.+ numbers" works fine, great!
About the second part, with:

..Pattern = sDateAdopt
If oRegExp.test(c.Text) = True Then i = 11 'column K

i = 11 it uses column B (1) to paste text and anyway it fail to
recognize many dates, apparently not different from the other one,
recognized.

Better with first version of sDateAdopt and sDateReent.
Now I will check further your sub looking for what's wrong.

Thanks again.

FabZ
 
R

Ron Rosenfeld

.Pattern = sDateAdopt
If oRegExp.test(c.Text) = True Then i = 11 'column K

i = 11 it uses column B (1) to paste text and anyway it fail to
recognize many dates, apparently not different from the other one,
recognized.

Better with first version of sDateAdopt and sDateReent.
Now I will check further your sub looking for what's wrong.

Thanks again.

FabZ

I have written that the date MUST be at the end of the string, the way this is
set up. Also, "adopt" or one of your variations MUST be in the string (or one
of the reenter variations) or a date will NOT be extracted.

So, in your 2nd example, the date will not be extracted because there is no
"adopt" or "reenter". If you want the date extracted under those
circumstances, you need to tell where you want it extracted.

Also, if the date is not at the end of the string, or there are <space>'s
within the date, nonsense may be extracted. If such is the case, you need to
be very specific as to what you want.

For example: adopt (or one of your variations) followed by
<space> followed by
8 characters with no <space>
followed by a <space> or the <end> of the string.

All this can be done; even checking, for example, that the date is of a certain
format (e.g. 2 digits followed by a dot or slash followed by 2 digits followed
by a dot or slash followed by 2 or 4 digits). But again, you need to specify
this.

Perhaps if you post a few of the strings where the date is not getting
extracted, I can see where your specifications differ from what is in the
string.

--ron
 
F

FabZ

Ron Rosenfeld ha scritto:
I have written that the date MUST be at the end of the string, the way this is
set up. Also, "adopt" or one of your variations MUST be in the string (or one
of the reenter variations) or a date will NOT be extracted.

So, in your 2nd example, the date will not be extracted because there is no
"adopt" or "reenter". If you want the date extracted under those
circumstances, you need to tell where you want it extracted.

Also, if the date is not at the end of the string, or there are <space>'s
within the date, nonsense may be extracted. If such is the case, you need to
be very specific as to what you want.

For example: adopt (or one of your variations) followed by
<space> followed by
8 characters with no <space>
followed by a <space> or the <end> of the string.

All this can be done; even checking, for example, that the date is of a certain
format (e.g. 2 digits followed by a dot or slash followed by 2 digits followed
by a dot or slash followed by 2 or 4 digits). But again, you need to specify
this.

Perhaps if you post a few of the strings where the date is not getting
extracted, I can see where your specifications differ from what is in the
string.

--ron

I tried to be more specific and I inserted new and different
combinations, i.e. adding <space> before and after "adopted" I
obtained much more results next to 98% and I find it good.
I made Sub to work with several different "key-words" and I got dates
too.

I understand that for particular strings I would need a very specific
code and maybe I can "make up for" in a different way.

Sometimes I obtained nothing and I find that curious looking at the
reference strings:

fox terrier m. b steril toby --p.p. adopted 10.05.04
beagle f. tipic. tiger--p.p. adopted 23.12.03

Logically I could have been expected for a different solution:
no particular text composition, same cell property of other cells
but...no results.

Ok it doesn't matter to me, your sub works fine and you gave me also a
lot of explanations too, I really can't ask more.

Thanks and Have a nice day!

FabZ
 
R

Ron Rosenfeld

Ron Rosenfeld ha scritto:


I tried to be more specific and I inserted new and different
combinations, i.e. adding <space> before and after "adopted" I
obtained much more results next to 98% and I find it good.
I made Sub to work with several different "key-words" and I got dates
too.

I understand that for particular strings I would need a very specific
code and maybe I can "make up for" in a different way.

Sometimes I obtained nothing and I find that curious looking at the
reference strings:

fox terrier m. b steril toby --p.p. adopted 10.05.04
beagle f. tipic. tiger--p.p. adopted 23.12.03

Logically I could have been expected for a different solution:
no particular text composition, same cell property of other cells
but...no results.

Ok it doesn't matter to me, your sub works fine and you gave me also a
lot of explanations too, I really can't ask more.

Thanks and Have a nice day!

FabZ

Ah, seeing those examples explains the problem.

It was not clear from your original specifications that you would have strings
that did not have the "RM" strings and that you would want anything extracted
in that instance.

All we need to do is make the "RM" string optional.

Try this:

----------------------------------------------------------
Sub ExtractTattoo()
Dim i As Long
Dim c As Range

Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "(rm[a-z]\s?\d+)?.*(\s\S+$)"
Const sDateAdopt As String = "\s(adopted|ado|adopt\.)\s"
Const sDateReent As String = "\s(re-entered|re-ent|reenter\.)\s"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
For Each c In Selection
i = 0
.Pattern = sDateAdopt
If oRegExp.test(c.Text) = True Then i = 11 'column K
.Pattern = sDateReent
If oRegExp.test(c.Text) = True Then i = 10 'column J
.Pattern = sPattern
If oRegExp.test(c.Text) = True Then
Set colMatches = oRegExp.Execute(c.Text)
c.Offset(0, 1) = colMatches(0).submatches(0) 'rmd adj
If i <> 0 Then
Cells(c.Row, i).Value = colMatches(0).submatches(1)
End If
End If
Next c
End With
End Sub
================================================


--ron
 
Top