Remove all phrases containing a year within parentheses


Elfego Baca

I have a very long document (3000 pages) which contains thousands of phrases
surrounded by parentheses. I would like to delete every parenthesized group of
words that contains a year (such as 1989, 2007, etc.) and is no longer than 50
characters. Is there a macro that would accomplish this?

For example, I would like the macro to delete the following groups of words,
including the parentheses:

(born in 1948)
(first noted in 1957 but not prior to that date)
(1968)

But not remove the following:
(156)
(first noted in 1957 but not prior to that date or after 1999)
(no longer seen in the United States)
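
Read as rules, the examples above amount to three checks on each candidate string: it is wrapped in parentheses, it is at most 50 characters long (counting the parentheses, which is an assumption), and it contains a four-digit number that looks like a year. A minimal VBA sketch of that test follows; `IsYearParenthetical` is a hypothetical helper name, and "looks like a year" is assumed to mean four digits starting with 1 or 2.

```vba
Function IsYearParenthetical(ByVal sText As String) As Boolean
    ' Hypothetical helper, not part of any posted macro.
    ' Assumption: the 50-character limit includes both parentheses.
    If Len(sText) > 50 Then Exit Function
    If Left$(sText, 1) <> "(" Or Right$(sText, 1) <> ")" Then Exit Function
    ' Assumption: a "year" is a 1 or a 2 followed by three more digits.
    IsYearParenthetical = (sText Like "*[12]###*")
End Function

Sub TestIsYearParenthetical()
    ' The six examples from the question, printed to the Immediate window.
    Debug.Print IsYearParenthetical("(born in 1948)")                        ' expected: True
    Debug.Print IsYearParenthetical("(first noted in 1957 but not prior to that date)") ' expected: True
    Debug.Print IsYearParenthetical("(1968)")                                ' expected: True
    Debug.Print IsYearParenthetical("(156)")                                 ' expected: False
    Debug.Print IsYearParenthetical("(first noted in 1957 but not prior to that date or after 1999)") ' expected: False
    Debug.Print IsYearParenthetical("(no longer seen in the United States)") ' expected: False
End Sub
```

Note that this test would also accept a longer number such as (12345), because it only looks for four consecutive digits starting with 1 or 2.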
 

Graham Mayor

This is similar to your earlier question concerning 'memo', and a slightly
modified version of that solution would work. However, there is no way
that I can think of to differentiate accurately between a four-digit
number and a year, so to narrow the search we have to assume that a
year begins with a 1 or a 2.
In that case the following will work with your examples. As in the previous
version, it copies all the deleted strings, with their page numbers, to a new
document so that you can check the deletions before saving the document.

If all the dates are in the range 1900 - 2010 you could change the line
.Text = "\(*[12]{1}[0-9]{3}*\)"
to
.Text = "\(*[12]{1}[09]{1}[0-9]{2}*\)"
so that the second digit must be a 0 or a 9.

Sub DeleteYearPhrases()
    Dim oRng As Range
    Dim iStart As Integer
    Dim oDoc As Document
    Dim oNewDoc As Document

    Set oDoc = ActiveDocument
    ' New document that receives a log of every deleted string
    Set oNewDoc = Documents.Add
    Set oRng = oDoc.Range
    With oRng.Find
        ' "(" ... a four-digit number starting with 1 or 2 ... ")"
        .Text = "\(*[12]{1}[0-9]{3}*\)"
        Do While .Execute(Forward:=True, _
                          MatchWildcards:=True) = True
            ' Move the start of the match to its last opening parenthesis
            iStart = InStrRev(oRng.Text, "(")
            oRng.Start = oRng.Start + (iStart - 1)
            ' Delete only strings of 50 characters or fewer
            If Len(oRng) < 51 Then
                oNewDoc.Range.InsertAfter "Page " & _
                    oRng.Information(wdActiveEndPageNumber) & _
                    vbTab & oRng.Text & vbCr
                oRng.Delete
            End If
        Loop
    End With
End Sub
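
One caveat with the pattern above: in a Word wildcard search, `*` matches any run of characters, including closing parentheses, so a single match can start at one "(" and run through several parenthesized groups before it reaches a year. Trimming the match back to its last "(" can then leave a group that contains no year at all. The sketch below is one possible way around that, not a tested replacement: it matches one parenthesized group at a time and does the year and length checks in VBA. The sub name is hypothetical, and it keeps the assumption that a year starts with a 1 or a 2.

```vba
Sub DeleteYearParentheticalsStrict()
    ' Sketch only. Logs each deleted string with its page number to a
    ' new document, as the macro above does.
    Dim oDoc As Document
    Dim oNewDoc As Document
    Dim oRng As Range

    Set oDoc = ActiveDocument
    Set oNewDoc = Documents.Add
    Set oRng = oDoc.Range

    With oRng.Find
        .ClearFormatting
        ' "(" + one or more characters that are neither "(" nor ")" + ")"
        .Text = "\([!\(\)]@\)"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            ' Length includes both parentheses; year = [12] plus three digits
            If Len(oRng.Text) <= 50 And oRng.Text Like "*[12]###*" Then
                oNewDoc.Range.InsertAfter "Page " & _
                    oRng.Information(wdActiveEndPageNumber) & _
                    vbTab & oRng.Text & vbCr
                oRng.Delete
            Else
                ' Skip this group and continue searching after it
                oRng.Collapse wdCollapseEnd
            End If
        Loop
    End With
End Sub
```

Because the search text excludes "(" and ")" inside the group, nested parentheses are handled from the innermost pair outward. As with the original pattern, a longer number such as 12345 would still pass the year test. Run it on a copy of the document first and review the log before saving.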


--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP


<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 

Elfego Baca

Thank you for the macro. I think we are getting close but I keep getting an
error in the following line: iStart = InStrRev(oRng.Text, "(")

I ran the macro and this is what I get in the new document:

Page 2 (pain–pleasure)
Page 2 (gluteus medius muscle)
Page 3 (clinical)
Page 4 (pathological)
Page 6 (e.g., withdrawal)
Page 8 (or malignant)
Page 9 (IASP)

As you can see, none of these contains a year or any other four-digit
number, so apparently the macro is not working correctly. I had the same
problem with the previous macro that you wrote. I am using Word 2007.
--
Butch Cassidy



Graham Mayor

The macro works here. However, it occurs to me that you may not be using an
English installation of Word, and that your regional list separator character
may be a semicolon rather than a comma.

Try changing the line
iStart = InStrRev(oRng.Text, "(")
to
iStart = InStrRev(oRng.Text; "(")
i.e., replace the comma with a semicolon.

If that premise is correct, there is a further comma in the line below that
would probably need to be changed as well.

Do While .Execute(Forward:=True, _
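
A note of caution on the separator idea: as far as I know, VBA source code always uses a comma between arguments whatever the regional settings are, so a line such as `InStrRev(oRng.Text; "(")` would be rejected by the VBA editor. Where the list separator does matter is inside the wildcard search text, in range quantifiers such as `{1,4}`; the patterns in this thread use only `{1}` and `{3}`, so they should not be affected. The sketch below (the sub name is hypothetical) prints the separator Word is using and builds a quantifier with it.

```vba
Sub ShowListSeparator()
    ' Sketch: report the list separator and build a wildcard range
    ' quantifier with it. Output goes to the Immediate window.
    Dim sSep As String
    sSep = Application.International(wdListSeparator)
    Debug.Print "List separator: " & sSep
    ' One to four digits in a wildcard search: "[0-9]{1,4}" where the
    ' separator is a comma, "[0-9]{1;4}" where it is a semicolon.
    Debug.Print "[0-9]{1" & sSep & "4}"
End Sub
```

Another possible cause of an error on the `InStrRev` line, offered only as a guess: `iStart` is declared `As Integer`, so if a single match were longer than 32,767 characters the assignment would overflow. Declaring it `As Long` would rule that out.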

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP


<>>< ><<> ><<> <>>< ><<> <>>< <>><<>


 
