Using Microsoft VBScript Regular Expressions 5.5

Joel Finkel

Folks,

I have a VBA script that I use to convert articles that I receive in Word format to simple (and I mean simple) HTML. The script simply performs a long series of search-and-replaces by using the Selection.Find object.

However, there are some strings that I need to convert that cannot be parsed without using a more robust regular expression parser. For example, I need to find a single line that comprises 8 or fewer words and that is followed by only a single ^p and replace it with "^p<h3>the words</h3>^p".

I added a reference to the Microsoft VBScript Regular Expressions 5.5 library and created the following Sub:

Sub DoSectionHeaders(ByRef s As String)

Set re = New RegExp

re.Pattern = "\r(\w{1,8})\r{1}"
re.IgnoreCase = True
re.Global = True

s = re.Replace(s, "\r<h3>\1</h3>\r")

End Sub

My question: How do I pass the entire formatted text into this Sub so it can be processed?

I tried this:

Call DoSectionHeaders(Selection.FormattedText)

But Selection.FormattedText is an empty string.

Thanks in advance for all suggestions. Oh yes, I am using Word 2003.

Joel Finkel
(e-mail address removed)
 
Peter

How are you setting up the Selection?

-Peter
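Selection.FormattedText will be empty if nothing is selected, because the Selection is then only the insertion point. A minimal sketch of a hypothetical helper, TextToProcess, that falls back to the whole document in that case (the fallback rule is an assumption, not something stated in the thread):

```vba
' Hypothetical helper: returns the selected text as a Range, or the whole
' document when the Selection is only an insertion point (nothing selected).
Function TextToProcess() As Range
    If Selection.Type = wdSelectionIP Then
        ' Nothing selected: Selection.FormattedText would be empty here
        Set TextToProcess = ActiveDocument.Content
    Else
        Set TextToProcess = Selection.Range
    End If
End Function
```

The returned Range can be handed to a Sub that takes a Range parameter, such as the Range version of DoSectionHeaders later in the thread.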

Joel Finkel

I have done this:

Dim rng As Range
Set rng = ActiveDocument.Range(Start:=0, End:=ActiveDocument.Content.End)
Call DoSectionHeaders(rng.FormattedText)

The Sub now gets the formatted text, and the RegExp works. However, when the call returns, rng.FormattedText is unchanged, even though I explicitly pass it by reference.

/Joel Finkel
(e-mail address removed)
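The unchanged text has a plain VBA explanation: when a ByRef argument is an expression rather than a variable (rng.FormattedText is a property value, converted to a String here), VBA evaluates it into a temporary and passes that temporary by reference, so the Sub's changes never reach the document. Wrapping a variable in extra parentheses has the same effect. A minimal sketch; AppendMarker and ByRefDemo are hypothetical names used only for illustration:

```vba
' Hypothetical helper: appends "!" to the string it is given ByRef
Sub AppendMarker(ByRef s As String)
    s = s & "!"
End Sub

' Returns "abc!": only the first call changes t
Function ByRefDemo() As String
    Dim t As String
    t = "abc"
    AppendMarker t      ' a variable is passed by reference and updated
    AppendMarker (t)    ' the parentheses make an expression, so only a temporary copy changes
    ByRefDemo = t
End Function
```

With the String version of DoSectionHeaders, that would mean copying the text into a variable first (s = rng.Text, then DoSectionHeaders s, then rng.Text = s). Assigning to Range.Text replaces the text and drops character formatting, which is one reason to pass a Range instead.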

Peter

Pass it as a Range, not a String:

Dim rng As Range
Set rng = ActiveDocument.Range(Start:=0, End:=ActiveDocument.Content.End)
Call DoSectionHeaders(rng.FormattedText)

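Sub DoSectionHeaders(ByRef r As Range)

Dim re As RegExp
Set re = New RegExp

re.Pattern = "\r(\w{1,8})\r{1}"
re.IgnoreCase = True
re.Global = True

' VBScript RegExp writes backreferences as $1 (not \1) in the replacement,
' and "\r" is not an escape in a VBA string, so build it with vbCr
r.Text = re.Replace(r.Text, vbCr & "<h3>$1</h3>" & vbCr)

End Sub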

Doesn't matter if you pass ByRef or ByVal if you use the Range object, too.
I think that applies to all objects, but my theory is a little rusty.

hth,

-Peter
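The rule does hold for objects in general: a ByVal object parameter receives a copy of the reference, not a copy of the object, so the Sub can still change the object (as r.Text = ... does). ByRef only makes a difference when the Sub reassigns the parameter with Set. A small demonstration using a Collection instead of a Range so that it runs outside Word; all names are hypothetical:

```vba
' Hypothetical helpers: each adds an item, then rebinds its parameter
Sub AddThenRebindByVal(ByVal c As Collection)
    c.Add "item"            ' changes the caller's Collection
    Set c = New Collection  ' rebinds only this Sub's copy of the reference
End Sub

Sub AddThenRebindByRef(ByRef c As Collection)
    c.Add "item"            ' changes the caller's Collection
    Set c = New Collection  ' rebinds the caller's variable as well
End Sub

' Returns "1,0"
Function ObjectPassingDemo() As String
    Dim a As Collection, b As Collection
    Set a = New Collection
    Set b = New Collection
    AddThenRebindByVal a    ' a still refers to its Collection, which now holds 1 item
    AddThenRebindByRef b    ' b now refers to the new, empty Collection
    ObjectPassingDemo = a.Count & "," & b.Count
End Function
```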

Joel Finkel

Brilliant! Where do I send the Guinness?

Of course, now I have problems with the RegExp, but that will have to wait
until tonight.

Thanks.

/Joel
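On the RegExp itself, a few VBScript RegExp 5.5 details matter here: \w{1,8} matches one to eight word characters, not eight words; the replacement string uses $1 rather than \1; and "\r" inside a VBA string literal is two ordinary characters, so a replacement needs vbCr. A minimal string-in, string-out sketch, assuming the reference to Microsoft VBScript Regular Expressions 5.5 is set and that a "word" is any run of non-blank characters (Word's own Words collection counts differently). TagShortLines is a hypothetical name. The lookahead leaves the trailing paragraph mark unconsumed, so two short lines in a row can both match.

```vba
' Requires a reference to Microsoft VBScript Regular Expressions 5.5.
' Wraps each line of 1 to 8 blank-separated "words" in <h3>...</h3> when it
' is followed by a single paragraph mark (vbCr) rather than two in a row.
Function TagShortLines(ByVal s As String) As String
    Dim re As RegExp
    Set re = New RegExp

    ' (^|\r)                  start of text or the preceding paragraph mark (group 1)
    ' (\S+(?:[ \t]+\S+){0,7}) one to eight runs of non-blank characters (group 2)
    ' [ \t]*                  optional trailing blanks, dropped from the output
    ' (?=\r(?!\r))            one paragraph mark follows, not two; it is not consumed
    re.Pattern = "(^|\r)(\S+(?:[ \t]+\S+){0,7})[ \t]*(?=\r(?!\r))"
    re.Global = True

    TagShortLines = re.Replace(s, "$1<h3>$2</h3>")
End Function
```

In the Range version of the Sub this could be used as r.Text = TagShortLines(r.Text), with the same caveat that assigning Range.Text discards character formatting.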

Peter

Mmm, Guinness. Good stuff, served correctly.

Looking at what you're doing, you _might_ be able to do away with the regexp and parse your document by paragraph.
Perhaps the structure of your other processing won't fit with that, but it might make it a bit easier in this case (testing for the next paragraph made it messy):

Dim para As Paragraph
Dim isHeading As Boolean
For Each para In ActiveDocument.Paragraphs
    With para.Range.FormattedText
        ' one thing to remember about a paragraph is that
        ' it includes the ending paragraph mark as a word
        If .Words.Count > 1 And .Words.Count <= 9 Then
            ' a short paragraph is a heading if it is the last paragraph
            ' or if the next paragraph is not empty
            If para.Next Is Nothing Then
                isHeading = True
            Else
                isHeading = (para.Next.Range.Words.Count > 1)
            End If
            If isHeading Then
                With .Words
                    ' insert the closing tag first so that inserting the
                    ' opening tag cannot shift the position of the last word
                    .Item(.Count - 1).InsertAfter "</h3>"
                    .Item(1).InsertBefore "<h3>"
                End With
            End If
        End If
    End With
Next para

hth,

-Peter
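One detail behind the .Words.Count test: Word's Words collection counts punctuation marks as separate words, in addition to the paragraph mark, so a heading such as "Summary:" uses three slots rather than two. If the count shown by Word's Word Count dialog is what you want, a possible alternative is Range.ComputeStatistics with wdStatisticWords. A minimal sketch; IsShortParagraph is a hypothetical helper, and the 1-to-maxWords rule is an assumption that mirrors the test in the loop above:

```vba
' Hypothetical helper: True when the paragraph contains 1 to maxWords words,
' counted the way Word's Word Count dialog counts them (punctuation and the
' paragraph mark are not counted as separate words).
Function IsShortParagraph(ByVal para As Paragraph, ByVal maxWords As Long) As Boolean
    Dim n As Long
    n = para.Range.ComputeStatistics(wdStatisticWords)
    IsShortParagraph = (n >= 1 And n <= maxWords)
End Function
```

In the loop, If IsShortParagraph(para, 8) Then could stand in for the .Words.Count test, and the next-paragraph check could become para.Next.Range.ComputeStatistics(wdStatisticWords) > 0.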
 
Joel Finkel

Peter,

I can't thank you enough. Your example was close enough to teach me well,
and I was able to modify it to do just what I need to do.

Shouldn't serve that Guinness too cold. Room temperature is cold enough.

/Joel Finkel
(e-mail address removed)

Peter

I can't thank you enough. Your example was close enough to teach me well,
and I was able to modify it to do just what I need to do.

Glad it worked for you. :)
Shouldn't serve that Guinness too cold. Room temperature is cold enough.

Definitely. And tap is the only way to go.

-Peter