Counting number of words in a sentence and highlighting or commenting

BHW

Hi,

I found some old code, from Helmut of Bavaria I believe, that is
supposed to find long sentences in a document:

Sub test002()
    Dim rDcm As Range ' the document's main story range
    Dim oWrd As Object ' a word
    Dim oSnt As Object ' a sentence
    Dim oPrg As Paragraph ' a paragraph
    Dim lWrd As Long ' a counter for words
    Dim lSnt As Long ' a counter for sentences
    Dim lPrg As Long ' a counter for paragraphs

    Set rDcm = ActiveDocument.Range

    For Each oPrg In rDcm.Paragraphs
        lPrg = lPrg + 1
        lSnt = 0
        For Each oSnt In oPrg.Range.Sentences
            If Len(oSnt) > 50 Then ' length in characters, not words
                oSnt.Comments.Add _
                    Range:=oSnt, _
                    Text:="sentence too long"
            End If
            lSnt = lSnt + 1
            lWrd = 0
            ' For Each oWrd In oSnt.Words
            '     lWrd = lWrd + 1
            '     If Len(oWrd) = 1 Then
            '         oWrd.Comments.Add _
            '             Range:=oWrd, _
            '             Text:="1 character word"
            '     End If
            ' Next
        Next
        Debug.Print lPrg, lSnt, lWrd
        ' paragraph, sentence, words
    Next
End Sub

I ran this macro and it seems to comment *every* sentence, regardless
of the length. Does anyone know why? Has anyone improved on this code?
I'd like to find long sentences in a document and consider revising
them. Either commenting or highlighting them would be fine.

Thanks, Bruce
 
Greg Maxey

Bruce,

Yes, that is Helmut's code, and it works here (as well as something like this
can work). Are you sure that all of your sentences are not over 50
characters in length? That count includes everything: spaces, punctuation,
and letters.

Improved on? Not really. It can be shortened:

Sub test002()
    Dim rDcm As Range ' the document's main story range
    Dim oSnt As Object ' a sentence
    Dim oPrg As Paragraph ' a paragraph
    Dim lSnt As Long ' a counter for sentences
    Dim lPrg As Long ' a counter for paragraphs
    Set rDcm = ActiveDocument.Range
    For Each oPrg In rDcm.Paragraphs
        lPrg = lPrg + 1
        lSnt = 0
        For Each oSnt In oPrg.Range.Sentences
            lSnt = lSnt + 1
            If Len(oSnt) > 15 Then ' length in characters
                oSnt.Comments.Add oSnt, "sentence too long"
                Debug.Print lPrg, lSnt
            End If
        Next
    Next
End Sub

Part of the problem is Word's misconception of "Sentence." Consider the
following three paragraphs each containing 1 sentence:


I came, I saw.

I came, I saw, I conquered.

Mr. X came, Mr. X saw, Mr. X conqoured.


If I set the condition to > 15 and run the code, I would expect that both
the second and third sentences would be commented. Sentence 2 is, but
sentence 3 is not. Why? Sentence 3 is clearly longer than sentence 2. The
reason is that Word treats every "." used in abbreviations as a sentence
stop. You can see this by running the following code:

MsgBox ActiveDocument.Range.Sentences.Count

While there are only 3 sentences as you and I understand them, the result is
6 because Word treats every "." as a sentence stop.
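
To see exactly where Word draws those boundaries, a minimal sketch along these lines prints each item of the Sentences collection to the Immediate window. The procedure name ListSentences is only a placeholder and is not part of Helmut's code.

```vba
Sub ListSentences()
    ' Hypothetical helper: prints every "sentence" Word finds in the
    ' active document, with its index and its length in characters.
    Dim oSnt As Range
    Dim i As Long
    For Each oSnt In ActiveDocument.Range.Sentences
        i = i + 1
        Debug.Print i, Len(oSnt.Text), oSnt.Text
    Next oSnt
End Sub
```

Each printed item is one range that the macro above would test against the length limit, so a fragment such as "Mr. " is measured on its own.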
 
BHW

Ah, 50 characters! I am concerned about the number of *words* in a
sentence, not the number of characters, although the latter matters,
too. Is there a way to get the macro to work with the number of words
rather than number of characters? Thanks much, and thanks also for the
pointer re periods.

Bruce
 
Greg Maxey

Sort of, but with some of the same issues. Word counts commas,
periods, and other non-word things as words. You might use something like:

Sub test002()
    Dim rDcm As Range ' the document's main story range
    Dim oSnt As Object ' a sentence
    Dim oPrg As Paragraph ' a paragraph
    Dim lSnt As Long ' a counter for sentences
    Dim lPrg As Long ' a counter for paragraphs
    Dim oWord As Range
    Dim Cnt As Long
    Set rDcm = ActiveDocument.Range
    For Each oPrg In rDcm.Paragraphs
        lPrg = lPrg + 1
        lSnt = 0
        For Each oSnt In oPrg.Range.Sentences
            lSnt = lSnt + 1
            Cnt = 0
            For Each oWord In oSnt.Words
                ' Items in Words carry trailing spaces (". ", "? "), so trim
                ' before testing; count only items that are not punctuation,
                ' a line break, or a paragraph mark.
                If InStr(".,?!" & Chr(11) & Chr(13), Trim(oWord.Text)) = 0 Then
                    Cnt = Cnt + 1
                End If
            Next oWord
            If Cnt > 4 Then
                oSnt.Comments.Add oSnt, "sentence too long"
                Debug.Print lPrg, lSnt
            End If
        Next
    Next
End Sub
 
Fumei2 via OfficeKB.com

Greg: " Word counts commas, periods, and other non-word things as words. "

Technically speaking, this is not completely accurate.

Word itself does NOT count commas, periods etc as words. It has a built-in
delimiter function.

VBA does count those as "words".

If you have:

Mr. X came, Mr. X saw, Mr. X conqoured.

and you use Tools > Word Count, you get a word count of 9.

Mr.
X
came
Mr.
X
saw
Mr.
X
conquered

As does (assuming it is Paragraph(4)):

Dim r As Range
Set r = ActiveDocument.Paragraphs(4).Range
MsgBox r.ComputeStatistics(wdStatisticWords) ' displays 9

HOWEVER,

Set r = ActiveDocument.Paragraphs(4).Range
MsgBox r.Words.Count

gives 16.

1 Mr
2 .
3 X
4 came
5 ,
6 Mr
7 .
8 X
9 saw
10 ,
11 Mr
12 .
13 X
14 conquered
15 .
16 <p> a paragraph mark

So, going through one route (the menu route in Word itself) returns the
"right" number, as does .ComputeStatistics. Both use an internal delimiter
function to filter out the commas, periods, etc. A straight VBA Words.Count does
not use those internal delimiters.
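
For anyone who wants to see the two counts side by side, a rough sketch like the following prints both numbers and then each item in the Words collection. The procedure name CompareWordCounts is made up for illustration, and Paragraphs(4) is the same assumption as above.

```vba
Sub CompareWordCounts()
    ' Hypothetical helper: compares the two ways of counting words
    ' for the fourth paragraph of the active document.
    Dim r As Range
    Dim oWrd As Range

    Set r = ActiveDocument.Paragraphs(4).Range

    Debug.Print "ComputeStatistics:", r.ComputeStatistics(wdStatisticWords)
    Debug.Print "Words.Count:", r.Words.Count

    ' Brackets make trailing spaces and lone punctuation marks visible.
    For Each oWrd In r.Words
        Debug.Print "[" & oWrd.Text & "]"
    Next oWrd
End Sub
```

The bracketed list in the Immediate window shows which punctuation marks and paragraph marks account for the difference.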

 
Fumei2 via OfficeKB.com

This is also confusing with Sentences.

Mr. X came, Mr. X saw, Mr. X conquered. Mr. X came, Mr. X saw, Mr. X
conquered. Mr. X came, Mr. X saw, Mr. X conquered.

We, as humans, can parse that as THREE sentences. Word cannot. That is why
there is no Sentences count with Tools > Word Count. Paragraphs, yes. Lines,
yes. Words, yes. But not Sentences.

If you use VBA to get a Sentences.Count, it = 12.

Mr.
X came, Mr.
X saw, Mr.
X conquered.
Mr.
X came, Mr.
X saw, Mr.
X conquered.
Mr.
X came, Mr.
X saw, Mr.
X conquered.

This should emphasize something a surprising number of people appear to miss.
Word does not "understand", or "know" ANYTHING about the text in the document.


"Understanding" and "askdgakhergwk" are exactly the same - in terms of
"meaning".

The only "meaning" comes from algorithms built up to do specific tasks. And
even with that there is still no real meaning. After all, if you put
"askdgakhergwk" into the dictionary, Word will accept it as a proper word.

I have had a number of people honestly surprised that an input of "Mrr. Jogn
Glenn" in a text formfield (or a textbox on a userform) is accepted. They
say it does not "make sense". Hmmmm. Except Word has no sense. ALL text is
just that...text; ASCII characters. One at a time.

You can, of course, get the "right" Sentence count - 3 - from (again assuming
it is Paragraph(4)):

Mr. X came, Mr. X saw, Mr. X conquered. Mr. X came, Mr. X saw, Mr. X
conquered. Mr. X came, Mr. X saw, Mr. X conquered.

Sub MySentenceCount()
    Dim r As Range
    Dim oSnt As Range
    Dim counter As Long

    Set r = ActiveDocument.Paragraphs(4).Range

    For Each oSnt In r.Sentences
        ' skip fragments that start with "Mr" or end with "Mr. "
        If Left(oSnt.Text, 2) <> "Mr" Then
            If Right(oSnt.Text, 4) <> "Mr. " Then
                counter = counter + 1
            End If
        End If
    Next
    MsgBox counter ' displays 3
End Sub

but only by hard-coding the logic regarding "Mr."
 
Greg Maxey

Fumei2,

Right, of course. "VBA counts ..." would have been more correct.

Bonus Tip: If one learns to apply it and pay attention to its flags, Word
will also help improve spelling: "conqoured"

The original error was Greg's, of course.

 
Greg Maxey

In view of Fumei2's valid observations, this may be simpler:

Sub test002()
    Dim rDcm As Range ' the document's main story range
    Dim oSen As Range ' a sentence
    Dim oPrg As Paragraph ' a paragraph
    Dim lSnt As Long ' a counter for sentences
    Dim lPrg As Long ' a counter for paragraphs
    Set rDcm = ActiveDocument.Range
    For Each oPrg In rDcm.Paragraphs
        lPrg = lPrg + 1
        lSnt = 0
        For Each oSen In oPrg.Range.Sentences
            lSnt = lSnt + 1
            If oSen.ComputeStatistics(wdStatisticWords) > 4 Then
                oSen.Comments.Add oSen, "sentence too long"
                Debug.Print lPrg, lSnt
            End If
        Next
    Next
End Sub

It will still fall flat if it encounters periods after abbreviations.
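
One possible way around the abbreviation problem, offered only as a sketch: glue a fragment onto the next one whenever it ends in a known abbreviation, and test the word count of the merged range. The procedure name HighlightLongSentences, the MAX_WORDS limit of 25, and the abbreviation list are all assumptions chosen for illustration. The sketch highlights instead of commenting, since Bruce said either would do.

```vba
Sub HighlightLongSentences()
    ' Sketch only: MAX_WORDS and the abbreviation list are assumptions.
    Const MAX_WORDS As Long = 25
    Dim vAbbr As Variant
    Dim oPrg As Paragraph
    Dim oSnt As Range
    Dim rFull As Range ' fragments merged into one "real" sentence
    Dim sTxt As String
    Dim i As Long
    Dim bAbbr As Boolean

    vAbbr = Array("Mr.", "Mrs.", "Ms.", "Dr.")

    For Each oPrg In ActiveDocument.Paragraphs
        Set rFull = Nothing
        For Each oSnt In oPrg.Range.Sentences
            ' Start a new merged range or extend the current one.
            If rFull Is Nothing Then
                Set rFull = oSnt.Duplicate
            Else
                rFull.End = oSnt.End
            End If

            ' Does this fragment end in a known abbreviation?
            sTxt = Trim(oSnt.Text)
            bAbbr = False
            For i = LBound(vAbbr) To UBound(vAbbr)
                If Right(sTxt, Len(vAbbr(i))) = vAbbr(i) Then bAbbr = True
            Next i

            ' If not, treat the merged range as a complete sentence.
            If Not bAbbr Then
                If rFull.ComputeStatistics(wdStatisticWords) > MAX_WORDS Then
                    rFull.HighlightColorIndex = wdYellow
                End If
                Set rFull = Nothing
            End If
        Next oSnt
    Next oPrg
End Sub
```

This is still hard-coded logic in the same spirit as the "Mr." test earlier in the thread. It will miss any abbreviation not in the list, and it will wrongly merge a sentence that genuinely ends in one of them.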
 
