Help with word/phrase counting macro?

J

jmmiller

I've created documents where every sentence is its own section and
want to count the number of times a given set of words/phrases appears
in each section of the given document. The following macro counts the
words/phrases and provides an output at the end of the document (to be
copied and pasted into excel where I will be looking at combinations
of the words and phrases).

I would like to have a larger set of search words/phrases (15-25) and
do this on 200 documents of 100+ pages each. Currently, it takes hours
for the macro to work on a given document and I think this is because
it of how it is looping through the sections, but am not sure.

Is there a better way to be doing this? Any suggestions or ideas? Is
there a better way to do this where it would search between periods
rather than using section breaks to make it faster or another way for
it to search through the document by sentence?


Any help is much appreciated.




Sub JMS_WordCount()
Application.Windows(ActiveDocument).View = wdNormalView
Application.ScreenUpdating = False
Application.Options.Pagination = False
Application.ActiveDocument.ShowGrammaticalErrors = False
Application.ActiveDocument.ShowSpellingErrors = False

Dim CurrPane As Pane
Set CurrPane = Application.Documents(1).Windows(1).ActivePane

Dim myRange As Range
Dim myWord As Range
Dim i As Long

Dim Unemploy As Long
Dim Underemp As Long
Dim Inflation As Long


'Selects the active document, collapse to the end,
'and puts a final section break at the end
With ActiveDocument.Range
.Collapse wdCollapseEnd
.InsertBreak Type:=wdSectionBreakContinuous
End With

'Identifies total number of sections as Total minus 1
For i = 1 To ActiveDocument.Sections.Count - 1
''''''''''''''''''''''''' UMEMPLOY
Set myRange = ActiveDocument.Sections(i).Range
Unemploy = 0
Selection.HomeKey wdStory 'Story refers to all that's in a
distinct unit
'HomeKey refers to the start of the
specified unit
'Here, that unit is Section(i)
Selection.Find.ClearFormatting
With Selection.Find 'Selection here is Section(i)
Do While .Execute(FindText:="<[Uu]nemploy",
MatchWildcards:=True, _
Wrap:=wdFindStop, Forward:=True) = True
Set myWord = Selection.Range 'Select the word
range when found
If myWord.InRange(myRange) = True Then 'If
selected word is in search range, then
Unemploy = Unemploy + 1 'count it
End If
Loop
End With
'''''''''''''''''''''''' UNDEREMPLOY
Set myRange = ActiveDocument.Sections(i).Range
Underemp = 0
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="<([Uu]nderemploy)",
MatchWildcards:=True, _
Wrap:=wdFindStop, Forward:=True) = True
Set myWord = Selection.Range
If myWord.InRange(myRange) = True Then
Underemp = Underemp + 1
End If
Loop
End With
'''''''''''''''''''''''' INFLATION
Set myRange = ActiveDocument.Sections(i).Range
Inflation = 0
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="([Ii]nflation)",
MatchWildcards:=True, _
Wrap:=wdFindStop, Forward:=True) = True
Set myWord = Selection.Range
If myWord.InRange(myRange) = True Then
Inflation = Inflation + 1
End If
Loop
End With
''''''''''''''''''''''''''
ActiveDocument.Range.InsertAfter Unemploy & vbTab & Underemp &
vbTab & Inflation & vbCr

Next i
End Sub
 
J

Jean-Guy Marcil

I've created documents where every sentence is its own section and
want to count the number of times a given set of words/phrases appears
in each section of the given document. The following macro counts the
words/phrases and provides an output at the end of the document (to be
copied and pasted into excel where I will be looking at combinations
of the words and phrases).

I would like to have a larger set of search words/phrases (15-25) and
do this on 200 documents of 100+ pages each. Currently, it takes hours
for the macro to work on a given document and I think this is because
it of how it is looping through the sections, but am not sure.

Is there a better way to be doing this? Any suggestions or ideas? Is
there a better way to do this where it would search between periods
rather than using section breaks to make it faster or another way for
it to search through the document by sentence?


Any help is much appreciated.




Sub JMS_WordCount()
Application.Windows(ActiveDocument).View = wdNormalView
Application.ScreenUpdating = False
Application.Options.Pagination = False
Application.ActiveDocument.ShowGrammaticalErrors = False
Application.ActiveDocument.ShowSpellingErrors = False

Dim CurrPane As Pane
Set CurrPane = Application.Documents(1).Windows(1).ActivePane

Dim myRange As Range
Dim myWord As Range
Dim i As Long

Dim Unemploy As Long
Dim Underemp As Long
Dim Inflation As Long


'Selects the active document, collapse to the end,
'and puts a final section break at the end
With ActiveDocument.Range
.Collapse wdCollapseEnd
.InsertBreak Type:=wdSectionBreakContinuous
End With

'Identifies total number of sections as Total minus 1
For i = 1 To ActiveDocument.Sections.Count - 1
''''''''''''''''''''''''' UMEMPLOY
Set myRange = ActiveDocument.Sections(i).Range
Unemploy = 0
Selection.HomeKey wdStory 'Story refers to all that's in a
distinct unit
'HomeKey refers to the start of the
specified unit
'Here, that unit is Section(i)
Selection.Find.ClearFormatting
With Selection.Find 'Selection here is Section(i)
Do While .Execute(FindText:="<[Uu]nemploy",
MatchWildcards:=True, _
Wrap:=wdFindStop, Forward:=True) = True
Set myWord = Selection.Range 'Select the word
range when found
If myWord.InRange(myRange) = True Then 'If
selected word is in search range, then
Unemploy = Unemploy + 1 'count it
End If
Loop
End With
'''''''''''''''''''''''' UNDEREMPLOY
Set myRange = ActiveDocument.Sections(i).Range
Underemp = 0
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="<([Uu]nderemploy)",
MatchWildcards:=True, _
Wrap:=wdFindStop, Forward:=True) = True
Set myWord = Selection.Range
If myWord.InRange(myRange) = True Then
Underemp = Underemp + 1
End If
Loop
End With
'''''''''''''''''''''''' INFLATION
Set myRange = ActiveDocument.Sections(i).Range
Inflation = 0
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="([Ii]nflation)",
MatchWildcards:=True, _
Wrap:=wdFindStop, Forward:=True) = True
Set myWord = Selection.Range
If myWord.InRange(myRange) = True Then
Inflation = Inflation + 1
End If
Loop
End With
''''''''''''''''''''''''''
ActiveDocument.Range.InsertAfter Unemploy & vbTab & Underemp &
vbTab & Inflation & vbCr

Next i
End Sub

There is something I do not understand with your code.

You search for the presence of three words, but by iterating sections.
Yet, each word search searches the whole document.

So, if you have 10 sections, you will search the whole document 30 times.
Also, in each section search, you count only the words that are found within
the current section number (i), but you end up searching all the
sections,thus ocunting all the words that re found in the whole document. So
why iterate the sections? Just do three document searches and count how many
times each word is found, then add the new section and write down the results.

No? Am I missing something?
 
J

jmmiller

I've created documents where every sentence is its own section and
want to count the number of times a given set of words/phrases appears
in each section of the given document. The following macro counts the
words/phrases and provides an output at the end of the document (to be
copied and pasted into excel where I will be looking at combinations
of the words and phrases).
I would like to have a larger set of search words/phrases (15-25) and
do this on 200 documents of 100+ pages each. Currently, it takes hours
for the macro to work on a given document and I think this is because
it of how it is looping through the sections, but am not sure.
Is there a better way to be doing this? Any suggestions or ideas? Is
there a better way to do this where it would search between periods
rather than using section breaks to make it faster or another way for
it to search through the document by sentence?
Any help is much appreciated.
Sub JMS_WordCount()
    Application.Windows(ActiveDocument).View = wdNormalView
    Application.ScreenUpdating = False
    Application.Options.Pagination = False
    Application.ActiveDocument.ShowGrammaticalErrors = False
    Application.ActiveDocument.ShowSpellingErrors = False
    Dim CurrPane As Pane
    Set CurrPane = Application.Documents(1).Windows(1).ActivePane
    Dim myRange As Range
    Dim myWord As Range
    Dim i As Long
    Dim Unemploy As Long
    Dim Underemp As Long
    Dim Inflation As Long
'Selects the active document, collapse to the end,
'and puts a final section break at the end
    With ActiveDocument.Range
        .Collapse wdCollapseEnd
        .InsertBreak Type:=wdSectionBreakContinuous
    End With
'Identifies total number of sections as Total minus 1
    For i = 1 To ActiveDocument.Sections.Count - 1
''''''''''''''''''''''''' UMEMPLOY
        Set myRange = ActiveDocument.Sections(i).Range
        Unemploy = 0
        Selection.HomeKey wdStory 'Story refers to all that's in a
distinct unit
                                  'HomeKey refers to the start of the
specified unit
                                  'Here, that unit is Section(i)
        Selection.Find.ClearFormatting
        With Selection.Find 'Selection here is Section(i)
            Do While .Execute(FindText:="<[Uu]nemploy",
MatchWildcards:=True, _
                Wrap:=wdFindStop, Forward:=True) = True
                    Set myWord = Selection.Range 'Select the word
range when found
                    If myWord.InRange(myRange) = True Then 'If
selected word is in search range, then
                        Unemploy = Unemploy +1   'count it
                    End If
            Loop
        End With
'''''''''''''''''''''''' UNDEREMPLOY
        Set myRange = ActiveDocument.Sections(i).Range
        Underemp = 0
        Selection.HomeKey wdStory
        Selection.Find.ClearFormatting
        With Selection.Find
            Do While .Execute(FindText:="<([Uu]nderemploy)",
MatchWildcards:=True, _
                Wrap:=wdFindStop, Forward:=True) = True
                    Set myWord = Selection.Range
                    If myWord.InRange(myRange) = True Then
                        Underemp = Underemp +1
                    End If
            Loop
        End With
'''''''''''''''''''''''' INFLATION
        Set myRange = ActiveDocument.Sections(i).Range
        Inflation = 0
        Selection.HomeKey wdStory
        Selection.Find.ClearFormatting
        With Selection.Find
            Do While .Execute(FindText:="([Ii]nflation)",
MatchWildcards:=True, _
                Wrap:=wdFindStop, Forward:=True) = True
                    Set myWord = Selection.Range
                    If myWord.InRange(myRange) = True Then
                        Inflation = Inflation+ 1
                    End If
            Loop
        End With
''''''''''''''''''''''''''
        ActiveDocument.Range.InsertAfter Unemploy & vbTab & Underemp &
vbTab & Inflation & vbCr
    Next i
End Sub

There is something I do not understand with your code.

You search for the presence of three words, but by iterating sections.
Yet, each word search searches the whole document.

So, if you have 10 sections, you will search the whole document 30 times.
Also, in each section search, you count only the words that are found within
the current section number (i), but you end up searching all the
sections,thus ocunting all the words that re found in the whole document.So
why iterate the sections? Just do three document searches and count how many
times each word is found, then add the new section and write down the results.

No? Am I missing something?


Basically I am trying to find the number of times per sentence that
each of the words appears; I am interested in more than just the
number of times a given word or phase appears in the entire document.
So the output at the end should be (and currently is) broken down by
each word in each section (in my case, by each sentence). If I were to
copy and paste the output into excel and sum each column, each sum
would be the number of times the word or phrase appeared in that
document, but again, I am interested in more than that - specifically,
the number of times different combinations of words and phrases appear
in the same sentence.

So if I have 3 search words and 10 sections, I want to see how many
times each word appears in each section, not just how many times in
the document the searched words appear. (Also note that there are
thousands of sentences (thus, sections) per document, more like 15-25
search words/phrases, and 200+ documents).
 
D

Doug Robbins - Word MVP

Try the following:

Dim i As Long, j As Long
Dim rtable As Table
Dim frange As Range
Dim fword
Dim Score As Long
Dim SectNum As Long
With ActiveDocument
Set frange = .Range
With frange
.Collapse wdCollapseEnd
.InsertBreak wdSectionBreakContinuous
End With
Set rtable = .Tables.Add(frange, .Sections.Count, 4)
With rtable
.Cell(1, 1).Range.Text = "Section"
.Cell(1, 2).Range.Text = "Unemploy"
.Cell(1, 3).Range.Text = "Underemploy"
.Cell(1, 4).Range.Text = "Inflation"
For i = 2 To .Rows.Count
.Cell(i, 1).Range.Text = i - 1
Next i
End With
End With
fword = Split("Unemploy|Underemploy|Inflation", "|")
For j = 0 To UBound(fword)
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
SectNum = 1
Score = 0
With Selection.Find
Do While .Execute(Findtext:=fword(j), Forward:=True, _
MatchWildcards:=False, Wrap:=wdFindStop, MatchCase:=False) =
True
n = Selection.Information(wdActiveEndSectionNumber)
If n <> SectNum Then
rtable.Cell(SectNum + 1, j + 2).Range.Text = Score
SectNum = n
Score = 1
Else
Score = Score + 1
End If
Loop
End With
Next j

--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

I've created documents where every sentence is its own section and
want to count the number of times a given set of words/phrases appears
in each section of the given document. The following macro counts the
words/phrases and provides an output at the end of the document (to be
copied and pasted into excel where I will be looking at combinations
of the words and phrases).
I would like to have a larger set of search words/phrases (15-25) and
do this on 200 documents of 100+ pages each. Currently, it takes hours
for the macro to work on a given document and I think this is because
it of how it is looping through the sections, but am not sure.
Is there a better way to be doing this? Any suggestions or ideas? Is
there a better way to do this where it would search between periods
rather than using section breaks to make it faster or another way for
it to search through the document by sentence?
Any help is much appreciated.
Sub JMS_WordCount()
Application.Windows(ActiveDocument).View = wdNormalView
Application.ScreenUpdating = False
Application.Options.Pagination = False
Application.ActiveDocument.ShowGrammaticalErrors = False
Application.ActiveDocument.ShowSpellingErrors = False
Dim CurrPane As Pane
Set CurrPane = Application.Documents(1).Windows(1).ActivePane
Dim myRange As Range
Dim myWord As Range
Dim i As Long
Dim Unemploy As Long
Dim Underemp As Long
Dim Inflation As Long
'Selects the active document, collapse to the end,
'and puts a final section break at the end
With ActiveDocument.Range
.Collapse wdCollapseEnd
.InsertBreak Type:=wdSectionBreakContinuous
End With
'Identifies total number of sections as Total minus 1
For i = 1 To ActiveDocument.Sections.Count - 1
''''''''''''''''''''''''' UMEMPLOY
Set myRange = ActiveDocument.Sections(i).Range
Unemploy = 0
Selection.HomeKey wdStory 'Story refers to all that's in a
distinct unit
'HomeKey refers to the start of the
specified unit
'Here, that unit is Section(i)
Selection.Find.ClearFormatting
With Selection.Find 'Selection here is Section(i)
Do While .Execute(FindText:="<[Uu]nemploy",
MatchWildcards:=True, _
Wrap:=wdFindStop, Forward:=True) = True
Set myWord = Selection.Range 'Select the word
range when found
If myWord.InRange(myRange) = True Then 'If
selected word is in search range, then
Unemploy = Unemploy + 1 'count it
End If
Loop
End With
'''''''''''''''''''''''' UNDEREMPLOY
Set myRange = ActiveDocument.Sections(i).Range
Underemp = 0
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="<([Uu]nderemploy)",
MatchWildcards:=True, _
Wrap:=wdFindStop, Forward:=True) = True
Set myWord = Selection.Range
If myWord.InRange(myRange) = True Then
Underemp = Underemp + 1
End If
Loop
End With
'''''''''''''''''''''''' INFLATION
Set myRange = ActiveDocument.Sections(i).Range
Inflation = 0
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="([Ii]nflation)",
MatchWildcards:=True, _
Wrap:=wdFindStop, Forward:=True) = True
Set myWord = Selection.Range
If myWord.InRange(myRange) = True Then
Inflation = Inflation + 1
End If
Loop
End With
''''''''''''''''''''''''''
ActiveDocument.Range.InsertAfter Unemploy & vbTab & Underemp &
vbTab & Inflation & vbCr
Next i
End Sub

There is something I do not understand with your code.

You search for the presence of three words, but by iterating sections.
Yet, each word search searches the whole document.

So, if you have 10 sections, you will search the whole document 30 times.
Also, in each section search, you count only the words that are found
within
the current section number (i), but you end up searching all the
sections,thus ocunting all the words that re found in the whole document.
So
why iterate the sections? Just do three document searches and count how
many
times each word is found, then add the new section and write down the
results.

No? Am I missing something?


Basically I am trying to find the number of times per sentence that
each of the words appears; I am interested in more than just the
number of times a given word or phase appears in the entire document.
So the output at the end should be (and currently is) broken down by
each word in each section (in my case, by each sentence). If I were to
copy and paste the output into excel and sum each column, each sum
would be the number of times the word or phrase appeared in that
document, but again, I am interested in more than that - specifically,
the number of times different combinations of words and phrases appear
in the same sentence.

So if I have 3 search words and 10 sections, I want to see how many
times each word appears in each section, not just how many times in
the document the searched words appear. (Also note that there are
thousands of sentences (thus, sections) per document, more like 15-25
search words/phrases, and 200+ documents).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top