Count numbers in a Word document

  • Thread starter mkboynton via OfficeKB.com
  • Start date
M

mkboynton via OfficeKB.com

I need to count number in a word .doc file. This is an example of a page...

3/23/06 S H I P P I N G S U M M A R Y W O R K S H E E T
WM0490 PAGE 1
RUN DATE 3/23/06
0 ------ T O T A L --
----
STORE SEQ CITY ST--ZIP-- PHONE CUBE WEIGHT
CASES SCHED DEL/DAY SHIP NO NAME EXCEPTIONS
0 6032_____010 FLORIDA CITY FL 33034 305-248-7341 1880 28697
1605 500 MON 2892267 ________
5114_____020 MIAMI FL 33138 305-751-1550 737 10802
621 500 MON 2892245 ________ DELIVER 6:30AM
------ ------ --
----
TOTAL SCHEDULE: 2618 39500
2226 49
Monday SWIFT 3/1109/529
*5114 6:30AM DELIVERY
176019









- DATE 3/23/06 S H I P P I N G S U M M A R Y W O R K S H E E
T WM0490 PAGE 2
RUN DATE 3/23/06
0 ______ T O T A L
______
STORE SEQ CITY ST--ZIP-- PHONE CUBE WEIGHT
CASES SCHED DEL/DAY SHIP NO NAME EXCEPTIONS
0 5909_____010 MIAMI FL 33147 786-318-2560 2608 40448
2029 501 MON 2892263 ________
------ ------ --
----
TOTAL SCHEDULE: 2608 40448
2029 49
Monday SWIFT 2/1056/528
176020

This has word wrapped by the way...

The number I need to count are the ones that appear under the SCHED header.
I am using the following macros which does a good job, but it also counts the
same numbers if the appear elsewhere in the document.

Sub WordFrequency()

Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '
Dim pword As String 'Previous word
Dim cword As String 'Current word
Dim newword As String 'Combined words
Dim StartNum
Dim EndNum

' Set up excluded words
Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]
[am][pm]"

' Find out how to sort
ByFreq = False
'Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "WORD")
'If Ans = "" Then End
'If UCase(Ans) = "FREQ" Then
' ByFreq = True
'End If
StartNum = InputBox$("Starting Route Number?", "Start Number")
EndNum = InputBox$("Ending Route Number?", "Ending Number")

Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count

' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(LCase(aword))
cword = Trim(LCase(aword))
newword = pword & cword

'If Mid(newword, 4, 1) = "-" Then
'MsgBox "Combined Word is " & newword
'End If

If Mid(newword, 4, 1) = "-" Then
SingleWord = ""
End If

'If SingleWord < "0" Or SingleWord > "9" Then SingleWord = ""
'Out of range?
If SingleWord < StartNum Or SingleWord > EndNum Then SingleWord =
""
If Len(SingleWord) <> 3 Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "a") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "p") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "s") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "r") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "n") Then SingleWord = ""
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""
'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True

Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded.
Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
pword = cword
Next aword

' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And Freq(l)
Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j

' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) &
vbCrLf
Next j
End With

ActiveDocument.Range.Select
Selection.ConvertToTable , , , 125
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Route #"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Number of
Stores"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment =
wdAlignParagraphCenter

System.Cursor = wdCursorNormal
'j = MsgBox("There were " & Trim(Str(WordNum)) & " routes ", vbOKOnly,
"Finished")
Selection.HomeKey wdStory
'ActiveDocument.SaveAs FileName:=StartNum & "'s count"
End Sub

Is there a way to have it only count the numbers that appear under the word
SCHED in the document or count only the numbers that have a space before and
after them?
 
J

Jezebel

You'll need to define 'number' -- is "305-248-7341" one number or three? Is
"6032_____010" one number or two?





mkboynton via OfficeKB.com said:
I need to count number in a word .doc file. This is an example of a
page...

3/23/06 S H I P P I N G S U M M A R Y W O R K S H E E T
WM0490 PAGE 1
RUN DATE 3/23/06
0 ------ T O T A
L --
----
STORE SEQ CITY ST--ZIP-- PHONE CUBE WEIGHT
CASES SCHED DEL/DAY SHIP NO NAME EXCEPTIONS
0 6032_____010 FLORIDA CITY FL 33034 305-248-7341 1880 28697
1605 500 MON 2892267 ________
5114_____020 MIAMI FL 33138 305-751-1550 737 10802
621 500 MON 2892245 ________ DELIVER 6:30AM
------ ------
--
----
TOTAL SCHEDULE: 2618 39500
2226 49
Monday SWIFT 3/1109/529
*5114 6:30AM DELIVERY
176019









- DATE 3/23/06 S H I P P I N G S U M M A R Y W O R K S H
E E
T WM0490 PAGE 2
RUN DATE 3/23/06
0 ______ T O T A
L
______
STORE SEQ CITY ST--ZIP-- PHONE CUBE WEIGHT
CASES SCHED DEL/DAY SHIP NO NAME EXCEPTIONS
0 5909_____010 MIAMI FL 33147 786-318-2560 2608 40448
2029 501 MON 2892263 ________
------ ------
--
----
TOTAL SCHEDULE: 2608 40448
2029 49
Monday SWIFT 2/1056/528
176020

This has word wrapped by the way...

The number I need to count are the ones that appear under the SCHED
header.
I am using the following macros which does a good job, but it also counts
the
same numbers if the appear elsewhere in the document.

Sub WordFrequency()

Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '
Dim pword As String 'Previous word
Dim cword As String 'Current word
Dim newword As String 'Combined words
Dim StartNum
Dim EndNum

' Set up excluded words
Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]
[am][pm]"

' Find out how to sort
ByFreq = False
'Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "WORD")
'If Ans = "" Then End
'If UCase(Ans) = "FREQ" Then
' ByFreq = True
'End If
StartNum = InputBox$("Starting Route Number?", "Start Number")
EndNum = InputBox$("Ending Route Number?", "Ending Number")

Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count

' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(LCase(aword))
cword = Trim(LCase(aword))
newword = pword & cword

'If Mid(newword, 4, 1) = "-" Then
'MsgBox "Combined Word is " & newword
'End If

If Mid(newword, 4, 1) = "-" Then
SingleWord = ""
End If

'If SingleWord < "0" Or SingleWord > "9" Then SingleWord = ""
'Out of range?
If SingleWord < StartNum Or SingleWord > EndNum Then SingleWord
=
""
If Len(SingleWord) <> 3 Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "a") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "p") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "s") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "r") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "n") Then SingleWord = ""
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""
'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True

Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded.
Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
pword = cword
Next aword

' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And
Freq(l)
Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j

' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) &
vbCrLf
Next j
End With

ActiveDocument.Range.Select
Selection.ConvertToTable , , , 125
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Route #"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Number of
Stores"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment =
wdAlignParagraphCenter

System.Cursor = wdCursorNormal
'j = MsgBox("There were " & Trim(Str(WordNum)) & " routes ",
vbOKOnly,
"Finished")
Selection.HomeKey wdStory
'ActiveDocument.SaveAs FileName:=StartNum & "'s count"
End Sub

Is there a way to have it only count the numbers that appear under the
word
SCHED in the document or count only the numbers that have a space before
and
after them?
 
M

mkboynton via OfficeKB.com

Jezebel said:
You'll need to define 'number' -- is "305-248-7341" one number or three? Is
"6032_____010" one number or two?
I need to count number in a word .doc file. This is an example of a
page...
[quoted text clipped - 192 lines]
and
after them?

The code I have now treats them as one number, which is the correct way for
them to be treated
 
M

mkboynton via OfficeKB.com

mkboynton said:
You'll need to define 'number' -- is "305-248-7341" one number or three? Is
"6032_____010" one number or two?
[quoted text clipped - 4 lines]
The code I have now treats them as one number, which is the correct way for
them to be treated
Disregard my previous answer, I misunderstood the question. The phone number
would be one number, the second example would be two numbers.
 
T

Tony Jollans

Counting all the numbers in a document is fairly straightforward - far less
effort than all the code you have posted. But is that really what you want
to do - or even close?

You have a report - from an IBM mainframe from the look of it - with data
which line up when in a fixed width font but I can't tell exactly what the
format is - can you explain in simple English what you are trying to do?

--
Enjoy,
Tony

mkboynton via OfficeKB.com said:
I need to count number in a word .doc file. This is an example of a page...

3/23/06 S H I P P I N G S U M M A R Y W O R K S H E E T
WM0490 PAGE 1
RUN DATE 3/23/06
0 ------ T O T A L --
----
STORE SEQ CITY ST--ZIP-- PHONE CUBE WEIGHT
CASES SCHED DEL/DAY SHIP NO NAME EXCEPTIONS
0 6032_____010 FLORIDA CITY FL 33034 305-248-7341 1880 28697
1605 500 MON 2892267 ________
5114_____020 MIAMI FL 33138 305-751-1550 737 10802
621 500 MON 2892245 ________ DELIVER 6:30AM
------ ------ --
----
TOTAL SCHEDULE: 2618 39500
2226 49
Monday SWIFT 3/1109/529
*5114 6:30AM DELIVERY
176019









- DATE 3/23/06 S H I P P I N G S U M M A R Y W O R K S H E E
T WM0490 PAGE 2
RUN DATE 3/23/06
0 ______ T O T A L
______
STORE SEQ CITY ST--ZIP-- PHONE CUBE WEIGHT
CASES SCHED DEL/DAY SHIP NO NAME EXCEPTIONS
0 5909_____010 MIAMI FL 33147 786-318-2560 2608 40448
2029 501 MON 2892263 ________
------ ------ --
----
TOTAL SCHEDULE: 2608 40448
2029 49
Monday SWIFT 2/1056/528
176020

This has word wrapped by the way...

The number I need to count are the ones that appear under the SCHED header.
I am using the following macros which does a good job, but it also counts the
same numbers if the appear elsewhere in the document.

Sub WordFrequency()

Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '
Dim pword As String 'Previous word
Dim cword As String 'Current word
Dim newword As String 'Combined words
Dim StartNum
Dim EndNum

' Set up excluded words
Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]
[am][pm]"

' Find out how to sort
ByFreq = False
'Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "WORD")
'If Ans = "" Then End
'If UCase(Ans) = "FREQ" Then
' ByFreq = True
'End If
StartNum = InputBox$("Starting Route Number?", "Start Number")
EndNum = InputBox$("Ending Route Number?", "Ending Number")

Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count

' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(LCase(aword))
cword = Trim(LCase(aword))
newword = pword & cword

'If Mid(newword, 4, 1) = "-" Then
'MsgBox "Combined Word is " & newword
'End If

If Mid(newword, 4, 1) = "-" Then
SingleWord = ""
End If

'If SingleWord < "0" Or SingleWord > "9" Then SingleWord = ""
'Out of range?
If SingleWord < StartNum Or SingleWord > EndNum Then SingleWord =
""
If Len(SingleWord) <> 3 Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "a") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "p") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "s") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "r") Then SingleWord = ""
If InStr(2, "[" & SingleWord & "]", "n") Then SingleWord = ""
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""
'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True

Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded.
Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
pword = cword
Next aword

' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And Freq(l)
Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j

' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) &
vbCrLf
Next j
End With

ActiveDocument.Range.Select
Selection.ConvertToTable , , , 125
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Route #"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Number of
Stores"
ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment =
wdAlignParagraphCenter

System.Cursor = wdCursorNormal
'j = MsgBox("There were " & Trim(Str(WordNum)) & " routes ", vbOKOnly,
"Finished")
Selection.HomeKey wdStory
'ActiveDocument.SaveAs FileName:=StartNum & "'s count"
End Sub

Is there a way to have it only count the numbers that appear under the word
SCHED in the document or count only the numbers that have a space before and
after them?
 
M

mkboynton via OfficeKB.com

Tony said:
Counting all the numbers in a document is fairly straightforward - far less
effort than all the code you have posted. But is that really what you want
to do - or even close?

You have a report - from an IBM mainframe from the look of it - with data
which line up when in a fixed width font but I can't tell exactly what the
format is - can you explain in simple English what you are trying to do?

I am trying to count the number of stores that are delivering on each route
for a report we have to generate. This has to be done for every days
delivery schedule, and we service over 600 stores. So we are looking for a
faster way to do it than by hand. I have no experience with vba in Word, so
most of the object names are unfamiliar to me.
 
T

Tony Jollans

Yes but which numbers do you want to find and count?

In your sample you have two pages - the first page has two detail lines each
with a number (500) under SCHED, and the second has one with a number (501)
under SCHED - is counting these numbers not the same as counting the number
of detail lines?

Wouldn't this be more easily done on the mainframe?
 
J

Jezebel

If it's all fixed width output, read it into Excel rather than Word -- then
you can break the content into columns and get your results using the
CountIf() function.
 
M

mkboynton via OfficeKB.com

Tony said:
Yes but which numbers do you want to find and count?

In your sample you have two pages - the first page has two detail lines each
with a number (500) under SCHED, and the second has one with a number (501)
under SCHED - is counting these numbers not the same as counting the number
of detail lines?

Wouldn't this be more easily done on the mainframe?

--
Enjoy,
Tony
[quoted text clipped - 13 lines]
faster way to do it than by hand. I have no experience with vba in Word, so
most of the object names are unfamiliar to me.

Yes it is the same, but I don't know how to do that either. I do not have
access to anything but the Word document that is sent to me.
 
M

mkboynton via OfficeKB.com

Jezebel said:
If it's all fixed width output, read it into Excel rather than Word -- then
you can break the content into columns and get your results using the
CountIf() function.
[quoted text clipped - 17 lines]
so
most of the object names are unfamiliar to me.

Do you mean open the file in Excel or bring it in via VBA?
 
T

Tony Jollans

What character ends each line? Is it a manul line break as posted, or a
paragraph mark? If it's line breaks, I would convert them all to paragraph
marks.

You will then be able to process the document using

For Each Para in ActiveDocuments.Paragraphs
' Check the line (Para.Range.Text) here
' You are looking for something which marks the line as a detail
line
' Looking at your sample ....
' perhaps you can start counting after SCHED at position 82
' and stop when line length less than 82 (or a space in column
86)
' You will need to confirm exactly what to check
Next

Let us know if you need a bit more help but please post precise details -
because I don't know exactly what posting does to formatting of samples.

--
Enjoy,
Tony

mkboynton via OfficeKB.com said:
Tony said:
Yes but which numbers do you want to find and count?

In your sample you have two pages - the first page has two detail lines each
with a number (500) under SCHED, and the second has one with a number (501)
under SCHED - is counting these numbers not the same as counting the number
of detail lines?

Wouldn't this be more easily done on the mainframe?

--
Enjoy,
Tony
Counting all the numbers in a document is fairly straightforward - far less
effort than all the code you have posted. But is that really what you
want
[quoted text clipped - 13 lines]
faster way to do it than by hand. I have no experience with vba in Word, so
most of the object names are unfamiliar to me.

Yes it is the same, but I don't know how to do that either. I do not have
access to anything but the Word document that is sent to me.
 
J

Jezebel

Opening it directly is usually simplest.



mkboynton via OfficeKB.com said:
Jezebel said:
If it's all fixed width output, read it into Excel rather than Word --
then
you can break the content into columns and get your results using the
CountIf() function.
Counting all the numbers in a document is fairly straightforward - far
less
[quoted text clipped - 17 lines]
so
most of the object names are unfamiliar to me.

Do you mean open the file in Excel or bring it in via VBA?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top