Word frequency count

R

Raj

Hi,

I have a word document with the following words(sample) :

Word_this_and that KK857 9269875 King king King

I want a macro that shows word frequency count using space as the word
separator. The count of above should be displayed as:

Word_this_and 1
that 1
KK857 1
9269875 1
King 2
king 1

Thanks in advance for the help

Raj
 
K

Klaus Linke

Playing with the scripting dictionary recently, I've written such a macro...

You need to check the "MIcrosoft Scripting Runtime" in "Tools > References".

Regards,
Klaus

Dim sText As String
sText = ActiveDocument.Content.Text
Dim dicWords As New Scripting.Dictionary
Dim vPlaces(0)
Dim vItem As Variant
Dim i As Long
Dim sWord As String, sChar As String
sWord = ""
For i = 1 To Len(sText)
sChar = MID(sText, i, 1)
Select Case sChar
' delimiters:
Case ChrW(9), ChrW(10), ChrW(13), ChrW(32), ChrW(34), ".", ",", _
"<", ">", "/", ";", _
"(", ")", "[", "]", _
":", "#", "+", "*", _
"?", "!"
If Len(sWord) <> 0 Then
If dicWords.Exists(key:=sWord) Then
vItem = dicWords(key:=sWord)
ReDim Preserve vItem(UBound(vItem) + 1)
vItem(UBound(vItem)) = i
dicWords(key:=sWord) = vItem
Else
vPlaces(0) = i
dicWords.Add sWord, vPlaces
End If
sWord = ""
End If
Case Else
' Or, if you want to define what characters can appear in words explicitly
' (and ignore the rest):
' Case "a" To "z", "@", "~", _
' "$", "%", "§", "&", "A" To "Z", _
' "0" To "9", "'", "-", "_"
sWord = sWord & sChar
End Select
Next i
Dim vWord As Variant
For Each vWord In dicWords.Keys
Debug.Print vWord, UBound(dicWords.Item(key:=vWord)) + 1
Next vWord
 
K

Klaus Linke

BTW, the output does not show the real power of using the scripting
dictionary.
I wrote it that way to store the position (of the last character) with the
word.
The plan was to speed up look-up of words in a large file.

This is demonstrated if you replace the output section at the end with

Dim vWord As Variant
For Each vWord In dicWords.Keys
Debug.Print vWord, UBound(dicWords.Item(key:=vWord)) + 1,
For i = LBound(dicWords.Item(key:=vWord)) To
UBound(dicWords.Item(key:=vWord))
Debug.Print dicWords.Item(vWord)(i) - Len(vWord);
Debug.Print "/";
Next i
Debug.Print
Next vWord

The output:

Word_this_and 1 1 /
that 1 15 /
KK857 1 20 /
9269875 1 26 /
King 2 35 / 46 /
king 1 40 /

Klaus
 
G

Greg Maxey

Klaus,

Nice job!

I don't understand this part:
' Or, if you want to define what characters can appear in words
explicitly ' (and ignore the rest):
' Case "a" To "z", "@", "~", _
' "$", "%", "§", "&", "A" To "Z", _
' "0" To "9", "'", "-", "_"

Can you post (or send me e-mail) showing exactly how the code would look
using this option and an example. Thanks.


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Greg Maxey - Word MVP

My web site http://gregmaxey.mvps.org
Word MVP web site http://word.mvps.org~~~~~~~~~~~~~~~...ords.Item(key:=vWord)) + 1 Next vWord[/QUOTE]
 
R

Raj

Klaus,

Nice job!

I don't understand this part:
' Or, if you want to define what characters can appear in words
explicitly ' (and ignore the rest):
'      Case "a" To "z", "@", "~", _
'        "$", "%", "§", "&", "A" To "Z", _
'        "0" To "9", "'", "-", "_"

Can you post (or send me e-mail) showing exactly how the code would look
using this option and an example.  Thanks.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Greg Maxey -  Word MVP

My web sitehttp://gregmaxey.mvps.org
Word MVP web sitehttp://word.mvps.org
~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Klaus said:
Playing with the scripting dictionary recently, I've written such a
macro...
You need to check the "MIcrosoft Scripting Runtime" in "Tools >
References".
Regards,
Klaus
 Dim sText As String
 sText = ActiveDocument.Content.Text
 Dim dicWords As New Scripting.Dictionary
 Dim vPlaces(0)
 Dim vItem As Variant
 Dim i As Long
 Dim sWord As String, sChar As String
 sWord = ""
 For i = 1 To Len(sText)
   sChar = MID(sText, i, 1)
   Select Case sChar
     ' delimiters:
     Case ChrW(9), ChrW(10), ChrW(13), ChrW(32), ChrW(34), ".", ",", _
       "<", ">", "/", ";", _
       "(", ")", "[", "]", _
       ":", "#", "+", "*", _
       "?", "!"
       If Len(sWord) <> 0 Then
          If dicWords.Exists(key:=sWord) Then
           vItem = dicWords(key:=sWord)
           ReDim Preserve vItem(UBound(vItem) + 1)
           vItem(UBound(vItem)) = i
           dicWords(key:=sWord) = vItem
          Else
           vPlaces(0) = i
           dicWords.Add sWord, vPlaces
          End If
          sWord = ""
       End If
    Case Else
' Or, if you want to define what characters can appear in words
explicitly ' (and ignore the rest):
'      Case "a" To "z", "@", "~", _
'        "$", "%", "§", "&", "A" To "Z", _
'        "0" To "9", "'", "-", "_"
       sWord = sWord & sChar
   End Select
 Next i
 Dim vWord As Variant
 For Each vWord In dicWords.Keys
   Debug.Print vWord, UBound(dicWords.Item(key:=vWord)) + 1
 Next vWord- Hide quoted text -

- Show quoted text -

Hi Klaus,

It worked. Thanks Klaus.

Regards,
Raj
 
K

Klaus Linke

Greg Maxey said:
Nice job!

Oy, thanks!!
I don't understand this part:

You would replace the "Case Else" condition with this condition.
Only the characters listed would make it into "words".
All other characters would simply be removed (... nothing is done if they
are encountered).

Say if I take the original example sentence:

Word_this_and that KK857 9269875 King king King

and use

[...]
Case "a" To "z", "A" To "Z"
sWord = sWord & sChar
End Select

(notice that the numbers 0 to 9, and the underscore "_", neither appear
under the word delimiters now, nor under the word characters), I'd get

Wordthisand 1 3 /
that 1 15 /
KK 1 23 /
King 2 35 / 46 /
king 1 41 /

Regards,
Klaus
 
K

Klaus Linke

Wordthisand 1 **3** /

.... shows I haven't debugged this: If I ignore characters, I can't tell the
start position reliably any more.

Klaus
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top