Is there an efficient way to retrieve each range in a ProofReadingErrors collection?

Howard Kaikow

I find it hard to believe that there is no efficient way to retrieve the
ranges from a ProofReadingErrors collection.

Using For Each, For...Next, GoTo, or GoToNext is very slow.

My main test document is a 168-page document that has 267 spelling errors.
This is a real document. Only 1 of the spelling errors is really an error;
all the others are simply words that are not in the dictionary.

On my very slow system, it takes less than 10 seconds for Word to construct
the collection so I can retrieve .Count.

However, it then takes about 3 minutes to retrieve the ranges.

Since the object returned from the ProofReadingErrors collection is a range,
there sure must be a rather complex structure in the collection to cause
Word to take so much time to return the range.

It appears that each reference to an item in the collection causes Word to
rebuild the collection.
The time for both of the following loops is just about identical.

' yourErrors is a copy of SpellingErrors
QueryPerformanceCounter curQPStart
ReDim rngSpellingError(lngSpellingErrorCount - 1)
For i = 0 To lngSpellingErrorCount - 1
    Set rngSpellingError(i) = yourErrors(i + 1)
Next i
QueryPerformanceCounter curQPEnd
' Elapsed time in milliseconds
lngTime = (curQPEnd - curQPStart) / dblQPFreq * 1000
Debug.Print "Copying ranges from collection: " & lngTime

QueryPerformanceCounter curQPStart
ReDim rngSpellingError(lngSpellingErrorCount - 1)
With ActiveDocument.Content
    For i = 0 To lngSpellingErrorCount - 1
        Set rngSpellingError(i) = .SpellingErrors(i + 1)
    Next i
End With
QueryPerformanceCounter curQPEnd
lngTime = (curQPEnd - curQPStart) / dblQPFreq * 1000
Debug.Print "Using ActiveDocument.Content, Copying ranges from collection: " & lngTime
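
The timing calls above depend on declarations that the post does not show. A minimal sketch of what they might look like, assuming the common idiom of passing Currency variables to the kernel32 performance counter functions; the variable names are taken from the post, but their types, the conditional-compilation branch, and the TimeCountOnly routine are assumptions:

```vba
Option Explicit

' Assumed declarations; the original post does not show them.
' Currency is a 64-bit scaled integer, so it can receive the 64-bit
' values these functions write. The scaling cancels out when the
' counter difference is divided by the frequency.
#If VBA7 Then
Private Declare PtrSafe Function QueryPerformanceCounter Lib "kernel32" _
        (lpPerformanceCount As Currency) As Long
Private Declare PtrSafe Function QueryPerformanceFrequency Lib "kernel32" _
        (lpFrequency As Currency) As Long
#Else
Private Declare Function QueryPerformanceCounter Lib "kernel32" _
        (lpPerformanceCount As Currency) As Long
Private Declare Function QueryPerformanceFrequency Lib "kernel32" _
        (lpFrequency As Currency) As Long
#End If

Private curQPFreq As Currency
Private curQPStart As Currency
Private curQPEnd As Currency
Private dblQPFreq As Double
Private lngTime As Long

' Hypothetical routine: times only the construction of the collection
Public Sub TimeCountOnly()
    Dim lngSpellingErrorCount As Long

    QueryPerformanceFrequency curQPFreq
    dblQPFreq = CDbl(curQPFreq)

    QueryPerformanceCounter curQPStart
    lngSpellingErrorCount = ActiveDocument.Content.SpellingErrors.Count
    QueryPerformanceCounter curQPEnd

    ' Elapsed time in milliseconds
    lngTime = (curQPEnd - curQPStart) / dblQPFreq * 1000
    Debug.Print "Count = " & lngSpellingErrorCount & ", ms = " & lngTime
End Sub
```

Timing the .Count call separately from the copy loops makes it easier to see how much of the total is spent building the collection and how much is spent retrieving items.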
 
Charles Kenyon

Hi Howard,

I know that Greg Maxey was working on code for this, but I doubt that he was
checking times. His code is at
http://gregmaxey.mvps.org/List_Spelling_Errors.htm.

--
Charles Kenyon

Word New User FAQ & Web Directory: http://addbalance.com/word

Intermediate User's Guide to Microsoft Word (supplemented version of
Microsoft's Legal Users' Guide) http://addbalance.com/usersguide

See also the MVP FAQ: http://www.mvps.org/word which is awesome!
--------- --------- --------- --------- --------- ---------
This message is posted to a newsgroup. Please post replies
and questions to the newsgroup so that others can learn
from my ignorance and your wisdom.
 
Greg

Howard,

I have read that sometimes you can speed things up by not using For
Each. I can't really tell what you are trying to do, but here is a
snippet of code that works through the whole range and, IF the word
encountered (all words starting with the first error) is an error,
prints the word. This way a collection is not rebuilt:

Public Sub Test()
    Dim StartTime As Single
    Dim spError As Range
    Dim chkWord As ProofreadingErrors

    StartTime = Timer
    With ActiveDocument.Content
        ' Start at the first spelling error (assumes there is at least one)
        Set spError = .SpellingErrors(1)
        Do
            ' Check only the current word instead of indexing the
            ' document-wide collection again
            Set chkWord = spError.SpellingErrors
            If chkWord.Count = 1 Then Debug.Print spError.Text
            ' Advance to the next word
            spError.Collapse wdCollapseEnd
            spError.MoveEnd wdWord, 1
        Loop Until spError.End = .End
    End With
    Set spError = Nothing
    MsgBox "Time taken was: " & (Timer - StartTime) & " seconds"
End Sub
 
Jezebel

The issue is that when you have a very large document, some of these
collections (like SpellingErrors) that get managed on the fly bog down
horribly. There's a discussion paper somewhere that explains the underlying
Word code, and in particular the thresholds for optimum collection size.
The code designers made assumptions (not unreasonably) about the sorts of
documents that most users would be working with most of the time.

A strategy for working around it is to process your documents in chunks. If
it's already divided into sections, maybe you can use those (that is,
iterate the sections, and iterate the spelling errors in each section). Or
process the document in chunks of maybe 1000 words at a time ... you'd need
to experiment to find the best step value, but perhaps something along these
lines --

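Dim pMax As Long, pCount As Long
Dim pStart As Long, pEnd As Long
Dim pErrorRange As Range

' Even Words.Count is slow to retrieve in a huge document
pMax = ActiveDocument.Words.Count
pStart = 0          ' document positions start at 0
pCount = 0
Do
    pCount = pCount + 1000
    If pCount > pMax Then
        pCount = pMax
    End If
    pEnd = ActiveDocument.Words(pCount).End

    For Each pErrorRange In ActiveDocument.Range(pStart, pEnd).SpellingErrors
        ' ....
    Next

    ' A range's End is exclusive, so the next chunk starts at pEnd
    pStart = pEnd
Loop Until pStart >= ActiveDocument.Range.End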
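
The other option mentioned above, iterating the sections and then the spelling errors in each section, could be sketched as follows. This is only an illustration: the routine name is hypothetical, and whether it is actually faster depends on how the document is divided into sections.

```vba
' Hypothetical sketch: list spelling errors one section at a time
Public Sub ListSpellingErrorsBySection()
    Dim sec As Section
    Dim errs As ProofreadingErrors
    Dim i As Long

    For Each sec In ActiveDocument.Sections
        ' The collection here covers only this section's range
        Set errs = sec.Range.SpellingErrors
        For i = 1 To errs.Count
            Debug.Print sec.Index, errs(i).Text
        Next i
    Next sec
End Sub
```

A document with a single section gains nothing from this, so the word-count chunking remains the fallback.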
 
Howard Kaikow

The following is looong!

On 6 April 2005, after much pain, I posted something like the following
problem in another forum:

"I found that GoToNext fails when there is punctuation (e.g., comma, period,
or slash) immediately after the spelling error.

In that case, one of two things happens:

1. If the spelling error is the first word in the document, that spelling
error is found, but no others are found.
2. If the spelling error is not the first word in the document, then all
spelling errors up to, but not including that error, are found, and no
errors after that spelling error are found.

The same problem occurs if I use GoToNext with the Range object, instead of
the Selection object.

The problem does not occur using the ProofReadingErrors collection, only
when navigating with GoToNext or GoTo using (what:=wdGoToSpellingError)."

I then found a solution.

One has to adjust the Range/Selection if the character IMMEDIATELY after a
spelling error is not "whitespace".

However, since that trick is not documented, I would not use it in
production code except in unusual circumstances.

I also noted that, depending on the number of spelling errors, the
constituents of the document, and the size of the document, one cannot know
in advance whether to use For Each, For...Next, GoTo, or GoToNext to iterate
the ProofReadingErrors collection for spelling errors.

I then noticed that iterating using For Each or For Next seems to cause the
collection to be recalculated each time an item is referenced.

It seems to me that the ProofReadingErrors collection had an ill conceived
implementation. For most any collection, we should be able to access a
collection without causing the collection to be recalculated.

Each item in the ProofReadingErrors collection is returned as a range, so
it is disappointing to find how inefficient the process of
saving/accessing those ranges is.

Iterating thru a doc a word at a time is way too slow. And it is not clear
whether the collection is recalculated anyway for the document, even when
moving a word at a time.

I used the following code, where IsWhiteSpace is a yet-to-be-completed
function that detects whitespace.


Public Sub UseSpellingErrors()
    Dim i As Long
    Dim lngCount As Long
    Dim rng As Range
    Dim strError As String
    Dim chkWord As Word.ProofreadingErrors

    QueryPerformanceFrequency curQPFreq
    dblQPFreq = CDbl(curQPFreq)

    QueryPerformanceCounter curQPStart
    i = 0
    With ActiveDocument.Content
        lngCount = .SpellingErrors.Count
        Set rng = .SpellingErrors(1)
        Do
            Set chkWord = rng.SpellingErrors
            If chkWord.Count = 1 Then
                Set rng = chkWord(1)
                With rng
                    strError = .Text
                    i = i + 1
                    Debug.Print i, strError
                    ' Step over a non-whitespace character (e.g., punctuation)
                    ' that immediately follows the spelling error
                    If Not IsWhiteSpace(ActiveDocument.Range(Start:=.End, _
                            End:=.End + 1)) Then
                        .MoveEnd unit:=wdCharacter, Count:=1
                    End If
                    .Collapse direction:=wdCollapseEnd
                End With
            Else
                rng.MoveStart wdWord, 1
            End If
            rng.MoveEnd wdWord, 1
        Loop Until rng.End = .End
    End With
    QueryPerformanceCounter curQPEnd
    lngTime = (curQPEnd - curQPStart) / dblQPFreq * 1000
    ' Report whether the number found matches .SpellingErrors.Count
    Debug.Print "UseSpellingErrors: " & lngTime, IIf(i = lngCount, "Passed", _
            Format(i) & " " & Format(lngCount))
    Set rng = Nothing
    Set chkWord = Nothing
    ActiveDocument.Saved = True
End Sub
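
The post leaves IsWhiteSpace unfinished. One possible completion is sketched below. The set of characters treated as whitespace is an assumption, as is the decision to treat an empty range (nothing after the error) as whitespace:

```vba
' Hypothetical completion of IsWhiteSpace; the character list is an assumption
Public Function IsWhiteSpace(ByVal rng As Range) As Boolean
    Dim strChar As String

    strChar = rng.Text
    If Len(strChar) = 0 Then
        ' Nothing follows the error, so there is nothing to step over
        IsWhiteSpace = True
        Exit Function
    End If

    Select Case AscW(Left$(strChar, 1))
        Case 9, 10, 11, 12, 13, 32, 160
            ' Tab, line feed, manual line break, page break,
            ' paragraph mark, space, nonbreaking space
            IsWhiteSpace = True
        Case Else
            IsWhiteSpace = False
    End Select
End Function
```

Taking a Range parameter matches how the function is called in UseSpellingErrors, where it is passed a one-character range.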
 
Jezebel

Howard Kaikow said:
I then noticed that iterating using For Each or For Next seems to cause the
collection to be recalculated each time an item is referenced.

It seems to me that the ProofReadingErrors collection had an ill conceived
implementation. For most any collection, we should be able to access a
collection without causing the collection to be recalculated.

Iterating a collection, in general, does not cause the collection to be
recalculated. But ProofReadingErrors (amongst a number of other Word
objects) is not a simple collection -- it is a collection *object* in its
own right. I think the issue that pre-occupied the designers was that when
you iterate this collection in the normal way (ie through the normal Word
interface) you have options that necessarily redefine the collection -- eg
'Ignore all' or 'Change all'. Not surprisingly, the programmers have built
the object to meet the needs of ordinary users rather than VBAers -- but I
agree it's a pity. It wouldn't have been hard to add a property to the
object to give direct code access to the raw underlying collection.
 
Howard Kaikow

Yes, ProofReadingErrors is a collection of objects, but the documented
return item is a mere range, so a program should be able to retrieve each
range without the collection being recalculated.

At worst, they could have added a, say, Lock property to lock the
collection, then New could be used to get a new collection.

Or there could have been an Items method that returned the ranges to an
array.
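
Word has no such Items method, but a user-written stand-in could look like the sketch below. The function name and signature are hypothetical. It still pays the per-item retrieval cost described in this thread, so it only shows the shape of the interface being asked for:

```vba
' Hypothetical helper: copy the spelling-error ranges of src into result
' and return how many were copied. It does not avoid the per-item cost.
Public Function CopySpellingErrors(ByVal src As Range, _
        ByRef result() As Range) As Long
    Dim errs As ProofreadingErrors
    Dim n As Long
    Dim i As Long

    Set errs = src.SpellingErrors
    n = errs.Count
    If n = 0 Then
        Erase result
    Else
        ReDim result(0 To n - 1)
        For i = 1 To n
            Set result(i - 1) = errs(i)
        Next i
    End If
    CopySpellingErrors = n
End Function
```

A caller would declare `Dim rngSpellingError() As Range`, call `CopySpellingErrors(ActiveDocument.Content, rngSpellingError)`, and check the returned count before indexing the array.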
 
Jezebel

Howard Kaikow said:
Yes, ProofReadingErrors is a collection of objects,

True, but what I meant (and said) was that ProofReadingErrors is not merely
a collection. It is, itself, an object -- in this case one that manages a
collection. Compare the DocVariables and DocumentProperties objects to see
the difference.
 
Howard Kaikow

Yes, but the documentation states that it returns range objects, so
programmatically we should be able to treat it as a collection of ranges.
 
