Sorting list via spelling errors

rob nobel

Hi group.
I have a Scrabble program which has a dictionary filled with words with
spelling errors and even words that don't exist.
I copied the list to Word and all those instances are underlined as
errors, which is great. But the list is HUGE, and I would like to know
whether it is possible to sort it so that the offenders are all together
and I can highlight and delete them in one go.
Thanks,
Rob
 
Malcolm Smith

Rob

Word has a spelling object of some sort (I haven't played with it). If
you can get that to work, a macro could use it to simply delete the
incorrect text.

I'll have a rummage.

- Malc
www.dragondrop.com
 
Malcolm Smith

Rob

Here's something to get you up and going.

- Malc
www.dragondrop.com


Sub DelectSpellingErrors()

    Dim oPara As Paragraph
    Dim oSpellingSuggestions As SpellingSuggestions

    For Each oPara In ActiveDocument.Paragraphs
        Set oSpellingSuggestions = oPara.Range.GetSpellingSuggestions
        If oSpellingSuggestions.SpellingErrorType = wdSpellingNotInDictionary Then
            oPara.Range.Delete
        End If
    Next oPara

End Sub
 
rob nobel

Malcolm,
That sort of works great. It works fine if there aren't too many pages for
the procedure to go through. (It did about 150 pages or so OK in about a
quarter of an hour.)
Trouble is, there are about 2600 pages in this document, and unless I copy
small sections to other documents and run the procedure separately on each
one, the procedure just whirrs away for hours and Ctrl+Alt+Del tells me
that although the procedure is running, it's not responding.
I'm grateful already for what you've provided, but is there a problem with
it, or is the file just too large for Word to handle the procedure in one
big file?
Also, can you tell me what the "o" means in front of some of the names in
the code? Is that something specific you do in the code you provide, or
does it actually do something? (I'm still a beginner with VBA.)
Rob
 
Jonathan West

rob nobel said:
> Malcolm,
> That sort of works great. It works fine if there aren't too many pages for
> the procedure to go through. (It did about 150 pages or so OK in about a
> quarter of an hour.)
> Trouble is, there are about 2600 pages in this document, and unless I copy
> small sections to other documents and run the procedure separately on each
> one, the procedure just whirrs away for hours and Ctrl+Alt+Del tells me
> that although the procedure is running, it's not responding.

For a document that big, the bottleneck is the For Each statement, which
gets very slow on large collections of paragraphs. Use this instead, which
will be a lot faster:

Sub DelectSpellingErrors()

    Dim oPara As Paragraph
    Dim oSpellingSuggestions As SpellingSuggestions

    Set oPara = ActiveDocument.Paragraphs(1)
    Do Until oPara.Range.End = ActiveDocument.Range.End
        Set oSpellingSuggestions = oPara.Range.GetSpellingSuggestions
        If oSpellingSuggestions.SpellingErrorType = wdSpellingNotInDictionary Then
            oPara.Range.Delete
        Else
            Set oPara = oPara.Next
        End If
        Application.StatusBar = CStr(Int(100 * _
            oPara.Range.End / ActiveDocument.Range.End)) & "% complete"
    Loop

End Sub

As a bonus, I've included a line of code that will give you a percent
complete indication in the status bar.

> I'm grateful already for what you've provided, but is there a problem with
> it, or is the file just too large for Word to handle the procedure in one
> big file?
> Also, can you tell me what the "o" means in front of some of the names in
> the code? Is that something specific you do in the code you provide, or
> does it actually do something? (I'm still a beginner with VBA.)

The o at the start of some variable names is a naming convention commonly
adopted to indicate that the variable is an Object of some sort (integers
often have an i prefix, and strings an str prefix). This has two benefits:
first, it reminds you what kind of thing the variable holds, and second, it
eliminates any chance of accidentally giving a variable the same name as a
built-in object or keyword.
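
As a small illustration of the convention, here is a sketch with made-up
variable names (iCount and strName are not from the macros in this thread;
only oPara is):

```vba
Sub NamingConventionDemo()
    ' Hypothetical names, chosen only to show the prefixes
    Dim oPara As Paragraph    ' o   = object reference
    Dim iCount As Integer     ' i   = integer
    Dim strName As String     ' str = string

    Set oPara = ActiveDocument.Paragraphs(1)   ' objects are assigned with Set
    strName = ActiveDocument.Name
    iCount = Len(strName)

    Debug.Print strName & " has " & iCount & " characters in its name"
    Debug.Print "First paragraph ends at position " & oPara.Range.End
End Sub
```

The prefixes are purely for the reader; VBA itself attaches no meaning to
them.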
 
Klaus Linke

Also, accessing the dictionaries with VBA is rather slow. You may still have
to run it overnight.

If you want to stop the macro and see how far it has progressed (for example
to split the file in two at that location and do the unfinished part later),
you can stop the macro with Ctrl+Break (the Pause/Break key) and click on
"Debug".
Then open the Immediate window (Ctrl+G) and type
oPara.Range.Select
(followed by Return).

> The o on the start of some variable names is a naming convention
> commonly adopted to indicate that the variable is an Object of
> some sort

And I thought it was the Irish naming convention ;-)

Klaus
 
rob nobel

Thanks Jonathan. I did a quick test with it and it still took half an hour
to do 10%, so I don't really know if it's going faster or not, but I DO
APPRECIATE the bonus section!! At least I know it's still working even
though Ctrl+Alt+Del says it is not responding. I can't figure that out, as
the percentage completed still increases. But I'll need to run this one day
when I don't need the computer, as it basically takes over everything else,
and by the looks of it, it will take some 5 hours or more to complete (and
that's on a Pentium 3 / 500 CPU with 128 MB RAM, running Win 98 and
Office 2000).
Rob
 
Malcolm Smith

2600 pages isn't a document; that's an encyclopedia! :)

The 'o' prefix is what I use to define an object or an object pointer. In
this case oPara points to the paragraph in question; regardless of whether
you did it my way or Jonathan's way.

- Malc
www.dragondrop.com
 
Bruce Brown

Your status bar message was a most excellent bonus, Jonathan. Thanks very much!

- Bruce
 
rob nobel

Thanks for that piece of advice, Klaus. You're right though, this is
definitely an overnight process.
Rob
 
Malcolm Smith

Ah, I thought that was what you were doing, putting each word on a line...

- Malc
 
Brian

Would adding a DoEvents line in the loop allow you to do other
activities while it loops through the document?

-Brian
 
Malcolm Smith

In theory one shouldn't need to as the operating system should handle the
multi-tasking. But if it doesn't then one DoEvents will do the trick but
will make the code run a lot slower.

If it came to that I would have a counter for each iteration and when I
get to the hundredth or whatever then I would reset the counter and then
do the DoEvents thingy.
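
A minimal sketch of that counter idea, reusing the Do loop posted earlier in
the thread. The procedure name, the YIELD_EVERY constant and its value of 100
are assumptions for illustration, not something tested on the 2600-page
document:

```vba
Sub DeleteSpellingErrorsWithDoEvents()
    ' Hypothetical variant of the loop posted earlier in this thread.
    Const YIELD_EVERY As Long = 100   ' assumed interval; tune to taste

    Dim oPara As Paragraph
    Dim lCounter As Long

    Set oPara = ActiveDocument.Paragraphs(1)
    Do Until oPara.Range.End = ActiveDocument.Range.End
        If oPara.Range.GetSpellingSuggestions.SpellingErrorType = _
                wdSpellingNotInDictionary Then
            oPara.Range.Delete
        Else
            Set oPara = oPara.Next
        End If

        ' Yield to the operating system only every YIELD_EVERY iterations,
        ' so the cost of DoEvents is not paid on every paragraph
        lCounter = lCounter + 1
        If lCounter >= YIELD_EVERY Then
            lCounter = 0
            DoEvents
        End If
    Loop
End Sub
```

A larger interval costs less time but makes Word feel less responsive
between yields.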

But if you're planning to leave it running whilst you're out at the pub
prior to crashing out unconscious on the floor on return home then why
bother with the DoEvents at all?

Cheers
Malc
 
JGM

Hi Jonathan,

Your little trick for the % in the status bar is so great I had to try it.

I have a project where I have to scan from 200 to 1000 fields in a document,
so I thought this would be perfect to keep the user informed as to the
progress...
Here is the code I first used

'First locate the field...
For i = 1 To TargetDoc.Fields.Count
    If Not InStr(TargetDoc.Fields(i).Code.Text, FullTextKey) = 0 Then
        TargetTextFound = True
        Exit For
    End If
    Application.StatusBar = CStr(Int(100 * _
        i / TargetDoc.Fields.Count)) & " % of records scanned..."
Next i

It was fine except that I kept getting a "Capacity overload" error, as if
the function could not handle "large" numbers.
For example, i = 321 with TargetDoc.Fields.Count = 421 was the threshold;
if I increased i, I got the error every time.
I tried with raw numbers to make sure:
Cstr(Int(100 * 321 / 421))
still same result.

So, I hate it when the machine foils my plans... I played around with it
and came up with the following solution:

Application.StatusBar = CLng(CLng(100) * _
    CLng(i) / CLng(TargetDoc.Fields.Count)) & " % of records scanned..."

Why did I get the error message in the first case (CStr..), but not the
second one (CLng...)?
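
One likely explanation, sketched below on the assumption that i was declared
As Integer (the post does not show the declaration): when both operands are
Integers, VBA computes 100 * i as a 16-bit Integer, and that raises an
overflow error as soon as the product exceeds 32767, which happens from
i = 328 onwards. Converting either operand to Long first, as the CLng version
does, makes the multiplication happen in Long arithmetic. The procedure and
variable names here are made up for the demonstration:

```vba
Sub OverflowDemo()
    Dim i As Integer      ' assumption: the loop counter was an Integer
    Dim lCount As Long    ' stands in for TargetDoc.Fields.Count

    i = 400
    lCount = 421

    ' 100 and i are both Integers, so 100 * i is computed as an Integer.
    ' 40000 does not fit in an Integer (maximum 32767), so uncommenting
    ' the next line raises run-time error 6 (Overflow):
    ' Debug.Print CStr(Int(100 * i / lCount))

    ' Making one operand a Long forces the multiplication into Long arithmetic
    Debug.Print CStr(Int(100& * i / lCount)) & " % of records scanned..."
    Debug.Print CStr(Int(CLng(i) * 100 / lCount)) & " % of records scanned..."
End Sub
```

The 100& literal is simply a Long constant; declaring i As Long would have
the same effect.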

TIA
 
rob nobel

Just to let you know that I finally ran the procedure and it took about 16
hours. I was going to stop it, but whilst it was running it could not be
paused by Ctrl+P as was suggested by someone else, and I did not know
another way. Still, the result was great, except there are still a bunch of
foreign words (mainly French) that even Word accepts as being OK. Pity, as
I'd rather have gotten rid of those as well!
And I thought the Americans were everywhere. Is there a txt dictionary
somewhere that's purely English?
(I'm an Ozzy, but England's English will have to do, methinks.)
Rob
 
Malcolm Smith

Rob

Do remember that an awful lot of French words make up the English
language; as Dubya famously got it wrong when he declared "There is no
French word for Entrepreneur".

Perhaps he should have said that there was no English word. I had heard
once that French words comprise 10% of the English language and I can
believe that.

Can you give an example of what sort of words are left in which are
French?

Regards
- Malc
[who, in talking to an Ozzie, has totally managed to refrain from
mentioning the World Cup :) ]
www.dragondrop.com
 
Brian

I'm curious, what kind of loop did you choose to use (For Each ...
Next, Do ... Loop, or For ... Next)? I'm also writing a macro that first
generates a list of "words" based on some rules and then removes the ones
that aren't spelled properly, leaving only "real" words. Regardless,
the spellcheck is a real bottleneck. Last night I ran it using a Do
While ActiveDocument.SpellingErrors > 0 ... Loop and it took about 9
hours to do a list about 115 pages long. While at work today it's
running using a For Each ... Next loop. I need to write a For ... To
... Next loop to see how that will do.

-Brian
 
rob nobel

Hi Malcolm, just a few examples here. The dictionary sees these as errors
and suggests the same word with the ' mark (accent) above certain letters.
Yet for some reason they were not deleted by the procedure you gave me.
CANAPE, CLICHE, DETENTES, ECLAIR, etc.
What I've done is run the spelling check again separately and changed the
spelling to the suggested French form with the ' thingy above the
letter. What I'll probably do is sort the full list and delete them, if I
can get the list to sort out those words that have the ' mark. (Although
Word says the list is too large to sort, so I may not bother.)
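
If sorting is not possible, one alternative is to let a macro remove the
accented entries directly. The following is only a rough sketch, not
something from this thread: the procedure name is made up, it assumes one
word per paragraph, and it treats any character code above 127 as
"accented", which would also catch other non-ASCII characters.

```vba
Sub DeleteParagraphsWithAccents()
    ' Hypothetical helper: assumes one word per paragraph and treats
    ' any character outside the 0-127 range as an accented letter.
    Dim oPara As Paragraph
    Dim sText As String
    Dim n As Long
    Dim lCode As Long
    Dim bAccented As Boolean

    Set oPara = ActiveDocument.Paragraphs(1)
    Do Until oPara.Range.End = ActiveDocument.Range.End
        sText = oPara.Range.Text
        bAccented = False
        For n = 1 To Len(sText)
            lCode = AscW(Mid$(sText, n, 1))
            ' AscW can return negative values for high code points
            If lCode > 127 Or lCode < 0 Then
                bAccented = True
                Exit For
            End If
        Next n

        If bAccented Then
            oPara.Range.Delete
        Else
            Set oPara = oPara.Next
        End If
    Loop
End Sub
```

Like the spelling macro earlier in the thread, this loop stops before
examining the final paragraph, so it is worth checking the last line of the
list by hand.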

Rob (who is still grieving over the fact that a bunch of Englanders could
beat us in a game of rugby. Although the REAL game is Ozzie rules, which
you may not be familiar with, but which outshines any other sport by far!!
Still, we've got to be content with the other rugby code win over England
recently, our win in the tennis Davis Cup, our cricket team victories, etc.)
 
